3D Gaussian Splatting · Evaluation & Survey

Image Quality Metrics: PSNR, SSIM, and LPIPS

By the end of this reading you will be able to:

Compute PSNR from MSE and interpret what a 3 dB difference means in terms of relative signal quality; identify the typical PSNR range for 3DGS reconstructions
Explain how SSIM measures local luminance, contrast, and structural similarity, and identify the types of distortion (blur, edge shift) that SSIM detects but PSNR misses
Distinguish LPIPS from PSNR and SSIM in terms of what it measures (perceptual deep features vs. pixel statistics), and identify when LPIPS should take priority as the evaluation criterion
Select the appropriate metric or metric combination given a specific evaluation goal — detecting compression artifacts, measuring perceptual quality, or tracking pixel-level fidelity

Why Metrics Matter

Evaluating a 3DGS compression method requires comparing rendered images against ground-truth photographs. Three metrics dominate the literature: PSNR, SSIM, and LPIPS. Each captures a different aspect of image quality, so they are complementary: a method can improve one while degrading another.

PSNR — Peak Signal-to-Noise Ratio

PSNR expresses the average pixel-level error on a logarithmic (decibel) scale:

$\text{PSNR} = 10 \log_{10}\!\left(\frac{\text{MAX}^2}{\text{MSE}}\right)$

where $\text{MAX} = 1.0$ for float images (or 255 for uint8) and MSE is the mean squared error over all pixels and channels.

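Expanded: $\text{MSE} = \frac{1}{HWC}\sum_{h,w,c}\bigl(\hat{I}_{h,w,c} - I_{h,w,c}\bigr)^2$

Because PSNR is logarithmic in MSE, a difference in dB corresponds to a ratio of errors: halving the MSE raises PSNR by $10 \log_{10} 2 \approx 3.01$ dB, so a method that scores 3 dB higher has roughly half the mean squared error.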
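The two formulas above can be sketched in a few lines of NumPy. This is a minimal illustration rather than a reference implementation; the function names `mse` and `psnr` and the `max_val` parameter are chosen here for the example and do not come from any particular library.

```python
import numpy as np

def mse(pred, target) -> float:
    """Mean squared error over all pixels and channels."""
    # Cast to float64 first so uint8 inputs do not wrap around on subtraction.
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    return float(np.mean((pred - target) ** 2))

def psnr(pred, target, max_val: float = 1.0) -> float:
    """PSNR in dB; max_val is 1.0 for float images, 255 for uint8."""
    err = mse(pred, target)
    if err == 0.0:
        return float("inf")  # identical images: no noise at all
    return float(10.0 * np.log10(max_val ** 2 / err))

# A uniform error of 0.1 on a float image gives MSE = 0.01, i.e. about 20 dB.
target = np.zeros((4, 4, 3))
pred = target + 0.1
print(psnr(pred, target))

# Halving the MSE (error scaled by 1/sqrt(2)) adds about 3.01 dB.
better = target + 0.1 / np.sqrt(2.0)
print(psnr(better, target) - psnr(pred, target))
```

Casting to `float64` before subtracting matters in practice: subtracting two `uint8` arrays directly would wrap around instead of producing negative differences.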

Interpreting PSNR:

< 20 dB: visibly poor quality
25–30 dB: acceptable for compressed video
30–35 dB: good quality (typical of baseline 3DGS on synthetic scenes; real-world benchmark scenes often score in the mid-to-high 20s)
> 35 dB: excellent quality

Limitations: PSNR is purely pixel-wise. Two images with the same PSNR can have completely different perceptual quality — a blurry image and a sharp-but-shifted image can score identically. It also treats all spatial locations equally, ignoring perceptual importance.
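The limitation above is easy to reproduce. The sketch below builds two distortions of the same synthetic gradient image, a uniform brightness offset and a high-frequency checkerboard pattern, chosen so that both have the same MSE and therefore the same PSNR. The test image and distortions are made up for illustration; they are not taken from any 3DGS benchmark.

```python
import numpy as np

def psnr(pred, target, max_val=1.0):
    err = np.mean((np.asarray(pred, float) - np.asarray(target, float)) ** 2)
    return float(10.0 * np.log10(max_val ** 2 / err))

h, w = 32, 32
# Horizontal gradient in [0.2, 0.8] so that +/-0.1 never leaves [0, 1].
target = np.tile(np.linspace(0.2, 0.8, w), (h, 1))

d = 0.1
shifted = target + d                       # every pixel slightly brighter
yy, xx = np.indices((h, w))
checker = np.where((yy + xx) % 2 == 0, d, -d)
noisy = target + checker                   # alternating +d / -d pattern

# Both distortions have squared error d**2 at every pixel,
# so PSNR cannot tell them apart.
print(psnr(shifted, target), psnr(noisy, target))
```

Viewed side by side, the offset image looks like a mild exposure change while the checkerboard image looks visibly corrupted, yet the two scores agree.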

SSIM — Structural Similarity Index

SSIM (Wang et al. 2004) compares images along three components: luminance, contrast, and structure.

$\text{SSIM}(x, y) = \frac{(2\mu_x\mu_y + c_1)(2\sigma_{xy} + c_2)}{(\mu_x^2 + \mu_y^2 + c_1)(\sigma_x^2 + \sigma_y^2 + c_2)}$

where $\mu_x, \mu_y$ are local means, $\sigma_x^2, \sigma_y^2$ are local variances, $\sigma_{xy}$ is the local covariance, and $c_1, c_2$ are stability constants.

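SSIM is computed locally over 11×11 Gaussian-weighted patches, then averaged over the image. The score is at most 1.0, which is reached only for identical images. It can in principle be negative when local structure is anti-correlated, but for rendered-versus-reference comparisons it falls in $[0, 1]$ in practice.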
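As a rough sketch of the computation described above, the following NumPy code evaluates the SSIM formula over 11×11 Gaussian-weighted windows on a single-channel image. Several details are assumptions not stated in this reading: the window standard deviation of 1.5 and the constants $c_1 = (0.01\,\text{MAX})^2$, $c_2 = (0.03\,\text{MAX})^2$ are the commonly used defaults from Wang et al. 2004, and only window positions that fit entirely inside the image are used. Library implementations may handle borders and color channels differently, so scores will not match them exactly.

```python
import numpy as np

def gaussian_kernel(size: int = 11, sigma: float = 1.5) -> np.ndarray:
    """1D Gaussian window, normalized to sum to 1."""
    ax = np.arange(size) - (size - 1) / 2.0
    k = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def blur_valid(img: np.ndarray, k: np.ndarray) -> np.ndarray:
    """Separable Gaussian filter over the 'valid' region (no padding)."""
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)

def ssim(x, y, max_val: float = 1.0, size: int = 11, sigma: float = 1.5) -> float:
    """Mean SSIM of two 2D (grayscale) images of at least size x size pixels."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    k = gaussian_kernel(size, sigma)
    c1 = (0.01 * max_val) ** 2   # assumed default stability constants
    c2 = (0.03 * max_val) ** 2
    mu_x, mu_y = blur_valid(x, k), blur_valid(y, k)          # local means
    var_x = blur_valid(x * x, k) - mu_x ** 2                 # local variances
    var_y = blur_valid(y * y, k) - mu_y ** 2
    cov_xy = blur_valid(x * y, k) - mu_x * mu_y              # local covariance
    num = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return float(np.mean(num / den))

rng = np.random.default_rng(0)
img = rng.random((32, 32))
print(ssim(img, img))         # identical inputs give the maximum score
print(ssim(img, 1.0 - img))   # inverted image: anti-correlated, negative score
```

The inverted-image case shows why the lower end of the range is not strictly 0: the covariance term changes sign when local structure is reversed.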

Why SSIM matters for 3DGS: Compression artifacts often smear or blur texture detail. SSIM's structure term penalizes loss of local correlation patterns (edges, textures) even when mean values match, catching the kinds of degradation that PSNR misses.

LPIPS — Learned Perceptual Image Patch Similarity

LPIPS (Zhang et al. 2018) measures perceptual similarity using features extracted from a pretrained deep network (typically AlexNet or VGG):

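$\text{LPIPS}(x, y) = \sum_l \frac{1}{H_l W_l} \sum_{h,w} \bigl\|w_l \odot \bigl(\hat{f}_l^{hw}(x) - \hat{f}_l^{hw}(y)\bigr)\bigr\|_2^2$

where $\hat{f}_l(x)$ and $\hat{f}_l(y)$ are the layer-$l$ feature maps of the two images, unit-normalized along the channel dimension, $H_l \times W_l$ is the spatial size of layer $l$, and $w_l$ are learned per-channel weights.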
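To make the LPIPS formula above concrete, the sketch below evaluates it in NumPy on feature maps that are assumed to be already extracted. Everything network-related is a stand-in: the random arrays play the role of AlexNet or VGG activations, the all-ones `weights` replace the learned channel weights, and the names `unit_normalize` and `lpips_distance` are hypothetical. For real evaluation one would use a reference implementation with pretrained weights rather than this sketch.

```python
import numpy as np

def unit_normalize(feat: np.ndarray, eps: float = 1e-10) -> np.ndarray:
    """Scale the channel vector at each spatial position to unit length.

    feat has shape (C, H, W); normalization runs over the channel axis.
    """
    norm = np.sqrt(np.sum(feat ** 2, axis=0, keepdims=True))
    return feat / (norm + eps)

def lpips_distance(feats_x, feats_y, weights) -> float:
    """Sum over layers of the spatially averaged, weighted squared difference."""
    total = 0.0
    for fx, fy, w in zip(feats_x, feats_y, weights):
        diff = unit_normalize(fx) - unit_normalize(fy)   # (C, H, W)
        weighted = w[:, None, None] * diff               # per-channel weights w_l
        per_position = np.sum(weighted ** 2, axis=0)     # squared L2 norm, (H, W)
        total += per_position.mean()                     # 1 / (H_l W_l) * sum
    return float(total)

# Stand-in "features" for two layers; NOT real network activations.
rng = np.random.default_rng(0)
shapes = [(8, 16, 16), (16, 8, 8)]
feats_x = [rng.standard_normal(s) for s in shapes]
feats_y = [f + 0.1 * rng.standard_normal(f.shape) for f in feats_x]
weights = [np.ones(s[0]) for s in shapes]                # placeholder for learned weights

print(lpips_distance(feats_x, feats_x, weights))  # identical features: distance 0
print(lpips_distance(feats_x, feats_y, weights))  # perturbed features: small positive value
```

Because of the unit normalization, rescaling all activations of one image leaves the distance essentially unchanged; the metric compares the direction of the feature vectors, not their magnitude.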

Key property: LPIPS is trained to match human perceptual judgments. Two images that look identical to humans but differ in pixel values (e.g., a slight spatial offset) will have low LPIPS but high MSE. Lower LPIPS = more perceptually similar.

Typical values for 3DGS:

< 0.05: excellent perceptual quality
0.05–0.15: good, minor perceptual degradation
> 0.2: noticeable degradation

How the Three Metrics Complement Each Other

Metric	Measures	Blind To	Scale
PSNR	Pixel MSE	Structure, perception	dB, higher better
SSIM	Luminance/contrast/structure	Semantic content	≤ 1 (typically 0–1), higher better
LPIPS	Perceptual features	Exact pixel values	≥ 0 (typically below 1), lower better

3DGS compression papers typically report all three metrics, computed on the same held-out test views. A method that optimizes only for PSNR can produce blurry results that score well on MSE but poorly on SSIM and LPIPS, so no single metric is sufficient on its own.
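One way to organize such an evaluation is a small harness that applies every metric to every test view and reports the mean per metric. The sketch below assumes that scores are averaged per image, which is a common convention but is not specified in this reading; the `evaluate` function and its arguments are hypothetical. Only PSNR is wired in here, and SSIM and LPIPS callables with the same `(pred, ref)` signature would be added to the dictionary in the same way.

```python
import numpy as np
from typing import Callable, Dict, Sequence

def psnr(pred, target, max_val=1.0):
    # Assumes pred != target; a zero error would give an infinite score.
    err = np.mean((np.asarray(pred, float) - np.asarray(target, float)) ** 2)
    return float(10.0 * np.log10(max_val ** 2 / err))

def evaluate(
    renders: Sequence[np.ndarray],
    references: Sequence[np.ndarray],
    metrics: Dict[str, Callable[[np.ndarray, np.ndarray], float]],
) -> Dict[str, float]:
    """Apply each metric to each (render, reference) pair and average per metric."""
    if len(renders) != len(references):
        raise ValueError("renders and references must have the same length")
    scores = {name: [] for name in metrics}
    for pred, ref in zip(renders, references):
        for name, fn in metrics.items():
            scores[name].append(fn(pred, ref))
    return {name: float(np.mean(vals)) for name, vals in scores.items()}

# Two toy test views with uniform errors of 0.1 and 0.01 (20 dB and 40 dB).
refs = [np.zeros((8, 8, 3)), np.zeros((8, 8, 3))]
renders = [refs[0] + 0.1, refs[1] + 0.01]
report = evaluate(renders, refs, {"PSNR": psnr})
print(report)  # mean of the per-image PSNRs, about 30 dB here
```

Averaging per-image PSNR is not the same as computing PSNR from the pooled MSE: in this toy case the pooled value would be about 23 dB, so comparisons between methods should use the same averaging convention.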

References

Wang et al. 2004 — Image Quality Assessment: From Error Visibility to Structural Similarity

Zhang et al. 2018 — The Unreasonable Effectiveness of Deep Features as a Perceptual Metric
