Image Quality Metrics: PSNR, SSIM, and LPIPS
- Compute PSNR from MSE and interpret what a 3 dB difference means in terms of relative signal quality; identify the typical PSNR range for 3DGS reconstructions
- Explain how SSIM measures local luminance, contrast, and structural similarity, and identify the types of distortion (blur, edge shift) that SSIM detects but PSNR misses
- Distinguish LPIPS from PSNR and SSIM in terms of what it measures (perceptual deep features vs. pixel statistics), and identify when LPIPS should take priority as the evaluation criterion
- Select the appropriate metric or metric combination given a specific evaluation goal — detecting compression artifacts, measuring perceptual quality, or tracking pixel-level fidelity
Why Metrics Matter
Evaluating a 3DGS compression method requires comparing rendered images against ground-truth photographs. Three metrics dominate the literature: PSNR, SSIM, and LPIPS. Each captures a different aspect of image quality and they are often complementary — a method can improve one while degrading another.
PSNR — Peak Signal-to-Noise Ratio
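PSNR measures average pixel-level error on a log scale:

$$\mathrm{PSNR} = 10 \log_{10}\!\left(\frac{\mathrm{MAX}^2}{\mathrm{MSE}}\right)$$

where $\mathrm{MAX} = 1$ for float images (or 255 for uint8) and MSE is the mean squared error over all pixels and channels.

Expanded:

$$\mathrm{PSNR} = 20 \log_{10}(\mathrm{MAX}) - 10 \log_{10}(\mathrm{MSE}), \qquad \mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} \left(x_i - \hat{x}_i\right)^2$$

with $x_i$ the ground-truth values, $\hat{x}_i$ the rendered values, and $N$ the total number of pixel-channel values.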
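Interpreting PSNR: because the scale is logarithmic, a 3 dB gain corresponds to roughly halving the MSE ($10 \log_{10} 2 \approx 3.01$). Rough reference ranges: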
- < 20 dB: visibly poor quality
- 25–30 dB: acceptable for compressed video
- 30–35 dB: good quality (typical baseline 3DGS on standard scenes)
- > 35 dB: excellent quality
Limitations: PSNR is purely pixel-wise. Two images with the same PSNR can have completely different perceptual quality — a blurry image and a sharp-but-shifted image can score identically. It also treats all spatial locations equally, ignoring perceptual importance.
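As a minimal NumPy sketch of the PSNR definition and of the limitation just described: the snippet below computes PSNR from MSE, checks what halving the MSE does to the score, and builds two visually different distortions that receive the same PSNR. The function name `psnr`, the random stand-in image, and the two synthetic distortions (a uniform brightness offset and a pixel-level checkerboard) are illustrative assumptions, not part of any 3DGS codebase.

```python
import numpy as np


def psnr(reference, rendered, max_val=1.0):
    """PSNR in dB; use max_val=1.0 for float images, 255 for uint8."""
    # Cast to float64 first so uint8 inputs do not wrap around on subtraction.
    reference = np.asarray(reference, dtype=np.float64)
    rendered = np.asarray(rendered, dtype=np.float64)
    mse = np.mean((reference - rendered) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_val ** 2 / mse))


rng = np.random.default_rng(0)
# Stand-in "ground truth", kept away from 0 and 1 so nothing needs clipping.
gt = rng.uniform(0.2, 0.8, size=(32, 32))

# (1) Halving the MSE: scaling the error amplitude by 1/sqrt(2) halves its square.
noise = rng.normal(0.0, 0.02, size=gt.shape)
gain_db = psnr(gt, gt + noise / np.sqrt(2.0)) - psnr(gt, gt + noise)

# (2) Two different distortions with the same per-pixel error magnitude d,
#     hence the same MSE (d**2) and the same PSNR.
d = 0.1
offset = gt + d                                   # uniform brightness shift
yy, xx = np.indices(gt.shape)
checker = gt + d * np.where((yy + xx) % 2 == 0, 1.0, -1.0)  # high-frequency pattern
psnr_offset = psnr(gt, offset)
psnr_checker = psnr(gt, checker)

print(f"gain from halving MSE: {gain_db:.2f} dB")
print(f"offset: {psnr_offset:.2f} dB, checkerboard: {psnr_checker:.2f} dB")
```

Both distortions have an MSE of 0.01 and therefore land at 20 dB, even though one is a smooth brightness change and the other is pixel-level texture noise.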
SSIM — Structural Similarity Index
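SSIM compares images along three dimensions: luminance, contrast, and structure.

$$\mathrm{SSIM}(x, y) = \frac{(2\mu_x \mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}$$

where $\mu_x, \mu_y$ are local means, $\sigma_x^2, \sigma_y^2$ are local variances, $\sigma_{xy}$ is the local covariance, and $C_1, C_2$ are stability constants.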
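SSIM is computed locally over 11×11 Gaussian-weighted patches, then averaged over the image. The result is at most 1.0, which means identical images; for natural images it usually falls in $[0, 1]$, although negative values are mathematically possible when local structure is anti-correlated.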
Why SSIM matters for 3DGS: Compression artifacts often smear or blur texture detail. SSIM's structure term penalizes loss of local correlation patterns (edges, textures) even when mean values match, catching the kinds of degradation that PSNR misses.
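The local computation can be sketched in plain NumPy for a single-channel image. This is a simplified illustration, not a reference implementation: the window parameters (`sigma=1.5`) and constants (`k1=0.01`, `k2=0.03`) follow the defaults commonly attributed to the original SSIM paper and are assumptions here, as are the helper names and the 3×3 box blur used as a stand-in for compression smearing. Borders are handled by keeping only fully covered windows.

```python
import numpy as np


def gaussian_kernel_1d(size=11, sigma=1.5):
    """Normalized 1D Gaussian; applied along rows and columns it gives an 11x11 window."""
    x = np.arange(size) - (size - 1) / 2.0
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()


def gaussian_filter_valid(img, k):
    """Separable Gaussian filtering, keeping only positions fully inside the image."""
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)


def ssim(x, y, max_val=1.0, k1=0.01, k2=0.03):
    """Mean SSIM over all local windows for two 2D images of equal shape."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    c1, c2 = (k1 * max_val) ** 2, (k2 * max_val) ** 2
    k = gaussian_kernel_1d()

    mu_x, mu_y = gaussian_filter_valid(x, k), gaussian_filter_valid(y, k)
    var_x = gaussian_filter_valid(x * x, k) - mu_x ** 2
    var_y = gaussian_filter_valid(y * y, k) - mu_y ** 2
    cov_xy = gaussian_filter_valid(x * y, k) - mu_x * mu_y

    ssim_map = ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    )
    return float(ssim_map.mean())


rng = np.random.default_rng(0)
img = rng.uniform(0.0, 1.0, size=(48, 48))  # stand-in textured image

# 3x3 box blur built from shifted copies: local means survive, texture does not.
blurred = sum(
    np.roll(np.roll(img, dy, axis=0), dx, axis=1)
    for dy in (-1, 0, 1) for dx in (-1, 0, 1)
) / 9.0

s_same = ssim(img, img)
s_blur = ssim(img, blurred)
print(f"identical: {s_same:.3f}, blurred: {s_blur:.3f}")
```

The blur leaves local means almost unchanged but shrinks local variance and covariance, so the contrast and structure terms pull the score down.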
LPIPS — Learned Perceptual Image Patch Similarity
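LPIPS (Zhang et al. 2018) measures perceptual similarity using features extracted from a pretrained deep network (typically AlexNet or VGG):

$$\mathrm{LPIPS}(x, y) = \sum_{l} \frac{1}{H_l W_l} \sum_{h, w} \left\lVert w_l \odot \left(\hat{\phi}_l(x)_{hw} - \hat{\phi}_l(y)_{hw}\right) \right\rVert_2^2$$

where $\phi_l$ are feature maps at layer $l$ (of spatial size $H_l \times W_l$), $\hat{\phi}_l$ are unit-normalized along the channel dimension, and $w_l$ are learned channel weights.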
Key property: LPIPS is trained to match human perceptual judgments. Two images that look identical to humans but differ in pixel values (e.g., a slight spatial offset) will have low LPIPS but high MSE. Lower LPIPS = more perceptually similar.
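To make the structure of the distance concrete, here is a hedged NumPy sketch of the aggregation step only: unit-normalize each feature vector along channels, weight the difference per channel, take the squared norm, average over space, and sum over layers. It assumes the feature maps have already been extracted; the arrays below are random stand-ins rather than real AlexNet or VGG activations, the weights are random rather than learned, and the function names are hypothetical. Real evaluations use a pretrained network together with the released learned weights.

```python
import numpy as np


def unit_normalize(feat, eps=1e-10):
    """Scale the channel vector at every spatial position to unit length. feat: (C, H, W)."""
    norm = np.sqrt(np.sum(feat ** 2, axis=0, keepdims=True))
    return feat / (norm + eps)


def lpips_from_features(feats_x, feats_y, weights):
    """Sum over layers of the spatially averaged, channel-weighted squared feature distance.

    feats_x, feats_y: lists of (C_l, H_l, W_l) arrays, one per layer.
    weights: list of (C_l,) arrays of per-channel weights (learned in the real metric).
    """
    total = 0.0
    for fx, fy, w in zip(feats_x, feats_y, weights):
        diff = unit_normalize(fx) - unit_normalize(fy)      # (C, H, W)
        weighted = w[:, None, None] * diff                  # per-channel weighting
        per_position = np.sum(weighted ** 2, axis=0)        # squared L2 norm, (H, W)
        total += per_position.mean()                        # average over H_l x W_l
    return float(total)


# Random stand-ins for two layers of features from two images (illustration only).
rng = np.random.default_rng(0)
shapes = [(8, 16, 16), (16, 8, 8)]
feats_a = [rng.normal(size=s) for s in shapes]
feats_b = [rng.normal(size=s) for s in shapes]
weights = [rng.uniform(0.0, 1.0, size=s[0]) for s in shapes]

d_same = lpips_from_features(feats_a, feats_a, weights)
d_diff = lpips_from_features(feats_a, feats_b, weights)
print(f"same features: {d_same:.4f}, different features: {d_diff:.4f}")
```

Because the comparison happens on normalized feature vectors rather than raw pixels, the score depends on how the network responds to each image, which is what lets small pixel-level offsets go largely unpenalized in the real metric.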
Typical values for 3DGS:
- < 0.05: excellent perceptual quality
- 0.05–0.15: good, minor perceptual degradation
- > 0.2: noticeable degradation
How the Three Metrics Complement Each Other
| Metric | Measures | Blind To | Scale |
|---|---|---|---|
| PSNR | Pixel MSE | Structure, perception | dB, higher better |
| SSIM | Luminance/contrast/structure | Semantic content | 0–1, higher better |
| LPIPS | Perceptual features | Exact pixel values | ≥ 0 (typically below 1), lower better |
3DGS compression papers typically report all three metrics, computed on the same test set. A method that only optimizes for PSNR can produce blurry results that score well on MSE but poorly on SSIM and LPIPS, so no single number is sufficient on its own.