Benchmark Datasets for Novel View Synthesis
- Distinguish Tanks & Temples, Deep Blending, Mip-NeRF 360, and Synthetic NeRF benchmarks by the scene properties and reconstruction challenges each is designed to stress-test
- Explain why PSNR values are not comparable across datasets, identifying the specific scene properties (unbounded vs. bounded, real vs. synthetic, near-field vs. large-scale) that cause this non-comparability
- Identify the failure mode of optimizing a compression method on a single dataset and explain how dataset coverage (all four benchmarks) reveals generalization failures
- Select the appropriate benchmark dataset(s) to evaluate a new 3DGS compression method given specific scene characteristics and the tradeoffs being investigated
The Four Standard Benchmarks
The 3DGS compression literature evaluates almost exclusively on four datasets. Knowing their characteristics is essential for understanding which methods generalize and which overfit to specific scene types.
Tanks & Temples
- High-resolution real-world captures with natural lighting
- Unbounded scenes — background extends to infinity
- Objects at varying scales: a full tank, a train car
- Challenging background coverage for Gaussian rasterizers
The Truck scene is the de facto single-scene calibration point. Compare any paper's Truck PSNR to its own baseline 3DGS run — that delta is what matters, not the absolute number.
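A minimal Python sketch of this comparison follows. The function name `psnr_delta` and all PSNR values are hypothetical, chosen only to illustrate the point; they are not taken from any paper.

```python
def psnr_delta(compressed_psnr: float, baseline_psnr: float) -> float:
    """Quality change in dB of a compressed model relative to the
    3DGS baseline trained and reported in the same paper."""
    return compressed_psnr - baseline_psnr


# Hypothetical Truck results from two papers (illustrative numbers only).
paper_a = {"baseline": 25.40, "compressed": 25.10}
paper_b = {"baseline": 24.90, "compressed": 24.85}

delta_a = psnr_delta(paper_a["compressed"], paper_a["baseline"])  # about -0.30 dB
delta_b = psnr_delta(paper_b["compressed"], paper_b["baseline"])  # about -0.05 dB

# Paper A reports the higher absolute PSNR, but paper B loses less quality
# relative to its own baseline, which is the number to compare.
better_preserved = "B" if delta_b > delta_a else "A"
print(f"delta A = {delta_a:+.2f} dB, delta B = {delta_b:+.2f} dB")
print(f"smaller quality loss: paper {better_preserved}")
```

In this made-up case the absolute numbers would rank paper A first, while the deltas rank paper B first. Baselines differ between papers because of training settings and evaluation details, which is why the delta is the safer quantity.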
Deep Blending
- Challenging near-field geometry with heavy occlusion
- Specular surfaces: windows, glossy floors
- Significant depth variation across the scene
This dataset is the primary stress test for view-dependent color (spherical harmonics). Methods that reduce SH degree show the largest PSNR drops here.
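To see what reducing the SH degree saves, the sketch below counts color coefficients per Gaussian. A real spherical-harmonics expansion of degree d has (d + 1)^2 basis functions per color channel; the standard 3DGS implementation uses degree 3. The function names are hypothetical.

```python
def sh_coeffs_per_channel(degree: int) -> int:
    """Number of real SH basis functions up to and including `degree`."""
    return (degree + 1) ** 2


def sh_floats_per_gaussian(degree: int, channels: int = 3) -> int:
    """Color coefficients stored per Gaussian for RGB view-dependent color."""
    return channels * sh_coeffs_per_channel(degree)


# Coefficient counts for degrees 0 through 3.
sh_table = {d: sh_floats_per_gaussian(d) for d in range(4)}
for degree, floats in sh_table.items():
    print(f"degree {degree}: {floats} color floats per Gaussian")
```

Dropping from degree 3 to degree 0 cuts the color payload from 48 floats to 3 per Gaussian, but leaves only a view-independent color. That is a large saving, and it is also why scenes dominated by windows and glossy floors pay for it in PSNR.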
Mip-NeRF 360
- Cameras placed inside the scene, pointing outward in all directions
- Background extends to infinity — the unbounded challenge
- 9 scenes, so its averages are the most statistically robust of the four benchmarks
- Wide variation in lighting, scale, and texture complexity
Spatial quantization and background pruning are stress-tested hardest here. Methods over-tuned on bounded datasets often badly over-prune the infinite background.
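The over-pruning failure can be illustrated with a toy model. The sketch below is not any published pruning method: it assumes a hypothetical rule that keeps a Gaussian only if it is visible in at least `MIN_VIEWS` training views, and all view counts are invented for illustration.

```python
def kept_fraction(view_counts: list[int], min_views: int) -> float:
    """Fraction of Gaussians that survive a 'visible in at least
    min_views training views' pruning rule (toy model)."""
    kept = sum(1 for count in view_counts if count >= min_views)
    return kept / len(view_counts)


# Hypothetical per-Gaussian visibility counts out of 100 training views.
bounded_object = [62, 75, 88, 91, 70, 83]   # object-centric scene: everything seen often
unbounded_foreground = [55, 64, 71, 48]     # central content of an unbounded scene
unbounded_background = [4, 9, 12, 6, 15, 3]  # distant background: seen from few views

MIN_VIEWS = 30  # hypothetical threshold tuned on the bounded scene

bounded_kept = kept_fraction(bounded_object, MIN_VIEWS)
foreground_kept = kept_fraction(unbounded_foreground, MIN_VIEWS)
background_kept = kept_fraction(unbounded_background, MIN_VIEWS)

print(f"bounded object kept:       {bounded_kept:.0%}")
print(f"unbounded foreground kept: {foreground_kept:.0%}")
print(f"unbounded background kept: {background_kept:.0%}")
```

With these made-up counts the threshold looks harmless on the bounded scene, because nothing is removed, yet it deletes the entire background of the unbounded scene. A rule validated only on bounded data gives no warning of this.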
Synthetic NeRF
- Computer-generated with perfect ground-truth geometry
- Objects on white backgrounds with complex materials
- Zero capture noise, motion blur, or lens distortion
- PSNR systematically higher (∼28–35 dB) than real-capture datasets
Clean ground truth isolates algorithmic quality from capture artefacts. Do not compare PSNR values here directly to real-capture scenes — they are not on the same scale.
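One practical consequence is to average PSNR within each dataset and never pool scenes across datasets. The sketch below uses the standard definition PSNR = 10 · log10(MAX² / MSE). The helper names and the per-scene values are hypothetical; only the scene names are real.

```python
import math
from collections import defaultdict
from statistics import mean


def psnr_from_mse(mse: float, max_value: float = 1.0) -> float:
    """PSNR in dB for a mean squared error, with pixel values in [0, max_value]."""
    if mse <= 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def per_dataset_mean(results: list[tuple[str, str, float]]) -> dict[str, float]:
    """Average PSNR within each dataset; results are (dataset, scene, psnr_db)."""
    grouped: dict[str, list[float]] = defaultdict(list)
    for dataset, _scene, psnr_db in results:
        grouped[dataset].append(psnr_db)
    return {dataset: mean(values) for dataset, values in grouped.items()}


# Hypothetical per-scene PSNR values (illustrative only).
results = [
    ("Synthetic NeRF", "lego", 35.0),
    ("Synthetic NeRF", "chair", 34.0),
    ("Tanks & Temples", "truck", 25.0),
    ("Tanks & Temples", "train", 22.0),
]
averages = per_dataset_mean(results)
for dataset, value in averages.items():
    print(f"{dataset}: {value:.2f} dB")
```

Because PSNR depends only on per-pixel error, a clean render against a uniform white background reaches error levels that a noisy real capture rarely does. A single pooled mean over the four scenes above would describe neither dataset.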
Cross-Dataset Generalization
A key warning from the 3DGS.zip survey: methods optimized on one dataset may not generalize. The Synthetic NeRF dataset's object-centric, bounded setup is very different from Mip-NeRF 360's unbounded 360° captures. A compression method that aggressively prunes Gaussians based on viewing-frequency statistics tuned to bounded scenes may badly over-prune the infinite background in unbounded scenes.
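A simple coverage check makes this kind of failure visible. The sketch below assumes per-dataset PSNR deltas (compressed minus baseline, in dB) have already been computed. The function name, the 0.5 dB tolerance, and the example deltas are all hypothetical.

```python
REQUIRED_BENCHMARKS = (
    "Tanks & Temples",
    "Deep Blending",
    "Mip-NeRF 360",
    "Synthetic NeRF",
)


def generalization_report(
    deltas_db: dict[str, float], max_drop_db: float = 0.5
) -> dict[str, str]:
    """Label each required benchmark as 'ok', 'regression', or 'not evaluated'.

    deltas_db maps dataset name to (compressed PSNR - baseline PSNR) in dB.
    max_drop_db is an illustrative tolerance, not a community standard.
    """
    report = {}
    for name in REQUIRED_BENCHMARKS:
        if name not in deltas_db:
            report[name] = "not evaluated"
        elif deltas_db[name] < -max_drop_db:
            report[name] = "regression"
        else:
            report[name] = "ok"
    return report


# Hypothetical deltas for a method tuned on a bounded, object-centric dataset.
deltas = {"Synthetic NeRF": -0.1, "Tanks & Temples": -0.3, "Mip-NeRF 360": -1.4}
report = generalization_report(deltas)
for name, status in report.items():
    print(f"{name}: {status}")
```

In this invented example the method looks fine on Synthetic NeRF alone. The regression on Mip-NeRF 360 and the missing Deep Blending result only appear once all four benchmarks are required.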