Rendering Pipeline and Differentiable Training
- Trace the 3DGS rendering pipeline from 3D Gaussians to final pixel colors, identifying the projection (Σ′ = JWΣW⊤J⊤), tile assignment, depth sort, and alpha compositing steps
- Explain why per-tile depth sorting is an approximation, identify the conditions under which it produces visible artifacts, and state why it is used despite this limitation
- Distinguish cloning from splitting in adaptive densification and identify the specific positional-gradient signal and Gaussian scale condition that triggers each operation
- Explain why the photometric loss combines L1 and D-SSIM terms, identify what artifact each term is sensitive to, and state the role of opacity resets in preventing opacity fog
From 3D Gaussians to a 2D Image
The 3DGS rendering pipeline is a carefully engineered rasterizer that trades the exactness of ray marching for real-time performance. Understanding it precisely matters because the same pipeline is differentiated during training: gradients flow backward through every step described here.
Step 1: Projection — 3D Gaussians to 2D Splats
A 3D Gaussian with mean $\mu$ and covariance $\Sigma$ projects to a 2D Gaussian on the image plane. Given viewing transform $W$ and Jacobian $J$ of the projection, the projected covariance is:
$$\Sigma' = J W \Sigma W^\top J^\top$$
The 3D ellipsoid becomes a 2D ellipse ("splat") in screen space. The projected mean $\mu'$ is simply the perspective projection of $\mu$.
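The projection step can be sketched in a few lines of NumPy. This is a minimal illustration, not the reference CUDA kernel: it assumes a pinhole camera with focal lengths `fx` and `fy`, a world-to-camera rotation `W` with translation `t`, and a hypothetical helper name `project_covariance`. Using the 2×3 Jacobian directly yields the 2×2 screen-space covariance.

```python
import numpy as np

def project_covariance(cov3d, mean_world, W, t, fx, fy):
    """Return the 2x2 screen-space covariance of one 3D Gaussian."""
    # Camera-space mean: rotate and translate the world-space mean.
    x, y, z = W @ mean_world + t
    # Jacobian of the pinhole projection (u, v) = (fx*x/z, fy*y/z),
    # evaluated at the camera-space mean (assumed camera model).
    J = np.array([
        [fx / z, 0.0, -fx * x / z**2],
        [0.0, fy / z, -fy * y / z**2],
    ])
    # Sigma' = J W Sigma W^T J^T
    return J @ W @ cov3d @ W.T @ J.T

# An isotropic Gaussian 5 units in front of the camera.
cov2d = project_covariance(
    0.01 * np.eye(3), np.zeros(3), np.eye(3),
    np.array([0.0, 0.0, 5.0]), 500.0, 500.0,
)
print(cov2d)  # approximately 100 * identity (variance in pixels squared)
```

Because the Jacobian is evaluated only at the Gaussian's mean, the result is a local linear approximation of the perspective projection, which is why the splat stays a Gaussian in screen space.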
Step 2: Tile-Based Rasterization
The screen is divided into a grid of tiles, each 16×16 pixels. For each Gaussian, 3DGS determines which tiles its 2D extent intersects and creates one list entry per overlapped tile, so a splat that spans several tiles is duplicated once per tile.
Key steps:
- Sorting: every entry gets a key that combines tile ID and depth, and a single fast GPU radix sort over these keys leaves the contributing Gaussians of each tile ordered by depth (front to back).
- Alpha accumulation: each tile is processed in parallel on GPU with one thread per pixel, accumulating color front-to-back.
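Sorting is done once per tile rather than per pixel: every pixel in a tile shares a single ordering based on the depth of each Gaussian's center, even though the exact order along an individual pixel's ray can differ where Gaussians overlap in depth. The approximation can become visible as "popping" when the camera moves and the order of large, overlapping Gaussians flips. In practice the artifacts are minimal, because the error shrinks as splats approach pixel size, and avoiding per-pixel sorting is a large part of what makes real-time rasterization possible.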
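The tile assignment and sort described above can be sketched on the CPU as follows. This is an illustration under stated assumptions: each splat is approximated by a bounding square of half-width `radii`, `build_tile_lists` is a hypothetical helper name, and `np.lexsort` stands in for the GPU radix sort over combined tile/depth keys.

```python
import numpy as np

TILE = 16  # tile edge length in pixels

def build_tile_lists(centers, radii, depths, width, height):
    """Return (tile_ids, gaussian_ids) sorted by tile, then by depth."""
    tiles_x = (width + TILE - 1) // TILE
    tiles_y = (height + TILE - 1) // TILE
    tile_ids, gauss_ids, key_depths = [], [], []
    for g, ((cx, cy), r, d) in enumerate(zip(centers, radii, depths)):
        # Conservative bounding square of the splat, clipped to the tile grid.
        x0 = max(int((cx - r) // TILE), 0)
        x1 = min(int((cx + r) // TILE), tiles_x - 1)
        y0 = max(int((cy - r) // TILE), 0)
        y1 = min(int((cy + r) // TILE), tiles_y - 1)
        for ty in range(y0, y1 + 1):
            for tx in range(x0, x1 + 1):
                # One entry per (tile, Gaussian) overlap.
                tile_ids.append(ty * tiles_x + tx)
                gauss_ids.append(g)
                key_depths.append(d)
    tile_ids = np.array(tile_ids, dtype=np.int64)
    gauss_ids = np.array(gauss_ids, dtype=np.int64)
    key_depths = np.array(key_depths, dtype=np.float64)
    # np.lexsort uses the LAST key as primary: tile id first, then depth.
    order = np.lexsort((key_depths, tile_ids))
    return tile_ids[order], gauss_ids[order]

# Two splats on a 32x16 image (two tiles side by side).
tiles, gaussians = build_tile_lists(
    centers=[(8.0, 8.0), (16.0, 8.0)], radii=[4.0, 4.0],
    depths=[5.0, 2.0], width=32, height=16,
)
print(tiles, gaussians)
```

Gaussian 1 straddles the tile boundary, so it appears in both tiles; within tile 0 it comes before Gaussian 0 because it is closer to the camera.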
Step 3: Alpha Compositing
The rendered color $C(p)$ at pixel $p$ is the alpha-composited sum over all $N$ depth-sorted Gaussians visible in that pixel:
$$C(p) = \sum_{i=1}^{N} c_i \, \alpha_i \, T_i, \qquad T_i = \prod_{j=1}^{i-1} (1 - \alpha_j)$$
where:
- $c_i$ is the view-dependent color of Gaussian $i$ (evaluated via SH)
- $\alpha_i = o_i \cdot G_i(p)$ is the effective alpha at pixel $p$, combining opacity $o_i$ and the 2D Gaussian density $G_i(p)$
- $T_i$ is the accumulated transmittance: how much light reaches Gaussian $i$ after the Gaussians in front absorb/occlude it
Early termination: once the accumulated transmittance drops below a threshold (typically 0.0001), all remaining Gaussians are skipped. This is what makes deep scenes tractable: most rays terminate well before reaching all Gaussians.
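Front-to-back compositing with early termination can be sketched for a single pixel as below. The function name `composite_pixel` and the clamp of alpha to 0.99 are assumptions made for this illustration; a real rasterizer runs the same loop for every pixel of a tile in parallel.

```python
import numpy as np

def composite_pixel(pixel, means2d, covs2d, opacities, colors, t_min=1e-4):
    """Blend depth-sorted splats (front to back) at one pixel location."""
    color = np.zeros(3)
    transmittance = 1.0
    for mu, cov, o, c in zip(means2d, covs2d, opacities, colors):
        d = pixel - mu
        # 2D Gaussian falloff evaluated at the pixel.
        g = np.exp(-0.5 * d @ np.linalg.solve(cov, d))
        # Effective alpha; the 0.99 clamp is an assumed stability guard.
        alpha = min(0.99, o * g)
        color += transmittance * alpha * c
        transmittance *= 1.0 - alpha
        # Early termination: nothing behind this point is visible.
        if transmittance < t_min:
            break
    return color, transmittance

# Two half-opaque splats centered on the pixel: red in front, green behind.
pixel = np.array([0.0, 0.0])
color, t = composite_pixel(
    pixel,
    means2d=[np.zeros(2), np.zeros(2)],
    covs2d=[np.eye(2), np.eye(2)],
    opacities=[0.5, 0.5],
    colors=[np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])],
)
print(color, t)  # [0.5, 0.25, 0.0] and 0.25
```

The green splat contributes only half of its alpha because the red splat in front has already absorbed half of the light.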
Training: Differentiable Rendering + Gradient Descent
All operations above are differentiable with respect to the Gaussian parameters. Training proceeds in three phases:
1. Initialize Gaussian positions from the SfM sparse point cloud.
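2. Iterate. At each step, four operations form a tight inner loop:
- Render: rasterize the current Gaussians from a randomly sampled training viewpoint using the tile rasterizer.
- Compute loss: the photometric objective combines a per-pixel absolute error term with a structural similarity term:
$$\mathcal{L} = (1 - \lambda)\,\mathcal{L}_1 + \lambda\,\mathcal{L}_{\text{D-SSIM}}$$
$\mathcal{L}_1$ penalizes mean absolute color error per pixel. $\mathcal{L}_{\text{D-SSIM}}$ penalizes structural and contrast differences that $\mathcal{L}_1$ alone is insensitive to: blurring and edge misalignment both raise it even when the average color is correct. Typical $\lambda = 0.2$.
- Backpropagate: differentiate through the differentiable rasterizer to obtain $\partial\mathcal{L}/\partial\mu$, $\partial\mathcal{L}/\partial\Sigma$, $\partial\mathcal{L}/\partial c$, and $\partial\mathcal{L}/\partial o$ (the gradients with respect to position, covariance, color, and opacity) for every visible Gaussian.
- Update: take an Adam step on all Gaussian parameters: $\theta \leftarrow \mathrm{Adam}(\theta, \nabla_\theta \mathcal{L})$, where $\theta$ collects the position, covariance, color, and opacity parameters.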
3. Every $N$ iterations ($N = 100$ in the original 3DGS paper): run adaptive densification (described below).
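The combined loss from the iteration step can be sketched as follows. Two simplifications are assumed and should not be read as the reference implementation: SSIM is computed from whole-image statistics instead of a sliding Gaussian window, and the D-SSIM term is taken as 1 − SSIM. The names `global_ssim` and `photometric_loss` are hypothetical.

```python
import numpy as np

def global_ssim(a, b, c1=0.01**2, c2=0.03**2):
    """SSIM from whole-image statistics (simplified: no sliding window)."""
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov_ab = ((a - mu_a) * (b - mu_b)).mean()
    return ((2 * mu_a * mu_b + c1) * (2 * cov_ab + c2)) / (
        (mu_a**2 + mu_b**2 + c1) * (var_a + var_b + c2)
    )

def photometric_loss(rendered, target, lam=0.2):
    """(1 - lam) * L1 + lam * D-SSIM, with D-SSIM assumed to be 1 - SSIM."""
    l1 = np.abs(rendered - target).mean()
    d_ssim = 1.0 - global_ssim(rendered, target)
    return (1.0 - lam) * l1 + lam * d_ssim

# A checkerboard target and a fully blurred render with the same mean.
target = np.indices((8, 8)).sum(axis=0) % 2 * 1.0
blurred = np.full_like(target, 0.5)
print(photometric_loss(target, target))   # close to 0 for a perfect render
print(photometric_loss(blurred, target))  # large: structure is lost
```

The blurred render has exactly the right average color, yet its SSIM against the checkerboard is close to zero, which is the kind of structural error the second term is there to catch.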
Adaptive Densification
Gradient descent alone changes Gaussian attributes but not their count. Densification periodically adjusts the population through three operations: cloning, splitting, and pruning.
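What is a high positional gradient? During training, each rendered image is compared against the corresponding training image through the photometric loss. The positional gradient $\nabla_{\mu}\mathcal{L}$ measures how sensitive the loss is to a Gaussian's position, that is, how much the loss would change if the Gaussian moved. A consistently large magnitude signals that the Gaussian is in the wrong place: the scene has structure nearby that it is failing to cover. The original 3DGS paper averages the magnitude of the view-space positional gradient over iterations and densifies Gaussians whose average exceeds a threshold of 0.0002.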
Cloning
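When a small Gaussian has high positional gradient (the loss strongly wants it to move), it is under-reconstructing the region: the geometry there needs more coverage than the Gaussian provides. The Gaussian is cloned: a copy of the same size is created and moved in the direction of the positional gradient, so both the Gaussian count and the covered volume grow.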
Splitting
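When a large Gaussian has high positional gradient, it is over-reconstructing: one blob covers a region that needs finer detail. The Gaussian is replaced by two smaller ones, with positions sampled using the original Gaussian as a probability density and scales divided by a constant factor (1.6 in the original 3DGS paper), so the count grows while the covered volume stays roughly the same.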
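The cloning and splitting rules can be sketched together. To keep the example short it assumes isotropic Gaussians with a single scalar scale each; `size_thresh` and `step` are hypothetical parameters chosen for illustration, and `densify` is not the name of any library function.

```python
import numpy as np

def densify(means, scales, pos_grads, tau_pos=2e-4, size_thresh=0.01,
            phi=1.6, step=0.01, seed=0):
    """Clone small and split large high-gradient Gaussians (isotropic sketch)."""
    rng = np.random.default_rng(seed)
    grad_mag = np.linalg.norm(pos_grads, axis=1)
    hot = grad_mag > tau_pos                 # high positional gradient
    clone = hot & (scales <= size_thresh)    # small: under-reconstruction
    split = hot & (scales > size_thresh)     # large: over-reconstruction

    # Clone: keep the original, add a same-size copy nudged along the gradient.
    direction = pos_grads[clone] / grad_mag[clone, None]
    clone_means = means[clone] + step * direction
    clone_scales = scales[clone]

    # Split: replace the original by two smaller Gaussians sampled from it.
    n_split = int(split.sum())
    parent_means = np.repeat(means[split], 2, axis=0)
    parent_scales = np.repeat(scales[split], 2)
    noise = rng.normal(size=(2 * n_split, 3))
    split_means = parent_means + noise * parent_scales[:, None]
    split_scales = parent_scales / phi

    keep = ~split  # split parents are removed; everything else survives
    new_means = np.concatenate([means[keep], clone_means, split_means])
    new_scales = np.concatenate([scales[keep], clone_scales, split_scales])
    return new_means, new_scales

means = np.zeros((3, 3))
scales = np.array([0.005, 0.5, 0.5])
grads = np.array([[1e-3, 0.0, 0.0], [1e-3, 0.0, 0.0], [0.0, 0.0, 0.0]])
new_means, new_scales = densify(means, scales, grads)
print(len(new_means), new_scales)
```

Starting from three Gaussians, the small high-gradient one gains a clone, the large high-gradient one is replaced by two children of scale 0.5 / 1.6, and the third is left alone, giving five Gaussians in total.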
Pruning
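Gaussians with opacity below a threshold are nearly invisible and waste parameters, so they are deleted. Periodically, all opacities are also reset to near zero and re-learned: Gaussians that the scene actually needs regain opacity through optimization, while the rest stay below the threshold and are pruned. This prevents "opacity fog", where Gaussians persist with low but non-zero opacity.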
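Pruning and the opacity reset are simple enough to sketch directly. The threshold `eps_alpha=0.005` and the reset ceiling of 0.01 are assumed values for illustration, and both function names are hypothetical.

```python
import numpy as np

def prune_mask(opacities, eps_alpha=0.005):
    """Boolean mask of Gaussians to keep (opacity at or above the threshold)."""
    return opacities >= eps_alpha

def reset_opacities(opacities, ceiling=0.01):
    """Clamp every opacity to a small ceiling so it has to be re-learned."""
    return np.minimum(opacities, ceiling)

opacities = np.array([0.9, 0.003, 0.2])
print(prune_mask(opacities))       # the second Gaussian is dropped
print(reset_opacities(opacities))  # every value is now at most 0.01
```

After a reset, every Gaussian sits just above the pruning threshold; those that the loss does not push back up drift below it and are removed at the next pruning pass.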
Why Densification Matters for Compression
The final Gaussian count after training is not fixed — it emerges from the densification history. Methods that improve densification (Module 3) directly reduce the total parameter count before any encoding, which is often more impactful than encoding improvements applied to a fixed representation.