3D Gaussian Splatting · Foundations of 3DGS

Rendering Pipeline and Differentiable Training

14 min read
By the end of this reading you will be able to:
  • Trace the 3DGS rendering pipeline from 3D Gaussians to final pixel colors, identifying the projection (Σ′ = JWΣW⊤J⊤), tile assignment, depth sort, and alpha compositing steps
  • Explain why per-tile depth sorting is an approximation, identify the conditions under which it produces visible artifacts, and state why it is used despite this limitation
  • Distinguish cloning from splitting in adaptive densification and identify the specific positional-gradient signal and Gaussian scale condition that triggers each operation
  • Explain why the photometric loss combines L1 and D-SSIM terms, identify what artifact each term is sensitive to, and state the role of opacity resets in preventing opacity fog

From 3D Gaussians to a 2D Image

The 3DGS rendering pipeline is a carefully engineered rasterizer that trades the exactness of ray marching for real-time performance. Understanding it precisely matters because training differentiates through this same pipeline: the backward pass retraces its steps in reverse to deliver gradients to every Gaussian parameter.

Step 1: Projection — 3D Gaussians to 2D Splats

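A 3D Gaussian with mean $\mu$ and covariance $\Sigma$ projects, approximately, to a 2D Gaussian on the image plane. Given the viewing transform $W$ (its rotational 3×3 part) and the Jacobian $J$ of an affine approximation of the perspective projection, evaluated at the Gaussian's camera-space mean:

$$\Sigma' = J W \Sigma W^\top J^\top$$

The 3D ellipsoid becomes a 2D ellipse ("splat") in screen space. The result is approximate because perspective projection is nonlinear and does not map a Gaussian exactly to a Gaussian; linearizing it around the mean keeps the splat Gaussian. With $J$ written as a 2×3 matrix, $\Sigma'$ is directly the 2×2 screen-space covariance (the original formulation uses a 3×3 $J$ and keeps the upper-left 2×2 block). The projected mean $\mu'$ is simply the perspective projection of $\mu$.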
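A minimal sketch of the projection step in plain Python, assuming a pinhole camera. The camera parameters (rotation `R` playing the role of $W$, translation `t`, focal lengths `fx`, `fy`, principal point `cx`, `cy`) and the function name `project_gaussian` are illustrative assumptions, not the reference implementation's API. $J$ is written as the 2×3 Jacobian of $(x, y, z) \mapsto (f_x x/z,\ f_y y/z)$, so the product is already 2×2.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def project_gaussian(mu, cov3d, R, t, fx, fy, cx, cy):
    """Project a 3D Gaussian to a 2D splat under an assumed pinhole camera.

    R: 3x3 world-to-camera rotation (the W in the formula); t: translation.
    Returns the projected mean mu' (pixels) and the 2x2 covariance Sigma'.
    Assumes the Gaussian's center lies in front of the camera (z > 0).
    """
    # Camera-space mean: R @ mu + t
    x, y, z = (sum(R[i][k] * mu[k] for k in range(3)) + t[i] for i in range(3))
    # mu' is the perspective projection of the mean
    mean2d = (fx * x / z + cx, fy * y / z + cy)
    # 2x3 Jacobian of (x, y, z) -> (fx*x/z, fy*y/z) at the mean:
    # the local affine approximation of the projection
    J = [[fx / z, 0.0, -fx * x / (z * z)],
         [0.0, fy / z, -fy * y / (z * z)]]
    JW = matmul(J, R)
    # Sigma' = J W Sigma W^T J^T = (JW) Sigma (JW)^T
    cov2d = matmul(matmul(JW, cov3d), transpose(JW))
    return mean2d, cov2d

identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
cov3d = [[0.01, 0.0, 0.0], [0.0, 0.01, 0.0], [0.0, 0.0, 0.01]]
mean2d, cov2d = project_gaussian((0.0, 0.0, 2.0), cov3d, identity, (0.0, 0.0, 0.0),
                                 100.0, 100.0, 0.0, 0.0)
# cov2d is approximately [[25, 0], [0, 25]]: each axis is scaled by f / z = 50
```

For an isotropic Gaussian two units in front of the camera with variance 0.01 and $f_x = f_y = 100$, $J$ scales each axis by $f/z = 50$, so the splat variance is $50^2 \times 0.01 = 25$ square pixels. The reference CUDA rasterizer also adds a small constant to the diagonal of $\Sigma'$ as a low-pass filter; this sketch omits it.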

Step 2: Tile-Based Rasterization

The screen is divided into a grid of 16×16 pixel tiles. For each Gaussian, 3DGS determines which tiles its 2D extent intersects and creates a list entry per tile.

Key steps:

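  1. Sorting: every (Gaussian, tile) entry gets a 64-bit key with the tile ID in the high 32 bits and the Gaussian's view-space depth in the low 32 bits. A single fast GPU radix sort over all keys then yields, for every tile, its Gaussians ordered front to back.
  2. Alpha accumulation: each tile is processed by one GPU thread block with one thread per pixel; every thread walks the tile's sorted list and accumulates color front to back.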

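The depth order is an approximation: each Gaussian gets a single depth (that of its center) per view, and the resulting order is shared by every pixel in a tile, instead of ordering contributions separately along each pixel's ray. Artifacts appear when large or elongated Gaussians overlap or interpenetrate, because their true front-to-back order can differ from pixel to pixel; as the camera moves, the center-depth order can also flip abruptly, producing visible "popping". The error shrinks as splats approach the size of a pixel, and one sort per frame is far cheaper than per-pixel sorting, which is why the approximation is accepted.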
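Tile assignment and the key-based sort can be sketched on the CPU as follows. The 3σ bounding radius, the float-to-integer depth trick, and all function names are assumptions for illustration; Python's `sort` stands in for the GPU radix sort. Each splat contributes a single depth value, which is exactly the shared per-tile ordering whose approximation is described in the paragraph above.

```python
import math
import struct

TILE = 16  # tile size in pixels, as in the text

def splat_radius(cov2d):
    """Pixel radius covering 3 standard deviations along the major axis (assumed bound)."""
    a, b, c = cov2d[0][0], cov2d[0][1], cov2d[1][1]
    mid = 0.5 * (a + c)
    lam_max = mid + math.sqrt(max(mid * mid - (a * c - b * b), 0.0))  # larger eigenvalue
    return math.ceil(3.0 * math.sqrt(lam_max))

def depth_bits(depth):
    """Bit pattern of a positive float32 as uint32; order-preserving for positive values."""
    return struct.unpack("<I", struct.pack("<f", depth))[0]

def build_tile_lists(splats, width, height):
    """splats: list of (mean2d, cov2d, depth) with depth > 0 (behind-camera splats culled).

    Returns {tile_id: [splat indices sorted front to back]}.
    """
    tiles_x = (width + TILE - 1) // TILE
    tiles_y = (height + TILE - 1) // TILE
    entries = []  # (key, splat index): one entry per tile a splat touches
    for idx, (mean2d, cov2d, depth) in enumerate(splats):
        r = splat_radius(cov2d)
        x0 = min(tiles_x, max(0, int((mean2d[0] - r) // TILE)))
        x1 = min(tiles_x, max(0, int((mean2d[0] + r) // TILE) + 1))
        y0 = min(tiles_y, max(0, int((mean2d[1] - r) // TILE)))
        y1 = min(tiles_y, max(0, int((mean2d[1] + r) // TILE) + 1))
        for ty in range(y0, y1):
            for tx in range(x0, x1):
                tile_id = ty * tiles_x + tx
                key = (tile_id << 32) | depth_bits(depth)  # tile ID high, depth low
                entries.append((key, idx))
    entries.sort()  # stands in for the single GPU radix sort over all keys
    lists = {}
    for key, idx in entries:
        lists.setdefault(key >> 32, []).append(idx)
    return lists

splats = [((8.0, 8.0), [[1.0, 0.0], [0.0, 1.0]], 5.0),
          ((8.0, 8.0), [[1.0, 0.0], [0.0, 1.0]], 2.0),
          ((16.0, 8.0), [[4.0, 0.0], [0.0, 4.0]], 3.0)]
print(build_tile_lists(splats, width=32, height=16))  # {0: [1, 2, 0], 1: [2]}
```

Reinterpreting a positive float32 as an unsigned 32-bit integer preserves order, which is why depth can sit in the low bits of an integer sort key and one sort produces every tile's list at once.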

Step 3: Alpha Compositing

The rendered color at pixel $\mathbf{p}$ is the alpha-composited sum over all $N$ depth-sorted Gaussians visible in that pixel:

$$c(\mathbf{p}) = \sum_{i=1}^{N} c_i\, \alpha_i'(\mathbf{p}) \prod_{j=1}^{i-1} \bigl(1 - \alpha_j'(\mathbf{p})\bigr)$$

where:

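  • $c_i$ is the view-dependent color of Gaussian $i$, evaluated from its spherical harmonics (SH) coefficients for the current viewing direction
  • $\alpha_i'(\mathbf{p}) = \sigma_{\alpha,i} \cdot f_i'(\mathbf{p})$ is the effective alpha at pixel $\mathbf{p}$: the opacity $\sigma_{\alpha,i}$ times the unnormalized 2D Gaussian $f_i'(\mathbf{p}) = \exp\bigl(-\tfrac{1}{2}(\mathbf{p}-\mu_i')^\top (\Sigma_i')^{-1} (\mathbf{p}-\mu_i')\bigr)$, which equals 1 at the splat center
  • $\prod_{j<i}(1-\alpha_j')$ is the accumulated transmittance: the fraction of the pixel's ray not yet absorbed or occluded by the Gaussians in front of Gaussian $i$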

Early termination: once the accumulated transmittance drops below a threshold (typically 0.0001), the remaining Gaussians in the tile's list are skipped for that pixel. This is what makes scenes with many overlapping layers tractable: most pixels saturate well before their thread reaches the end of the tile's list.
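The per-pixel loop run by a single rasterizer thread can be sketched as below. `Splat` and `composite` are hypothetical names; the 0.99 alpha clamp and the 1/255 skip threshold are assumptions modeled on the reference CUDA rasterizer, while the 0.0001 transmittance cutoff is the one quoted above. Colors are assumed to be already evaluated from SH for the current view.

```python
import math
from dataclasses import dataclass

@dataclass
class Splat:
    color: tuple    # RGB, already evaluated from SH for this view
    opacity: float  # sigma_alpha in [0, 1]
    mean: tuple     # projected mean mu' in pixels
    cov: tuple      # 2x2 covariance Sigma' as ((a, b), (b, c))

T_MIN = 1e-4             # early-termination threshold from the text
ALPHA_MAX = 0.99         # assumed clamp (modeled on the reference rasterizer)
ALPHA_MIN = 1.0 / 255.0  # assumed skip threshold (modeled on the reference rasterizer)

def gaussian_2d(splat, p):
    """Unnormalized 2D Gaussian f'(p); equals 1 at the splat center."""
    (a, b), (_, c) = splat.cov
    det = a * c - b * b
    dx, dy = p[0] - splat.mean[0], p[1] - splat.mean[1]
    # d^T Sigma'^{-1} d via the closed-form inverse of a symmetric 2x2 matrix
    q = (c * dx * dx - 2.0 * b * dx * dy + a * dy * dy) / det
    return math.exp(-0.5 * q)

def composite(p, sorted_splats):
    """Front-to-back compositing at pixel p. Returns (rgb, transmittance, splats used)."""
    rgb = [0.0, 0.0, 0.0]
    T = 1.0
    used = 0
    for s in sorted_splats:
        alpha = min(ALPHA_MAX, s.opacity * gaussian_2d(s, p))
        if alpha < ALPHA_MIN:
            continue  # negligible contribution at this pixel
        next_T = T * (1.0 - alpha)
        if next_T < T_MIN:
            break  # pixel saturated: skip all remaining splats
        for k in range(3):
            rgb[k] += s.color[k] * alpha * T
        T = next_T
        used += 1
    return tuple(rgb), T, used

unit = ((1.0, 0.0), (0.0, 1.0))
red = Splat((1.0, 0.0, 0.0), 0.5, (0.0, 0.0), unit)
green = Splat((0.0, 1.0, 0.0), 0.5, (0.0, 0.0), unit)
print(composite((0.0, 0.0), [red, green]))  # ((0.5, 0.25, 0.0), 0.25, 2)
```

Swapping the two splats gives (0.25, 0.5, 0.0) instead, which is why the depth order matters even though compositing itself is only a weighted sum.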

Training: Differentiable Rendering + Gradient Descent

The projection and compositing steps above are differentiable with respect to the Gaussian parameters; the depth sort only fixes the blending order and receives no gradient. Training proceeds in three phases:

1. Initialize Gaussian positions from the SfM sparse point cloud.

2. Iterate — at each step, four operations form a tight inner loop:

[Figure: the training loop ① Render → ② Loss → ③ Backprop → ④ Update, repeated for each training iteration]
  • Render — rasterize the current Gaussians from a randomly sampled training viewpoint using the tile rasterizer.
  • Compute loss — the photometric objective combines a per-pixel absolute error term with a structural similarity term:

$$\mathcal{L} = (1-\lambda)\mathcal{L}_1 + \lambda\mathcal{L}_{\text{D-SSIM}}$$

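$\mathcal{L}_1$ is the mean absolute per-pixel color error, so it responds directly to color and brightness shifts. $\mathcal{L}_{\text{D-SSIM}} = 1 - \text{SSIM}$ compares local means, variances, and covariances over small windows, so it penalizes lost contrast and structure: blurring and edge misalignment change only the pixels near edges, which barely moves the mean absolute error, but they sharply lower SSIM in the affected windows even when the average color is correct. The original 3DGS paper uses $\lambda = 0.2$.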

  • Backpropagate: differentiate through the rasterizer to obtain $\partial\mathcal{L}/\partial\mu$, $\partial\mathcal{L}/\partial\Sigma$, $\partial\mathcal{L}/\partial\sigma_\alpha$, and $\partial\mathcal{L}/\partial\mathbf{f}_{SH}$ for every visible Gaussian. In practice $\Sigma$ is stored as a scale vector and a rotation quaternion ($\Sigma = R S S^\top R^\top$), which keeps it a valid covariance, and its gradient is passed on to those parameters.
  • Update: take an Adam step on all Gaussian parameters ($\mu$, $\Sigma$, $\sigma_\alpha$, SH).

3. Every $N$ iterations (every 100 in the original paper, after a short warm-up): run adaptive densification (described below).
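The photometric loss from the training loop above can be sketched on flattened grayscale images. To stay short, `ssim_global` treats the whole signal as one SSIM window, a simplification: standard SSIM slides a small Gaussian window (commonly 11×11) over the image and averages the local scores. The constants $C_1 = (0.01L)^2$ and $C_2 = (0.03L)^2$ are the usual SSIM defaults for data range $L$, and D-SSIM is taken here as $1 - \text{SSIM}$ (some papers use $(1 - \text{SSIM})/2$). All names are hypothetical.

```python
def l1(pred, target):
    """Mean absolute per-pixel error."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)

def ssim_global(x, y, data_range=1.0):
    """SSIM using the whole signal as ONE window (simplification of windowed SSIM)."""
    n = len(x)
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n                    # variance of x
    vy = sum((b - my) ** 2 for b in y) / n                    # variance of y
    cxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n  # covariance
    return ((2 * mx * my + c1) * (2 * cxy + c2)) / ((mx * mx + my * my + c1) * (vx + vy + c2))

def photometric_loss(pred, target, lam=0.2):
    """(1 - lam) * L1 + lam * D-SSIM, with D-SSIM taken as 1 - SSIM."""
    return (1 - lam) * l1(pred, target) + lam * (1 - ssim_global(pred, target))

edge = [0.0, 0.0, 1.0, 1.0]     # ground truth: a sharp edge
blurred = [0.5, 0.5, 0.5, 0.5]  # same mean, contrast gone
shifted = [0.1, 0.1, 1.1, 1.1]  # same structure, 0.1 brighter
print(1 - ssim_global(blurred, edge))  # about 0.996
print(1 - ssim_global(shifted, edge))  # about 0.016
print(l1(shifted, edge))               # about 0.1
```

The blurred edge keeps the mean intensity but loses all contrast, so its D-SSIM is close to 1; the brightened edge keeps its structure, so D-SSIM stays small while $\mathcal{L}_1$ registers the 0.1 shift.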

Adaptive Densification

Gradient descent alone changes Gaussian attributes but not their count. Densification periodically adjusts the population:

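What is a high positional gradient? At each iteration the rendered image is compared with the training image to compute $\mathcal{L}$, and backpropagation gives each Gaussian the gradient $\partial\mathcal{L}/\partial\mu$, which measures how quickly the loss changes as the Gaussian's position moves. The original method tracks this gradient in view space (with respect to the projected 2D mean $\mu'$) and averages its magnitude over the iterations in which the Gaussian is visible; Gaussians whose average exceeds a threshold ($\tau_{\text{pos}} = 0.0002$ in the paper) are densified. A persistently large value signals that the Gaussian cannot reconstruct its region well: the scene has structure nearby that it is failing to cover.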

[Figure: Positional Gradient]
The amber region shows unrecovered photometric signal. The arrow is the negative gradient $-\partial\mathcal{L}/\partial\mu$ (the descent direction), pointing from the Gaussian toward the region it should cover. When this magnitude stays high across iterations, gradient descent alone cannot fix the problem: the Gaussian needs to be cloned or split.

Cloning

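When a Gaussian with a high positional gradient is small (its largest scale is at or below a threshold proportional to the scene extent, 1% of the extent in the reference implementation), it sits in an under-reconstructed region: nearby geometry is missing and the Gaussian is too small to cover it.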

Solution
Duplicate the Gaussian: the copy has the same size and, as described in the original paper, is moved in the direction of the positional gradient, so the pair together covers the under-reconstructed region.

Splitting

When a Gaussian with a high positional gradient is large (its largest scale exceeds the same threshold), it sits in an over-reconstructed region: one large blob covers an area that needs finer detail.

Solution
Replace it with two smaller Gaussians: each has its scale divided by $\phi = 1.6$, and their positions are sampled using the original 3D Gaussian as a probability density, so together they cover the same region at higher fidelity.
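The clone/split decision can be sketched as below. The thresholds $\tau_{\text{pos}} = 0.0002$ and $\phi = 1.6$ are the values given in the original paper; the 1% scene-extent scale threshold is assumed from the reference implementation's default. The `Gaussian` class, its axis-aligned `scale` (rotation is ignored for brevity), and the helper names are hypothetical. Unlike the paper's description, this sketch places the clone at the original position and leaves separation to later optimizer steps.

```python
import math
import random
from dataclasses import dataclass

GRAD_THRESHOLD = 0.0002  # tau_pos from the original paper
PERCENT_DENSE = 0.01     # assumption: "small" means max scale <= 1% of scene extent
SPLIT_FACTOR = 1.6       # phi from the original paper

@dataclass
class Gaussian:
    """Hypothetical minimal Gaussian: axis-aligned scale, rotation omitted."""
    mean: list
    scale: list              # per-axis standard deviations
    grad_accum: float = 0.0  # running sum of view-space gradient norms
    visible_count: int = 0   # iterations in which the Gaussian was rendered

def record_gradient(g, grad2d):
    """Accumulate |dL/d mu'| (screen-space gradient) for one iteration where g was visible."""
    g.grad_accum += math.hypot(grad2d[0], grad2d[1])
    g.visible_count += 1

def densify(gaussians, scene_extent, rng):
    """One densification round: keep, clone, or split each Gaussian."""
    out = []
    for g in gaussians:
        avg_grad = g.grad_accum / g.visible_count if g.visible_count else 0.0
        if avg_grad < GRAD_THRESHOLD:
            out.append(g)  # gradient is low: keep as is
        elif max(g.scale) <= PERCENT_DENSE * scene_extent:
            # Clone (under-reconstruction): keep the original and add a same-size copy
            out.append(g)
            out.append(Gaussian(list(g.mean), list(g.scale)))
        else:
            # Split (over-reconstruction): two smaller Gaussians, positions sampled
            # from the original Gaussian, scales divided by phi; original is dropped
            for _ in range(2):
                pos = [m + rng.gauss(0.0, s) for m, s in zip(g.mean, g.scale)]
                out.append(Gaussian(pos, [s / SPLIT_FACTOR for s in g.scale]))
    for g in out:
        g.grad_accum, g.visible_count = 0.0, 0  # restart statistics for the next round
    return out

rng = random.Random(0)
small = Gaussian([0.0, 0.0, 0.0], [0.001, 0.001, 0.001])
big = Gaussian([1.0, 1.0, 1.0], [0.5, 0.5, 0.5])
quiet = Gaussian([2.0, 2.0, 2.0], [0.5, 0.5, 0.5])
record_gradient(small, (0.001, 0.0))  # high gradient, small scale -> clone
record_gradient(big, (0.0, 0.001))    # high gradient, large scale -> split
record_gradient(quiet, (1e-5, 0.0))   # low gradient -> keep
out = densify([small, big, quiet], scene_extent=1.0, rng=rng)
print(len(out))  # 5
```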

Pruning

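Gaussians whose opacity $\sigma_\alpha$ falls below a threshold $\epsilon_\alpha$ are nearly invisible and waste parameters, so they are deleted. Periodically (every 3000 iterations in the original paper), all opacities are reset to near zero and re-learned: Gaussians the loss actually needs regain opacity through optimization, while the rest stay below $\epsilon_\alpha$ and are pruned. This prevents "opacity fog", a haze of Gaussians (often floaters close to the input cameras) that would otherwise persist with low but non-zero opacity.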
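Pruning and the opacity reset reduce to two small operations on the opacity array. The threshold values (0.005 for pruning, 0.01 as the reset ceiling) are assumptions modeled on the reference implementation's defaults, and the function names are hypothetical. In practice opacities are stored as logits and passed through a sigmoid; this sketch works on the activated values directly.

```python
MIN_OPACITY = 0.005   # assumed pruning threshold epsilon_alpha
RESET_CEILING = 0.01  # assumed value opacities are clamped to at a reset

def prune(opacities, min_opacity=MIN_OPACITY):
    """Indices of Gaussians that survive pruning (opacity at or above the threshold)."""
    return [i for i, a in enumerate(opacities) if a >= min_opacity]

def reset_opacities(opacities, ceiling=RESET_CEILING):
    """Clamp every opacity to at most `ceiling`; optimization must re-earn higher values."""
    return [min(a, ceiling) for a in opacities]

opacities = [0.9, 0.3, 0.004]
print(prune(opacities))            # [0, 1]
print(reset_opacities(opacities))  # [0.01, 0.01, 0.004]
```

After a reset, every Gaussian starts near the pruning threshold, so only those whose opacity the loss pushes back up survive the following pruning passes.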

Why Densification Matters for Compression

The final Gaussian count after training is not fixed — it emerges from the densification history. Methods that improve densification (Module 3) directly reduce the total parameter count before any encoding, which is often more impactful than encoding improvements applied to a fixed representation.
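A quick worked count shows why the Gaussian count dominates storage. The sketch assumes the standard configuration of degree-3 spherical harmonics (16 coefficients per color channel), float32 storage, and no encoding; the function names are hypothetical, and the 1,000,000-Gaussian scene is a round number chosen for illustration, not a measurement.

```python
def floats_per_gaussian(sh_degree=3):
    """Floats stored per Gaussian in the standard 3DGS parameterization."""
    position = 3                   # mean mu
    scale = 3                      # per-axis scale
    rotation = 4                   # unit quaternion
    opacity = 1                    # sigma_alpha (stored as a logit)
    sh = 3 * (sh_degree + 1) ** 2  # RGB coefficients for each SH basis function
    return position + scale + rotation + opacity + sh

def raw_size_mb(num_gaussians, sh_degree=3, bytes_per_float=4):
    """Uncompressed size in megabytes (10^6 bytes)."""
    return num_gaussians * floats_per_gaussian(sh_degree) * bytes_per_float / 1e6

print(floats_per_gaussian())   # 59
print(raw_size_mb(1_000_000))  # 236.0
print(raw_size_mb(500_000))    # 118.0
```

Halving the Gaussian count removes all 59 per-Gaussian values at once, before any quantization or entropy coding is applied.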