
Implicit vs. Explicit: NeRF and 3DGS

12 min read
By the end of this reading you will be able to:
  • Distinguish implicit (NeRF) and explicit (3DGS) scene representations across storage size, training speed, and rendering speed, using concrete numbers from the reading
  • Trace the NeRF volume rendering integral C(r) and identify the role of transmittance T(t), density σ, and emitted color c in computing a pixel value
  • Explain why the discrete, explicit nature of 3DGS simultaneously creates the storage problem (millions of 59-parameter Gaussians) and enables classical image compression solutions
  • Identify the role of Structure-from-Motion in initializing both NeRF and 3DGS, and explain how SfM point-cloud density and pose accuracy affect downstream reconstruction quality

The Problem: Reconstructing the 3D World from Images

The core task is straightforward to state: given a collection of photographs of a scene taken from known (or estimated) viewpoints, produce a 3D representation that allows rendering novel views with photorealistic quality. The devil is in the representation.

Two dominant paradigms have emerged from the deep learning era: Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS). They sit at opposite ends of the implicit–explicit spectrum, and understanding that contrast is essential for everything that follows in this course.

Implicit Representations: NeRF

NeRF, introduced by Mildenhall et al. in 2020, encodes the entire scene inside the weights of a multi-layer perceptron (MLP). You query the network with a 5D input, a spatial position $(x, y, z)$ plus a viewing direction $(\theta, \phi)$, and it outputs a color $(r, g, b)$ and a volume density $\sigma$.

To render a pixel, NeRF casts a ray through the scene and samples the MLP at dozens to hundreds of points along the ray. Colors and densities are integrated via volume rendering:

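$$C(\mathbf{r}) = \int_{t_n}^{t_f} T(t)\,\sigma(\mathbf{r}(t))\,\mathbf{c}(\mathbf{r}(t),\mathbf{d})\,dt$$

where $T(t) = \exp\!\left(-\int_{t_n}^{t} \sigma(\mathbf{r}(s))\,ds\right)$ is the accumulated transmittance. Here $\mathbf{r}(t) = \mathbf{o} + t\,\mathbf{d}$ is the camera ray with origin $\mathbf{o}$ and direction $\mathbf{d}$, and $t_n$ and $t_f$ are the near and far bounds of integration. $T(t)$ is the fraction of light that survives from $t_n$ to $t$ without being absorbed, so a point contributes its color $\mathbf{c}$ to the pixel only to the extent that it is dense (large $\sigma$) and not occluded by density in front of it (large $T$).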
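In practice the integral is approximated by a sum over samples along the ray. The following is a minimal NumPy sketch of that quadrature, not the reference NeRF implementation: the function name `render_ray` and its arguments are illustrative, and the densities and colors are passed in as arrays rather than produced by an MLP.

```python
import numpy as np

def render_ray(sigmas, colors, deltas):
    """Approximate C(r) for one ray from N samples (illustrative sketch).

    sigmas: (N,)   density at each sample
    colors: (N, 3) RGB color at each sample
    deltas: (N,)   distance between adjacent samples
    """
    sigmas = np.asarray(sigmas, dtype=float)
    colors = np.asarray(colors, dtype=float)
    deltas = np.asarray(deltas, dtype=float)

    # Opacity of each ray segment: alpha_i = 1 - exp(-sigma_i * delta_i)
    alphas = 1.0 - np.exp(-sigmas * deltas)

    # Discrete transmittance T_i = exp(-sum_{j<i} sigma_j * delta_j):
    # the fraction of light that reaches sample i unabsorbed.
    optical_depth = np.concatenate([[0.0], np.cumsum(sigmas * deltas)[:-1]])
    transmittance = np.exp(-optical_depth)

    # Each sample's contribution to the pixel is T_i * alpha_i * c_i.
    weights = transmittance * alphas
    return weights @ colors, weights

# Empty space, then a nearly opaque red surface, then green behind it.
sigmas = [0.0, 1e6, 5.0]
colors = [[0.0, 0.0, 1.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
deltas = [0.1, 0.1, 0.1]
pixel, weights = render_ray(sigmas, colors, deltas)
```

In this toy ray the first sample has zero density and contributes nothing, the red surface absorbs essentially all the light, and the green sample behind it is hidden because its transmittance is effectively zero.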

NeRF Strengths and Weaknesses

Strengths:

  • Compact: the scene is stored in MLP weights, typically a few MB.
  • Continuous representation: you can query any point at any resolution.
  • Handles view-dependent effects naturally via the direction input.

Weaknesses:

  • Slow training: a single scene requires hours of gradient descent.
  • Slow rendering: every pixel requires dozens to hundreds of MLP forward passes, one per sample along the ray. The original NeRF renders at < 1 FPS.
  • Not easily editable: the scene is entangled in the network weights, so moving an object requires retraining or specialized editing techniques.

Explicit Representations: 3D Gaussian Splatting

3DGS, introduced by Kerbl et al. (2023), takes a fundamentally different approach. The scene is represented as a point cloud of oriented, anisotropic 3D Gaussians — an explicit, discrete data structure.

Each Gaussian $i$ contributes to the scene density as:

$$f_i(\mathbf{p}) = \sigma_\alpha \exp\!\left(-\frac{1}{2}(\mathbf{p}-\mu_i)^\top \Sigma_i^{-1}(\mathbf{p}-\mu_i)\right)$$

where $\mu_i$ is the mean (position), $\Sigma_i$ is the covariance (shape), and $\sigma_\alpha$ is the opacity.
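To make the formula concrete, here is a small NumPy sketch that evaluates one Gaussian's contribution at a query point. The function name `gaussian_density` and the example values are hypothetical; the covariance is given directly as a 3×3 matrix, which is an assumption made for simplicity.

```python
import numpy as np

def gaussian_density(p, mu, cov, opacity):
    """Evaluate f_i(p) = opacity * exp(-0.5 * (p - mu)^T cov^{-1} (p - mu))."""
    d = np.asarray(p, dtype=float) - np.asarray(mu, dtype=float)
    # Solve cov @ x = d instead of forming the explicit inverse.
    mahalanobis_sq = d @ np.linalg.solve(np.asarray(cov, dtype=float), d)
    return opacity * np.exp(-0.5 * mahalanobis_sq)

# An anisotropic Gaussian: standard deviations 2.0, 1.0, and 0.5 along x, y, z.
mu = np.array([0.0, 0.0, 0.0])
cov = np.diag([2.0**2, 1.0**2, 0.5**2])
opacity = 0.8

at_mean = gaussian_density(mu, mu, cov, opacity)                # peak value
along_x = gaussian_density([2.0, 0.0, 0.0], mu, cov, opacity)   # one std. dev. along x
along_z = gaussian_density([0.0, 0.0, 2.0], mu, cov, opacity)   # four std. devs. along z
```

The same 2-unit offset gives very different contributions along x and z, which is what "anisotropic" means here: the covariance stretches the Gaussian along some axes and flattens it along others.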

3DGS Strengths and Weaknesses

Strengths:

  • Real-time rendering: tile-based rasterization enables 30–120+ FPS at 1080p on modern GPUs.
  • Fast training: typically 30–60 minutes per scene vs. hours for NeRF.
  • Editable: individual Gaussians are discrete objects that can be moved, added, or removed.
  • Supports 4D animation by allowing Gaussians to move over time.

Weaknesses:

  • Huge storage: a typical scene has millions of Gaussians, each with 59 parameters, resulting in 0.5–1.5 GB per scene.
  • Discrete structure: the scene is a finite set of primitives, so it can show artifacts in places where a continuous field would interpolate smoothly.
  • Geometry extraction is non-trivial (surfaces are fuzzy volumes, not meshes).
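The storage figure can be checked with back-of-the-envelope arithmetic. The sketch below assumes each of the 59 parameters is stored as a 32-bit float and assumes the usual breakdown of position, scale, rotation quaternion, opacity, and degree-3 spherical-harmonic color coefficients; the reading gives only the total of 59, so the breakdown and all names here are illustrative.

```python
# Assumed per-Gaussian layout (sums to the 59 parameters quoted above).
PARAM_LAYOUT = {
    "position": 3,
    "scale": 3,
    "rotation_quaternion": 4,
    "opacity": 1,
    "sh_color": 16 * 3,  # degree-3 spherical harmonics: 16 coefficients x RGB
}
PARAMS_PER_GAUSSIAN = sum(PARAM_LAYOUT.values())
BYTES_PER_PARAM = 4  # assumption: uncompressed float32

def scene_size_bytes(num_gaussians):
    """Uncompressed size of a scene with the given number of Gaussians."""
    return num_gaussians * PARAMS_PER_GAUSSIAN * BYTES_PER_PARAM

def gaussians_for_budget(budget_bytes):
    """Largest number of Gaussians that fits in a byte budget."""
    return int(budget_bytes // (PARAMS_PER_GAUSSIAN * BYTES_PER_PARAM))

one_million = scene_size_bytes(1_000_000)      # 236,000,000 bytes, i.e. 236 MB
low = gaussians_for_budget(0.5e9)              # Gaussians in 0.5 GB
high = gaussians_for_budget(1.5e9)             # Gaussians in 1.5 GB
```

Under these assumptions each Gaussian costs 236 bytes, so the 0.5–1.5 GB range corresponds to roughly 2 to 6 million Gaussians, consistent with "millions" in the bullet above.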

The Explicit–Implicit Axis

Think of this as a spectrum:

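| Property      | Pure Implicit (NeRF)  | Explicit (3DGS)                         |
|---------------|-----------------------|-----------------------------------------|
| Storage       | MLP weights (~5 MB)   | Point cloud (~1 GB)                     |
| Training time | Hours                 | ~30 min                                 |
| Rendering     | < 1 FPS               | Real-time                               |
| Editability   | Hard                  | Easy                                    |
| Continuity    | Yes                   | Continuous coordinates, discrete points |
| Color model   | MLP (view-dependent)  | Spherical harmonics                     |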

This course focuses on closing the storage gap for 3DGS — reducing that ~1 GB to tens of MB while preserving rendering quality. The subsequent modules will show that the explicit, discrete nature of 3DGS is both the problem (it's huge) and the solution (discreteness makes the data amenable to classical image/video compression tools).

Initialization: Structure-from-Motion

Neither NeRF nor 3DGS operates from scratch. Both require known camera poses, which are computed via Structure-from-Motion (SfM) — typically using COLMAP. SfM also produces a sparse point cloud that 3DGS uses to initialize Gaussian positions before optimization begins.
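As a rough illustration of that initialization step, the sketch below turns an SfM point array into starting Gaussian parameters. Everything beyond "positions come from the SfM points" is an assumption made for this example: the function name, the isotropic initial scale taken from the mean distance to the `k` nearest neighbors, the identity rotation, and the starting opacity of 0.1. It does not read COLMAP output; `points` is a plain array.

```python
import numpy as np

def init_gaussians_from_sfm(points, k=3, init_opacity=0.1):
    """Build initial Gaussian parameters from an (N, 3) SfM point array (N >= 2).

    Hypothetical helper: scale heuristic and default values are assumptions.
    """
    points = np.asarray(points, dtype=float)
    n = len(points)

    # Brute-force pairwise distances: O(N^2) memory, fine for a sketch;
    # a spatial index such as a KD-tree would be used for real point clouds.
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    np.fill_diagonal(dists, np.inf)  # ignore each point's distance to itself

    # Mean distance to the k nearest neighbors sets the initial size, so
    # Gaussians start larger in sparse regions and smaller in dense ones.
    k = min(k, n - 1)
    mean_nn = np.sort(dists, axis=1)[:, :k].mean(axis=1)

    return {
        "means": points.copy(),                               # (N, 3)
        "scales": np.repeat(mean_nn[:, None], 3, axis=1),     # (N, 3), isotropic
        "rotations": np.tile([1.0, 0.0, 0.0, 0.0], (n, 1)),   # (N, 4), identity quaternion
        "opacities": np.full(n, init_opacity),                # (N,)
    }

# Four points at the corners of a unit square in the z = 0 plane.
sfm_points = [[0, 0, 0], [1, 0, 0], [0, 1, 0], [1, 1, 0]]
gaussians = init_gaussians_from_sfm(sfm_points)
```

Optimization then adjusts all of these parameters, so the initialization only needs to put Gaussians in plausible places at plausible sizes.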