Implicit vs. Explicit: NeRF and 3DGS
- Distinguish implicit (NeRF) and explicit (3DGS) scene representations across storage size, training speed, and rendering speed, using concrete numbers from the reading
- Trace the NeRF volume rendering integral C(r) and identify the role of transmittance T(t), density σ, and emitted color c in computing a pixel value
- Explain why the discrete, explicit nature of 3DGS simultaneously creates the storage problem (millions of 59-parameter Gaussians) and enables classical image compression solutions
- Identify the role of Structure-from-Motion in initializing both NeRF and 3DGS, and explain how SfM point-cloud density and pose accuracy affect downstream reconstruction quality
The Problem: Reconstructing the 3D World from Images
The core task is straightforward to state: given a collection of photographs of a scene taken from known (or estimated) viewpoints, produce a 3D representation that allows rendering novel views with photorealistic quality. The devil is in the representation.
Two dominant paradigms have emerged from the deep learning era: Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS). They sit at opposite ends of the implicit–explicit spectrum, and understanding that contrast is essential for everything that follows in this course.
Implicit Representations: NeRF
NeRF, introduced by Mildenhall et al. in 2020, encodes the entire scene inside the weights of a multi-layer perceptron (MLP). You query the network with a 5D input, spatial position $\mathbf{x} = (x, y, z)$ plus viewing direction $\mathbf{d} = (\theta, \phi)$, and it outputs a color $\mathbf{c} = (r, g, b)$ and a volume density $\sigma$.
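To render a pixel, NeRF casts a ray $\mathbf{r}(t) = \mathbf{o} + t\mathbf{d}$ through the scene and samples the MLP at dozens to hundreds of points along the ray. Colors and densities are integrated via volume rendering:

$$C(\mathbf{r}) = \int_{t_n}^{t_f} T(t)\,\sigma(\mathbf{r}(t))\,\mathbf{c}(\mathbf{r}(t), \mathbf{d})\,dt$$

where $T(t) = \exp\left(-\int_{t_n}^{t} \sigma(\mathbf{r}(s))\,ds\right)$ is the accumulated transmittance, the probability that the ray travels from the near bound $t_n$ to $t$ without being blocked.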
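In practice the integral is approximated by a sum over the samples along the ray. The sketch below uses the standard quadrature: each segment of length $\delta_i$ gets an opacity $1 - e^{-\sigma_i \delta_i}$, and the transmittance is the product of $(1 - \text{opacity})$ over all earlier segments. The function name `composite_ray` and the toy scene are illustrative assumptions, not part of any NeRF codebase.

```python
import numpy as np

def composite_ray(sigmas, colors, t_vals):
    """Approximate the pixel color C(r) along one ray.

    sigmas: (N,) densities at the sample points
    colors: (N, 3) RGB colors emitted at the sample points
    t_vals: (N + 1,) segment edges; sample i covers [t_i, t_{i+1}]
    """
    sigmas = np.asarray(sigmas, dtype=np.float64)
    colors = np.asarray(colors, dtype=np.float64)
    deltas = np.diff(np.asarray(t_vals, dtype=np.float64))  # segment lengths
    # Opacity of each segment: fraction of light absorbed inside it.
    alphas = 1.0 - np.exp(-sigmas * deltas)
    # Transmittance T_i: fraction of light that survives all earlier segments.
    trans = np.concatenate(([1.0], np.cumprod(1.0 - alphas)[:-1]))
    weights = trans * alphas
    pixel = (weights[:, None] * colors).sum(axis=0)
    return pixel, weights

# Toy ray: two empty samples, then an opaque red wall, then a hidden green sample.
sigmas = [0.0, 0.0, 1e3, 1e3]
colors = [[0, 0, 1], [0, 0, 1], [1, 0, 0], [0, 1, 0]]
pixel, weights = composite_ray(sigmas, colors, np.linspace(0.0, 4.0, 5))
print(pixel)  # the wall dominates, so the pixel is red
```

Empty space (zero density) contributes nothing regardless of its color, and anything behind the opaque wall is suppressed because its transmittance has dropped to zero.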
NeRF Strengths and Weaknesses
Strengths:
- Compact: the scene is stored in MLP weights, typically a few MB.
- Continuous representation: you can query any point at any resolution.
- Handles view-dependent effects naturally via the direction input.
Weaknesses:
- Slow training: a single scene requires hours of gradient descent.
- Slow rendering: every pixel requires dozens to hundreds of MLP forward passes, one per sample along the ray. Original NeRF renders at < 1 FPS.
- Not easily editable: to move an object, you must retrain or apply complex latent-space surgery.
Explicit Representations: 3D Gaussian Splatting
3DGS, introduced by Kerbl et al. (2023), takes a fundamentally different approach. The scene is represented as a point cloud of oriented, anisotropic 3D Gaussians — an explicit, discrete data structure.
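Each Gaussian contributes to the scene density as:

$$G(\mathbf{x}) = \alpha \exp\left(-\tfrac{1}{2}(\mathbf{x} - \boldsymbol{\mu})^{\top} \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu})\right)$$

where $\boldsymbol{\mu}$ is the mean (position), $\Sigma$ is the covariance (shape), and $\alpha$ is the opacity.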
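To make the roles of mean, covariance, and opacity concrete, here is a minimal sketch that evaluates one Gaussian's contribution at a query point, assuming the usual form: opacity times the exponential of minus half the squared Mahalanobis distance. The function name `gaussian_density` and the example values are hypothetical.

```python
import numpy as np

def gaussian_density(x, mu, cov, alpha):
    """Evaluate alpha * exp(-0.5 * (x - mu)^T cov^{-1} (x - mu))."""
    d = np.asarray(x, dtype=np.float64) - np.asarray(mu, dtype=np.float64)
    # Solve cov @ y = d instead of forming the explicit inverse.
    y = np.linalg.solve(np.asarray(cov, dtype=np.float64), d)
    return float(alpha * np.exp(-0.5 * d @ y))

# Hypothetical anisotropic Gaussian: wide along x, thin along z.
mu = np.zeros(3)
cov = np.diag([1.0, 0.25, 0.01])
alpha = 0.8

at_center = gaussian_density(mu, mu, cov, alpha)          # peak value equals alpha
along_x = gaussian_density([1.0, 0.0, 0.0], mu, cov, alpha)
along_z = gaussian_density([0.0, 0.0, 0.1], mu, cov, alpha)
```

Because the covariance is anisotropic, a step of 1.0 along x and a step of 0.1 along z are both exactly one standard deviation from the mean, so `along_x` and `along_z` are equal even though the Euclidean distances differ by a factor of ten.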
3DGS Strengths and Weaknesses
Strengths:
- Real-time rendering: tile-based rasterization enables 30–120+ FPS at 1080p on modern GPUs.
- Fast training: typically 30–60 minutes per scene vs. hours for NeRF.
- Editable: individual Gaussians are discrete objects that can be moved, added, or removed.
- Supports 4D animation by allowing Gaussians to move over time.
Weaknesses:
- Huge storage: a typical scene has millions of Gaussians, each with 59 parameters, resulting in 0.5–1.5 GB per scene.
- Discrete structure: the scene is a finite set of primitives, so artifacts can appear where a continuous representation would interpolate smoothly.
- Geometry extraction is non-trivial (surfaces are fuzzy volumes, not meshes).
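The storage figure follows from simple arithmetic. The sketch below assumes each parameter is stored as a 32-bit float and that the 59 parameters break down as 3 (position) + 3 (scale) + 4 (rotation quaternion) + 1 (opacity) + 48 (degree-3 spherical harmonics, 16 coefficients per color channel). That breakdown is a common convention assumed here for illustration; the text states only the total of 59.

```python
# Assumed per-Gaussian parameter layout (sums to 59).
PARAMS_PER_GAUSSIAN = {
    "position": 3,
    "scale": 3,
    "rotation_quaternion": 4,
    "opacity": 1,
    "sh_color": 48,  # 16 coefficients x 3 channels
}
BYTES_PER_FLOAT32 = 4
TOTAL_PARAMS = sum(PARAMS_PER_GAUSSIAN.values())

def scene_size_gb(num_gaussians, params=TOTAL_PARAMS):
    """Uncompressed size in GB (10^9 bytes) at float32 precision."""
    return num_gaussians * params * BYTES_PER_FLOAT32 / 1e9

for n in (2_000_000, 4_000_000, 6_000_000):
    print(f"{n:>9,d} Gaussians -> {scene_size_gb(n):.2f} GB")
```

Under these assumptions each Gaussian costs 236 bytes, so the 0.5–1.5 GB range corresponds to roughly 2 to 6 million Gaussians, and the color coefficients account for about 80% of the total.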
The Explicit–Implicit Axis
Think of this as a spectrum:
| Property | Pure Implicit (NeRF) | Explicit (3DGS) |
|---|---|---|
| Storage | MLP weights (~5 MB) | Point cloud (~1 GB) |
| Training | Hours | ~30–60 min |
| Rendering | < 1 FPS | Real-time |
| Editability | Hard | Easy |
| Continuous | Yes | Continuous coords, discrete points |
| Color model | MLP (view-dep.) | Spherical harmonics |
This course focuses on closing the storage gap for 3DGS — reducing that ~1 GB to tens of MB while preserving rendering quality. The subsequent modules will show that the explicit, discrete nature of 3DGS is both the problem (it's huge) and the solution (discreteness makes the data amenable to classical image/video compression tools).
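As a rough illustration of why a discrete list of primitives suits image-style tools, the sketch below quantizes one per-Gaussian attribute to 8 bits and lays it out as a 2D grid, the kind of array an image codec expects. This is not the method the later modules present; the function names, the synthetic data, and the min/max quantization scheme are all assumptions made for the example.

```python
import numpy as np

def pack_attribute_as_image(values, side):
    """Quantize one per-Gaussian attribute to uint8 and reshape to side x side."""
    values = np.asarray(values, dtype=np.float32)
    assert values.size == side * side, "this sketch assumes a perfect-square count"
    lo, hi = float(values.min()), float(values.max())
    scale = (hi - lo) if hi > lo else 1.0
    q = np.round((values - lo) / scale * 255.0).astype(np.uint8)
    return q.reshape(side, side), lo, hi

def unpack_attribute(image, lo, hi):
    """Invert the quantization (up to rounding error)."""
    return image.astype(np.float32).ravel() / 255.0 * (hi - lo) + lo

rng = np.random.default_rng(0)
opacity = rng.random(256 * 256).astype(np.float32)  # synthetic stand-in data
image, lo, hi = pack_attribute_as_image(opacity, side=256)
restored = unpack_attribute(image, lo, hi)
max_error = float(np.abs(restored - opacity).max())
```

Quantization alone cuts this attribute from 4 bytes to 1 byte per Gaussian, with an error bounded by half a quantization step. An implicit representation has no comparable per-element array to lay out this way, since its parameters are entangled network weights.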
Initialization: Structure-from-Motion
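Neither NeRF nor 3DGS operates from scratch. Both require known camera poses, which are computed via Structure-from-Motion (SfM), typically using COLMAP. NeRF consumes only the poses, whereas 3DGS also uses the sparse point cloud that SfM produces to initialize Gaussian positions before optimization begins. Pose errors therefore propagate into both reconstructions, and a sparse or incomplete point cloud gives 3DGS a weaker starting point.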
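The initialization step can be sketched as follows: each SfM point becomes the mean of one Gaussian, and its initial size is set from the distance to its nearest neighbors. The nearest-neighbor heuristic is an assumption in the spirit of common 3DGS implementations, and `init_gaussians_from_points` is a hypothetical name. The brute-force distance matrix is only suitable for small point sets.

```python
import numpy as np

def init_gaussians_from_points(points, k=3):
    """Return initial means and isotropic scales for a sparse point cloud.

    points: (N, 3) array of SfM points, N >= 2
    k: number of nearest neighbors used to estimate local spacing
    """
    points = np.asarray(points, dtype=np.float64)
    n = points.shape[0]
    diff = points[:, None, :] - points[None, :, :]
    dist2 = (diff ** 2).sum(axis=-1)          # (N, N) squared distances
    np.fill_diagonal(dist2, np.inf)           # ignore distance to self
    k = min(k, n - 1)
    nearest = np.sort(dist2, axis=1)[:, :k]   # k smallest squared distances
    radius = np.sqrt(nearest.mean(axis=1))    # RMS distance to k neighbors
    means = points.copy()
    scales = np.repeat(radius[:, None], 3, axis=1)  # same scale on each axis
    return means, scales

# Toy point cloud: the origin plus one point on each axis.
toy_points = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
means, scales = init_gaussians_from_points(toy_points, k=1)
```

Under this heuristic a sparser point cloud yields larger initial Gaussians, which is one way point-cloud density shapes the starting point that optimization must refine.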