
Extensions: 2DGS, 4DGS, Hierarchical, Relighting, and Editing

By the end of this reading you will be able to:
  • Distinguish 2DGS from 3DGS in terms of Gaussian primitive geometry and explain why flat disk primitives improve surface reconstruction and mesh extraction
  • Explain how 4DGS extends static Gaussian attributes to dynamic scenes, identifying which parameters are time-varied and how temporal continuity is enforced
  • Identify the motivation for hierarchical 3DGS (HierarchicalGS, OctreeGS) and explain how spatial partitioning (tiles or an octree) combined with level-of-detail enables city-scale rendering
  • Explain how the discrete, explicit structure of 3DGS enables editing operations (moving, removing, and inserting Gaussians; GAN inversion; text-driven editing) that are impractical with implicit NeRF representations

Why Extend 3DGS?

Base 3DGS excels at novel-view synthesis for static scenes up to roughly room scale. Several important real-world tasks require extensions: accurate surfaces and meshes, dynamic scenes, large-scale environments, relighting under new illumination, and generative synthesis and editing. This reading surveys the key extension directions.

2D Gaussians (2DGS, SuGaR)

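Problem: 3D Gaussians are volumetric blobs, but object surfaces such as flat walls, skin, and leaves are locally flat. A 3D Gaussian can approximate a patch of such a surface by shrinking one axis, but nothing forces it to: the thickness axis is a wasted and unconstrained degree of freedom, and the depth and normal of a blob are not well defined.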

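2DGS: Replaces each 3D Gaussian with a planar 2D Gaussian disk, which amounts to collapsing the smallest scale dimension to zero. Each disk has a well-defined normal (the collapsed axis), so rendered depth and normals are consistent across views. This improves surface accuracy and makes depth estimation and mesh extraction (e.g., by fusing rendered depth maps) more reliable.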
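To make "collapsing the smallest scale" concrete, the sketch below builds the usual 3DGS covariance $\Sigma = R S S^\top R^\top$ from a unit quaternion and a scale vector, then sets the third scale to zero. Assumptions: quaternions in (w, x, y, z) order and hypothetical helper names; the actual 2DGS renderer also changes how disks are rasterized, which is not shown. The collapsed covariance has rank 2, and the third rotation axis is the disk normal.

```python
import numpy as np

def quat_to_rotmat(q):
    """Hypothetical helper: unit quaternion (w, x, y, z) -> 3x3 rotation matrix."""
    w, x, y, z = q / np.linalg.norm(q)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])

def covariance(q, scales):
    """Hypothetical helper: Sigma = R S S^T R^T (3DGS-style parameterization)."""
    R = quat_to_rotmat(np.asarray(q, dtype=float))
    S = np.diag(scales)
    return R @ S @ S.T @ R.T

q = np.array([0.9, 0.1, 0.3, 0.2])
blob = covariance(q, [0.5, 0.3, 0.2])   # 3D Gaussian: full-rank ellipsoid
disk = covariance(q, [0.5, 0.3, 0.0])   # 2D Gaussian: smallest scale collapsed to zero
normal = quat_to_rotmat(q)[:, 2]        # third rotation axis = disk normal

print(np.linalg.matrix_rank(blob, tol=1e-10))  # 3
print(np.linalg.matrix_rank(disk, tol=1e-10))  # 2
print(np.allclose(disk @ normal, 0.0))         # True: no extent along the normal
```

A well-defined normal is exactly what volumetric 3D Gaussians lack, and it is what makes depth and normal supervision (and hence mesh extraction) better behaved.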

SuGaR: Keeps 3D Gaussians but regularizes them to lie flat along the scene surface, extracts a mesh from them, and then binds Gaussians to that mesh so each one lives on a triangle. A Gaussian bound to the mesh inherits its motion when the mesh deforms. This binding idea is a key enabler for animatable human avatars: deform the underlying body mesh via a skinning model, and all attached Gaussians move with it.
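A minimal sketch of mesh binding, assuming each Gaussian stores a triangle index and fixed barycentric weights (the arrays and the `gaussian_centers` helper are hypothetical, not SuGaR's code). The center is recomputed from the current vertex positions, so deforming the mesh moves the Gaussian automatically. Real systems also rotate and scale each Gaussian with its triangle's local frame; that part is omitted.

```python
import numpy as np

# A tiny mesh: 4 vertices, 2 triangles forming a unit square in the z = 0 plane.
vertices = np.array([[0.0, 0.0, 0.0],
                     [1.0, 0.0, 0.0],
                     [1.0, 1.0, 0.0],
                     [0.0, 1.0, 0.0]])
faces = np.array([[0, 1, 2],
                  [0, 2, 3]])

# Hypothetical binding: each Gaussian sits on one face with fixed barycentric weights.
face_ids = np.array([0, 1])
barycentric = np.array([[0.2, 0.5, 0.3],
                        [1 / 3, 1 / 3, 1 / 3]])

def gaussian_centers(verts):
    """Recompute Gaussian means from the current (possibly deformed) vertices."""
    tri = verts[faces[face_ids]]                      # (G, 3 corners, 3 coords)
    return np.einsum("gk,gkc->gc", barycentric, tri)

rest = gaussian_centers(vertices)

# Deform the mesh: lift vertex 2 by 0.5 along z (a stand-in for skinning).
deformed_vertices = vertices.copy()
deformed_vertices[2, 2] += 0.5
moved = gaussian_centers(deformed_vertices)

print(moved[:, 2])   # approximately [0.15, 0.167]: each Gaussian follows vertex 2 by its weight
```

Nothing about the Gaussians themselves is re-optimized here; only the mesh moved, which is why a skinned body mesh can drive thousands of attached Gaussians.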

4D Gaussians (4DGS)

For dynamic scenes (people walking, fire burning), each Gaussian's mean and covariance become functions of time $t$ (some methods also vary opacity or color):

$$\mu_i(t), \quad \Sigma_i(t)$$

The temporal dimension is typically handled with:

  • Explicit keyframes: store Gaussian states at $T$ keyframes and interpolate between them.
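  • Deformation fields: a small MLP takes a Gaussian's canonical position $\mu_i$ and the time $t$ and outputs a position offset $\delta\mu_i(t)$, so that $\mu_i(t) = \mu_i + \delta\mu_i(t)$; many methods also predict rotation and scale offsets.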
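Both options can be sketched in a few lines. Assumptions: linear interpolation of means only (real systems also interpolate rotations, e.g. with quaternion slerp), a tiny randomly initialized MLP standing in for a trained deformation network, and hypothetical names (`mean_at`, `deform`). Because both the interpolation and the `tanh` MLP are continuous in $t$, nearby times give nearby Gaussian positions.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Option 1: explicit keyframes (hypothetical data) -------------------------
key_times = np.array([0.0, 0.5, 1.0])            # T = 3 keyframes
key_means = rng.normal(size=(3, 4, 3))           # (T, num_gaussians, xyz)

def mean_at(t):
    """Linearly interpolate every Gaussian mean between the bracketing keyframes."""
    k = np.clip(np.searchsorted(key_times, t) - 1, 0, len(key_times) - 2)
    w = (t - key_times[k]) / (key_times[k + 1] - key_times[k])
    return (1 - w) * key_means[k] + w * key_means[k + 1]

# --- Option 2: deformation field (untrained, random weights) ------------------
W1 = rng.normal(scale=0.5, size=(4, 32))         # input features: (x, y, z, t)
W2 = rng.normal(scale=0.1, size=(32, 3))         # output: position offset

def deform(canonical_means, t):
    """Tiny MLP: (mu_i, t) -> delta_mu_i(t); returns mu_i + delta_mu_i(t)."""
    inp = np.hstack([canonical_means, np.full((len(canonical_means), 1), t)])
    offset = np.tanh(inp @ W1) @ W2
    return canonical_means + offset

canonical = key_means[0]
print(np.allclose(mean_at(0.5), key_means[1]))   # True: exact at a keyframe
print(np.abs(deform(canonical, 0.30) - deform(canonical, 0.31)).max())  # a small number
```

The trade-off: keyframes store $T$ copies of the state (memory grows with sequence length), while the deformation field stores one canonical set of Gaussians plus a small network.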

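Temporal continuity largely comes from the representation itself: interpolating between keyframes, or querying a smooth MLP, yields nearby Gaussian states at nearby times. Topology changes and newly appearing content (a hand emerging from behind an object) are harder, because a deformation can only move Gaussians that already exist; these cases are handled via keyframe restarts.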

Hierarchical 3DGS (HierarchicalGS)

Problem: city-scale or campus-scale scenes cannot fit in GPU memory.

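Solution: partition the scene into tiles; each tile holds its own Gaussian set. At render time, only tiles visible from the current camera are loaded and rendered. Level-of-detail (LOD) rendering additionally uses coarser Gaussian representations (fewer, larger Gaussians) for distant tiles, which avoids aliasing from undersampled fine-grained structure and avoids spending compute on Gaussians smaller than a pixel. OctreeGS follows the same principle with an octree: each level stores the scene at a different resolution, and the level rendered for a region is chosen by its distance to the camera.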

This extends 3DGS from room-scale to city-scale while maintaining real-time performance on high-end hardware.
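The tile-streaming and LOD selection can be sketched as follows. Assumptions: square tiles with precomputed centers, visibility approximated by a distance cutoff plus a view-cone test on tile centers (a crude stand-in for a real frustum test), and three LOD levels with arbitrary distance thresholds. The function `select_tiles` and all thresholds are hypothetical; neither HierarchicalGS nor OctreeGS uses exactly these rules.

```python
import numpy as np

def select_tiles(tile_centers, cam_pos, cam_dir, fov_deg=90.0, max_dist=500.0,
                 lod_thresholds=(100.0, 300.0)):
    """Hypothetical helper: return {tile_index: lod_level}; 0 = finest detail."""
    cam_dir = cam_dir / np.linalg.norm(cam_dir)
    cos_half_fov = np.cos(np.radians(fov_deg / 2))
    selected = {}
    for i, center in enumerate(tile_centers):
        to_tile = center - cam_pos
        dist = np.linalg.norm(to_tile)
        if dist > max_dist:
            continue                                  # too far: never loaded
        if dist > 0 and np.dot(to_tile, cam_dir) / dist < cos_half_fov:
            continue                                  # outside the view cone
        # Coarser LOD (fewer, larger Gaussians) for more distant tiles.
        selected[i] = int(np.searchsorted(lod_thresholds, dist))
    return selected

# A 1 km x 1 km area split into 100 m tiles; index = 10 * ix + iy.
xs = np.arange(50.0, 1000.0, 100.0)
tile_centers = np.array([[x, y, 0.0] for x in xs for y in xs])

plan = select_tiles(tile_centers, cam_pos=np.array([-10.0, -10.0, 0.0]),
                    cam_dir=np.array([1.0, 1.0, 0.0]))
print(plan[0], plan[11], plan[22])   # 0 1 2: finest LOD nearby, coarser farther away
print(44 in plan)                    # False: tile at (450, 450) is beyond max_dist
```

Only the tiles in `plan` would need to be resident in GPU memory, which is what lets the total scene exceed memory capacity.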

Relighting (R3GW)

Base 3DGS bakes illumination into the learned colors: shadows and highlights present during capture are frozen into each Gaussian's appearance, so the scene cannot be re-rendered correctly under different lighting.

Relighting methods decompose appearance into:

  • A reflectance/material component (BRDF)
  • An illumination component (environment map)

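The BRDF and environment map are optimized jointly with the Gaussian geometry. At inference, you can swap in a new environment map and re-render with physically based shading under the new lighting. The same decomposition also enables reconstruction from in-the-wild images (outdoor scenes with varying sun positions across capture sessions): differences in illumination between images can be attributed to the lighting component rather than to the material.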
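A deliberately simplified sketch of why the decomposition enables relighting. Assumptions: a Lambertian (diffuse-only) BRDF per Gaussian, and an "environment map" approximated by a few directional lights given as a direction and an RGB irradiance; there is no shadowing or specular term. The function name and lighting presets are hypothetical. Shading is recomputed from stored albedo and normals, so swapping the lights changes the result without any retraining.

```python
import numpy as np

def shade_lambertian(albedo, normals, light_dirs, light_irradiance):
    """Hypothetical helper: diffuse radiance = albedo / pi * sum_l max(0, n . l) * E_l.

    albedo:           (N, 3) reflectance in [0, 1]
    normals:          (N, 3) unit normals
    light_dirs:       (L, 3) unit vectors pointing toward each light
    light_irradiance: (L, 3) RGB irradiance of each light at normal incidence
    """
    cos_terms = np.clip(normals @ light_dirs.T, 0.0, None)   # (N, L)
    irradiance = cos_terms @ light_irradiance                 # (N, 3)
    return albedo * irradiance / np.pi

albedo = np.array([[0.8, 0.6, 0.2],      # yellowish Gaussian
                   [0.2, 0.2, 0.9]])     # blue Gaussian
normals = np.array([[0.0, 0.0, 1.0],     # facing up
                    [0.0, 1.0, 0.0]])    # facing sideways

noon = (np.array([[0.0, 0.0, 1.0]]), np.array([[3.0, 3.0, 3.0]]))    # sun overhead
sunset = (np.array([[0.0, 0.6, 0.8]]), np.array([[3.0, 1.5, 0.8]]))  # low, warm sun

for name, (dirs, irr) in [("noon", noon), ("sunset", sunset)]:
    print(name, shade_lambertian(albedo, normals, dirs, irr).round(3))
```

Under the overhead sun the sideways-facing Gaussian receives no direct light; under the low sunset light it does, and both take on a warmer tint. Base 3DGS, with lighting baked into color, could reproduce only the capture-time lighting.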

GAN-Based Synthesis (CGSGAN)

Rather than reconstructing from multi-view images, generative models can synthesize Gaussian point clouds from a latent vector $z$. A GAN is trained to produce plausible 3DGS representations of human faces.

GAN inversion: fit a real person's face by finding the latent $z^*$ whose generated Gaussian cloud, once rendered, best matches the input images. Once inverted, edit the face by moving $z^*$ in latent space (change expression, age, identity).
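GAN inversion is an optimization over the latent, $z^* = \arg\min_z \sum_v \lVert R_v(G(z)) - I_v \rVert^2$, where (in this sketch's notation) $G$ is the generator, $R_v$ renders view $v$, and $I_v$ are the input images. The code below shows only the mechanics, with loudly labeled stand-ins: a frozen random linear map plays the role of "generator plus renderer", and the gradient is written by hand. A real pipeline would backpropagate through a neural generator and a differentiable Gaussian rasterizer.

```python
import numpy as np

rng = np.random.default_rng(0)
latent_dim, num_pixels = 8, 64

# Hypothetical stand-in for "render(G(z))": a frozen linear map from latent to pixels.
A = rng.normal(size=(num_pixels, latent_dim))
b = rng.normal(size=num_pixels)

def render_from_latent(z):
    return A @ z + b

# Stand-in for the real input images: rendered from a hidden latent we try to recover.
z_true = rng.normal(size=latent_dim)
target = render_from_latent(z_true)

# Gradient descent on L(z) = ||render(z) - target||^2, starting from z = 0.
z = np.zeros(latent_dim)
step = 1.0 / (2 * np.linalg.norm(A, 2) ** 2)    # 1 / Lipschitz constant of the gradient
for _ in range(500):
    residual = render_from_latent(z) - target
    z = z - step * (2 * A.T @ residual)
z_star = z
print(np.allclose(z_star, z_true, atol=1e-6))   # True

# Editing: move z* along a latent direction and re-render. In a trained GAN this
# direction would be semantically meaningful (e.g. expression or age); here it is random.
edit_direction = rng.normal(size=latent_dim)
edited = render_from_latent(z_star + 0.5 * edit_direction)
```

With a real, nonlinear generator the loss is non-convex, so the recovered $z^*$ is only a local optimum; the linear stand-in hides that difficulty.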

Prompt-Based GS Editing

Given an existing 3DGS scene, a user specifies a text edit: "Turn the teddy bear into a golden bear." The editing pipeline:

  1. Render images from multiple views.
  2. Apply a text-driven image editing model (e.g., InstructPix2Pix) to produce edited images.
  3. Fine-tune the 3DGS representation to match the edited images.
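The three steps form an optimization loop. Below is a toy version under stated assumptions: each Gaussian carries only an RGB color, rendering a view is a fixed per-view blending matrix whose rows sum to 1 (standing in for alpha-compositing weights), and `edit_image` is a hypothetical stand-in for InstructPix2Pix that blends every pixel halfway toward gold. Fine-tuning is plain gradient descent on the colors; geometry is left untouched.

```python
import numpy as np

rng = np.random.default_rng(0)
num_gaussians, num_views, pixels_per_view = 6, 4, 16
GOLD = np.array([1.0, 0.84, 0.0])

colors = rng.uniform(size=(num_gaussians, 3))   # toy scene: one RGB color per Gaussian
original_colors = colors.copy()

# Toy renderer: each pixel is a fixed convex blend of Gaussian colors (rows sum to 1).
weights = rng.uniform(size=(num_views, pixels_per_view, num_gaussians))
weights /= weights.sum(axis=-1, keepdims=True)

def render(view, c):
    return weights[view] @ c                     # (pixels_per_view, 3)

def edit_image(img):
    """Hypothetical text-driven editor: 'make it golden' = blend halfway to gold."""
    return 0.5 * img + 0.5 * GOLD

def loss(c, targets):
    return sum(np.sum((render(v, c) - targets[v]) ** 2) for v in range(num_views))

# Steps 1 and 2: render every view, then edit each render.
targets = [edit_image(render(v, colors)) for v in range(num_views)]

# Step 3: fine-tune the Gaussian colors by gradient descent to match the edited images.
step = 1.0 / (2 * np.linalg.norm(weights.reshape(-1, num_gaussians), 2) ** 2)
initial_loss = loss(colors, targets)
for _ in range(5000):
    grad = sum(2 * weights[v].T @ (render(v, colors) - targets[v]) for v in range(num_views))
    colors = colors - step * grad

print(initial_loss, "->", loss(colors, targets))
```

In this toy the edited images are perfectly multi-view consistent, so a single set of colors fits them all. Real 2D editors can produce inconsistent edits across views, which is why step 3 is an optimization rather than a direct copy.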

The explicit nature of 3DGS (individual movable Gaussians) also allows direct geometric edits: duplicating, deleting, or translating groups of Gaussians with 3D selection tools.
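Because a 3DGS scene is just arrays of per-Gaussian attributes, these direct edits reduce to boolean masking. A minimal sketch, assuming the scene is a dict of NumPy arrays (means, scales, rotations, opacities, colors) and that the 3D selection tool is an axis-aligned box; the layout and helper names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
# Hypothetical scene layout: one row per Gaussian in every array.
scene = {
    "means": rng.uniform(-1.0, 1.0, size=(n, 3)),
    "scales": rng.uniform(0.01, 0.05, size=(n, 3)),
    "rotations": np.tile([1.0, 0.0, 0.0, 0.0], (n, 1)),   # identity quaternions (w, x, y, z)
    "opacities": rng.uniform(0.1, 1.0, size=(n, 1)),
    "colors": rng.uniform(size=(n, 3)),
}

def select_box(scene, lo, hi):
    """Boolean mask of Gaussians whose centers lie inside the box [lo, hi]."""
    m = scene["means"]
    return np.all((m >= lo) & (m <= hi), axis=1)

def delete(scene, mask):
    return {k: v[~mask] for k, v in scene.items()}

def translate(scene, mask, offset):
    out = {k: v.copy() for k, v in scene.items()}
    out["means"][mask] += offset
    return out

def duplicate(scene, mask, offset):
    """Copy the selected Gaussians, shift the copies, and append them."""
    copies = {k: v[mask].copy() for k, v in scene.items()}
    copies["means"] += offset
    return {k: np.concatenate([scene[k], copies[k]]) for k in scene}

mask = select_box(scene, lo=np.array([0.0, 0.0, 0.0]), hi=np.array([0.5, 0.5, 0.5]))
offset = np.array([1.0, 0.0, 0.0])
print(mask.sum(), "Gaussians selected")
print(len(delete(scene, mask)["means"]), len(duplicate(scene, mask, offset)["means"]))
```

No optimization is involved: the edited arrays can be rasterized immediately, which is the practical advantage over an implicit NeRF, where the same edit would require changing network weights.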