Extensions: 2DGS, 4DGS, Hierarchical, Relighting, and Editing
- Distinguish 2DGS from 3DGS in terms of Gaussian primitive geometry and explain why flat disk primitives improve surface reconstruction and mesh extraction
- Explain how 4DGS extends static Gaussian attributes to dynamic scenes, identifying which parameters are time-varied and how temporal continuity is enforced
- Identify the motivation for hierarchical 3DGS (HierarchicalGS, OctreeGS) and explain how level-of-detail via octree partitioning enables city-scale rendering
- Explain how the discrete, explicit structure of 3DGS enables editing operations (moving, removing, and inserting Gaussians; GAN inversion; text-driven editing) that are impractical with implicit NeRF representations
Why Extend 3DGS?
Base 3DGS excels at novel-view synthesis for static, bounded scenes. Several important real-world tasks require extensions: dynamic scenes, large-scale environments, physically accurate lighting, and semantic editing. This reading surveys the key extension directions.
2D Gaussians (2DGS, SuGaR)
Problem: 3D Gaussians are volumetric blobs, but many object surfaces (flat walls, skin, leaves) are better modeled as thin, flat structures. A 3D Gaussian can approximate a planar patch only by shrinking one axis, which wastes degrees of freedom on that thin axis and leaves the surface's depth and normal only loosely defined.
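2DGS: Collapses the smallest scale dimension to zero, yielding a flat elliptical disk defined by a center, two tangent axes, and two scales. The disk has a well-defined normal (the cross product of its tangent axes) and a well-defined depth wherever a ray hits it, so rendered depth and normal maps agree across views. This improves surface accuracy and depth estimation, and makes mesh extraction more reliable.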
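A minimal sketch of why a flat primitive gives unambiguous geometry, assuming a disk is described by a center, two orthonormal tangent axes `t_u` and `t_v`, and two scales `s_u` and `s_v` (all hypothetical names, not the 2DGS codebase). It intersects a ray with the disk's plane and evaluates the Gaussian falloff at the hit point; the real 2DGS rasterizer is more involved, so treat this only as an illustration of the geometry.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def disk_normal(t_u, t_v):
    """Normal of a flat 2D Gaussian: cross product of its two tangent axes."""
    n = cross(t_u, t_v)
    length = math.sqrt(dot(n, n))
    return tuple(x / length for x in n)

def ray_disk_hit(origin, direction, center, t_u, t_v, s_u, s_v):
    """Intersect a ray with a 2D Gaussian disk (t_u, t_v assumed orthonormal).

    Returns (depth, weight): the ray parameter at the hit point and the
    Gaussian falloff exp(-(u^2/s_u^2 + v^2/s_v^2) / 2) there, or None if
    the ray is parallel to the disk plane.
    """
    n = disk_normal(t_u, t_v)
    denom = dot(direction, n)
    if abs(denom) < 1e-9:
        return None
    depth = dot(sub(center, origin), n) / denom
    hit = tuple(o + depth * d for o, d in zip(origin, direction))
    local = sub(hit, center)
    u, v = dot(local, t_u), dot(local, t_v)  # coordinates in the disk's own plane
    weight = math.exp(-0.5 * ((u / s_u) ** 2 + (v / s_v) ** 2))
    return depth, weight

# A disk 5 units in front of the camera, facing it, hit by an off-center ray:
# depth is 5.0 and the weight is exp(-0.5), since u = 0.5 equals one scale.
result = ray_disk_hit((0.5, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 0.0, 5.0),
                      (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), 0.5, 0.5)
```

Every ray that hits the disk gets exactly one depth and the disk's single normal, which is the property that depth-map fusion and mesh extraction rely on.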
SuGaR: Aligns Gaussians with the scene surface, extracts a mesh, and attaches Gaussians to that mesh. Each Gaussian lives on a mesh triangle and inherits its motion when the mesh deforms. This binding is the key enabler for animatable human avatars: deform the underlying body mesh via a skinning model, and all attached Gaussians move with it.
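The binding idea can be sketched with barycentric coordinates, under the simplifying assumption that each Gaussian's position is a fixed barycentric combination of one triangle's vertices. `bind_to_triangle` is a hypothetical helper, and only position is handled; a fuller version would also rotate each Gaussian with the triangle's local frame. A plain translation of the vertices stands in for a skinning model.

```python
def bind_to_triangle(triangle, bary):
    """Position of a Gaussian bound to a mesh triangle.

    triangle: three vertex positions (x, y, z).
    bary: fixed barycentric weights (w0, w1, w2) summing to 1.
    """
    return tuple(
        sum(w * vertex[k] for w, vertex in zip(bary, triangle))
        for k in range(3)
    )

# Hypothetical rest-pose triangle with one Gaussian bound at its centroid.
rest_triangle = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
bary = (1 / 3, 1 / 3, 1 / 3)
rest_pos = bind_to_triangle(rest_triangle, bary)

# Stand-in for a skinning model: the posed mesh moves every vertex by +2 in z.
posed_triangle = [(x, y, z + 2.0) for x, y, z in rest_triangle]
posed_pos = bind_to_triangle(posed_triangle, bary)  # the Gaussian follows the mesh
```

Because the barycentric weights never change, any deformation applied to the mesh vertices is automatically applied to every Gaussian bound to them.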
4D Gaussians (4DGS)
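For dynamic scenes (people walking, fire burning), each Gaussian $i$ can have a time-varying mean $\boldsymbol{\mu}_i(t)$ and covariance $\Sigma_i(t) = R_i(t)\,S_i(t)\,S_i(t)^\top R_i(t)^\top$, so its position, rotation $R_i$, and scale $S_i$ may all change with time (many methods keep color and opacity fixed):
$$G_i(\mathbf{x}, t) = \exp\!\left(-\tfrac{1}{2}\,\big(\mathbf{x}-\boldsymbol{\mu}_i(t)\big)^\top \Sigma_i(t)^{-1}\big(\mathbf{x}-\boldsymbol{\mu}_i(t)\big)\right)$$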
The temporal dimension is typically handled with:
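- Explicit keyframes: store Gaussian states at keyframes and interpolate between them; the interpolation makes each trajectory continuous in time.
- Deformation fields: a small MLP takes a Gaussian's canonical position and the time, $(\boldsymbol{\mu}_i, t)$, and outputs a position offset $\Delta\boldsymbol{\mu}_i(t)$, giving $\boldsymbol{\mu}_i(t) = \boldsymbol{\mu}_i + \Delta\boldsymbol{\mu}_i(t)$ (some methods also predict rotation and scale offsets). Because the MLP is a smooth function of $t$, nearby times yield nearby deformations, which enforces temporal continuity.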
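Both temporal strategies can be sketched in plain Python. This is an illustrative sketch under assumed conventions, not a specific 4DGS implementation: keyframe states hold a mean, a unit quaternion `(w, x, y, z)`, and per-axis scales; rotations are interpolated with slerp and scales in log space so they stay positive. `deformation_offset` is a hypothetical two-layer tanh MLP with fixed, made-up weights (a real one is learned) that maps `(x, y, z, t)` to a position offset.

```python
import math

def lerp(a, b, s):
    return tuple((1 - s) * x + s * y for x, y in zip(a, b))

def slerp(q0, q1, s):
    """Spherical linear interpolation between unit quaternions (w, x, y, z)."""
    d = sum(a * b for a, b in zip(q0, q1))
    if d < 0:  # flip one quaternion to interpolate along the shorter arc
        q1, d = tuple(-c for c in q1), -d
    if d > 0.9995:  # nearly parallel: normalized lerp is numerically safer
        q = lerp(q0, q1, s)
        norm = math.sqrt(sum(c * c for c in q))
        return tuple(c / norm for c in q)
    theta = math.acos(d)
    w0 = math.sin((1 - s) * theta) / math.sin(theta)
    w1 = math.sin(s * theta) / math.sin(theta)
    return tuple(w0 * a + w1 * b for a, b in zip(q0, q1))

def interpolate_keyframes(key0, key1, t0, t1, t):
    """State of one Gaussian at time t, between keyframes stored at t0 and t1."""
    s = (t - t0) / (t1 - t0)
    log_scale = lerp([math.log(c) for c in key0["scale"]],
                     [math.log(c) for c in key1["scale"]], s)
    return {
        "mean": lerp(key0["mean"], key1["mean"], s),
        "rotation": slerp(key0["rotation"], key1["rotation"], s),
        "scale": tuple(math.exp(c) for c in log_scale),  # stays positive
    }

def deformation_offset(mean, t, W1, b1, W2, b2):
    """Tiny two-layer MLP: (x, y, z, t) -> position offset (dx, dy, dz)."""
    inp = list(mean) + [t]
    hidden = [math.tanh(sum(w * x for w, x in zip(row, inp)) + b)
              for row, b in zip(W1, b1)]
    return tuple(sum(w * h for w, h in zip(row, hidden)) + b
                 for row, b in zip(W2, b2))

# Hypothetical fixed weights; in practice they are learned jointly with the Gaussians.
W1 = [[0.5, -0.2, 0.1, 1.0], [0.3, 0.4, -0.6, -0.8]]
b1 = [0.0, 0.1]
W2 = [[0.2, -0.1], [0.0, 0.3], [0.5, 0.5]]
b2 = [0.0, 0.0, 0.0]

canonical = (1.0, 2.0, 0.5)
offset = deformation_offset(canonical, 0.25, W1, b1, W2, b2)
mean_at_t = tuple(m + o for m, o in zip(canonical, offset))  # canonical mean + offset
```

Both paths are continuous in `t`: interpolation blends neighboring keyframes smoothly, and the MLP is built from smooth operations, so a small change in time produces a small change in every Gaussian.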
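Topology changes and newly visible content (e.g., a hand appearing from behind an object) cannot be produced by smoothly deforming existing Gaussians, so they are handled via keyframe restarts.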
Hierarchical 3DGS (HierarchicalGS)
Problem: city-scale or campus-scale scenes cannot fit in GPU memory.
Solution: partition the scene into spatial tiles, each holding its own Gaussian set. At render time, only tiles visible from the current camera are loaded and rendered. Level-of-detail (LOD) rendering additionally substitutes coarser Gaussian representations (e.g., higher levels of an octree, as in OctreeGS) for distant regions, which saves memory and avoids aliasing from undersampled fine-grained structure.
This extends 3DGS from room-scale to city-scale while maintaining real-time performance on high-end hardware.
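The render-time selection can be sketched as follows. Everything here is an assumption for illustration: `Tile`, `select_tiles`, the half-space visibility test (a crude stand-in for frustum culling), and the rule that each doubling of distance beyond `base_dist` selects the next-coarser level. Real systems such as HierarchicalGS typically choose the level by projected screen-space size rather than raw distance.

```python
import math
from dataclasses import dataclass

@dataclass
class Tile:
    name: str
    center: tuple   # bounding-sphere center (x, y, z)
    radius: float   # bounding-sphere radius

def select_tiles(tiles, cam_pos, cam_forward, max_dist, base_dist, num_levels):
    """Return (name, lod_level) for tiles to load; level 0 is the finest.

    cam_forward is assumed to be a unit vector.
    """
    selected = []
    for tile in tiles:
        offset = tuple(c - p for c, p in zip(tile.center, cam_pos))
        dist = math.sqrt(sum(o * o for o in offset))
        # Crude culling: skip tiles entirely behind the camera or out of range.
        ahead = sum(o * f for o, f in zip(offset, cam_forward)) > -tile.radius
        if not ahead or dist - tile.radius > max_dist:
            continue
        if dist <= base_dist:
            level = 0
        else:
            # each doubling of distance selects the next-coarser level
            level = min(num_levels - 1, int(math.log2(dist / base_dist)))
        selected.append((tile.name, level))
    return selected

tiles = [
    Tile("A", (0.0, 0.0, 10.0), 1.0),    # near: finest level
    Tile("B", (0.0, 0.0, 50.0), 1.0),    # mid-range: coarser
    Tile("C", (0.0, 0.0, -20.0), 1.0),   # behind the camera: culled
    Tile("D", (0.0, 0.0, 500.0), 1.0),   # beyond max_dist: not loaded
    Tile("E", (0.0, 0.0, 190.0), 1.0),   # far: coarsest level
]
visible = select_tiles(tiles, (0.0, 0.0, 0.0), (0.0, 0.0, 1.0),
                       max_dist=200.0, base_dist=20.0, num_levels=4)
```

Only tiles A, B, and E are loaded, at levels 0, 1, and 3 respectively, so GPU memory holds fine detail only near the camera.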
Relighting (R3GW)
Base 3DGS bakes the capture-time illumination into each Gaussian's learned color (its spherical-harmonic coefficients), so the scene cannot be rendered correctly under new lighting conditions.
Relighting methods decompose appearance into:
- A reflectance/material component (BRDF)
- An illumination component (environment map)
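The BRDF and environment map are optimized jointly with the Gaussian geometry. At inference, you can swap in a new environment map and re-render the scene under the new lighting with physically based shading. The decomposition also enables reconstruction from in-the-wild images, such as outdoor scenes captured across sessions with different sun positions, because lighting differences are explained by the illumination component instead of being baked into the colors.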
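A toy version of the decomposition shows why swapping the environment map relights the scene while the material stays fixed. This sketch assumes a purely diffuse (Lambertian) BRDF with a per-Gaussian `albedo`, and represents the environment map coarsely as a short list of directional lights with RGB irradiance; real relighting methods use richer BRDFs, continuous environment maps, and visibility/shadowing.

```python
import math

def normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def shade_lambertian(albedo, normal, env_lights):
    """Outgoing RGB radiance of a diffuse surface point.

    albedo: per-channel reflectance (the material, fixed after optimization).
    env_lights: list of (direction_to_light, rgb_irradiance) pairs, a coarse
    stand-in for an environment map.
    """
    n = normalize(normal)
    rgb = [0.0, 0.0, 0.0]
    for direction, irradiance in env_lights:
        cos_term = max(0.0, sum(a * b for a, b in zip(n, normalize(direction))))
        for k in range(3):
            # Lambertian BRDF is albedo / pi
            rgb[k] += albedo[k] / math.pi * irradiance[k] * cos_term
    return tuple(rgb)

albedo = (0.8, 0.5, 0.2)
normal = (0.0, 0.0, 1.0)
noon = [((0.0, 0.0, 1.0), (math.pi, math.pi, math.pi))]             # white light overhead
sunset = [((1.0, 0.0, 1.0), (2 * math.pi, math.pi, 0.5 * math.pi))]  # warm light at 45 degrees

# Same material, two environments: only the illumination input changes.
color_noon = shade_lambertian(albedo, normal, noon)
color_sunset = shade_lambertian(albedo, normal, sunset)
```

Under the overhead white light the rendered color equals the albedo; under the sunset light it turns redder and dimmer, yet `albedo` itself was never modified.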
GAN-Based Synthesis (CGSGAN)
Rather than reconstructing from multi-view images, generative models can synthesize Gaussian point clouds from a latent vector $\mathbf{z}$. A GAN is trained to produce plausible 3DGS representations of human faces.
GAN inversion: fit a real person's face by finding the latent $\mathbf{z}$ whose generated Gaussian cloud, when rendered, best matches the input images. Once inverted, edit the face by moving $\mathbf{z}$ in latent space (change expression, age, identity).
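Optimization-based inversion can be illustrated with a toy stand-in for the generator. The assumptions are loud here: `generate` is a hypothetical linear map from a 2D latent to a flat parameter vector, and the loss compares parameters directly, whereas a real pipeline renders the generated Gaussians with a differentiable rasterizer and compares the renders with photos. The structure is the same: start from an initial latent, descend the gradient of a reconstruction loss, then edit by moving the recovered latent.

```python
def generate(z, A, b):
    """Hypothetical toy generator: latent z -> flat vector of Gaussian parameters."""
    return [sum(a * zj for a, zj in zip(row, z)) + bi for row, bi in zip(A, b)]

def invert(target, A, b, dim, lr=0.05, steps=500):
    """Find z minimizing ||generate(z) - target||^2 by gradient descent."""
    z = [0.0] * dim
    for _ in range(steps):
        residual = [g - t for g, t in zip(generate(z, A, b), target)]
        # Gradient of the squared error: 2 * A^T @ residual
        grad = [2 * sum(A[i][j] * residual[i] for i in range(len(A)))
                for j in range(dim)]
        z = [zj - lr * gj for zj, gj in zip(z, grad)]
    return z

A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
b = [0.1, 0.0, 0.0, 0.2]
z_true = [0.7, -0.3]                 # the "real face" we pretend not to know
target = generate(z_true, A, b)      # stands in for the observed images
z_hat = invert(target, A, b, dim=2)  # inversion: recover the latent

# Editing: move the recovered latent along a (hypothetical) attribute direction.
direction = [1.0, 0.0]
edited = generate([zj + 0.5 * dj for zj, dj in zip(z_hat, direction)], A, b)
```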
Prompt-Based GS Editing
Given an existing 3DGS scene, a user specifies a text edit: "Turn the teddy bear into a golden bear." The editing pipeline:
- Render images from multiple views.
- Apply a text-driven image editing model (e.g., InstructPix2Pix) to produce edited images.
- Fine-tune the 3DGS representation to match the edited images.
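The three steps can be sketched end to end with toy components, all hypothetical: `render` treats each pixel as a fixed weighted sum of Gaussian colors (a stand-in for alpha blending), `edit_image` stands in for a text-driven editor such as InstructPix2Pix by pushing pixels toward gold, and `fine_tune` runs plain gradient descent on the squared error between renders and edited targets. Real pipelines often alternate rendering, editing, and fine-tuning several times, because independent 2D edits of different views are not guaranteed to be consistent.

```python
GOLD = (1.0, 0.84, 0.0)

def render(colors, view):
    """Each pixel is a fixed weighted sum of Gaussian colors (stand-in for alpha blending)."""
    return [tuple(sum(w * c[k] for w, c in zip(row, colors)) for k in range(3))
            for row in view]

def edit_image(image, strength=0.8):
    """Hypothetical stand-in for a text-driven 2D editor: push every pixel toward gold."""
    return [tuple((1 - strength) * p[k] + strength * GOLD[k] for k in range(3))
            for p in image]

def fine_tune(colors, views, targets, lr=0.1, steps=300):
    """Gradient descent on the summed squared error between renders and edited targets."""
    colors = [list(c) for c in colors]
    for _ in range(steps):
        grads = [[0.0, 0.0, 0.0] for _ in colors]
        for view, target in zip(views, targets):
            for row, pixel, goal in zip(view, render(colors, view), target):
                for k in range(3):
                    err = pixel[k] - goal[k]
                    for j, w in enumerate(row):
                        grads[j][k] += 2 * err * w
        for j in range(len(colors)):
            for k in range(3):
                colors[j][k] -= lr * grads[j][k]
    return [tuple(c) for c in colors]

# Two brownish "teddy bear" Gaussians seen from two views of two pixels each.
colors = [(0.55, 0.35, 0.2), (0.45, 0.3, 0.15)]
views = [[[0.7, 0.3], [0.2, 0.8]],
         [[0.5, 0.5], [1.0, 0.0]]]
targets = [edit_image(render(colors, v)) for v in views]  # steps 1 and 2: render, then edit
new_colors = fine_tune(colors, views, targets)             # step 3: fit to the edited views
```

Because the toy edits happen to be consistent across views, fine-tuning finds Gaussian colors whose renders match every edited target.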
The explicit nature of 3DGS (individual movable Gaussians) also allows direct geometric edits: duplicating, deleting, or translating groups of Gaussians with 3D selection tools.
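Because each Gaussian is an explicit record, geometric edits reduce to list operations. A minimal sketch, assuming an axis-aligned box as the 3D selection tool and a hypothetical `Gaussian` record holding only a mean, color, and opacity; translation leaves covariance untouched, while a rotation edit would also need to rotate each covariance (and any view-dependent color).

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Gaussian:
    mean: tuple      # (x, y, z)
    color: tuple     # (r, g, b)
    opacity: float

def in_box(g, low, high):
    """Selection tool: is the Gaussian's center inside the box [low, high]?"""
    return all(lo <= m <= hi for m, lo, hi in zip(g.mean, low, high))

def shifted(g, offset):
    return replace(g, mean=tuple(m + o for m, o in zip(g.mean, offset)))

def translate(scene, low, high, offset):
    return [shifted(g, offset) if in_box(g, low, high) else g for g in scene]

def delete(scene, low, high):
    return [g for g in scene if not in_box(g, low, high)]

def duplicate(scene, low, high, offset):
    return scene + [shifted(g, offset) for g in scene if in_box(g, low, high)]

scene = [
    Gaussian((0.0, 0.0, 0.0), (0.6, 0.4, 0.2), 0.9),
    Gaussian((0.5, 0.5, 0.5), (0.6, 0.4, 0.2), 0.8),
    Gaussian((5.0, 5.0, 5.0), (0.1, 0.1, 0.9), 1.0),
]
low, high = (-1.0, -1.0, -1.0), (1.0, 1.0, 1.0)
moved = translate(scene, low, high, (0.0, 0.0, 2.0))
removed = delete(scene, low, high)
copied = duplicate(scene, low, high, (3.0, 0.0, 0.0))
```

None of these operations has a clean equivalent in an implicit NeRF, where the same object is spread across shared network weights.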