Extensions: 2DGS, 4DGS, Hierarchical, Relighting, and Editing

By the end of this reading you will be able to:

Distinguish 2DGS from 3DGS in terms of Gaussian primitive geometry and explain why flat disk primitives improve surface reconstruction and mesh extraction
Explain how 4DGS extends static Gaussian attributes to dynamic scenes, identifying which parameters are time-varied and how temporal continuity is enforced
Identify the motivation for hierarchical 3DGS (HierarchicalGS, OctreeGS) and explain how spatial partitioning combined with level-of-detail rendering enables city-scale rendering
Explain how the discrete, explicit structure of 3DGS enables editing operations (moving, removing, and inserting Gaussians; GAN inversion; text-driven editing) that are impractical with implicit NeRF representations

Why Extend 3DGS?

Base 3DGS excels at novel-view synthesis for static, bounded scenes. Several important real-world tasks require extensions: dynamic scenes, large-scale environments, physically accurate lighting, and semantic editing. This reading surveys the key extension directions.

2D Gaussians (2DGS, SuGaR)

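Problem: 3D Gaussians are volumetric blobs, but most of what a camera sees is surface. Walls, skin, and leaves are locally planar, so they are better modeled by thin, flat primitives. A 3D Gaussian can approximate a planar patch, but it spends a scale parameter on the thin axis, and nothing forces that axis to align with the true surface normal.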

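2DGS: Represents each primitive as an oriented flat disk, defined by two tangent directions and two scales. This is equivalent to collapsing the smallest scale of a 3D Gaussian to zero, and the collapsed axis becomes a well-defined surface normal. The result is more accurate surfaces, better depth estimates, and cleaner mesh extraction.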
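To make the "collapsed axis" idea concrete, the sketch below builds the usual 3DGS covariance $\Sigma = R S S^\top R^\top$ for a volumetric Gaussian and for one whose third scale is set to zero. It is only an illustration of the geometry, not the 2DGS rasterizer; the helper names `quat_to_rotmat` and `covariance` and all numeric values are assumptions chosen for this example.

```python
import numpy as np

def quat_to_rotmat(q):
    """Convert a quaternion (w, x, y, z) to a 3x3 rotation matrix."""
    w, x, y, z = q / np.linalg.norm(q)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z),     2 * (x * z + w * y)],
        [2 * (x * y + w * z),     1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y),     2 * (y * z + w * x),     1 - 2 * (x * x + y * y)],
    ])

def covariance(scale, quat):
    """Sigma = R S S^T R^T, the scale/rotation parameterization used by 3DGS."""
    R = quat_to_rotmat(np.asarray(quat, dtype=float))
    S = np.diag(scale)
    return R @ S @ S.T @ R.T

quat = np.array([0.9, 0.1, 0.3, 0.2])          # arbitrary orientation (illustrative)
blob = covariance([0.30, 0.20, 0.10], quat)    # volumetric 3D Gaussian
disk = covariance([0.30, 0.20, 0.00], quat)    # third scale collapsed: flat disk

# The rotated z-axis is the direction of the collapsed scale, i.e. the disk normal.
normal = quat_to_rotmat(quat)[:, 2]
rank_blob = np.linalg.matrix_rank(blob, tol=1e-9)
rank_disk = np.linalg.matrix_rank(disk, tol=1e-9)
thickness = normal @ disk @ normal             # variance along the normal
```

The flat primitive has a rank-2 covariance and zero variance along `normal`, which is why a disk gives an unambiguous normal and depth where a blob does not.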

SuGaR: Regularizes Gaussians to align with the scene surface, extracts a mesh from them, and can then bind Gaussians to the mesh triangles. Each bound Gaussian lives on the mesh and inherits its motion when the mesh deforms. The same mesh-binding idea underlies many animatable human avatars: deform the underlying body mesh via a skinning model, and all attached Gaussians move with it.
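A minimal sketch of mesh binding, assuming each Gaussian stores the index of one triangle and fixed barycentric weights on it. The toy mesh, the array names, and the hand-written deformation are all illustrative assumptions, not SuGaR's actual data layout.

```python
import numpy as np

# Toy mesh: two triangles forming a unit square in the z = 0 plane.
vertices = np.array([[0.0, 0.0, 0.0],
                     [1.0, 0.0, 0.0],
                     [1.0, 1.0, 0.0],
                     [0.0, 1.0, 0.0]])
faces = np.array([[0, 1, 2],
                  [0, 2, 3]])

# Each Gaussian stores the triangle it is bound to and fixed barycentric weights.
gaussian_face = np.array([0, 0, 1])
gaussian_bary = np.array([[1 / 3, 1 / 3, 1 / 3],
                          [0.6, 0.2, 0.2],
                          [0.2, 0.3, 0.5]])

def gaussian_centers(vertices, faces, gaussian_face, gaussian_bary):
    """Place each Gaussian at the barycentric combination of its triangle's corners."""
    corners = vertices[faces[gaussian_face]]               # (G, 3 corners, 3 coords)
    return np.einsum("gk,gkd->gd", gaussian_bary, corners)

rest_centers = gaussian_centers(vertices, faces, gaussian_face, gaussian_bary)

# Deform the mesh by lifting vertex 2; a skinning model would do this in practice.
deformed = vertices.copy()
deformed[2, 2] += 0.5
moved_centers = gaussian_centers(deformed, faces, gaussian_face, gaussian_bary)
```

No Gaussian parameter is re-optimized here: only the mesh vertices change, and the centers follow. A fuller version would also rotate each Gaussian with its triangle's local frame.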

4D Gaussians (4DGS)

For dynamic scenes (people walking, fire burning), each Gaussian can have a time-varying mean and covariance:

$\mu_i(t), \quad \Sigma_i(t)$

The temporal dimension is typically handled with:

Explicit keyframes: store Gaussian states at $T$ keyframes, interpolate between them.
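Deformation fields: a small MLP takes $(i, t)$ and outputs a position offset $\delta\mu_i(t)$. Many methods also predict rotation and scale offsets, which is what makes $\Sigma_i(t)$ time-varying.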
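Both options can be sketched in a few lines. The code below is a toy under stated assumptions: keyframes are interpolated linearly on the means only, and the deformation network is an untrained two-layer MLP that is conditioned on each Gaussian's canonical position as a stand-in for the index $i$. The names `interpolate_keyframes` and `DeformationField` are hypothetical.

```python
import numpy as np

def interpolate_keyframes(key_times, key_means, t):
    """Linearly interpolate per-Gaussian means stored at sorted keyframe times.

    key_times: (T,) increasing; key_means: (T, N, 3). Returns (N, 3).
    """
    t = float(np.clip(t, key_times[0], key_times[-1]))
    hi = int(np.searchsorted(key_times, t, side="right"))
    hi = min(max(hi, 1), len(key_times) - 1)
    lo = hi - 1
    w = (t - key_times[lo]) / (key_times[hi] - key_times[lo])
    return (1.0 - w) * key_means[lo] + w * key_means[hi]

class DeformationField:
    """Tiny MLP mapping (canonical position, t) to a position offset."""

    def __init__(self, hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        # Random, untrained weights: in practice these are optimized with the scene.
        self.W1 = rng.normal(scale=0.1, size=(4, hidden))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(scale=0.1, size=(hidden, 3))
        self.b2 = np.zeros(3)

    def __call__(self, canonical_means, t):
        t_col = np.full((canonical_means.shape[0], 1), t)
        x = np.concatenate([canonical_means, t_col], axis=1)   # (N, 4)
        h = np.tanh(x @ self.W1 + self.b1)
        return h @ self.W2 + self.b2                            # (N, 3) offsets

# Keyframes: two Gaussians stored at three times.
key_times = np.array([0.0, 0.5, 1.0])
key_means = np.stack([np.zeros((2, 3)), np.ones((2, 3)), 3.0 * np.ones((2, 3))])
mid = interpolate_keyframes(key_times, key_means, 0.25)    # halfway between keyframes 0 and 1
late = interpolate_keyframes(key_times, key_means, 0.75)   # halfway between keyframes 1 and 2

# Deformation field: canonical means plus a time-dependent offset.
field = DeformationField()
canonical = np.array([[0.0, 0.0, 0.0], [1.0, 2.0, 3.0]])
means_at_t = canonical + field(canonical, t=0.7)
```

Keyframes cost memory proportional to $T$ but need no network evaluation at render time; a deformation field stores one canonical set of Gaussians and pays for an MLP query per frame.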

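Topology changes (a hand appearing from behind an object) are hard to express as a smooth deformation of a fixed set of Gaussians. Keyframe-based schemes handle them with keyframe restarts: at a keyframe, the representation is re-initialized rather than deformed from the previous state.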

Hierarchical 3DGS (HierarchicalGS)

Problem: city-scale or campus-scale scenes cannot fit in GPU memory.

Solution: partition the scene into tiles; each tile holds its own Gaussian set. At render time, only tiles visible from the current camera are loaded and rendered. Level-of-detail (LOD) rendering additionally uses coarser Gaussian representations for distant tiles. This cuts the number of Gaussians that must be rasterized and avoids aliasing from fine-grained structure that is undersampled at a distance.
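The tile selection described above can be sketched as follows. This is a deliberate simplification: visibility is tested with a view cone on tile centers rather than a full frustum test on tile bounds, and the LOD level comes from fixed distance thresholds. The function name `select_tiles` and every threshold are assumptions for illustration, not the published method's selection rule.

```python
import numpy as np

def select_tiles(tile_centers, cam_pos, cam_forward, half_fov_deg=45.0,
                 lod_distances=(20.0, 60.0), max_distance=200.0):
    """Return {tile_index: lod_level} for tiles inside a view cone.

    LOD 0 is the finest level; higher levels are coarser.
    """
    cam_forward = cam_forward / np.linalg.norm(cam_forward)
    cos_limit = np.cos(np.radians(half_fov_deg))
    selected = {}
    for idx, center in enumerate(tile_centers):
        offset = center - cam_pos
        dist = float(np.linalg.norm(offset))
        if dist > max_distance:
            continue                                   # too far: never loaded
        if dist > 0 and (offset @ cam_forward) / dist < cos_limit:
            continue                                   # outside the view cone
        # The number of distance thresholds exceeded gives the LOD level.
        selected[idx] = int(np.searchsorted(lod_distances, dist, side="right"))
    return selected

tiles = np.array([[0.0, 0.0, 10.0],     # near, in front
                  [0.0, 0.0, 40.0],     # mid-range, in front
                  [0.0, 0.0, 100.0],    # far, in front
                  [0.0, 0.0, -10.0],    # behind the camera
                  [0.0, 0.0, 500.0]])   # beyond max_distance
visible = select_tiles(tiles, cam_pos=np.zeros(3),
                       cam_forward=np.array([0.0, 0.0, 1.0]))
```

Here the three tiles in front of the camera are kept at LOD 0, 1, and 2 respectively, while the tile behind the camera and the one beyond `max_distance` are never loaded.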

This extends 3DGS from room-scale to city-scale while maintaining real-time performance on high-end hardware.

Relighting (R3GW)

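Base 3DGS bakes illumination into the learned colors: the shadows and highlights present at capture time become part of each Gaussian's appearance. As a result, the scene cannot be re-rendered correctly under novel lighting conditions.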

Relighting methods decompose appearance into:

A reflectance/material component (BRDF)
An illumination component (environment map)

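The BRDF and environment map are optimized jointly with the Gaussian geometry. At inference, you can swap in a new environment map and re-render with physically plausible new lighting. The same decomposition helps with reconstruction from in-the-wild images (outdoor scenes with varying sun positions across capture sessions), because lighting differences between images are explained by the illumination component instead of being baked into the Gaussians.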
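A minimal sketch of the decomposition, assuming a purely diffuse (Lambertian) material and an environment map sampled at a handful of directions. This is a stand-in for illustration, not R3GW's shading model; `diffuse_shade` and the two toy environments are hypothetical.

```python
import numpy as np

def diffuse_shade(albedo, normals, light_dirs, light_radiance, solid_angles):
    """Lambertian shading: color = albedo / pi * sum_k L_k * max(0, n . w_k) * dOmega_k."""
    cos_terms = np.clip(normals @ light_dirs.T, 0.0, None)      # (N, K)
    irradiance = (cos_terms * solid_angles) @ light_radiance    # (N, 3)
    return albedo / np.pi * irradiance

# Material component: per-Gaussian albedo and normal stay fixed across lightings.
albedo = np.array([[0.8, 0.2, 0.2], [0.2, 0.2, 0.8]])
normals = np.array([[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]])

# Illumination component: environment sampled overhead (+z) and from the side (+x).
light_dirs = np.array([[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]])
solid_angles = np.array([np.pi, np.pi])

noon = np.array([[1.0, 1.0, 1.0], [0.0, 0.0, 0.0]])      # white light from above only
sunset = np.array([[0.0, 0.0, 0.0], [1.0, 0.5, 0.2]])    # warm light from the side only

color_noon = diffuse_shade(albedo, normals, light_dirs, noon, solid_angles)
color_sunset = diffuse_shade(albedo, normals, light_dirs, sunset, solid_angles)
```

Swapping `noon` for `sunset` changes which Gaussian is lit and with what tint, while `albedo` and `normals` are untouched. That separation is what baked-in colors cannot provide.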

GAN-Based Synthesis (CGS-GAN)

Rather than reconstructing from multi-view images, generative models can synthesize a full set of Gaussians from a latent vector $z$. A GAN is trained to produce plausible 3DGS representations of human heads.

GAN inversion: fit a real person's face by finding the latent $z^*$ whose generated Gaussian cloud best matches input images. Once inverted, edit the face by moving $z^*$ in latent space (change expression, age, identity).
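Inversion is an optimization over $z$ with the generator held fixed. The toy below assumes a linear "generator" and compares Gaussian attributes directly, so the gradient can be written by hand; a real pipeline would use a deep generator, render the Gaussians, and compare images. All names and sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
latent_dim, output_dim = 4, 12            # e.g. 4 Gaussians x 3 color channels

# Stand-in "generator": a fixed linear map. A real GAN generator is a deep network.
W = rng.normal(size=(output_dim, latent_dim))
b = rng.normal(size=output_dim)

def generate(z):
    return W @ z + b

# Target produced by a hidden latent, playing the role of the observed face.
z_true = rng.normal(size=latent_dim)
target = generate(z_true)

# Inversion: gradient descent on 0.5 * ||G(z) - target||^2 with respect to z.
z = np.zeros(latent_dim)
step = 1.0 / np.linalg.norm(W, 2) ** 2    # 1 / Lipschitz constant of the gradient
for _ in range(2000):
    residual = generate(z) - target
    z -= step * (W.T @ residual)
z_star = z

# Editing: move along a (hypothetical) semantic direction in latent space.
direction = np.eye(latent_dim)[0]
edited = generate(z_star + 0.5 * direction)
```

Because the edit happens in latent space, the result stays on the generator's output manifold: every edited latent still decodes to a complete set of Gaussians.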

Prompt-Based GS Editing

Given an existing 3DGS scene, a user specifies a text edit: "Turn the teddy bear into a golden bear." The editing pipeline:

Render images from multiple views.
Apply a text-driven image editing model (e.g., InstructPix2Pix) to produce edited images.
Fine-tune the 3DGS representation to match the edited images.
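The three steps can be mimicked end to end with a toy scene. Everything here is a stand-in: the "renderer" is a fixed linear blend of per-Gaussian colors, and `edit_image` is a hypothetical placeholder for a text-driven editor such as InstructPix2Pix (it simply pulls pixels toward gold). Only the control flow of render, edit, fine-tune is meant to carry over.

```python
import numpy as np

rng = np.random.default_rng(0)
num_gaussians, num_views, pixels_per_view = 5, 4, 8

# Toy renderer: each pixel is a fixed convex blend of per-Gaussian RGB colors.
blend = rng.uniform(size=(num_views, pixels_per_view, num_gaussians))
blend /= blend.sum(axis=-1, keepdims=True)

def render(colors):
    """Return images of shape (views, pixels, 3)."""
    return blend @ colors

def edit_image(image):
    """Hypothetical stand-in for a text-driven image editor: pull pixels toward gold."""
    gold = np.array([1.0, 0.84, 0.0])
    return 0.3 * image + 0.7 * gold

colors = rng.uniform(size=(num_gaussians, 3))     # the existing scene

# Step 1: render all views. Step 2: edit each rendered view.
edited_views = np.stack([edit_image(img) for img in render(colors)])
initial_loss = np.mean((render(colors) - edited_views) ** 2)

# Step 3: fine-tune the per-Gaussian colors to match the edited views.
lr = 1.0
for _ in range(2000):
    residual = render(colors) - edited_views                  # (V, P, 3)
    # Gradient of the squared error, up to a constant factor absorbed into lr.
    grad = np.einsum("vpg,vpc->gc", blend, residual) / (num_views * pixels_per_view)
    colors -= lr * grad

final_loss = np.mean((render(colors) - edited_views) ** 2)
```

In a real pipeline the 2D editor is not guaranteed to be consistent across views, so the fine-tuning step also has to reconcile conflicting edits. The toy editor above is view-consistent by construction and hides that difficulty.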

The explicit nature of 3DGS (individual movable Gaussians) also allows direct geometric edits: duplicating, deleting, or translating groups of Gaussians with 3D selection tools.
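Because a scene is just a set of arrays, these geometric edits reduce to masking and concatenation. The sketch below assumes a scene stored as parallel NumPy arrays and an axis-aligned selection box; `select_box` and the toy data are illustrative, and a real scene would carry scales, rotations, opacities, and spherical-harmonic coefficients in the same way.

```python
import numpy as np

# A scene as parallel arrays: one row per Gaussian.
means = np.array([[0.1, 0.1, 0.1],
                  [0.4, 0.5, 0.2],
                  [2.0, 2.0, 2.0],
                  [0.3, 0.2, 0.9],
                  [5.0, 0.0, 0.0]])
colors = np.linspace(0.0, 1.0, 15).reshape(5, 3)

def select_box(means, box_min, box_max):
    """Boolean mask of Gaussians whose centers lie inside an axis-aligned box."""
    return np.all((means >= box_min) & (means <= box_max), axis=1)

mask = select_box(means, np.zeros(3), np.ones(3))      # selects Gaussians 0, 1, 3

# Translate: shift the selected group along +z.
translated = means.copy()
translated[mask] += np.array([0.0, 0.0, 1.0])

# Delete: keep everything outside the selection.
kept_means, kept_colors = means[~mask], colors[~mask]

# Duplicate: append shifted copies of the selected group.
dup_means = np.concatenate([means, means[mask] + np.array([3.0, 0.0, 0.0])])
dup_colors = np.concatenate([colors, colors[mask]])
```

The same mask must be applied to every per-Gaussian attribute array so that rows stay aligned. An implicit representation has no analogous "row" to select.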

References

Huang et al. 2024 — 2D Gaussian Splatting for Geometrically Accurate Radiance Fields

Guedon & Lepetit 2024 — SuGaR: Surface-Aligned Gaussian Splatting for Efficient 3D Mesh Reconstruction

Wu et al. 2024 — 4D Gaussian Splatting for Real-Time Dynamic Scene Rendering

Kerbl et al. 2024 — A Hierarchical 3D Gaussian Representation for Real-Time Rendering of Very Large Datasets

Corona et al. 2026 — R3GW: Relightable 3D Gaussians for Outdoor Scenes in the Wild

Barthel et al. 2025 — CGS-GAN: 3D Consistent Gaussian Splatting GANs for High Resolution Human Head Synthesis

Wang et al. 2024 — GaussianEditor: Editing 3D Gaussians Delicately with Text Instructions
