3D Gaussian Splatting · End-to-End Scene Reconstruction

The .splat Binary Format: Opacity Sorting, SH Coefficients & Byte Layout

14 min read
By the end of this reading you will be able to:
  • Decode the 32-byte .splat format layout — position (12 B), scale (12 B), color+alpha (4 B), rotation (4 B) — and identify how each attribute is quantized from float32
  • Explain the normalization applied to the DC spherical harmonic coefficient for .splat uint8 color storage, including the SH₀ constant factor and the sigmoid-to-uint8 opacity encoding
  • Explain the opacity-weighted volume sort heuristic V_k·α_k and state why it produces a view-independent Gaussian ordering suitable for real-time rendering without per-frame sorting
  • Implement a field-aware PLY-to-.splat converter that detects whether a PLY file is a standard 3DGS PLY or a simple point cloud and dispatches to the correct conversion path

Two Formats, One Pipeline

A completed 3DGS training run produces a .ply file — a standard polygon mesh container storing every Gaussian's full attribute set. A web viewer such as antimatter15's splat.js consumes a .splat file — a compact binary format designed for streaming and real-time rendering in the browser. Converting between them requires understanding both formats precisely, and the conversion encodes several non-trivial mathematical decisions.

The 3DGS .ply Attribute Schema

A 3DGS .ply file stores one vertex per Gaussian with the following fields (62 float32 values in total):

Field group │ Field names                 │ Count │ Semantics
Position    │ x, y, z                     │ 3     │ World-space mean $\boldsymbol{\mu}$
Normals     │ nx, ny, nz                  │ 3     │ Unused in 3DGS; always zero
DC SH       │ f_dc_0, f_dc_1, f_dc_2      │ 3     │ Zeroth-order spherical harmonic coefficients for RGB
AC SH       │ f_rest_0, …, f_rest_44      │ 45    │ Higher-order view-dependent SH coefficients
Opacity     │ opacity                     │ 1     │ Pre-sigmoid opacity $o$ (raw logit)
Scale       │ scale_0, scale_1, scale_2   │ 3     │ Log-scale axes $\log s_k$
Rotation    │ rot_0, rot_1, rot_2, rot_3  │ 4     │ Unit quaternion $\mathbf{q}$ (raw, before normalization)

Total: 3+3+3+45+1+3+4 = 62 fields at float32 = 248 bytes per Gaussian.
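As a quick check on that arithmetic, the sketch below tallies the field groups from the table. The dictionary name `FIELD_GROUPS` is illustrative and not part of any library.

```python
# Field groups of a 3DGS PLY vertex, as listed in the attribute table.
FIELD_GROUPS = {
    "position": 3,
    "normals": 3,
    "dc_sh": 3,
    "ac_sh": 45,
    "opacity": 1,
    "scale": 3,
    "rotation": 4,
}

n_fields = sum(FIELD_GROUPS.values())
bytes_per_gaussian = n_fields * 4  # every field is stored as a float32

print(n_fields, bytes_per_gaussian)
```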

A simple point cloud .ply (e.g., from COLMAP or a LiDAR scanner) has only x, y, z, red, green, blue. When you attempt to access scale_0 on such a file, you get:

ValueError: no field of name scale_0

The correct diagnostic is to inspect the available fields before assuming a format:

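from plyfile import PlyData

plydata = PlyData.read(path)  # path: location of the .ply file to inspect
# List the per-vertex field names before assuming a schema.
print(plydata['vertex'].data.dtype.names)
# ('x', 'y', 'z', 'red', 'green', 'blue')  ← simple cloud
# ('x', 'y', 'z', 'nx', 'ny', 'nz', 'f_dc_0', ..., 'rot_3')  ← 3DGS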

The .splat Binary Format: 32 Bytes per Gaussian

The .splat format strips all data the browser renderer doesn't need and reencodes the remainder in a compact binary layout. Each Gaussian occupies exactly 32 bytes:

Bytes  0–11  │  Position     │  3 × float32   │  x, y, z (raw world-space)
Bytes 12–23  │  Scale        │  3 × float32   │  exp(scale_0), exp(scale_1), exp(scale_2)
Bytes 24–27  │  Color + α    │  4 × uint8     │  R, G, B (SH₀), α (sigmoid)
Bytes 28–31  │  Rotation     │  4 × uint8     │  q_w, q_x, q_y, q_z (normalized, encoded)

Total: 12 + 12 + 4 + 4 = 32 bytes vs. 248 bytes, a 7.75× size reduction before any compression. The viewer uses the stored scale values directly (they are already exp-activated), obtains opacity by dividing the stored byte by 255 (the sigmoid is already applied), and recovers rotation by reversing the uint8 encoding.
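One way to make the layout concrete is a NumPy structured dtype whose fields follow the byte ranges listed above. This is a minimal sketch: the names `SPLAT_DTYPE` and `read_splat_bytes` are hypothetical, and little-endian float32 storage is an assumption rather than something the layout itself states.

```python
import numpy as np

# Hypothetical record description; field order follows the 32-byte layout.
# Little-endian float32 ("<f4") is an assumption of this sketch.
SPLAT_DTYPE = np.dtype([
    ("position", "<f4", (3,)),  # bytes 0-11
    ("scale", "<f4", (3,)),     # bytes 12-23
    ("rgba", "u1", (4,)),       # bytes 24-27
    ("rotation", "u1", (4,)),   # bytes 28-31
])

def read_splat_bytes(buf: bytes) -> np.ndarray:
    """Interpret a raw byte buffer as an array of 32-byte Gaussian records."""
    if len(buf) % SPLAT_DTYPE.itemsize != 0:
        raise ValueError("buffer length is not a multiple of 32 bytes")
    return np.frombuffer(buf, dtype=SPLAT_DTYPE)

# Round-trip one hand-built record through its byte representation.
record = np.zeros(1, dtype=SPLAT_DTYPE)
record["position"] = [1.0, 2.0, 3.0]
record["scale"] = [0.1, 0.1, 0.1]
record["rgba"] = [128, 128, 128, 255]
record["rotation"] = [255, 128, 128, 128]
decoded = read_splat_bytes(record.tobytes())
print(SPLAT_DTYPE.itemsize, decoded["position"][0])
```

Because the dtype is packed (no alignment padding), its item size equals the sum of the field sizes, so a file of N Gaussians should be exactly 32·N bytes long.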

The Zeroth-Order Spherical Harmonic Coefficient

The DC color fields f_dc_0, f_dc_1, f_dc_2 are the coefficients of the zeroth-order real spherical harmonic — the constant function on the sphere:

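$$Y_0^0(\boldsymbol{\omega}) = \frac{1}{2\sqrt{\pi}} = SH_{C0} \approx 0.28209479177387814$$

This is derived from the normalization condition $\int_{S^2} |Y_0^0|^2 \, d\boldsymbol{\omega} = 1$: because $Y_0^0$ is constant and the sphere has total solid angle $4\pi$, the condition reads $4\pi \, (Y_0^0)^2 = 1$, which gives $Y_0^0 = 1/(2\sqrt{\pi})$.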

The full spherical harmonic color model in 3DGS is:

$$\mathbf{c}(\boldsymbol{\omega}) = \sum_{l=0}^{L} \sum_{m=-l}^{l} \mathbf{c}_{lm} Y_l^m(\boldsymbol{\omega})$$

where $L = 3$ for the standard 3DGS SH degree, which gives 16 real SH basis functions for $l = 0, \ldots, 3$.

The DC contribution — the view-independent base color — is:

$$\mathbf{c}_0(\boldsymbol{\omega}) = \mathbf{f}_{dc} \cdot SH_{C0}$$

The .splat viewer uses only this DC term (no view-dependent AC), so the color encoding is:

$$R = \text{clip}\!\left( (0.5 + SH_{C0} \cdot f_{dc,0}) \times 255, \, 0, 255 \right)$$

The 0.5 offset ensures that f_dc = 0 maps to mid-gray (127.5 before integer conversion, so 127 or 128 depending on whether the converter truncates or rounds), not black. The $SH_{C0}$ scale means that $f_{dc} \approx \pm 1.77$ (that is, $\pm 0.5 / SH_{C0} = \pm\sqrt{\pi}$) drives the channel to 0 or 255, which covers the realistic range of trained 3DGS coefficients.
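The color formula can be sketched in NumPy as follows. The function name `dc_to_uint8` is hypothetical, and the truncating `astype` conversion is an assumption of this sketch; a converter that rounds instead would differ by at most one level.

```python
import numpy as np

SH_C0 = 0.28209479177387814  # 1 / (2 * sqrt(pi))

def dc_to_uint8(f_dc):
    """Map DC SH coefficients to uint8 color channels (truncating conversion)."""
    channel = (0.5 + SH_C0 * np.asarray(f_dc, dtype=np.float64)) * 255.0
    return np.clip(channel, 0, 255).astype(np.uint8)

# A zero coefficient, one slightly past +sqrt(pi), one slightly past -sqrt(pi).
print(dc_to_uint8([0.0, 1.8, -1.8]))
```

With truncation, a zero coefficient lands on 127 rather than 128, and coefficients just beyond ±1.77 saturate at 255 and 0 respectively.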

Opacity Encoding

The opacity field stores a pre-sigmoid logit $o$. The activated opacity (probability of being opaque) is:

$$\alpha = \sigma(o) = \frac{1}{1 + e^{-o}}$$

In the .splat format, this is stored as a uint8:

$$\text{opacity\_byte} = \text{clip}(\alpha \times 255, 0, 255)$$

The browser decodes $\alpha = \text{opacity\_byte} / 255$, so no sigmoid is needed at render time.
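A minimal sketch of this encode/decode pair is shown below. The function names are hypothetical, and the truncating integer conversion is an assumption of the sketch.

```python
import numpy as np

def opacity_logit_to_uint8(o):
    """Apply the sigmoid to raw logits and quantize to a byte (truncating)."""
    alpha = 1.0 / (1.0 + np.exp(-np.asarray(o, dtype=np.float64)))
    return np.clip(alpha * 255.0, 0, 255).astype(np.uint8)

def uint8_to_alpha(b):
    """Decode a stored byte back to an opacity in [0, 1]."""
    return np.asarray(b, dtype=np.float64) / 255.0

logits = np.array([0.0, 10.0, -10.0])
stored = opacity_logit_to_uint8(logits)
print(stored, uint8_to_alpha(stored))
```

The round trip loses at most one quantization level, that is, less than 1/255 in opacity.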

Scale Encoding

The scale_k fields store log-scale values — the log of each Gaussian's half-axis length. In the .splat format:

$$\text{scale\_float}_k = \exp(\texttt{scale\_k})$$

This stores the activated (positive) scale directly as a float32. The Gaussian's 3D covariance is reconstructed in the browser as:

$$\Sigma = R S S^\top R^\top, \quad S = \text{diag}(s_0, s_1, s_2)$$

where $R$ is the rotation matrix from the quaternion and $s_k = \text{scale\_float}_k$.
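The covariance reconstruction can be sketched in NumPy as below. The helper names `quat_to_rotmat` and `covariance` are hypothetical, and the (w, x, y, z) component order is assumed from the byte layout given earlier; an actual viewer would do the equivalent in JavaScript or a shader.

```python
import numpy as np

def quat_to_rotmat(q):
    """Rotation matrix of a quaternion given as (w, x, y, z); normalizes first."""
    w, x, y, z = np.asarray(q, dtype=np.float64) / np.linalg.norm(q)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])

def covariance(q, scale):
    """Sigma = R S S^T R^T with S = diag(scale), scale already exp-activated."""
    M = quat_to_rotmat(q) @ np.diag(np.asarray(scale, dtype=np.float64))
    return M @ M.T

# Identity rotation: the covariance is diagonal with the squared scales.
print(covariance([1.0, 0.0, 0.0, 0.0], [1.0, 2.0, 3.0]))
```

For any rotation, the eigenvalues of the result are the squared scales, which is a convenient sanity check on a decoder.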

Quaternion Encoding

The rotation quaternion $\mathbf{q} = (q_0, q_1, q_2, q_3)$ is stored in the .ply as raw floats, which are not guaranteed to have unit norm after training. The .splat encoder normalizes and then maps to uint8:

import numpy as np

rot_normalized = q / np.linalg.norm(q)            # unit quaternion
# map [-1, 1] → [0, 256], then clip to the uint8 range [0, 255]
rot_bytes = (rot_normalized * 128 + 128).clip(0, 255).astype(np.uint8)

Decoding inverts this: $\mathbf{q} = (\text{byte} - 128) / 128$, followed by renormalization. The quantization step is $1/128 \approx 0.0078$, so the error is at most that much per component before renormalization, small enough that orientation artifacts are imperceptible for typical Gaussians.
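To see the size of the quantization error on a concrete quaternion, the following sketch runs one encode/decode round trip. The function names `encode_quat` and `decode_quat` are hypothetical, and the sample quaternion is arbitrary.

```python
import numpy as np

def encode_quat(q):
    """Normalize, then map each component from [-1, 1] to a byte."""
    q = np.asarray(q, dtype=np.float64)
    q = q / np.linalg.norm(q)
    return np.clip(q * 128 + 128, 0, 255).astype(np.uint8)

def decode_quat(b):
    """Invert the byte mapping and renormalize to unit length."""
    q = (np.asarray(b, dtype=np.float64) - 128.0) / 128.0
    return q / np.linalg.norm(q)

q = np.array([0.9, 0.1, -0.3, 0.2])       # arbitrary, not unit norm
q_unit = q / np.linalg.norm(q)
q_back = decode_quat(encode_quat(q))
max_err = np.abs(q_back - q_unit).max()
print(encode_quat(q), max_err)
```

Note that a component of exactly +1 maps to 256 and is clipped to 255; renormalization on decode absorbs that, so the identity quaternion survives the round trip.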

The Opacity-Weighted Volume Sort

The .splat format requires pre-sorted Gaussians. Web viewer alpha compositing traverses the array in order, blending each Gaussian's color with the accumulated transmittance:

$$C = \sum_k \mathbf{c}_k \alpha_k \prod_{j < k} (1 - \alpha_j)$$

For depth-correct rendering, Gaussians should be sorted front-to-back or back-to-front relative to the viewer. However, a static .splat file has no per-frame depth information. Instead, the converter uses a view-independent importance heuristic:

$$\text{sort key}_k = \frac{\exp(s_{0,k} + s_{1,k} + s_{2,k})}{1 + \exp(-o_k)} = V_k \cdot \alpha_k$$

where $s_{i,k}$ is the raw log-scale field scale_i of Gaussian $k$, $o_k$ is its opacity logit, and:

  • $V_k = \exp(s_{0,k} + s_{1,k} + s_{2,k}) = e^{s_{0,k}} \cdot e^{s_{1,k}} \cdot e^{s_{2,k}}$ is the product of the three activated scales, which is proportional to the volume of the Gaussian ellipsoid
  • $\alpha_k = \sigma(o_k)$ is the activated opacity

Gaussians are sorted descending by this key — large, opaque Gaussians first. The rationale: large opaque Gaussians dominate the visual contribution regardless of viewpoint. Rendering them early saturates the transmittance quickly, correctly obscuring smaller Gaussians behind them. Small or nearly transparent Gaussians contribute little regardless of order.

This is an approximation: it can produce artifacts at oblique views where a small-but-foreground Gaussian is obscured by a large-but-background one. Depth-sorted rendering (computed per frame on the GPU) would be exact but is too expensive for real-time browser use. The sort heuristic works well in practice for scenes with normal viewing distance distributions.
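The sort key can be computed directly from the raw PLY fields. The sketch below assumes the log-scales arrive as an (N, 3) array and the opacity logits as an (N,) array; the function name `importance_order` is hypothetical.

```python
import numpy as np

def importance_order(log_scales, opacity_logits):
    """Indices that sort Gaussians by V_k * alpha_k, largest first."""
    log_scales = np.asarray(log_scales, dtype=np.float64)          # (N, 3)
    opacity_logits = np.asarray(opacity_logits, dtype=np.float64)  # (N,)
    volume = np.exp(log_scales.sum(axis=1))          # product of activated scales
    alpha = 1.0 / (1.0 + np.exp(-opacity_logits))    # sigmoid
    return np.argsort(-(volume * alpha))

# Three toy Gaussians: medium and half-opaque, tiny and opaque, large and faint.
log_scales = np.log([[1.0, 1.0, 1.0], [0.1, 0.1, 0.1], [2.0, 2.0, 2.0]])
opacity_logits = np.array([0.0, 5.0, -5.0])
order = importance_order(log_scales, opacity_logits)
print(order)
```

In this toy case the medium, half-opaque Gaussian comes first, the large but faint one second, and the tiny opaque one last, which shows that neither size nor opacity alone decides the order.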

A Robust Field-Aware Converter

Production converters must handle both simple and 3DGS PLY files:

def detect_ply_type(plydata):
    fields = set(plydata['vertex'].data.dtype.names)
    required_3dgs = {'scale_0', 'scale_1', 'scale_2', 'opacity',
                     'f_dc_0', 'f_dc_1', 'f_dc_2',
                     'rot_0', 'rot_1', 'rot_2', 'rot_3'}
    if required_3dgs.issubset(fields):
        return '3dgs'
    elif {'red', 'green', 'blue'}.issubset(fields):
        return 'simple'
    else:
        raise ValueError(f"Unrecognized PLY schema: {fields}")

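def convert(plydata, ply_type):
    # Dispatch on the schema label returned by detect_ply_type.
    if ply_type == '3dgs':
        return process_ply_to_splat(plydata)     # full encoder
    elif ply_type == 'simple':
        return process_simple_ply_to_splat(plydata)  # fixed-scale fallback
    else:
        # Fail explicitly rather than silently returning None.
        raise ValueError(f"Unknown PLY type: {ply_type!r}")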

The simple fallback assigns unit scale and identity rotation — representing each point as a small spherical Gaussian — which is geometrically approximate but sufficient for point-cloud preview. This pattern — inspect schema, dispatch to type-specific handler, fail explicitly on unknown schema — will recur in Module 8 as a foundational enterprise application pattern.
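A sketch of what such a fallback might look like is given below. It works on plain arrays rather than a PlyData object to stay self-contained. The function name `simple_points_to_splat`, the fully opaque alpha of 255, and the little-endian float32 packing are all assumptions of this sketch, not details confirmed by the text.

```python
import numpy as np

def simple_points_to_splat(xyz, rgb, scale=1.0):
    """Pack N colored points as isotropic Gaussians in 32-byte records."""
    xyz = np.asarray(xyz, dtype="<f4")   # (N, 3) positions
    rgb = np.asarray(rgb, dtype=np.uint8)  # (N, 3) colors in 0..255
    scales = np.full(3, scale, dtype="<f4")  # same scale on all three axes
    # Identity rotation (w, x, y, z) = (1, 0, 0, 0) in the uint8 encoding.
    identity_rot = np.array([255, 128, 128, 128], dtype=np.uint8)
    out = bytearray()
    for i in range(xyz.shape[0]):
        out += xyz[i].tobytes()                  # bytes 0-11
        out += scales.tobytes()                  # bytes 12-23
        out += rgb[i].tobytes() + bytes([255])   # bytes 24-27, assumed opaque
        out += identity_rot.tobytes()            # bytes 28-31
    return bytes(out)

blob = simple_points_to_splat(
    xyz=[[0.0, 0.0, 0.0], [1.0, 2.0, 3.0]],
    rgb=[[255, 0, 0], [0, 255, 0]],
)
print(len(blob))
```

The per-point loop keeps the byte layout easy to follow; for large clouds a vectorized version built on a structured array would be the more practical choice.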