3D Gaussian Splatting

An explicit point-based representation that renders novel views in real time using differentiable Gaussian primitives.

What Is 3D Gaussian Splatting?

3D Gaussian Splatting (3DGS), introduced by Kerbl et al. at SIGGRAPH 2023, is a technique for representing and rendering 3D scenes as collections of millions of semi-transparent 3D Gaussian ellipsoids. Like Neural Radiance Fields, it takes photographs of a scene and produces a representation capable of rendering photorealistic novel views. Unlike NeRF, it achieves real-time rendering (30 to 200+ FPS at 1080p) by replacing neural network evaluation with a fast, GPU-friendly rasterization-style pipeline.

Each "Gaussian" in the scene is a small, oriented 3D blob defined by a position (where it is in space), a covariance matrix (its shape and orientation — from tiny spheres to stretched ellipsoids), a color (represented as spherical harmonics for view-dependent appearance), and an opacity. Hundreds of thousands to millions of these Gaussians collectively represent surfaces, textures, and fine details throughout the scene.

3DGS has rapidly become one of the most actively researched topics in computer graphics and vision because it combines NeRF-quality visual results with the real-time performance that interactive applications demand.

How It Works

The 3DGS pipeline has three stages: initialization, optimization, and rendering.

Initialization

A sparse point cloud is generated from the input photographs using structure-from-motion (typically COLMAP). Each point becomes the center of an initial 3D Gaussian with small, isotropic (spherical) covariance, uniform color, and moderate opacity. This gives the optimizer a reasonable geometric starting point.
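
A rough sketch of this step, assuming NumPy and a brute-force nearest-neighbor search (real implementations use a k-d tree, and the exact heuristics vary by codebase):

```python
import numpy as np

def init_gaussians(points: np.ndarray, colors: np.ndarray):
    """points: (N, 3) SfM positions; colors: (N, 3) RGB in [0, 1]."""
    n = len(points)
    # Nearest-neighbor spacing sets an isotropic initial scale
    # (brute force here; use a k-d tree for large point clouds).
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)
    nn_dist = np.sqrt(d2.min(axis=1))
    scales = np.repeat(nn_dist[:, None], 3, axis=1)      # spherical blobs
    rotations = np.tile([1.0, 0.0, 0.0, 0.0], (n, 1))    # identity quaternions
    sh = np.zeros((n, 16, 3))                            # degree-3 SH, RGB
    sh[:, 0, :] = (colors - 0.5) / 0.28209479177387814   # DC term from color
    opacities = np.full(n, 0.1)                          # moderate opacity
    return points, scales, rotations, sh, opacities
```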

Optimization

The Gaussians' parameters — position, covariance (shape and orientation), color (spherical harmonic coefficients), and opacity — are optimized through gradient-based training, similar in spirit to neural network training:

  1. Differentiable rendering — The Gaussians are projected onto a training camera's image plane. Each 3D Gaussian projects to a 2D Gaussian (an ellipse) on screen. These 2D ellipses are composited front-to-back using alpha blending — this is the "splatting," where each Gaussian splats its color contribution onto the image.

  2. Loss computation — The rendered image is compared to the real training photograph using a weighted combination of an L1 loss and a structural similarity (SSIM) term (a sketch of this loss follows the list).

  3. Backpropagation — Gradients flow back through the differentiable renderer to update each Gaussian's parameters. Over thousands of iterations, the Gaussians reshape, recolor, and reposition themselves to match the training views.

  4. Adaptive density control — Periodically during training, the optimizer adds new Gaussians in under-reconstructed regions (where rendering error is high), splits large Gaussians into smaller ones for finer detail, and removes nearly-transparent Gaussians that contribute little. This adaptive process is key to achieving both efficiency and quality.
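
The loss in step 2 is, in the original paper, L = (1 − λ)·L1 + λ·L_D-SSIM with λ = 0.2. Below is a self-contained PyTorch sketch of that loss, using our own simplified SSIM with an 11×11 Gaussian window rather than the reference code:

```python
import torch
import torch.nn.functional as F

def _gaussian_window(size: int = 11, sigma: float = 1.5) -> torch.Tensor:
    # Separable 2D Gaussian window, the standard choice for SSIM.
    coords = torch.arange(size, dtype=torch.float32) - size // 2
    g = torch.exp(-(coords ** 2) / (2 * sigma ** 2))
    g = g / g.sum()
    return g[:, None] * g[None, :]

def ssim(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Mean SSIM between two (C, H, W) images with values in [0, 1]."""
    c1, c2 = 0.01 ** 2, 0.03 ** 2
    channels = x.shape[0]
    w = _gaussian_window()[None, None].repeat(channels, 1, 1, 1)
    x, y = x[None], y[None]  # add batch dimension for conv2d
    mu_x = F.conv2d(x, w, padding=5, groups=channels)
    mu_y = F.conv2d(y, w, padding=5, groups=channels)
    var_x = F.conv2d(x * x, w, padding=5, groups=channels) - mu_x ** 2
    var_y = F.conv2d(y * y, w, padding=5, groups=channels) - mu_y ** 2
    cov = F.conv2d(x * y, w, padding=5, groups=channels) - mu_x * mu_y
    num = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return (num / den).mean()

def training_loss(rendered: torch.Tensor, gt: torch.Tensor, lam: float = 0.2):
    # L = (1 - lam) * L1 + lam * (1 - SSIM), with lam = 0.2 in the paper.
    l1 = (rendered - gt).abs().mean()
    return (1 - lam) * l1 + lam * (1 - ssim(rendered, gt))
```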

Rendering

Once optimized, rendering a novel view is straightforward and fast (a per-pixel compositing sketch follows the steps below):

  1. Sort Gaussians by depth relative to the camera (tile-based sorting on GPU)
  2. Project each 3D Gaussian to a 2D screen-space ellipse
  3. For each pixel, alpha-composite the overlapping Gaussian contributions front-to-back
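
As an illustration of step 3, here is a minimal per-pixel compositing sketch in pure Python/NumPy. A real implementation performs this per 16×16 tile inside a CUDA kernel, but the math and the early-termination trick are the same:

```python
# Front-to-back alpha compositing for one pixel. `contribs` is a depth-sorted
# list of (color, alpha) pairs, where alpha is the Gaussian's opacity
# modulated by its projected 2D falloff at this pixel.
import numpy as np

def composite_pixel(contribs, t_min: float = 1e-4) -> np.ndarray:
    """C = sum_i c_i * a_i * prod_{j<i} (1 - a_j), nearest Gaussian first."""
    color = np.zeros(3)
    transmittance = 1.0
    for c, a in contribs:
        color += transmittance * a * np.asarray(c, dtype=float)
        transmittance *= (1.0 - a)
        if transmittance < t_min:   # early termination: pixel is opaque
            break
    return color

# Example: a strong red blob in front of a faint blue one.
print(composite_pixel([((1, 0, 0), 0.8), ((0, 0, 1), 0.5)]))
# -> [0.8, 0.0, 0.1]
```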

This pipeline is implemented entirely on the GPU using custom CUDA kernels (or Vulkan compute shaders in some implementations), achieving frame rates of 30 to 200+ FPS depending on scene complexity and resolution.

Strengths

The most compelling advantage of 3DGS is real-time rendering quality on par with state-of-the-art NeRF methods. The SIGGRAPH 2023 paper demonstrated equivalent or superior visual quality to Mip-NeRF 360 while rendering over 100 times faster. This combination of quality and speed opens applications that NeRF cannot serve: real-time scene exploration, VR/AR experiences, game asset creation, and interactive digital twins.

The explicit point-based representation is also more editable and inspectable than a neural network. Gaussians can be directly manipulated — moved, deleted, recolored — enabling scene editing workflows. The representation can be exported, streamed, and compressed, making it more practical for deployment than a black-box MLP.

Training is also relatively fast (typically 5 to 30 minutes on a modern GPU), compared to hours for the original NeRF, though Instant-NGP has closed this gap for NeRFs.

Tradeoffs

3DGS scenes can have large memory footprints. A well-optimized scene may contain 1 to 5 million Gaussians, each with dozens of parameters (position, the scale and rotation that encode its covariance, spherical harmonic coefficients, opacity). Uncompressed, this can reach hundreds of megabytes to gigabytes. Compression and level-of-detail techniques are active research areas addressing this limitation.
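
A back-of-the-envelope calculation, assuming the common parameterization (3 position + 3 scale + 4 rotation quaternion + 48 degree-3 SH coefficients + 1 opacity = 59 float32 values per Gaussian):

```python
# Rough uncompressed footprint under the parameterization assumed above.
params_per_gaussian = 3 + 3 + 4 + 48 + 1           # = 59 float32 values
bytes_per_gaussian = params_per_gaussian * 4        # float32 -> 236 bytes
for n_million in (1, 3, 5):
    mb = n_million * 1_000_000 * bytes_per_gaussian / 2**20
    print(f"{n_million}M Gaussians ~ {mb:.0f} MB")
# 1M ~ 225 MB, 3M ~ 675 MB, 5M ~ 1125 MB
```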

Like NeRF, the quality depends heavily on input data: camera coverage, image quality, and accurate pose estimation. Textureless regions, reflective surfaces, and thin structures can be challenging to reconstruct faithfully.

The technique is also relatively new (2023), so the ecosystem of tools, best practices, and production workflows is still maturing. Integration with traditional 3D pipelines — game engines, DCC tools, web viewers — is improving rapidly but not yet seamless.

Rendering quality can suffer in regions far from any training viewpoint — the Gaussians optimize to reproduce training views and may not extrapolate well to very different camera positions or wide baselines. The discrete nature of the representation can also produce visible artifacts at extreme close-ups where individual Gaussians become distinguishable.

History

3D Gaussian Splatting was introduced by Kerbl et al. at SIGGRAPH 2023, immediately attracting massive attention for its combination of quality and speed. The paper built on decades of earlier work in point-based graphics — Zwicker et al.'s surface splatting (2001), Botsch et al.'s point-based rendering survey (2005), and Yifan et al.'s differentiable surface splatting (2019). Within months of publication, dozens of follow-up works extended the technique to dynamic scenes, text-to-3D generation, SLAM, autonomous driving, avatar creation, and more. The approach revived interest in explicit point-based representations that had been overshadowed by NeRF's implicit neural approach, and the two paradigms are now actively cross-pollinating: hybrid methods combine Gaussian primitives with neural features for the best of both worlds.
