The text version of my explanatory stream (Chinese with English CC) on gaussian splatting https://youtube.com/live/1buFrKUaqwM
- Introduction
- Forward pass
- placeholder
- Backward pass
- placeholder
This guide aims to decipher the formulae in the rasterization process (forward and backward). It focuses only on these two parts, and I want to provide as many details as possible, since this is the core of the algorithm. I will paste related code from the original repo to help you see where to look.
If you see sections starting with 💡, it's something I think is important to understand.
Before continuing, please read the original paper to get the big picture of how the gaussian splatting algorithm works. Also note that the full algorithm has other important parts, such as point densification and pruning, which won't be covered in this article since I think those parts are relatively easy to understand.
The forward pass consists of two parts:
- Compute the attributes of each gaussian
- Compute the color of each pixel
Each gaussian holds the following raw attributes:
# https://github.com/graphdeco-inria/gaussian-splatting/blob/main/scene/gaussian_model.py#L47-L52
self._xyz = torch.empty(0) # world coordinate
self._features_dc = torch.empty(0) # diffuse color
self._features_rest = torch.empty(0) # spherical harmonic coefficients
self._scaling = torch.empty(0) # 3d scale
self._rotation = torch.empty(0) # rotation expressed in quaternions
self._opacity = torch.empty(0) # opacity
# they are initialized as empty tensors then assigned with values on
# https://github.com/graphdeco-inria/gaussian-splatting/blob/main/scene/gaussian_model.py#L215
To project the gaussian onto a 2D image, we must go through some more computations to transform the attributes to 2D:
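First, from scaling and rotation, we can compute the 3D covariance from the formula $\Sigma = R S S^T R^T$, where $R$ is the rotation matrix built from the quaternion $(r, x, y, z)$: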
// https://github.com/graphdeco-inria/diff-gaussian-rasterization/blob/main/cuda_rasterizer/forward.cu#L134-L138
glm::mat3 R = glm::mat3(
1.f - 2.f * (y * y + z * z), 2.f * (x * y - r * z), 2.f * (x * z + r * y),
2.f * (x * y + r * z), 1.f - 2.f * (x * x + z * z), 2.f * (y * z - r * x),
2.f * (x * z - r * y), 2.f * (y * z + r * x), 1.f - 2.f * (x * x + y * y)
);
and
// https://github.com/graphdeco-inria/diff-gaussian-rasterization/blob/main/cuda_rasterizer/forward.cu#L121-L124
glm::mat3 S = glm::mat3(1.0f); // S is a diagonal matrix
S[0][0] = mod * scale.x;
S[1][1] = mod * scale.y;
S[2][2] = mod * scale.z;
Note that S is multiplied by a scale factor mod, which is kept at 1.0 during training. At inference time, this value (scaling_modifier) can be modified in
# https://github.com/graphdeco-inria/gaussian-splatting/blob/main/gaussian_renderer/__init__.py#L18
def render(..., scaling_modifier = 1.0, ...):
to control the scale of the gaussians. In their demo they showed how it looks by setting this number to something <1 (shrinking the size). Theoretically this value can also be set >1 to increase the size.
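To make the formula concrete, here is a minimal pure-Python sketch of $\Sigma = R S S^T R^T$. It is not code from the repo: the function names are made up for illustration, matrices are plain row-major nested lists, and the quaternion is normalized inside the helper as an assumption so that R is always a valid rotation.

```python
import math

def quat_to_rotmat(q):
    """Rotation matrix (row-major nested lists) from a quaternion (r, x, y, z)."""
    r, x, y, z = q
    # Assumption of this sketch: normalize here so R is a proper rotation.
    n = math.sqrt(r * r + x * x + y * y + z * z)
    r, x, y, z = r / n, x / n, y / n, z / n
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - r * z), 2 * (x * z + r * y)],
        [2 * (x * y + r * z), 1 - 2 * (x * x + z * z), 2 * (y * z - r * x)],
        [2 * (x * z - r * y), 2 * (y * z + r * x), 1 - 2 * (x * x + y * y)],
    ]

def covariance_3d(scale, quat, mod=1.0):
    """Sigma = R S S^T R^T, with S = diag(mod * scale)."""
    R = quat_to_rotmat(quat)
    s = [mod * v for v in scale]
    # M = R S: column j of R is scaled by s[j].
    M = [[R[i][j] * s[j] for j in range(3)] for i in range(3)]
    # Sigma = M M^T, symmetric by construction.
    return [[sum(M[i][k] * M[j][k] for k in range(3)) for j in range(3)]
            for i in range(3)]

# Identity rotation: the covariance is just the squared scales on the diagonal.
print(covariance_3d((1.0, 2.0, 3.0), (1.0, 0.0, 0.0, 0.0)))
# Shrinking every gaussian, as the scaling modifier does at inference.
print(covariance_3d((1.0, 2.0, 3.0), (1.0, 0.0, 0.0, 0.0), mod=0.5))
```

Because the scale factor enters through S, it appears squared in the covariance: halving mod divides every entry of $\Sigma$ by 4.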
💡 quote from the paper 💡
An obvious approach would be to directly optimize the covariance matrix Σ to obtain 3D Gaussians that represent the radiance field. However, covariance matrices have physical meaning only when they are positive semi-definite. For our optimization of all our parameters, we use gradient descent that cannot be easily constrained to produce such valid matrices, and update steps and gradients can very easily create invalid covariance matrices.
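The design of optimizing the 3D covariance by decomposing it into R and S separately is not a random choice. It is a trick we call "reparametrization". By expressing it as $\Sigma = R S S^T R^T = (RS)(RS)^T$, it is guaranteed to always be positive semi-definite: for any vector $x$, $x^T \Sigma x = \|(RS)^T x\|^2 \ge 0$.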
Next, we need to get 3 things: radius, uv and cov (the 2D covariance, or equivalently its inverse, conic), which are the 2D attributes of a gaussian projected onto an image.
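We can get cov by $\Sigma' = J W \Sigma W^T J^T$, where $W$ is the viewing transformation and $J$ is the Jacobian of the affine approximation of the projective transformation: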
// https://github.com/graphdeco-inria/diff-gaussian-rasterization/blob/main/cuda_rasterizer/forward.cu#L99-L106
glm::mat3 T = W * J;
glm::mat3 Vrk = glm::mat3(
cov3D[0], cov3D[1], cov3D[2],
cov3D[1], cov3D[3], cov3D[4],
cov3D[2], cov3D[4], cov3D[5]);
glm::mat3 cov = glm::transpose(T) * glm::transpose(Vrk) * T;
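The projection $J W \Sigma W^T J^T$ can be sketched in pure Python as follows. This is an illustration under stated assumptions, not the repo's code: matrices are row-major nested lists in the usual math convention (glm stores matrices column-major, which is why the CUDA code looks transposed), t is the gaussian center in camera coordinates, J is the Jacobian of a pinhole projection with focal lengths focal_x and focal_y, and any clamping or numerical safeguards the CUDA code applies are omitted. All function names are hypothetical.

```python
def matmul(A, B):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def project_covariance(cov3d, W, t, focal_x, focal_y):
    """2x2 image-space covariance J W Sigma W^T J^T.

    cov3d: 3x3 world-space covariance
    W:     3x3 rotation part of the world-to-camera transform
    t:     gaussian center in camera coordinates (tx, ty, tz), tz > 0
    """
    tx, ty, tz = t
    # Jacobian of (x, y, z) -> (focal_x * x / z, focal_y * y / z) at t.
    J = [
        [focal_x / tz, 0.0, -focal_x * tx / (tz * tz)],
        [0.0, focal_y / tz, -focal_y * ty / (tz * tz)],
    ]
    T = matmul(J, W)  # 2x3
    return matmul(matmul(T, cov3d), transpose(T))

identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
sigma = [[1.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 9.0]]
# A gaussian on the optical axis: only the x and y variances survive.
print(project_covariance(sigma, identity, (0.0, 0.0, 2.0), 2.0, 2.0))
# Off-axis: the depth variance leaks into the horizontal variance.
print(project_covariance(sigma, identity, (1.0, 0.0, 1.0), 1.0, 1.0))
```

Using a 2x3 Jacobian directly yields the 2x2 matrix; the CUDA code keeps 3x3 matrices and reads off the top-left 2x2 block instead.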
Let's put $\text{cov} = \begin{pmatrix} a & b \\ b & c \end{pmatrix}$ (remember the 2D and 3D covariance matrices are symmetric) for the calculations that follow.
Its inverse, conic (honestly I don't know why they've chosen such a bad variable name; calling it cov_inv would've been 100x better), can be expressed as $\frac{1}{ac - b^2} \begin{pmatrix} c & -b \\ -b & a \end{pmatrix}$ (actually it's a very useful thing to remember: to invert a 2x2 matrix, you swap the diagonal entries, put negative signs on the off-diagonal entries and finally put a 1/det in front of everything).
// https://github.com/graphdeco-inria/diff-gaussian-rasterization/blob/main/cuda_rasterizer/forward.cu#L219
float det = (cov.x * cov.z - cov.y * cov.y);
// https://github.com/graphdeco-inria/diff-gaussian-rasterization/blob/main/cuda_rasterizer/forward.cu#L222-L223
float det_inv = 1.f / det;
float3 conic = { cov.z * det_inv, -cov.y * det_inv, cov.x * det_inv }; // since the covariance matrix is symmetric, we only need to save the upper triangle
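The same inversion in a few lines of Python, as a sketch: the symmetric matrix is stored as the triple (a, b, c), mirroring how the CUDA code keeps only three numbers. The function name and the choice to raise on a singular matrix are mine.

```python
def invert_sym_2x2(a, b, c):
    """Inverse of [[a, b], [b, c]], returned as (a', b', c') of [[a', b'], [b', c']]."""
    det = a * c - b * b
    if det == 0.0:
        # Choice of this sketch: fail loudly on a singular matrix.
        raise ZeroDivisionError("matrix is singular")
    det_inv = 1.0 / det
    # Swap the diagonal, negate the off-diagonal, divide everything by det.
    return (c * det_inv, -b * det_inv, a * det_inv)

a, b, c = 2.0, 1.0, 3.0
ia, ib, ic = invert_sym_2x2(a, b, c)
# Multiplying back should give (approximately) the identity matrix.
print(a * ia + b * ib, a * ib + b * ic)
print(b * ia + c * ib, b * ib + c * ic)
```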
💡 A small trick to ensure the numerical stability of the inverse of cov 💡
// https://github.com/graphdeco-inria/diff-gaussian-rasterization/blob/main/cuda_rasterizer/forward.cu#L110-L111
cov[0][0] += 0.3f;
cov[1][1] += 0.3f;
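By construction, cov is only positive semi-definite (recall that it's in the form $A A^T$), which is not enough to guarantee that it is invertible. Here we add 0.3 to the diagonal to make it invertible. Why is this true? Let's put $M$ for cov before the addition. For any nonzero vector $x$, $x^T (M + 0.3 I) x = x^T M x + 0.3 \|x\|^2 > 0$, so $M + 0.3 I$ is positive definite and therefore invertible.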
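A quick numerical check of the 0.3 trick, as a sketch with made-up function names: a rank-deficient positive semi-definite matrix has determinant 0, but after adding 0.3 to the diagonal its determinant is at least 0.3 * 0.3 = 0.09, because both eigenvalues are shifted up by 0.3.

```python
def det_sym_2x2(a, b, c):
    """Determinant of the symmetric matrix [[a, b], [b, c]]."""
    return a * c - b * b

def add_to_diagonal(a, b, c, eps=0.3):
    """Return the entries of [[a, b], [b, c]] + eps * I."""
    return (a + eps, b, c + eps)

# [[1, 1], [1, 1]] is positive semi-definite but singular.
singular = (1.0, 1.0, 1.0)
print(det_sym_2x2(*singular))                   # 0.0
print(det_sym_2x2(*add_to_diagonal(*singular)))  # strictly positive
```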
Having cov in hand, we can now proceed to compute the radius of a gaussian.
Theoretically, when projecting an ellipsoid onto an image, you get an ellipse, not a circle. However, storing the attributes of an ellipse is much more complicated: you need to store the center, the long and short axis lengths and the orientation; whereas for a circle, you only need its center and the radius. Therefore, the authors choose to approximate the projection with a circle circumscribing the ellipse (see the following figure). This is what the radius attribute represents.
How do we get the radius from cov? Let's make an analogy with the 1-dimensional case.
Imagine we have a 1D gaussian like the following:
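How can we define the "radius" of such a gaussian? Intuitively, it is some value $r$ such that cropping the graph to the interval from $-r$ to $r$ around the center still covers most of it. A common choice is $r = 3\sigma$, three standard deviations, which covers about 99.7% of the gaussian.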
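To put a number on "most of the graph" for a 1D gaussian: the fraction of its mass within k standard deviations of the mean is erf(k / sqrt(2)). The helper name below is made up for illustration; math.erf is in the Python standard library.

```python
import math

def mass_within(k):
    """Fraction of a 1D gaussian lying within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2.0))

for k in (1, 2, 3):
    print(k, mass_within(k))
# For k = 3 this gives about 0.9973, which is why 3 standard deviations
# is a reasonable cut-off for the "radius".
```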
Fortunately, the analogy applies to any dimension; just be aware that the "radius" is different along each axis (remember there are two axes in an ellipse).
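We said $r = 3\sigma = 3\sqrt{\sigma^2}$ in 1D. In 2D, the variances along the two axes of the ellipse are the eigenvalues $\lambda_1, \lambda_2$ of cov, so the radii along the two axes are $3\sqrt{\lambda_1}$ and $3\sqrt{\lambda_2}$.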
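I could've given you the answer directly, but out of personal preference (I ❤️ linear-algebra), I want to detail it more. First of all, for a square matrix $A$, a scalar $\lambda$ is an eigenvalue of $A$ if $Ax = \lambda x$ for some nonzero vector $x$. In general, to calculate all eigenvalues of $A$, we solve $\det(A - \lambda I) = 0$. For the cov matrix $\begin{pmatrix} a & b \\ b & c \end{pmatrix}$ we have above, this equation can be expressed as $(a - \lambda)(c - \lambda) - b^2 = 0$, i.e. $\lambda^2 - (a + c)\lambda + (ac - b^2) = 0$.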
The solutions (eigenvalues) are lambda1 and lambda2 in the following code:
// https://github.com/graphdeco-inria/diff-gaussian-rasterization/blob/main/cuda_rasterizer/forward.cu#L219
float det = (cov.x * cov.z - cov.y * cov.y); // this is a*c - b*b in our expression
...
// https://github.com/graphdeco-inria/diff-gaussian-rasterization/blob/main/cuda_rasterizer/forward.cu#L229-L231
float mid = 0.5f * (cov.x + cov.z);
float lambda1 = mid + sqrt(max(0.1f, mid * mid - det)); // I'm not too sure what 0.1 serves here
float lambda2 = mid - sqrt(max(0.1f, mid * mid - det));
Then we finally get radius as 3 times the square root of the bigger eigenvalue:
// https://github.com/graphdeco-inria/diff-gaussian-rasterization/blob/main/cuda_rasterizer/forward.cu#L232
float my_radius = ceil(3.f * sqrt(max(lambda1, lambda2))); // ceil() to make it at least 1 because we operate in pixel space
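Putting the eigenvalue formula and the radius together, here is a small Python sketch that mirrors the CUDA lines above, including the max(0.1, ...) clamp. The function name is made up; (a, b, c) are the entries of the symmetric cov matrix [[a, b], [b, c]].

```python
import math

def radius_from_cov(a, b, c):
    """Return (radius, lambda1, lambda2) for the 2D covariance [[a, b], [b, c]]."""
    det = a * c - b * b
    mid = 0.5 * (a + c)
    # Roots of lambda^2 - (a + c) * lambda + det = 0, with the same clamp as the CUDA code.
    half_gap = math.sqrt(max(0.1, mid * mid - det))
    lambda1 = mid + half_gap
    lambda2 = mid - half_gap
    radius = math.ceil(3.0 * math.sqrt(max(lambda1, lambda2)))
    return radius, lambda1, lambda2

# Axis-aligned ellipse with variances 4 and 1: the eigenvalues are 4 and 1.
print(radius_from_cov(4.0, 0.0, 1.0))  # radius is 6 = 3 * sqrt(4)
# A rotated ellipse: eigenvalues 3 and 1.
print(radius_from_cov(2.0, 1.0, 2.0))
```

Note that lambda1 is always the bigger root since the square root is non-negative, so max(lambda1, lambda2) is just lambda1.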
The last thing, which is probably the most obvious, is the uv (image coordinates) of the gaussian. It is obtained via a simple projection of the 3D center:
// https://github.com/graphdeco-inria/diff-gaussian-rasterization/blob/main/cuda_rasterizer/forward.cu#L197-L200
float3 p_orig = { orig_points[3 * idx], orig_points[3 * idx + 1], orig_points[3 * idx + 2] };
float4 p_hom = transformPoint4x4(p_orig, projmatrix);
float p_w = 1.0f / (p_hom.w + 0.0000001f);
float3 p_proj = { p_hom.x * p_w, p_hom.y * p_w, p_hom.z * p_w };
...
// https://github.com/graphdeco-inria/diff-gaussian-rasterization/blob/main/cuda_rasterizer/forward.cu#L233
float2 point_image = { ndc2Pix(p_proj.x, W), ndc2Pix(p_proj.y, H) }; // I like to call it uv
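The projection can be sketched in Python as below. Two things here are assumptions rather than facts from this article: the 4x4 matrix is a row-major nested list applied to a column vector (the repo stores it differently), and ndc_to_pix uses the pixel-center convention that I understand the repo's ndc2Pix helper to follow, so check it against the repo before relying on it. Function names are made up.

```python
def ndc_to_pix(v, size):
    """Map a normalized device coordinate in [-1, 1] to a pixel coordinate.

    Assumed convention: -1 maps to -0.5 and +1 maps to size - 0.5,
    so integer coordinates fall on pixel centers.
    """
    return ((v + 1.0) * size - 1.0) * 0.5

def project_point(p, proj, width, height):
    """Project a 3D point with a 4x4 matrix and return its (u, v) in pixels."""
    x, y, z = p
    hom = [proj[i][0] * x + proj[i][1] * y + proj[i][2] * z + proj[i][3]
           for i in range(4)]
    inv_w = 1.0 / (hom[3] + 1e-7)  # small epsilon, as in the CUDA code
    return (ndc_to_pix(hom[0] * inv_w, width), ndc_to_pix(hom[1] * inv_w, height))

# Toy projection matrix whose w output equals the input depth z.
toy_proj = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]
# A point on the optical axis lands at the image center.
print(project_point((0.0, 0.0, 2.0), toy_proj, 800, 600))  # (399.5, 299.5)
```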
Phew, we finally got the three quantities we need to know: radius, uv and conic. Let's move on to the next part.
Before computing the color of an image, the authors introduce a special but very effective technique that significantly accelerates rendering. Specifically, we divide the whole image into tiles, which are 16x16 pixel blocks like the following (the tiles might exceed the image borders if the height/width is not a multiple of 16):
We also order the tiles in row-major order (left-top is tile 0, the one on its right is 1, etc). The number below the tile number is its tile coordinates.
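The tile layout can be sketched in a few lines of Python. The function names are made up for illustration; only the 16x16 tile size and the row-major ordering come from the text above.

```python
TILE = 16  # tile side length in pixels

def tile_grid(width, height, tile=TILE):
    """Number of tiles along x and y, rounding up so the whole image is covered."""
    return ((width + tile - 1) // tile, (height + tile - 1) // tile)

def tile_id(tx, ty, grid_w):
    """Row-major tile index from tile coordinates (tx, ty)."""
    return ty * grid_w + tx

def tile_of_pixel(px, py, tile=TILE):
    """Tile coordinates of the tile containing pixel (px, py)."""
    return (px // tile, py // tile)

grid_w, grid_h = tile_grid(100, 50)
print(grid_w, grid_h)           # 7 4: the last column and row stick out of the image
print(tile_id(1, 0, grid_w))    # 1: the tile to the right of the top-left one
print(tile_id(0, 1, grid_w))    # 7: the first tile of the second row
print(tile_of_pixel(17, 15))    # (1, 0)
```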
Then, we compute which tiles each gaussian covers by using the uv and radius computed above. See the following figure:
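A minimal Python sketch of this coverage step, assuming the covered tiles are those overlapped by the bounding square of the circle with center uv and the given radius, clamped to the tile grid. This is modeled on my understanding of what the repo does; the function name and signature are hypothetical.

```python
def covered_tiles(uv, radius, grid_w, grid_h, tile=16):
    """Row-major ids of the tiles overlapped by the square [uv - radius, uv + radius]."""
    u, v = uv
    # First covered tile along each axis, clamped to the grid.
    x_min = min(grid_w, max(0, int((u - radius) / tile)))
    y_min = min(grid_h, max(0, int((v - radius) / tile)))
    # One past the last covered tile (exclusive bound), rounded up and clamped.
    x_max = min(grid_w, max(0, int((u + radius + tile - 1) / tile)))
    y_max = min(grid_h, max(0, int((v + radius + tile - 1) / tile)))
    return [ty * grid_w + tx
            for ty in range(y_min, y_max)
            for tx in range(x_min, x_max)]

# A gaussian centered at pixel (24, 24) with radius 10 on a 4x4 tile grid
# spans pixels 14..34 on both axes, i.e. tiles 0, 1 and 2 in x and in y.
print(covered_tiles((24.0, 24.0), 10, 4, 4))  # [0, 1, 2, 4, 5, 6, 8, 9, 10]
# A gaussian far outside the image covers no tile at all.
print(covered_tiles((-100.0, -100.0), 5, 4, 4))  # []
```

A gaussian that covers no tile can be skipped entirely, which is part of what makes the tiling effective.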