The peak probability changes with $\sigma$ due to the normalization term $\frac{1}{\sigma\sqrt{2\pi}}$. For gaussian splatting purposes, this means that the opacity of the gaussian depends on its size: large gaussians require a very high opacity factor to be visible, while small gaussians may oversaturate the image. Additionally, this makes it difficult to compare opacity across different sized gaussians, which is required for the opacity reset and delete steps in the adaptive control algorithm.
Dropping the normalization term decouples the density from the opacity of the gaussian.
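With the normalization dropped, each gaussian is evaluated as just the exponential of the negative half squared Mahalanobis distance, so its peak value is 1 regardless of its size and the learned opacity alone scales its contribution. As a sketch, in the multivariate case:

$$g(\boldsymbol{x}) = \exp\left(-\frac{1}{2}(\boldsymbol{x}-\boldsymbol{\mu})^{T}\Sigma^{-1}(\boldsymbol{x}-\boldsymbol{\mu})\right)$$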
Where $\boldsymbol{\mu}$ and $\boldsymbol{\sigma}$ are the mean and standard deviation vectors and $\Sigma$ is the covariance matrix. The covariance matrix is a symmetric, positive semi-definite $N \times N$ matrix where $N$ is the number of dimensions of the distribution.
Optimizing 3D Gaussians
The 3D gaussians are represented by a $3 \times 3$ covariance matrix. The symmetry of the matrix can be maintained by only optimizing the 6 parameters in the upper triangular portion, but it is much more difficult to constrain the matrix to be positive semi-definite. Instead of optimizing the 3D covariance matrix directly, the authors construct the matrix representation of an ellipsoid from 3 scale terms and a 3D rotation. This process is effectively an inverse Principal Component Analysis.
The covariance matrix can be decomposed into its eigenvalues and eigenvectors:
$$\Sigma\boldsymbol{v}=\lambda\boldsymbol{v}$$
Where $\boldsymbol{v}$ is an eigenvector of $\Sigma$ and $\lambda$ is the corresponding eigenvalue. This creates the following system of equations:

$$\Sigma\boldsymbol{v}_1 = \lambda_1\boldsymbol{v}_1, \quad \Sigma\boldsymbol{v}_2 = \lambda_2\boldsymbol{v}_2, \quad \ldots, \quad \Sigma\boldsymbol{v}_N = \lambda_N\boldsymbol{v}_N$$
In matrix form where the columns of $V$ are the eigenvectors:
$$\Sigma V = VL$$
Rearranging yields:
$$\Sigma = VLV^{-1}$$
In PCA, the eigenvectors define the directions of largest variance and the eigenvalues define the magnitude of the variance. The eigenvector matrix is equivalent to the rotation matrix that maps the axes of largest variance to the starting reference frame. Rotation matrices/eigenvectors are orthogonal, so the inverse is equal to the transpose.
The $3 \times 3$ rotation matrix can be represented with a quaternion, leaving 7 total parameters to be optimized: $q_w, q_x, q_y, q_z$ and $s_1, s_2, s_3$. The resulting covariance matrix is guaranteed to be positive semi-definite since $\lambda_n = s_{n}^{2}$ and therefore all eigenvalues are non-negative.
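A minimal sketch of this construction (NumPy; the function name is mine): build $R$ from the normalized quaternion, $S = \mathrm{diag}(s_1, s_2, s_3)$, and form $\Sigma_{3D} = RSSR^{T}$, which is positive semi-definite by construction.

```python
import numpy as np

def covariance_from_quat_scale(q, s):
    """Build a 3x3 covariance matrix from a quaternion (w, x, y, z) and 3 scales.

    Sigma = R S S R^T, so every eigenvalue is s_n^2 >= 0 and the result is
    guaranteed to be positive semi-definite.
    """
    w, x, y, z = q / np.linalg.norm(q)          # normalize the quaternion
    R = np.array([
        [1 - 2*y*y - 2*z*z, 2*x*y - 2*w*z,     2*x*z + 2*w*y],
        [2*x*y + 2*w*z,     1 - 2*x*x - 2*z*z, 2*y*z - 2*w*x],
        [2*x*z - 2*w*y,     2*y*z + 2*w*x,     1 - 2*x*x - 2*y*y],
    ])
    S = np.diag(s)                               # L = S @ S holds the eigenvalues
    return R @ S @ S @ R.T                       # Sigma = V L V^T
```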
Projecting 3D Gaussians
The 3D covariance matrix is projected into a 2D covariance matrix by:
$$\Sigma_2 = JW\Sigma_3W^TJ^T$$
Where $W$ is the $3 \times 3$ rotation matrix representing the viewing transform and $J$ is the $2 \times 3$ "Jacobian of the affine approximation of the projective transformation". With large focal lengths and small gaussians, the approximation should work well.
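A minimal sketch of the projection (the pinhole-camera Jacobian here is an assumption, since the text only names $J$; `t_cam` is the gaussian center in camera coordinates and `fx`, `fy` are focal lengths in pixels):

```python
import numpy as np

def project_covariance(cov3d, W, t_cam, fx, fy):
    """Project a 3D covariance to 2D: Sigma_2 = J W Sigma_3 W^T J^T."""
    x, y, z = t_cam
    # Jacobian of the perspective projection evaluated at the gaussian center
    J = np.array([
        [fx / z, 0.0,    -fx * x / z**2],
        [0.0,    fy / z, -fy * y / z**2],
    ])
    return J @ W @ cov3d @ W.T @ J.T             # 2x2 covariance in pixel space
```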
Note: in the CUDA implementation, the matrix representation of the conic is used instead of the 2D covariance matrix. The 2D covariance matrix is symmetric and only has 3 degrees of freedom; since the two off-diagonal entries are always equal, it can be compactly stored as 3 variables $a, b, c$.
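For example, a small sketch of the covariance-to-conic conversion (the conic used in the compositing section is the inverse of the 2D covariance; variable names are mine):

```python
def covariance_to_conic(a, b, c):
    """Invert the symmetric 2x2 covariance [[a, b], [b, c]] and return the
    three unique entries of the conic matrix."""
    det = a * c - b * b
    return c / det, -b / det, a / det
```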
Tile Based Rasterization
Evaluating the probability of every gaussian at every pixel in the image would be too slow for real-time rendering. Luckily, in order to accurately reconstruct the fine detail in the scene, most gaussians only cover a small portion of the rendered image. The authors choose to break the image down into 16 by 16 pixel tiles and only render the gaussians that contribute significantly to each tile.
Currently, the output images are converted to uint8 and thus only have a resolution of $1/255 \approx 0.00392$, which is very close to the probability of a gaussian distribution at $3\sigma$. The mapping of gaussians to tiles is computed by finding the intersection of the gaussian distribution at $3\sigma$ with the tiles.
An oriented bounding box of the ellipse at $3\sigma$ can be computed from the 2D covariance matrix:
$$\Sigma_2 = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$$
First, compute the two eigenvalues of the $2 \times 2$ symmetric matrix:
$$ \lambda_1 = \frac{a + d + \sqrt{(a-d)^2 + 4bc}}{2} $$
and
$$ \lambda_2 = \frac{a + d - \sqrt{(a-d)^2 + 4bc}}{2} $$
The four corner points of the oriented bounding box can be constructed from $r_1, r_2, \theta$, where $r_1$ and $r_2$ are the $3\sigma$ half-extents along the principal axes and $\theta$ is the rotation of the major axis, by initializing the 4 corner points of the axis-aligned bounding box and rotating them with the 2D rotation matrix, as sketched below:
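A sketch of the whole bounding-box construction, with the eigenvalues taken from the closed-form expressions above; $r_n = 3\sqrt{\lambda_n}$ and $\theta = \frac{1}{2}\operatorname{atan2}(2b, a - d)$ are assumptions consistent with the $3\sigma$ cutoff:

```python
import numpy as np

def obb_corners(mu, a, b, d, n_sigma=3.0):
    """Corner points of the oriented bounding box of the ellipse at n_sigma.

    mu: 2D mean in pixel coordinates; a, b, d are the entries of the
    symmetric 2D covariance [[a, b], [b, d]].
    """
    # eigenvalues of the 2x2 symmetric matrix (bc = b^2 here)
    mid = 0.5 * (a + d)
    disc = np.sqrt(0.25 * (a - d) ** 2 + b * b)
    lam1, lam2 = mid + disc, mid - disc
    r1, r2 = n_sigma * np.sqrt(lam1), n_sigma * np.sqrt(lam2)
    theta = 0.5 * np.arctan2(2.0 * b, a - d)     # rotation of the major axis
    # axis-aligned corners centered at the origin, then rotate and translate
    corners = np.array([[-r1, -r2], [r1, -r2], [r1, r2], [-r1, r2]])
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    return corners @ R.T + np.asarray(mu)
```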
The intersection between the oriented bounding box and the tile can be computed using the Separating Axis Theorem. In this case, the SAT can be simplified since each bounding box has two pairs of parallel edges, so only one axis from each pair needs to be checked. Additionally, the axes of the tile are axis-aligned, eliminating two projection steps.
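A minimal sketch of the simplified SAT test (function and variable names are mine; `corners` is the output of `obb_corners` above):

```python
import numpy as np

def obb_intersects_tile(corners, tile_min, tile_max):
    """Separating Axis Theorem test between an OBB and an axis-aligned tile."""
    # tile axes: axis-aligned, so no projection is needed - compare coordinate ranges
    if corners[:, 0].max() < tile_min[0] or corners[:, 0].min() > tile_max[0]:
        return False
    if corners[:, 1].max() < tile_min[1] or corners[:, 1].min() > tile_max[1]:
        return False
    # OBB axes: one edge direction from each pair of parallel edges
    tile_corners = np.array([[tile_min[0], tile_min[1]],
                             [tile_max[0], tile_min[1]],
                             [tile_max[0], tile_max[1]],
                             [tile_min[0], tile_max[1]]])
    for axis in (corners[1] - corners[0], corners[3] - corners[0]):
        p_obb = corners @ axis
        p_tile = tile_corners @ axis
        if p_obb.max() < p_tile.min() or p_obb.min() > p_tile.max():
            return False
    return True                                   # no separating axis found
```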
This example shows 3 gaussians splatted onto the image. The green boxes are the oriented bounding boxes at $3\sigma$ and the tiles that intersect them are colored in white.
Alpha Compositing
The RGB value of each pixel is computed by $\alpha$-blending the gaussians from front to back:
$$C(u, v) = \sum_{i=0}^{N-1} \alpha_{i}c_{i}w_{i}$$
$$ w_{i} = (1 - \sum_{j=0}^{i-1}\alpha_{j}w_j )$$
$$ w_0 = 1 $$
$$ \alpha_{i} = o_i g_{i}(u, v)$$
Where $c_{i}$ and $o_i$ are the color and opacity of the $i^{th}$ gaussian and $g_{i}(u, v)$ is the probability of the $i^{th}$ gaussian at the pixel coordinates $(u, v)$.
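A sketch of the forward compositing loop for a single pixel using the recurrence above (the early-exit threshold is an assumption):

```python
import numpy as np

def composite_pixel(colors, opacities, gvals, eps=1e-4):
    """Front-to-back alpha compositing of N gaussians at one pixel.

    colors: (N, 3) RGB sorted front to back, opacities: (N,) o_i,
    gvals: (N,) g_i(u, v). Returns the pixel color and the final weight,
    which the backward pass reuses.
    """
    alphas = opacities * gvals            # alpha_i = o_i * g_i(u, v)
    C = np.zeros(3)
    w = 1.0                               # w_0 = 1
    for a, c in zip(alphas, colors):
        C += a * w * c
        w *= 1.0 - a                      # equivalent to w_{i+1} = 1 - sum_j alpha_j w_j
        if w < eps:                       # early termination once nearly opaque
            break
    return C, w
```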
Backward Pass
For implemented forward/backward passes in PyTorch, see analytic_diff.ipynb.
Notation
For simplicity, all gradients are denoted by $\nabla$.
Camera Projection
Computing the reverse-mode derivatives for the camera projection is fairly straightforward. For the forward projection:
For a normalized quaternion $\hat{q}$ (note: the notation is a little sloppy here; in this section $w$, $x$, $y$, and $z$ are the components of $\hat{q}$, not $q$, unlike the previous section):
$$R = \begin{bmatrix}- 2 y^{2} - 2 z^{2} + 1 & - 2 w z + 2 x y & 2 w y + 2 x z \\ 2 w z + 2 x y & - 2 x^{2} - 2 z^{2} + 1 & - 2 w x + 2 y z \\ -2 w y + 2 x z & 2 w x + 2 y z & - 2 x^{2} - 2 y^{2} + 1\end{bmatrix} $$
Computing the Jacobians and then the vector-Jacobian products:
$$ \nabla{w}= \nabla{R}^T\begin{bmatrix}0 & - 2 z & 2 y \\ 2 z & 0 & - 2 x \\ -2 y & 2 x & 0\end{bmatrix} $$
$$ \nabla{x} = \nabla{R}^T\begin{bmatrix}0 & 2 y & 2 z \\ 2 y & - 4 x & - 2 w \\ 2 z & 2 w & - 4 x\end{bmatrix} $$
$$ \nabla{y} = \nabla{R}^T \begin{bmatrix}- 4 y & 2 x & 2 w \\ 2 x & 0 & 2 z \\ -2 w & 2 z & - 4 y\end{bmatrix} $$
$$ \nabla{z} = \nabla{R}^T \begin{bmatrix}- 4 z & - 2 w & 2 x \\ 2 w & - 4 z & 2 y \\ 2 x & 2 y & 0\end{bmatrix} $$
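A sketch of these vector-Jacobian products, reading the products above as Frobenius inner products (elementwise multiply and sum) since each quaternion gradient is a scalar:

```python
import numpy as np

def quaternion_vjp(grad_R, q_hat):
    """Gradients of the loss w.r.t. the normalized quaternion components,
    given grad_R = dL/dR. Each matrix below is the Jacobian dR/d(component)."""
    w, x, y, z = q_hat
    dR_dw = np.array([[0, -2*z, 2*y], [2*z, 0, -2*x], [-2*y, 2*x, 0]])
    dR_dx = np.array([[0, 2*y, 2*z], [2*y, -4*x, -2*w], [2*z, 2*w, -4*x]])
    dR_dy = np.array([[-4*y, 2*x, 2*w], [2*x, 0, 2*z], [-2*w, 2*z, -4*y]])
    dR_dz = np.array([[-4*z, -2*w, 2*x], [2*w, -4*z, 2*y], [2*x, 2*y, 0]])
    return np.array([(grad_R * dR).sum() for dR in (dR_dw, dR_dx, dR_dy, dR_dz)])
```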
The expression can be further simplified since $S$, $\Sigma_{3D}$, and $\nabla{\Sigma_{3D}}$ are all symmetric:
$$\nabla{R} = 2\nabla{\Sigma_{3D}}RSS$$
$$\nabla{S} = 2R^{T}\nabla{\Sigma_{3D}}RS$$
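A sketch of these two gradients, keeping only the diagonal of $\nabla{S}$ since the three scales are its only free parameters:

```python
import numpy as np

def covariance_backward(grad_cov3d, R, s):
    """Backpropagate dL/dSigma_3D to the rotation matrix and the scales."""
    S = np.diag(s)
    grad_R = 2.0 * grad_cov3d @ R @ S @ S        # nabla R = 2 nabla Sigma R S S
    grad_S = 2.0 * R.T @ grad_cov3d @ R @ S      # nabla S = 2 R^T nabla Sigma R S
    return grad_R, np.diag(grad_S)               # per-axis scale gradients
```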
2D Covariance Matrix/Conic
Using the First Quadratic Form from *An extended collection of matrix derivative results*, section 2.3.2:
Evaluated at the pixel $(u, v)$ and using the conic representation of the 2D covariance matrix:
$$ d_M^2 = \begin{bmatrix} u - \mu_u \\ v - \mu_v \end{bmatrix}^T \begin{bmatrix} a & b \\ b & c \end{bmatrix}^{-1}\begin{bmatrix} u - \mu_u \\ v - \mu_v \end{bmatrix}$$
$$ d_M^2 = \frac{1}{ac - b^2}\begin{bmatrix} u - \mu_u \\ v - \mu_v \end{bmatrix}^T \begin{bmatrix} c & -b \\ -b & a \end{bmatrix}\begin{bmatrix} u - \mu_u \\ v - \mu_v \end{bmatrix}$$
Simplifying with:
$$ \Delta u = u - \mu_u $$
$$ \Delta v = v - \mu_v $$
$$ d_M^2 = \frac{a \Delta v^{2} - 2b \Delta u \Delta v + c \Delta u^{2}}{ac - b^2}$$
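Putting this together, a sketch of evaluating a gaussian at a pixel (the $\exp(-d_M^2/2)$ form is the unnormalized gaussian discussed earlier):

```python
import numpy as np

def eval_gaussian_2d(u, v, mu, a, b, c):
    """Unnormalized 2D gaussian at pixel (u, v), with the symmetric 2D
    covariance [[a, b], [b, c]] inverted on the fly (the conic)."""
    du, dv = u - mu[0], v - mu[1]
    d2 = (a * dv * dv - 2.0 * b * du * dv + c * du * du) / (a * c - b * b)
    return np.exp(-0.5 * d2)
```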
Using the patterns in the above steps, the gradients for compositing N gaussians can be computed. For the color gradient, this is straightforward when computing $\nabla{C_{n}}$ iteratively starting from $n=0$:
The $\alpha$ gradients are trickier as $\nabla{\alpha_0}$ and $\nabla{\alpha_n}$ are both dependent on all $\alpha$ values. The final $\alpha$ value is the easiest to calculate:
The numerator of the fraction is now the accumulated color of all the gaussians behind the current gaussian:

$$ \sum_{i=m+1}^{n}c_i\alpha_{i} w_i $$
By iterating from back to front, the weights and the accumulated color can be recovered efficiently from the final weight saved in the forward pass.
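A sketch of that back-to-front loop (assumptions: no early termination in the forward pass, and each $\alpha_i < 1$ so the division is safe):

```python
import numpy as np

def composite_backward(grad_C, colors, alphas, w_final):
    """Backward pass of alpha compositing, iterating back to front.

    grad_C: dL/dC for the pixel (3,), colors: (N, 3) sorted front to back,
    alphas: (N,), w_final: weight remaining after the last gaussian, saved
    from the forward pass. Returns dL/dc_i and dL/dalpha_i.
    """
    N = len(alphas)
    grad_colors = np.zeros_like(colors, dtype=float)
    grad_alphas = np.zeros(N)
    w = w_final
    suffix = np.zeros(3)                  # accumulated color behind gaussian i
    for i in range(N - 1, -1, -1):
        w = w / (1.0 - alphas[i])         # recover w_i from w_{i+1}
        grad_colors[i] = grad_C * alphas[i] * w
        # dC/dalpha_i = c_i w_i - (accumulated color behind i) / (1 - alpha_i)
        grad_alphas[i] = grad_C @ (colors[i] * w - suffix / (1.0 - alphas[i]))
        suffix = suffix + colors[i] * alphas[i] * w
    return grad_colors, grad_alphas
```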