-
Notifications
You must be signed in to change notification settings - Fork 349
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fast SH implementation #165
Conversation
This is very cool! And I also verified the diff is basically zero comparing with the original one. basis_dim = 25
dirs = torch.rand((100000, 3), device="cuda")
dirs = dirs / dirs.norm(dim=-1, keepdim=True)
result1 = eval_sh_bases(basis_dim, dirs)
result2 = eval_sh_bases_fast(basis_dim, dirs)
diff = (result1 - result2).abs().mean(dim=0)
>> tensor([0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 7.1335e-09,
2.3338e-08, 7.0950e-09, 0.0000e+00, 1.2602e-08, 8.1815e-09, 1.8609e-08,
3.7991e-08, 1.8580e-08, 0.0000e+00, 1.2594e-08, 1.0718e-08, 1.6333e-08,
1.4538e-08, 2.3519e-08, 3.1723e-08, 2.3399e-08, 1.2196e-08, 1.6297e-08,
1.3505e-08], device='cuda:0') |
@jb-ye I add some profile code in the [Fwd] Method: poly, ellipsed: 2.19 ms
[Bwd] Method: poly, ellipsed: 35.19 ms
[Fwd] Method: fast, ellipsed: 2.46 ms
[Bwd] Method: fast, ellipsed: 35.16 ms Somehow I couldn't get speedup on my end. Could you run the profiler on your end and post the results? |
@liruilong940607 Your previous profile code has a typo ... after fixing it, this is what I got:
|
Oh yeah you are right thanks for the fix! It now makes sense. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! A few minor comments
z2 = z * z | ||
|
||
fTmpA = -0.48860251190292 | ||
result[..., 2] = 0.4886025119029199 * z |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it intentional that this 0.4886025119029199
does not equal to -fTmpA
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just for matching the sample code from the original authors. They should be equal to -fTmpA.
How to activate this feature with splatfacto? @jb-ye ? I did not find any speed improvement, maybe I did it wrong because it was not activated? black line = newest main gsplat (included this merged PR) |
@ichsan2895 It doesn't bring speed-ups for training since the bwd calls take rough the same time. It may increase ns-viewer fps rate. |
I recently read this paper: https://jcgt.org/published/0002/02/06/ (it has sample codes) there is a faster way to evaluate spherical harmonics. I made an attempt to implement the idea in gsplat and was able to observe 40% time saving in forward calls of degree 3 and 50% time saving of degree 4.