Accelerate Instant-NGP inference #197
nerfacc/cuda/csrc/grid.cu
num_steps += tid;
continuous_resume += tid;
t_starts += tid * N_samples;
t_ends += tid * N_samples;
valid_mask += tid * N_samples;
ray_indices += tid * N_samples;
Can't use += anymore because of the for loop:
for (int32_t tid = blockIdx.x * blockDim.x + threadIdx.x; tid < n_rays; tid += blockDim.x * gridDim.x)
It would be wrong if += is executed twice within the for loop, because the pointer offset would accumulate across iterations.
Use something like valid_mask[tid * N_samples] to read/write the data.
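To make the concern concrete, here is a minimal Python sketch that simulates one thread of a grid-stride loop. It is only an illustration, not the kernel code: the helper names (grid_stride_tids, buggy_offsets, fixed_offsets) are hypothetical, and an integer "base" stands in for the pointer.

```python
def grid_stride_tids(n_rays, n_threads_total, thread_id):
    """Ray indices visited by one thread in a grid-stride loop."""
    return range(thread_id, n_rays, n_threads_total)


def buggy_offsets(n_rays, n_samples, n_threads_total, thread_id):
    """Mimics `ptr += tid * n_samples` inside the loop: the offset accumulates."""
    offsets = []
    base = 0  # stands in for the pointer that is advanced in place
    for tid in grid_stride_tids(n_rays, n_threads_total, thread_id):
        base += tid * n_samples  # second iteration adds on top of the first
        offsets.append(base)
    return offsets


def fixed_offsets(n_rays, n_samples, n_threads_total, thread_id):
    """Mimics `ptr[tid * n_samples]`: the offset is recomputed from an untouched base."""
    return [
        tid * n_samples
        for tid in grid_stride_tids(n_rays, n_threads_total, thread_id)
    ]


# 8 rays, 4 samples per ray, 4 threads in total: thread 1 handles rays 1 and 5.
print(buggy_offsets(8, 4, 4, 1))  # [4, 24]: the second offset is wrong
print(fixed_offsets(8, 4, 4, 1))  # [4, 20]: rays 1 and 5 start at 4 and 20
```

The two versions agree on the first iteration and diverge from the second one onward, which is why the bug only shows up once n_rays exceeds the number of launched threads.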
Thanks for implementing this! It is pretty nice to have that support.
At a high level, the two ways of ray marching are pretty similar to each other: the "train" way of marching is to take N rays and march all the steps for each ray. The "test" way of marching is to take all rays and march N steps for each ray, iteratively. I feel it should not be that hard to unify the API (as well as the implementation) for the two. To be more concrete, the implementation differences between the two are:
So maybe we can unify them into the same "traverse_grid" function, with extra arguments (max_per_ray_samples=inf, masks=None, t_sorted=None, t_indices=None, hits=None). And for "traverse_grid_test", you can just call that function with an updated "near_planes" at every iteration of marching. In this case, I think it makes sense to let the CUDA kernel return an extra tensor (n_rays,) that indicates the termination distance during grid traversal, which is essentially the "near_planes" for the next iteration of "traverse_grid_test". (the
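A rough sketch of this idea in plain Python, under heavy simplifying assumptions: there is no occupancy grid, rays are marched with a fixed step size, and the functions traverse and traverse_test_mode are hypothetical stand-ins rather than the nerfacc API. The only thing it is meant to show is how a per-ray sample cap plus a returned termination distance lets "test" mode reuse the "train" routine.

```python
import math


def traverse(near_planes, far_planes, step_size, max_per_ray_samples=math.inf):
    """Toy marcher: per-ray (t_start, t_end) intervals plus termination distances."""
    intervals, terminations = [], []
    for near, far in zip(near_planes, far_planes):
        t, ray_intervals = near, []
        # Stop at the far plane or once the per-ray sample cap is reached.
        while t < far and len(ray_intervals) < max_per_ray_samples:
            t_end = min(t + step_size, far)
            ray_intervals.append((t, t_end))
            t = t_end
        intervals.append(ray_intervals)
        terminations.append(t)  # becomes the near plane of the next call
    return intervals, terminations


def traverse_test_mode(near_planes, far_planes, step_size, n_steps):
    """March all rays n_steps at a time, resuming from the termination distances."""
    collected = [[] for _ in near_planes]
    near = list(near_planes)  # copy so the caller's list is left untouched
    while True:
        intervals, near = traverse(
            near, far_planes, step_size, max_per_ray_samples=n_steps
        )
        if not any(intervals):  # no ray produced a sample: all rays are done
            break
        for ray_id, ray_intervals in enumerate(intervals):
            collected[ray_id].extend(ray_intervals)
    return collected


near, far = [0.0, 1.0], [2.0, 2.0]
train_intervals, _ = traverse(near, far, step_size=0.5)
test_intervals = traverse_test_mode(near, far, step_size=0.5, n_steps=3)
print(train_intervals == test_intervals)  # True
```

In this toy setting both modes visit exactly the same intervals; the difference is only in how many samples are produced per call.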
Sounds good! I think we could unify the API using the extra arguments "(max_per_ray_samples=inf, masks=None, t_sorted=None, t_indices=None, hits=None)". Nice idea, BTW! We also need to unify the return values. I suggest using the data structure defined in "data_spect.h" to store As for
I think you can use the RaySegmentsSpec just like what is being used in the
Do you mean that you change its value in place? I would suggest against in-place modification, as it is not very user-friendly.
I have unified the "traverse_grid" API, and now both "train" and "test" can use the same Python function. On the low level, we still need to call separate C functions to launch the CUDA kernel. Note that the "traverse_grid" function now returns three objects (intervals, samples, termination_planes), and "termination_planes" will be just
add test mode for traverse_grids
This reverts commit c93eaad.
@Linyou The latest commit should resolve the memory concerns we had before. The test is also updated to match the actual use case. Let me know what you think.
Thanks! I believe that the current API design is now highly usable for test mode rendering, thanks to the latest commit. BTW, after this PR is merged, I will create a new one for ngp test mode rendering in the examples.
@Linyou I also did some cleanups for
Comments addressed. Ready to go? @Linyou
@liruilong940607 Yeah! All good!
Thanks for the patience!! Shipped!
This PR enhances Nerfacc's Instant-NGP inference performance by implementing the following API changes:
- The traverse_grids function has been modified to support both train and test modes.
- mark_invisible_cells has been added to the occ_grid module in order to prevent rendering artifacts in unseen spaces.
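
As an illustration of what flagging unseen space could look like, here is a small sketch. It assumes that "unseen" means outside the view frustum of every camera, uses a pinhole camera with no rotation looking along +z, and the names is_visible and invisible_cell_mask are hypothetical. It is not the implementation of mark_invisible_cells.

```python
def is_visible(point, cam_pos, fx, fy, cx, cy, width, height, near_plane=0.0):
    """True if `point` projects inside the image of an unrotated pinhole camera."""
    x, y, z = (p - c for p, c in zip(point, cam_pos))
    if z <= near_plane:  # behind the camera or closer than the near plane
        return False
    u = fx * x / z + cx  # pixel column
    v = fy * y / z + cy  # pixel row
    return 0.0 <= u < width and 0.0 <= v < height


def invisible_cell_mask(cell_centers, cam_positions, **intrinsics):
    """True for every cell center that no camera can see."""
    return [
        not any(is_visible(center, cam, **intrinsics) for cam in cam_positions)
        for center in cell_centers
    ]


cells = [(0.0, 0.0, 1.0), (0.0, 0.0, -1.0), (5.0, 0.0, 1.0)]
cams = [(0.0, 0.0, 0.0)]
mask = invisible_cell_mask(
    cells, cams, fx=1.0, fy=1.0, cx=1.0, cy=1.0, width=2, height=2
)
print(mask)  # [False, True, True]
```

Cells flagged this way could then be excluded from rendering, so that space no camera ever observed does not contribute density at test time.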