
GUI and Memory Constraints #36

Closed · zshureih opened this issue Jan 18, 2022 · 10 comments
@zshureih commented Jan 18, 2022

Hi all, thanks so much for the hard work, this repo is really impressive.

I just have two quick questions regarding scaling the models:

  1. In the demo video I noticed that training and rendering the NeRF Lego model used about 10.8 GB of VRAM, and I see everywhere that the software was developed with a 3090. Is there a relatively easy way to scale down the resolution of the NeRF and SDF models so that less VRAM is required?

  2. I've got the repo installed and running on a remote Ubuntu 20.04 server with an RTX 3070 8 GB graphics card. When forwarding the output to my local machine, frame rates are incredibly low (less than 1 frame per second). I'm not sure whether this is tied to my local machine's hardware or to the network we're connecting over.

@Tom94 (Collaborator) commented Jan 18, 2022

I just added CLI options to reduce the GUI resolution. E.g. --width 1280 --height 720 should reduce memory consumption by ~500 MB.

You can additionally load fewer training images, e.g. by specifying --scene lego/transforms_train.json rather than also loading the test and validation images. This will further cut down significantly on memory usage.
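
For example, combining both suggestions (the dataset path is illustrative and assumes the synthetic lego scene lives under data/nerf_synthetic/lego):

$ ./build/testbed --scene data/nerf_synthetic/lego/transforms_train.json --width 1280 --height 720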

Regarding your second question: if you're running in SDF mode and haven't got OptiX configured, slow training times like this are expected. Otherwise, it's likely the network.

@w-m commented Jan 18, 2022

Exciting project! And thank you for giving support on GitHub.

Unfortunately I'm unable to run the fox scene on a Titan Xp (SM 61) with 12 GB RAM, on Ubuntu 18.04 with CUDA 11.6. (Building with OptiX 7.4 failed, so I disabled OptiX; edit: fixed.) I'm rendering to a VirtualGL output, which works fine for the SDF and image examples.

So is the memory requirement for NeRFs much higher than 8 GB when using older cards and falling back to CutlassMLP?

Reducing --width and --height does nothing (not even going down to 128x72 helps).

Output:

18:51:21 INFO     Loading NeRF dataset from
18:51:21 INFO       data/nerf/fox/transforms.json
18:51:21 SUCCESS  Loaded 50 images of size 1080x1920 after 0s
18:51:21 INFO       cam_aabb=[min=[0.5,3.29106e+30,0.5], max=[0.5,3.29106e+30,0.5]]
18:51:21 INFO     Loading network config from: configs/nerf/base.json
18:51:21 INFO     GridEncoding:  Nmin=16 b=1.51572 F=2 T=2^19 L=16
Warning: FullyFusedMLP is not supported for the selected architecture 61. Falling back to CutlassMLP. For maximum performance, raise the target GPU architecture to 75+.
Warning: FullyFusedMLP is not supported for the selected architecture 61. Falling back to CutlassMLP. For maximum performance, raise the target GPU architecture to 75+.
18:51:21 INFO     Density model: 3--[HashGrid]-->32--[FullyFusedMLP(neurons=64,layers=3)]-->1
18:51:21 INFO     Color model:   3--[SphericalHarmonics]-->16+16--[FullyFusedMLP(neurons=64,layers=4)]-->3
18:51:21 INFO       total_encoding_params=13074912 total_network_params=9728
18:51:22 ERROR    Uncaught exception: Could not allocate memory: CUDA Error: cudaMalloc(&rawptr, n_bytes+DEBUG_GUARD_SIZE*2) failed with error out of memory

Disabling the GUI has no effect for nerf/fox.

When using nerf_synthetic/lego, it starts running, with 11.8 GB of GPU memory used. When removing the --no_gui flag on any of the nerf_synthetic samples, it crashes with 'terminated by signal SIGBUS (Misaligned address error)'.

@myagues (Contributor) commented Jan 19, 2022

It seems that memory consumption differs across architectures, probably due to the CutlassMLP fallback.
These are all runs in the same Docker container I posted in #20, but using a base image of nvidia/cuda:11.3.1-devel-ubuntu20.04, built with -DNGP_BUILD_WITH_GUI=OFF, and running $ ./build/testbed --scene data/nerf/fox.

GPU           TCNN arch   Driver / CUDA       Mem usage (empty -> running fox / card total)
RTX 3070      86          470.82.00 / 11.4    24 MiB -> 6757 MiB  / 7982 MiB
RTX 2080 Ti   75          495.29.05 / 11.5     1 MiB -> 6736 MiB  / 11019 MiB
GTX 1080 Ti   61          470.82.01 / 11.4    19 MiB -> OOM       / 11176 MiB
Tesla P100    61          470.86    / 11.4     4 MiB -> 14147 MiB / 16280 MiB
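
For anyone wanting to reproduce this headless setup, something along these lines should work (the cmake invocation follows the project README and may need adjusting for your checkout):

$ cmake . -B build -DNGP_BUILD_WITH_GUI=OFF
$ cmake --build build --config RelWithDebInfo -j
$ ./build/testbed --scene data/nerf/fox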

@Tom94 (Collaborator) commented Jan 19, 2022

To shed some more light on the increased memory usage: there are actually two factors at play.

  1. Some of the older GPUs can't operate as efficiently on FP16 as they can on FP32, so for compute capabilities 61 and <=52, instant-ngp runs everything at float precision that would otherwise use __half. I believe this is the bulk of the increased memory usage. That said,
  2. CutlassMLP indeed requires extra memory to hold the network's intermediate activations, which FullyFusedMLP avoids by operating purely on registers and shared memory; a rough back-of-envelope estimate follows below. I've pushed a PR to tiny-cuda-nn that improves CutlassMLP memory usage somewhat, but it won't bring usage in line with tensor-core-enabled GPUs.
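
For a rough sense of scale, here's that back-of-envelope sketch; the batch size is a made-up round number and the layer counts are read off the log above, so treat the result as an order-of-magnitude illustration only:

# Illustrative only: batch size is assumed; neurons/layer counts are taken
# from the log above (density MLP: 3 layers, color MLP: 4 layers, 64 neurons).
batch = 1_000_000      # network inputs per training step (assumption)
neurons = 64           # hidden width
hidden_layers = 7      # density (3) + color (4) layers combined

def activation_bytes(bytes_per_elem):
    # CutlassMLP keeps each layer's activations in global memory for the
    # backward pass; FullyFusedMLP keeps them in registers/shared memory.
    return batch * neurons * hidden_layers * bytes_per_elem

print(f"half precision activations:  {activation_bytes(2) / 2**20:.0f} MiB")  # ~854 MiB
print(f"float precision activations: {activation_bytes(4) / 2**20:.0f} MiB")  # ~1709 MiB

So the CUTLASS path pays for storing the activations at all, and the float fallback doubles that again, which is why the two factors compound on these older cards.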

@Tom94 (Collaborator) commented Jan 30, 2022

Just a heads-up that memory requirements are down by ~1 GB now (#99). This'll hopefully make it easier to get things going on 8 GB cards.

@Tom94 (Collaborator) commented Feb 11, 2022

For everyone following this: instant-ngp now requires vastly less memory (fox is down to 2.25 GB on my machine), so it's perhaps worth re-trying it if it previously didn't run on your system.

The technical reason for the reduced memory usage is a custom memory allocator that exploits the GPU's virtual memory capabilities. The allocator is now part of tiny-cuda-nn and permits low-overhead allocs/deallocs of stream-ordered temporary storage. (Though not the same as CUDA's own stream-ordered allocations, which are slower and were therefore not an option.) This allows for much more re-use.
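
For intuition only, here's a toy sketch of the re-use pattern; the actual allocator is CUDA/C++ inside tiny-cuda-nn and additionally reserves a large virtual address range and maps physical memory on demand, none of which this toy captures:

# Toy illustration, NOT the tiny-cuda-nn implementation: temporaries are carved
# out of one arena with a bump pointer and released in stream order, so the
# same memory is reused every training step instead of round-tripping through
# cudaMalloc/cudaFree.
class WorkspaceArena:
    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.offset = 0  # bump pointer into the arena

    def alloc(self, n_bytes):
        if self.offset + n_bytes > self.capacity:
            raise MemoryError("arena exhausted")
        start = self.offset
        self.offset += n_bytes
        return start     # stands in for a device pointer

    def reset(self):
        # Once the stream's work for this step is enqueued, all temporaries
        # are implicitly freed and the space is reused on the next step.
        self.offset = 0

arena = WorkspaceArena(capacity_bytes=256 * 2**20)  # hypothetical 256 MiB arena
for step in range(3):
    activations = arena.alloc(64 * 2**20)  # e.g. intermediate activations
    scratch = arena.alloc(32 * 2**20)      # e.g. gradient workspace
    arena.reset()                          # same memory reused next step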

Tom94 closed this as completed Feb 11, 2022

@useronym (Contributor)

I can confirm that the fox dataset can now be loaded on a GTX 1080 using around 5 GB of video memory, great job!

@cduguet (Contributor) commented Feb 23, 2022

My images are 2160x3840, and I wanted to run COLMAP with high-resolution images to improve its accuracy. However, I don't need that much resolution to run NeRF. I guess I could manually resize the images and edit transforms.json accordingly, but maybe you or someone else has done this already?

Are you aware of a script to downsample the output of COLMAP to lower memory consumption in NeRF?
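
Roughly, what I have in mind is something like the sketch below. The field names (fl_x, fl_y, cx, cy, w, h, frames[].file_path) are assumed from colmap2nerf.py output and may need adjusting; camera_angle_x/y are angles and shouldn't need rescaling when the images are downscaled uniformly. Purely a sketch, untested:

import json
from pathlib import Path
from PIL import Image  # pip install pillow

SCALE = 0.5                      # e.g. 2160x3840 -> 1080x1920
root = Path("data/my_scene")     # assumed dataset layout

with open(root / "transforms.json") as f:
    meta = json.load(f)

# Scale the pixel-valued intrinsics, if present.
for key in ("fl_x", "fl_y", "cx", "cy"):
    if key in meta:
        meta[key] *= SCALE
for key in ("w", "h"):
    if key in meta:
        meta[key] = int(round(meta[key] * SCALE))

# Resize the referenced images in place (keep a backup of the originals!).
for frame in meta["frames"]:
    img_path = root / frame["file_path"]
    img = Image.open(img_path)
    new_size = (round(img.width * SCALE), round(img.height * SCALE))
    resized = img.resize(new_size, Image.LANCZOS)
    img.close()
    resized.save(img_path)

with open(root / "transforms.json", "w") as f:
    json.dump(meta, f, indent=2)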

@3a1b2c3 commented May 14, 2022

Has anybody tried running this in the cloud? https://au.pcmag.com/graphics-cards/91529/no-gpu-nvidias-rtx-3080-powered-cloud-gaming-service-is-now-open-to-all
My 4 GB card doesn't stand a chance.

@gvbgeomatics

Yes, we have tested and configured it in the cloud and it works fine; you can try it as well. Check this out:
https://nebulacloud.ai/connect/blogs/how-to-use-nvidia-instant-nerf-on-nebula-cloud
