
[Question] Implementing a large number of Cameras #173

Closed
jnzhnd opened this issue Dec 14, 2023 · 7 comments
Labels
question Further information is requested

Comments

@jnzhnd

jnzhnd commented Dec 14, 2023

Main Challenge

I'm currently trying to set up an environment with a large number of cameras, all taking a single picture at a single point in time. When I initially tried to do that, my simulation crashed with either one of the following error messages:

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 160.00 MiB (GPU 0; 23.68 GiB total capacity; 8.44 MiB already allocated; 57.12 MiB free; 22.00 MiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
RuntimeError: Array allocation failed on device: cuda:0 for 16777216 bytes

This happens as soon as the number of cloned environments (and therefore cameras) is 10 or larger.
If I read the above error message correctly, I should still have plenty of capacity on my GPU, but it's not marked as "free" - how does this happen?

Also, if my math is correct, a single image of an environment should not take up more than 16 MiB of space (2048 * 2048 pixels * 4 channels * 8 bits per channel = 16,777,216 bytes, which is exactly the amount in the failed allocation above).
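As a quick sanity check of that figure (assuming an RGBA buffer with one byte per channel; the numbers below are plain arithmetic, not measured values):

# Estimated size of a single RGBA frame at 8 bits per channel.
width, height = 2048, 2048
channels = 4           # RGBA
bytes_per_channel = 1  # 8 bits per channel

frame_bytes = width * height * channels * bytes_per_channel
print(frame_bytes)             # 16777216 -> matches the failed allocation above
print(frame_bytes / 1024**2)   # 16.0 MiB per camera per frame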

Workaround

I then thought a possible workaround could be to iteratively position the camera over all of the environments, take a picture, and add it to a list. This works, but in practice I need to step the simulation a few times (3) every time I reposition the camera for it to "see" something. This time adds up very quickly: say I have 1000 envs and I'm running at 3 fps - with 3 steps per capture, that is about one second per environment, so it would take over 15 minutes to loop over all of them.
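The same back-of-the-envelope math for the sweep time (assuming 3 simulation steps per repositioning at 3 steps per second):

# Rough time estimate for the sequential-capture workaround.
num_envs = 1000
steps_per_capture = 3  # steps needed before the camera "sees" the scene
steps_per_second = 3   # observed simulation rate

total_seconds = num_envs * steps_per_capture / steps_per_second
print(total_seconds / 60)  # ~16.7 minutes per sweep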

Questions

  • Is there any way to make this more efficient?
  • Am I doing something wrong with the way I implement the parallel cameras?
  • Can I allocate more space on the GPU for the camera data?
  • Is my GPU just not powerful enough?

As always, if I can provide more code/data/anything to help solve this, I'm more than happy to.

Example Code

My camera setup for the parallel run (simplified) is as follows:

sim_cfg = sim_utils.SimulationCfg(dt=0.01, use_gpu_pipeline=True, device="cuda:0")
sim = sim_utils.SimulationContext(sim_cfg)
...
# One camera per cloned environment, matched via the regex prim path.
camera_cfg = CameraCfg(
    prim_path="/World/envs/env_.*/Camera_RGB",
    update_period=0,
    height=2048,
    width=2048,
    data_types=["rgb"],
    spawn=sim_utils.PinholeCameraCfg(
        focal_length=24.0,
        focus_distance=400.0,
        horizontal_aperture=20.955,
        clipping_range=(0.1, 1.0e5),
    ),
)

...
while simulation_app.is_running():
    ...

    if count == 20:
        sim.pause()
        camera = Camera(cfg=camera_cfg)  # creates all cameras at once
        sim.play()
        # Step a few times so the renderer actually produces an image.
        for _ in range(2):
            sim.step()
        camera.update(sim_dt)
        camera_captures = camera.data.info[0]
        camera.__del__()  # force cleanup of the sensor

And my (simplified) setup for iterating through the environments:

sim_cfg = sim_utils.SimulationCfg(dt=0.01, use_gpu_pipeline=True, device="cuda:0")
sim = sim_utils.SimulationContext(sim_cfg)
...
# A single camera outside the env namespace, moved from env to env.
camera_cfg = CameraCfg(
    prim_path="/Cameras/Camera_RGB",
    update_period=0,
    height=2048,
    width=2048,
    data_types=["rgb"],
    spawn=sim_utils.PinholeCameraCfg(
        focal_length=24.0,
        focus_distance=400.0,
        horizontal_aperture=20.955,
        clipping_range=(0.1, 1.0e5),
    ),
)

...
while simulation_app.is_running():
    ...

    if count == 20:
        sim.pause()
        camera = Camera(cfg=camera_cfg)
        sim.play()

        camera_captures = []
        for env_index in envs_to_capture:
            # Place the camera at this environment's origin plus a fixed offset.
            camera_pos = envs_positions[env_index, :] + camera_offset_pos
            camera_rot = camera_offset_rot
            camera.set_world_poses(
                positions=camera_pos.unsqueeze(0),
                orientations=camera_rot.unsqueeze(0),
                convention="opengl",
            )
            # Step a few times so the renderer actually produces an image.
            for _ in range(2):
                sim.step()
            camera.update(dt=0.01)
            camera_captures.append(camera.data.info[0])
        camera.__del__()  # force cleanup of the sensor

System specs

  • Using the devel branch of Orbit
  • Commit: aaab27b
  • Isaac Sim Version: 2023.1.0-hotfix.1
  • OS: Ubuntu 22.04
  • GPU: RTX 3090
  • CUDA: 12.0
  • GPU Driver: 525.147.05
@XInyuSong000

Hello, I am encountering a similar issue. I believe that initializing the camera alone consumes a substantial amount of GPU memory. I have conducted tests on this; one camera alone appears to use approximately 1 GB of GPU memory. Therefore, it seems impossible to run over 1000 environments when utilizing cameras.
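For anyone who wants to reproduce that measurement, a minimal sketch using PyTorch's CUDA queries; the before/after bookkeeping is an assumption about how such a per-camera figure could be obtained, and note that mem_get_info reports all allocations on the device, not just PyTorch's:

import torch

# Query free device memory before and after creating the sensor.
free_before, total = torch.cuda.mem_get_info("cuda:0")

camera = Camera(cfg=camera_cfg)  # sensor creation as in the snippets above
for _ in range(2):
    sim.step()
camera.update(sim_dt)

free_after, _ = torch.cuda.mem_get_info("cuda:0")
print(f"approx. VRAM per camera: {(free_before - free_after) / 1024**3:.2f} GiB")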

@jnzhnd
Author

jnzhnd commented Dec 15, 2023

Well, at least I'm not the only one. That makes me wonder, though: where is all that memory usage coming from?

Also, if I do not call camera.update(), the simulation at least does not crash. Even though that does not help with the immediate issue, maybe it's an indication of where the issue is coming from.

@ninebestwon

I had a similar issue and tried to solve it but failed for various reasons. However, looking at the Orbit documentation linked below, I expect this problem to be solved once "Cameras (parallelized)", listed on the roadmap for July 2023, is officially released. It looks like there's a delay with Isaac Sim's updates.
https://isaac-orbit.github.io/orbit/source/refs/roadmap.html

@jnzhnd
Author

jnzhnd commented Dec 19, 2023

Well, the current devel branch supports parallelized cameras, so I think that feature is already implemented. I think something is causing the cameras to have a massive VRAM overhead, and it's just not feasible right now to have more than (in my case) 9 of them. It would be great to have official confirmation of this, though.

@Mayankm96
Contributor

Hi everyone,

Thanks for bringing up this discussion. We are going to restart the investigation into multiple cameras.

At least in our previous benchmarks, we could have two cameras per environment and go up to 8 environments at roughly 40 FPS on an RTX 3060. Beyond that, the simulation crashes because of VRAM issues.

I think one of the main factors there was using an app experience file similar to OIGE's:

https://github.com/NVIDIA-Omniverse/OmniIsaacGymEnvs/blob/main/apps/omni.isaac.sim.python.gym.camera.kit

We have yet to incorporate this into Orbit, but it should be something that can be added, similar to how we load different experience files in the workflows:

https://github.com/NVIDIA-Omniverse/Orbit/blob/devel/source/standalone/workflows/rsl_rl/train.py#L38-L41
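Something along these lines should work; the kit-file path below is a placeholder, and only the experience argument of SimulationApp is assumed here:

from omni.isaac.kit import SimulationApp

# Launch the app with a camera-oriented experience file, analogous to
# OIGE's omni.isaac.sim.python.gym.camera.kit.
config = {"headless": True}
experience = "/path/to/omni.isaac.sim.python.gym.camera.kit"  # placeholder
simulation_app = SimulationApp(config, experience=experience)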

On a side note, yes, the Camera sensor returns batched data. However, this doesn't mean the rendering is happening in parallel. Some work is going on to make this more efficient in Isaac Sim.

Of all of these, the most straightforward is to adapt Orbit's camera internals to address the TODO note mentioned here:

https://github.com/NVIDIA-Omniverse/Orbit/blob/devel/source/extensions/omni.isaac.orbit/omni/isaac/orbit/sensors/camera/camera.py#L387-L388

This, I think, should be possible now in Isaac Sim 2023.1. OIGE does some version of it:

https://github.com/NVIDIA-Omniverse/OmniIsaacGymEnvs/blob/main/omniisaacgymenvs/tasks/cartpole_camera.py#L97-L107
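Roughly, that pattern creates the render products once and attaches an "rgb" annotator to each, instead of re-creating rendering resources on every capture; a minimal sketch (camera paths, count, and resolution are placeholders):

import omni.replicator.core as rep

# One render product per camera prim, created once up front.
num_envs = 8  # placeholder
camera_paths = [f"/World/envs/env_{i}/Camera_RGB" for i in range(num_envs)]

annotators = []
for path in camera_paths:
    render_product = rep.create.render_product(path, resolution=(128, 128))
    annotator = rep.AnnotatorRegistry.get_annotator("rgb")
    annotator.attach([render_product])
    annotators.append(annotator)

# After stepping/rendering, read the images back:
images = [annotator.get_data() for annotator in annotators]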

I still have to get around to trying this out, but due to my schedule, it has been difficult lately. I would really appreciate help on it if possible.

@Mayankm96 Mayankm96 added the question Further information is requested label Dec 19, 2023
@mikygit

mikygit commented Feb 1, 2024

Hello, I am also facing this problem. Even 2 cameras per env consume a lot of memory, making any visual RL very difficult, if possible at all, even without rendering.
Is NVIDIA working on it? Can we expect a fix in the very near future?
Side question: is anybody doing visual RL using Isaac Sim?

@glvov-bdai
Collaborator

Resolved in the 1.2 release: https://isaac-sim.github.io/IsaacLab/source/overview/reinforcement-learning/performance_benchmarks.html#benchmark-results

Using the camera benchmark tool for custom environments, I was able to train with 1024 low-resolution cameras on an NVIDIA 3090 laptop GPU: #976
