GPU memory leaked when destructing warp.Mesh #225

Closed
MaxWipfli opened this issue May 28, 2024 · 3 comments


MaxWipfli commented May 28, 2024

We noticed that GPU memory usage increases when repeatedly creating (and destroying) warp.Mesh objects.

Minimal Example:

import warp as wp
import pynvml       # pip install pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
wp.init()

device = "cuda:0"
points = wp.array([[0, 0, 0], [1, 0, 0], [0, 1, 0]], dtype=wp.vec3, device=device)
indices = wp.array([0, 1, 2, 0, 1, 2, 0, 1, 2], dtype=wp.int32, device=device)

for i in range(10_000_000):
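    if i % 100_000 == 0:
        # Device-wide memory in use as reported by NVML, converted to MiB
        gpu_ram_usage = pynvml.nvmlDeviceGetMemoryInfo(handle).used / 1024 ** 2
        print(f"iter = {i // 1000:5d}k, VRAM usage = {gpu_ram_usage:.0f} MiB")
    # Rebinding `mesh` drops the only reference to the previous Mesh, which is then destroyed
    mesh = wp.Mesh(points, indices)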

Output:

   CUDA Toolkit 12.3, Driver 12.3
   Devices:
     "cpu"      : "x86_64"
     "cuda:0"   : "NVIDIA GeForce RTX 2080 SUPER" (8 GiB, sm_75, mempool enabled)
[...]
iter =     0k, VRAM usage = 521 MiB
iter =   100k, VRAM usage = 565 MiB
iter =   200k, VRAM usage = 629 MiB
[...]
iter =  1900k, VRAM usage = 1429 MiB

GPU memory usage increases steadily, even though each Mesh is destroyed as soon as the next one replaces it.
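For a rough sense of scale, the first and last readings in the output above can be turned into an average leak per Mesh. This is only a back-of-the-envelope estimate: NVML reports device-wide usage, so other processes and allocator granularity can skew it.

```python
# Readings copied from the output above: (iteration, VRAM usage in MiB).
first_iter, first_mib = 0, 521
last_iter, last_mib = 1_900_000, 1429

# Each loop iteration creates exactly one wp.Mesh, so iterations == meshes created.
leaked_bytes = (last_mib - first_mib) * 1024 ** 2
bytes_per_mesh = leaked_bytes / (last_iter - first_iter)

print(f"leaked ~ {bytes_per_mesh:.0f} bytes per Mesh")
```

A figure of a few hundred bytes per Mesh points to a small fixed-size allocation being lost on every create/destroy cycle rather than the mesh data itself.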

This has been tested on the latest main commit (ebcc90d). As far as we can tell, there is no host memory leak when using device = "cpu".

@MaxWipfli
Author

After an initial investigation, the problem seems to be the following:

  • When creating a mesh (in mesh_create_device), a BVH is created as follows:

    warp/warp/native/mesh.cu

    Lines 211 to 212 in ebcc90d

    uint64_t bvh_id = bvh_create_device(mesh.context, mesh.lowers, mesh.uppers, num_tris);
    wp::bvh_get_descriptor(bvh_id, mesh.bvh);
  • When destroying the mesh again (in mesh_destroy_device), the BVH is destroyed as follows:
    wp::bvh_destroy_device(mesh.bvh);

During creation, the following memory block is allocated on the device (in bvh_create_device):

wp::BVH* bvh_device = (wp::BVH*)alloc_device(WP_CURRENT_CONTEXT, sizeof(wp::BVH));

This allocation does not have a corresponding free_device() call and is thus leaked.
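The pattern described above can be illustrated with a toy model in plain Python. This is a sketch only, not Warp's code: `ToyDeviceAllocator`, `bvh_create`, `bvh_destroy_leaky`, `bvh_destroy_fixed`, and the block sizes are hypothetical names and values chosen for illustration. It shows how one unfreed descriptor block per create/destroy cycle adds up, and how freeing it in the destroy path (one plausible shape of a fix) removes the growth.

```python
class ToyDeviceAllocator:
    """Hypothetical stand-in for a device allocator that tracks live blocks."""

    def __init__(self):
        self._next_ptr = 1
        self.live = {}  # fake pointer -> block size in bytes

    def alloc(self, size):
        ptr = self._next_ptr
        self._next_ptr += 1
        self.live[ptr] = size
        return ptr

    def free(self, ptr):
        del self.live[ptr]

    def bytes_in_use(self):
        return sum(self.live.values())


DESCRIPTOR_SIZE = 64  # assumed placeholder, not the real sizeof(wp::BVH)


def bvh_create(allocator, num_tris):
    # Working storage for the tree plus one small device-side descriptor block.
    nodes = allocator.alloc(num_tris * 32)
    descriptor = allocator.alloc(DESCRIPTOR_SIZE)
    return {"nodes": nodes, "descriptor": descriptor}


def bvh_destroy_leaky(allocator, bvh):
    # Frees the tree storage but forgets the descriptor block.
    allocator.free(bvh["nodes"])


def bvh_destroy_fixed(allocator, bvh):
    # Frees every block that bvh_create allocated.
    allocator.free(bvh["nodes"])
    allocator.free(bvh["descriptor"])


def run(destroy, iterations=1000):
    allocator = ToyDeviceAllocator()
    for _ in range(iterations):
        destroy(allocator, bvh_create(allocator, num_tris=3))
    return allocator.bytes_in_use()


leaky_bytes = run(bvh_destroy_leaky)
fixed_bytes = run(bvh_destroy_fixed)
print(f"leaky: {leaky_bytes} bytes still allocated, fixed: {fixed_bytes} bytes")
```

In the leaky variant, the bytes still allocated grow linearly with the number of iterations, matching the steady VRAM increase in the reproduction; in the fixed variant they return to zero.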

I am not well-versed enough with this code base to propose a nice fix. However, here is a "hacky" patch that resolves the problem: https://gist.github.com/MaxWipfli/3197354809752d377dd90bbd108e1992

@nvlukasz
Contributor

Thanks @MaxWipfli, nice catch! Your fix is on the right track. I'll take a closer look and we'll get this leak patched up asap.

@nvlukasz
Contributor

Fix is now in main
