
Memory Leak in Compute Benchmark #674

Closed
cwfitzgerald opened this issue May 27, 2020 · 5 comments
Labels
help required (We need community help to make this happen), type: bug (Something isn't working)

Comments

@cwfitzgerald (Member) commented May 27, 2020

Description

I am writing a couple of compute benchmarks with wgpu, and they leak memory quickly enough that the process aborts with out-of-memory within 4-5 seconds of running.

The leak only happens when the device and queue are persisted over the lifespan of the benchmark; when they are re-created on every loop iteration, there is no leak. Re-creating them is obviously a heavy-handed way to stop a leak, and it causes its own issues on nvidia, which does not appreciate a new adapter/device being requested 10k times a second.

I have only been able to reproduce this on intel/linux, but that may be because nvidia/windows has other issues that prevent me from getting this far.
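For reference, the overall shape of the benchmark is roughly the following. This is a simplified sketch, not the code from the repo: `setup` and `run_one_iteration` are hypothetical placeholders, and the wgpu adapter/device request is elided because the exact calls depend on the wgpu version in use.

```rust
use std::time::Instant;

use criterion::{criterion_group, criterion_main, BenchmarkId, Criterion};

// Hypothetical placeholder: the real benchmark requests an adapter and device
// once, driving the async work with async_std/smol. The wgpu setup calls are
// elided here.
fn setup() -> (wgpu::Device, wgpu::Queue) {
    unimplemented!("adapter/device request elided")
}

// Hypothetical placeholder for the per-iteration work: upload inputs, dispatch
// the compute shader, and map a staging buffer to read the result back.
fn run_one_iteration(_device: &wgpu::Device, _queue: &wgpu::Queue, _size: usize) {}

fn bench(c: &mut Criterion) {
    // Persisting the device/queue across iterations is what triggers the leak;
    // re-creating them inside the loop avoids it, but is far too expensive.
    let (device, queue) = setup();

    let mut group = c.benchmark_group("addition");
    group.bench_with_input(
        BenchmarkId::new("gpu staging", 100_000),
        &100_000usize,
        |b, &size| {
            b.iter_custom(|iters| {
                let start = Instant::now();
                for _ in 0..iters {
                    run_one_iteration(&device, &queue, size);
                }
                start.elapsed()
            });
        },
    );
    group.finish();
}

criterion_group!(benches, bench);
criterion_main!(benches);
```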

Repro steps

Repo: https://github.com/cwfitzgerald/wgpu-heterogeneous-compute-benchmark/

Working commit: 60be3275984f16b50eb6f45f9c618d6e33e10e07
Leaking commit: 09ce658cd8e968d44efd511ff575ff6344d56b1b

cargo bench -- 'addition/gpu staging/100000'

It should work out of the box and crash within 5-10 seconds, depending on exact circumstances.

Expected vs observed behavior

Expected: no leaks
Observed: leaks 😄

Extra materials
Logging from wgpu and backtrace:

Benchmarking addition/gpu staging/100000: Collecting 10 samples in estimated 5.0141 s (2310 iterations)thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: AllocationError(OutOfMemory(Device))', src/libcore/result.rs:1188:5
stack backtrace:
   0:     0x55d24cc737c4 - backtrace::backtrace::libunwind::trace::heb43798aede8bd30
                               at /cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.40/src/backtrace/libunwind.rs:88
   1:     0x55d24cc737c4 - backtrace::backtrace::trace_unsynchronized::had2ba7dec4bd2732
                               at /cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.40/src/backtrace/mod.rs:66
   2:     0x55d24cc737c4 - std::sys_common::backtrace::_print_fmt::hda61f46e822731b2
                               at src/libstd/sys_common/backtrace.rs:84
   3:     0x55d24cc737c4 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::hfe37fa5de6572965
                               at src/libstd/sys_common/backtrace.rs:61
   4:     0x55d24cc9a18c - core::fmt::write::h74887d18db27282c
                               at src/libcore/fmt/mod.rs:1025
   5:     0x55d24cc6fd97 - std::io::Write::write_fmt::h6808f3d5eceed5e5
                               at src/libstd/io/mod.rs:1426
   6:     0x55d24cc7595e - std::sys_common::backtrace::_print::hcc0fd4b3552039ef
                               at src/libstd/sys_common/backtrace.rs:65
   7:     0x55d24cc7595e - std::sys_common::backtrace::print::h1c9c5c1c0505592d
                               at src/libstd/sys_common/backtrace.rs:50
   8:     0x55d24cc7595e - std::panicking::default_hook::{{closure}}::hefb6085c1ab83a59
                               at src/libstd/panicking.rs:193
   9:     0x55d24cc75651 - std::panicking::default_hook::h1b037d2bf0657ab3
                               at src/libstd/panicking.rs:210
  10:     0x55d24cc7603b - std::panicking::rust_panic_with_hook::h787d7f532b084b9a
                               at src/libstd/panicking.rs:471
  11:     0x55d24cc75bee - rust_begin_unwind
                               at src/libstd/panicking.rs:375
  12:     0x55d24cc968be - core::panicking::panic_fmt::h76b979c035808e69
                               at src/libcore/panicking.rs:84
  13:     0x55d24cc969b7 - core::result::unwrap_failed::hca6a012bfa3eb903
                               at src/libcore/result.rs:1188
  14:     0x55d24ca1a534 - core::result::Result<T,E>::unwrap::ha379519e517554fa
                               at /rustc/5e1a799842ba6ed4a57e91f7ab9435947482f7d8/src/libcore/result.rs:956
  15:     0x55d24ca1a534 - wgpu_core::device::Device<B>::create_buffer::h532ca422b2e696b3
                               at /home/connor/.cargo/git/checkouts/wgpu-53e70f8674b08dd4/4e1d760/wgpu-core/src/device/mod.rs:358
  16:     0x55d24c9f5fa4 - wgpu_core::device::<impl wgpu_core::hub::Global<G>>::device_create_buffer_mapped::hb2a72ad595888c08
                               at /home/connor/.cargo/git/checkouts/wgpu-53e70f8674b08dd4/4e1d760/wgpu-core/src/device/mod.rs:595
  17:     0x55d24c9f5fa4 - wgpu::backend::direct::<impl wgpu::Context for wgpu_core::hub::Global<wgpu_core::hub::IdentityManagerFactory>>::device_create_buffer_mapped::h8b372ec79f9de8d8
                               at /home/connor/.cargo/git/checkouts/wgpu-rs-56ea64d4f2a98dfe/d12d142/src/backend/direct.rs:507
  18:     0x55d24c9dfeec - wgpu::Device::create_buffer_mapped::hf57904837fae2a39
                               at /home/connor/.cargo/git/checkouts/wgpu-rs-56ea64d4f2a98dfe/d12d142/src/lib.rs:1087
  19:     0x55d24c9dfeec - wgpu::Device::create_buffer_with_data::h028bfa3f3cbc51fa
                               at /home/connor/.cargo/git/checkouts/wgpu-rs-56ea64d4f2a98dfe/d12d142/src/lib.rs:1099
  20:     0x55d24c9b04a2 - wgpu_heterogeneous_compute_benchmark::AutomatedBuffer::write_to_buffer::h0fb827ef9d9cedbf
                               at src/lib.rs:282
  21:     0x55d24c93e588 - <std::future::GenFuture<T> as core::future::future::Future>::poll::h91e7690e90620556
  22:     0x55d24c9782ee - std::thread::local::LocalKey<T>::with::hc4fafc845f2f548a
  23:     0x55d24c98a6c7 - scoped_tls_hkt::ScopedKey<T>::set::h93fd353cd9387449
  24:     0x55d24c939eee - smol::context::enter::h08ef4095d2fa1c67
  25:     0x55d24c98a5d5 - scoped_tls_hkt::ScopedKey<T>::set::h8f7feaba3213bb85
  26:     0x55d24c944ea9 - smol::work_stealing::Worker::enter::hb79394980338ed8d
  27:     0x55d24c95671f - smol::run::run::h8e18d287f93ea8d5
  28:     0x55d24c977e25 - std::thread::local::LocalKey<T>::with::h3752533907ce3c45
  29:     0x55d24c94622e - async_std::task::builder::Builder::blocking::h09b41ca39c3a4f81
  30:     0x55d24c96d083 - criterion::Bencher<M>::iter_custom::h55eb30a19d7afbdf
  31:     0x55d24c992ed7 - <core::iter::adapters::Map<I,F> as core::iter::traits::iterator::Iterator>::fold::h77000ed80379eab0
  32:     0x55d24c94b5b4 - criterion::routine::Routine::sample::hc24189b61c193317
  33:     0x55d24c981dde - criterion::analysis::common::hf168a60acc2b6d98
  34:     0x55d24c98ecbc - criterion::benchmark_group::BenchmarkGroup<M>::bench_with_input::h3ea46728e4186a42
  35:     0x55d24c961790 - addition::main::hf2fcc76aa22923a7
  36:     0x55d24c95c7d3 - std::rt::lang_start::{{closure}}::h4d84e6073a362e96
  37:     0x55d24cc75a83 - std::rt::lang_start_internal::{{closure}}::h0760fb8bd9f1a4c7
                               at src/libstd/rt.rs:52
  38:     0x55d24cc75a83 - std::panicking::try::do_call::hccaa7cebf2335ab2
                               at src/libstd/panicking.rs:292
  39:     0x55d24cc7e6ea - __rust_maybe_catch_panic
                               at src/libpanic_unwind/lib.rs:78
  40:     0x55d24cc76590 - std::panicking::try::h3ce8e2e4440720f0
                               at src/libstd/panicking.rs:270
  41:     0x55d24cc76590 - std::panic::catch_unwind::h2a767bac361346af
                               at src/libstd/panic.rs:394
  42:     0x55d24cc76590 - std::rt::lang_start_internal::h14e7168ba039f170
                               at src/libstd/rt.rs:51
  43:     0x55d24c961922 - main
  44:     0x7f5ac6ea31e3 - __libc_start_main
  45:     0x55d24c9241be - _start
  46:                0x0 - <unknown>

[2020-05-27T04:19:28Z ERROR gfx_memory::heaps] Heaps still have 2 types live on drop
[2020-05-27T04:19:28Z ERROR gfx_memory::allocator::general] Memory leak: SizeEntry(400128) is still used
[2020-05-27T04:19:28Z ERROR gfx_memory::allocator::general] Memory leak: SizeEntry(256) is still used
[2020-05-27T04:19:28Z ERROR gfx_memory::allocator::general] Memory leak: SizeEntry(2048) is still used
[2020-05-27T04:19:28Z ERROR gfx_memory::allocator::general] Memory leak: SizeEntry(16384) is still used
[2020-05-27T04:19:28Z ERROR gfx_memory::allocator::general] Memory leak: SizeEntry(131072) is still used
[2020-05-27T04:19:28Z ERROR gfx_descriptor::allocator] DescriptorAllocator is dropped
error: bench failed

Info logging (7mb): https://send.firefox.com/download/9e4a921bd2dc7849/#mC2AyYMgPXxYcCutPl3dnA
Trace logging (700mb!): https://send.firefox.com/download/f9d087c4b8a5d796/#3HJq8OJBws9pn6ftEAjdHA

I feel like debug/trace logging might give some interesting information.
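For anyone who would rather regenerate these locally than download them: assuming the benchmark initializes env_logger (the timestamped error lines above suggest it does), something along the lines of

RUST_LOG=info cargo bench -- 'addition/gpu staging/100000' 2> info.log

should capture the info-level output (swap info for trace for the much larger trace output).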

Platform
wgpu-rs master branch

I've only tested the vulkan backend.

@cwfitzgerald added the type: bug (Something isn't working) label on May 27, 2020
@cwfitzgerald (Member, Author) commented May 27, 2020

So I have narrowed this down to the code that maps the buffer for readback of the result.

Check out the branch wgpu-bug-674 and the bug will still happen. Remove these lines and the bug goes away: https://github.com/cwfitzgerald/wgpu-heterogeneous-compute-benchmark/blob/wgpu-bug-674/src/lib.rs#L466-L470.

The code is a bit hard to navigate because of my buffer abstraction, so a few key points:

My initial hunch is that for some reason the buffer isn't getting unmapped properly, so it can't be freed.

I am by no means skilled at reading wgpu logs, but the trace logs seem to indicate something similar to what I suspected: Buffer (5, 1, Vulkan) gets dropped while a submission is underway, so it is marked Active and its memory isn't reclaimed. When it is later marked as idle, its underlying memory never appears to get dropped.

I have tried to reproduce it with just buffer copies, but was unsuccessful, so this is probably the smallest reproducible unit.
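For anyone skimming, the readback in question follows the usual staging-buffer pattern. Below is a minimal sketch of that pattern, written against a newer wgpu API than the 0.5-era one the repo uses (the mapping API has changed since then); the names and exact calls are illustrative, not the repo's code.

```rust
// Copy `size` bytes out of `src` into a freshly created staging buffer, map it,
// read the data, and explicitly unmap. In the failing benchmark the staging
// buffer is dropped afterwards; the suspicion in this issue is that a buffer
// dropped while its submission is still Active never gets its memory reclaimed.
fn read_back(
    device: &wgpu::Device,
    queue: &wgpu::Queue,
    src: &wgpu::Buffer,
    size: wgpu::BufferAddress,
) -> Vec<u8> {
    let staging = device.create_buffer(&wgpu::BufferDescriptor {
        label: Some("staging"),
        size,
        usage: wgpu::BufferUsages::MAP_READ | wgpu::BufferUsages::COPY_DST,
        mapped_at_creation: false,
    });

    // Record and submit the GPU-side copy into the staging buffer.
    let mut encoder =
        device.create_command_encoder(&wgpu::CommandEncoderDescriptor::default());
    encoder.copy_buffer_to_buffer(src, 0, &staging, 0, size);
    queue.submit(Some(encoder.finish()));

    // Map the staging buffer, wait for the map to complete, copy the contents
    // out, and unmap before the buffer goes out of scope.
    let slice = staging.slice(..);
    slice.map_async(wgpu::MapMode::Read, |result| result.unwrap());
    device.poll(wgpu::Maintain::Wait);
    let data = slice.get_mapped_range().to_vec();
    staging.unmap();
    data
}
```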

@cwfitzgerald (Member, Author)
In trying to get more information about another issue, I have found that this memory leak still exists on nvidia/windows. The TrackerSets just keep growing, seemingly without bound, even though the work I'm doing should keep them very finite. I do wonder if my issue on nvidia is actually the same bug as this one, just manifesting differently. I will file an issue about that for record-keeping's sake, in case it is different, and to give more information.

@cwfitzgerald (Member, Author) commented May 28, 2020

I have been able to reproduce this bug on all platforms that I've tried, so hopefully it will be easy to reproduce locally:

  • intel/linux/vulkan
  • intel/windows/vulkan*
  • nvidia/windows/vulkan*
  • intel/windows/DX12
  • nvidia/windows/DX12

* stalls due to #677, but growing tracker sets can be observed in debug logs.

@yoonsikp

I'm experiencing this leak on macOS as well.

@kvark added the help required (We need community help to make this happen) label on Jan 18, 2021
@cwfitzgerald (Member, Author)

Out of date
