
Memory Leak in Compute Benchmark #674

Closed
cwfitzgerald opened this issue May 27, 2020 · 5 comments
Labels
help required (We need community help to make this happen), type: bug (Something isn't working)

Comments

@cwfitzgerald (Member) commented May 27, 2020

Description

I am writing a couple of compute benchmarks with wgpu, and they leak memory quickly enough that the process aborts with out-of-memory within 4-5 seconds of running.

The leak only happens when the device and queue are persisted over the lifespan of the benchmark; when they are re-created on every loop iteration, there is no leak. Re-creating them is obviously a heavy-handed way to stop a leak, and it causes its own issues on nvidia, which does not appreciate a new adapter/device being requested 10k times a second.

I have only been able to reproduce this on intel/linux, but that may be because nvidia/windows has other issues that prevent me from getting this far.
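For reference, the overall shape of the benchmark is roughly the following. This is a simplified sketch, not the code from the repo: `setup` and `run_one_iteration` are hypothetical placeholders, and the wgpu adapter/device request is elided because the exact calls depend on the wgpu version in use.

```rust
use std::time::Instant;

use criterion::{criterion_group, criterion_main, BenchmarkId, Criterion};

// Hypothetical placeholder: the real benchmark requests an adapter and device
// once, driving the async work with async_std/smol. The wgpu setup calls are
// elided here.
fn setup() -> (wgpu::Device, wgpu::Queue) {
    unimplemented!("adapter/device request elided")
}

// Hypothetical placeholder for the per-iteration work: upload inputs, dispatch
// the compute shader, and map a staging buffer to read the result back.
fn run_one_iteration(_device: &wgpu::Device, _queue: &wgpu::Queue, _size: usize) {}

fn bench(c: &mut Criterion) {
    // Persisting the device/queue across iterations is what triggers the leak;
    // re-creating them inside the loop avoids it, but is far too expensive.
    let (device, queue) = setup();

    let mut group = c.benchmark_group("addition");
    group.bench_with_input(
        BenchmarkId::new("gpu staging", 100_000),
        &100_000usize,
        |b, &size| {
            b.iter_custom(|iters| {
                let start = Instant::now();
                for _ in 0..iters {
                    run_one_iteration(&device, &queue, size);
                }
                start.elapsed()
            });
        },
    );
    group.finish();
}

criterion_group!(benches, bench);
criterion_main!(benches);
```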

Repro steps

Repo: https://github.com/cwfitzgerald/wgpu-heterogeneous-compute-benchmark/

Working commit: 60be3275984f16b50eb6f45f9c618d6e33e10e07
Leaking commit: 09ce658cd8e968d44efd511ff575ff6344d56b1b

cargo bench -- 'addition/gpu staging/100000'

It should work out of the box and crash within 5-10 seconds, depending on exact circumstances.

Expected vs observed behavior

Expected: no leaks
Observed: leaks 😄

Extra materials
Logging from wgpu and backtrace:

Benchmarking addition/gpu staging/100000: Collecting 10 samples in estimated 5.0141 s (2310 iterations)thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: AllocationError(OutOfMemory(Device))', src/libcore/result.rs:1188:5
stack backtrace:
   0:     0x55d24cc737c4 - backtrace::backtrace::libunwind::trace::heb43798aede8bd30
                               at /cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.40/src/backtrace/libunwind.rs:88
   1:     0x55d24cc737c4 - backtrace::backtrace::trace_unsynchronized::had2ba7dec4bd2732
                               at /cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.40/src/backtrace/mod.rs:66
   2:     0x55d24cc737c4 - std::sys_common::backtrace::_print_fmt::hda61f46e822731b2
                               at src/libstd/sys_common/backtrace.rs:84
   3:     0x55d24cc737c4 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::hfe37fa5de6572965
                               at src/libstd/sys_common/backtrace.rs:61
   4:     0x55d24cc9a18c - core::fmt::write::h74887d18db27282c
                               at src/libcore/fmt/mod.rs:1025
   5:     0x55d24cc6fd97 - std::io::Write::write_fmt::h6808f3d5eceed5e5
                               at src/libstd/io/mod.rs:1426
   6:     0x55d24cc7595e - std::sys_common::backtrace::_print::hcc0fd4b3552039ef
                               at src/libstd/sys_common/backtrace.rs:65
   7:     0x55d24cc7595e - std::sys_common::backtrace::print::h1c9c5c1c0505592d
                               at src/libstd/sys_common/backtrace.rs:50
   8:     0x55d24cc7595e - std::panicking::default_hook::{{closure}}::hefb6085c1ab83a59
                               at src/libstd/panicking.rs:193
   9:     0x55d24cc75651 - std::panicking::default_hook::h1b037d2bf0657ab3
                               at src/libstd/panicking.rs:210
  10:     0x55d24cc7603b - std::panicking::rust_panic_with_hook::h787d7f532b084b9a
                               at src/libstd/panicking.rs:471
  11:     0x55d24cc75bee - rust_begin_unwind
                               at src/libstd/panicking.rs:375
  12:     0x55d24cc968be - core::panicking::panic_fmt::h76b979c035808e69
                               at src/libcore/panicking.rs:84
  13:     0x55d24cc969b7 - core::result::unwrap_failed::hca6a012bfa3eb903
                               at src/libcore/result.rs:1188
  14:     0x55d24ca1a534 - core::result::Result<T,E>::unwrap::ha379519e517554fa
                               at /rustc/5e1a799842ba6ed4a57e91f7ab9435947482f7d8/src/libcore/result.rs:956
  15:     0x55d24ca1a534 - wgpu_core::device::Device<B>::create_buffer::h532ca422b2e696b3
                               at /home/connor/.cargo/git/checkouts/wgpu-53e70f8674b08dd4/4e1d760/wgpu-core/src/device/mod.rs:358
  16:     0x55d24c9f5fa4 - wgpu_core::device::<impl wgpu_core::hub::Global<G>>::device_create_buffer_mapped::hb2a72ad595888c08
                               at /home/connor/.cargo/git/checkouts/wgpu-53e70f8674b08dd4/4e1d760/wgpu-core/src/device/mod.rs:595
  17:     0x55d24c9f5fa4 - wgpu::backend::direct::<impl wgpu::Context for wgpu_core::hub::Global<wgpu_core::hub::IdentityManagerFactory>>::device_create_buffer_mapped::h8b372ec79f9de8d8
                               at /home/connor/.cargo/git/checkouts/wgpu-rs-56ea64d4f2a98dfe/d12d142/src/backend/direct.rs:507
  18:     0x55d24c9dfeec - wgpu::Device::create_buffer_mapped::hf57904837fae2a39
                               at /home/connor/.cargo/git/checkouts/wgpu-rs-56ea64d4f2a98dfe/d12d142/src/lib.rs:1087
  19:     0x55d24c9dfeec - wgpu::Device::create_buffer_with_data::h028bfa3f3cbc51fa
                               at /home/connor/.cargo/git/checkouts/wgpu-rs-56ea64d4f2a98dfe/d12d142/src/lib.rs:1099
  20:     0x55d24c9b04a2 - wgpu_heterogeneous_compute_benchmark::AutomatedBuffer::write_to_buffer::h0fb827ef9d9cedbf
                               at src/lib.rs:282
  21:     0x55d24c93e588 - <std::future::GenFuture<T> as core::future::future::Future>::poll::h91e7690e90620556
  22:     0x55d24c9782ee - std::thread::local::LocalKey<T>::with::hc4fafc845f2f548a
  23:     0x55d24c98a6c7 - scoped_tls_hkt::ScopedKey<T>::set::h93fd353cd9387449
  24:     0x55d24c939eee - smol::context::enter::h08ef4095d2fa1c67
  25:     0x55d24c98a5d5 - scoped_tls_hkt::ScopedKey<T>::set::h8f7feaba3213bb85
  26:     0x55d24c944ea9 - smol::work_stealing::Worker::enter::hb79394980338ed8d
  27:     0x55d24c95671f - smol::run::run::h8e18d287f93ea8d5
  28:     0x55d24c977e25 - std::thread::local::LocalKey<T>::with::h3752533907ce3c45
  29:     0x55d24c94622e - async_std::task::builder::Builder::blocking::h09b41ca39c3a4f81
  30:     0x55d24c96d083 - criterion::Bencher<M>::iter_custom::h55eb30a19d7afbdf
  31:     0x55d24c992ed7 - <core::iter::adapters::Map<I,F> as core::iter::traits::iterator::Iterator>::fold::h77000ed80379eab0
  32:     0x55d24c94b5b4 - criterion::routine::Routine::sample::hc24189b61c193317
  33:     0x55d24c981dde - criterion::analysis::common::hf168a60acc2b6d98
  34:     0x55d24c98ecbc - criterion::benchmark_group::BenchmarkGroup<M>::bench_with_input::h3ea46728e4186a42
  35:     0x55d24c961790 - addition::main::hf2fcc76aa22923a7
  36:     0x55d24c95c7d3 - std::rt::lang_start::{{closure}}::h4d84e6073a362e96
  37:     0x55d24cc75a83 - std::rt::lang_start_internal::{{closure}}::h0760fb8bd9f1a4c7
                               at src/libstd/rt.rs:52
  38:     0x55d24cc75a83 - std::panicking::try::do_call::hccaa7cebf2335ab2
                               at src/libstd/panicking.rs:292
  39:     0x55d24cc7e6ea - __rust_maybe_catch_panic
                               at src/libpanic_unwind/lib.rs:78
  40:     0x55d24cc76590 - std::panicking::try::h3ce8e2e4440720f0
                               at src/libstd/panicking.rs:270
  41:     0x55d24cc76590 - std::panic::catch_unwind::h2a767bac361346af
                               at src/libstd/panic.rs:394
  42:     0x55d24cc76590 - std::rt::lang_start_internal::h14e7168ba039f170
                               at src/libstd/rt.rs:51
  43:     0x55d24c961922 - main
  44:     0x7f5ac6ea31e3 - __libc_start_main
  45:     0x55d24c9241be - _start
  46:                0x0 - <unknown>

[2020-05-27T04:19:28Z ERROR gfx_memory::heaps] Heaps still have 2 types live on drop
[2020-05-27T04:19:28Z ERROR gfx_memory::allocator::general] Memory leak: SizeEntry(400128) is still used
[2020-05-27T04:19:28Z ERROR gfx_memory::allocator::general] Memory leak: SizeEntry(256) is still used
[2020-05-27T04:19:28Z ERROR gfx_memory::allocator::general] Memory leak: SizeEntry(2048) is still used
[2020-05-27T04:19:28Z ERROR gfx_memory::allocator::general] Memory leak: SizeEntry(16384) is still used
[2020-05-27T04:19:28Z ERROR gfx_memory::allocator::general] Memory leak: SizeEntry(131072) is still used
[2020-05-27T04:19:28Z ERROR gfx_descriptor::allocator] DescriptorAllocator is dropped
error: bench failed

Info logging (7mb): https://send.firefox.com/download/9e4a921bd2dc7849/#mC2AyYMgPXxYcCutPl3dnA
Trace logging (700mb!): https://send.firefox.com/download/f9d087c4b8a5d796/#3HJq8OJBws9pn6ftEAjdHA

I feel like debug/trace logging might give some interesting information.
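For anyone who would rather regenerate these locally than download them: assuming the benchmark initializes env_logger (the timestamped error lines above suggest it does), something along the lines of

RUST_LOG=info cargo bench -- 'addition/gpu staging/100000' 2> info.log

should capture the info-level output (swap info for trace for the much larger trace output).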

Platform
wgpu-rs master branch

I've only tested the vulkan backend.

@cwfitzgerald added the type: bug (Something isn't working) label on May 27, 2020
@cwfitzgerald (Member, Author) commented May 27, 2020

So I have narrowed this down to the code that maps the buffer for readback of the result.

Check out the branch wgpu-bug-674 and the bug will still happen. Remove these lines and the bug goes away: https://github.com/cwfitzgerald/wgpu-heterogeneous-compute-benchmark/blob/wgpu-bug-674/src/lib.rs#L466-L470.

The code is a bit hard to navigate because of my buffer abstraction, so a few key points:

My initial hunch is that for some reason the buffer isn't getting unmapped properly, so it can't be freed.

I am by no means skilled at reading wgpu logs, but the trace logs seem to indicate something similar to what I suspected: Buffer (5, 1, Vulkan) gets dropped while a submission is underway, so it is marked Active and its memory isn't reclaimed. When it is later marked as idle, its underlying memory never appears to get dropped.

I have tried to reproduce it with just buffer copies, but was unsuccessful, so this is probably the smallest reproducible unit.
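For anyone skimming, the readback in question follows the usual staging-buffer pattern. Below is a minimal sketch of that pattern, written against a newer wgpu API than the 0.5-era one the repo uses (the mapping API has changed since then); the names and exact calls are illustrative, not the repo's code.

```rust
// Copy `size` bytes out of `src` into a freshly created staging buffer, map it,
// read the data, and explicitly unmap. In the failing benchmark the staging
// buffer is dropped afterwards; the suspicion in this issue is that a buffer
// dropped while its submission is still Active never gets its memory reclaimed.
fn read_back(
    device: &wgpu::Device,
    queue: &wgpu::Queue,
    src: &wgpu::Buffer,
    size: wgpu::BufferAddress,
) -> Vec<u8> {
    let staging = device.create_buffer(&wgpu::BufferDescriptor {
        label: Some("staging"),
        size,
        usage: wgpu::BufferUsages::MAP_READ | wgpu::BufferUsages::COPY_DST,
        mapped_at_creation: false,
    });

    // Record and submit the GPU-side copy into the staging buffer.
    let mut encoder =
        device.create_command_encoder(&wgpu::CommandEncoderDescriptor::default());
    encoder.copy_buffer_to_buffer(src, 0, &staging, 0, size);
    queue.submit(Some(encoder.finish()));

    // Map the staging buffer, wait for the map to complete, copy the contents
    // out, and unmap before the buffer goes out of scope.
    let slice = staging.slice(..);
    slice.map_async(wgpu::MapMode::Read, |result| result.unwrap());
    device.poll(wgpu::Maintain::Wait);
    let data = slice.get_mapped_range().to_vec();
    staging.unmap();
    data
}
```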

@cwfitzgerald (Member, Author)
In trying to get more information about another issue, I have found that this memory leak still exists on nvidia/windows. The TrackerSets just keep growing, seemingly without bound, even though the work I'm doing should keep them very finite. I do wonder if my issue on nvidia is actually the same bug as this one, just manifesting differently. I will file an issue about that for record-keeping's sake, in case it is different, and to give more information.

@cwfitzgerald (Member, Author) commented May 28, 2020

I have been able to reproduce this bug on all platforms that I've tried, so hopefully it will be easy to reproduce locally:

  • intel/linux/vulkan
  • intel/windows/vulkan*
  • nvidia/windows/vulkan*
  • intel/windows/DX12
  • nvidia/windows/DX12

* stalls due to #677, but growing tracker sets can be observed in debug logs.

@yoonsikp

I'm experiencing this leak on macOS as well.

@kvark added the help required (We need community help to make this happen) label on Jan 18, 2021
@cwfitzgerald (Member, Author)

Out of date
