Buffers Leaked when Device Handle is Dropped Before Buffers #5529
Comments
After reading through #3056, I tried the suggestion there, but it looks like similar messages are printed with wgpu 0.18.0 (when the program doesn't crash), so I'm not sure whether this is significant.
I was able to simplify the repro a lot more. It turns out that the PNG export workflow isn't a relevant factor, and the crash can be reproduced by repeatedly allocating a vertex buffer. For example:

```rust
use wgpu::util::DeviceExt;
pub async fn run() {
let instance = wgpu::Instance::new(wgpu::InstanceDescriptor {
backends: wgpu::Backends::METAL,
..Default::default()
});
let adapter = instance
.request_adapter(&wgpu::RequestAdapterOptions {
power_preference: wgpu::PowerPreference::default(),
compatible_surface: None,
force_fallback_adapter: false,
})
.await
.unwrap();
// // wgpu 0.18
// let device_descriptor = wgpu::DeviceDescriptor {
// label: None,
// features: wgpu::Features::empty(),
// limits: wgpu::Limits::default(),
// };
// wgpu 0.19.3
let device_descriptor = wgpu::DeviceDescriptor {
label: None,
required_features: wgpu::Features::empty(),
required_limits: wgpu::Limits::default(),
};
let (device, _) = adapter
.request_device(&device_descriptor, None)
.await
.unwrap();
// Allocate a vertex buffer that should be dropped immediately.
// Without this allocation, there is no leak/crash.
let _ = device.create_buffer_init(&wgpu::util::BufferInitDescriptor {
label: Some("Vertex Buffer"),
contents: bytemuck::cast_slice(&[0; 1024]),
usage: wgpu::BufferUsages::VERTEX,
});
}
fn main() {
for i in 0..1000 {
if i % 10 == 0 {
println!("{i}");
}
pollster::block_on(run());
}
}
```
Does the leak happen if you wrap the entire thing in an autorelease pool? I guess you might be noticing this because it's running headless and nothing provides a global autorelease pool (vs. a windowed application that would have an autorelease pool automatically added by the windowing library).
Thanks for the suggestion @grovesNL. I'm getting the same result when I update the above like this:

```rust
objc::rc::autoreleasepool(|| {
pollster::block_on(run());
})
```

or if I wrap just the buffer creation:

```rust
objc::rc::autoreleasepool(|| {
let _ = device.create_buffer_init(&wgpu::util::BufferInitDescriptor {
label: Some("Vertex Buffer"),
contents: bytemuck::cast_slice(&[0; 1024]),
usage: wgpu::BufferUsages::VERTEX,
});
})
```

Let me know if that's not what you had in mind. Thanks again!
Debugging a bit. I see that when the buffer is created, it goes through `wgpu-hal/src/metal/device.rs` line 285 (at d30255f). But I haven't been able to figure out whether there is Metal-specific cleanup logic that is supposed to be called when a buffer is dropped, or if the autorelease pool is the thing that's supposed to handle it.
Yeah, that seems right. We basically need an autorelease pool to be active whenever types like Metal buffers are created or cloned. I haven't worked with the Metal backend for a while, but maybe something else (internal to wgpu) holds a strong reference to the buffer you're creating? The Xcode debugger might be able to show you if anything's referencing it.
Thanks @grovesNL, that's helpful. I started playing with Xcode Instruments to analyze leaks. It's a little overwhelming. For the commit before #3626, no leaks are detected (though memory usage does climb a little over time). But for #3626 and beyond, there are dozens of leaks reported and memory usage increases dramatically over time until the program crashes. When I have more time I'll try to look at some of these stack traces more carefully and see if I can make more sense of it.
I found something potentially suspicious. In my repro, each iteration results in a device being created via `wgpu-core/src/device/resource.rs` line 211 (at ca729cc). But the corresponding cleanup in `wgpu-core/src/device/resource.rs` lines 162-176 (at ca729cc) doesn't seem to run. I think the leak of buffers may be associated with the fact that the device's buffers aren't getting cleaned up when the top-level handle to the device is dropped. This would suggest that this might not be specifically a Metal issue, and perhaps I'm hitting it more than most because I end up creating and dropping devices repeatedly.
The ownership is actually the other way around: buffers hold owning refs to the device, so the buffer leak is probably what's keeping the device alive. See #5120 for the ideal ownership diagram.
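As a rough illustration of that ownership direction (hypothetical types below, not wgpu-core's actual definitions): a buffer holds a strong reference to its device, so a leaked buffer pins the device in memory.

```rust
use std::sync::Arc;

// Hypothetical stand-ins for illustration only: resources own a strong
// reference back to the device they were created from.
struct DeviceInner;

struct BufferInner {
    device: Arc<DeviceInner>, // owning ref: the buffer keeps the device alive
}

fn main() {
    let device = Arc::new(DeviceInner);
    let buffer = BufferInner { device: device.clone() };
    drop(device); // drop the user-facing device handle
    // The device is still alive because the (leaked) buffer owns it.
    assert_eq!(Arc::strong_count(&buffer.device), 1);
}
```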
Thanks @cwfitzgerald, that's helpful. I think I'm homing in on a circular reference. In the repro above, creating the buffer goes through `wgpu-core/src/resource.rs` line 435 (at 54740d5). This performs a match whose relevant branch is at `wgpu-core/src/resource.rs` lines 446-450 (at 54740d5). The second-to-last line of this branch calls into `wgpu-core/src/resource.rs` line 503 (at 54740d5). The staging buffer is an instance of the type defined in `wgpu-core/src/resource.rs` lines 384-387 (at 54740d5), and it ends up held in the Vec built in `wgpu-core/src/device/queue.rs` lines 197-207 (at 54740d5). This Vec is never cleared, because in this repro I'm not submitting anything to the queue, and the code in `wgpu-core/src/device/queue.rs` lines 227-238 (at 54740d5) never runs, because the cleanup in `wgpu-core/src/device/resource.rs` lines 162-167 (at 54740d5) is never reached.
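To make the suspected cycle concrete, here is a minimal sketch with made-up names (`Device`, `StagingBuffer`, and `pending_writes` are stand-ins, not wgpu-core's real types): the device owns its list of pending staging buffers, and each staging buffer owns the device, so once the external handles are gone the two `Arc`s keep each other alive and neither destructor runs.

```rust
use std::sync::{Arc, Mutex};

// Made-up types illustrating the reference cycle described above.
struct StagingBuffer {
    device: Arc<Device>, // the staging buffer keeps its device alive
}

struct Device {
    pending_writes: Mutex<Vec<StagingBuffer>>, // the device keeps pending staging buffers alive
}

fn main() {
    let device = Arc::new(Device {
        pending_writes: Mutex::new(Vec::new()),
    });
    // Simulate a queued write that is never submitted:
    device
        .pending_writes
        .lock()
        .unwrap()
        .push(StagingBuffer { device: device.clone() });

    let weak = Arc::downgrade(&device);
    drop(device); // drop the last external handle
    // Still alive: device -> pending_writes -> staging buffer -> device
    // forms a cycle, so neither object is ever freed.
    assert!(weak.upgrade().is_some());
}
```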
If I add the following to the end of the repro:

```rust
let commands: Vec<CommandBuffer> = Vec::new();
queue.submit(commands.into_iter());
```

then the device does get dropped successfully, and there is no leak/crash. Unfortunately, in the original repro (and my real project) I am submitting work to the queue, so this isn't quite what's going on there.
Ok, I've made a bit more progress. It looks like the order in which wgpu objects are dropped determines whether there is a leak or not. Here's another example where I've flattened everything into a single render function (rather than storing the device and other resources in a State struct like the one shown at the end of this comment):

```rust
use std::iter;
use wgpu::util::DeviceExt;
use wgpu::{
CommandBuffer, CommandEncoderDescriptor, Extent3d, ImageCopyBuffer, ImageCopyTexture,
ImageDataLayout, Origin3d, Texture, TextureAspect, TextureDescriptor, TextureDimension,
TextureFormat, TextureUsages, TextureView,
};
#[repr(C)]
#[derive(Copy, Clone, Debug, bytemuck::Pod, bytemuck::Zeroable)]
struct Vertex {
position: [f32; 3],
color: [f32; 3],
}
impl Vertex {
fn desc() -> wgpu::VertexBufferLayout<'static> {
wgpu::VertexBufferLayout {
array_stride: std::mem::size_of::<Vertex>() as wgpu::BufferAddress,
step_mode: wgpu::VertexStepMode::Vertex,
attributes: &[
wgpu::VertexAttribute {
offset: 0,
shader_location: 0,
format: wgpu::VertexFormat::Float32x3,
},
wgpu::VertexAttribute {
offset: std::mem::size_of::<[f32; 3]>() as wgpu::BufferAddress,
shader_location: 1,
format: wgpu::VertexFormat::Float32x3,
},
],
}
}
}
const VERTICES: &[Vertex] = &[
Vertex {
position: [-0.0868241, 0.49240386, 0.0],
color: [0.5, 0.0, 0.5],
}, // A
Vertex {
position: [-0.49513406, 0.06958647, 0.0],
color: [0.5, 0.0, 0.5],
}, // B
Vertex {
position: [-0.21918549, -0.44939706, 0.0],
color: [0.5, 0.0, 0.5],
}, // C
Vertex {
position: [0.35966998, -0.3473291, 0.0],
color: [0.5, 0.0, 0.5],
}, // D
Vertex {
position: [0.44147372, 0.2347359, 0.0],
color: [0.5, 0.0, 0.5],
}, // E
];
const INDICES: &[u16] = &[0, 1, 4, 1, 2, 4, 2, 3, 4, /* padding */ 0];
async fn render() -> Result<(), wgpu::SurfaceError> {
// The instance is a handle to our GPU
let instance = wgpu::Instance::new(wgpu::InstanceDescriptor {
backends: wgpu::Backends::METAL,
..Default::default()
});
let adapter = instance
.request_adapter(&wgpu::RequestAdapterOptions {
power_preference: wgpu::PowerPreference::default(),
compatible_surface: None,
force_fallback_adapter: false,
})
.await
.unwrap();
let device_descriptor = wgpu::DeviceDescriptor {
label: None,
required_features: wgpu::Features::empty(),
required_limits: wgpu::Limits::default(),
};
let (device, queue) = adapter
.request_device(&device_descriptor, None)
.await
.unwrap();
let texture_format = TextureFormat::Rgba8Unorm;
let size = 256u32;
let texture_desc = TextureDescriptor {
size: Extent3d {
width: size,
height: size,
depth_or_array_layers: 1,
},
mip_level_count: 1,
sample_count: 1,
dimension: TextureDimension::D2,
format: texture_format,
usage: TextureUsages::COPY_SRC | TextureUsages::RENDER_ATTACHMENT,
label: None,
view_formats: &[texture_format],
};
let texture = device.create_texture(&texture_desc);
let texture_view = texture.create_view(&Default::default());
let shader = device.create_shader_module(wgpu::ShaderModuleDescriptor {
label: Some("Shader"),
source: wgpu::ShaderSource::Wgsl(include_str!("shader.wgsl").into()),
});
let render_pipeline_layout = device.create_pipeline_layout(&wgpu::PipelineLayoutDescriptor {
label: Some("Render Pipeline Layout"),
bind_group_layouts: &[],
push_constant_ranges: &[],
});
let render_pipeline = device.create_render_pipeline(&wgpu::RenderPipelineDescriptor {
label: Some("Render Pipeline"),
layout: Some(&render_pipeline_layout),
vertex: wgpu::VertexState {
module: &shader,
entry_point: "vs_main",
buffers: &[Vertex::desc()],
},
fragment: Some(wgpu::FragmentState {
module: &shader,
entry_point: "fs_main",
targets: &[Some(wgpu::ColorTargetState {
format: texture_format,
blend: Some(wgpu::BlendState {
color: wgpu::BlendComponent::REPLACE,
alpha: wgpu::BlendComponent::REPLACE,
}),
write_mask: wgpu::ColorWrites::ALL,
})],
}),
primitive: wgpu::PrimitiveState {
topology: wgpu::PrimitiveTopology::TriangleList,
strip_index_format: None,
front_face: wgpu::FrontFace::Ccw,
cull_mode: Some(wgpu::Face::Back),
polygon_mode: wgpu::PolygonMode::Fill,
unclipped_depth: false,
conservative: false,
},
depth_stencil: None,
multisample: wgpu::MultisampleState {
count: 1,
mask: !0,
alpha_to_coverage_enabled: false,
},
multiview: None,
});
let vertex_buffer = device.create_buffer_init(&wgpu::util::BufferInitDescriptor {
label: Some("Vertex Buffer"),
contents: bytemuck::cast_slice(VERTICES),
usage: wgpu::BufferUsages::VERTEX,
});
let index_buffer = device.create_buffer_init(&wgpu::util::BufferInitDescriptor {
label: Some("Index Buffer"),
contents: bytemuck::cast_slice(INDICES),
usage: wgpu::BufferUsages::INDEX,
});
let num_indices = INDICES.len() as u32;
let mut encoder = device.create_command_encoder(&wgpu::CommandEncoderDescriptor {
label: Some("Render Encoder"),
});
{
let mut render_pass = encoder.begin_render_pass(&wgpu::RenderPassDescriptor {
label: Some("Render Pass"),
color_attachments: &[Some(wgpu::RenderPassColorAttachment {
view: &texture_view,
resolve_target: None,
ops: wgpu::Operations {
load: wgpu::LoadOp::Clear(wgpu::Color {
r: 0.1,
g: 0.2,
b: 0.3,
a: 1.0,
}),
store: wgpu::StoreOp::Store,
},
})],
depth_stencil_attachment: None,
occlusion_query_set: None,
timestamp_writes: None,
});
render_pass.set_pipeline(&render_pipeline);
render_pass.set_vertex_buffer(0, vertex_buffer.slice(..));
render_pass.set_index_buffer(index_buffer.slice(..), wgpu::IndexFormat::Uint16);
render_pass.draw_indexed(0..num_indices, 0, 0..1);
}
queue.submit(iter::once(encoder.finish()));
// // Dropping device first triggers a memory leak
// drop(device);
Ok(())
}
pub async fn run() {
render().await.unwrap();
}
fn main() {
for i in 0..1000 {
if i % 10 == 0 {
println!("{i}");
}
pollster::block_on(run());
}
}
```

This works without a leak/crash. But if you uncomment the `drop(device)` line near the end, the leak/crash comes back. For reference, the original repro stored the resources in a struct like this:

```rust
struct State {
size: u32,
vertex_buffer: wgpu::Buffer,
index_buffer: wgpu::Buffer,
num_indices: u32,
texture_size: Extent3d,
render_pipeline: wgpu::RenderPipeline,
texture_view: TextureView,
texture: Texture,
queue: wgpu::Queue,
device: wgpu::Device,
}
```

Figuring out that this is how things currently work is enough to unblock me, but I am wondering whether this should be considered a bug or just something that should be documented (apologies if it is and I didn't come across it).
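For anyone else hitting this before a fix lands, here is a rough sketch of the workaround this suggests (a hypothetical helper over the `State` struct above, not code from the repro): make the drop order explicit so the device handle is dropped last, matching the ordering that avoided the leak in the flattened example.

```rust
// Sketch only: tear down a State by dropping the device-created resources
// first, so the queue and device handles are the last things to go.
fn teardown(state: State) {
    let State {
        vertex_buffer,
        index_buffer,
        render_pipeline,
        texture_view,
        texture,
        queue,
        device,
        ..
    } = state;
    drop(vertex_buffer);
    drop(index_buffer);
    drop(render_pipeline);
    drop(texture_view);
    drop(texture);
    drop(queue);
    drop(device); // the device handle goes last
}
```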
Oh yes, that's definitely a bug. Well done tracking it down!
Sorry for not being very responsive or present in this period :(
Confirmed on Windows DX12, and by the way, the impact there is significant. I have a scenario where I need to create multiple windows in one process, each using a separate wgpu instance, and these windows are created and destroyed dynamically. After reading the discussion here, I tried changing the way I destroy my “window and wgpu instance” struct (let's call it “instance”), and the problem is solved perfectly! So I guess that's a temporary workaround as well.
This has been fixed by 4255268 (#5910).
@teoxoy you're doing amazing work! Thanks a lot!!!
Description
After repeated headless rendering with the Metal backend, wgpu 0.19 crashes with a "Context leak detected, msgtracer returned -1" error, whereas version 0.18.0 works without a problem.
My workflow is to perform headless rendering to PNG images, following the approach on the "Wgpu without a window" learn-wgpu page.
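For context, a minimal sketch of the general shape of that readback workflow (not the exact repro code; it assumes a `device`, `queue`, and a 256x256 RGBA8 `texture` created with `COPY_SRC`, as in the examples above):

```rust
// Sketch of the headless readback pattern: render into an offscreen texture,
// copy it into a mappable buffer, and read the pixels back on the CPU.
fn read_back(device: &wgpu::Device, queue: &wgpu::Queue, texture: &wgpu::Texture) -> Vec<u8> {
    let size = 256u32;
    // 4 bytes per pixel; 256 * 4 = 1024 already satisfies the 256-byte
    // bytes_per_row alignment requirement, so no row padding is needed here.
    let output_buffer = device.create_buffer(&wgpu::BufferDescriptor {
        label: Some("Readback Buffer"),
        size: (4 * size * size) as u64,
        usage: wgpu::BufferUsages::COPY_DST | wgpu::BufferUsages::MAP_READ,
        mapped_at_creation: false,
    });
    let mut encoder =
        device.create_command_encoder(&wgpu::CommandEncoderDescriptor { label: None });
    encoder.copy_texture_to_buffer(
        texture.as_image_copy(),
        wgpu::ImageCopyBuffer {
            buffer: &output_buffer,
            layout: wgpu::ImageDataLayout {
                offset: 0,
                bytes_per_row: Some(4 * size),
                rows_per_image: Some(size),
            },
        },
        wgpu::Extent3d { width: size, height: size, depth_or_array_layers: 1 },
    );
    queue.submit(std::iter::once(encoder.finish()));
    // Map the buffer and block until the copy has completed.
    output_buffer.slice(..).map_async(wgpu::MapMode::Read, |result| result.unwrap());
    let _ = device.poll(wgpu::Maintain::Wait);
    let pixels = output_buffer.slice(..).get_mapped_range().to_vec();
    output_buffer.unmap();
    pixels // encode these bytes to PNG with an image crate of your choice
}
```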
Repro steps
I've created a repro repository at https://github.com/jonmmease/wgpu-memory-repro that reliably reproduces the crash on my M1 MacBook Pro. See the README for more details about the repro and full system details.
Expected vs observed behavior
In wgpu 0.18.0, everything works well. In 0.19 (I tried 0.19.0 through 0.19.3) I get this crash after ~100 iterations.
Extra materials
I went through the Metal section of https://github.com/gfx-rs/wgpu/wiki/Debugging-wgpu-Applications, creating the XML file and updating the executable with the `codesign` command. Then I ran the executable with `METAL_DEVICE_WRAPPER_TYPE=1 ./target/debug/wgpu-memory-repro`. The console printed out `wgpu-memory-repro[4512:4643030] Metal API Validation Enabled`, but no additional information was displayed. Let me know if that's not the correct process and whether there's anything more I can do to get more info. Thanks a lot!
Platform