Access violation when enumerating in an DX12 instance (sync/race issue?) #3485
Comments
Okay this is a weird one from multiple angles. On one hand, D3D is supposed to be fully thread safe, on the other, devices are supposed to be singletons, so we should get the same device both times. My best guess, outside of a driver bug, is maybe we're releasing some COM object too many times, causing some nonsense in the driver. What other (brands of) hardware have you been able to reproduce this on? |
Interestingly, the issue in-the-wild (with the Chromium/Electron renderer being the "other" initialized renderer in the process) seems to predominantly affect people with an NVIDIA GPU.
Also seen across different Windows 10 and Windows 11 versions (the application is Windows-only), and across both Intel and AMD CPUs of pretty much all recent generations. Another very peculiar thing is that at the same time, we've also seen a whole host of other crashes (this time much more spread out; across AMD GPUs as well), all in

The exact code in production to initialize wgpu and request an adapter looked like this - really nothing special:

    let instance = wgpu::Instance::new(wgpu::InstanceDescriptor {
        backends: wgpu::Backends::DX12,
        dx12_shader_compiler: Default::default(),
    });
    let surface = window.and_then(|window| unsafe { instance.create_surface(&**window).ok() });
    let adapter = instance
        .request_adapter(&wgpu::RequestAdapterOptions {
            power_preference: PowerPreference::LowPower,
            force_fallback_adapter: false,
            compatible_surface: surface.as_ref(),
        })
        .await?;

With this ordering, WARP should only be used as a last resort, which would mean that the other DX12 adapters could not be found/enumerated. Super weird. Also seeing some crashes from the Chromium side, so a bug there is also possible (though that wouldn't explain my repro code failing on my system without any Chromium shenanigans running). For now, we've swapped over to Vulkan only, which seems to have resolved/worked around the issue :S
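To make the "last resort" reasoning above concrete, here is a minimal sketch of a low-power selection order in which a software adapter such as WARP is only returned when no hardware adapter was enumerated. This is a simplified model, not wgpu's actual implementation: `AdapterKind`, `AdapterEntry`, `entry`, and `pick_low_power` are hypothetical names made up for illustration.

```rust
// Hypothetical model types; these are NOT wgpu's own structs.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AdapterKind {
    Integrated,
    Discrete,
    Software, // e.g. WARP / "Microsoft Basic Render Driver"
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct AdapterEntry {
    pub name: String,
    pub kind: AdapterKind,
}

/// Small helper for building entries (hypothetical, for illustration only).
pub fn entry(name: &str, kind: AdapterKind) -> AdapterEntry {
    AdapterEntry { name: name.to_string(), kind }
}

/// Assumed low-power ordering: integrated first, then discrete, and a
/// software adapter only if nothing else was enumerated.
pub fn pick_low_power(adapters: &[AdapterEntry]) -> Option<&AdapterEntry> {
    // First adapter of the given kind, in enumeration order.
    let first_of = |kind: AdapterKind| adapters.iter().find(|a| a.kind == kind);
    first_of(AdapterKind::Integrated)
        .or_else(|| first_of(AdapterKind::Discrete))
        .or_else(|| first_of(AdapterKind::Software))
}
```

Under this model, getting a software adapter back means the enumerated list contained no integrated or discrete entries at all, which is what makes the WARP result so suspicious.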
On that note, AMD has finally released driver 23.2.1, and after updating to that version I cannot repro the issue on my system anymore. :S (EDIT: There has also been the KB5022845 Windows update, ugh) So yeah, I think this issue can be closed for now - there doesn't seem to be anything that points directly to wgpu being the culprit right now. We'll probably experiment with switching back to DX12 (or ideally enabling both) in the future, but for now Vulkan works! |
Turns out that "Microsoft Basic Render Driver" will also show up as
|
Closing as requested unless further information shows up that we are at fault - thanks for the thorough investigation. |
Description
I was trying to figure out why we were seeing access violations in production (a native node module running in Electron). It seems like there is a race happening after the Chromium renderer is initialized (and querying DX12) and wgpu doing the adapter querying. Mostly by luck I was able to repro it with the code below (just wanted to create a small tool to dump adapter info for different InstanceDescriptors on affected systems; my system didn't run into that race with our production app, even).

Repro steps
Minimal repro, so far, is below. Doesn't happen every time, or on every system. The value of wgpu::Backends being used in Test A and Test B matters - at least on my system using PRIMARY for A and DX12 for B reproduces the issue the quickest (a handful of tries it might work, then stays broken relatively consistently), but any combination that includes DX12 adapters in both Instances is able to trigger this issue. Both debug and release builds are able to trigger this.

So far, I have not been able to repro it when disabling one of the tests.
Expected vs observed behavior
Expected: Both enumerations should work just fine.
Observed:
(2c14.b10): Access violation - code c0000005 (!!! second chance !!!) D3D12Core!CLayeredObject<CDevice>::CContainedObject::QueryInterface+0x18
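For anyone triaging similar dumps: in the debugger line above, the parenthesised prefix is the process ID and thread ID in hex, and `c0000005` is the NTSTATUS value for an access violation. A small sketch that pulls those fields out of such a line follows; `DebuggerEvent` and `parse_event` are hypothetical helper names, and the parser assumes the line has exactly the shape shown above.

```rust
/// NTSTATUS value reported for access violations on Windows.
pub const STATUS_ACCESS_VIOLATION: u32 = 0xC000_0005;

/// Fields extracted from a debugger event line (hypothetical helper type).
#[derive(Debug, PartialEq, Eq)]
pub struct DebuggerEvent {
    pub process_id: u32,
    pub thread_id: u32,
    pub code: u32,
}

/// Parse a line shaped like
/// "(2c14.b10): Access violation - code c0000005 (...)".
/// Returns None if the line does not match that assumed shape.
pub fn parse_event(line: &str) -> Option<DebuggerEvent> {
    // "(2c14.b10): ..." -> ids = "2c14.b10", tail = the rest of the line
    let rest = line.trim_start().strip_prefix('(')?;
    let (ids, tail) = rest.split_once("):")?;
    let (pid, tid) = ids.split_once('.')?;
    // The exception code is the first token after the literal "code " marker.
    let after_marker = tail.split_once("code ")?.1;
    let code_hex = after_marker.split_whitespace().next()?;
    Some(DebuggerEvent {
        process_id: u32::from_str_radix(pid, 16).ok()?,
        thread_id: u32::from_str_radix(tid, 16).ok()?,
        code: u32::from_str_radix(code_hex, 16).ok()?,
    })
}
```

Comparing the parsed `code` against `STATUS_ACCESS_VIOLATION` is enough to bucket crash reports like this one before looking at the faulting frame.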
😢

Extra materials
Executable, PDB file, and mini dump of the process at time of crashing (can also upload the full dump somewhere if needed):
wgpu-test.zip
wgpu_test_pdb.zip
minidump.zip
Stack trace from debug build:
Program output on my machine in case of the crash:
Platform