Access violation when enumerating in a DX12 instance (sync/race issue?) #3485

Closed
seritools opened this issue Feb 14, 2023 · 4 comments
Labels
api: dx12 (Issues with DX12 or DXGI), type: bug (Something isn't working)

seritools commented Feb 14, 2023

Description
I was trying to figure out why we were seeing access violations in production (a native Node module running in Electron). There seems to be a race between the Chromium renderer, once it is initialized and querying DX12, and wgpu's adapter querying. Mostly by luck I was able to repro it with the code below. (I just wanted to create a small tool to dump adapter info for different InstanceDescriptors on affected systems; my own system never even hit the race with our production app.)

Repro steps

The minimal repro so far is below. It doesn't happen every time, or on every system. The value of wgpu::Backends used in Test A and Test B matters: at least on my system, using PRIMARY for A and DX12 for B reproduces the issue the quickest (it might work for a handful of tries, then stays broken relatively consistently), but any combination that includes DX12 adapters in both Instances is able to trigger this issue. Both debug and release builds are able to trigger it.

So far, I have not been able to repro it when disabling one of the tests.

[package]
name = "wgpu-test"
version = "0.1.0"
edition = "2021"

[dependencies]
wgpu = "0.15.1"

fn main() {
    println!("Test A:");
    {
        let instance = wgpu::Instance::new(wgpu::InstanceDescriptor {
            backends: wgpu::Backends::PRIMARY,
            dx12_shader_compiler: Default::default(),
        });

        for adapter in instance.enumerate_adapters(wgpu::Backends::PRIMARY) {
            println!("{:#X?}", adapter.get_info());
        }
    }

    println!("\n\nTest B:");
    {
        let instance = wgpu::Instance::new(wgpu::InstanceDescriptor {
            backends: wgpu::Backends::DX12,
            dx12_shader_compiler: Default::default(),
        });

        for adapter in instance.enumerate_adapters(wgpu::Backends::DX12) {
            println!("{:#X?}", adapter.get_info());
        }
    }
}
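
Since the crash is intermittent, one way to put a number on "doesn't happen every time" is to launch the built repro repeatedly and count abnormal exits. The following is only a minimal sketch using the standard library: the helper names `count_failures` and `run_repro_once` and the executable path are assumptions for illustration, not part of the original repro.

```rust
use std::process::Command;

/// Runs `attempt` `tries` times and returns how many of those runs failed.
/// `attempt` must return `true` when a run succeeded.
fn count_failures<F: FnMut() -> bool>(tries: u32, mut attempt: F) -> u32 {
    (0..tries).filter(|_| !attempt()).count() as u32
}

/// Launches the repro executable once and reports whether it exited cleanly.
/// A process killed by an access violation exits with a non-zero status,
/// so it is counted as a failure. A launch error is also counted as a failure.
fn run_repro_once(exe: &str) -> bool {
    Command::new(exe)
        .status()
        .map(|status| status.success())
        .unwrap_or(false)
}

fn main() {
    // Assumed path of the repro binary; adjust to wherever it was built.
    let exe = "target/debug/wgpu-test.exe";
    let tries = 20;
    let failures = count_failures(tries, || run_repro_once(exe));
    println!("{failures} of {tries} runs failed");
}
```

Running each attempt in a fresh process matters here, because the crash appears to depend on per-process D3D12 state.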

Expected vs observed behavior

Expected: Both enumerations should work just fine.
Observed: (2c14.b10): Access violation - code c0000005 (!!! second chance !!!) D3D12Core!CLayeredObject<CDevice>::CContainedObject::QueryInterface+0x18 😢

Extra materials

Executable, PDB file, and mini dump of the process at time of crashing (can also upload the full dump somewhere if needed):
wgpu-test.zip
wgpu_test_pdb.zip
minidump.zip

Stack trace from debug build:

STACK_TEXT:  
000000f7`f814b7e0 00007ffb`3f8e0b60     : 00000000`00012f55 00000000`00012f55 00000000`0000b001 00000221`f1633d60 : D3D12Core!CLayeredObject<CDevice>::CContainedObject::QueryInterface+0x18
000000f7`f814b810 00007ffb`3f8e21b6     : 00000000`0000b000 000000f7`0000b000 00000221`e7693210 000000f7`f814bbe0 : D3D12Core!D3D12CoreCreateDevice+0x30c
000000f7`f814ba30 00007ffb`41f36a4d     : 000000f7`f814bbe0 00000000`0000b000 000000f7`f814bc98 00000000`00000074 : D3D12Core!D3D12ValidateAndCreateDevice+0x146
000000f7`f814bab0 00007ffb`41f3668e     : 00007ffb`41f369f0 000000f7`f814bc98 000000f7`f814bbe0 000000f7`f814fa80 : d3d12!D3D12CreateDeviceImpl+0x5d
000000f7`f814bb00 00007ff6`1d89fc74     : 00000221`e763d410 000000f7`f814bd60 000000f7`f814fab0 000000f7`f814fa80 : d3d12!D3D12CreateDevice+0xae
000000f7`f814bb80 00007ff6`1d871b9e     : 00000000`00000008 00000000`00000000 00000000`00070000 00000000`00000000 : wgpu_test!d3d12::D3D12Lib::create_device<winapi::shared::dxgi::IDXGIAdapter1>+0x104
000000f7`f814bce0 00007ff6`1d929552     : 00000000`00000008 00000000`00000000 00000000`00070000 00000000`00000000 : wgpu_test!wgpu_hal::dx12::Adapter::expose+0xbe
000000f7`f814ce60 00007ff6`1d91f5fd     : 00000000`00000008 00000000`00000000 00000000`00070000 00000000`00000000 : wgpu_test!wgpu_hal::dx12::instance::impl$1::enumerate_adapters::closure$0+0x62
000000f7`f814ced0 00007ff6`1d957683     : 00000000`00000000 00007ff6`1dd3257a 00000000`0000000f 00000000`00000000 : wgpu_test!core::ops::function::impls::impl$3::call_mut<tuple$<enum2$<d3d12::dxgi::DxgiAdapter> >,wgpu_hal::dx12::instance::impl$1::enumerate_adapters::closure_env$0>+0x2d
000000f7`f814cf20 00007ff6`1d7ec6ab     : 00000005`00000000 000000f7`f814d4e0 00000000`00160014 000000f7`f814d510 : wgpu_test!core::iter::traits::iterator::Iterator::find_map::check::closure$0<enum2$<d3d12::dxgi::DxgiAdapter>,wgpu_hal::ExposedAdapter<wgpu_hal::dx12::Api>,ref_mut$<wgpu_hal::dx12::instance::impl$1::enumerate_adapters::closure_env$0> >+0x53
000000f7`f814d3f0 00007ff6`1d7ec3c3     : 00007ffb`c159891d 006c0064`002e0032 00000000`00000000 00000000`00000000 : wgpu_test!core::iter::traits::iterator::Iterator::try_fold<alloc::vec::into_iter::IntoIter<enum2$<d3d12::dxgi::DxgiAdapter>,alloc::alloc::Global>,tuple$<>,core::iter::traits::iterator::Iterator::find_map::check::closure_env$0<enum2$<d3d12::dxgi::DxgiAdapter>,wgpu_hal::ExposedAdapter<wgpu_hal::dx12::Api>,ref_mut$<wgpu_hal::dx12::instance::impl$1::enumerate_adapters::closure_env$0> >,enum2$<core::ops::control_flow::ControlFlow<wgpu_hal::ExposedAdapter<wgpu_hal::dx12::Api>,tuple$<> > > >+0xbb
000000f7`f814da90 00007ff6`1d8b7cbd     : 00000000`00000000 000000f7`f814dd19 00000221`e7630000 00007ffb`c6d6ce8a : wgpu_test!core::iter::traits::iterator::Iterator::find_map<alloc::vec::into_iter::IntoIter<enum2$<d3d12::dxgi::DxgiAdapter>,alloc::alloc::Global>,wgpu_hal::ExposedAdapter<wgpu_hal::dx12::Api>,ref_mut$<wgpu_hal::dx12::instance::impl$1::enumerate_adapters::closure_env$0> >+0x43
000000f7`f814dc60 00007ff6`1d801d97     : 00000000`00000000 00000000`00000000 00000000`00000002 00007ffb`c4d60000 : wgpu_test!core::iter::adapters::filter_map::impl$2::next<wgpu_hal::ExposedAdapter<wgpu_hal::dx12::Api>,alloc::vec::into_iter::IntoIter<enum2$<d3d12::dxgi::DxgiAdapter>,alloc::alloc::Global>,wgpu_hal::dx12::instance::impl$1::enumerate_adapters::closure_env$0>+0x1d
000000f7`f814dca0 00007ff6`1d812245     : 00000000`00000000 00000000`00000003 00000221`e90f9250 00000000`00000000 : wgpu_test!alloc::vec::spec_from_iter_nested::impl$0::from_iter<wgpu_hal::ExposedAdapter<wgpu_hal::dx12::Api>,core::iter::adapters::filter_map::FilterMap<alloc::vec::into_iter::IntoIter<enum2$<d3d12::dxgi::DxgiAdapter>,alloc::alloc::Global>,wgpu_hal::dx12::instance::impl$1::enumerate_adapters::closure_env$0> >+0x37
000000f7`f814e2c0 00007ff6`1d81c57f     : 00000221`e9208370 00000221`e9208370 00000221`e9208370 00000221`e9208370 : wgpu_test!alloc::vec::in_place_collect::impl$1::from_iter<wgpu_hal::ExposedAdapter<wgpu_hal::dx12::Api>,core::iter::adapters::filter_map::FilterMap<alloc::vec::into_iter::IntoIter<enum2$<d3d12::dxgi::DxgiAdapter>,alloc::alloc::Global>,wgpu_hal::dx12::instance::impl$1::enumerate_adapters::closure_env$0> >+0x85
000000f7`f814e4a0 00007ff6`1d8c28ed     : 000000f7`f814e1e8 00000221`e9208370 000000f7`f814e3b8 000000f7`f814e3b8 : wgpu_test!alloc::vec::impl$15::from_iter<wgpu_hal::ExposedAdapter<wgpu_hal::dx12::Api>,core::iter::adapters::filter_map::FilterMap<alloc::vec::into_iter::IntoIter<enum2$<d3d12::dxgi::DxgiAdapter>,alloc::alloc::Global>,wgpu_hal::dx12::instance::impl$1::enumerate_adapters::closure_env$0> >+0x3f
000000f7`f814e530 00007ff6`1d885ff5     : 00000221`e7638299 00000000`00000000 00007ff6`1dd40dd0 00000000`00000046 : wgpu_test!core::iter::traits::iterator::Iterator::collect<core::iter::adapters::filter_map::FilterMap<alloc::vec::into_iter::IntoIter<enum2$<d3d12::dxgi::DxgiAdapter>,alloc::alloc::Global>,wgpu_hal::dx12::instance::impl$1::enumerate_adapters::closure_env$0>,alloc::vec::Vec<wgpu_hal::ExposedAdapter<wgpu_hal::dx12::Api>,alloc::alloc::Global> >+0x2d
000000f7`f814e590 00007ff6`1d67b4a7     : 00000000`00000004 00000000`00000000 00000000`00000004 00000000`00000000 : wgpu_test!wgpu_hal::dx12::instance::impl$1::enumerate_adapters+0xa5
000000f7`f814e660 00007ff6`1d676ca6     : 00000000`00000000 00000000`00000004 00000000`00000001 c5a5782c`7bce34e1 : wgpu_test!wgpu_core::hub::Global<wgpu_core::hub::IdentityManagerFactory>::enumerate<wgpu_core::hub::IdentityManagerFactory,wgpu_hal::dx12::Api>+0xc7
000000f7`f814f050 00007ff6`1d1817aa     : 00000221`f15d62e0 c5a5782c`7bce34e1 c5a5782c`7bce34e1 00000221`f15d62e0 : wgpu_test!wgpu_core::hub::Global<wgpu_core::hub::IdentityManagerFactory>::enumerate_adapters<wgpu_core::hub::IdentityManagerFactory>+0x66
000000f7`f814f0b0 00007ff6`1d1a7f91     : 00000000`00000004 00000000`00000000 000000f7`f8149798 00000000`00000000 : wgpu_test!wgpu::backend::direct::Context::enumerate_adapters+0x3a
000000f7`f814f110 00007ff6`1d14149a     : 000000f7`f814f394 00000221`e76389b0 00000221`e7638450 00007ffb`c6d56cf7 : wgpu_test!wgpu::Instance::enumerate_adapters+0xb1
000000f7`f814f210 00007ff6`1d14107b     : 00000000`00000000 00000000`00000000 00000221`e7646370 00000000`00000005 : wgpu_test!wgpu_test::main+0x3da
000000f7`f814f950 00007ff6`1d14198e     : 00000000`00000005 00000221`e7640000 00000000`00000005 00000221`e7646370 : wgpu_test!core::ops::function::FnOnce::call_once<void (*)(),tuple$<> >+0xb
000000f7`f814f990 00007ff6`1d141951     : 01007ff6`1ddb0d88 00000221`e7648e90 00000221`e7648e90 010000f7`f814fa80 : wgpu_test!std::sys_common::backtrace::__rust_begin_short_backtrace<void (*)(),tuple$<> >+0xe
000000f7`f814f9c0 00007ff6`1dcfbe6e     : 00000221`e763d410 000000f7`f814fab0 00000221`e7646680 00000000`00000000 : wgpu_test!std::rt::lang_start::closure$0<tuple$<> >+0x11
000000f7`f814fa00 00007ff6`1d14192a     : 00000000`00000000 00007ff6`1dd065b3 00000000`00140013 00007ff6`1ddb066e : wgpu_test!std::rt::lang_start_internal+0xbe
000000f7`f814fb50 00007ff6`1d1418d9     : 00000000`00000007 00000000`00000001 00000000`00000000 00000000`00000000 : wgpu_test!std::rt::lang_start<tuple$<> >+0x3a
000000f7`f814fbc0 00007ff6`1dd2282c     : 00000000`00000000 00007ff6`1dd228a5 00000000`00000000 00000000`00000000 : wgpu_test!main+0x19
000000f7`f814fbf0 00007ffb`c52226bd     : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : wgpu_test!__scrt_common_main_seh+0x10c
000000f7`f814fc30 00007ffb`c6d8dfb8     : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : KERNEL32!BaseThreadInitThunk+0x1d
000000f7`f814fc60 00000000`00000000     : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : ntdll!RtlUserThreadStart+0x28

Program output on my machine in case of the crash:

Test A:
AdapterInfo {
    name: "AMD Radeon RX 6900 XT",
    vendor: 0x1002,
    device: 0x73BF,
    device_type: DiscreteGpu,
    driver: "AMD proprietary driver",
    driver_info: "22.11.2",
    backend: Vulkan,
}
AdapterInfo {
    name: "Intel(R) UHD Graphics 770",
    vendor: 0x8086,
    device: 0x4680,
    device_type: IntegratedGpu,
    driver: "Intel Corporation",
    driver_info: "Intel driver",
    backend: Vulkan,
}
AdapterInfo {
    name: "AMD Radeon RX 6900 XT",
    vendor: 0x1002,
    device: 0x73BF,
    device_type: DiscreteGpu,
    driver: "",
    driver_info: "",
    backend: Dx12,
}
AdapterInfo {
    name: "Intel(R) UHD Graphics 770",
    vendor: 0x8086,
    device: 0x4680,
    device_type: IntegratedGpu,
    driver: "",
    driver_info: "",
    backend: Dx12,
}
AdapterInfo {
    name: "Microsoft Basic Render Driver",
    vendor: 0x1414,
    device: 0x8C,
    device_type: Cpu,
    driver: "",
    driver_info: "",
    backend: Dx12,
}


Test B:
<crash>

Platform

  • wgpu 0.15.1
  • Windows 11 (Build 22621; latest stable, fully updated)
  • AMD Radeon RX 6900 XT with driver version 22.11.1 (latest)
  • Intel i7-12700k w/ Intel(R) UHD Graphics 770 with driver version from Windows Update
  • Tested both with Rust 1.67.1 and the Rust nightly I had installed
Member

cwfitzgerald commented Feb 15, 2023

Okay this is a weird one from multiple angles. On one hand, D3D is supposed to be fully thread safe, on the other, devices are supposed to be singletons, so we should get the same device both times. My best guess, outside of a driver bug, is maybe we're releasing some COM object too many times, causing some nonsense in the driver. What other (brands of) hardware have you been able to reproduce this on?

Author

seritools commented Feb 16, 2023

Interestingly, the issue in-the-wild (with the Chromium/Electron renderer being the "other" initialized renderer in the process) seems to predominantly affect people with an NVIDIA GPU.

value,times_seen
NVIDIA GeForce RTX 3080,97
NVIDIA GeForce RTX 3070,54
NVIDIA GeForce RTX 3080 Ti,53
NVIDIA GeForce RTX 3090,37
NVIDIA GeForce RTX 4090,36
NVIDIA GeForce RTX 3060 Ti,32
NVIDIA GeForce RTX 3070 Ti,31
NVIDIA GeForce RTX 2070 SUPER,19
NVIDIA GeForce RTX 2080 Ti,18
NVIDIA GeForce RTX 3060,18
NVIDIA GeForce RTX 4080,15
NVIDIA GeForce RTX 4070 Ti,12
NVIDIA GeForce GTX 1080 Ti,10
NVIDIA GeForce RTX 2060 SUPER,9
NVIDIA GeForce RTX 3090 Ti,9
NVIDIA GeForce RTX 2060,8
NVIDIA GeForce GTX 1070,7
NVIDIA GeForce RTX 2080 SUPER,7
NVIDIA GeForce RTX 2080,6
"Intel(R) UHD Graphics,NVIDIA GeForce RTX 4090",5
NVIDIA GeForce RTX 3050,5
NVIDIA GeForce GTX 1660 SUPER,5
"Intel(R) UHD Graphics 770,NVIDIA GeForce RTX 3080",4
"Intel(R) UHD Graphics 770,NVIDIA GeForce RTX 3070 Ti",4
"Intel(R) UHD Graphics,NVIDIA GeForce RTX 3090",3
NVIDIA GeForce RTX 2070,3
"Intel(R) UHD Graphics 770,NVIDIA GeForce RTX 3080 Ti",3
"Intel(R) UHD Graphics,Microsoft 基本显示适配器",3
"Intel(R) UHD Graphics 770,NVIDIA GeForce RTX 3090 Ti",3
Intel(R) UHD Graphics 770,2
"Intel(R) HD Graphics 530,NVIDIA GeForce GTX 970",2
NVIDIA GeForce GTX 1660 Ti,2
"AMD Radeon(TM) Graphics,NVIDIA GeForce RTX 3070",2
NVIDIA GeForce GTX 1060 6GB,2
NVIDIA GeForce GTX 1080,2
"Intel(R) UHD Graphics,NVIDIA GeForce RTX 3070 Ti",2
NVIDIA GeForce GTX 1650 SUPER,2
Базовый видеоадаптер (Майкрософт),2
"AMD Radeon(TM) Graphics,NVIDIA GeForce RTX 4070 Ti",2
NVIDIA GeForce GT 730,2
"AMD Radeon(TM) Graphics,NVIDIA GeForce RTX 3080",2
Intel(R) UHD Graphics,2
"Intel(R) UHD Graphics 770,NVIDIA GeForce RTX 3070",2
"Intel(R) HD Graphics 4600,NVIDIA GeForce GT 710",2
NVIDIA GeForce GTX 1070 Ti,2
"AMD Radeon(TM) Graphics,NVIDIA GeForce RTX 4080",1
NVIDIA GeForce GTX 980 Ti,1
"AMD Radeon(TM) Graphics,NVIDIA GeForce RTX 4090",1
"Intel(R) UHD Graphics 630,NVIDIA GeForce RTX 2070 SUPER",1
Intel(R) UHD Graphics 630,1
"Intel(R) UHD Graphics,NVIDIA GeForce RTX 4080",1
"Intel(R) UHD Graphics 750,NVIDIA GeForce RTX 3080 Ti",1
"Intel(R) UHD Graphics,NVIDIA GeForce GTX 1080",1
"AMD Radeon(TM) Graphics,NVIDIA GeForce RTX 3060",1
"Intel(R) HD Graphics 4600,NVIDIA GeForce GTX 1080 Ti",1
Microsoft 기본 디스플레이 어댑터,1
"Intel(R) UHD Graphics 630,NVIDIA GeForce RTX 3080",1
NVIDIA GeForce GT 740,1
"LuminonCore IDDCX Adapter,NVIDIA GeForce RTX 3080",1
"Intel(R) UHD Graphics 770,NVIDIA GeForce RTX 3060 Ti",1
"Intel(R) UHD Graphics 770,NVIDIA GeForce GTX 1070",1
"Intel(R) UHD Graphics 730,NVIDIA GeForce RTX 3060",1
NVIDIA GeForce GTX 1660,1
"Intel(R) UHD Graphics 770,NVIDIA GeForce GTX 970",1
"NVIDIA GeForce RTX 3080 Ti,Trigger 6 External Graphics",1
"Intel(R) UHD Graphics,NVIDIA GeForce RTX 3070",1
"Intel(R) UHD Graphics 630,LuminonCore IDDCX Adapter,NVIDIA GeForce RTX 2070 SUPER",1
"Carte vidéo de base Microsoft,Intel(R) UHD Graphics",1
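
To check the "predominantly NVIDIA" impression against a table like the one above, the rows can be summed per vendor. This is a rough sketch under stated assumptions: the function names are made up for illustration, the count is taken to be whatever follows the last comma, and a report that lists several adapters is counted once for every vendor it mentions.

```rust
use std::collections::BTreeMap;

/// Parses one `value,times_seen` row. The value may be quoted and may itself
/// contain commas, so the count is taken from after the last comma.
/// Returns `None` for the header or any row whose count is not a number.
fn parse_row(line: &str) -> Option<(&str, u32)> {
    let (value, count) = line.trim().rsplit_once(',')?;
    let count = count.trim().parse().ok()?;
    Some((value.trim_matches('"'), count))
}

/// Sums `times_seen` per vendor keyword found anywhere in the adapter list.
/// A row naming two vendors contributes to both totals.
fn tally_by_vendor(rows: &str) -> BTreeMap<&'static str, u32> {
    let mut totals = BTreeMap::new();
    for (value, count) in rows.lines().filter_map(parse_row) {
        for vendor in ["NVIDIA", "AMD", "Intel"] {
            if value.contains(vendor) {
                *totals.entry(vendor).or_insert(0) += count;
            }
        }
    }
    totals
}

fn main() {
    // A few rows copied from the table above.
    let sample = r#"value,times_seen
NVIDIA GeForce RTX 3080,97
"Intel(R) UHD Graphics 770,NVIDIA GeForce RTX 3080",4
Intel(R) UHD Graphics 770,2
"AMD Radeon(TM) Graphics,NVIDIA GeForce RTX 3070",2"#;

    for (vendor, total) in tally_by_vendor(sample) {
        println!("{vendor}: {total}");
    }
}
```

Raw counts like these say nothing about how common each vendor is among all users, so they only show where the crashes were seen, not a per-vendor crash rate.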

Also seen across different Windows 10 and Windows 11 versions (the application is Windows-only), and across both Intel and AMD CPUs, of pretty much all recent generations.

Another very peculiar thing is that, at the same time, we've also seen a whole host of other crashes (this time much more spread out, across AMD GPUs as well), all in d3d10warp.dll (the WARP software renderer). I can also confirm that the systems where this was happening definitely had proper GPUs, so it seems weird that d3d10warp is loaded/used at all. These crashes definitely happened much later, e.g. in the shader JIT of WARP (wgpu_hal::dx12::Queue::submit → ... → JITBaseVariable::OptimizeCopy, PixelJitProgram::ClassifyVars).

The exact code in production to initialize wgpu and request an adapter looked like this - really nothing special:

    let instance = wgpu::Instance::new(wgpu::InstanceDescriptor {
        backends: wgpu::Backends::DX12,
        dx12_shader_compiler: Default::default(),
    });

    let surface = window.and_then(|window| unsafe { instance.create_surface(&**window).ok() });
    let adapter = instance
        .request_adapter(&wgpu::RequestAdapterOptions {
            power_preference: PowerPreference::LowPower,
            force_fallback_adapter: false,
            compatible_surface: surface.as_ref(),
        })
        .await?;

With this ordering, WARP should only be used as a last resort, which would imply that the other DX12 adapters could not be found/enumerated. Super weird.

Also seeing some crashes from the Chromium side, so a bug there could also be possible (though that wouldn't explain my repro code failing on my system without any Chromium shenanigans running).

For now, we've swapped over to Vulkan only, which seems to have resolved/worked around the issue :S

> My best guess, outside of a driver bug,

On that note, AMD has finally released driver 23.2.1, and after updating to that version I cannot repro the issue on my system anymore. :S (EDIT: There has also been the KB5022845 Windows update, ugh)


So yeah, I think this issue can be closed for now - there doesn't seem to be anything that points directly to wgpu being the culprit right now. We'll probably experiment with switching back to DX12 (or ideally enabling both) in the future, but for now Vulkan works!

Author

seritools commented Feb 16, 2023

Turns out that "Microsoft Basic Render Driver" will also show up as IntegratedGpu if no driver is installed, which explains the usage of WARP in many cases:

AdapterInfo {
    name: "Microsoft Basic Render Driver",
    vendor: 0x1414,
    device: 0x8C,
    device_type: IntegratedGpu,
    driver: "",
    driver_info: "",
    backend: Dx12,
}
AdapterInfo {
    name: "Microsoft Basic Render Driver",
    vendor: 0x1414,
    device: 0x8C,
    device_type: Cpu,
    driver: "",
    driver_info: "",
    backend: Dx12,
}
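
Given that the same software adapter can be reported with either device type, a check based on the vendor and device IDs shown above is one way to spot it regardless of `device_type`. This is a hedged sketch: `AdapterSummary` and `DeviceType` are local stand-ins for the fields printed by `get_info()` (not wgpu's own types), and the function names are hypothetical.

```rust
/// Local stand-in for the device types seen in the output above.
#[derive(Debug, Clone, Copy, PartialEq)]
enum DeviceType {
    DiscreteGpu,
    IntegratedGpu,
    Cpu,
}

/// Local stand-in for the subset of adapter info used here (not wgpu's struct).
#[derive(Debug, Clone, Copy)]
struct AdapterSummary {
    name: &'static str,
    vendor: u32,
    device: u32,
    device_type: DeviceType,
}

// IDs taken from the "Microsoft Basic Render Driver" entries above.
const MICROSOFT_VENDOR_ID: u32 = 0x1414;
const BASIC_RENDER_DEVICE_ID: u32 = 0x8C;

/// True for the Basic Render Driver, whichever device type it reports.
fn is_basic_render_driver(adapter: &AdapterSummary) -> bool {
    adapter.vendor == MICROSOFT_VENDOR_ID && adapter.device == BASIC_RENDER_DEVICE_ID
}

/// Returns the first adapter that is not the Basic Render Driver, if any.
fn first_hardware_adapter(adapters: &[AdapterSummary]) -> Option<&AdapterSummary> {
    adapters.iter().find(|a| !is_basic_render_driver(a))
}

fn main() {
    let adapters = [
        AdapterSummary { name: "Microsoft Basic Render Driver", vendor: 0x1414, device: 0x8C, device_type: DeviceType::IntegratedGpu },
        AdapterSummary { name: "Microsoft Basic Render Driver", vendor: 0x1414, device: 0x8C, device_type: DeviceType::Cpu },
        AdapterSummary { name: "AMD Radeon RX 6900 XT", vendor: 0x1002, device: 0x73BF, device_type: DeviceType::DiscreteGpu },
    ];
    for a in &adapters {
        println!("{} ({:?}): software = {}", a.name, a.device_type, is_basic_render_driver(a));
    }
    println!("picked: {:?}", first_hardware_adapter(&adapters).map(|a| a.name));
}
```

Matching on IDs rather than on `device_type` avoids treating the driverless case as a real integrated GPU.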

teoxoy added the "type: bug" and "api: dx12" labels on Feb 21, 2023
Member

Closing as requested unless further information shows up that we are at fault - thanks for the thorough investigation.

cwfitzgerald closed this as not planned on Feb 21, 2023