Zero-initialize workgroup memory #3174

Merged: 2 commits merged into gfx-rs:master on Jan 25, 2023

Conversation

@teoxoy (Member) commented Nov 3, 2022

Depends on gfx-rs/naga#2111

Checklist

  • Run cargo clippy.
  • Run RUSTFLAGS=--cfg=web_sys_unstable_apis cargo clippy --target wasm32-unknown-unknown if applicable.
  • Add change to CHANGELOG.md. See simple instructions inside file.

Connections
fixes #2430

Description
Zero-initialize workgroup memory mostly via polyfills (injected by naga) with extra support for VK_KHR_zero_initialize_workgroup_memory (when available).
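
For illustration, here is roughly how a consumer selects between the two strategies. This is a minimal sketch assuming the options API that gfx-rs/naga#2111 adds to naga's SPIR-V backend; treat the exact names as assumptions:

```rust
// A minimal sketch, assuming the `zero_initialize_workgroup_memory`
// option and `ZeroInitializeWorkgroupMemoryMode` enum introduced by
// gfx-rs/naga#2111; exact names may differ between naga versions.
use naga::back::spv;

/// Pick the zero-init strategy; `native_supported` would come from
/// probing the device for VK_KHR_zero_initialize_workgroup_memory.
fn spv_options(native_supported: bool) -> spv::Options {
    let mode = if native_supported {
        // The driver zeroes workgroup variables itself; no extra
        // shader code is emitted.
        spv::ZeroInitializeWorkgroupMemoryMode::Native
    } else {
        // naga injects zero-stores for every workgroup variable at
        // entry-point start, followed by a control barrier, so user
        // code never observes stale memory.
        spv::ZeroInitializeWorkgroupMemoryMode::Polyfill
    };
    let mut options = spv::Options::default();
    options.zero_initialize_workgroup_memory = mode;
    options
}
```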

Testing
Added a test and tested manually on all backends to make sure everything is working properly.
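
For context, the core idea of such a test looks like the following. This is a simplified sketch, not the actual harness from wgpu/tests/shader/zero_init_workgroup_mem.rs, and the names here are illustrative:

```rust
// A shader reads workgroup memory it never wrote and copies it to a
// storage buffer; the host then asserts every word is zero.
const SHADER: &str = r#"
var<workgroup> wg_mem: array<u32, 64>;

@group(0) @binding(0)
var<storage, read_write> out: array<u32, 64>;

@compute @workgroup_size(64)
fn read(@builtin(local_invocation_index) i: u32) {
    // wg_mem is never written before this read, so any non-zero
    // value observed in `out` means zero-initialization failed.
    out[i] = wg_mem[i];
}
"#;

/// Host-side check after the dispatch, once the output buffer has
/// been copied back and mapped.
fn assert_zeroed(words: &[u32]) {
    assert!(
        words.iter().all(|&w| w == 0),
        "workgroup memory was not zero-initialized"
    );
}
```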

@codecov-commenter commented Nov 3, 2022

Codecov Report

Merging #3174 (580dd06) into master (aa46e82) will increase coverage by 0.03%.
The diff coverage is 87.87%.

@@            Coverage Diff             @@
##           master    #3174      +/-   ##
==========================================
+ Coverage   64.10%   64.13%   +0.03%     
==========================================
  Files          86       86              
  Lines       42591    42624      +33     
==========================================
+ Hits        27301    27337      +36     
+ Misses      15290    15287       -3     
Impacted Files Coverage Δ
wgpu-hal/src/vulkan/mod.rs 56.37% <ø> (ø)
wgpu-hal/src/vulkan/adapter.rs 76.67% <87.09%> (+0.23%) ⬆️
wgpu-hal/src/dx12/device.rs 88.46% <100.00%> (+<0.01%) ⬆️
wgpu-hal/src/gles/device.rs 80.20% <100.00%> (+0.01%) ⬆️
wgpu-core/src/hub.rs 60.67% <0.00%> (-0.31%) ⬇️
wgpu-core/src/device/mod.rs 66.77% <0.00%> (+0.04%) ⬆️
wgpu/src/lib.rs 50.86% <0.00%> (+0.05%) ⬆️
wgpu-hal/src/gles/queue.rs 61.39% <0.00%> (+0.08%) ⬆️
wgpu-core/src/binding_model.rs 61.18% <0.00%> (+0.31%) ⬆️
... and 1 more

[A review comment on Cargo.toml was marked outdated and resolved.]
@ErichDonGubler (Member) left a comment

I don't think I know enough about workgroups yet to be able to review this meaningfully. I've left what I could, though!

@teoxoy marked this pull request as draft on November 3, 2022 17:20
@teoxoy (Member, Author) commented Nov 3, 2022

Converting to draft so that gfx-rs/naga#2111 can land first.

@teoxoy requested a review from jimblandy on November 4, 2022 09:13
@teoxoy (Member, Author) commented Nov 7, 2022

Looks like the new improved test is causing some unrelated issues to surface.

  • DX12 is failing with

    [2022-11-07T12:39:55Z ERROR wgpu_hal::auxil::dxgi::exception] ID3D12CommandAllocator::Reset: A command allocator 0x000002882019FFD0:'Unnamed ID3D12CommandAllocator Object' is being reset before previous executions associated with the allocator have completed. [ EXECUTION ERROR #552: COMMAND_ALLOCATOR_SYNC]
    [2022-11-07T12:39:55Z ERROR wgpu_hal::auxil::dxgi::exception] ID3D12Resource2::<final-release>: CORRUPTION: An ID3D12Resource object (0x000002882019FB60:'(wgpu internal) Staging') is referenced by GPU operations in-flight on Command Queue (0x000002882004E570:'Unnamed ID3D12CommandQueue Object').  It is not safe to final-release objects that may have GPU operations pending.  This can result in application instability. [ EXECUTION ERROR #921: OBJECT_DELETED_WHILE_STILL_IN_USE]
    

    which was also reported in #3031 (ID3D12CommandAllocator::Reset) and possibly #2285 (ID3D12CommandAllocator error for heavy computation pipeline)

  • Vulkan is failing with no errors?

I can't reproduce these locally with the llvmpipe (LLVM 14.0.6, 256 bits) or Windows 11 WARP drivers.

@cwfitzgerald any ideas?

@cwfitzgerald (Member) commented

> Vulkan is failing with no errors?

Vulkan is segfaulting.

> DX12 is failing with

That's weird; I wonder if the problems are fundamentally related, especially as SwiftShader's CPU-based execution will be sensitive to various kinds of memory corruption.

Unfortunately, we have quite a few issues with freeing resources too early.

@teoxoy (Member, Author) commented Nov 8, 2022

I managed to reproduce the DX12 errors locally and filed #3193.

I'll test SwiftShader next.

> Unfortunately, we have quite a few issues with freeing resources too early.

Can you point me to some?

@teoxoy (Member, Author) commented Nov 8, 2022

I tested SwiftShader locally using the same binary CI uses, and the test ran just fine.
Maybe it was flaky? Can we rerun CI? (I don't seem to have permission to do it myself.)

@cwfitzgerald (Member) commented Nov 9, 2022

I re-ran it, but it still segfaults.

> Can you point me to some?

#3160 is the one I hit most recently.

@teoxoy marked this pull request as ready for review on January 25, 2023 17:17
@teoxoy (Member, Author) commented Jan 25, 2023

[2023-01-25T17:46:25Z ERROR wgpu_hal::gles::egl] EGL 'eglInitialize' code 0x3001: DRI2: failed to load driver
thread 'shader::zero_init_workgroup_mem::zero_init_workgroup_mem' panicked at 'Zero-initialization of workgroup memory failed (12% of disptaches failed).', wgpu/tests/shader/zero_init_workgroup_mem.rs:173:5

I'm confused by the new CI error: did it fail to load the driver, or did it run with 12% of dispatches failing to be zero-initialized?

@jimblandy (Member) left a comment

Looks good to me.

@cwfitzgerald merged commit d280913 into gfx-rs:master on Jan 25, 2023
@teoxoy deleted the zero_init_workgroup_mem branch on January 25, 2023 23:45
Merging this pull request closes #2430 (Zero out workgroup memory).