
atomic load/store is not coherent #1488

Closed
raphlinus opened this issue Oct 27, 2021 · 4 comments
Labels
help wanted Extra attention is needed

Comments

@raphlinus
Contributor

raphlinus commented Oct 27, 2021

I've tried coding up my prefix sum example in WGSL; the code is in the prefix branch of compute-shader-101. It almost works, but the atomic stores sent by one workgroup are not coherently seen by atomic loads in another.
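For context, here is a minimal sketch of the cross-workgroup signaling pattern involved (my own illustration in current WGSL syntax, not the actual code from the prefix branch; StateBuf, state, and results are made-up names): one workgroup publishes a value and then raises a flag, and another workgroup spins on the flag before reading the value. The failure mode is that the second workgroup observes a stale flag or a stale value.

```wgsl
struct StateBuf {
    flag: atomic<u32>,
    value: atomic<u32>,
}

@group(0) @binding(0) var<storage, read_write> state: StateBuf;
@group(0) @binding(1) var<storage, read_write> results: array<u32>;

@compute @workgroup_size(1)
fn main(@builtin(workgroup_id) wg_id: vec3<u32>) {
    if (wg_id.x == 0u) {
        // Producer: publish a value, then raise the flag.
        atomicStore(&state.value, 42u);
        atomicStore(&state.flag, 1u);
    } else {
        // Consumer: spin until the flag raised by workgroup 0 becomes
        // visible, then read the value it published.
        loop {
            if (atomicLoad(&state.flag) != 0u) { break; }
        }
        results[wg_id.x] = atomicLoad(&state.value);
    }
}
```

(Like the real code, this sketch relies on the producer workgroup making forward progress; that caveat comes up again below.)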

I'm not 100% sure what's going wrong, but there are a couple of clues. First, there's no Coherent or Volatile decoration on the StateBuf buffer in the naga SPIR-V output; the WGSL spec suggests OpMemberDecorate Volatile. I personally think Coherent may be closer to what I need here, but I'm not sure; the distinction is pretty fuzzy to me.

Also, the loads and stores map to OpAtomicLoad and OpAtomicStore. I'm not sure, but I think these may not do anything unless the Vulkan memory model is enabled. In any case, there's no way to generate them from GLSL unless the GL_KHR_memory_scope_semantics extension is enabled.

I did a little experimentation hacking on the SPIR-V assembly (adding the Coherent/Volatile decoration), but was not able to get it to work. Doing a roundtrip through spirv-cross did work, though it made a bigger change to the program, replacing atomicLoad with atomicAdd(, 0) and atomicStore with atomicExchange(). That suggests the problem is indeed in respecting the atomic semantics, but of course with this kind of thing there are all sorts of things that can go wrong.
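To make the substitution concrete, here is a rough WGSL-level sketch of the two forms (illustrative only; state and the variable names are made up, and this is not the spirv-cross output itself). The first pair lowers to OpAtomicLoad/OpAtomicStore, the second to OpAtomicIAdd/OpAtomicExchange:

```wgsl
struct StateBuf {
    flag: atomic<u32>,
}

@group(0) @binding(0) var<storage, read_write> state: StateBuf;

@compute @workgroup_size(1)
fn main() {
    // Plain atomic load/store, which lower to OpAtomicLoad / OpAtomicStore.
    let observed = atomicLoad(&state.flag);
    atomicStore(&state.flag, observed + 1u);

    // Read-modify-write equivalents, roughly what the spirv-cross roundtrip
    // produced: an add of zero in place of the load, and an exchange in
    // place of the store. These behaved coherently on the failing driver.
    let observed_rmw = atomicAdd(&state.flag, 0u);
    let previous = atomicExchange(&state.flag, observed_rmw + 1u);
}
```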

@raphlinus
Contributor Author

I should add, I'm testing on an AMD 5700 XT on Windows 10. I would not be at all surprised if some other GPU executed this code correctly.

@kvark added the help wanted label Oct 27, 2021
@kvark
Member

kvark commented Oct 27, 2021

@dneto0 perhaps you could shed some light on this? I imagine this isn't a Naga-specific question.

@raphlinus
Contributor Author

OK, this may be some pretty deep water. First, please don't take my initial comments too seriously; they're speculation and I'm still exploring.

I've tried running the code on some other systems in the hope that it sheds more light. Metal translation (M1) runs fine. Windows 10 Vulkan on a laptop with an Nvidia 1060 and an Intel 630 also runs fine (I'm not sure which adapter is being selected; I think it's the Intel, and can track this down further if that would be helpful).

DX12 translation gives these error messages:

thread 'main' panicked at 'wgpu error: Validation Error

Caused by:
    In Device::create_compute_pipeline
    Internal error: D3DCompile error (0x80004005): C:\Users\raph\github\compute-shader-101\compute-shader-hello\Shader@0x0000016DBE2DBD60(86,13-45): error X4026: thread sync operation must be in non-varying flow control, due to a potential race condition this sync is illegal, consider adding a sync after reading any values controlling shader execution at this point
C:\Users\raph\github\compute-shader-101\compute-shader-hello\Shader@0x0000016DBE2DBD60(87,29-39): error X4026: this variable dependent on potentially varying data: _expr105
C:\Users\raph\github\compute-shader-101\compute-shader-hello\Shader@0x0000016DBE2DBD60(88,13-27): error X4026: this variable dependent on potentially varying data: _expr106
C:\Users\raph\github\compute-shader-101\compute-shader-hello\Shader@0x0000016DBE2DBD60(90,18-32): error X4026: this variable dependent on potentially varying data: flag
C:\Users\raph\github\compute-shader-101\compute-shader-hello\Shader@0x0000016DBE2DBD60(91,18-46): error X4026: this variable dependent on potentially varying data: _expr107
C:\Users\raph\github\compute-shader-101\compute-shader-hello\Shader@0x0000016DBE2DBD60(80,9-19): error X4026: this loop dependent on potentially varying data
C:\Users\raph\github\compute-shader-101\compute-shader-hello\Shader@0x0000016DBE2DBD60(81,18-52): error X4026: this variable dependent on potentially varying data: local_id
C:\Users\raph\github\compute-shader-101\compute-shader-hello\Shader@0x0000016DBE2DBD60(83,22-81): error X4026: this variable dependent on potentially varying data: _expr105

I've attached the naga HLSL output (from the master branch, commit 00bbbed, also verified identical with 3b49981, which is 0.7.1). I believe the control flow is workgroup-uniform here, but it's tricky to verify, as the situation is complex. The workgroup as a whole spins until a flag set by another workgroup goes nonzero. (Note that this assumes forward progress, and I will fix that assumption, but that is not the problem here; I'm getting incorrect output, while a forward-progress stall would just deadlock.) If the control flow is in fact nonuniform, that would explain the problems and would not necessarily indicate a problem with shader compilation.
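For illustration, here is a sketch (my own, not the attached shader) of the usual way to keep such a spin-wait workgroup-uniform: a single invocation polls the flag, and the result is broadcast through workgroup shared memory so every invocation takes the same branch. Even so, a conservative static analysis like the one D3DCompiler applies here can still flag the pattern, since it cannot prove that the value read back from workgroup memory is the same for every invocation.

```wgsl
struct StateBuf {
    flag: atomic<u32>,
}

@group(0) @binding(0) var<storage, read_write> state: StateBuf;

// Polled flag value, broadcast to the whole workgroup.
var<workgroup> flag_seen: u32;

fn wait_for_flag(local_index: u32) {
    loop {
        if (local_index == 0u) {
            flag_seen = atomicLoad(&state.flag);
        }
        // After the barrier every invocation reads the same polled value,
        // so the loop exit below is dynamically uniform.
        workgroupBarrier();
        if (flag_seen != 0u) {
            break;
        }
        // Second barrier so invocation 0 does not overwrite flag_seen
        // while other invocations may still be reading it.
        workgroupBarrier();
    }
}

@compute @workgroup_size(64)
fn main(@builtin(local_invocation_index) local_index: u32) {
    wait_for_flag(local_index);
    // ... proceed; the whole workgroup has now observed the flag.
}
```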

DXC happily translates the attached HLSL. The error above seems to be specific to the legacy D3DCompiler (FXC).

shader.hlsl.gz

It might be worth distilling this into a smaller test case, in particular to diagnose whether the atomic-coherence failure can be demonstrated with clean, uniform control flow.

@raphlinus
Contributor Author

I'm going to close this issue, as I now think the most likely explanation is a bug in the AMD shader compiler. From what I can tell, the naga translation is correct with respect to the currently recommended strategy in the WGSL spec. It's possible that strategy will change to become more compatible with existing drivers, but that's a different story.

There's more discussion in googlefonts/compute-shader-101#11. That's a much simpler test case than the full prefix-sum implementation, and most notably it does not implicate either control-flow uniformity analysis or any assumptions about forward progress; those are also really interesting topics and I will probably follow up on them, but they complicate the story quite a bit, and atomic coherence is hard enough as it is.
