
atomic load/store is not coherent #1488

Closed
raphlinus opened this issue Oct 27, 2021 · 4 comments
Labels
help wanted Extra attention is needed

Comments

@raphlinus
Contributor

raphlinus commented Oct 27, 2021

I've tried coding up my prefix sum example in WGSL; the code is in the prefix branch of compute-shader-101. It almost works, but the atomic stores sent by one workgroup are not coherently seen by atomic loads in another.
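For context, here is a minimal sketch of the cross-workgroup signaling pattern involved (my own illustration in current WGSL syntax, not the actual code from the prefix branch; StateBuf, state, and results are made-up names): one workgroup publishes a value and then raises a flag, and another workgroup spins on the flag before reading the value. The failure mode is that the second workgroup observes a stale flag or a stale value.

```wgsl
struct StateBuf {
    flag: atomic<u32>,
    value: atomic<u32>,
}

@group(0) @binding(0) var<storage, read_write> state: StateBuf;
@group(0) @binding(1) var<storage, read_write> results: array<u32>;

@compute @workgroup_size(1)
fn main(@builtin(workgroup_id) wg_id: vec3<u32>) {
    if (wg_id.x == 0u) {
        // Producer: publish a value, then raise the flag.
        atomicStore(&state.value, 42u);
        atomicStore(&state.flag, 1u);
    } else {
        // Consumer: spin until the flag raised by workgroup 0 becomes
        // visible, then read the value it published.
        loop {
            if (atomicLoad(&state.flag) != 0u) { break; }
        }
        results[wg_id.x] = atomicLoad(&state.value);
    }
}
```

(Like the real code, this sketch relies on the producer workgroup making forward progress; that caveat comes up again below.)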

I'm not 100% sure what's going wrong, but there are a couple of clues. First, there's no Coherent or Volatile decoration on the StateBuf buffer in the naga SPIR-V output; the WGSL spec suggests OpMemberDecorate Volatile. I personally think Coherent may be closer to what I need here, but I'm not sure; the distinction is pretty fuzzy to me.

Also, the loads and stores map to OpAtomicLoad and OpAtomicStore. I'm not sure, but I think these may not do anything unless the Vulkan memory model is enabled. In any case, there's no way to generate them from GLSL unless the GL_KHR_memory_scope_semantics extension is enabled.

I did a little experimentation hacking on the SPIR-V assembly (adding the Coherent/Volatile decoration), but was not able to get it to work. Doing a roundtrip through spirv-cross did work, though it made a bigger change to the program, replacing atomicLoad with atomicAdd(, 0) and atomicStore with atomicExchange(). That suggests the problem is indeed in respecting the atomic semantics, but of course with this kind of thing there are all sorts of things that can go wrong.
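To make the substitution concrete, here is a rough WGSL-level sketch of the two forms (illustrative only; state and the variable names are made up, and this is not the spirv-cross output itself). The first pair lowers to OpAtomicLoad/OpAtomicStore, the second to OpAtomicIAdd/OpAtomicExchange:

```wgsl
struct StateBuf {
    flag: atomic<u32>,
}

@group(0) @binding(0) var<storage, read_write> state: StateBuf;

@compute @workgroup_size(1)
fn main() {
    // Plain atomic load/store, which lower to OpAtomicLoad / OpAtomicStore.
    let observed = atomicLoad(&state.flag);
    atomicStore(&state.flag, observed + 1u);

    // Read-modify-write equivalents, roughly what the spirv-cross roundtrip
    // produced: an add of zero in place of the load, and an exchange in
    // place of the store. These behaved coherently on the failing driver.
    let observed_rmw = atomicAdd(&state.flag, 0u);
    let previous = atomicExchange(&state.flag, observed_rmw + 1u);
}
```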

@raphlinus
Contributor Author

I should add, I'm testing on an AMD 5700 XT on Windows 10. I would not be at all surprised if some other GPU executed this code correctly.

@kvark added the help wanted label Oct 27, 2021
@kvark
Member

kvark commented Oct 27, 2021

@dneto0 perhaps you could shed some light on this? I imagine this isn't a Naga-specific question.

@raphlinus
Contributor Author

OK, this may be some pretty deep water. First, please don't take my initial comments too seriously; they're speculation and I'm still exploring.

I've tried running the code on some other systems in the hope that it sheds more light. Metal translation (M1) runs fine. Windows 10 Vulkan on a laptop with an Nvidia 1060 and an Intel 630 also runs fine (I'm not sure which adapter is being selected; I think it's the Intel, and can track this down further if that would be helpful).

DX12 translation gives these error messages:

thread 'main' panicked at 'wgpu error: Validation Error

Caused by:
    In Device::create_compute_pipeline
    Internal error: D3DCompile error (0x80004005): C:\Users\raph\github\compute-shader-101\compute-shader-hello\Shader@0x0000016DBE2DBD60(86,13-45): error X4026: thread sync operation must be in non-varying flow control, due to a potential race condition this sync is illegal, consider adding a sync after reading any values controlling shader execution at this point
C:\Users\raph\github\compute-shader-101\compute-shader-hello\Shader@0x0000016DBE2DBD60(87,29-39): error X4026: this variable dependent on potentially varying data: _expr105
C:\Users\raph\github\compute-shader-101\compute-shader-hello\Shader@0x0000016DBE2DBD60(88,13-27): error X4026: this variable dependent on potentially varying data: _expr106
C:\Users\raph\github\compute-shader-101\compute-shader-hello\Shader@0x0000016DBE2DBD60(90,18-32): error X4026: this variable dependent on potentially varying data: flag
C:\Users\raph\github\compute-shader-101\compute-shader-hello\Shader@0x0000016DBE2DBD60(91,18-46): error X4026: this variable dependent on potentially varying data: _expr107
C:\Users\raph\github\compute-shader-101\compute-shader-hello\Shader@0x0000016DBE2DBD60(80,9-19): error X4026: this loop dependent on potentially varying data
C:\Users\raph\github\compute-shader-101\compute-shader-hello\Shader@0x0000016DBE2DBD60(81,18-52): error X4026: this variable dependent on potentially varying data: local_id
C:\Users\raph\github\compute-shader-101\compute-shader-hello\Shader@0x0000016DBE2DBD60(83,22-81): error X4026: this variable dependent on potentially varying data: _expr105

I've attached the naga HLSL output (from the master branch, commit 00bbbed, also verified identical with 3b49981, which is 0.7.1). I believe the control flow is workgroup-uniform here, but it's tricky to verify, as the situation is complex. The workgroup as a whole spins until a flag set by another workgroup goes nonzero. (Note that this assumes forward progress, and I will fix that assumption, but that is not the problem here; I'm getting incorrect output, while a forward-progress stall would just deadlock.) If the control flow is in fact nonuniform, that would explain the problems and would not necessarily indicate a problem with shader compilation.
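For illustration, here is a sketch (my own, not the attached shader) of the usual way to keep such a spin-wait workgroup-uniform: a single invocation polls the flag, and the result is broadcast through workgroup shared memory so every invocation takes the same branch. Even so, a conservative static analysis like the one D3DCompiler applies here can still flag the pattern, since it cannot prove that the value read back from workgroup memory is the same for every invocation.

```wgsl
struct StateBuf {
    flag: atomic<u32>,
}

@group(0) @binding(0) var<storage, read_write> state: StateBuf;

// Polled flag value, broadcast to the whole workgroup.
var<workgroup> flag_seen: u32;

fn wait_for_flag(local_index: u32) {
    loop {
        if (local_index == 0u) {
            flag_seen = atomicLoad(&state.flag);
        }
        // After the barrier every invocation reads the same polled value,
        // so the loop exit below is dynamically uniform.
        workgroupBarrier();
        if (flag_seen != 0u) {
            break;
        }
        // Second barrier so invocation 0 does not overwrite flag_seen
        // while other invocations may still be reading it.
        workgroupBarrier();
    }
}

@compute @workgroup_size(64)
fn main(@builtin(local_invocation_index) local_index: u32) {
    wait_for_flag(local_index);
    // ... proceed; the whole workgroup has now observed the flag.
}
```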

DXC happily translates the attached HLSL. The error above seems to be specific to the legacy D3DCompiler (FXC).

shader.hlsl.gz

It might be worth distilling this into a smaller test case, in particular to diagnose whether the atomic-coherence failure can be demonstrated with clean, uniform control flow.

@raphlinus
Contributor Author

I'm going to close this issue, as I now think the most likely explanation is a bug in the AMD shader compiler. From what I can tell, the naga translation is correct with respect to the currently recommended strategy in the WGSL spec. It's possible that strategy will change to become more compatible with existing drivers, but that's a different story.

There's more discussion in googlefonts/compute-shader-101#11. That's a much simpler test case than the full prefix-sum implementation, and most notably it does not implicate either control-flow uniformity analysis or any assumptions about forward progress; those are also really interesting topics and I will probably follow up on them, but they complicate the story quite a bit, and atomic coherence is hard enough as it is.
