Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect SYNC-HAZARD-WRITE-AFTER-READ error with vkCmdPipelineBarrier2 applied to suballocated buffers #8502

Closed
HuYuxin opened this issue Sep 5, 2024 · 15 comments · Fixed by #8674
Assignees
Labels
Synchronization Synchronization Validation Object Issue

Comments

@HuYuxin
Copy link

HuYuxin commented Sep 5, 2024

Environment:

  • OS: Linux
  • GPU and driver version: NVIDIA, Driver Version: 535.183.01. Vulkan Instance Version: 1.3.283
  • SDK or header version if building from repo: 2d73636
  • Options enabled (synchronization, best practices, etc.):

Describe the Issue

To reproduce:

  1. Sync latest ANGLE: https://github.com/google/angle
  2. Cherry-pick these two changes:
    https://chromium-review.googlesource.com/c/angle/angle/+/5832709/4 (patchset 4)
    https://chromium-review.googlesource.com/c/angle/angle/+/5832710/10 (patchset 10)
  3. Set below gn args gn args out/LinuxDebug
dcheck_always_on = true
is_component_build = true
is_debug = false
symbol_level = 1
angle_enable_cl=true
  1. Compile with command autoninja -C out/LinuxDebug angle_end2end_tests
  2. Run test with command out/LinuxDebug/angle_end2end_tests --gtest_filter="LineLoopIndirectTest.UShortIndexIndirectBuffer/ES3_1_Vulkan" --verbose --local-output

Observe that the test failed with this validation error:

[ SYNC-HAZARD-WRITE-AFTER-READ ] Validation Error: [ SYNC-HAZARD-WRITE-AFTER-READ ] Object 0: handle = 0x5557db026030, type = VK_OBJECT_TYPE_COMMAND_BUFFER; Object 1: handle = 0x200000000020, type = VK_OBJECT_TYPE_BUFFER; | MessageID = 0x376bc9df | vkCmdCopyImageToBuffer():  Hazard WRITE_AFTER_READ for dstBuffer VkBuffer 0x200000000020[], region 0. Access info (usage: SYNC_COPY_TRANSFER_WRITE, prior_usage: SYNC_INDEX_INPUT_INDEX_READ, read_barriers: VK_PIPELINE_STAGE_2_BOTTOM_OF_PIPE_BIT, command: vkCmdDrawIndexedIndirect, seq_no: 8, reset_no: 3, resource: VkBuffer 0x200000000020[]).

Expected behavior

Test should pass without the SYNC-HAZARD-WRITE-AFTER-READ validation error.

Valid Usage ID
If applicable, please include the validation messages encountered leading up to the issue

[ SYNC-HAZARD-WRITE-AFTER-READ ] Validation Error: [ SYNC-HAZARD-WRITE-AFTER-READ ] Object 0: handle = 0x5557db026030, type = VK_OBJECT_TYPE_COMMAND_BUFFER; Object 1: handle = 0x200000000020, type = VK_OBJECT_TYPE_BUFFER; | MessageID = 0x376bc9df | vkCmdCopyImageToBuffer():  Hazard WRITE_AFTER_READ for dstBuffer VkBuffer 0x200000000020[], region 0. Access info (usage: SYNC_COPY_TRANSFER_WRITE, prior_usage: SYNC_INDEX_INPUT_INDEX_READ, read_barriers: VK_PIPELINE_STAGE_2_BOTTOM_OF_PIPE_BIT, command: vkCmdDrawIndexedIndirect, seq_no: 8, reset_no: 3, resource: VkBuffer 0x200000000020[]).

Additional context

The renderdoc capture shows that VVL thinks there is a missing execution barrier on the VkBuffer:

VkBuffer used as index buffer for vkCmdDrawIndexed()
--- VVL thinks there is a missing execution barrier---
VkBuffer used as dest buffer for vkCmdCopyImageToBuffer()

However, the two buffers are suballocated. Even they belong to the same big VkBuffer, they are actually pointing to different memories.

We think that using VK_KHR_Synchronization2 exposed this issue in VVL. Before we use VK_KHR_Synchronization2, the srcStageMask and dstStageMask are shared between VkMemoryBarrier, VkBufferMemoryBarier, and VkImageMemoryBarrier. in the vkCmdPipelineBarrier () call. Between the vkCmdDrawIndex() and vkCmdCopyImageToBuffer(), even if only image barrier is needed, the srcStageMask and dstStageMask values also apply to the VkMemoryBarrier and VkBufferMemoryBarrier, and it fulfills VVL's validation on buffer execution barrier (even there is no execution barrier needed on buffer because they are suballocated and point to different memories). After switching to VK_KHR_Synchronization2, the srcStageMask and dstStageMask are packed within the VkMemoryBarrier2, VkBufferMemoryBarier2, VkImageMemoryBarrier2. So now if VkMemoryBarrier2 is null, there is no execution barrier applied to VkBuffer, and VVL incorrectly thinks that there is a missing execution barrier on the buffer.

@spencer-lunarg spencer-lunarg added the Synchronization Synchronization Validation Object Issue label Sep 5, 2024
@ShabbyX
Copy link
Contributor

ShabbyX commented Sep 6, 2024

There's likely some relationship with #3605

Basically, the range of a buffer resource is not taken into account in syncval (it seems)

@artem-lunarg
Copy link
Contributor

artem-lunarg commented Sep 6, 2024

I wonder if #3605 was fixed by #8141, we indeed did not compute offsets correctly for dynamic descriptors (but I did not look closely at these issues yet, so could be a wrong suggestion). This issue should be something different because it is based on the latest code.

@ShabbyX
Copy link
Contributor

ShabbyX commented Sep 18, 2024

I wonder if #3605 was fixed by #8141, we indeed did not compute offsets correctly for dynamic descriptors (but I did not look closely at these issues yet, so could be a wrong suggestion). This issue should be something different because it is based on the latest code.

Yes, looks like the dynamic offset part is fixed: https://chromium-review.googlesource.com/c/angle/angle/+/5873529

We're still getting false positives because the vertex buffer ranges are not taken into account (it seems)

@artem-lunarg
Copy link
Contributor

Should be fixed by #8674

@ShabbyX
Copy link
Contributor

ShabbyX commented Oct 16, 2024

It doesn't look like this was about indirect buffers only though. I tried removing this suppression and we still get VVL failures: https://chromium-review.googlesource.com/c/angle/angle/+/5937422

Typically, the failures we see are in the form of vertex-attribute usage reported not synced with transfer, but those are happening to different ranges of the buffer.

@artem-lunarg
Copy link
Contributor

artem-lunarg commented Oct 16, 2024

It doesn't look like this was about indirect buffers only though

The error message in this issue is exactly what did not work (indirect command that accesses index or vertex buffer which are suballocated) but I might miss something.

[ SYNC-HAZARD-WRITE-AFTER-READ ] Validation Error: [ SYNC-HAZARD-WRITE-AFTER-READ ] Object 0: handle = 0x5557db026030, type = VK_OBJECT_TYPE_COMMAND_BUFFER; Object 1: handle = 0x200000000020, type = VK_OBJECT_TYPE_BUFFER; | MessageID = 0x376bc9df | vkCmdCopyImageToBuffer(): Hazard WRITE_AFTER_READ for dstBuffer VkBuffer 0x200000000020[], region 0. Access info (usage: SYNC_COPY_TRANSFER_WRITE, prior_usage: SYNC_INDEX_INPUT_INDEX_READ, read_barriers: VK_PIPELINE_STAGE_2_BOTTOM_OF_PIPE_BIT, command: vkCmdDrawIndexedIndirect, seq_no: 8, reset_no: 3, resource: VkBuffer 0x200000000020[]).

@ShabbyX I'm not sure where I need to look for the errors in the above link, could you help to locate it?

@artem-lunarg
Copy link
Contributor

artem-lunarg commented Oct 16, 2024

I'm going to use angle directly for testing to be sure. Will be looking into this tomorrow, hopefully we will fix this.

@artem-lunarg artem-lunarg reopened this Oct 16, 2024
@artem-lunarg
Copy link
Contributor

artem-lunarg commented Oct 16, 2024

@ShabbyX could you confirm that out/LinuxDebug/angle_end2end_tests --gtest_filter="LineLoopIndirectTest.UShortIndexIndirectBuffer/ES3_1_Vulkan" --verbose --local-output command line still produces the errors reported by this issue when the corresponding section in vk_renderer.cpp is uncommented?

Also should I apply any patches to the latest angle code? My impression that the first patch (https://chromium-review.googlesource.com/c/angle/angle/+/5832709/4) is already merged but not the second (https://chromium-review.googlesource.com/c/angle/angle/+/5832710/10).

@ShabbyX
Copy link
Contributor

ShabbyX commented Oct 19, 2024

That test is fixed, I can confirm. Unsuppressing related VVL errors in https://chromium-review.googlesource.com/c/angle/angle/+/5937422, there are other failures, not related to indirect buffers. You don't need to cherry-pick anything, it has nothing to do with https://chromium-review.googlesource.com/c/angle/angle/+/5832710 and the issue is in fact much older than that.

Try taking that first change (unsuppresses VVL errors) and run this test: --gtest_filter=BufferDataTestES3.CopyBufferSubDataDraw/ES3_Vulkan. I get this:

vk_renderer.cpp:777 (DebugUtilsMessenger): [ SYNC-HAZARD-READ-AFTER-WRITE ] Validation Error: [ SYNC-HAZARD-READ-AFTER-WRITE ] Object 0: handle = 0x240000000024, type = VK_OBJECT_TYPE_BUFFER; | MessageID = 0xe4d96472 | vkCmdDrawIndexed():  Hazard READ_AFTER_WRITE for vertex VkBuffer 0x240000000024[] in VkCommandBuffer 0x55df9d7ffb90[]. Access info (usage: SYNC_VERTEX_ATTRIBUTE_INPUT_VERTEX_ATTRIBUTE_READ, prior_usage: SYNC_COPY_TRANSFER_WRITE, write_barriers: 0, command: vkCmdCopyBuffer, seq_no: 2, reset_no: 7, resource: VkBuffer 0x240000000024[]).
                            Object: 0x240000000024 (type = Buffer(9))

@ShabbyX
Copy link
Contributor

ShabbyX commented Oct 19, 2024

Ooooh you know what, I think actually I know where this is coming from. vkCmdBindVertexBuffers doesn't take the size of the buffer, only the offset. So syncval has no way to know how much of the buffer is actually used as vertex input (well, with some gymnastics it might, at least when geometry/tessellation/indirect are not involved).

If you're confident that syncval is taking buffer ranges into account, the VVL issue probably goes away for us if we use vkCmdBindVertexBuffers2 and pass in the sizes.

I remember now we had the same problem with robustness, where we couldn't rely on robustness with vertex data in conjunction with suballocation because the driver couldn't tell where the end of the data is.

Feel free to re-close for now.

@artem-lunarg
Copy link
Contributor

I can reproduce error with --gtest_filter=BufferDataTestES3.CopyBufferSubDataDraw/ES3_Vulkan on my machine it's about index buffer:

[ SYNC-HAZARD-READ-AFTER-WRITE ] Validation Error: [ SYNC-HAZARD-READ-AFTER-WRITE ] Object 0: handle = 0x280000000028, type = VK_OBJECT_TYPE_BUFFER; | MessageID = 0xe4d96472 | vkCmdDrawIndexed(): Hazard READ_AFTER_WRITE for vertex VkBuffer 0x280000000028[] in VkCommandBuffer 0x2309d9c33a0[]. Access info (usage: SYNC_VERTEX_ATTRIBUTE_INPUT_VERTEX_ATTRIBUTE_READ, prior_usage: SYNC_COPY_TRANSFER_WRITE, write_barriers: 0, command: vkCmdCopyBuffer, seq_no: 2, reset_no: 9, resource: VkBuffer 0x280000000028[]).

I can see a potential problem here (and the same with vertex attributes access). The error says data was written by a copy command and then read by the vertex pipeline but sync validation did not detect a barrier that ensures writes can be used by the following read access (write_barriers: 0). Could it be a missing/incomplete barrier in angle? If you think everything is okay then I can debug sync validation to confirm false positive and if that's the case then it should be a separate issue.

@ShabbyX
Copy link
Contributor

ShabbyX commented Oct 22, 2024

No like I said I'm pretty sure this is just a matter of vkCmdBindVertexBuffers implicitly assuming the vertex data is read from the offset to the end of the buffer (because there's no size). So in a way it's a false positive, but it's Vulkan's fault (vkCmdBindVertexBuffers2 fixes it).

You can close this, when we get a chance to use vkCmdBindVertexBuffers2 and give the size, we can check if this continues to be reported.

@artem-lunarg
Copy link
Contributor

So in a way it's a false positive

We are trying to follow the direction now that syncval does not report false positives. Ether to disable specific validation or to make it optional (disabled by default). Something I need to look at.

@artem-lunarg
Copy link
Contributor

artem-lunarg commented Oct 22, 2024

No like I said I'm pretty sure this is just a matter of vkCmdBindVertexBuffers implicitly assuming the vertex data is read from the offset to the end of the buffer

Makes sense, my brain was somewhere else

@ShabbyX
Copy link
Contributor

ShabbyX commented Oct 22, 2024

We are trying to follow the direction now that syncval does not report false positives. Ether to disable specific validation or to make it optional (disabled by default). Something I need to look at.

Sounds good. I guess an easy way to still have some validation for vertex attributes but also remove false positives is to implicitly assume size == 1 when vkCmdBindVertexBuffers is used (or vkCmdBindVertexBuffers2 with pSizes == nullptr).

You won't catch bugs with overlapping areas used for different purposes, but that should be very rare. You would correctly catch syncval errors for the usual usage of a buffer range being used as one entity (because one byte represents the entire thing)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Synchronization Synchronization Validation Object Issue
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants