Vulkan: Fix incorrect access to the buffers on Android #84852
Then let's just request COHERENT memory. I would add an optional parameter to _buffer_allocate() called |
86f530c to 761f48d
now: Tests:
I'd suggest simplifying this PR by making And, as someone else has pointed out, On desktop, you don't need to ask for
My guess is that there wouldn't be a performance loss on platforms that already add the proper bit unconditionally. And on platforms that don't, we have to add it explicitly anyway. Therefore, let's always ask for coherent memory when it comes to data transfers via mapped memory. |
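To make the "always ask for coherent memory" suggestion concrete, here is a minimal sketch of treating HOST_VISIBLE | HOST_COHERENT as a hard requirement when choosing a memory type. It is not the code from this PR: `find_memory_type` is a hypothetical helper, the list of per-type flags stands in for what the driver would report, and the constants only mirror the bit values of `VkMemoryPropertyFlagBits` so the example compiles without the Vulkan headers.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Bit values mirror VkMemoryPropertyFlagBits; redefined here so the sketch
// compiles without the Vulkan headers.
constexpr uint32_t DEVICE_LOCAL = 0x1;
constexpr uint32_t HOST_VISIBLE = 0x2;
constexpr uint32_t HOST_COHERENT = 0x4;
constexpr uint32_t HOST_CACHED = 0x8;

// Hypothetical helper: returns the index of the first memory type that is
// allowed by `type_bits` and has every flag in `required`, or -1 if none does.
int find_memory_type(const std::vector<uint32_t> &type_flags, uint32_t type_bits, uint32_t required) {
    for (size_t i = 0; i < type_flags.size() && i < 32; i++) {
        const bool allowed = (type_bits & (1u << i)) != 0;
        if (allowed && (type_flags[i] & required) == required) {
            return static_cast<int>(i);
        }
    }
    return -1;
}
```

With a device that exposes a cached-but-not-coherent type before a coherent one, requiring both bits skips the first and selects the second, so no explicit flush or invalidate is needed around the mapped copy.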
761f48d to 4ba3c49
This approach looks good to me. Let's merge this early in the 4.3 release cycle and cherry-pick to 4.2.1.
the new changes have an impact on more functions
New test project: test.buffer.zip. No problems on macOS and Android. Screen-2023-11-14-152844.mp4
#include "thirdparty/vulkan/vk_mem_alloc.h" @akien-mga Can this change be left in?
hmm, I don't think we should disregard the
https://developer.arm.com/documentation/101897/0301/CPU-overheads/Vulkan-CPU-memory-mapping However, the documentation also mentions that manually invalidating/flushing the cache can be expensive:
I wonder if the Also, I think, the |
I think you mixed that up. The document you linked recommends using |
Yes, this memory type is recommended if supported by the hardware:
This random android device for example doesn't support |
But it does support The documentation only recommends In my mind |
As sakrel said:
I noticed the code flushes WHOLE_SIZE instead of flushing only the necessary ranges (Note to self: WAIT A MINUTE! VMA does not do buffer suballocation!?!?). Also make sure validation layers are disabled, because validation layers will make sure non-coherent memory returns garbage to more easily identify bugs. |
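As an illustration of flushing only the necessary range instead of WHOLE_SIZE, the sketch below computes a flush window for non-coherent memory. It assumes the Vulkan rule that a flushed range's offset must be a multiple of `nonCoherentAtomSize`, and that its size must be a multiple of that value unless the range ends exactly at the end of the allocation. `aligned_flush_range` and `FlushRange` are hypothetical names, not part of Godot or VMA.

```cpp
#include <algorithm>
#include <cstdint>

struct FlushRange {
    uint64_t offset;
    uint64_t size;
};

// Hypothetical helper: expands [offset, offset + size) to the alignment that
// non-coherent memory requires. `atom` stands for the device's
// nonCoherentAtomSize and must be non-zero; `alloc_size` is the size of the
// whole allocation.
FlushRange aligned_flush_range(uint64_t offset, uint64_t size, uint64_t atom, uint64_t alloc_size) {
    const uint64_t begin = (offset / atom) * atom; // Round the start down.
    const uint64_t rounded_end = ((offset + size + atom - 1) / atom) * atom; // Round the end up.
    const uint64_t end = std::min(rounded_end, alloc_size); // Never run past the allocation.
    return { begin, end - begin };
}
```

The resulting offset and size would go into the range passed to the flush or invalidate call. With coherent memory, as this PR requests, none of this bookkeeping is needed.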
@darksylinc What is the benefit of using |
It's more about how the CPU operates and how we are doing the read. When we do the following (in pseudo x86 asm, but it applies to ARM as well):

    loop:
        mov eax, [non_cached_address]
        mov [cached_address], eax
        add non_cached_address, 4
        add cached_address, 4
        goto loop

If "non_cached_address" is actually cached, the CPU reads 64 bytes (a cache line) into L1, and probably prefetches much more into L2. So the next 16 iterations (we're reading 4 bytes at a time) read by the mov will be super fast because the data is already in L1. On the 17th iteration it will be slightly slower, but it will still be very fast because it fetches the next 64 bytes from L2. And so on.

If "non_cached_address" is uncached, every 4 bytes is a trip to RAM. I don't know if out-of-order execution and speculative execution can apply here (if they do, they can hide several of the next 4-byte reads). The CPU will have to wait for every read operation to arrive.

Now memcpy is a tricky one, because its internals are intrinsic-driven, based on things like the CPU it runs on and the size of the transfer in bytes. But usually memcpy in x86 ends up using either This heavily reduces the impact of a lack of cache because it's bulk-reading from RAM.

So to summarize: I suspect that the impact of If Godot had been designed differently, the user would be reading directly from the mapped memory. This would save a lot of bandwidth (because we save a memcpy), but since we don't control how the user is going to read that memory, then using
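The access pattern described above, one bulk copy out of the mapped pointer followed by reads from ordinary memory, can be sketched as follows. This is only an illustration under stated assumptions: `copy_from_mapped` is a hypothetical helper, not Godot's readback code, and in a real program `mapped` would be the pointer returned by mapping the buffer.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: copies `size` bytes out of a mapped pointer with a
// single memcpy, so that later element-wise reads touch normal cached memory
// instead of the mapping itself.
std::vector<uint8_t> copy_from_mapped(const void *mapped, size_t size) {
    std::vector<uint8_t> out(size);
    if (size > 0) {
        std::memcpy(out.data(), mapped, size);
    }
    return out;
}
```

Because the mapping is read exactly once and in bulk, whether the memory type is also host-cached matters less here than it would for scattered 4-byte reads straight from the mapping.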
4ba3c49 to 2667e2f
+ During my tests, I did not notice any improvements or deterioration. |
Don't you need Taking the opportunity to ask, is |
I'd suggest considering the cached flag out of the scope of this PR. This patch is about correctness. We can consider the cached thing a potential optimization belonging to a separate discussion, involving the broader topic of how Godot deals with mapped memory. |
2667e2f to 04a142c
without |
Looks good to me again.
HOST_CACHED might be useful in the future if we are holding the map to memory on the host side. But right now we just memcpy the memory and move on. HOST_VISIBLE | HOST_COHERENT | HOST_CACHED is not always available, so it definitely shouldn't be a required memory format. In the future we can add it as a preferred option, if we move away from doing a full memcpy.
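The difference between a required and a preferred flag can be sketched as a two-tier choice: take a memory type that has the required bits plus the preferred ones if one exists, otherwise fall back to the first type that has the required bits alone. `pick_memory_type` is a hypothetical helper and the constants only mirror Vulkan's bit values. VMA exposes a similar split through the `requiredFlags` and `preferredFlags` fields of `VmaAllocationCreateInfo`, though this sketch does not claim to reproduce its selection logic.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Bit values mirror VkMemoryPropertyFlagBits; redefined so the sketch
// compiles without the Vulkan headers.
constexpr uint32_t HOST_VISIBLE_BIT = 0x2;
constexpr uint32_t HOST_COHERENT_BIT = 0x4;
constexpr uint32_t HOST_CACHED_BIT = 0x8;

// Hypothetical helper: every returned type has all `required` flags. A type
// that also has all `preferred` flags wins; otherwise the first type that
// meets the requirement is used. Returns -1 if nothing qualifies.
int pick_memory_type(const std::vector<uint32_t> &type_flags, uint32_t required, uint32_t preferred) {
    int fallback = -1;
    for (size_t i = 0; i < type_flags.size(); i++) {
        if ((type_flags[i] & required) != required) {
            continue; // Missing a mandatory flag.
        }
        if ((type_flags[i] & preferred) == preferred) {
            return static_cast<int>(i); // Has the optional flags too.
        }
        if (fallback < 0) {
            fallback = static_cast<int>(i);
        }
    }
    return fallback;
}
```

On a device without a cached coherent type, the call still succeeds with the plain coherent one, which is the behavior the comment above asks for.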
Thanks! Great work tracking down this critical issue! |
Cherry-picked for 4.2.1. |
Fixes #78715
Fixes #75599
Fixes #80371
Fixes #84355
Error description: #78715 (comment) / #78715 (comment)
old tests
Godot 4.2 beta5 78715 and 75599
"Mesh Array_Index" is not read correctly (it sometimes contains 'attrib_data' or 'vertex_data'). The app crashes when "soft_body_3d" is inserted.
crash.Design.ohne.Titel.mp4
this PR 78715 and 75599
"Mesh Array_Index" is always read correctly and there are no crashes when soft_body_3d is inserted.
Screen_recording_20231113_174153.mp4
80371 Godot 4.1.3
cubes.4.1.3-Screen_recording_20231113_16101.mp4
80371 Godot 4.2 (beta.custom, this PR)
cubes.custom.Screen_recording_20231113_161639.mp4