Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for external allocators: #748

Merged
merged 15 commits into from
Sep 16, 2023

Conversation

toomuchvoltage
Copy link
Contributor

@toomuchvoltage toomuchvoltage commented Aug 8, 2023

Support for external allocators:

  • The newly introduced API surface area matches that of VMA's advanced usage and another hand-rolled memory allocator in a content-heavy application.
  • All suballocator callbacks -- allocate, bind image, bind buffer, map, unmap and free -- are expected to guard VkDeviceMemory operations within a mutex.
  • Each texture now also keeps track of its VkDeviceMemory offset. Potential sparse bindings support removed this.
  • The 64 bit allocationId is to be used as a book-keeping measure by external suballocator callbacks to keep track of and free up suballocations. The external allocator can use a hashtable (ala std::unordered_map in C++) to keep track of the page(s) alloted to this suballocation. ('Pages' here refers to potential sparse bindings).

@CLAassistant
Copy link

CLAassistant commented Aug 8, 2023

CLA assistant check
All committers have signed the CLA.

@toomuchvoltage
Copy link
Contributor Author

toomuchvoltage commented Aug 8, 2023

I cannot tell you how much better everything works now in my engine.

Memory fragmentation is waaaay down and it is harder to run out of memory with texture stream-in/out events.

If you need me to clean it up and de-duplicate code: with pleasure. But I cannot stress how much more efficient this is as opposed to constant VkDeviceMemory creation/deletions. Not to mention, this should also save you from the 4K limit on Windows drivers for live VkDeviceMemory objects.

Additionally, the fragmentation this was causing when coupled with too much content GPU-side (i.e. due to too many large textures filling the VRAM close to capacity) would result in DEVICE_LOST events as opposed to an OOM error. The driver would reset. My YouTube videos would stop playing.

Now I properly get a VK_ERROR_OUT_OF_DEVICE_MEMORY which can be gracefully handled. I develop on a 1050Ti btw which is my minspec.

@MarkCallow
Copy link
Collaborator

MarkCallow commented Aug 10, 2023

Thank you for this. Does it resolve the remaining parts of #567?

Please de-duplicate the code. I suggest calling the new function from the existing VkUploadEx passing vkAllocateMemory and vkFreeMemory as the functions to use. Please make typedefs for the various function pointer types passed to the new functions.

I need code to test this. At a minimum please add a new sample to vkloadtests or modify an existing sample to add a command line option to tell it to use a sub-allocator. Add a line to call the new test to the list at the end of VulkanLoadTests.cpp.

@MarkCallow
Copy link
Collaborator

Ping @toomuchvoltage.

@toomuchvoltage
Copy link
Contributor Author

Ping @toomuchvoltage.

Hi @MarkCallow , apologies for the delay, have been very busy. I will get back to you shortly with some updates.

@toomuchvoltage toomuchvoltage force-pushed the external-allocator branch 3 times, most recently from 170ab4d to 398d489 Compare August 24, 2023 18:00
@toomuchvoltage
Copy link
Contributor Author

toomuchvoltage commented Aug 24, 2023

Hi @MarkCallow , so the most recent force-push basically addresses all of #567 . While considering your suggestion I still realized that a new API surface area may be necessary: the reason for this was that the number of function pointers increased to six. Alloc and free are not enough: bind image, bind buffer, map and unmap also need to be guarded with the same mutex as alloc and free as otherwise multi-threaded Vulkan rendering will be very unhappy. This is a consideration that VMA's advanced usage also considers. I can still go back and modify UploadEx's signature, but it would balloon up by a lot and if one is satisfied not relying on a suballocator, they would have to pass 6 new NULL pointers as parameters. (Just updated the UploadEx signature to accept a pointer to a single struct containing all callback pointers).

Speaking of VMA, it exposes these more granular calls for advanced usage, all of which perform mutually exclusive VkDeviceMemory operations:

  • vmaAllocateMemory() or vmaAllocateMemoryPages()
  • vmaBindBufferMemory()
  • vmaBindImageMemory()
  • vmaMapMemory()
  • vmaUnmapMemory()
  • vmaFreeMemory() or vmaFreeMemoryPages()

The six new callbacks provided to an UploadEx_WithPotentialSuballocator() call are supposed to provide callbacks that wrap around these. The allocation callback should also return a uint64_t that references the page procurement(s) rather than VmaAllocation(s). Such a 64 bit number could be generated by a mersenne twister PRNG and be a key into a hashmap (a.l.a std::unordered_map) that references the suballocations as mentioned previously.

An example of how this is done for a hand-rolled suballocator is provided here: https://github.com/toomuchvoltage/HighOmega-public/blob/sauray_vkquake2/HighOmega/src/gl.cpp#L288-L337 . All supplementary memory management code is contained at the top of that module.
Thread-safe allocation is demo'd here: https://github.com/toomuchvoltage/HighOmega-public/blob/sauray_vkquake2/HighOmega/src/gl.cpp#L3487-L3495
with the corresponding deallocation shown here: https://github.com/toomuchvoltage/HighOmega-public/blob/sauray_vkquake2/HighOmega/src/gl.cpp#L3164

I will gladly proceed to add a test, but just make sure: would it be OK for the test to have a dependency on VMA? There is no better way to genuinely test without making that explicit functional dependency.

Also @MarkCallow please let me know, if the specifications provided are in line with our formatting for doxygen.

@toomuchvoltage
Copy link
Contributor Author

toomuchvoltage commented Aug 25, 2023

Hi @MarkCallow , I just made another improvement to reduce the API's surface area. All suballocator callbacks are now packed into a single struct. An optional pointer to that struct can be passed reducing UploadEx suballocator parameters down to 1.

Updated thread-safe allocation usage: https://github.com/toomuchvoltage/HighOmega-public/blob/sauray_vkquake2/HighOmega/src/gl.cpp#L3494-L3503
Updated thread-safe free usage: https://github.com/toomuchvoltage/HighOmega-public/blob/sauray_vkquake2/HighOmega/src/gl.cpp#L3164-L3171

@toomuchvoltage
Copy link
Contributor Author

Actually, @MarkCallow let me know if the signature changes to UploadEx are ok. If so, I will proceed with updating the existing (broken) tests alongside adding new tests.

@MarkCallow
Copy link
Collaborator

I like having the sub-allocation functions in a structure and the typedefs.

The signature changes to ktxTexture_VkUploadEx and ktxVulkanTexture_Destruct are not okay as they can break existing applications, as you've seen with the existing tests. The functions with the suballoc struct pointer parameter will have to have new names. These new functions can be called by the functions with the existing signatures.

The new tests having a dependency on VMA is fine provided that only those tests have the dependency and the rest of the KTX-Software project can be built without it. How big is VMA? Is it small enough to include in the KTX-Software repo or should we require people to download it?

@toomuchvoltage
Copy link
Contributor Author

Hi @MarkCallow , just made sure that the old function signatures remain as is. Would appreciate a quick check to ensure that my specifications are also in line with our doxygen rules. I also managed to make yet another improvement as well: I ensured that the interface can take in (hopefully thread-safe) suballocator callbacks that have the potential to do sparse bindings.

Resultingly, the suballocator callbacks no longer return a VkDeviceMemory along with an offset back as there may be multiple of these pages per a single procurement. Of course, UploadEx's memcpy code can right now only handle a single allocation (i.e. non-sparse) -- and hence the assert(numPages == 1) in the code -- but this leaves the room for future enhancements of UploadEx without interface breaking changes in the event that sparse bindings' uploads are implemented.

I also made accesses to the suballocator directory in my examples thread-safe as well. Here are my handrolled thread-safe suballocator callbacks that have potential sparse bindings support: https://github.com/toomuchvoltage/HighOmega-public/blob/sauray_vkquake2/HighOmega/src/gl.cpp#L288-L411
Allocation usage is provided here: https://github.com/toomuchvoltage/HighOmega-public/blob/sauray_vkquake2/HighOmega/src/gl.cpp#L3568-L3577
Deallocation usage is provided here: https://github.com/toomuchvoltage/HighOmega-public/blob/sauray_vkquake2/HighOmega/src/gl.cpp#L3238-L3245

@toomuchvoltage
Copy link
Contributor Author

toomuchvoltage commented Aug 26, 2023

Oh I forgot to mention, VMA is a single header library... but it is written in C++ while providing a C interface as well. So if the test has to be written in C, a library -- possibly a library containing thread-safe callbacks just like mine -- has to be written around it to be included in the test sample. Seems like C++ tests are ok, I'll provide one soon.

Copy link
Collaborator

@MarkCallow MarkCallow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great overall. I spotted some documentation things I'd like fixed and have a couple of questions about error handling.

You need to fix the code warnings that are causing CI builds to fail. (CI builds are done with warnings as errors set.)

include/ktxvulkan.h Outdated Show resolved Hide resolved
include/ktxvulkan.h Outdated Show resolved Hide resolved
include/ktxvulkan.h Outdated Show resolved Hide resolved
lib/vkloader.c Outdated Show resolved Hide resolved
lib/vkloader.c Outdated Show resolved Hide resolved
lib/vkloader.c Outdated Show resolved Hide resolved
lib/vkloader.c Show resolved Hide resolved
lib/vkloader.c Outdated Show resolved Hide resolved
lib/vkloader.c Show resolved Hide resolved
lib/vkloader.c Outdated Show resolved Hide resolved
@toomuchvoltage
Copy link
Contributor Author

Dear @MarkCallow , I think I pretty much addressed everything. Regarding checking the completeness of suballocator callbacks in Destruct: I basically return a KTX error code if they're incomplete but this does not bubble up to the regular Destroy as that returns nothing. Such an interface is reasonable as generally free/destroy calls return void in Vulkan (i.e. vkFreeDescriptorSets or vkFreeCommandBuffers). Should we make an exception for silent failures in this case?

If everything looks good at this stage, let me know and I will provide the VMA tests shortly.

Much appreciated!

lib/vkloader.c Outdated Show resolved Hide resolved
lib/vkloader.c Outdated Show resolved Hide resolved
lib/vkloader.c Outdated Show resolved Hide resolved
lib/vkloader.c Outdated Show resolved Hide resolved
@MarkCallow
Copy link
Collaborator

The failure in some of the Windows builds is due to a clang update, with a new warning, in the GitHub CI Windows runners. I plan to fix the code later today. After the fix you will need to rebase on main.

@toomuchvoltage
Copy link
Contributor Author

The failure in some of the Windows builds is due to a clang update, with a new warning, in the GitHub CI Windows runners. I plan to fix the code later today. After the fix you will need to rebase on main.

@MarkCallow Sounds good. Just pushed changes addressing the most recent feedbacks. If all looks good, I'll get the tests ready. Also would appreciate a ping here once your changes on main are complete! 🙏

@MarkCallow
Copy link
Collaborator

Just pushed changes addressing the most recent feedbacks. If all looks good, I'll get the tests ready. Also would appreciate a ping here once your changes on main are complete!

All looks good. I'll ping you when the build issue is fixed. I'm suffering from a very very slow computer I have to use for testing fixes to the issue.

@MarkCallow
Copy link
Collaborator

The fix for clang 16 is now in main.

@toomuchvoltage
Copy link
Contributor Author

The fix for clang 16 is now in main.

Thanks @MarkCallow . I will post some updates soon. I'm tied up a bit on my end at the moment.

@toomuchvoltage
Copy link
Contributor Author

Hi @MarkCallow , I just added a new test under vkloadtests. Could you please take a look at it to see if everything looks good? If so, I'll proceed with the rebase. Here's a screenshot of it running:

image

@toomuchvoltage
Copy link
Contributor Author

toomuchvoltage commented Sep 9, 2023

A couple of notes:

  • The contents of the folders in Debug or Release had to be dumped in both folders for vkloadtests to run:
    image
  • Had to explicitly use KTX_FEATURE_LOADTEST_APPS=Vulkan instead of KTX_FEATURE_LOADTEST_APPS=ON. Otherwise, the vkloadtests VC project is not constructed.

@toomuchvoltage
Copy link
Contributor Author

toomuchvoltage commented Sep 11, 2023

Hi @MarkCallow , all done! These functions signatures are general... and the current implementations apply to everything except sparse bindings. However, please note that I had to move #include <ktxvulkan.h> from Texture.h to VulkanLoadTestSample.h as ktxVulkanTexture_subAllocatorCallbacks becomes necessary there. Could this have been the intention from the beginning btw?

I need to amend: using the callbacks more than once (i.e. for different textures and so on), wouldn't really increase coverage. VMA will just keep getting a VkDeviceMemory with an offset. Another test could be added later that could test sparse bindings (once we add support)... but that would have very explicit scenarios set up. For example:

  • Allocating 3 small textures and 1 large texture with vma's Pages call.
  • De-allocating the first and the 3rd textures.
  • Allocating a final texture the size of the first and 3rd combined with the Pages call.
  • Ensuring that two pages are allocated at the final allocation and that the texture displays.

Copy link
Collaborator

@MarkCallow MarkCallow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Name change is fine.

Thinking to also move the useSubAllocator bool to VulkanLoadTestSample but ideally then VulkanLoadTestSample would handle parsing the command line for the matching option. IIRC there is no hierarchical processing of the command line. So we'll leave this idea for later.

Please look into the build errors on Windows. They look like something to do with the way you are using VMA.

@toomuchvoltage
Copy link
Contributor Author

toomuchvoltage commented Sep 14, 2023

Name change is fine.

Thinking to also move the useSubAllocator bool to VulkanLoadTestSample but ideally then VulkanLoadTestSample would handle parsing the command line for the matching option. IIRC there is no hierarchical processing of the command line. So we'll leave this idea for later.

Please look into the build errors on Windows. They look like something to do with the way you are using VMA.

Hi @MarkCallow , I need your help on this one. I've been incapable of replicating it on my side. My usage is perfectly in line with what VMA wants: just 1 include in a CPP file. It seems like it's a bunch of warnings about VMA itself? I thought dropping the file in other_includes prevents warnings... but not from VulkanLoadTestSample? I could use a second look and some hints/pointers. Deeply appreciated! 🙏

@MarkCallow
Copy link
Collaborator

Possibly msvc has no -systemi equivalent that CMake can use. Not sure. But just in case try

#if defined(_MSC_VER)
  #pragma warning(push)
  #pragma warning(disable: 4100)
  #pragma warning(disable: 4234)
#endif

and

#if defined(_MSC_VER)
  #pragma warning(pop)
#endif

around the include of VMA.

@toomuchvoltage
Copy link
Contributor Author

Hi @MarkCallow , can you give this another spin? Hopefully it's fixed.

@toomuchvoltage
Copy link
Contributor Author

Dear @MarkCallow , I also attempted to address some warnings highlighted here in the last commit. If you could review and let me know whether the changes actually address these, it would be greatly appreciated! 🙏

image

Copy link
Collaborator

@MarkCallow MarkCallow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please revert the 2 changes indicated.

lib/vkloader.c Outdated Show resolved Hide resolved
lib/vkloader.c Outdated Show resolved Hide resolved
@MarkCallow
Copy link
Collaborator

MarkCallow commented Sep 16, 2023

You need to also suppress warning 4189 which I missed in the blizzard of 4100 warnings in the CI log and I transposed digits in one of the other warnings. It should be 4324 not 4234. Sorry about that.

As far as I can see, you have fixed fix the undocumented ktxVulkanTexture_subAllocatorCallbacks warning by providing documentation. The other Doxygen warnings (missing reference targets) are new with the latest version of Doxygen. I'm looking into them now. I won't hold up this PR for fixes. Ignore them.

@toomuchvoltage
Copy link
Contributor Author

You need to also suppress warning 4189 which I missed in the blizzard of 4100 warnings in the CI log and I transposed digits in one of the other warnings. It should be 4324 not 4234. Sorry about that.

As far as I can see, you have fixed fix the undocumented ktxVulkanTexture_subAllocatorCallbacks warning by providing documentation. The other Doxygen warnings (missing reference targets) are new with the latest version of Doxygen. I'm looking into them now. I won't hold up this PR for fixes. Ignore them.

All done! I guess time for another go... 🙏

Copy link
Collaborator

@MarkCallow MarkCallow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. I'll try the build again.

@MarkCallow MarkCallow merged commit 6856fdb into KhronosGroup:main Sep 16, 2023
14 checks passed
@MarkCallow
Copy link
Collaborator

Thank you very much for your hard work on this.

@MarkCallow
Copy link
Collaborator

MarkCallow commented Sep 16, 2023

Did you run the --use-vma test under the Vulkan debugvalidation layers? I am getting warnings and an error. I'm running under MoltenVK on an M2 MacBook. It says that it is attempting to map memory that was allocated without the VK_MEMORY_PROPERTY_HOST_VISIBLE bit set. Then an error telling me memory mapping failed because GPU-only memory can't be mapped then crash. Please investigate.

vkloadtests --debugvkloadtests --validate will activate the debugvalidation layers when the test starts.

@toomuchvoltage
Copy link
Contributor Author

Hi @MarkCallow , I've identified and fixed the issue. It had to do with my memory property flag detection being tied to a specific vendor (nVidia). I've fixed the issue in my most recent remote branch. You can perhaps rip out the last commit and merge once more if that works. If not, I can open a new PR.

@MarkCallow
Copy link
Collaborator

Please open a new PR. Sorry about the bum steer on the vkloadtests option to use to enable validation.

@toomuchvoltage
Copy link
Contributor Author

Please open a new PR. Sorry about the bum steer on the vkloadtests option to use to enable validation.

No, my bad. That was a really good find. I switched cards to my Radeon VII and the test crashed.

KaperD pushed a commit to KaperD/KTX-Software that referenced this pull request Feb 21, 2024
* The newly introduced API surface area matches that of VMA's advanced
   usage and another hand-rolled memory allocator in a content-heavy application.
* All suballocator callbacks -- allocate, bind image, bind buffer, map, unmap and
  free -- are expected to guard VkDeviceMemory operations within a mutex.
* Each texture now also keeps track of its VkDeviceMemory offset.
* The 64 bit allocationId is to be used as a book-keeping measure by external
   suballocator callbacks to keep track of and free up suballocations. The external
   allocator can use a hashtable (ala std::unordered_map in C++) to keep track of
   the page(s) alloted to this suballocation. ('Pages' here refers to potential sparse
   bindings).

* Add a VkLoadTest for suballocation callbacks
KaperD pushed a commit to KaperD/KTX-Software that referenced this pull request Feb 22, 2024
* The newly introduced API surface area matches that of VMA's advanced
   usage and another hand-rolled memory allocator in a content-heavy application.
* All suballocator callbacks -- allocate, bind image, bind buffer, map, unmap and
  free -- are expected to guard VkDeviceMemory operations within a mutex.
* Each texture now also keeps track of its VkDeviceMemory offset.
* The 64 bit allocationId is to be used as a book-keeping measure by external
   suballocator callbacks to keep track of and free up suballocations. The external
   allocator can use a hashtable (ala std::unordered_map in C++) to keep track of
   the page(s) alloted to this suballocation. ('Pages' here refers to potential sparse
   bindings).

* Add a VkLoadTest for suballocation callbacks
KaperD pushed a commit to KaperD/KTX-Software that referenced this pull request Feb 22, 2024
* The newly introduced API surface area matches that of VMA's advanced
   usage and another hand-rolled memory allocator in a content-heavy application.
* All suballocator callbacks -- allocate, bind image, bind buffer, map, unmap and
  free -- are expected to guard VkDeviceMemory operations within a mutex.
* Each texture now also keeps track of its VkDeviceMemory offset.
* The 64 bit allocationId is to be used as a book-keeping measure by external
   suballocator callbacks to keep track of and free up suballocations. The external
   allocator can use a hashtable (ala std::unordered_map in C++) to keep track of
   the page(s) alloted to this suballocation. ('Pages' here refers to potential sparse
   bindings).

* Add a VkLoadTest for suballocation callbacks
KaperD pushed a commit to KaperD/KTX-Software that referenced this pull request Feb 22, 2024
* The newly introduced API surface area matches that of VMA's advanced
   usage and another hand-rolled memory allocator in a content-heavy application.
* All suballocator callbacks -- allocate, bind image, bind buffer, map, unmap and
  free -- are expected to guard VkDeviceMemory operations within a mutex.
* Each texture now also keeps track of its VkDeviceMemory offset.
* The 64 bit allocationId is to be used as a book-keeping measure by external
   suballocator callbacks to keep track of and free up suballocations. The external
   allocator can use a hashtable (ala std::unordered_map in C++) to keep track of
   the page(s) alloted to this suballocation. ('Pages' here refers to potential sparse
   bindings).

* Add a VkLoadTest for suballocation callbacks
KaperD pushed a commit to KaperD/KTX-Software that referenced this pull request Feb 22, 2024
* The newly introduced API surface area matches that of VMA's advanced
   usage and another hand-rolled memory allocator in a content-heavy application.
* All suballocator callbacks -- allocate, bind image, bind buffer, map, unmap and
  free -- are expected to guard VkDeviceMemory operations within a mutex.
* Each texture now also keeps track of its VkDeviceMemory offset.
* The 64 bit allocationId is to be used as a book-keeping measure by external
   suballocator callbacks to keep track of and free up suballocations. The external
   allocator can use a hashtable (ala std::unordered_map in C++) to keep track of
   the page(s) alloted to this suballocation. ('Pages' here refers to potential sparse
   bindings).

* Add a VkLoadTest for suballocation callbacks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants