Support for external allocators: #748

toomuchvoltage · 2023-08-08T22:57:16Z

Support for external allocators:

The newly introduced API surface area matches that of VMA's advanced usage and another hand-rolled memory allocator in a content-heavy application.
All suballocator callbacks -- allocate, bind image, bind buffer, map, unmap and free -- are expected to guard VkDeviceMemory operations within a mutex.
~~Each texture now also keeps track of its VkDeviceMemory offset.~~ Potential sparse bindings support removed this.
The 64-bit allocationId is to be used as a book-keeping measure by external suballocator callbacks to keep track of and free up suballocations. The external allocator can use a hash table (à la std::unordered_map in C++) to keep track of the page(s) allotted to this suballocation. ('Pages' here refers to potential sparse bindings.)
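
A minimal sketch of the book-keeping described above, assuming a directory that maps each 64-bit allocationId to the page(s) behind it and guards every access with one mutex. `Page`, `SubAllocDirectory` and `directoryDemo` are hypothetical names for illustration, not part of this PR; a real suballocator would store a VkDeviceMemory handle where this sketch stores an integer block index.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for one bound region of device memory.
struct Page {
    uint32_t memoryBlock;  // which large device allocation the page lives in
    uint64_t offset;       // byte offset inside that block
    uint64_t size;         // byte size of the page
};

// Hypothetical directory: allocationId -> page(s) backing that suballocation.
class SubAllocDirectory {
public:
    // Records the pages of a new suballocation and returns its 64-bit id.
    uint64_t add(std::vector<Page> pages) {
        std::lock_guard<std::mutex> lock(mutex_);
        const uint64_t id = nextId_++;  // ids start at 1; 0 is never issued
        entries_.emplace(id, std::move(pages));
        return id;
    }

    // Number of pages behind an id, or 0 if the id is unknown.
    std::size_t pageCount(uint64_t id) const {
        std::lock_guard<std::mutex> lock(mutex_);
        const auto it = entries_.find(id);
        return it == entries_.end() ? 0 : it->second.size();
    }

    // Forgets a suballocation; false if the id is unknown or already freed.
    bool remove(uint64_t id) {
        std::lock_guard<std::mutex> lock(mutex_);
        return entries_.erase(id) == 1;
    }

private:
    mutable std::mutex mutex_;
    uint64_t nextId_ = 1;
    std::unordered_map<uint64_t, std::vector<Page>> entries_;
};

// Usage: one single-page and one two-page (sparse-style) suballocation.
bool directoryDemo() {
    SubAllocDirectory dir;
    const uint64_t a = dir.add(std::vector<Page>{Page{0, 0, 4096}});
    const uint64_t b = dir.add(std::vector<Page>{Page{0, 4096, 4096}, Page{1, 0, 4096}});
    assert(a != b);
    const bool counts = dir.pageCount(a) == 1 && dir.pageCount(b) == 2;
    const bool freed = dir.remove(a) && !dir.remove(a);  // double free is rejected
    return counts && freed && dir.pageCount(a) == 0;
}
```

Sequential ids are used here only to keep the sketch short; any scheme that yields unique 64-bit keys would serve.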

CLAassistant · 2023-08-08T22:57:20Z

All committers have signed the CLA.

toomuchvoltage · 2023-08-08T23:05:02Z

I cannot tell you how much better everything works now in my engine.

Memory fragmentation is waaaay down and it is harder to run out of memory with texture stream-in/out events.

If you need me to clean it up and de-duplicate code: with pleasure. But I cannot stress enough how much more efficient this is than constant VkDeviceMemory creation and deletion. Not to mention, this should also save you from the 4K limit on live VkDeviceMemory objects in Windows drivers.

Additionally, the fragmentation this was causing when coupled with too much content GPU-side (i.e. due to too many large textures filling the VRAM close to capacity) would result in DEVICE_LOST events as opposed to an OOM error. The driver would reset. My YouTube videos would stop playing.

Now I properly get a VK_ERROR_OUT_OF_DEVICE_MEMORY which can be gracefully handled. I develop on a 1050Ti btw which is my minspec.

MarkCallow · 2023-08-10T18:23:38Z

Thank you for this. Does it resolve the remaining parts of #567?

Please de-duplicate the code. I suggest calling the new function from the existing VkUploadEx passing vkAllocateMemory and vkFreeMemory as the functions to use. Please make typedefs for the various function pointer types passed to the new functions.

I need code to test this. At a minimum please add a new sample to vkloadtests or modify an existing sample to add a command line option to tell it to use a sub-allocator. Add a line to call the new test to the list at the end of VulkanLoadTests.cpp.

MarkCallow · 2023-08-22T07:43:18Z

Ping @toomuchvoltage.

toomuchvoltage · 2023-08-23T04:24:04Z

> Ping @toomuchvoltage.

Hi @MarkCallow , apologies for the delay, have been very busy. I will get back to you shortly with some updates.

toomuchvoltage · 2023-08-24T19:31:38Z

Hi @MarkCallow, the most recent force-push basically addresses all of #567. I considered your suggestion but still concluded that a new API surface area may be necessary, because the number of function pointers has grown to six. Alloc and free are not enough: bind image, bind buffer, map and unmap also need to be guarded by the same mutex as alloc and free, otherwise multi-threaded Vulkan rendering will be very unhappy. VMA's advanced usage accounts for this as well. I can still go back and modify UploadEx's signature, but it would balloon, and anyone content without a suballocator would have to pass six new NULL pointers as parameters. (Update: I just changed the UploadEx signature to accept a pointer to a single struct containing all callback pointers.)

Speaking of VMA, it exposes these more granular calls for advanced usage, all of which perform mutually exclusive VkDeviceMemory operations:

vmaAllocateMemory() or vmaAllocateMemoryPages()
vmaBindBufferMemory()
vmaBindImageMemory()
vmaMapMemory()
vmaUnmapMemory()
vmaFreeMemory() or vmaFreeMemoryPages()

~~The six new callbacks provided to an UploadEx_WithPotentialSuballocator() call are supposed to provide callbacks that wrap around these.~~ The allocation callback should also return a uint64_t that references the page procurement(s) rather than VmaAllocation(s). Such a 64-bit number could be generated by a Mersenne Twister PRNG and be a key into a hash map (à la std::unordered_map) that references the suballocations as mentioned previously.

An example of how this is done for a hand-rolled suballocator is provided here: https://github.com/toomuchvoltage/HighOmega-public/blob/sauray_vkquake2/HighOmega/src/gl.cpp#L288-L337 . All supplementary memory management code is contained at the top of that module.
Thread-safe allocation is demo'd here: https://github.com/toomuchvoltage/HighOmega-public/blob/sauray_vkquake2/HighOmega/src/gl.cpp#L3487-L3495
with the corresponding deallocation shown here: https://github.com/toomuchvoltage/HighOmega-public/blob/sauray_vkquake2/HighOmega/src/gl.cpp#L3164
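
The id scheme mentioned above (a Mersenne Twister output used as a hash-map key) could look roughly like this. It is a sketch under assumptions, not the code behind the links: `SubAllocRecord` and `IdDirectory` are hypothetical names, and a VMA-backed version might keep a VmaAllocation in the record instead of a size.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <random>
#include <unordered_map>

// Hypothetical record for one suballocation.
struct SubAllocRecord {
    uint64_t size;
};

class IdDirectory {
public:
    explicit IdDirectory(uint64_t seed) : rng_(seed) {}

    // Draws 64-bit values from the Mersenne Twister until one is non-zero
    // and not already in use, then stores the record under that id.
    uint64_t insert(const SubAllocRecord& rec) {
        std::lock_guard<std::mutex> lock(mutex_);
        uint64_t id = 0;
        do {
            id = rng_();
        } while (id == 0 || records_.count(id) != 0);
        const bool inserted = records_.emplace(id, rec).second;
        assert(inserted);  // guaranteed by the loop condition above
        (void)inserted;
        return id;
    }

    bool contains(uint64_t id) {
        std::lock_guard<std::mutex> lock(mutex_);
        return records_.count(id) != 0;
    }

    // Returns false if the id is unknown or was already freed.
    bool erase(uint64_t id) {
        std::lock_guard<std::mutex> lock(mutex_);
        return records_.erase(id) == 1;
    }

private:
    std::mutex mutex_;
    std::mt19937_64 rng_;
    std::unordered_map<uint64_t, SubAllocRecord> records_;
};
```

The collision check matters more for correctness than the choice of generator: a 64-bit collision is extremely unlikely, but retrying costs almost nothing.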

I will gladly proceed to add a test, but just to make sure: would it be OK for the test to have a dependency on VMA? There is no better way to genuinely test this without making that functional dependency explicit.

Also @MarkCallow, please let me know whether the specifications provided are in line with our formatting for doxygen.

toomuchvoltage · 2023-08-25T00:52:11Z

Hi @MarkCallow , I just made another improvement to reduce the API's surface area. All suballocator callbacks are now packed into a single struct. An optional pointer to that struct can be passed reducing UploadEx suballocator parameters down to 1.

Updated thread-safe allocation usage: https://github.com/toomuchvoltage/HighOmega-public/blob/sauray_vkquake2/HighOmega/src/gl.cpp#L3494-L3503
Updated thread-safe free usage: https://github.com/toomuchvoltage/HighOmega-public/blob/sauray_vkquake2/HighOmega/src/gl.cpp#L3164-L3171
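
As a rough illustration of packing the callbacks into one struct behind an optional pointer: the typedefs, field names and `chooseUploadPath` below are hypothetical and deliberately simplified. The real typedefs in ktxvulkan.h take Vulkan handles and are not reproduced here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified callback types (plain integers stand in for
// Vulkan handles so the sketch builds without the Vulkan headers).
typedef uint64_t (*AllocFn)(uint64_t size, uint64_t* numPages);
typedef bool (*BindBufferFn)(uint64_t buffer, uint64_t allocationId);
typedef bool (*BindImageFn)(uint64_t image, uint64_t allocationId);
typedef bool (*MapFn)(uint64_t allocationId, uint64_t pageIndex, void** data);
typedef void (*UnmapFn)(uint64_t allocationId, uint64_t pageIndex);
typedef void (*FreeFn)(uint64_t allocationId);

// All six callbacks travel together, so the upload call gains one extra
// parameter instead of six.
struct SubAllocCallbacks {
    AllocFn      alloc;
    BindBufferFn bindBuffer;
    BindImageFn  bindImage;
    MapFn        map;
    UnmapFn      unmap;
    FreeFn       release;
};

enum class UploadPath { BuiltIn, SubAllocator, IncompleteCallbacks };

// A null pointer selects the built-in allocation path; a non-null pointer
// must supply every callback.
UploadPath chooseUploadPath(const SubAllocCallbacks* cb) {
    if (cb == nullptr) return UploadPath::BuiltIn;
    const bool complete = cb->alloc && cb->bindBuffer && cb->bindImage &&
                          cb->map && cb->unmap && cb->release;
    return complete ? UploadPath::SubAllocator : UploadPath::IncompleteCallbacks;
}

// Usage: omitting the struct keeps the old behaviour, and a partially
// filled struct is rejected rather than used.
void usageExample() {
    assert(chooseUploadPath(nullptr) == UploadPath::BuiltIn);
    SubAllocCallbacks empty = {};
    assert(chooseUploadPath(&empty) == UploadPath::IncompleteCallbacks);
}
```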

toomuchvoltage · 2023-08-25T01:33:32Z

Actually, @MarkCallow let me know if the signature changes to UploadEx are ok. If so, I will proceed with updating the existing (broken) tests alongside adding new tests.

MarkCallow · 2023-08-25T09:26:19Z

I like having the sub-allocation functions in a structure and the typedefs.

The signature changes to ktxTexture_VkUploadEx and ktxVulkanTexture_Destruct are not okay as they can break existing applications, as you've seen with the existing tests. The functions with the suballoc struct pointer parameter will have to have new names. These new functions can be called by the functions with the existing signatures.

The new tests having a dependency on VMA is fine provided that only those tests have the dependency and the rest of the KTX-Software project can be built without it. How big is VMA? Is it small enough to include in the KTX-Software repo or should we require people to download it?
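
The compatibility approach requested above (a new name for the struct-taking function, called by the function with the existing signature) can be sketched as follows. All names are hypothetical, and malloc/free stand in for vkAllocateMemory/vkFreeMemory so that the sketch builds without Vulkan.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical two-callback struct, reduced from the six in this PR.
struct AllocCallbacks {
    void* (*alloc)(std::size_t bytes);
    void  (*release)(void* ptr);
};

// New, separately named entry point that takes the callbacks.
bool uploadWithAllocator(std::size_t bytes, const AllocCallbacks* cb) {
    if (cb == nullptr || cb->alloc == nullptr || cb->release == nullptr) {
        return false;  // incomplete callbacks are reported, not ignored
    }
    void* staging = cb->alloc(bytes);
    if (staging == nullptr) return false;
    // ... the texture data would be copied into `staging` here ...
    cb->release(staging);
    return true;
}

// Defaults used by the old entry point.
static void* defaultAlloc(std::size_t bytes) { return std::malloc(bytes); }
static void defaultRelease(void* ptr) { std::free(ptr); }

// Existing entry point: its signature is untouched, so current callers
// keep compiling; it simply forwards with the default callbacks.
bool upload(std::size_t bytes) {
    static const AllocCallbacks defaults = { defaultAlloc, defaultRelease };
    return uploadWithAllocator(bytes, &defaults);
}

// Usage: old callers and new callers share one implementation.
void usageExample() {
    assert(upload(64));
    assert(!uploadWithAllocator(64, nullptr));
}
```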

toomuchvoltage · 2023-08-26T03:52:26Z

Hi @MarkCallow, I just made sure that the old function signatures remain as they were. I would appreciate a quick check that my specifications are also in line with our doxygen rules. I also managed to make yet another improvement: the interface can now take in (hopefully thread-safe) suballocator callbacks that have the potential to do sparse bindings.

As a result, the suballocator callbacks no longer return a VkDeviceMemory along with an offset, as there may be multiple such pages per procurement. Of course, UploadEx's memcpy code can right now only handle a single allocation (i.e. non-sparse) -- hence the assert(numPages == 1) in the code -- but this leaves room for future enhancements of UploadEx without interface-breaking changes in the event that uploads for sparse bindings are implemented.

I also made accesses to the suballocator directory in my examples thread-safe. Here are my hand-rolled thread-safe suballocator callbacks that have potential sparse bindings support: https://github.com/toomuchvoltage/HighOmega-public/blob/sauray_vkquake2/HighOmega/src/gl.cpp#L288-L411
Allocation usage is provided here: https://github.com/toomuchvoltage/HighOmega-public/blob/sauray_vkquake2/HighOmega/src/gl.cpp#L3568-L3577
Deallocation usage is provided here: https://github.com/toomuchvoltage/HighOmega-public/blob/sauray_vkquake2/HighOmega/src/gl.cpp#L3238-L3245

toomuchvoltage · 2023-08-26T04:11:47Z

Oh I forgot to mention, VMA is a single header library... but it is written in C++ while providing a C interface as well. ~~So if the test has to be written in C, a library -- possibly a library containing thread-safe callbacks just like mine -- has to be written around it to be included in the test sample.~~ Seems like C++ tests are ok, I'll provide one soon.

MarkCallow

Looks great overall. I spotted some documentation things I'd like fixed and have a couple of questions about error handling.

You need to fix the code warnings that are causing CI builds to fail. (CI builds are done with warnings as errors set.)

include/ktxvulkan.h

lib/vkloader.c

toomuchvoltage · 2023-09-04T20:34:06Z

Dear @MarkCallow , I think I pretty much addressed everything. Regarding checking the completeness of suballocator callbacks in Destruct: I basically return a KTX error code if they're incomplete but this does not bubble up to the regular Destroy as that returns nothing. Such an interface is reasonable as generally free/destroy calls return void in Vulkan (i.e. vkFreeDescriptorSets or vkFreeCommandBuffers). Should we make an exception for silent failures in this case?

If everything looks good at this stage, let me know and I will provide the VMA tests shortly.

Much appreciated!

lib/vkloader.c

MarkCallow · 2023-09-05T03:01:28Z

The failure in some of the Windows builds is due to a clang update, with a new warning, in the GitHub CI Windows runners. I plan to fix the code later today. After the fix you will need to rebase on main.

toomuchvoltage · 2023-09-05T06:37:39Z

> The failure in some of the Windows builds is due to a clang update, with a new warning, in the GitHub CI Windows runners. I plan to fix the code later today. After the fix you will need to rebase on main.

@MarkCallow Sounds good. Just pushed changes addressing the most recent feedback. If all looks good, I'll get the tests ready. Also would appreciate a ping here once your changes on main are complete! 🙏

MarkCallow · 2023-09-05T08:06:39Z

> Just pushed changes addressing the most recent feedback. If all looks good, I'll get the tests ready. Also would appreciate a ping here once your changes on main are complete!

All looks good. I'll ping you when the build issue is fixed. I'm suffering from a very very slow computer I have to use for testing fixes to the issue.

MarkCallow · 2023-09-07T00:15:15Z

The fix for clang 16 is now in main.

toomuchvoltage · 2023-09-07T01:33:34Z

> The fix for clang 16 is now in main.

Thanks @MarkCallow . I will post some updates soon. I'm tied up a bit on my end at the moment.

toomuchvoltage · 2023-09-09T07:22:31Z

Hi @MarkCallow , I just added a new test under vkloadtests. Could you please take a look at it to see if everything looks good? If so, I'll proceed with the rebase. Here's a screenshot of it running:

toomuchvoltage · 2023-09-09T07:48:22Z

A couple of notes:

The contents of the folders in Debug or Release had to be dumped in both folders for vkloadtests to run:
Had to explicitly use KTX_FEATURE_LOADTEST_APPS=Vulkan instead of KTX_FEATURE_LOADTEST_APPS=ON. Otherwise, the vkloadtests VC project is not constructed.

toomuchvoltage · 2023-09-11T16:18:03Z

Hi @MarkCallow, all done! These function signatures are general... and the current implementations apply to everything except sparse bindings. However, please note that I had to move #include <ktxvulkan.h> from Texture.h to VulkanLoadTestSample.h as ktxVulkanTexture_subAllocatorCallbacks becomes necessary there. Could this have been the intention from the beginning btw?

I need to amend: using the callbacks more than once (i.e. for different textures and so on), wouldn't really increase coverage. VMA will just keep getting a VkDeviceMemory with an offset. Another test could be added later that could test sparse bindings (once we add support)... but that would have very explicit scenarios set up. For example:

Allocating 3 small textures and 1 large texture with vma's Pages call.
De-allocating the first and the 3rd textures.
Allocating a final texture the size of the first and 3rd combined with the Pages call.
Ensuring that two pages are allocated at the final allocation and that the texture displays.

tests/loadtests/vkloadtests/VulkanLoadTestSample.h

MarkCallow

Name change is fine.

Thinking to also move the useSubAllocator bool to VulkanLoadTestSample but ideally then VulkanLoadTestSample would handle parsing the command line for the matching option. IIRC there is no hierarchical processing of the command line. So we'll leave this idea for later.

Please look into the build errors on Windows. They look like something to do with the way you are using VMA.

toomuchvoltage · 2023-09-14T23:59:18Z

> Name change is fine.
>
> Thinking to also move the useSubAllocator bool to VulkanLoadTestSample but ideally then VulkanLoadTestSample would handle parsing the command line for the matching option. IIRC there is no hierarchical processing of the command line. So we'll leave this idea for later.
>
> Please look into the build errors on Windows. They look like something to do with the way you are using VMA.

Hi @MarkCallow, I need your help on this one. I've been unable to reproduce it on my side. My usage is perfectly in line with what VMA wants: just one include in a CPP file. It seems to be a bunch of warnings about VMA itself? I thought dropping the file in other_includes prevents warnings... but not from VulkanLoadTestSample? I could use a second look and some hints/pointers. Deeply appreciated! 🙏

MarkCallow · 2023-09-15T00:48:19Z

Possibly MSVC has no -isystem equivalent that CMake can use. Not sure. But just in case try

#if defined(_MSC_VER)
  #pragma warning(push)
  #pragma warning(disable: 4100)
  #pragma warning(disable: 4234)
#endif

and

#if defined(_MSC_VER)
  #pragma warning(pop)
#endif

around the include of VMA.

toomuchvoltage · 2023-09-15T19:56:09Z

Hi @MarkCallow , can you give this another spin? Hopefully it's fixed.

toomuchvoltage · 2023-09-16T00:11:29Z

Dear @MarkCallow , I also attempted to address some warnings highlighted here in the last commit. If you could review and let me know whether the changes actually address these, it would be greatly appreciated! 🙏

MarkCallow

Please revert the 2 changes indicated.

lib/vkloader.c

MarkCallow · 2023-09-16T03:21:31Z

You need to also suppress warning 4189 which I missed in the blizzard of 4100 warnings in the CI log and I transposed digits in one of the other warnings. It should be 4324 not 4234. Sorry about that.

As far as I can see, you have fixed the undocumented ktxVulkanTexture_subAllocatorCallbacks warning by providing documentation. The other Doxygen warnings (missing reference targets) are new with the latest version of Doxygen. I'm looking into them now. I won't hold up this PR for fixes. Ignore them.

toomuchvoltage · 2023-09-16T05:31:55Z

> You need to also suppress warning 4189 which I missed in the blizzard of 4100 warnings in the CI log and I transposed digits in one of the other warnings. It should be 4324 not 4234. Sorry about that.
>
> As far as I can see, you have fixed the undocumented ktxVulkanTexture_subAllocatorCallbacks warning by providing documentation. The other Doxygen warnings (missing reference targets) are new with the latest version of Doxygen. I'm looking into them now. I won't hold up this PR for fixes. Ignore them.

All done! I guess time for another go... 🙏

MarkCallow

Looks good. I'll try the build again.

MarkCallow · 2023-09-16T08:04:20Z

Thank you very much for your hard work on this.

MarkCallow · 2023-09-16T12:35:01Z

Did you run the --use-vma test under the Vulkan ~~debug~~validation layers? I am getting warnings and an error. I'm running under MoltenVK on an M2 MacBook. It says that it is attempting to map memory that was allocated without VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT set. Then comes an error telling me that memory mapping failed because GPU-only memory can't be mapped, followed by a crash. Please investigate.

~~vkloadtests --debug~~vkloadtests --validate will activate the ~~debug~~validation layers when the test starts.

toomuchvoltage · 2023-09-16T17:56:43Z

Hi @MarkCallow, I've identified and fixed the issue. It had to do with my memory property flag detection being tied to a specific vendor (NVIDIA). The fix is in my most recent remote branch. You can perhaps rip out the last commit and merge once more if that works. If not, I can open a new PR.
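
The actual fix is not shown in this thread. For illustration only, a common vendor-independent way to pick a memory type is to scan the types the device reports and take the first one that the resource allows and that has every required property flag. The struct and constants below are stand-ins that mirror the Vulkan bit values so the sketch builds without the Vulkan headers; `findMemoryType` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstdint>

// Stand-ins mirroring the VkMemoryPropertyFlagBits values.
constexpr uint32_t kDeviceLocal  = 0x1;  // VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT
constexpr uint32_t kHostVisible  = 0x2;  // VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT
constexpr uint32_t kHostCoherent = 0x4;  // VK_MEMORY_PROPERTY_HOST_COHERENT_BIT

// Stand-in for one entry of the device's reported memory types.
struct MemoryType {
    uint32_t propertyFlags;
};

// Returns the index of the first memory type that is permitted by typeBits
// (bit i set means type i is allowed) and has all required flags, or -1.
int findMemoryType(const MemoryType* types, uint32_t typeCount,
                   uint32_t typeBits, uint32_t required) {
    for (uint32_t i = 0; i < typeCount; ++i) {
        const bool allowed = (typeBits & (1u << i)) != 0;
        const bool hasAll = (types[i].propertyFlags & required) == required;
        if (allowed && hasAll) return static_cast<int>(i);
    }
    return -1;
}

// Usage: the layout differs per vendor, so the index is looked up, never
// hard-coded.
void usageExample() {
    const MemoryType types[] = {
        {kDeviceLocal},
        {kHostVisible | kHostCoherent},
        {kDeviceLocal | kHostVisible | kHostCoherent},
    };
    assert(findMemoryType(types, 3, 0x7, kHostVisible | kHostCoherent) == 1);
    assert(findMemoryType(types, 3, 0x4, kHostVisible) == 2);
    assert(findMemoryType(types, 3, 0x2, kDeviceLocal) == -1);
}
```

Memory that will be mapped needs the host-visible flag in `required`, which is the condition the validation layers flagged above.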

MarkCallow · 2023-09-17T00:31:48Z

Please open a new PR. Sorry about the bum steer on the vkloadtests option to use to enable validation.

toomuchvoltage · 2023-09-17T02:03:49Z

> Please open a new PR. Sorry about the bum steer on the vkloadtests option to use to enable validation.

No, my bad. That was a really good find. I switched cards to my Radeon VII and the test crashed.

* The newly introduced API surface area matches that of VMA's advanced usage and another hand-rolled memory allocator in a content-heavy application.
* All suballocator callbacks -- allocate, bind image, bind buffer, map, unmap and free -- are expected to guard VkDeviceMemory operations within a mutex.
* Each texture now also keeps track of its VkDeviceMemory offset.
* The 64-bit allocationId is to be used as a book-keeping measure by external suballocator callbacks to keep track of and free up suballocations. The external allocator can use a hash table (à la std::unordered_map in C++) to keep track of the page(s) allotted to this suballocation. ('Pages' here refers to potential sparse bindings.)
* Add a VkLoadTest for suballocation callbacks

toomuchvoltage force-pushed the external-allocator branch 3 times, most recently from 170ab4d to 398d489 Compare August 24, 2023 18:00

toomuchvoltage force-pushed the external-allocator branch from 8d8519d to 66a2f6c Compare August 26, 2023 03:31

MarkCallow requested changes Sep 3, 2023

MarkCallow reviewed Sep 5, 2023

MarkCallow reviewed Sep 5, 2023

MarkCallow reviewed Sep 5, 2023

MarkCallow reviewed Sep 5, 2023

MarkCallow approved these changes Sep 5, 2023

toomuchvoltage force-pushed the external-allocator branch from 01ab174 to e509fc9 Compare September 9, 2023 07:27

MarkCallow reviewed Sep 12, 2023

Better naming for the callback struct

1c5f8dc

toomuchvoltage requested a review from MarkCallow September 13, 2023 04:56

MarkCallow reviewed Sep 14, 2023

Potential fix for the MSVC 2019 compile issue

91fb25e

toomuchvoltage force-pushed the external-allocator branch from 04783ad to 74e7118 Compare September 16, 2023 00:13

MarkCallow requested changes Sep 16, 2023

Fixing documentation

5eaa1e1

toomuchvoltage force-pushed the external-allocator branch from 74e7118 to 5eaa1e1 Compare September 16, 2023 05:28

toomuchvoltage requested a review from MarkCallow September 16, 2023 05:32

MarkCallow approved these changes Sep 16, 2023

MarkCallow merged commit 6856fdb into KhronosGroup:main Sep 16, 2023
13 checks passed

MarkCallow mentioned this pull request Sep 16, 2023

How to combine with VulkanMemoryAllocator #567

Closed
