hashtable: Redesign the hashtable API #12290

icculus · 2025-02-13T21:32:09Z

So I thought "I'll just move src/SDL_hashtable.h to include/SDL3 and ship this" and hooboy was that foolish of me.

This makes the hash table API public, after several significant interface changes:

all hashtables have a built-in rwlock now. It was previously optional, but that's going to lead to disaster.
the ability to make "stackable" tables is removed. Apparently this still worked with the current implementation, but I could see a future implementation struggle mightily to support this. It'll be better for something external to build on top of the table if it needs it, inserting a linked list of stacked items as the hash values and managing them separately.
You no longer specify "buckets" when creating a table. This meant less once we moved to an Open Addressing implementation anyhow, since the bucket count isn't static now (and they aren't really "buckets" anymore either), so now you can just report how many items you think the hash will hold and SDL will allocate a reasonable default for you...or "-1" to not guess, and SDL will start small and grow as necessary, which is often the correct thing to do.
There's no more SDL_IterateHashTableKey because there's no more "stackable" hash tables.
SDL_IterateHashTable() now uses a callback, which matches other parts of SDL, and also lets us hold the read-lock for the entire iteration and get rid of the goofy iterator state variable.
SDL_InsertIntoHashTable() now lets you specify whether to replace existing keys or fail if the key already exists.
Callbacks now use SDL conventions (userdata as the first param).
Other naming convention fixes.

I discovered we use a lot of hash tables in SDL3 internally. :) So the bulk of this PR is fixing up that code to use the new interfaces, and simplifying things (like checking for an item to remove it if it already exists before inserting a replacement...just do the insert atomically, it'll do all that for you!).

include/SDL3/SDL_hashtable.h

flibitijibibo

Just to be sure: The point of exposing the string/pointer/id callbacks is so that clients can stack a custom hash table implementation on top of the most common key types?

Either way, I wonder if it'd make sense to have something at the top like CreateStringHashTable/CreatePointerHashTable/CreateIDHashTable to avoid all the extra copypasting on the client side and also make those hash types more appealing to use over something custom-made.

flibitijibibo · 2025-02-14T07:11:33Z

include/SDL3/SDL_hashtable.h

+extern SDL_DECLSPEC void SDLCALL SDL_DestroyHashKeyAndValue(void *unused, const void *key, const void *value);
+
+/**
+ * Free just the value pointer of a hash table item.


Small copypaste thing:

Suggested change

* Free just the value pointer of a hash table item.

* Free just the key pointer of a hash table item.

Akaricchi · 2025-02-14T20:17:07Z

Couple of thoughts:

I don't think having locks built into the hashtable is a good idea. Many uses of hashtables don't require them, so they end up as useless and measurable overhead. For uses that do require them, it's sometimes required to synchronize some external operations with hashtable access anyway. For example you might need to 'atomically' create and insert some resource into the hashtable if it doesn't already exist, and otherwise just return it. To accomplish this you would either need an external lock, or to add a cursed GetOrCreateViaCallbackAndInsert API function that would hold the write lock for the duration of this whole operation. Having had implemented the later option before, I think I strongly prefer the hashtable-using client code to handle synchronization — it probably already has to in other places anyway. Case in point: the new TODO in SDL_getenv.
I'm ok with "stackable" being gone, though personally I don't think it's a big burden to support. Disclaimer: I wrote the current implementation and it happens to support this pretty trivially. Overusing it can lead to some bad clustering/long PSLs (similarly to using a bad hash function), though it's still more efficient than a linked list.
An "expected size" hint instead of "bucket count" seems fine.
I personally prefer iterators over callbacks, but it's not a big deal.
Ultimately, the biggest problem with this API (and it was the case with the old private API as well) is that the hashtable is not templated. You are forced to use pointer-sized keys and values, and if you want anything more than that, you must malloc, which destroys performance. It's especially annoying because we still have to care about platforms with 32-bit pointers. It's often possible to pack your data neatly into 64 bits, but 32 is often not enough. render/gpu/SDL_pipeline_gpu.c is an example of this: GPU_PipelineCacheKeyConverter (renamed from GPU_PipelineCacheKey) could really be used as a key without allocations, if only we could assume void* is at least 64 bits, or if we could specify a custom key size. IIRC the GPU Vulkan implementation has similar silliness going on, because Vulkan object handles are always at least 64-bit (even on 32-bit platforms). You could remedy this partially by defining the key type as something like this:
```
typedef union SDL_HashTableKey {
    uint64_t as_uint64;
    void *as_pointer;
} SDL_HashTableKey;
```
One downsize of this is that the API becomes more awkward to use, forcing the user to manually convert their data into SDL_HashTableKeys. But I still find that preferable over mallocing 64-bit integers.

icculus · 2025-02-15T00:29:09Z

So the reason to put this in the public API was to fix something in sdl2-compat, and Sam solved that in a different way earlier today, so this has become less urgent.

I've moved this back to a private interface, which is to say we can continue to experiment and tweak it however we like without a lot of drama.

I've also re-added the "threadsafe" option, which is pretty trivial. If true, it handles read/write locking for you, if false, it's basically a function call that does if (!ptr) { return; }, so there's not a lot of concern about bulky code here, and none about lock contention overhead. But also, yes, there's tons of places that already had their own locking on top of this and couldn't replace it with the hashtable's lock.

"stackable" was removed in case that became a problem the next time a better hashtable algorithm pops up and the public API had to deal with that...but internally we don't actually use it either, so I say let's leave it out for now. We can readd it later. It was in there originally because this was the hash table I was using the shader compiler, and it was nice to be able to push identifiers in new scopes into the table and remove them when the scope ended...useful, but pretty special-case!

Iterators vs callbacks: this was changed because SDL's public API enumeration things are all callbacks. I think the pre-callback iterator thing was cleaner for internal use but totally gross for a public API, plus it let us simply hold a read lock during iteration. But internally, we can go either way on it.

As far as templating...we could do that with C preprocessor magic, but that gets nasty really fast. I don't doubt it would be faster, but I'd want to know we have a legit hotspot before we do it.

icculus · 2025-02-15T00:31:47Z

So this still has one place where we hit an assertion in process cleanup, because we're recursively locking a rwlock through all the SDL_Properties destructor callbacks. I have to figure out what I changed to cause that, but after that, we can call this PR done (and by "done" at this point I mean "this is an improvement over what's currently there but we could definitely go further later.").

slouken · 2025-02-15T00:41:05Z

Great!

I just fixed the window surface issue with wesnoth, so hopefully we'll be in good shape regarding surfaces and cleanup in sdl2-compat.

This was intended to make the API public, so SDL_hashtable.h got an extreme documentation makeover, but for now this remains a private header. This makes several significant interface changes to SDL_HashTable, and improves code that makes use of it in various ways. - The ability to make "stackable" tables is removed. Apparently this still worked with the current implementation, but I could see a future implementation struggle mightily to support this. It'll be better for something external to build on top of the table if it needs it, inserting a linked list of stacked items as the hash values and managing them separately. There was only one place in SDL using this, unnecessarily, and that has also been cleaned up to not need it. - You no longer specify "buckets" when creating a table, but rather an estimated number of items the table is meant to hold. The bucket count was crucial to our classic hashtable implementation, but meant less once we moved to an Open Addressing implementation anyhow, since the bucket count isn't static (and they aren't really "buckets" anymore either). Now you can just report how many items you think the hash will hold and SDL will allocate a reasonable default for you...or 0 to not guess, and SDL will start small and grow as necessary, which is often the correct thing to do. - There's no more SDL_IterateHashTableKey because there's no more "stackable" hash tables. - SDL_IterateHashTable() now uses a callback, which matches other parts of SDL, and also lets us hold the read-lock for the entire iteration and get rid of the goofy iterator state variable. - SDL_InsertIntoHashTable() now lets you specify whether to replace existing keys or fail if the key already exists. - Callbacks now use SDL conventions (userdata as the first param). - Other naming convention fixes. I discovered we use a lot of hash tables in SDL3 internally. :) So the bulk of this work is fixing up that code to use the new interfaces, and simplifying things (like checking for an item to remove it if it already exists before inserting a replacement...just do the insert atomically, it'll do all that for you!).

icculus · 2025-02-15T17:58:11Z

All issues are resolved, this can merge now!

icculus added this to the 3.2.6 milestone Feb 13, 2025

icculus self-assigned this Feb 13, 2025

icculus force-pushed the sdl3-hashtable-public-api branch from 1b7eeef to c5362da Compare February 13, 2025 21:35

slouken reviewed Feb 13, 2025

View reviewed changes

include/SDL3/SDL_hashtable.h Outdated Show resolved Hide resolved

slouken reviewed Feb 13, 2025

View reviewed changes

include/SDL3/SDL_hashtable.h Outdated Show resolved Hide resolved

icculus force-pushed the sdl3-hashtable-public-api branch 2 times, most recently from 99df814 to 9def54a Compare February 14, 2025 04:12

flibitijibibo reviewed Feb 14, 2025

View reviewed changes

icculus changed the title ~~hashtable: Redesign the hashtable API, make it public.~~ hashtable: Redesign the hashtable API Feb 15, 2025

icculus force-pushed the sdl3-hashtable-public-api branch from a200927 to 64f00a2 Compare February 15, 2025 15:42

icculus force-pushed the sdl3-hashtable-public-api branch from 64f00a2 to 071678b Compare February 15, 2025 17:40

slouken approved these changes Feb 15, 2025

View reviewed changes

icculus merged commit 84a236c into libsdl-org:main Feb 15, 2025
41 checks passed

icculus deleted the sdl3-hashtable-public-api branch February 15, 2025 23:53

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

hashtable: Redesign the hashtable API #12290

hashtable: Redesign the hashtable API #12290

icculus commented Feb 13, 2025

flibitijibibo left a comment

flibitijibibo Feb 14, 2025

Akaricchi commented Feb 14, 2025

icculus commented Feb 15, 2025

icculus commented Feb 15, 2025

slouken commented Feb 15, 2025

icculus commented Feb 15, 2025

	* Free just the value pointer of a hash table item.
	* Free just the key pointer of a hash table item.

hashtable: Redesign the hashtable API #12290

hashtable: Redesign the hashtable API #12290

Conversation

icculus commented Feb 13, 2025

flibitijibibo left a comment

Choose a reason for hiding this comment

flibitijibibo Feb 14, 2025

Choose a reason for hiding this comment

Akaricchi commented Feb 14, 2025

icculus commented Feb 15, 2025

icculus commented Feb 15, 2025

slouken commented Feb 15, 2025

icculus commented Feb 15, 2025