Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

hashtable: Redesign the hashtable API #12290

Merged
merged 1 commit into from
Feb 15, 2025

Conversation

icculus
Copy link
Collaborator

@icculus icculus commented Feb 13, 2025

So I thought "I'll just move src/SDL_hashtable.h to include/SDL3 and ship this" and hooboy was that foolish of me.

This makes the hash table API public, after several significant interface changes:

  • all hashtables have a built-in rwlock now. It was previously optional, but that's going to lead to disaster.
  • the ability to make "stackable" tables is removed. Apparently this still worked with the current implementation, but I could see a future implementation struggle mightily to support this. It'll be better for something external to build on top of the table if it needs it, inserting a linked list of stacked items as the hash values and managing them separately.
  • You no longer specify "buckets" when creating a table. This meant less once we moved to an Open Addressing implementation anyhow, since the bucket count isn't static now (and they aren't really "buckets" anymore either), so now you can just report how many items you think the hash will hold and SDL will allocate a reasonable default for you...or "-1" to not guess, and SDL will start small and grow as necessary, which is often the correct thing to do.
  • There's no more SDL_IterateHashTableKey because there's no more "stackable" hash tables.
  • SDL_IterateHashTable() now uses a callback, which matches other parts of SDL, and also lets us hold the read-lock for the entire iteration and get rid of the goofy iterator state variable.
  • SDL_InsertIntoHashTable() now lets you specify whether to replace existing keys or fail if the key already exists.
  • Callbacks now use SDL conventions (userdata as the first param).
  • Other naming convention fixes.

I discovered we use a lot of hash tables in SDL3 internally. :) So the bulk of this PR is fixing up that code to use the new interfaces, and simplifying things (like checking for an item to remove it if it already exists before inserting a replacement...just do the insert atomically, it'll do all that for you!).

@icculus icculus added this to the 3.2.6 milestone Feb 13, 2025
@icculus icculus self-assigned this Feb 13, 2025
@icculus icculus force-pushed the sdl3-hashtable-public-api branch from 1b7eeef to c5362da Compare February 13, 2025 21:35
@icculus icculus force-pushed the sdl3-hashtable-public-api branch 2 times, most recently from 99df814 to 9def54a Compare February 14, 2025 04:12
Copy link
Collaborator

@flibitijibibo flibitijibibo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to be sure: The point of exposing the string/pointer/id callbacks is so that clients can stack a custom hash table implementation on top of the most common key types?

Either way, I wonder if it'd make sense to have something at the top like CreateStringHashTable/CreatePointerHashTable/CreateIDHashTable to avoid all the extra copypasting on the client side and also make those hash types more appealing to use over something custom-made.

extern SDL_DECLSPEC void SDLCALL SDL_DestroyHashKeyAndValue(void *unused, const void *key, const void *value);

/**
* Free just the value pointer of a hash table item.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small copypaste thing:

Suggested change
* Free just the value pointer of a hash table item.
* Free just the key pointer of a hash table item.

@Akaricchi
Copy link
Contributor

Couple of thoughts:

  • I don't think having locks built into the hashtable is a good idea. Many uses of hashtables don't require them, so they end up as useless and measurable overhead. For uses that do require them, it's sometimes required to synchronize some external operations with hashtable access anyway. For example you might need to 'atomically' create and insert some resource into the hashtable if it doesn't already exist, and otherwise just return it. To accomplish this you would either need an external lock, or to add a cursed GetOrCreateViaCallbackAndInsert API function that would hold the write lock for the duration of this whole operation. Having had implemented the later option before, I think I strongly prefer the hashtable-using client code to handle synchronization — it probably already has to in other places anyway. Case in point: the new TODO in SDL_getenv.
  • I'm ok with "stackable" being gone, though personally I don't think it's a big burden to support. Disclaimer: I wrote the current implementation and it happens to support this pretty trivially. Overusing it can lead to some bad clustering/long PSLs (similarly to using a bad hash function), though it's still more efficient than a linked list.
  • An "expected size" hint instead of "bucket count" seems fine.
  • I personally prefer iterators over callbacks, but it's not a big deal.
  • Ultimately, the biggest problem with this API (and it was the case with the old private API as well) is that the hashtable is not templated. You are forced to use pointer-sized keys and values, and if you want anything more than that, you must malloc, which destroys performance. It's especially annoying because we still have to care about platforms with 32-bit pointers. It's often possible to pack your data neatly into 64 bits, but 32 is often not enough. render/gpu/SDL_pipeline_gpu.c is an example of this: GPU_PipelineCacheKeyConverter (renamed from GPU_PipelineCacheKey) could really be used as a key without allocations, if only we could assume void* is at least 64 bits, or if we could specify a custom key size. IIRC the GPU Vulkan implementation has similar silliness going on, because Vulkan object handles are always at least 64-bit (even on 32-bit platforms). You could remedy this partially by defining the key type as something like this:
    typedef union SDL_HashTableKey {
        uint64_t as_uint64;
        void *as_pointer;
    } SDL_HashTableKey;
    One downsize of this is that the API becomes more awkward to use, forcing the user to manually convert their data into SDL_HashTableKeys. But I still find that preferable over mallocing 64-bit integers.

@icculus icculus changed the title hashtable: Redesign the hashtable API, make it public. hashtable: Redesign the hashtable API Feb 15, 2025
@icculus
Copy link
Collaborator Author

icculus commented Feb 15, 2025

So the reason to put this in the public API was to fix something in sdl2-compat, and Sam solved that in a different way earlier today, so this has become less urgent.

I've moved this back to a private interface, which is to say we can continue to experiment and tweak it however we like without a lot of drama.

I've also re-added the "threadsafe" option, which is pretty trivial. If true, it handles read/write locking for you, if false, it's basically a function call that does if (!ptr) { return; }, so there's not a lot of concern about bulky code here, and none about lock contention overhead. But also, yes, there's tons of places that already had their own locking on top of this and couldn't replace it with the hashtable's lock.

"stackable" was removed in case that became a problem the next time a better hashtable algorithm pops up and the public API had to deal with that...but internally we don't actually use it either, so I say let's leave it out for now. We can readd it later. It was in there originally because this was the hash table I was using the shader compiler, and it was nice to be able to push identifiers in new scopes into the table and remove them when the scope ended...useful, but pretty special-case!

Iterators vs callbacks: this was changed because SDL's public API enumeration things are all callbacks. I think the pre-callback iterator thing was cleaner for internal use but totally gross for a public API, plus it let us simply hold a read lock during iteration. But internally, we can go either way on it.

As far as templating...we could do that with C preprocessor magic, but that gets nasty really fast. I don't doubt it would be faster, but I'd want to know we have a legit hotspot before we do it.

@icculus
Copy link
Collaborator Author

icculus commented Feb 15, 2025

So this still has one place where we hit an assertion in process cleanup, because we're recursively locking a rwlock through all the SDL_Properties destructor callbacks. I have to figure out what I changed to cause that, but after that, we can call this PR done (and by "done" at this point I mean "this is an improvement over what's currently there but we could definitely go further later.").

@slouken
Copy link
Collaborator

slouken commented Feb 15, 2025

Great!

I just fixed the window surface issue with wesnoth, so hopefully we'll be in good shape regarding surfaces and cleanup in sdl2-compat.

@icculus icculus force-pushed the sdl3-hashtable-public-api branch from a200927 to 64f00a2 Compare February 15, 2025 15:42
This was intended to make the API public, so SDL_hashtable.h got an extreme
documentation makeover, but for now this remains a private header.

This makes several significant interface changes to SDL_HashTable, and
improves code that makes use of it in various ways.

- The ability to make "stackable" tables is removed. Apparently this still
  worked with the current implementation, but I could see a future
  implementation struggle mightily to support this. It'll be better for
  something external to build on top of the table if it needs it, inserting a
  linked list of stacked items as the hash values and managing them separately.
  There was only one place in SDL using this, unnecessarily, and that has also
  been cleaned up to not need it.
- You no longer specify "buckets" when creating a table, but rather an
  estimated number of items the table is meant to hold. The bucket count was
  crucial to our classic hashtable implementation, but meant less once we
  moved to an Open Addressing implementation anyhow, since the bucket count
  isn't static (and they aren't really "buckets" anymore either). Now you
  can just report how many items you think the hash will hold and SDL will
  allocate a reasonable default for you...or 0 to not guess, and SDL will
  start small and grow as necessary, which is often the correct thing to do.
- There's no more SDL_IterateHashTableKey because there's no more "stackable"
  hash tables.
- SDL_IterateHashTable() now uses a callback, which matches other parts of SDL,
  and also lets us hold the read-lock for the entire iteration and get rid of
  the goofy iterator state variable.
- SDL_InsertIntoHashTable() now lets you specify whether to replace existing
  keys or fail if the key already exists.
- Callbacks now use SDL conventions (userdata as the first param).
- Other naming convention fixes.

I discovered we use a lot of hash tables in SDL3 internally. :) So the bulk
of this work is fixing up that code to use the new interfaces, and simplifying
things (like checking for an item to remove it if it already exists before
inserting a replacement...just do the insert atomically, it'll do all that
for you!).
@icculus icculus force-pushed the sdl3-hashtable-public-api branch from 64f00a2 to 071678b Compare February 15, 2025 17:40
@icculus
Copy link
Collaborator Author

icculus commented Feb 15, 2025

All issues are resolved, this can merge now!

@icculus icculus merged commit 84a236c into libsdl-org:main Feb 15, 2025
41 checks passed
@icculus icculus deleted the sdl3-hashtable-public-api branch February 15, 2025 23:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants