Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove boost thread_specific_ptr #4221

Merged
merged 16 commits into from
Apr 14, 2024

Conversation

fpsunflower
Copy link
Contributor

Description

Remove the dependency on boost's thread_specific_ptr by using hashmap's of static duration between the owning object and the target datatype.

After digging a bit into how boost's pointer works behind the scenes, it is actually a very similar mechanism, with the lookup being into an std::map. I don't have an easy way to measure performance here, but none of these are on a hot path (with the possible exception of the per_thread_info for the image cache, but this one can be managed externally if you want the best performance.

Tests

Going to rely on the CI to run the testsuite here.

Checklist:

  • I have read the contribution guidelines.
  • I have updated the documentation, if applicable.
  • I have ensured that the change is tested somewhere in the testsuite
    (adding new test cases if necessary).
  • If I added or modified a C++ API call, I have also amended the
    corresponding Python bindings (and if altering ImageBufAlgo functions, also
    exposed the new functionality as oiiotool options).
  • My code follows the prevailing code style of this project. If I haven't
    already run clang-format before submitting, I definitely will look at the CI
    test that runs clang-format and fix anything that it highlights as being
    nonconforming.

… reduce the dependency on the boost thread library

Signed-off-by: Chris Kulla <ckulla@gmail.com>
…d inside ImageCacheImpl with a simpler idiom using a thread_local map

Signed-off-by: Chris Kulla <ckulla@gmail.com>
… are no longer used

Signed-off-by: Chris Kulla <ckulla@gmail.com>
@fpsunflower
Copy link
Contributor Author

I split this up into three commits to help with the review.

There is a bit of a pattern with the error messages that could maybe be unified - but I opted to stay as close to the original code as possible for now.

I only tested that this works properly on my local machine - so will be monitoring the CI to ensure its working properly everywhere else.

Copy link
Collaborator

@lgritz lgritz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like this!

I made a comment about ImageOutput, but I think it applies across the board. Are you not missing a provision in the object destructors to remove that object's doodad from the per-thread map?

errptr = new std::string;
m_impl->m_errormessage.reset(errptr);
}
std::string& err_str = error_messages[this];
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So there is a per-thread map of ImageOutput -> string. This line adds a slot in this thread's map for this ImageOutput. When the thread terminates, I assume the whole map (for that thread) will delete. But as long as the thread is alive, the strings just sit there even when the ImageOutput they correspond to is long gone, no? Don't we want something in ImageOutput::Impl::~Impl that does an erase for its slot in the map?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is the destructor only runs on one thread. So it can't reach into other thread's storage without additional bookkeeping.

It probably is solveable, though I worry about introducing complicated threaded bookkeeping just to release the memory a few error messages.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oooooh, yes of course.

OK, so for the error things, it might pseudo-leak a few strings -- and only if there are actual errors. So in practice, NBD.

And for ImageCache, the "object" is the cache itself, of which there is generally exactly one, so we don't really worry about a succession of IC's being created from scratch, then being destroyed without freeing their per-thread cruft.

Sounds ok to me, then.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Chris, I still think that maybe the object destructors should have a

if (threadlocal.contains(key)) threadlocal.erase(key);

It's true that it will only free the thing if the same thread that's destroying the ImageOutput (say) is the same one that called error(). But that's a very common case! So we could still leak a few strings, but usually it will cause us to fully clean up.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can definitely add this to all the destructors. It shouldn't cost much in the common case and keeps the map for the main thread small(er).

I also just realized that technically, these error strings have the same bug I fixed in the image cache. It's much less likely to cause issues, but the bug would be:

  1. create an ImageInput - gets allocated at address 0x1234 (for example)
  2. do some operations (potentially across threads) which lead to errors, but you don't check or clear them
  3. close the ImageInput
  4. create a new ImageInput which happens to get allocated at 0x1234 again
  5. now all threads still have the error message from the first one which never got cleared :(

The way I solved it for the per thread infos was to make the map use a strictly incrementing counter instead of their address for the lookup (in practice most apps only ever create a single image cache so its not a big diference). I think I should do the same for the error messages (even though its asymptomatic in the testsuite, the potentialy for confusion if it ever happens would be quite high). The only drawback is an extra 64 bit ID per ImageInput,ImageOutput.

I'll ponder this a bit more to see if I think of any other clever way to solve this ....

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

haha, I think our messages crossed, I had typed mine a couple hours ago, but only hit the button now without seeing you commented on the same thing.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In short, I don't think it's worth doing anything elaborate. Let's consider it a user error to destroy an ImageInput that you used, accumulated errors, but never retrieved the error messages. But if you make the destructor clear the error for that thread (which will usually be the only thread), I think the stars will rarely align to give a problem, even when the user makes that mistake.

Signed-off-by: Chris Kulla <ckulla@gmail.com>
if (clear)
errptr->clear();
iter.value().clear();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this was input_error_messages.erase(iter), instead of clear(), then any time an error was retrieved/clear by the right thread, the entry in the map would fully be cleaned up, instead of "leaking", even though the thing it leaks is usually an empty string.

…This could lead to the same pointer being re-used and getting stale data from the previous cache's per thread info structs. Instead of looking up by pointer, lookup by an atomic id that is never re-used. Also remove the shared flag because it was always set to true. Either the thread owns the per thread info (in the case of automatic creation) or the caller down (manual creation). Both cases were being flagged as shared, so the non-shared codepath was never taken

Signed-off-by: Chris Kulla <ckulla@gmail.com>
…e itself own (and free) the memory of all per thread info structs, regardless of if they are created implicitly or explicitly.

Signed-off-by: Chris Kulla <ckulla@gmail.com>
Signed-off-by: Chris Kulla <ckulla@gmail.com>
e = iter.value();
if (clear) {
iter.value().clear();
iter.value().shrink_to_fit();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why shrink_to_fit instead of erase? You're freeing the characters, but not getting rid of the node in the hash table that contains the string itself. You could imagine a pattern of repeated ImageInput constructions (with errors) that would lead to a fuller and fuller hash table containing unusabl empty strings? Versus actually removing the hash table entry when the ImageInput is destroyed by the right thread (which is mostly).

I don't think it's a problem in practice -- to be a real issue, you'd need tons of created and then destroyed ImageInput objects that encountered errors along the way. I'm happy to make that a "cross that very unlikely bridge when we get to it." But I'm curious if you did this because you spotted some big flaw in my proposal that I hadn't considered.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see what you mean now. Yes of course erase makes more sense.

Signed-off-by: Chris Kulla <ckulla@gmail.com>
@lgritz
Copy link
Collaborator

lgritz commented Apr 12, 2024

The same thing you fixed with ImageCache and the could happen with the errors, right? Like, you make an ImageInput, it has an error (that the caller never retrieves), destroy it, and then the next ImageInput you make MIGHT have the same address and be born with an error message already there!

It's really a user bug to destroy an object that accumulates errors without ever checking and clearing those errors. But I still wonder if we should clear the error in the destructor, even though that will only work in cases where the errors were encountered by the destroying thread. Or maybe that's worry for nothing.

… destroyed.

Signed-off-by: Chris Kulla <ckulla@gmail.com>
Signed-off-by: Chris Kulla <ckulla@gmail.com>
Signed-off-by: Chris Kulla <ckulla@gmail.com>
@lgritz
Copy link
Collaborator

lgritz commented Apr 12, 2024

Something is not happy

@fpsunflower
Copy link
Contributor Author

Indeed ... and I can't repro on my mac. It seems to only happen on linux platforms ...

I'll rewind a bit and see if I can get the tests green again.

…rors are destroyed."

This reverts commit d7e1afe.

Signed-off-by: Chris Kulla <ckulla@gmail.com>
@fpsunflower
Copy link
Contributor Author

It looks like those erase calls in the destructor were to blame. I am not sure why though.

@lgritz are you able to take this branch for a spin on your work machines to see how it does on the renderer testsuite for example?

Its definitely worth testing this one carefully.

@lgritz
Copy link
Collaborator

lgritz commented Apr 12, 2024

Yeah, I will try it and see what happens.

…mageCache and TextureSystem alone as they are less likely to accumulate cruft and also to avoid the static destruction order fiasco

Signed-off-by: Chris Kulla <ckulla@gmail.com>
…ve the ImageCache and TextureSystem alone as they are less likely to accumulate cruft and also to avoid the static destruction order fiasco"

This reverts commit ac5752c.

Signed-off-by: Chris Kulla <ckulla@gmail.com>
…can sort out the static destruction order fiasco

Signed-off-by: Chris Kulla <ckulla@gmail.com>
Copy link
Collaborator

@lgritz lgritz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

There is a very subtle issue that Chris and I have been talking about -- the "static destruction order fiasco" keeps biting us when we try to make sure that we delete these error strings. We're going to merge this with some possible string leaking to avoid it, and return to solve the problem in a more robust way.

@lgritz lgritz merged commit eef733c into AcademySoftwareFoundation:master Apr 14, 2024
23 of 24 checks passed
@ThiagoIze
Copy link
Collaborator

Something to keep in mind with having a thread_local with a destructor, such as the thread_local tsl::robin_map in this PR, is that this destructor won't run until the thread is destroyed. If OIIO is dlclosed before the thread is closed, this could cause a crash or hang if the OS doesn't handle this specifically because the destructor is run at thread termination, after the OIIO code that has been unloaded. We see that it was fixed in glibc 2.22 for linux. This means centos 7.7 still suffers from that (it's on 2.17) unless they backported that fix (I haven't checked).

From rust-lang/rust#28794 (comment) it looks like maybe this was fixed on macos (dlclose won't unload the dylib if you have thread_local). I haven't checked what the status is on Windows.

@fpsunflower fpsunflower deleted the remove_boost_tss branch April 28, 2024 18:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants