Crash on exit with OSX #1991
Comments
Another important bit of info is that this crash happens only when we do
diff --git a/src/libutil/thread.cpp b/src/libutil/thread.cpp
index c1da23d8..bb10bf9b 100644
--- a/src/libutil/thread.cpp
+++ b/src/libutil/thread.cpp
@@ -159,7 +159,8 @@ public:
             else { // the number of threads is decreased
                 for (int i = oldNThreads - 1; i >= nThreads; --i) {
                     *this->flags[i] = true; // this thread will finish
-                    this->threads[i]->detach();
+                    this->terminating_threads.push_back(std::move(this->threads[i]));
+                    this->threads.erase(this->threads.begin() + i);
                 }
                 {
                     // stop the detached threads that were waiting
@@ -218,10 +219,15 @@ public:
             if (thread->joinable())
                 thread->join();
         }
+        for (auto& thread : this->terminating_threads) { // wait for the terminated threads to finish
+            if (thread->joinable())
+                thread->join();
+        }
         // if there were no threads in the pool but some functors in the queue, the functors are not deleted by the threads
         // therefore delete them here
         this->clear_queue();
         this->threads.clear();
+        this->terminating_threads.clear();
         this->flags.clear();
     }
@@ -317,6 +323,7 @@ private:
     void init() { this->nWaiting = 0; this->isStop = false; this->isDone = false; }
     std::vector<std::unique_ptr<std::thread>> threads;
+    std::vector<std::unique_ptr<std::thread>> terminating_threads;
     std::vector<std::shared_ptr<std::atomic<bool>>> flags;
     mutable Queue<std::function<void(int id)> *> q;
     std::atomic<bool> isDone;
If it looks good I can try to make a pull request.
Fixed by #2013
While upgrading to the master branch, I came across a very rare crash (it can sometimes take tens of thousands of runs to trigger) that happens on shutdown on my High Sierra OSX machine:
This is a bug in OIIO. The following patch, which is perfectly valid code, allows it to happen on almost every run:
The bug here is that when we are shutting down and destroying everything, the thread pool class is destroyed while the detached worker threads are still making use of its member variables. This patch pauses the destructor for 1s after the mutex is destroyed and makes the worker thread delay exiting by 1s, so that when it finally locks the mutex it is much more likely that the mutex will already have been destroyed and OSX throws the error.
We could wrap every use of the mutex in a try/catch block, and that does help a lot with this crash. Even then, I worry that we might still get other crashes from using a mutex and other state that has already been destroyed, and the behavior could differ on other operating systems or future versions of OSX. So I think it would be better to have a proper fix.
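To make the idea of a proper fix concrete, here is a minimal sketch of a pool whose destructor joins its workers instead of detaching them, so no worker can touch the mutex after it has been destroyed. This is not OIIO's `thread_pool`: the names `JoiningPool` and `run_joining_pool_demo` are hypothetical and exist only for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical minimal pool (not OIIO's implementation). It only illustrates
// joining workers in the destructor rather than detaching them.
class JoiningPool {
public:
    explicit JoiningPool(int n) {
        assert(n >= 0);
        for (int i = 0; i < n; ++i)
            workers_.emplace_back([this] { run(); });
    }

    ~JoiningPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stop_ = true;  // ask workers to exit once the queue is empty
        }
        cv_.notify_all();
        // Join inside the destructor body, before any member is destroyed,
        // so no worker can still be using mutex_ or cv_ afterwards.
        for (auto& w : workers_)
            if (w.joinable())
                w.join();
    }

    void push(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push(std::move(task));
        }
        cv_.notify_one();
    }

private:
    void run() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return stop_ || !tasks_.empty(); });
                if (tasks_.empty())
                    return;  // stop requested and nothing left to run
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // run the task without holding the lock
        }
    }

    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> tasks_;
    bool stop_ = false;
    std::vector<std::thread> workers_;  // declared last, so the state above is initialized before any worker starts
};

// Pushes `tasks` increments onto a pool of `threads` workers and returns the
// final count after the pool has been destroyed.
int run_joining_pool_demo(int threads, int tasks) {
    std::atomic<int> counter{0};
    {
        JoiningPool pool(threads);
        for (int i = 0; i < tasks; ++i)
            pool.push([&counter] { ++counter; });
    }  // the destructor drains the queue and joins every worker here
    return counter.load();
}
```

The trade-off is that shutdown now blocks until every worker has returned, which is the same choice the `terminating_threads` patch in the comments makes for threads removed when the pool is shrunk.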
I wonder if this could be related to #1572 and #1795?