Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Deadlock if wait_for_tasks() is called concurrently #110

Closed
alugowski opened this issue May 19, 2023 · 7 comments
Closed

[BUG] Deadlock if wait_for_tasks() is called concurrently #110

alugowski opened this issue May 19, 2023 · 7 comments
Assignees
Labels
bug Something isn't working

Comments

@alugowski
Copy link

alugowski commented May 19, 2023

Describe the bug

Deadlock if wait_for_tasks() is called from multiple threads at the same time.

Minimal working example

#include <chrono>
#include <iostream>
#include <thread>
#include "BS_thread_pool.hpp"

using namespace std::chrono_literals;

void func() {
    BS::thread_pool pool{1};
    std::atomic<bool> go{false};
    std::atomic<bool> thread_started{false};

    auto t = std::thread([&]{
        thread_started = true;
        while (!go) {}
        pool.wait_for_tasks();
    });
    pool.push_task([]{std::this_thread::sleep_for(2ms);});
    while (!thread_started) {}
    go = true;
    pool.wait_for_tasks();

    t.join();
}

int main() {
    for (int i = 0; i < 1e6; ++i) {
        func();
        std::cout << i << " ";
        std::cout.flush();
    }
    std::cout << "done" << std::endl;
    return 0;
}

Behavior

Expectation is that both wait_for_tasks() will block until the tasks are completed then both will unblock. In some situations this does not happen, deadlocking one of the threads.

Example execution of above:

$ ./wait_for_tasks_deadlock
0 1 2 3 4 5 6

Edit: Added thread_started to example code to make the problem appear even faster.

@alugowski alugowski added the bug Something isn't working label May 19, 2023
@bshoshany
Copy link
Owner

Nice catch, thanks for posting this issue! I believe this happens if the second wait_for_tasks() is launched exactly when the first one is between the lines if (!waiting) and waiting = true;, so it hasn't yet had a chance to set waiting to true.

The ideal solution is to add an additional mutex to ensure that waiting stays locked until its value is set to true. I will add that in the next release. Meanwhile, a simple solution is to replace in wait_for_tasks()

task_done_cv.wait(tasks_lock, [this] { return (tasks_total == (paused ? tasks.size() : 0)); });

with

task_done_cv.wait(tasks_lock, [this] { return (tasks_total == (paused ? tasks.size() : 0)) || !waiting; });

and similarly in the other two wait functions if desired.

@alugowski
Copy link
Author

Note: I added an extra variable to the example code to make the problem appear even faster.

Meanwhile, a simple solution

That does not seem to have an effect.

@alugowski
Copy link
Author

if the second wait_for_tasks() is launched exactly when the first one is between the lines if (!waiting) and waiting = true;, so it hasn't yet had a chance to set waiting to true.

Wait, what? Both threads set waiting to true, presumably as it should be.

This actually brings up another possibility: If thread 2 entered wait_for_tasks after thread 1 already set waiting to true, then thread 2 wouldn't wait at all! Sure that's not a deadlock, but it's incorrect behavior.

@alugowski
Copy link
Author

FYI, atomics have been specifically built to make test-and-set type operations, well, atomic. So something like if (!waiting) { waiting = true} defeats the purpose of making waiting atomic.

One of these would be more appropriate. I'd recommend the strong variants for this project.
https://en.cppreference.com/w/cpp/atomic/atomic/compare_exchange

Though in this case if (!waiting) introduces a behavior bug, even if done atomically.

@bshoshany
Copy link
Owner

You're 100% right, I appear to have made a mistake and wait_for_tasks() is not behaving as expected. However, both the incorrect behavior and the deadlock can be resolved as follows:

  1. Delete the statement if (!waiting) along with its 2 braces, since as you pointed out, that doesn't make sense - if the second thread calls wait_for_tasks(), it will not block as expected. (This essentially returns the function to the form it had in v3.3.0.)
  2. In worker(), simply change task_done_cv.notify_one(); to task_done_cv.notify_all(); so that it notifies all the waiting threads, not just one of them; this was something I actually overlooked in the original design, and it is the cause of the deadlock, as far as I can tell.

I tested this with your minimal working example and did not encounter any deadlocks (I let it count to 30,000 and then stopped it manually). I also have a test in BS_thread_pool_test.cpp to test for deadlocks in wait_for_tasks(), and it too passes successfully with this change.

This fix will be posted within a few days with release v3.5.0 (along with a few other fixes I've been working on).

Thanks again for pointing this out, it was very helpful! :)

@alugowski
Copy link
Author

Great!

Delete the statement if (!waiting) along with its 2 braces,

I think that's correct. Notice that now the reproducing code always deadlocks on the first iteration. That's what I expected it to do when I wrote the code, I think the fact it took multiple iterations with stock 3.4.0 is that the second thread would simply not wait.

task_done_cv.notify_all();

Yes, this fixes it!

@bshoshany
Copy link
Owner

bshoshany commented May 26, 2023

Update: This issue has been fixed in v3.5.0, which was just released. Thanks again!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants