-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[C++] Thread deadlock in ObjectOutputStream #41862
Comments
Copying the temptative diagnosis I initially posted on the dev ML: It seems the problem lies here: arrow/cpp/src/arrow/filesystem/s3fs.cc Line 1858 in 9f58990
The Future is marked finished with the ObjectOutputStream's mutex taken, |
@icexelloss If I submit a PR trying to fix this, will it be possible for you to validate it with your use case? |
Yep - I can cherry pick the patch internally and test. |
@pitrou I am curious how did you find the |
It's frame 68 in the stacktrace :-) |
Thanks @pitrou! I will test both out in the next couple of days! |
FYI I am testing #41875 Today, I will let it run through out the day because the it doesn't always happen quickly. |
#41875 doesn't seem to work and with similar stacktrace :( I am rerunning it to double check |
I am testing #41876 Today |
Could you post the new stacktrace somewhere? |
Yep, let me do that. |
Here is the stacktrace after I applied the #41875 patch |
Did you apply it to an old version of Arrow? I cannot find anything in recent versions that corresponds to this traceback line:
It would be nice if you could test the PR itself. |
That said, I agree the stack trace is very similar (except for the |
This is on Arrow 15. The line numbers are slightly off but this is the line that stack 21 is referring to: (or on master) |
Ok, thanks. That explains why the "fix" doesn't work then :-) |
Sorry I am slightly confused. The "fix" doesn't work because I am using Arrow 15 and the fix only works on newer Arrow version? |
No, I mean the traceback better explains why the fix doesn't work. It probably wouldn't work on latest Arrow either (though you can still try if that's easy for you). |
I see. Trying #41876 then. |
Cool, thanks! |
…#41876) ### Rationale for this change When the Future returned by `OutputStream::CloseAsync` finishes, it can invoke a user-supplied callback. That callback may well destroy the stream as a side effect. If the stream is a S3 output stream, this might lead to a deadlock involving the mutex in the output stream's `UploadState` structure, since the callback is called with that mutex locked. ### What changes are included in this PR? Unlock the `UploadState` mutex before marking the Future finished, to avoid deadlocking. ### Are these changes tested? No. Unfortunately, I wasn't able to write a test that would trigger the original condition. Additional preconditions seem to be required to reproduce the deadlock. For example, it might require a mutex implementation that hangs if destroyed while locked. ### Are there any user-facing changes? No. * GitHub Issue: #41862 Authored-by: Antoine Pitrou <antoine@python.org> Signed-off-by: Antoine Pitrou <antoine@python.org>
Issue resolved by pull request 41876 |
Thank you! @pitrou |
Describe the bug, including details regarding any error messages, version, and platform.
I am seeing a deadlock when destructing an ObjectOutputStream. I have attached the stack trace.
I did some debugging and found that the issue seems to be that the mutex in question is already held by this thread (I checked the __owner field in the pthread_mutex_t which points to the hanging thread)
Unfortunately the stack trace doesn’t show exactly which mutex it is trying to lock. I wonder if someone more familiar with the IO code has some ideas what might be the issue and where to dig deeper?
arrow_object_output_stream_stacktrace.txt
Edit:
I am using Arrow 15.0.0
Component(s)
C++
The text was updated successfully, but these errors were encountered: