Race condition in AsyncResult.wait and Connection.serve #530
Comments
I found a way to reproduce some pesky issues (prevalent in 99c5abe as well).
I fixed the busy loop and corrected a logic error in caf8c1c. The idea is that if a thread is able to acquire the lock, even for a short period of time, then we know that at that point in time no other thread is receiving data. The boolean we used previously cannot provide such guarantees as easily; we get more reuse out of the lock than we would by introducing another variable that at best mirrors the recvlock state. Thanks again for the busy loop catch. Let me know if you notice any other improvements to be made.
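A minimal sketch of that idea (illustrative only, not the actual rpyc code; the names are placeholders): briefly acquiring and releasing the receive lock answers "is anyone receiving right now?" directly, instead of keeping a boolean that merely mirrors the lock's state.

```python
import threading

recv_lock = threading.Lock()  # placeholder for the connection's recvlock

def someone_else_is_receiving():
    """True if another thread currently holds the receive lock.

    If we can acquire the lock, even for an instant, nobody is receiving at
    that moment; a separate boolean flag would only mirror this state and
    could drift out of sync with the lock itself.
    """
    if recv_lock.acquire(blocking=False):
        recv_lock.release()
        return False
    return True
```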
Now I'm confused: what issues?
Can you clarify? I don't see the connection between the duration of lock ownership and another thread currently receiving. All access to the boolean is done while holding the lock.
Anyway, I went down a rabbit hole with too many changes at once. So, I reverted back to where your code was merged and started resolving bugs that impacted my ability to refactor.
I'd still like to refactor bind threads a bit to make things easier to follow and less brittle.
I added a bind threads test to the CI/CD, and the merge of your branch is master + some fixes.
I was rushed after pushing fixes yesterday, so I'll try to recap the chaos of my commits/learning, @notEvil:
- Refactor rabbit hole and breaking changes
- Reverting back and finding more issues
- TODOs/Take-aways
Thanks for your time/contributions. I aim to make bind threads the default behavior eventually. Before it becomes the default, I'd like to familiarize myself with it, optimize it, make it easier to follow, and document why/how it works.
Very unfortunate
When I run the tests, I usually see some failures due to the hardened SSH config on my system, so I might have accidentally missed an issue or two (your 1. for instance). And sometimes the background threads don't shut down gracefully, especially in the gevent test, which leaves the test environment tainted for subsequent tests. That area definitely needs more thought!
I don't think that's an option. A dedicated thread for communication would introduce thread switches, which hurt performance. Also, there isn't that much added complexity, I would say.
Great, if you need any information or find optimizations, let me know :)
This issue was previously discussed in #492 and recently brought up in #527.
Hi!
I believe there is a race condition which can potentially create a deadlock. Consider threads A and B, and the following sequence of operations (rpyc 5.3.1):
- rpyc/rpyc/core/protocol.py, line 438 (ba07bae)
- rpyc/rpyc/core/async_.py, line 47 (ba07bae)
- rpyc/rpyc/core/protocol.py, line 428 (ba07bae)
- rpyc/rpyc/core/protocol.py, line 445 (ba07bae)
- rpyc/rpyc/core/protocol.py, line 408 (ba07bae)
- rpyc/rpyc/core/protocol.py, line 438 (ba07bae)
If there is no third party, B might wait indefinitely for something that A already received but didn't process in time (lost the race). I hope this is easy to follow and self-evident. Obviously, the probability of hitting this should be low, but I did hit it and was able to reproduce the issue reliably in the past.
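As a rough illustration only (not rpyc's actual code; the lock, the dispatch step, and the wait loop are simplified stand-ins for the lines referenced above), the problematic interleaving looks roughly like this:

```python
import threading
import time

recvlock = threading.Lock()              # stand-in for the connection's receive lock
b_reply_dispatched = threading.Event()   # set once B's reply has been processed

def thread_a():
    with recvlock:            # A wins the receive lock and reads the next message,
        pass                  # which happens to be the reply B is waiting for
    time.sleep(1.0)           # A is descheduled between releasing and dispatching
    b_reply_dispatched.set()  # only now does B's reply actually get processed

def thread_b():
    # B's wait loop: the reply it needs was already taken off the wire by A, so
    # serving the connection yields nothing; without A eventually dispatching
    # (or a third party producing new traffic), B would sit here indefinitely.
    while not b_reply_dispatched.is_set():
        if recvlock.acquire(blocking=False):
            recvlock.release()            # lock is free, but there is nothing to read
        b_reply_dispatched.wait(timeout=0.1)

a = threading.Thread(target=thread_a)
b = threading.Thread(target=thread_b)
a.start(); b.start()
a.join(); b.join()
```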
I found a more concise solution:
A unit test may monkey patch brine.load to hold the thread between release and _seq_request_callback while sending a second thread to win the race.
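A hedged sketch of what such a test might look like. Everything here is an assumption for illustration: `make_connection()` is a placeholder for however the test suite obtains a client connection (ideally to a server in another process, so only the client side is affected by the patch), `conn.root.echo` stands in for any remote call, and the patch point simply follows the suggestion above.

```python
import threading
from unittest import mock

import rpyc
from rpyc.core import brine

original_load = brine.load
gate = threading.Event()       # held shut to stall the first receiving thread
in_load = threading.Event()    # set once that thread has reached brine.load

def stalling_load(data):
    # The first caller blocks inside message decoding, i.e. after the reply was
    # taken off the wire but before it is processed (_seq_request_callback).
    if not in_load.is_set():
        in_load.set()
        gate.wait(timeout=5)
    return original_load(data)

def test_async_wait_race():
    conn = make_connection()            # placeholder, see lead-in
    echo = rpyc.async_(conn.root.echo)  # resolve the netref before patching
    with mock.patch.object(brine, "load", stalling_load):
        async_result = echo("x")
        async_result.set_expiry(10)
        # Thread A drives the connection and gets stuck inside stalling_load
        # while it is holding B's reply.
        a = threading.Thread(target=conn.serve, kwargs={"timeout": 5})
        a.start()
        in_load.wait(timeout=5)
        # Thread B now enters wait() although its reply was already received.
        b = threading.Thread(target=async_result.wait)
        b.start()
        gate.set()                      # let A finish dispatching; B should wake up
        a.join(timeout=5)
        b.join(timeout=5)
        assert async_result.ready
```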