TSAN Data Race in LangBindHelper_HandoverBetweenThreads #6474
Comments
➤ michael-wb commented: There may also be a related hang in the LangBindHelper_ImplicitTransactions_InterProcess test in test_lang_bind_helper.cpp, which occurred on master: |
I don't see the connection with the last one? |
Yes, the last one seems different, it is a problem with |
Tracked by #6739 |
tsan report from first linked ci build:
|
Another report:
|
Yet another report:
|
@kiburtse I think I'd like to keep all races from |
Ok, let's keep all failures under this ticket. The only stack trace that has shown up more than once so far is in realm::Array::get_universal. That must be concurrent access to Node::m_data. What could this mean? |
In theory, I think it means that an array is being written to while there are still readers accessing it concurrently. But this should not be possible because a writer does a copy-on-write and only modifies the array at its new location specifically to allow readers at previous versions to continue to see a consistent view of the data. This may indicate a problem with our locking, version management, or online compaction. It could also be a false positive from TSAN. |
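To make the copy-on-write argument above concrete, here is a minimal sketch. It assumes a hypothetical `CowArray` class built on `std::shared_ptr`; it is not Realm's actual `Array`/`Node` implementation and ignores version management and compaction entirely. It only shows why a reader pinned to an older version should never see the writer's modification.

```cpp
#include <cstddef>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical stand-in for a versioned array; not Realm's Array/Node.
class CowArray {
public:
    using Snapshot = std::shared_ptr<const std::vector<int>>;

    explicit CowArray(std::vector<int> initial)
        : m_current(std::make_shared<const std::vector<int>>(std::move(initial)))
    {
    }

    // Readers pin whichever version is current at the time of the call.
    Snapshot snapshot() const
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_current;
    }

    // The writer never touches the published buffer: it copies it,
    // edits the copy, and only then publishes the copy as the new version.
    void set(std::size_t index, int value)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        auto copy = std::make_shared<std::vector<int>>(*m_current);
        copy->at(index) = value;
        m_current = std::move(copy);
    }

private:
    mutable std::mutex m_mutex;
    Snapshot m_current;
};
```

A reader that called `snapshot()` before `set()` keeps reading the old buffer, so under this scheme a report of a write racing with a read on the same memory would point at either a bookkeeping bug or a false positive.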
One more occurrence from a CI test run on the same ubuntu2004-encryption-tsan. I've tried v13.23.3 on Arch Linux with TSAN and GCC, and it reproduces for me 100% of the time when running this test alone as well. The missing stack trace in our output seems to be related to the small default value for the TSAN stack trace history; increasing the history_size option for TSAN should solve the issue. There are actually two possible combinations: we have three threads running the handover_querier, handover_writer and handover_verifier functions.
Full tsan output from these two stacks: Stack 1
Stack 2
@finnschiermer @ironage could you comment on that? Looks to me that this is specifically encryption related issue after all. |
So, as was clarified under the linked PR for the work-around, this may really be just a false positive in our case. Encrypted pages are refreshed unconditionally, and the data itself is not expected to change for the threads reading it at that moment, but TSAN naturally sees it as a race. So, it's by design. @ironage do you think the stack traces we've gathered here could all be explained by this? Everything that intersects with
Suppressing all this with tsan doesn't look like the solution to me. |
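The false-positive explanation above can be sketched as follows. All names here (`Page`, `decrypt`, `refresh_page`) are hypothetical, and the XOR "cipher" is a toy stand-in, not Realm's encryption layer. The point is only that an unconditional refresh stores every byte again with the same value, which a concurrent reader cannot distinguish from no write at all, while TSAN still sees an unsynchronized store and load on the same address.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

constexpr std::size_t page_size = 8;
using Page = std::array<std::uint8_t, page_size>;

// Toy stand-in for decryption (XOR with a fixed key).
inline Page decrypt(const Page& encrypted, std::uint8_t key)
{
    Page plain{};
    for (std::size_t i = 0; i < page_size; ++i)
        plain[i] = static_cast<std::uint8_t>(encrypted[i] ^ key);
    return plain;
}

// Unconditional refresh: the plaintext buffer is overwritten even when the
// encrypted source has not changed, so each byte is rewritten with the value
// it already holds. A thread reading `plain` at the same time would observe
// no change, but TSAN would still flag the store/load pair.
inline void refresh_page(const Page& encrypted, std::uint8_t key, Page& plain)
{
    plain = decrypt(encrypted, key);
}
```

Strictly speaking, the C++ memory model also treats such an unsynchronized write and read as a data race regardless of the values involved, which is why the tool cannot tell this case apart from a genuine bug.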
@kiburtse yes, at this point, I do think that every race that intersects with
But the second stack, the one that couldn't be restored, is very likely |
Some core-tests are failing due to a data race in the LangBindHelper_HandoverBetweenThreads in test_lang_bind_helper.cpp. The tests themselves are passing, but the TSAN warnings are causing the task to fail. I have seen this failure twice within the past day or two.
Ubuntu 20.04 x86_64 (Clang 11 Encryption Enabled w/TSAN) Task Page
Data Race in Parsley