Fix a race condition in subtable accessor deletion #3460

tgoyne · 2019-11-06T23:13:01Z

The following sequence of events would double-delete the Table:

1. Thread A enters SubtableColumnBase::SubtableMap::refresh_accessor_tree(), holding m_subtable_map_lock.
2. Thread B releases the last reference to a Table in that SubtableMap.
3. Thread B blocks on acquiring m_subtable_map_lock in unbind_ref().
4. Thread A reaches that Table in the iteration and acquires a strong reference (0 -> 1).
5. Thread A finishes with that Table and releases the reference (1 -> 0).
6. Thread A deletes the Table due to the refcount going to 0.
7. Thread A releases m_subtable_map_lock.
8. Thread B acquires m_subtable_map_lock.
9. Thread B rechecks the (now deleted) Table's refcount and sees 0.
10. Thread B tries to delete the already deleted Table.

Fix this by not acquiring and releasing a strong reference to the Table if the refcount is already 0.
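
For readers who want to trace the interleaving, here is a minimal single-threaded sketch that replays the steps in order and counts how many times a delete would be attempted. It is not the realm-core code: `FakeTable`, `bind_ref`, `unbind_ref_locked`, and `replay` are hypothetical names invented for this illustration, and the lock is only implied by the order of the statements.

```cpp
#include <cassert>

// Hypothetical stand-in for a Table accessor: a refcount plus a counter of
// how many times a delete was attempted on it (instead of really deleting).
struct FakeTable {
    int ref_count = 0;
    int delete_attempts = 0;
};

// Models Thread A taking a strong reference while it holds the map lock.
void bind_ref(FakeTable& t)
{
    ++t.ref_count;
}

// Models Thread A releasing that reference while it still holds the lock:
// if the count reaches 0, the table is deleted on the spot.
void unbind_ref_locked(FakeTable& t)
{
    if (--t.ref_count == 0)
        ++t.delete_attempts; // stands in for `delete table`
}

// Replays the interleaving described above in a fixed order.
// skip_dead_tables == false models the old behaviour,
// skip_dead_tables == true models the fix (assumed shape, not the real patch).
int replay(bool skip_dead_tables)
{
    FakeTable table;
    table.ref_count = 1; // Thread B owns the last reference

    // Steps 1-3: Thread A already holds the lock. Thread B drops the last
    // reference (1 -> 0) and then blocks before it can delete the table.
    --table.ref_count;
    assert(table.ref_count == 0);

    // Steps 4-6: Thread A reaches the table while iterating.
    bool already_dead = (table.ref_count == 0);
    if (!(skip_dead_tables && already_dead)) {
        bind_ref(table);          // 0 -> 1
        unbind_ref_locked(table); // 1 -> 0, Thread A deletes the table
    }

    // Steps 7-10: Thread B finally gets the lock, rechecks the refcount,
    // sees 0 and deletes the table.
    if (table.ref_count == 0)
        ++table.delete_attempts;

    return table.delete_attempts;
}
```

In this model `replay(false)` reports two delete attempts (the double delete), while `replay(true)` reports exactly one, performed by Thread B once it owns the lock.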

I couldn't figure out a way to reasonably test this. The repro case supplied by the user just does a bunch of things on a bunch of threads and crashes eventually.

finnschiermer · 2019-11-10T16:08:16Z

@tgoyne This is an awesome finding! We've been unable to nail this bug for a long, long time.

bmunkholm · 2019-11-10T20:03:44Z

This is likely hard to add a unit test for. But can we add a "random" stress test that eventually provokes this? Can that user's test be reduced to do something just in core?

tgoyne added the T-Bug-Crash label Nov 6, 2019

tgoyne requested a review from finnschiermer November 6, 2019 23:13

tgoyne self-assigned this Nov 6, 2019

tgoyne mentioned this pull request Nov 6, 2019

Crash with thread.cpp:186: [realm-core-5.23.5] pthread_mutex_destroy() failed realm/realm-swift#6333

Closed

realm-ci added the Thinking-Robot label Nov 7, 2019

realm-probot bot removed the Thinking-Robot label Nov 7, 2019

bmunkholm requested a review from jedelbo November 7, 2019 10:46

finnschiermer approved these changes Nov 11, 2019

cmelchior mentioned this pull request Nov 11, 2019

Help: Destruction of mutex in use realm/realm-java#5578

Closed

bmunkholm added the Pipeline-Review label Nov 11, 2019

tgoyne merged commit 260202e into master Nov 13, 2019

tgoyne deleted the tg/subtable-destruction-race branch November 13, 2019 03:02

finnschiermer mentioned this pull request Jan 2, 2020

Race in destruction of Table #3442

Closed

bmarty mentioned this pull request Jan 13, 2020

Destroyed mutex: Realm element-hq/element-android#544

Closed

github-actions bot locked as resolved and limited conversation to collaborators Mar 23, 2024
