Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix a race condition in subtable accessor deletion #3460

Merged
merged 1 commit into from
Nov 13, 2019

Conversation

tgoyne
Copy link
Member

@tgoyne tgoyne commented Nov 6, 2019

The following sequence of events would double-delete the Table:

  1. Thread A enters SubtableColumnBase::SubtableMap::refresh_accessor_tree(), holding m_subtable_map_lock.
  2. Thread B releases the last reference to a Table in that SubtableMap.
  3. Thread B blocks on acquiring m_subtable_map_lock in unbind_ref().
  4. Thread A reaches that Table in the iteration and acquires a strong reference (0 -> 1).
  5. Thread A finishes with that Table and releases the reference (1 -> 0).
  6. Thread A deletes the Table due to the refcount going to 0.
  7. Thread A releases m_subtable_map_lock.
  8. Thread B acquires m_subtable_map_lock.
  9. Thread B rechecks the (now deleted) Table's refcount and sees 0.
  10. Thread B tries to delete the already deleted Table.

Fix this by not acquiring and releasing a strong reference to the Table if the refcount is already 0.

I couldn't figure out a way to reasonable test this. The repro case supplied by the user just does a bunch of things on a bunch of threads and crashes eventually.

The following sequence of events would double-delete the Table:

1. Thread A enters SubtableColumnBase::SubtableMap::refresh_accessor_tree(), holding m_subtable_map_lock.
2. Thread B releases the last reference to a Table in that SubtableMap.
3. Thread B blocks on acquiring m_subtable_map_lock in unbind_ref().
4. Thread A reaches that Table in the iteration and acquires a strong reference (0 -> 1).
5. Thread A finishes with that Table and releases the reference (1 -> 0).
6. Thread A deletes the Table due to the refcount going to 0.
7. Thread A releases m_subtable_map_lock.
8. Thread B acquires m_subtable_map_lock.
9. Thread B rechecks the (now deleted) Table's refcount and sees 0.
10. Thread B tries to delete the already deleted Table.

Fix this by not acquiring and releasing a strong reference to the Table if the refcount is already 0.
@finnschiermer
Copy link
Contributor

@tgoyne This is an awesome finding! We've been unable to nail this bug for a long, long time.

@bmunkholm
Copy link
Contributor

This is likely hard to add a unit test for. But can we add a "random" stress test that eventually provokes this? Can that user's test be reduced to do something just in core?

@tgoyne tgoyne merged commit 260202e into master Nov 13, 2019
@tgoyne tgoyne deleted the tg/subtable-destruction-race branch November 13, 2019 03:02
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 23, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants