tsm: cache: improve thread safety of Cache.Deduplicate (see #5699) #5701
Conversation
I left a comment about this in #5699

/cc @jwilder I get the possible need for a lock on the
Jason's comments in the commit message of 5bee888 and the comment https://github.com/influxdata/influxdb/blob/master/tsdb/engine/tsm1/engine.go#L435-L437 seem reasonable to me. Typically 25MB of data may need to be sorted. Comparatively speaking, a shallow clone of parts of the store map is a relatively cheap operation - certainly much cheaper than sorting 25MB of data.
(force-pushed 9a0b270 to 936a7a7)
@jonseymour Can you squash these commits? Also, can you update the changelog with a link to #5699? 👍 after that.
(force-pushed b939c75 to 5166f99)
@jwilder there you go. I've left the test as a separate commit just so that anyone who is curious can check out that commit and verify that it detects a problem. The RHS branch of the merge also merges cleanly with 0.10.0. Not suggesting you do this, only that it is possible.

Will fix when I have a PC keyboard with working QWERTYUIO keys!
(force-pushed 5166f99 to 98fe240)
@jwilder - better hold off on merging this for the moment. The race test is actually detecting another race condition which I hadn't spotted yet - which, when you think about it, is rather a good thing :-) I need to fix this before this change gets pushed to master, or else everyone else's PRs will fail because of this test.
@jwilder c012e21 fixes the second race condition. I am going to run with this fix for a day to help verify that it doesn't cause a deadlock. (now squashed into d514e5f)
(force-pushed 1251896 to 4f534d3)

(force-pushed 4f534d3 to 663a674)
@@ -247,7 +261,11 @@ func (c *Cache) merged(key string) Values {
	var entries []*entry
	sz := 0
	for _, s := range c.snapshots {
		e := s.store[key]
This is the second race condition that the test case detects.
@jwilder I've been running this code on our production server for 6 hours and there has been no evidence of a deadlock. I wasn't really expecting there to be an issue, but figured it was worth testing, lest the unexpected occur. Let me know if you have any concerns, otherwise I'll leave it in your hands.
Actually this locking approach is wrong. What we should do is clone the snapshot, dedupe the clone, then replace the snapshot. That way we don't need fine-grained locks.

Yeah, a very much better approach is to use immutable entries and clone them on deduplication. This should reduce a lot of the existing need for locking. I'll re-roll this series accordingly.

Ok, immutable entries aren't going to work for performance and memory reasons. Still, the cloning approach itself is sound.
(force-pushed 663a674 to 01a6fae)
Ok, @jwilder here is the final fix. Much simpler now.
(force-pushed 4a4867e to 3f29631)
Tweaked the commit message to remove references to the old multi-snapshot design.
@@ -81,6 +81,7 @@ const (
type Cache struct {
	mu    sync.RWMutex
	store map[string]*entry
	dirty map[string]*entry
This is not needed. The snapshot Cache uses the store of the hot cache, and the hot cache gets a new map to write into.
This analysis is not correct and is not consistent with the Go memory model.

Cache.Deduplicate() executes without taking any kind of lock. To do what it needs to do, it sometimes has to update the e.values reference of some entries. This is done here:

While it is doing this, code that is running with Cache.merged() under either a read or a write lock is accessing the entries at line 302 of cache.go:

So, in one instance we have a goroutine (c.merged) which is accessing c.snapshot.store[k].value while properly protected by a lock, and another goroutine (Cache.Deduplicate) which is updating the same entry, c.snapshot.store[k], without any lock.

This is exactly the condition for which the Go memory model requires the use of synchronization:

When multiple goroutines access a shared variable v, they must use synchronization events to establish happens-before conditions that ensure reads observe the desired writes.

This is also why the data race detector is reporting the existence of a data race.

Could I ask you to read the Go memory model again, ask a colleague to read the same text and this PR, and, if you are still not convinced, ask the gophers in the #general channel on gophers.slack.com for a second opinion? If I am wrong in my analysis then I am quite sure that everyone on that channel will be able to explain exactly why I am wrong.
It was my understanding, for example, that @e-dard did accept the reality of the race condition which is demonstrated by the test in the previous commit.
@jonseymour My point is that dirty is serving exactly the same purpose that the store of the snapshot cache was intended to serve. Adding another map makes the cache more complicated (3 maps instead of 2), which may solve the race, but it is adding another layer onto something that is still broken IMO.

The problem I see is that the encapsulation of entry, snapshot, etc. is violated without adequate protection. For example, entry's private state is accessed concurrently, but it doesn't have a mutex protecting it when it is accessed concurrently. The snapshot's store was supposed to be a safe copy of the cache to eliminate any chance of concurrent access, which is what I believe dirty is also intended to do. It seems like the snapshot's store should be made a full, safe copy, or the unsafe accesses in entry, etc. should be protected.
My fix does address the thread safety issue, let me explain how.
- merged() never touches the dirty map
- Deduplicate() never touches the store map
There is a subset of entries in both maps that both methods touch. They are precisely those for which entry.needSort is false. The .values field of these entries is never updated by code executing in merged() or code executing in Deduplicate().
There are entries in dirty that Deduplicate() updates - those for which needSort is true. Observe that those entries are created by cloning entries in .store while holding the cache's write lock.

So the entries in .dirty that Deduplicate() updates are not visible to merged() - they only exist in dirty, and the only thing that accesses dirty outside a write lock is Deduplicate().

This is why my solution is thread-safe with respect to Deduplicate() and merged().
It is also why my solution passes the data race test, but the previous commit does not.
If you believe it is not a solution, can you explain why the test behaves differently in both cases?
This PR needs to be rebased, but I will not do this until @jwilder has had a chance to digest and consider my line comment above.
(force-pushed b810c72 to ee83509)
@jwilder I have revised #5701 to use your suggestion to use entry locks in preference to the dirty map. I have also identified an opportunity to remove escalation of the cache read lock to a cache write lock in the implementation of merged(). I haven't incorporated your suggestion of locking with the snapshot lock during Cache.Deduplicate() for the reasons explained in my e-mail. I can re-articulate them here if necessary.
Signed-off-by: Jon Seymour <jon@wildducktheories.com>
There are two tests that show two different vulnerabilities. One test shows that Cache.Deduplicate modifies entries in a snapshot's store without a lock while cache readers are deduplicating those same entries while correctly locked. A second test shows that two threads trying to execute the methods that Engine.WriteSnapshot calls will cause concurrent, unsynchronized mutating access to the snapshot's store and entries. The tests fail at this commit and are fixed by subsequent commits.

Signed-off-by: Jon Seymour <jon@wildducktheories.com>
Currently two compactors can execute Engine.WriteSnapshot at once. This isn't thread safe since both threads want to make modifications to Cache.snapshot at the same time. This commit introduces a lock which is acquired during Snapshot() and released during ClearSnapshot(), ensuring that at most one thread executes within Engine.WriteSnapshot() at once. To ensure that we always release this lock, but only release the snapshot resources on a successful commit, we modify ClearSnapshot() to accept a boolean which indicates whether the write was successful or not, and guarantee to call this function if Snapshot() has been called.

Signed-off-by: Jon Seymour <jon@wildducktheories.com>
(force-pushed ee83509 to 385221e)
To demonstrate that the write lock was offering protection during concurrent reads from the cache, #5825 cherry-picks 4d98a1c onto d7d81f7. The test fails at the resulting commit whereas here it passes. This demonstrates that with the changes in c8f96a3, merged() no longer needs to take the write lock to achieve thread safety during concurrent reads.
Previously, we needed a write lock on the cache because it was the only lock we had available to guard updates to entry.values and entry.needSort. However, now that we have an entry-scoped lock for this purpose, we don't need the cache write lock for it. Since merged() doesn't modify the .store or the c.snapshot.sort, there is no need for a write lock on the cache to protect the cache. So, we don't need to escalate here - we simply rely on the entry lock to protect the entries we are iterating over.

Signed-off-by: Jon Seymour <jon@wildducktheories.com>
I measured the performance benefits of removing the escalation to a write lock with a unit test that can be found in https://github.com/jonseymour/influxdb/commits/jss-speed-check For that test (which doesn't represent real load) the average Values() invocation dropped from 51ms to 29ms, and 95th percentile times were 139ms (with the write lock) vs. 58ms (without). With the write lock and without #5701:

Without the write lock (e.g. #5701)

Of course, this isn't real load, but it would be interesting to see if there is a measurable pre/post #5701 difference in system tests that check read performance.
Based on @jwilder's alternative to the 'dirty' map that featured in previous iterations of this fix.

Suggested-by: Jason Wilder <jason@influxdb.com>
Signed-off-by: Jon Seymour <jon@wildducktheories.com>
(force-pushed 6636d03 to 4809a8b)
(force-pushed 4809a8b to 4d98a1c)
👍
tsm: cache: improve thread safety of Cache.Deduplicate (see #5699)
Fixes #5699
Cache.Deduplicate is designed to be executed while the caller is not holding
any locks.
The entry.deduplicate() method contains two statements:
e.values = e.values.Deduplicate()
e.needSort = false
which need to be executed in that order (as observed by all threads). However,
without a synchronization construct they won't (necessarily) be observed in that order.
This fix makes a clone of each entry before performing the deduplication,
then, while briefly holding the lock, updates the store with the deduplicated clones.
Signed-off-by: Jon Seymour <jon@wildducktheories.com>