
tsm: cache: improve thread safety of Cache.Deduplicate (see #5699) #5701

Merged: 5 commits merged into influxdata:master from jonseymour:js-deduplicate-safety on Feb 25, 2016

Conversation

jonseymour
Contributor

Fixes #5699

Cache.Deduplicate is designed to be executed while the caller is not holding
any locks.

The entry.deduplicate() method contains two statements:

e.values = e.values.Deduplicate()
e.needSort = false

which need to be executed in that order (as observed by all threads). However,
without a synchronization construct they won't necessarily be executed in that order.

This fix makes a clone of each entry before performing the deduplication,
then, while briefly holding the lock, updates the store with the deduplicated clones.

Signed-off-by: Jon Seymour <jon@wildducktheories.com>
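
For illustration, a minimal sketch of the clone-then-swap idea described above. The names (Cache, entry, Values, Deduplicate) echo the tsm1 cache, but the types are simplified stand-ins, not the actual InfluxDB implementation.

package cache

import (
    "sort"
    "sync"
)

// Values is a simplified stand-in for tsm1.Values: just timestamps.
type Values []int64

// Deduplicate returns a sorted copy of v with duplicate timestamps removed.
func (v Values) Deduplicate() Values {
    out := append(Values(nil), v...)
    sort.Slice(out, func(i, j int) bool { return out[i] < out[j] })
    n := 0
    for i, ts := range out {
        if i == 0 || ts != out[n-1] {
            out[n] = ts
            n++
        }
    }
    return out[:n]
}

type entry struct {
    values   Values
    needSort bool
}

type Cache struct {
    mu       sync.RWMutex
    snapshot map[string]*entry
}

// Deduplicate sorts entries on private clones without holding any lock,
// then briefly takes the write lock to swap the deduplicated clones into
// the snapshot, so concurrent readers never observe a half-updated entry.
func (c *Cache) Deduplicate() {
    // Clone the entries that need sorting while holding the read lock.
    c.mu.RLock()
    dirty := make(map[string]*entry, len(c.snapshot))
    for k, e := range c.snapshot {
        if e.needSort {
            dirty[k] = &entry{values: e.values, needSort: true}
        }
    }
    c.mu.RUnlock()

    // Do the expensive sorting work on the clones, with no lock held.
    for _, e := range dirty {
        e.values = e.values.Deduplicate()
        e.needSort = false
    }

    // Publish the results while briefly holding the write lock.
    c.mu.Lock()
    for k, e := range dirty {
        c.snapshot[k] = e
    }
    c.mu.Unlock()
}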

@e-dard
Contributor

e-dard commented Feb 16, 2016

I left a comment about this in #5699

@e-dard
Contributor

e-dard commented Feb 16, 2016

/cc @jwilder

I get the possible need for a lock on cache.Deduplicate, as there could be a race there, but I'm not sure why the extra cloning work is needed outside of that lock. I would need to see some profiling and/or benchmarks to know whether that's needed, IMO.

@jonseymour
Contributor Author

Jason's comments in the commit message of 5bee888 and the comment https://github.com/influxdata/influxdb/blob/master/tsdb/engine/tsm1/engine.go#L435-L437 seem reasonable to me. Typically 25MB of data may need to be sorted. Comparatively speaking, a shallow clone of parts of the store map is a relatively cheap operation, certainly much cheaper than sorting 25MB of data.

@jwilder
Contributor

jwilder commented Feb 18, 2016

@jonseymour Can you squash these commits? Also, can you update the changelog with a link to #5699?

👍 after that.

@jwilder added this to the 0.11.0 milestone Feb 18, 2016
@jonseymour
Contributor Author

@jwilder there you go. I've left the test as a separate commit just so that anyone who is curious can check out that commit and verify that it detects a problem.

The RHS branch of the merge also merges cleanly with 0.10.0. I'm not suggesting you do this, only noting that it is possible.

@jonseymour
Contributor Author

Will fix when I have PC keyboard with working QWERTYUIO keys!

@jonseymour
Contributor Author

@jwilder - better hold off on merging this for the moment. The race test is actually detecting another race condition which I hadn't yet spotted, which, when you think about it, is rather a good thing :-)

I need to fix this before this change gets pushed to master or else everyone else's PRs will fail because of this test.

==================
WARNING: DATA RACE
Write by goroutine 107:
  runtime.mapassign1()
      /usr/local/Cellar/go/1.5.3/libexec/src/runtime/hashmap.go:411 +0x0
  github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Cache).Deduplicate()
      /Users/jonseymour/ninja/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/cache.go:169 +0x3c5
  command-line-arguments_test.TestCacheRace.func2()
      /Users/jonseymour/ninja/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/cache_race_test.go:50 +0x8f

Previous read by goroutine 51:
  runtime.mapaccess1_faststr()
      /usr/local/Cellar/go/1.5.3/libexec/src/runtime/hashmap_fast.go:179 +0x0
  github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Cache).merged()
      /Users/jonseymour/ninja/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/cache.go:264 +0x25d
  github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Cache).Values.func1()
      /Users/jonseymour/ninja/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/cache.go:223 +0xa3
  github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Cache).Values()
      /Users/jonseymour/ninja/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/cache.go:224 +0x142
  command-line-arguments_test.TestCacheRace.func1()
      /Users/jonseymour/ninja/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/cache_race_test.go:42 +0x90

Goroutine 107 (running) created at:
  command-line-arguments_test.TestCacheRace()
      /Users/jonseymour/ninja/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/cache_race_test.go:52 +0xb7e
  testing.tRunner()
      /usr/local/Cellar/go/1.5.3/libexec/src/testing/testing.go:456 +0xdc

Goroutine 51 (finished) created at:
  command-line-arguments_test.TestCacheRace()
      /Users/jonseymour/ninja/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/cache_race_test.go:43 +0xaf8
  testing.tRunner()
      /usr/local/Cellar/go/1.5.3/libexec/src/testing/testing.go:456 +0xdc
==================
PASS
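
The general shape of such a race test, written here against the simplified Cache from the earlier sketch rather than the real cache_race_test.go, is roughly the following: one goroutine reads the snapshot under the read lock while another repeatedly deduplicates, and `go test -race` flags any unsynchronized access if the implementation is unsafe.

package cache

import (
    "sync"
    "testing"
)

// TestCacheRaceSketch is a hypothetical stand-in for cache_race_test.go:
// a reader and a "compactor" hammer the same cache concurrently so that
// the race detector can observe any unsynchronized access.
func TestCacheRaceSketch(t *testing.T) {
    c := &Cache{snapshot: map[string]*entry{
        "cpu": {values: Values{3, 1, 2, 2}, needSort: true},
    }}

    var wg sync.WaitGroup
    wg.Add(2)

    go func() { // reader, correctly locked
        defer wg.Done()
        for i := 0; i < 1000; i++ {
            c.mu.RLock()
            _ = len(c.snapshot["cpu"].values)
            c.mu.RUnlock()
        }
    }()

    go func() { // deduplicator
        defer wg.Done()
        for i := 0; i < 1000; i++ {
            c.Deduplicate()
        }
    }()

    wg.Wait()
}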

@jonseymour
Contributor Author

@jwilder c012e21 fixes the second race condition. I am going to run with this fix for a day to help verify that it doesn't cause a deadlock. (now squashed into d514e5f)

@jonseymour force-pushed the js-deduplicate-safety branch 2 times, most recently from 1251896 to 4f534d3 on February 19, 2016 01:29
@@ -247,7 +261,11 @@ func (c *Cache) merged(key string) Values {
var entries []*entry
sz := 0
for _, s := range c.snapshots {
e := s.store[key]
Contributor Author

This is the second race condition that the test case detects.

@jonseymour
Contributor Author

@jwilder I've been running this code on our production server for 6 hours and there has been no evidence of a deadlock. I wasn't really expecting there to be an issue, but figured it was worth testing, lest the unexpected occur.

Let me know if you have any concerns, otherwise I'll leave it in your hands.

@jonseymour
Contributor Author

Actually this locking approach is wrong. What we should do is clone the snapshot, dedupe the clone, then replace the snapshot. That way we don't need fine-grained locks.

@jonseymour
Contributor Author

Yeah, a much better approach is to use immutable entries and clone them on deduplication. This should reduce a lot of the existing need for locking. I'll re-roll this series accordingly.

@jonseymour
Contributor Author

Ok, immutable entries aren't going to work for performance and memory reasons. Still, the cloning approach itself is sound.

@jonseymour
Contributor Author

Ok, @jwilder here is the final fix. Much simpler now.

@jonseymour
Contributor Author

Tweaked the commit message to remove references to the old multi-snapshot design.

@@ -81,6 +81,7 @@ const (
type Cache struct {
mu sync.RWMutex
store map[string]*entry
dirty map[string]*entry
Contributor

This is not needed. The snapshot Cache uses the store of the hot cache and the hot cache gets a new map to write into.

Contributor Author

This analysis is not correct and is not consistent with the Go memory model.

Cache.Deduplicate() executes without taking any kind of lock. To do what it needs to do, it sometimes has to update the e.values reference of some entries. This is done here:

https://github.com/jonseymour/influxdb/blob/f44db5df48b7eaa00ccbbc9b76406b29cf5a2bb3/tsdb/engine/tsm1/cache.go#L61-L62

While it is doing this, code running in Cache.merged() under either a read or a write lock is accessing the entries at line 302 of cache.go:

https://github.com/jonseymour/influxdb/blob/f44db5df48b7eaa00ccbbc9b76406b29cf5a2bb3/tsdb/engine/tsm1/cache.go#L302

So, in one instance we have a goroutine (c.merged) which is accessing c.snapshot.store[k].value while properly protected by a lock, and another goroutine (Cache.Deduplicate) which is updating the same entry c.snapshot.store[k] without any lock.

This is exactly the condition in which the Go memory model requires the use of synchronization.

When multiple goroutines access a shared variable v, they must use synchronization events to establish happens-before conditions that ensure reads observe the desired writes.

(https://golang.org/ref/mem)

This is also why the data race detector is reporting the existence of a data race.

Could I ask you to read the Go memory model again, ask a colleague to read the same text and this PR, and, if you are still not convinced, ask the #general channel on gophers.slack.com for a second opinion? If I am wrong in my analysis then I am quite sure that everyone on that channel will be able to explain exactly why I am wrong.

It was my understanding, for example, that @e-dard did accept the reality of the race condition, which is demonstrated by the test in the previous commit.
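
To make the pattern being argued about concrete, here is a sketch of the unsynchronized access in question, reusing the simplified types from the earlier sketch; the helper names are hypothetical, not the real code. Goroutine A rewrites the entry with no lock held while goroutine B reads it under the cache's read lock, and nothing establishes a happens-before edge between the two.

// What the pre-fix Deduplicate effectively did: mutate a shared entry
// with no lock held (goroutine A).
func unsafeDedup(e *entry) {
    e.values = e.values.Deduplicate() // write 1
    e.needSort = false                // write 2; may be observed before write 1
}

// What a reader such as merged() does: access the same entry under the
// cache's read lock (goroutine B). The lock orders readers against writers
// that also use c.mu, but not against unsafeDedup above.
func readLen(c *Cache, key string) int {
    c.mu.RLock()
    defer c.mu.RUnlock()
    return len(c.snapshot[key].values)
}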

Contributor

@jonseymour My point is that dirty is serving exactly the same purpose that the store of the snapshot cache was intended to serve. Adding another map makes the cache more complicated (3 maps instead of 2); it may solve the race, but it is adding another layer onto something that is still broken, IMO.

The problem I see is that the encapsulation of entry, snapshot, etc. is violated without adequate protection. For example, entry's private state is accessed concurrently, but it doesn't have a mutex protecting it when it is accessed concurrently. The snapshot's store was supposed to be a safe copy of the cache, to eliminate any chance of concurrent access, which is what I believe dirty is also intended to do. It seems like the snapshot's store should be made a full, safe copy, or the unsafe accesses in entry, etc. should be protected.

Contributor Author

My fix does address the thread-safety issue; let me explain how.

  • merged() never touches the dirty map
  • Deduplicate() never touches the store map

There is a subset of entries in both maps that both methods touch. They are precisely those for which entry.needSort is false. The .values field of these entries is never updated by code executing in merged() or code executing in Deduplicate().

There are entries in dirty that Deduplicate() updates: those for which needSort is true. Observe that those entries are created by cloning entries in .store while holding the cache's write lock.

So the entries in .dirty that Deduplicate() updates are not visible to merged(): they only exist in dirty, and the only thing that accesses dirty outside a write lock is Deduplicate().

This is why my solution is thread-safe with respect to Deduplicate() and merged().

It is also why my solution passes the data race test, but the previous commit does not.

If you believe it is not a solution, can you explain why the test behaves differently in both cases?

@jonseymour
Contributor Author

This PR needs to be rebased but I will not do this until @jwilder has had a chance to digest and consider my line comment above.

@jonseymour force-pushed the js-deduplicate-safety branch 2 times, most recently from b810c72 to ee83509 on February 24, 2016 23:22
@jonseymour
Contributor Author

@jwilder I have revised #5701 to follow your suggestion of using entry locks in preference to the dirty map. I have also identified an opportunity to remove the escalation of the cache read lock to a cache write lock in the implementation of merged(). I haven't incorporated your suggestion of locking with the snapshot lock during Cache.Deduplicate(), for the reasons explained in my e-mail. I can re-articulate them here if necessary.

Signed-off-by: Jon Seymour <jon@wildducktheories.com>
There are two tests that show two different vulnerabilities.

One test shows that Cache.Deduplicate modifies entries in a snapshot's
store without a lock, while correctly locked cache readers are
deduplicating those same entries.

A second test shows that two threads trying to execute the methods
that Engine.WriteSnapshot calls will cause concurrent, unsynchronized
mutating access to the snapshot's store and entries.

The tests fail at this commit and are fixed by subsequent commits.

Signed-off-by: Jon Seymour <jon@wildducktheories.com>
Currently two compactors can execute Engine.WriteSnapshot at once.

This isn't thread safe since both threads want to make modifications to
Cache.snapshot at the same time.

This commit introduces a lock which is acquired during Snapshot() and
released during ClearSnapshot(), ensuring that at most one thread
executes within Engine.WriteSnapshot() at once.

To ensure that we always release this lock, but only release the
snapshot resources on a successful commit, we modify ClearSnapshot() to
accept a boolean indicating whether the write was successful, and we
guarantee to call this function whenever Snapshot() has been called.

Signed-off-by: Jon Seymour <jon@wildducktheories.com>
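
A rough sketch of the Snapshot()/ClearSnapshot(success) pairing this commit message describes, built on the simplified Cache from the earlier sketch; the snapshotCache wrapper, its fields, and what a real snapshot hands to the compactor are hypothetical simplifications.

// snapshotCache is the sketch Cache plus the snapshotting mutex this
// commit message introduces; it is illustrative only.
type snapshotCache struct {
    Cache
    snapshotting sync.Mutex
}

// Snapshot acquires the snapshotting lock, which is held until
// ClearSnapshot is called, so at most one snapshot/write cycle
// (e.g. Engine.WriteSnapshot) runs at a time.
func (c *snapshotCache) Snapshot() map[string]*entry {
    c.snapshotting.Lock()

    c.mu.Lock()
    defer c.mu.Unlock()
    return c.snapshot // hand the current entries to the compactor
}

// ClearSnapshot always releases the snapshotting lock, but only drops the
// snapshot's resources if the write actually succeeded.
func (c *snapshotCache) ClearSnapshot(success bool) {
    defer c.snapshotting.Unlock()

    if !success {
        return // keep the snapshot so the write can be retried
    }
    c.mu.Lock()
    c.snapshot = make(map[string]*entry)
    c.mu.Unlock()
}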
@jonseymour
Contributor Author

To demonstrate that the write lock was offering protection during concurrent reads from the cache, #5825 cherry-picks 4d98a1c onto d7d81f7. The test fails at the resulting commit, whereas here it passes. This demonstrates that with the changes in c8f96a3, merged() no longer needs to take the write lock to achieve thread safety during concurrent reads.

jonseymour referenced this pull request in jonseymour/influxdb Feb 25, 2016
Previously, we needed a write lock on the cache because it was the
only lock we had available to guard updates to entry.values and
entry.needSort.

However, now that we have an entry-scoped lock for this purpose, we no
longer need the cache write lock. Since merged() doesn't modify the
.store or the c.snapshot.sort, there is no need for a write lock on
the cache to protect the cache.

So, we don't need to escalate here; we simply rely on the entry lock
to protect the entries we are iterating over.

Signed-off-by: Jon Seymour <jon@wildducktheories.com>
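
A sketch of the entry-scoped locking this commit message describes, again using the simplified sketch types; lockedEntry and its methods are illustrative names, not the tsm1 code.

// lockedEntry carries its own lock, so sorting/dedup can be guarded per
// entry instead of with the cache-wide write lock.
type lockedEntry struct {
    mu       sync.RWMutex
    values   Values
    needSort bool
}

// deduplicate sorts and dedupes under the entry's own write lock.
func (e *lockedEntry) deduplicate() {
    e.mu.Lock()
    defer e.mu.Unlock()
    if !e.needSort {
        return
    }
    e.values = e.values.Deduplicate()
    e.needSort = false
}

// readValues is what a reader such as merged() would use: it needs only
// the entry's read lock (plus the cache's read lock to reach the entry),
// so the cache read lock never has to be escalated to a write lock.
func (e *lockedEntry) readValues() Values {
    e.mu.RLock()
    defer e.mu.RUnlock()
    return e.values
}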
@jonseymour
Contributor Author

I measured the performance benefits of removing the escalation to a write lock with a unit test that can be found in https://github.com/jonseymour/influxdb/commits/jss-speed-check

For that test (which doesn't represent real load) the average Values() invocation dropped from 51ms to 29ms.

With the write lock and without #5701:

greywedge2:tsm1 jonseymour (1)$ git checkout jss-speed-check
greywedge2:tsm1 jonseymour (1)$ GORACE=history=10000 go test -v -race cache_race_test.go
=== RUN   TestCheckConcurrentReadsAreSafe
average duration - 51698560 (ns)
--- PASS: TestCheckConcurrentReadsAreSafe (0.34s)
PASS
ok      command-line-arguments  1.359s

Without the write lock (i.e. with #5701):

greywedge2:tsm1 jonseymour (1)$ git checkout js-deduplicate-safety
Switched to branch 'js-deduplicate-safety'
greywedge2:tsm1 jonseymour (1)$ git merge jss-speed-check 
Auto-merging tsdb/engine/tsm1/cache_race_test.go
Merge made by the 'recursive' strategy.
 tsdb/engine/tsm1/cache_race_test.go | 37 ++++++++++++++++++++++++-------------
 1 file changed, 24 insertions(+), 13 deletions(-)
greywedge2:tsm1 jonseymour (1)$ GORACE=history=10000 go test -v -race cache_race_test.go
=== RUN   TestCheckConcurrentReadsAreSafe
average duration - 28554186 (ns)
--- PASS: TestCheckConcurrentReadsAreSafe (0.29s)
=== RUN   TestCacheRace
--- PASS: TestCacheRace (0.20s)
=== RUN   TestCacheRace2Compacters
--- PASS: TestCacheRace2Compacters (0.23s)
PASS
ok      command-line-arguments  1.752s

Of course, this isn't real load, but it would be interesting to see if there is a measurable pre/post #5701 difference in system tests that check read performance.

95th percentile times were 139ms (with the write lock) vs. 58ms (without).
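
The jss-speed-check test itself isn't reproduced here, but a harness of this general shape (written against the simplified sketch types; the test name and constants are made up) measures what is being compared: average latency of many concurrent read-style accesses, which can be run against a with-write-lock and a without-write-lock variant of the cache.

package cache

import (
    "sync"
    "sync/atomic"
    "testing"
    "time"
)

// TestConcurrentReadLatencySketch times many concurrent reads and reports
// the average duration, so two locking strategies can be compared.
func TestConcurrentReadLatencySketch(t *testing.T) {
    c := &Cache{snapshot: map[string]*entry{
        "cpu": {values: Values{3, 1, 2, 2}, needSort: true},
    }}

    const readers, iters = 8, 200
    var totalNS int64
    var wg sync.WaitGroup
    wg.Add(readers)

    for r := 0; r < readers; r++ {
        go func() {
            defer wg.Done()
            for i := 0; i < iters; i++ {
                start := time.Now()
                c.mu.RLock()
                _ = len(c.snapshot["cpu"].values)
                c.mu.RUnlock()
                atomic.AddInt64(&totalNS, time.Since(start).Nanoseconds())
            }
        }()
    }
    wg.Wait()

    t.Logf("average duration - %d (ns)", totalNS/int64(readers*iters))
}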

Based on @jwilder's alternative to the 'dirty' map that featured
in previous iterations of this fix.

Suggested-by: Jason Wilder <jason@influxdb.com>
Signed-off-by: Jon Seymour <jon@wildducktheories.com>
@jonseymour force-pushed the js-deduplicate-safety branch 4 times, most recently from 6636d03 to 4809a8b on February 25, 2016 13:53
@jwilder
Contributor

jwilder commented Feb 25, 2016

👍

jwilder added a commit that referenced this pull request Feb 25, 2016
tsm: cache: improve thread safety of Cache.Deduplicate (see #5699)
@jwilder merged commit daf68db into influxdata:master Feb 25, 2016
Successfully merging this pull request may close: code review: potential thread safety issue in cache.go (#5699).