
tsm: cache: improve thread safety of Cache.Deduplicate (see #5699) #5701

Merged: 5 commits merged into influxdata:master from jonseymour:js-deduplicate-safety on Feb 25, 2016

Conversation

jonseymour
Contributor

Fixes #5699

Cache.Deduplicate is designed to be executed while the caller is not holding
any locks.

The entry.deduplicate() method contains two statements:

e.values = e.values.Deduplicate()
e.needSort = false

which need to be executed in that order (as observed by all threads). However,
without a synchronization construct they won't necessarily be executed in that order.

This fix makes a clone of each entry before performing the deduplication,
then, while briefly holding the lock, updates the store with the deduplicated clones.

Signed-off-by: Jon Seymour <jon@wildducktheories.com>
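
For illustration, a minimal sketch of the clone-then-swap idea described above. The names (Cache, entry, Values, Deduplicate) echo the tsm1 cache, but the types are simplified stand-ins, not the actual InfluxDB implementation.

package cache

import (
    "sort"
    "sync"
)

// Values is a simplified stand-in for tsm1.Values: just timestamps.
type Values []int64

// Deduplicate returns a sorted copy of v with duplicate timestamps removed.
func (v Values) Deduplicate() Values {
    out := append(Values(nil), v...)
    sort.Slice(out, func(i, j int) bool { return out[i] < out[j] })
    n := 0
    for i, ts := range out {
        if i == 0 || ts != out[n-1] {
            out[n] = ts
            n++
        }
    }
    return out[:n]
}

type entry struct {
    values   Values
    needSort bool
}

type Cache struct {
    mu       sync.RWMutex
    snapshot map[string]*entry
}

// Deduplicate sorts entries on private clones without holding any lock,
// then briefly takes the write lock to swap the deduplicated clones into
// the snapshot, so concurrent readers never observe a half-updated entry.
func (c *Cache) Deduplicate() {
    // Clone the entries that need sorting while holding the read lock.
    c.mu.RLock()
    dirty := make(map[string]*entry, len(c.snapshot))
    for k, e := range c.snapshot {
        if e.needSort {
            dirty[k] = &entry{values: e.values, needSort: true}
        }
    }
    c.mu.RUnlock()

    // Do the expensive sorting work on the clones, with no lock held.
    for _, e := range dirty {
        e.values = e.values.Deduplicate()
        e.needSort = false
    }

    // Publish the results while briefly holding the write lock.
    c.mu.Lock()
    for k, e := range dirty {
        c.snapshot[k] = e
    }
    c.mu.Unlock()
}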

@e-dard
Contributor

e-dard commented Feb 16, 2016

I left a comment about this in #5699

@e-dard
Contributor

e-dard commented Feb 16, 2016

/cc @jwilder

I get the possible need for a lock on cache.Deduplicate, as there could be a race there, but I'm not sure why the extra cloning work is needed outside of that lock. I would need to see some profiling and/or benchmarks to know whether that's needed, IMO.

@jonseymour
Contributor Author

Jason's comments in the commit message of 5bee888 and the comment https://github.com/influxdata/influxdb/blob/master/tsdb/engine/tsm1/engine.go#L435-L437 seem reasonable to me. Typically 25MB of data may need to be sorted. Comparatively speaking, a shallow clone of parts of the store map is a relatively cheap operation, certainly much cheaper than sorting 25MB of data.

@jwilder
Contributor

jwilder commented Feb 18, 2016

@jonseymour Can you squash these commits? Also, can you update the changelog with a link to #5699?

👍 after that.

@jwilder added this to the 0.11.0 milestone Feb 18, 2016
@jonseymour
Contributor Author

@jwilder there you go. I've left the test as a separate commit just so that anyone who is curious can check out that commit and verify that it detects a problem.

The RHS branch of the merge also merges cleanly with 0.10.0. I'm not suggesting you do this, only noting that it is possible.

@jonseymour
Contributor Author

Will fix when I have PC keyboard with working QWERTYUIO keys!

@jonseymour
Contributor Author

@jwilder - better hold off on merging this for the moment. The race test is actually detecting another race condition which I hadn't yet spotted, which, when you think about it, is rather a good thing :-)

I need to fix this before this change gets pushed to master or else everyone else's PRs will fail because of this test.

==================
WARNING: DATA RACE
Write by goroutine 107:
  runtime.mapassign1()
      /usr/local/Cellar/go/1.5.3/libexec/src/runtime/hashmap.go:411 +0x0
  github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Cache).Deduplicate()
      /Users/jonseymour/ninja/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/cache.go:169 +0x3c5
  command-line-arguments_test.TestCacheRace.func2()
      /Users/jonseymour/ninja/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/cache_race_test.go:50 +0x8f

Previous read by goroutine 51:
  runtime.mapaccess1_faststr()
      /usr/local/Cellar/go/1.5.3/libexec/src/runtime/hashmap_fast.go:179 +0x0
  github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Cache).merged()
      /Users/jonseymour/ninja/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/cache.go:264 +0x25d
  github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Cache).Values.func1()
      /Users/jonseymour/ninja/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/cache.go:223 +0xa3
  github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Cache).Values()
      /Users/jonseymour/ninja/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/cache.go:224 +0x142
  command-line-arguments_test.TestCacheRace.func1()
      /Users/jonseymour/ninja/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/cache_race_test.go:42 +0x90

Goroutine 107 (running) created at:
  command-line-arguments_test.TestCacheRace()
      /Users/jonseymour/ninja/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/cache_race_test.go:52 +0xb7e
  testing.tRunner()
      /usr/local/Cellar/go/1.5.3/libexec/src/testing/testing.go:456 +0xdc

Goroutine 51 (finished) created at:
  command-line-arguments_test.TestCacheRace()
      /Users/jonseymour/ninja/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/cache_race_test.go:43 +0xaf8
  testing.tRunner()
      /usr/local/Cellar/go/1.5.3/libexec/src/testing/testing.go:456 +0xdc
==================
PASS
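
The general shape of such a race test, written here against the simplified Cache from the earlier sketch rather than the real cache_race_test.go, is roughly the following: one goroutine reads the snapshot under the read lock while another repeatedly deduplicates, and `go test -race` flags any unsynchronized access if the implementation is unsafe.

package cache

import (
    "sync"
    "testing"
)

// TestCacheRaceSketch is a hypothetical stand-in for cache_race_test.go:
// a reader and a "compactor" hammer the same cache concurrently so that
// the race detector can observe any unsynchronized access.
func TestCacheRaceSketch(t *testing.T) {
    c := &Cache{snapshot: map[string]*entry{
        "cpu": {values: Values{3, 1, 2, 2}, needSort: true},
    }}

    var wg sync.WaitGroup
    wg.Add(2)

    go func() { // reader, correctly locked
        defer wg.Done()
        for i := 0; i < 1000; i++ {
            c.mu.RLock()
            _ = len(c.snapshot["cpu"].values)
            c.mu.RUnlock()
        }
    }()

    go func() { // deduplicator
        defer wg.Done()
        for i := 0; i < 1000; i++ {
            c.Deduplicate()
        }
    }()

    wg.Wait()
}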

@jonseymour
Contributor Author

@jwilder c012e21 fixes the second race condition. I am going to run with this fix for a day to help verify that it doesn't cause a deadlock. (now squashed into d514e5f)

@jonseymour force-pushed the js-deduplicate-safety branch 2 times, most recently from 1251896 to 4f534d3 on February 19, 2016 01:29
@@ -247,7 +261,11 @@ func (c *Cache) merged(key string) Values {
var entries []*entry
sz := 0
for _, s := range c.snapshots {
e := s.store[key]
Contributor Author

This is the second race condition that the test case detects.

@jonseymour
Contributor Author

@jwilder I've been running this code on our production server for 6 hours and there has been no evidence of a deadlock. I wasn't really expecting there to be an issue, but figured it was worth testing, lest the unexpected occur.

Let me know if you have any concerns, otherwise I'll leave it in your hands.

@jonseymour
Contributor Author

Actually this locking approach is wrong. What we should do is clone the snapshot, dedupe the clone, then replace the snapshot. That way we don't need fine-grained locks.

@jonseymour
Contributor Author

Yeah, a much better approach is to use immutable entries and clone them on deduplication. This should reduce a lot of the existing need for locking. I'll re-roll this series accordingly.

@jonseymour
Contributor Author

Ok, immutable entries aren't going to work for performance and memory reasons. Still, the cloning approach itself is sound.

@jonseymour
Contributor Author

Ok, @jwilder here is the final fix. Much simpler now.

@jonseymour
Contributor Author

Tweaked the commit message to remove references to the old multi-snapshot design.

@@ -81,6 +81,7 @@ const (
type Cache struct {
mu sync.RWMutex
store map[string]*entry
dirty map[string]*entry
Contributor

This is not needed. The snapshot Cache uses the store of the hot cache and the hot cache gets a new map to write into.

Contributor Author

This analysis is not correct and is not consistent with the Go memory model.

Cache.Deduplicate() executes without taking any kind of lock. To do what it needs to do, it sometimes has to update the e.values reference of some entries. This is done here:

https://github.com/jonseymour/influxdb/blob/f44db5df48b7eaa00ccbbc9b76406b29cf5a2bb3/tsdb/engine/tsm1/cache.go#L61-L62

While it is doing this, code running in Cache.merged() under either a read or a write lock is accessing the entries at line 302 of cache.go:

https://github.com/jonseymour/influxdb/blob/f44db5df48b7eaa00ccbbc9b76406b29cf5a2bb3/tsdb/engine/tsm1/cache.go#L302

So, in one instance we have a goroutine (c.merged) which is accessing c.snapshot.store[k].value while properly protected by a lock, and another goroutine (Cache.Deduplicate) which is updating the same entry c.snapshot.store[k] without any lock.

This is exactly the condition in which the Go memory model requires the use of synchronization.

When multiple goroutines access a shared variable v, they must use synchronization events to establish happens-before conditions that ensure reads observe the desired writes.

(https://golang.org/ref/mem)

This is also why the data race detector is reporting the existence of a data race.

Could I ask you to read the Go memory model again, ask a colleague to read the same text and this PR, and, if you are still not convinced, ask the #general channel on gophers.slack.com for a second opinion? If I am wrong in my analysis then I am quite sure that everyone on that channel will be able to explain exactly why I am wrong.

It was my understanding, for example, that @e-dard did accept the reality of the race condition, which is demonstrated by the test in the previous commit.
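
To make the pattern being argued about concrete, here is a sketch of the unsynchronized access in question, reusing the simplified types from the earlier sketch; the helper names are hypothetical, not the real code. Goroutine A rewrites the entry with no lock held while goroutine B reads it under the cache's read lock, and nothing establishes a happens-before edge between the two.

// What the pre-fix Deduplicate effectively did: mutate a shared entry
// with no lock held (goroutine A).
func unsafeDedup(e *entry) {
    e.values = e.values.Deduplicate() // write 1
    e.needSort = false                // write 2; may be observed before write 1
}

// What a reader such as merged() does: access the same entry under the
// cache's read lock (goroutine B). The lock orders readers against writers
// that also use c.mu, but not against unsafeDedup above.
func readLen(c *Cache, key string) int {
    c.mu.RLock()
    defer c.mu.RUnlock()
    return len(c.snapshot[key].values)
}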

Contributor

@jonseymour My point is that dirty is serving exactly the same purpose that the store of the snapshot cache was intended to serve. Adding another map makes the cache more complicated (3 maps instead of 2); it may solve the race, but it is adding another layer onto something that is still broken, IMO.

The problem I see is that the encapsulation of entry, snapshot, etc. is violated without adequate protection. For example, entry's private state is accessed concurrently, but it doesn't have a mutex protecting it when it is accessed concurrently. The snapshot's store was supposed to be a safe copy of the cache, to eliminate any chance of concurrent access, which is what I believe dirty is also intended to do. It seems like the snapshot's store should be made a full, safe copy, or the unsafe accesses in entry, etc. should be protected.

Contributor Author

My fix does address the thread-safety issue; let me explain how.

  • merged() never touches the dirty map
  • Deduplicate() never touches the store map

There is a subset of entries in both maps that both methods touch. They are precisely those for which entry.needSort is false. The .values field of these entries is never updated by code executing in merged() or code executing in Deduplicate().

There are entries in dirty that Deduplicate() updates: those for which needSort is true. Observe that those entries are created by cloning entries in .store while holding the cache's write lock.

So the entries in .dirty that Deduplicate() updates are not visible to merged(): they only exist in dirty, and the only thing that accesses dirty outside a write lock is Deduplicate().

This is why my solution is thread-safe with respect to Deduplicate() and merged().

It is also why my solution passes the data race test, but the previous commit does not.

If you believe it is not a solution, can you explain why the test behaves differently in both cases?

@jonseymour
Contributor Author

This PR needs to be rebased but I will not do this until @jwilder has had a chance to digest and consider my line comment above.

@jonseymour force-pushed the js-deduplicate-safety branch 2 times, most recently from b810c72 to ee83509 on February 24, 2016 23:22
@jonseymour
Contributor Author

@jwilder I have revised #5701 to follow your suggestion of using entry locks in preference to the dirty map. I have also identified an opportunity to remove the escalation of the cache read lock to a cache write lock in the implementation of merged(). I haven't incorporated your suggestion of locking with the snapshot lock during Cache.Deduplicate(), for the reasons explained in my e-mail. I can re-articulate them here if necessary.

Signed-off-by: Jon Seymour <jon@wildducktheories.com>
There are two tests that show two different vulnerabilities.

One test shows that Cache.Deduplicate modifies entries in a snapshot's
store without a lock, while correctly locked cache readers are
deduplicating those same entries.

A second test shows that two threads trying to execute the methods
that Engine.WriteSnapshot calls will cause concurrent, unsynchronized
mutating access to the snapshot's store and entries.

The tests fail at this commit and are fixed by subsequent commits.

Signed-off-by: Jon Seymour <jon@wildducktheories.com>
Currently two compactors can execute Engine.WriteSnapshot at once.

This isn't thread safe since both threads want to make modifications to
Cache.snapshot at the same time.

This commit introduces a lock which is acquired during Snapshot() and
released during ClearSnapshot(), ensuring that at most one thread
executes within Engine.WriteSnapshot() at once.

To ensure that we always release this lock, but only release the
snapshot resources on a successful commit, we modify ClearSnapshot() to
accept a boolean indicating whether the write was successful, and we
guarantee to call this function whenever Snapshot() has been called.

Signed-off-by: Jon Seymour <jon@wildducktheories.com>
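
A rough sketch of the Snapshot()/ClearSnapshot(success) pairing this commit message describes, built on the simplified Cache from the earlier sketch; the snapshotCache wrapper, its fields, and what a real snapshot hands to the compactor are hypothetical simplifications.

// snapshotCache is the sketch Cache plus the snapshotting mutex this
// commit message introduces; it is illustrative only.
type snapshotCache struct {
    Cache
    snapshotting sync.Mutex
}

// Snapshot acquires the snapshotting lock, which is held until
// ClearSnapshot is called, so at most one snapshot/write cycle
// (e.g. Engine.WriteSnapshot) runs at a time.
func (c *snapshotCache) Snapshot() map[string]*entry {
    c.snapshotting.Lock()

    c.mu.Lock()
    defer c.mu.Unlock()
    return c.snapshot // hand the current entries to the compactor
}

// ClearSnapshot always releases the snapshotting lock, but only drops the
// snapshot's resources if the write actually succeeded.
func (c *snapshotCache) ClearSnapshot(success bool) {
    defer c.snapshotting.Unlock()

    if !success {
        return // keep the snapshot so the write can be retried
    }
    c.mu.Lock()
    c.snapshot = make(map[string]*entry)
    c.mu.Unlock()
}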
@jonseymour
Contributor Author

To demonstrate that the write lock was offering protection during concurrent reads from the cache, #5825 cherry-picks 4d98a1c onto d7d81f7. The test fails at the resulting commit, whereas here it passes. This demonstrates that with the changes in c8f96a3, merged() no longer needs to take the write lock to achieve thread safety during concurrent reads.

jonseymour referenced this pull request in jonseymour/influxdb Feb 25, 2016
Previously, we needed a write lock on the cache because it was the
only lock we had available to guard updates to entry.values and
entry.needSort.

However, now that we have an entry-scoped lock for this purpose, we no
longer need the cache write lock. Since merged() doesn't modify the
.store or the c.snapshot.sort, there is no need for a write lock on
the cache to protect the cache.

So, we don't need to escalate here; we simply rely on the entry lock
to protect the entries we are iterating over.

Signed-off-by: Jon Seymour <jon@wildducktheories.com>
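
A sketch of the entry-scoped locking this commit message describes, again using the simplified sketch types; lockedEntry and its methods are illustrative names, not the tsm1 code.

// lockedEntry carries its own lock, so sorting/dedup can be guarded per
// entry instead of with the cache-wide write lock.
type lockedEntry struct {
    mu       sync.RWMutex
    values   Values
    needSort bool
}

// deduplicate sorts and dedupes under the entry's own write lock.
func (e *lockedEntry) deduplicate() {
    e.mu.Lock()
    defer e.mu.Unlock()
    if !e.needSort {
        return
    }
    e.values = e.values.Deduplicate()
    e.needSort = false
}

// readValues is what a reader such as merged() would use: it needs only
// the entry's read lock (plus the cache's read lock to reach the entry),
// so the cache read lock never has to be escalated to a write lock.
func (e *lockedEntry) readValues() Values {
    e.mu.RLock()
    defer e.mu.RUnlock()
    return e.values
}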
@jonseymour
Contributor Author

I measured the performance benefits of removing the escalation to a write lock with a unit test that can be found in https://github.com/jonseymour/influxdb/commits/jss-speed-check

For that test (which doesn't represent real load) the average Values() invocation dropped from 51ms to 29ms.

With the write lock and without #5701:

greywedge2:tsm1 jonseymour (1)$ git checkout jss-speed-check
greywedge2:tsm1 jonseymour (1)$ GORACE=history=10000 go test -v -race cache_race_test.go
=== RUN   TestCheckConcurrentReadsAreSafe
average duration - 51698560 (ns)
--- PASS: TestCheckConcurrentReadsAreSafe (0.34s)
PASS
ok      command-line-arguments  1.359s

Without the write lock (i.e. with #5701):

greywedge2:tsm1 jonseymour (1)$ git checkout js-deduplicate-safety
Switched to branch 'js-deduplicate-safety'
greywedge2:tsm1 jonseymour (1)$ git merge jss-speed-check 
Auto-merging tsdb/engine/tsm1/cache_race_test.go
Merge made by the 'recursive' strategy.
 tsdb/engine/tsm1/cache_race_test.go | 37 ++++++++++++++++++++++++-------------
 1 file changed, 24 insertions(+), 13 deletions(-)
greywedge2:tsm1 jonseymour (1)$ GORACE=history=10000 go test -v -race cache_race_test.go
=== RUN   TestCheckConcurrentReadsAreSafe
average duration - 28554186 (ns)
--- PASS: TestCheckConcurrentReadsAreSafe (0.29s)
=== RUN   TestCacheRace
--- PASS: TestCacheRace (0.20s)
=== RUN   TestCacheRace2Compacters
--- PASS: TestCacheRace2Compacters (0.23s)
PASS
ok      command-line-arguments  1.752s

Of course, this isn't real load, but it would be interesting to see if there is a measurable pre/post #5701 difference in system tests that check read performance.

95th percentile times were 139ms (with the write lock) vs. 58ms (without).
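
The jss-speed-check test itself isn't reproduced here, but a harness of this general shape (written against the simplified sketch types; the test name and constants are made up) measures what is being compared: average latency of many concurrent read-style accesses, which can be run against a with-write-lock and a without-write-lock variant of the cache.

package cache

import (
    "sync"
    "sync/atomic"
    "testing"
    "time"
)

// TestConcurrentReadLatencySketch times many concurrent reads and reports
// the average duration, so two locking strategies can be compared.
func TestConcurrentReadLatencySketch(t *testing.T) {
    c := &Cache{snapshot: map[string]*entry{
        "cpu": {values: Values{3, 1, 2, 2}, needSort: true},
    }}

    const readers, iters = 8, 200
    var totalNS int64
    var wg sync.WaitGroup
    wg.Add(readers)

    for r := 0; r < readers; r++ {
        go func() {
            defer wg.Done()
            for i := 0; i < iters; i++ {
                start := time.Now()
                c.mu.RLock()
                _ = len(c.snapshot["cpu"].values)
                c.mu.RUnlock()
                atomic.AddInt64(&totalNS, time.Since(start).Nanoseconds())
            }
        }()
    }
    wg.Wait()

    t.Logf("average duration - %d (ns)", totalNS/int64(readers*iters))
}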

Based on @jwilder's alternative to the 'dirty' map that featured
in previous iterations of this fix.

Suggested-by: Jason Wilder <jason@influxdb.com>
Signed-off-by: Jon Seymour <jon@wildducktheories.com>
@jonseymour force-pushed the js-deduplicate-safety branch 4 times, most recently from 6636d03 to 4809a8b on February 25, 2016 13:53
@jwilder
Contributor

jwilder commented Feb 25, 2016

👍

jwilder added a commit that referenced this pull request Feb 25, 2016
tsm: cache: improve thread safety of Cache.Deduplicate (see #5699)
@jwilder merged commit daf68db into influxdata:master Feb 25, 2016
Successfully merging this pull request may close: code review: potential thread safety issue in cache.go (#5699).