-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
tsm: ensure that snapshot writes are eventually retried/released, even if WriteSnapshot fails #5689
tsm: ensure that snapshot writes are eventually retried/released, even if WriteSnapshot fails #5689
Commits on Feb 22, 2016
-
tsm: cache: add a test that demonstrates the the race condition.
To run the test, do: cd tsdb/engine/tsm1 Then run: GORACE=history=1000 go test -race cache_race_test.go
Configuration menu - View commit details
-
Copy full SHA for 32fc885 - Browse repository at this point
Copy the full SHA 32fc885View commit details -
tsm: cache: improve thread safety of Cache.Deduplicate.
The current implementation of Cache.Deduplicate is not thread safe as revealed by a code review and, seperately, a go race detector test case. The first race condition arises because entries being deduplicated by the Cache.Deduplicate method were being read by the cache logic without the deduplicating thread taking a write lock. A second race condition arose with one of the initial fixes because the snapshot map was being updated by the deduplicating thread without a readlock (on the snapshot) being taken by readers of the snapshot. This fix works by taking a copy of the snapshot's store (called dirty) while the write lock of the engine's cache is held. Only the entries that need sorting are cloned. The other entries are copied by reference. Deduplication of the slice occurs on the dirty slice, rather than the store slice. There is no thread safety issue with the dirty slice because only the snapshotting thread sees that slice. The entries that are updated by the snapshotting thread are the entries that need sorting but all these entries are clones which are not visible to any other thread. When we are done deduplicating, we update the store of the snapshot while holding the write lock of the engine's cache - not that of the snapshot itself. Once a snapshot is deduplicated, it is never modified again. In the time between the snapshot being created and the deduplication finishing, the snapshot may be modified, however, in these cases the writes are guarded by a writelock on the engine's cache and the snapshotting thread does not refer to those values. The end result is that it is no longer necessary to take any snapshot locks the lock guarding the engine's cache is sufficient. The snapshotter does need to take write locks on the engine's cache but only at two points: * while taking the initial snapshot * after deduplication is complete when updating the cache's store reference Signed-off-by: Jon Seymour <jon@wildducktheories.com>
Configuration menu - View commit details
-
Copy full SHA for 26fd71d - Browse repository at this point
Copy the full SHA 26fd71dView commit details
Commits on Feb 23, 2016
-
tsm: rename Snapshot/ClearSnapshot to PrepareSnapshot/CommitSnapshot
Also, introduce a currently empty RollbackSnapshot method. Later, we are going to use a commit pattern where resources are first prepared and then either committed or rolled back. Currently, the code doesn't understand the concept of a rollback, but we will add it later. For now, we will simply rename the existing methods to their final names and add the RollbackSnapshot() method as a placeholder. Signed-off-by: Jon Seymour <jon@wildducktheories.com>
Configuration menu - View commit details
-
Copy full SHA for 982c4df - Browse repository at this point
Copy the full SHA 982c4dfView commit details -
tsm: ensure that either CommitSnapshot() or RollbackSnapshot() is cal…
…led. Signed-off-by: Jon Seymour <jon@wildducktheories.com>
Configuration menu - View commit details
-
Copy full SHA for 964a0e4 - Browse repository at this point
Copy the full SHA 964a0e4View commit details -
This lock ensures that at most one thread is actively attempting to commit the WAL log for a single shard at once. The assumption here it is that it is not helpful to have more than one thread attempting to do this and that reasoning about correctness is much more easily achieved if one only has to consider a single thread. The Prepare/Commit or Rollback pattern ensures that the lock will always be released. This is not to say that the TSM writing process might not be benefit from concurrency, only that the concurrency should be managed within the scope of the shard-wide commit lock. Note that the acqusition order is commit lock first, then cache mutex. This ensures the cache is available for writes while the snapshot is being processed. Signed-off-by: Jon Seymour <jon@wildducktheories.com>
Configuration menu - View commit details
-
Copy full SHA for 8d1b7b8 - Browse repository at this point
Copy the full SHA 8d1b7b8View commit details -
tsm: simplify interface to CommitSnapshot() and RollbackSnapshot()
Now that there is only one thread allowed to be actively writing a snapshot at once, we can simplify the the commit logic to simply removing all active snapshots. It also means we do not need to scan for the committed snapshot since we are going to purge all of them. Note that Rollback doesn't touch the snapshot slice - it leaves the snapshot in place until a subsequent compaction succeeds at which point it will be removed. Signed-off-by: Jon Seymour <jon@wildducktheories.com>
Configuration menu - View commit details
-
Copy full SHA for dda3b19 - Browse repository at this point
Copy the full SHA dda3b19View commit details -
tsm: ensure previously failed snapshot writes are retried
The previous commits ensured that snapshots that failed to write would eventually be released. However, only the current snapshot would get written to disk. This commit ensures that the writes for previously failed snapshots are retried again. To ensure we cleanup only those WAL segments that we have successfully compacted, we need to keep track of precisely which WAL segments each snapshot captures, and retry all the snapshots in the same order. We only remove WAL segments for snapshots that we successfully write and stop on the first error we encounter. When we rollback, we only rollback the snapshots that we failed to write. Signed-off-by: Jon Seymour <jon@wildducktheories.com>
Configuration menu - View commit details
-
Copy full SHA for 59d5319 - Browse repository at this point
Copy the full SHA 59d5319View commit details -
tsm: extend TestCache_CacheSnapshot
Add some tests to check that new files specified on PrepareSnapshots are associated with the latest snapshot.
Configuration menu - View commit details
-
Copy full SHA for 393e07e - Browse repository at this point
Copy the full SHA 393e07eView commit details -
tsm: add TestCache_CacheRollback
The test now demonstrates that if a snapshot is rolled back, the failed snapshots (only) will appear in the slice returned by the next PrepareSnapshots call. The test demonstrates how the discovery of new WAL segment leads to the new files being associated with the latest snapshot. The test demonstrates that when retrying multiple snapshots, we can successfully commit partial progress, even if we need to rollback one of the later snapshots.
Configuration menu - View commit details
-
Copy full SHA for c8796d9 - Browse repository at this point
Copy the full SHA c8796d9View commit details -
tsm: add TestCache_CacheTestCommitLock
A test to check that the commit lock is working as expected. In particular, we want to check that if there is a pending commit, a prepare issued by a concurrent goroutine will block until such time as the current holder of the lock either commits or rolls back. This test also checks that concurrent writes into the cache are not prevented by goroutines that hold the cache's commit lock. Signed-off-by: Jon Seymour <jon@wildducktheories.com>
Configuration menu - View commit details
-
Copy full SHA for 80e97b4 - Browse repository at this point
Copy the full SHA 80e97b4View commit details