-
Notifications
You must be signed in to change notification settings - Fork 138
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CBG-3852: Implementation of single sequence entry for skipped sequence slice #6747
Conversation
db/skipped_sequence.go
Outdated
) | ||
|
||
const ( | ||
ClipCapacityHeadroom = 100 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For testing/tuning, do we want to define this as a property on SkippedSequenceSlice (and maybe change this constant to DefaultClipCapacityHeadroom).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, this value seems quite small to me, since we care about this most when the size of the slice is very large. Thinking of a case where the length of the slice grows to 2.5M, then shrinks to 1M: the new capacity will be 1000100. If only 101 sequences are added to the list, go is going to increase the capacity to 2000202. I think we want a bit more buffer than that to minimize churn when the overall trend in the slice size is shrinking, but there's still some load.
Was there a specific reason for this value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the headroom to be configurable. Also changed the default to be 1000. My thinking behind this was after looking at a few customer environments (that didn't have issues with skipped sequences) and the list never seemed to grown more than 30-40 in any environment I looked at. So thought I should have this capacity headroom fairly low. But you raise a good point, if we get a big jump in slice length this doesn't give much headroom for further jumps.
Another option is to have two headroom values, one for when the list isn't getting used much < around 100 items say, and one for when slice grows > 10,000 for example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that in most cases the list is small, but the memory associated with a minimum allocation of 1000 (or 10000) when the list is otherwise empty should be negligible, so I think a single value is fine.
s.lock.Lock() | ||
defer s.lock.Unlock() | ||
s.list = append(s.list[:index+1], s.list[index:]...) | ||
s.list[index] = entry |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SliceTricks recommends something a little bit different for insert:
s = append(s, 0 /* use the zero value of the element type */)
copy(s[i+1:], s[i:])
s[i] = x
Is there an advantage to the approach you've got?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Slight difference in the benchmarking. Below is my approach currently:
BenchmarkInsertSkippedItem
BenchmarkInsertSkippedItem/single_entries_large_slice
BenchmarkInsertSkippedItem/single_entries_large_slice-10 309714 3615 ns/op 108 B/op 1 allocs/op
BenchmarkInsertSkippedItem/single_entries_small_slice
BenchmarkInsertSkippedItem/single_entries_small_slice-10 8871403 136.3 ns/op 109 B/op 1 allocs/op
Then with the above suggested approach:
BenchmarkInsertSkippedItem
BenchmarkInsertSkippedItem/single_entries_large_slice
BenchmarkInsertSkippedItem/single_entries_large_slice-10 311587 3610 ns/op 123 B/op 2 allocs/op
BenchmarkInsertSkippedItem/single_entries_small_slice
BenchmarkInsertSkippedItem/single_entries_small_slice-10 7984885 140.0 ns/op 114 B/op 2 allocs/op
Seems that the SliceTricks approach has extra allocation per op and slightly more bytes per op used (but really not much difference).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the SliceTricks has an extra allocation, if it's implemented as:
s.list = append(s.list, nil)
copy(s.list[index+1:], s.list[index:])
s.list[index] = entry
When running locally the two approaches seemed equivalent from a timing and allocation perspective. I couldn't find anything definitive in the go docs/slice tricks about the relative benefits of:
s= append(s, nil)
copy(s[i+1:], s[i:])
versus
s = append(s[:i+1], s.list[i:]...)
Now that I've spent this much time on it, though, I'm wondering whether we even expect a use case for insert, since this is an append-only slice?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I see where I went wrong now, thank you for putting me right! We need insert for when we have to split ranges when sequences arrive at the cache. So if we have range 10-20 and sequence 15 arrives, we split range into 10-14 and insert new range 16-20. I was thinking this is needed for single entries too as if we have range 10-20 and 19 arrives we can alter range to 10-18 but then replace the range and index+1 with a single entry for sequence 20.
type SkippedSequenceSlice struct { | ||
list []SkippedSequenceListInterface | ||
lock sync.RWMutex | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we implement the following methods (that are defined for SkippedSequenceList today) on SkippedSequenceSlice in this PR, so that we can ensure the implementation is valid for these?
- getOldest
- Contains
- getOlderThan
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have changed the IsSkipped method to be named Contains, implemented getOldest with a test. With regards to getOlderThan. My implementation of SkippedSequenceCompact will cover this, my thinking was when we integrate this into the cache we can just call SkippedSequenceCompact from CleanSkippedSequenceQueue. Although getOlderThan returns the sequences older than a threshold, these sequences are just used to actually go ahead and remove them from the list a little further down in CleanSkippedSequenceQueue (which we already do in SkippedSequenceCompact). So overall my thinking it would just be best to call SkippedSequenceCompact and have it return how many its removing for logging/stats purposes in CleanSkippedSequenceQueue. It will also simplify CleanSkippedSequenceQueue method.
I have updated the test TestCompactSkippedList
to reflect this change.
s.lock.Lock() | ||
defer s.lock.Unlock() | ||
s.list = append(s.list[:index+1], s.list[index:]...) | ||
s.list[index] = entry |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the SliceTricks has an extra allocation, if it's implemented as:
s.list = append(s.list, nil)
copy(s.list[index+1:], s.list[index:])
s.list[index] = entry
When running locally the two approaches seemed equivalent from a timing and allocation perspective. I couldn't find anything definitive in the go docs/slice tricks about the relative benefits of:
s= append(s, nil)
copy(s[i+1:], s[i:])
versus
s = append(s[:i+1], s.list[i:]...)
Now that I've spent this much time on it, though, I'm wondering whether we even expect a use case for insert, since this is an append-only slice?
db/skipped_sequence.go
Outdated
) | ||
|
||
const ( | ||
ClipCapacityHeadroom = 100 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that in most cases the list is small, but the memory associated with a minimum allocation of 1000 (or 10000) when the list is otherwise empty should be negligible, so I think a single value is fine.
* CBG-3852: Implementation of single sequence entry for skipped sequence slice (#6747) * CBG-3852: implementation of single sequence entry for skipped sequence slice * minor updates to make setLastSeq no-op for single values * updates for review round * remove not needed comment * updates based off review * CBG-3853: Implementation of range sequence entry for skipped sequence slice (#6764) * CBG-3854: Integration of the skipped sequence slice into the cache (#6789) * CBG-3854: Integeration of the skipped sequence slice into the cache * remove old chnages not needed anymore + gh actions failures * updates based of review. Move stat update to updateStats and alter soem naming. Test updates to reflect stat chnages too. * remove incorrect incrementing of num current sequences inside PushSkippedSequenceEntry * Fix race condition in updateStats() * updates off review * tidy CleanSkippedSequenceQueue * stats update * CBG-3855: Sequence range removal support (#6801) * CBG-3855: support for seqeuence range removal * add comment to function + remove repeated code * refactor error handling, simplify stat calculation update * CBG-3850: Optimise releaseUnusedSequenceRange (#6831) * CBG-3850: Optimise releaseUnusedSequenceRange function + support for ranges in pending queue * updates to add _pushRangeToPending and _removeSubsetOfRangeFromSkipped to process duplicates sequences between unused rnage and pending/skipped lists * fix race condition in test * fix for incorrect logic for removing skipped range * updates based off review * updates to fix incorrect code + add extra test coverage of changes * change error name * CBG-3945: Fix for panic in _clip method on skipped sequence slice (#6842) * fix test to use old api
CBG-3852
Initial implementation of skipped sequence interfcae and addition of a single skipped sequence entry.
There are some incomplete functions that will require ranges to be implemented so have left comments on these functions/tests.
Also includes Benchmarks for each operation.
Pre-review checklist
fmt.Print
,log.Print
, ...)base.UD(docID)
,base.MD(dbName)
)docs/api
Integration Tests
GSI=true,xattrs=true
https://jenkins.sgwdev.com/job/SyncGateway-Integration/000/