CBG-3852: Implementation of single sequence entry for skipped sequence slice #6747

gregns1 · 2024-03-28T11:44:28Z

Initial implementation of skipped sequence interfcae and addition of a single skipped sequence entry.
There are some incomplete functions that will require ranges to be implemented so have left comments on these functions/tests.
Also includes Benchmarks for each operation.

Pre-review checklist

Removed debug logging (fmt.Print, log.Print, ...)
Logging sensitive data? Make sure it's tagged (e.g. base.UD(docID), base.MD(dbName))
Updated relevant information in the API specifications (such as endpoint descriptions, schemas, ...) in docs/api

Integration Tests

GSI=true,xattrs=true https://jenkins.sgwdev.com/job/SyncGateway-Integration/000/

adamcfraser · 2024-03-28T18:59:35Z

db/skipped_sequence.go

+)
+
+const (
+	ClipCapacityHeadroom = 100


For testing/tuning, do we want to define this as a property on SkippedSequenceSlice (and maybe change this constant to DefaultClipCapacityHeadroom).

Also, this value seems quite small to me, since we care about this most when the size of the slice is very large. Thinking of a case where the length of the slice grows to 2.5M, then shrinks to 1M: the new capacity will be 1000100. If only 101 sequences are added to the list, go is going to increase the capacity to 2000202. I think we want a bit more buffer than that to minimize churn when the overall trend in the slice size is shrinking, but there's still some load.

Was there a specific reason for this value?

Updated the headroom to be configurable. Also changed the default to be 1000. My thinking behind this was after looking at a few customer environments (that didn't have issues with skipped sequences) and the list never seemed to grown more than 30-40 in any environment I looked at. So thought I should have this capacity headroom fairly low. But you raise a good point, if we get a big jump in slice length this doesn't give much headroom for further jumps.
Another option is to have two headroom values, one for when the list isn't getting used much < around 100 items say, and one for when slice grows > 10,000 for example.

I agree that in most cases the list is small, but the memory associated with a minimum allocation of 1000 (or 10000) when the list is otherwise empty should be negligible, so I think a single value is fine.

db/skipped_sequence.go

adamcfraser · 2024-03-28T19:27:27Z

db/skipped_sequence.go

+	s.lock.Lock()
+	defer s.lock.Unlock()
+	s.list = append(s.list[:index+1], s.list[index:]...)
+	s.list[index] = entry


SliceTricks recommends something a little bit different for insert:

s = append(s, 0 /* use the zero value of the element type */) copy(s[i+1:], s[i:]) s[i] = x

Is there an advantage to the approach you've got?

Slight difference in the benchmarking. Below is my approach currently:

BenchmarkInsertSkippedItem BenchmarkInsertSkippedItem/single_entries_large_slice BenchmarkInsertSkippedItem/single_entries_large_slice-10 309714 3615 ns/op 108 B/op 1 allocs/op BenchmarkInsertSkippedItem/single_entries_small_slice BenchmarkInsertSkippedItem/single_entries_small_slice-10 8871403 136.3 ns/op 109 B/op 1 allocs/op

Then with the above suggested approach:

BenchmarkInsertSkippedItem BenchmarkInsertSkippedItem/single_entries_large_slice BenchmarkInsertSkippedItem/single_entries_large_slice-10 311587 3610 ns/op 123 B/op 2 allocs/op BenchmarkInsertSkippedItem/single_entries_small_slice BenchmarkInsertSkippedItem/single_entries_small_slice-10 7984885 140.0 ns/op 114 B/op 2 allocs/op

Seems that the SliceTricks approach has extra allocation per op and slightly more bytes per op used (but really not much difference).

I don't think the SliceTricks has an extra allocation, if it's implemented as:

s.list = append(s.list, nil) copy(s.list[index+1:], s.list[index:]) s.list[index] = entry

When running locally the two approaches seemed equivalent from a timing and allocation perspective. I couldn't find anything definitive in the go docs/slice tricks about the relative benefits of:

s= append(s, nil) copy(s[i+1:], s[i:]) versus s = append(s[:i+1], s.list[i:]...)

Now that I've spent this much time on it, though, I'm wondering whether we even expect a use case for insert, since this is an append-only slice?

Ah I see where I went wrong now, thank you for putting me right! We need insert for when we have to split ranges when sequences arrive at the cache. So if we have range 10-20 and sequence 15 arrives, we split range into 10-14 and insert new range 16-20. I was thinking this is needed for single entries too as if we have range 10-20 and 19 arrives we can alter range to 10-18 but then replace the range and index+1 with a single entry for sequence 20.

db/skipped_sequence.go

adamcfraser · 2024-03-28T19:37:33Z

db/skipped_sequence.go

+type SkippedSequenceSlice struct {
+	list []SkippedSequenceListInterface
+	lock sync.RWMutex
+}


Can we implement the following methods (that are defined for SkippedSequenceList today) on SkippedSequenceSlice in this PR, so that we can ensure the implementation is valid for these?

getOldest

Contains

getOlderThan

I have changed the IsSkipped method to be named Contains, implemented getOldest with a test. With regards to getOlderThan. My implementation of SkippedSequenceCompact will cover this, my thinking was when we integrate this into the cache we can just call SkippedSequenceCompact from CleanSkippedSequenceQueue. Although getOlderThan returns the sequences older than a threshold, these sequences are just used to actually go ahead and remove them from the list a little further down in CleanSkippedSequenceQueue (which we already do in SkippedSequenceCompact). So overall my thinking it would just be best to call SkippedSequenceCompact and have it return how many its removing for logging/stats purposes in CleanSkippedSequenceQueue. It will also simplify CleanSkippedSequenceQueue method.

I have updated the test TestCompactSkippedList to reflect this change.

db/skipped_sequence_test.go

adamcfraser · 2024-04-04T22:26:27Z

db/skipped_sequence.go

+	s.lock.Lock()
+	defer s.lock.Unlock()
+	s.list = append(s.list[:index+1], s.list[index:]...)
+	s.list[index] = entry


I don't think the SliceTricks has an extra allocation, if it's implemented as:

s.list = append(s.list, nil) copy(s.list[index+1:], s.list[index:]) s.list[index] = entry

When running locally the two approaches seemed equivalent from a timing and allocation perspective. I couldn't find anything definitive in the go docs/slice tricks about the relative benefits of:

s= append(s, nil) copy(s[i+1:], s[i:]) versus s = append(s[:i+1], s.list[i:]...)

Now that I've spent this much time on it, though, I'm wondering whether we even expect a use case for insert, since this is an append-only slice?

adamcfraser · 2024-04-04T22:31:02Z

db/skipped_sequence.go

+)
+
+const (
+	ClipCapacityHeadroom = 100


I agree that in most cases the list is small, but the memory associated with a minimum allocation of 1000 (or 10000) when the list is otherwise empty should be negligible, so I think a single value is fine.

…e slice

…e slice (#6747) * CBG-3852: implementation of single sequence entry for skipped sequence slice * minor updates to make setLastSeq no-op for single values * updates for review round * remove not needed comment * updates based off review

* CBG-3852: Implementation of single sequence entry for skipped sequence slice (#6747) * CBG-3852: implementation of single sequence entry for skipped sequence slice * minor updates to make setLastSeq no-op for single values * updates for review round * remove not needed comment * updates based off review * CBG-3853: Implementation of range sequence entry for skipped sequence slice (#6764) * CBG-3854: Integration of the skipped sequence slice into the cache (#6789) * CBG-3854: Integeration of the skipped sequence slice into the cache * remove old chnages not needed anymore + gh actions failures * updates based of review. Move stat update to updateStats and alter soem naming. Test updates to reflect stat chnages too. * remove incorrect incrementing of num current sequences inside PushSkippedSequenceEntry * Fix race condition in updateStats() * updates off review * tidy CleanSkippedSequenceQueue * stats update * CBG-3855: Sequence range removal support (#6801) * CBG-3855: support for seqeuence range removal * add comment to function + remove repeated code * refactor error handling, simplify stat calculation update * CBG-3850: Optimise releaseUnusedSequenceRange (#6831) * CBG-3850: Optimise releaseUnusedSequenceRange function + support for ranges in pending queue * updates to add _pushRangeToPending and _removeSubsetOfRangeFromSkipped to process duplicates sequences between unused rnage and pending/skipped lists * fix race condition in test * fix for incorrect logic for removing skipped range * updates based off review * updates to fix incorrect code + add extra test coverage of changes * change error name * CBG-3945: Fix for panic in _clip method on skipped sequence slice (#6842) * fix test to use old api

gregns1 assigned gregns1 and unassigned gregns1 Mar 28, 2024

gregns1 requested a review from adamcfraser March 28, 2024 15:40

gregns1 assigned adamcfraser Mar 28, 2024

adamcfraser requested changes Mar 28, 2024

View reviewed changes

adamcfraser assigned gregns1 and unassigned adamcfraser Mar 28, 2024

gregns1 assigned adamcfraser and unassigned gregns1 Apr 2, 2024

gregns1 requested a review from adamcfraser April 2, 2024 13:36

adamcfraser reviewed Apr 4, 2024

View reviewed changes

adamcfraser assigned gregns1 and unassigned adamcfraser Apr 4, 2024

gregns1 force-pushed the CBG-3852 branch from c4c7453 to ee1346d Compare April 8, 2024 09:37

gregns1 assigned adamcfraser and unassigned gregns1 Apr 8, 2024

gregns1 requested a review from adamcfraser April 8, 2024 09:49

bbrks force-pushed the main branch from a9811cd to 1879730 Compare April 8, 2024 11:02

gregns1 added 5 commits April 8, 2024 15:31

CBG-3852: implementation of single sequence entry for skipped sequenc…

8ce6b14

…e slice

minor updates to make setLastSeq no-op for single values

8cfa7d6

updates for review round

de3cd31

remove not needed comment

6bc6a96

updates based off review

2a64991

gregns1 force-pushed the CBG-3852 branch from ee1346d to 2a64991 Compare April 8, 2024 14:31

adamcfraser approved these changes Apr 8, 2024

View reviewed changes

adamcfraser merged commit cb85b7c into main Apr 8, 2024
32 checks passed

adamcfraser deleted the CBG-3852 branch April 8, 2024 18:35

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

CBG-3852: Implementation of single sequence entry for skipped sequence slice #6747

CBG-3852: Implementation of single sequence entry for skipped sequence slice #6747

gregns1 commented Mar 28, 2024

adamcfraser Mar 28, 2024

adamcfraser Mar 28, 2024

gregns1 Apr 2, 2024

adamcfraser Apr 4, 2024

adamcfraser Mar 28, 2024

gregns1 Apr 2, 2024

adamcfraser Apr 4, 2024

gregns1 Apr 8, 2024

adamcfraser Mar 28, 2024

gregns1 Apr 2, 2024 •

edited

Loading

adamcfraser Apr 4, 2024

adamcfraser Apr 4, 2024

CBG-3852: Implementation of single sequence entry for skipped sequence slice #6747

CBG-3852: Implementation of single sequence entry for skipped sequence slice #6747

Conversation

gregns1 commented Mar 28, 2024

Pre-review checklist

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

gregns1 Apr 2, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

gregns1 Apr 2, 2024 •

edited

Loading