sql: implement sequence caching and cached sequence serial normalization #56954
Conversation
Force-pushed from 9ad6f16 to efa9362.
Force-pushed from 94632f5 to a972571.
Reviewed 2 of 8 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @jayshrivastava)
pkg/sql/exec_util.go, line 745 at r1 (raw file):
VersionUpgradeHook func(ctx context.Context, to roachpb.Version) error
SequenceCache *SequenceCache
give this a comment to indicate that it is a node-level cache of sequence values.
pkg/sql/sequence.go, line 43 at r1 (raw file):
// are stored using sequence descriptor IDs as keys. The cache should only
// be accessed using the provided API.
type SequenceCache struct {
let's iterate on this a bit. I think it's worthwhile to write some microbenchmarks to help guide the optimizations here.
I think it's a bad idea to hold the mutex during the entire operation for a given cache value. Instead let's utilize a singleflight.Group for initializing and updating the cache value.
Another thing we will likely eventually want to do is to switch to a syncutil.IntMap, but let's leave that off for the first optimization path.
pkg/sql/sequence.go, line 64 at r1 (raw file):
currentIndex int
// Stores values to be given out.
values *[]int64
rather than using a slice, I think we can get away with constant memory usage. What if we had a curVal, increment, maxVal or something like that.
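For illustration, here is a minimal sketch of how these two suggestions could fit together: a singleflight-guarded refill (from the earlier comment) plus a constant-memory entry of roughly curVal/increment/remaining instead of a slice. It assumes golang.org/x/sync/singleflight; the entry fields, method names, and fetch callback are placeholders rather than the code under review.

```go
package sequencecache

import (
	"strconv"
	"sync"

	"golang.org/x/sync/singleflight"
)

// sequenceCacheEntry holds one batch of preallocated values in constant
// memory. Field names are illustrative only.
type sequenceCacheEntry struct {
	nextVal   int64 // next value to hand out
	increment int64 // step between consecutive values
	remaining int64 // values left in this batch
}

// SequenceCache sketch: a mutex-protected map for the fast path, plus a
// singleflight.Group so only one caller performs the KV increment per refill.
type SequenceCache struct {
	mu      sync.Mutex
	entries map[uint32]*sequenceCacheEntry
	group   singleflight.Group
}

func NewSequenceCache() *SequenceCache {
	return &SequenceCache{entries: make(map[uint32]*sequenceCacheEntry)}
}

// NextValue returns the next cached value for seqID, calling fetch (which
// would bump the KV counter by increment*cacheSize) at most once per refill
// even under concurrent callers.
func (c *SequenceCache) NextValue(
	seqID uint32, fetch func() (cur, inc, n int64, err error),
) (int64, error) {
	key := strconv.FormatUint(uint64(seqID), 10)
	for {
		c.mu.Lock()
		if e, ok := c.entries[seqID]; ok && e.remaining > 0 {
			v := e.nextVal
			e.nextVal += e.increment
			e.remaining--
			c.mu.Unlock()
			return v, nil
		}
		c.mu.Unlock()

		// The batch is empty: exactly one caller refills it while the others
		// wait on the same singleflight key, then everyone retries the fast path.
		if _, err, _ := c.group.Do(key, func() (interface{}, error) {
			cur, inc, n, err := fetch()
			if err != nil {
				return nil, err
			}
			c.mu.Lock()
			c.entries[seqID] = &sequenceCacheEntry{nextVal: cur, increment: inc, remaining: n}
			c.mu.Unlock()
			return nil, nil
		}); err != nil {
			return 0, err
		}
	}
}
```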
pkg/sql/sequence.go, line 186 at r1 (raw file):
}
if seqOpts.CacheSize == 1 {
all previously existing sequences will have a value of 0. How do we deal with that? Can you write a test that injects such a descriptor and ensures that it behaves properly?
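To make the zero-value question concrete, here is a tiny hedged sketch of treating legacy descriptors, which predate the CACHE option and therefore store 0, the same as an explicit CACHE 1. The helper name is hypothetical and may not match the real descriptor API.

```go
// effectiveCacheSize is a hypothetical helper: descriptors written before the
// CACHE option existed carry a cache size of 0, which should behave exactly
// like CACHE 1 (no caching).
func effectiveCacheSize(cacheSize int64) int64 {
	if cacheSize < 1 {
		// 0 (legacy descriptors) means no caching, same as an explicit 1.
		return 1
	}
	return cacheSize
}
```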
Force-pushed from b8ad19f to 5356781.
Force-pushed from 5356781 to 0fc1d9e.
Force-pushed from 0fc1d9e to 110cb70.
Thank you for updating your pull request. Before a member of our team reviews your PR, I have some potential action items for you:
🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.
Seems like you need a rebase. Sorry I took so darn long to get back to this. It's very close.
Reviewed 1 of 12 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @jayshrivastava)
pkg/sql/exec_util.go, line 783 at r5 (raw file):
// VersionUpgradeHook is called after validating a `SET CLUSTER SETTING
// version` but before executing it. It structuredcan carry out arbitrary migrations
detritus?
pkg/sql/exec_util.go, line 2250 at r5 (raw file):
}

// initSequenceCache
Missing a comment.
pkg/sql/sequence.go, line 105 at r5 (raw file):
val = int64(rowid)
} else {
	fetchNextValues := func() (int64, int64, int64, error) {
Consider pulling everything in this else into a helper.
pkg/sql/sequence.go, line 224 at r5 (raw file):
// If there are cached values, changing the value of the sequence should invalidate the cache.
// For simplicity, the cache invalidates implicitly when it sees new descriptor versions. Thus,
// a schema change is triggered here to make sure the cache gets invalidated.
mutableDescriptor, err := resolver.ResolveMutableExistingTableObject(ctx, p, seqName, true, tree.ResolveRequireSequenceDesc)
I don't think we need to do this. Simpler not to. Sorry I sent you astray here.
ALTER SEQUENCE will not immediately affect nextval results in backends, other than the current one, that have preallocated (cached) sequence values. They will use up all cached values prior to noticing the changed sequence generation parameters. The current backend will be affected immediately.
Force-pushed from 110cb70 to 307787f.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @jayshrivastava)
pkg/sql/exec_util.go, line 783 at r5 (raw file):
Previously, ajwerner wrote…
detritus?
Yes.
pkg/sql/exec_util.go, line 2250 at r5 (raw file):
Previously, ajwerner wrote…
Missing a comment.
Done.
pkg/sql/sequence.go, line 224 at r5 (raw file):
Previously, ajwerner wrote…
// If there are cached values, changing the value of the sequence should invalidate the cache.
// For simplicity, the cache invalidates implicitly when it sees new descriptor versions. Thus,
// a schema change is triggered here to make sure the cache gets invalidated.
mutableDescriptor, err := resolver.ResolveMutableExistingTableObject(ctx, p, seqName, true, tree.ResolveRequireSequenceDesc)
I don't think we need to do this. Simpler not to. Sorry I sent you astray here.
ALTER SEQUENCE will not immediately affect nextval results in backends, other than the current one, that have preallocated (cached) sequence values. They will use up all cached values prior to noticing the changed sequence generation parameters. The current backend will be affected immediately.
Yup it's simpler not to.
Force-pushed from beb0787 to 24b7752.
My one remaining question on this is whether we need a cluster version to gate this in the mixed-version state. This week is the last week before stability period. I don't have much doubt that this is going to make it across the line. Thanks for all the revisions!
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @arulajmani, and @jayshrivastava)
pkg/sql/sequence.go, line 171 at r19 (raw file):
if endValue < seqOpts.MinValue {
	if endValue-seqOpts.Increment*cacheSize <= seqOpts.MinValue {
		return 0, 0, 0, boundsExceededError(descriptor)
What is this case dealing with?
pkg/sql/sequence.go, line 176 at r19 (raw file):
	return 0, 0, 0, boundsExceededError(descriptor)
}
return endValue - seqOpts.Increment*(cacheSize-1), seqOpts.Increment, int64(math.Abs(float64(seqOpts.MinValue-(endValue-seqOpts.Increment*cacheSize)))) / seqOpts.Increment * -1, nil
It feels like there's some repetition that can be dealt with here.
Force-pushed from 69b7f02 to 5716da0.
Previously, incrementing sequences at a high throughput would result in many distributed writes to the KV layer due to MVCC. This has caused garbage collection problems in the past. This would occur in situations such as bulk importing data while using the sequence number as an id for each new row being added.

This change allows clients to cache sequence numbers in their local session data. When the cache is empty, the sequence will be incremented once by the cache size * increment amount, which are both sequence options. Then, all the intermediate values will be cached locally on a node to be given out whenever the sequence is incremented.

To accommodate schema changes, cached sequence values will be invalidated when new descriptor versions are seen by the cache. This invalidation can occur when old versions are seen as well to accommodate schema change rollbacks.

Release note (sql change): Using the CACHE sequence option no longer results in an "unimplemented" error. The CACHE option is now fully implemented and will allow nodes to cache sequence numbers. A cache size of 1 means that there is no cache, and cache sizes of less than 1 are not valid.
Previously, there was no benchmark that tested the performance of concurrent increments to sequences. There was also no benchmark which compared sequence performance based on different cache sizes. This change adds a benchmark to measure performance based on the above criteria. Release note: None
Closes: cockroachdb#51259

Release note (sql change): The `serial_normalization` session variable can now be set to the value `sql_sequence_cached`. If this value is set, the cluster setting `sql.defaults.serial_sequences_cache_size` can be used to control the number of values to cache in a user's session, with a default of 256. When the cache is empty, the underlying sequence will only be incremented once to populate it. Using `sql_sequence_cached` will result in better performance than `sql_sequence` because the former will perform fewer distributed calls to increment sequences. However, cached sequences may result in large gaps between serial sequence numbers if a session terminates before using all the values in its cache.
Force-pushed from 5716da0 to 40f998e.
Having a cluster in a mixed version state is fine.
Caching occurs in a session, so the feature is isolated from other nodes. The new code handles all cache sizes, including 0. The old code will ignore a sequence's cache size when incrementing a sequence. In the case of creating/altering a sequence's cache size, the old code will just throw an error saying that caching is not supported.
One tricky thing is that a session running the new code may increment a sequence beyond its MAXVALUE several times. A user talking to another node may try to increase the MAXVALUE so that the sequence can be incremented again. If the other node is running the new version, any value greater than the old max value will make the sequence incrementable again. This is because the new alter sequence code will restore the value of the sequence to the old MAXVALUE before executing the schema change. If the node has an old version, this will not happen. To make the sequence incrementable again, a user will have to set a max value greater than the current value of the sequence or they will have to use setval() to set the value of the sequence to the old MAXVALUE first. Overall, I think this is okay since sequences are non-transactional and we are allowed to skip numbers.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @arulajmani)
pkg/sql/sequence.go, line 171 at r19 (raw file):
Previously, ajwerner wrote…
if endValue < seqOpts.MinValue {
	if endValue-seqOpts.Increment*cacheSize <= seqOpts.MinValue {
		return 0, 0, 0, boundsExceededError(descriptor)
What is this case dealing with?
Just repetition of the above case when seqOpts.Increment < 0. I just updated the code to not have this repetition.
Force-pushed from 40f998e to 192bac7.
Get rid of the float `math.Abs` stuff and this is ready for bors!
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @arulajmani, and @jayshrivastava)
pkg/sql/sequence.go, line 148 at r24 (raw file):
cacheSize := seqOpts.EffectiveCacheSize()
fetchNextValues := func() (currentValue int64, incrementAmount int64, sizeOfCache int64, err error) {
nit, you can group the return values: `(currentValue, incrementAmount, sizeOfCache int64, err error)`
pkg/sql/sequence.go, line 164 at r24 (raw file):
if endValue > seqOpts.MaxValue || endValue < seqOpts.MinValue {
	// If the sequence exceeded its bounds prior to the increment, then return an error.
	if seqOpts.Increment > 0 && endValue-seqOpts.Increment*cacheSize >= seqOpts.MaxValue || seqOpts.Increment < 0 && endValue-seqOpts.Increment*cacheSize <= seqOpts.MinValue {
nit: wrap this conditional.
pkg/sql/sequence.go, line 172 at r24 (raw file):
	limit = seqOpts.MinValue
}
return endValue - seqOpts.Increment*(cacheSize-1), seqOpts.Increment, int64(math.Abs(float64(limit-(endValue-seqOpts.Increment*cacheSize))) / math.Abs(float64(seqOpts.Increment))), nil
Also, please assign these variables for readability.
this math.Abs stuff seems like it'll get screwed up as these limit values get to be larger than can be represented in a float64. Just write an:
abs := func(i int64) int64 {
	if i < 0 {
		return -i
	}
	return i
}
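Applied to the quoted return statement, the integer-only version might look like the sketch below. computeCachedBatch is a hypothetical wrapper added only to make the snippet self-contained; endValue is the descriptor value after the KV increment and limit is MaxValue or MinValue depending on the increment's sign, mirroring the snippet above.

```go
// computeCachedBatch is a sketch (not the reviewed code) of the integer-only
// math suggested above: the cache-size computation never leaves int64, so it
// cannot lose precision for bounds near math.MaxInt64.
func computeCachedBatch(endValue, increment, cacheSize, limit int64) (currentValue, incrementAmount, sizeOfCache int64) {
	abs := func(i int64) int64 {
		if i < 0 {
			return -i
		}
		return i
	}
	currentValue = endValue - increment*(cacheSize-1)
	incrementAmount = increment
	// Only the values that fit between the pre-increment value and the bound
	// are usable, so the cached batch may be smaller than requested.
	sizeOfCache = abs(limit-(endValue-increment*cacheSize)) / abs(increment)
	return currentValue, incrementAmount, sizeOfCache
}
```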
Previously, cached sequences did not obey pg semantics with respect to exceeding bounds. For example, if there were a sequence with a cache size of 5 and max value of 2, calling nextval(...) would immediately cause an error due to exceeded bounds. According to postgres, nextval(...) should return 1, then 2, then finally return an error due to the max value of 2 being reached. This change alters sequence caching behavior to match the above semantics.

Additionally, this change now makes it possible for sequences to exceed the bounds set by MAXVALUE and MINVALUE. This is because calling nextval(...) should be as fast as possible, and the fastest way to do this is to let nextval(...) always succeed on incrementing a sequence. Despite this, a user will never see any values that exceed a sequence's bounds. To make a sequence incrementable again after exceeding its bounds, there are two options:
1. The user changes the sequence's value by calling setval(...).
2. The user performs a schema change to alter the sequence's MinValue or MaxValue. If the value of a sequence exceeds its bounds, it must be restored to the original MinValue or MaxValue in the same transaction as the schema change.

This change adds code to handle case 2 above.

Release note: None
Force-pushed from 192bac7 to 91c162a.
thanks again for keeping after this. This is going to be a great addition!
bors r+
Reviewed 3 of 13 files at r20, 4 of 5 files at r22, 1 of 3 files at r23, 1 of 1 files at r25.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner and @arulajmani)
Build succeeded.
if val > seqOpts.MaxValue || val < seqOpts.MinValue {
	return 0, boundsExceededError(descriptor)

// This sequence has exceeded its bounds after performing this increment.
Why should out-of-range endValue be stored in the cache?
This is not the behavior I have observed. At least on v21.2.10, the sequence seems to be cached per-session (as compared to per-node). This means that, unless sessions are very long-lived, a lot of identifiers will be wasted and not much improvement in terms of contention will be achieved.
You're right - this cache is implemented per-session. The sequence docs are a little more clear, but I agree that this release note is misleading, and the docs could have more clarity. One thing to note is that this was implemented to be compatible with sequence caching in PostgreSQL, which is described here in the "Notes" section: https://www.postgresql.org/docs/current/sql-createsequence.html And I also agree with you that the sequence cache size should be chosen based on how frequently the sequence is used along with how long the session lasts.
@rafiss I agree. But I would love the addition of a keyword to enable the per-node caching, which would be very advantageous. Some syntax like
I've filed #89338 to track that request.
sql: implement sequence caching
Previously, incrementing sequences at a high throughput
would result in many distributed writes to the KV layer
due to MVCC. This has caused garbage collection problems
in the past. This would occur in situations such as
bulk importing data while using the sequence number as an
id for each new row being added.
This change allows clients to cache sequence numbers in their local
session data. When the cache is empty, the sequence will be
incremented once by the cache size * increment amount, which are
both sequence options. Then, all the intermediate values will be
cached locally on a node to be given out whenever the sequence is
incremented.
To accommodate schema changes, cached sequence values will be
invalidated when new descriptor versions are seen by the cache.
This invalidation can occur when old versions are seen as well
to accommodate schema change rollbacks.
Release note (sql change): Using the CACHE sequence option no longer
results in an "unimplemented" error. The CACHE option is now fully
implemented and will allow nodes to cache sequence numbers. A cache
size of 1 means that there is no cache, and cache sizes of less than 1
are not valid.
sql: create benchmark for concurrent sequence increments
Previously, there was no benchmark that tested the performance
of concurrent increments to sequences. There was also no
benchmark which compared sequence performance based on
different cache sizes. This change adds a benchmark to measure
performance based on the above criteria.
Release note: None
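For readers who want to reproduce something like this outside the in-tree harness, below is a rough standalone sketch of such a benchmark over the wire using database/sql. The driver, connection string, sequence names, and cache sizes are assumptions, and this is not the pkg/sql BenchmarkSequenceIncrement referenced elsewhere in this PR.

```go
package sequencebench

import (
	"database/sql"
	"fmt"
	"testing"

	_ "github.com/lib/pq" // assumed Postgres-wire driver
)

// BenchmarkSequenceIncrement measures concurrent nextval() calls against a
// locally running cluster for a few illustrative CACHE sizes.
func BenchmarkSequenceIncrement(b *testing.B) {
	db, err := sql.Open("postgres", "postgresql://root@localhost:26257/defaultdb?sslmode=disable")
	if err != nil {
		b.Fatal(err)
	}
	defer db.Close()

	for _, cacheSize := range []int{1, 32, 256} {
		b.Run(fmt.Sprintf("cache=%d", cacheSize), func(b *testing.B) {
			seq := fmt.Sprintf("seq_cache_%d", cacheSize)
			if _, err := db.Exec(fmt.Sprintf("CREATE SEQUENCE IF NOT EXISTS %s CACHE %d", seq, cacheSize)); err != nil {
				b.Fatal(err)
			}
			b.ResetTimer()
			b.RunParallel(func(pb *testing.PB) {
				for pb.Next() {
					if _, err := db.Exec(fmt.Sprintf("SELECT nextval('%s')", seq)); err != nil {
						b.Error(err) // Fatal must not be called off the main test goroutine
						return
					}
				}
			})
		})
	}
}
```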
sql: add serial normalization setting for cached sequences
Closes: #51259
Release note (sql change): The `serial_normalization` session
variable can now be set to the value `sql_sequence_cached`.
Cached sequences will allow nodes to cache 256 sequence numbers
locally. The underlying sequence will only be incremented once (by
256 increments) when the cache is empty. Using `sql_sequence_cached`
will result in better performance than `sql_sequence` because the
former will perform fewer distributed calls to increment sequences.
However, cached sequences may contribute to large gaps between
sequence numbers if cached values are lost due to errors or
node outages.
sql: make cached sequences bounds-related semantics match pg semantics
Previously, cached sequences did not obey pg semantics
with respect to exceeding bounds. For example, if there
were a sequence with a cache size of 5 and max value of 2,
calling nextval(...) would immediately cause an error due
to exceeded bounds. According to postgres, nextval(...)
should return 1, then 2, then finally return an error due
to the max value of 2 being reached. This change alters
sequence caching behavior to match the above semantics.
Additionally, this change now makes it possible for sequences
to exceed the bounds set by MAXVALUE and MINVALUE. This is
because calling nextval(...) should be as fast as possible,
and the fastest way to do this is to let nextval(...) always
succeed on incrementing a sequence. Despite this, a user
will never see any values that exceed a sequence's bounds.
To make a sequence incrementable again after exceeding its
bounds, there are two options:
1. The user changes the sequence's value by calling setval(...).
2. The user performs a schema change to alter the sequence's MinValue
   or MaxValue. If the value of a sequence exceeds its bounds,
   it must be restored to the original MinValue or MaxValue in
   the same transaction as the schema change.
This change adds code to handle case 2 above.
Release note: None
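A minimal sketch of what the restore step in case 2 above could look like, assuming a helper that clamps the stored sequence value back to the old bound during the ALTER SEQUENCE transaction. The function name and signature are hypothetical; this is not the actual schema-change code.

```go
// clampSequenceValue is an illustrative helper: when a cached sequence has run
// past its old bound (cached batches always succeed), the ALTER SEQUENCE
// transaction restores the stored value to that bound so nextval() can make
// progress again under the new bounds.
func clampSequenceValue(current, oldMin, oldMax, increment int64) (restored int64, changed bool) {
	if increment > 0 && current > oldMax {
		return oldMax, true // restore to the old MAXVALUE
	}
	if increment < 0 && current < oldMin {
		return oldMin, true // restore to the old MINVALUE
	}
	return current, false // value is within bounds; nothing to do
}
```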
Benchmark
Using `make bench PKG='./pkg/sql' BENCHES='BenchmarkSequenceIncrement' TESTFLAGS="--count=5 --benchmem" |& tee c{size}.txt`:

Caching 256 values
For the other cache sizes I tested, the performance improvement is virtually the same as above.
c1.txt
c32.txt
c64.txt
c128.txt
c256.txt
c512.txt
c1024.txt