[prototype] *: s/config.SystemConfig/spanconfig.Store #68491
Closed
irfansharif wants to merge 1 commit into cockroachdb:multi-tenant-zcfgs from irfansharif:210804.replace-syscfgspan
Conversation
Force-pushed from d3e272d to 3652df2
Force-pushed from 3652df2 to 764975f
In the KV server, we abstract away the system config dependency through the spanconfig.StoreReader interface (also implemented by spanconfigstore.Store). The commit also:

- Fixes span generations for pseudo table IDs
- Introduces builtins to pretty-print spans

Within KV we now store what were previously references to zonepb.ZoneConfig as roachpb.SpanConfig. This is done both for the config stored on the Replica itself and as a front-end to accessing config.SystemConfig; we now talk to it through the StoreReader interface.

We gate the usage of the new infrastructure behind a cluster setting. When toggled, we switch the ConfReader source over to point to the embedded spanconfigstore.Store instead.

Release note: None
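For illustration, here is a minimal Go sketch of the shape of this change: callers depend on a small reader interface, and a setting-style toggle decides whether that interface is backed by the legacy gossiped system config or by an embedded span config store. All interface, type, and function names below are invented for the example and do not match the actual signatures in pkg/spanconfig.

```go
package main

import "fmt"

// StoreReader is a stand-in for a spanconfig.StoreReader-like interface;
// the real interface is richer than this.
type StoreReader interface {
	// GetSpanConfigForKey returns the config applicable to the given key.
	GetSpanConfigForKey(key string) (SpanConfig, bool)
}

// SpanConfig stands in for roachpb.SpanConfig.
type SpanConfig struct{ NumReplicas int }

// legacyReader and spanConfigStore are two hypothetical StoreReader
// implementations: one backed by the gossiped config.SystemConfig, the
// other by an embedded spanconfigstore.Store.
type legacyReader struct{ defaults SpanConfig }

func (l legacyReader) GetSpanConfigForKey(string) (SpanConfig, bool) { return l.defaults, true }

type spanConfigStore struct{ configs map[string]SpanConfig }

func (s spanConfigStore) GetSpanConfigForKey(k string) (SpanConfig, bool) {
	c, ok := s.configs[k]
	return c, ok
}

// confReader mimics gating the new infrastructure behind a cluster
// setting: callers always go through the interface, and the backing
// source is swapped based on the setting's value.
func confReader(useNewInfra bool, legacy, store StoreReader) StoreReader {
	if useNewInfra {
		return store
	}
	return legacy
}

func main() {
	legacy := legacyReader{defaults: SpanConfig{NumReplicas: 3}}
	store := spanConfigStore{configs: map[string]SpanConfig{"/Table/111": {NumReplicas: 42}}}

	r := confReader(true /* setting toggled on */, legacy, store)
	if cfg, ok := r.GetSpanConfigForKey("/Table/111"); ok {
		fmt.Println("num_replicas =", cfg.NumReplicas) // num_replicas = 42
	}
}
```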
Force-pushed from 764975f to 7ba4ff2
Cannibalized across #67679.
irfansharif added a commit to irfansharif/cockroach that referenced this pull request on Apr 14, 2022
Fixes cockroachdb#72389. Fixes cockroachdb#66063 (gated behind a cluster setting). This should drastically reduce the total number of ranges in the system, especially when running with a large number of tables and/or tenants. To understand what the new set of split points are, consider the following test snippet:

```
exec-sql tenant=11
CREATE DATABASE db;
CREATE TABLE db.t0();
CREATE TABLE db.t1();
CREATE TABLE db.t2();
CREATE TABLE db.t3();
CREATE TABLE db.t4();
CREATE TABLE db.t5();
CREATE TABLE db.t6();
CREATE TABLE db.t7();
CREATE TABLE db.t8();
CREATE TABLE db.t9();
ALTER TABLE db.t5 CONFIGURE ZONE using num_replicas = 42;
----

# If installing a unique zone config for a table in the middle, we
# should observe three splits (before, the table itself, and after).
diff offset=48
----
--- gossiped system config span (legacy)
+++ span config infrastructure (current)
...
 /Tenant/10                  database system (tenant)
 /Tenant/11                  database system (tenant)
+/Tenant/11/Table/106        range default
+/Tenant/11/Table/111        num_replicas=42
+/Tenant/11/Table/112        range default
```

This PR introduces two cluster settings to selectively opt into this optimization: spanconfig.{tenant,host}_coalesce_adjacent.enabled (defaulting to true and false respectively). We also don't coalesce system table ranges on the host tenant. We had a few implementation choices here:

(a) Down in KV, augment the spanconfig.Store to coalesce the in-memory state when updating entries. On every update, we'd check if the span we're writing to has the same config as the preceding and/or succeeding one, and if so, write out a larger span instead. We previously prototyped a form of this in cockroachdb#68491.

    Pros:
    - reduced memory footprint of spanconfig.Store
    - read path (computing splits) stays fast -- just have to look up the right span from the backing interval tree

    Cons:
    - uses more persisted storage than necessary
    - difficult to disable the optimization dynamically (though still possible -- we'd effectively have to restart the KVSubscriber and populate in-memory span config state using a store that does/doesn't coalesce configs)
    - difficult to implement; our original prototype did not have cockroachdb#73150, which is important for reducing reconciliation round trips

(b) Same as (a), but coalesce configs up in the spanconfig.Store maintained in the reconciler itself.

    Pros:
    - reduced storage use (both persisted and in-memory)
    - read path (computing splits) stays fast -- just have to look up the right span from the backing interval tree

    Cons:
    - very difficult to disable the optimization dynamically (each tenant process would need to be involved)
    - most difficult to implement

(c) Same as (a), but through another API on the spanconfig.Store interface that accepts only a single update at a time and does not generate deltas (not something we need down in KV). Removes the implementation complexity.

(d) Keep the contents of `system.span_configurations` and the in-memory state of spanconfig.Stores as they are today, uncoalesced. When determining split points, iterate through adjacent configs within the provided key bounds and see whether we could ignore certain split keys.

    Pros:
    - easiest to implement
    - easy to disable the optimization dynamically, for example through a cluster setting

    Cons:
    - uses more storage (persisted and in-memory) than necessary
    - read path (computing splits) is more expensive if iterating through adjacent configs

This PR implements option (d). For a benchmark of how slow (d) is going to be in practice with varying numbers of entries to scan over (10k, 100k, 1m):

```
$ dev bench pkg/spanconfig/spanconfigstore -f=BenchmarkStoreComputeSplitKey -v
BenchmarkStoreComputeSplitKey
BenchmarkStoreComputeSplitKey/num-entries=10000
BenchmarkStoreComputeSplitKey/num-entries=10000-10       1166842 ns/op
BenchmarkStoreComputeSplitKey/num-entries=100000
BenchmarkStoreComputeSplitKey/num-entries=100000-10     12273375 ns/op
BenchmarkStoreComputeSplitKey/num-entries=1000000
BenchmarkStoreComputeSplitKey/num-entries=1000000-10   140591766 ns/op
PASS
```

It's feasible that in the future we re-work this in favor of (c).

Release note: None
Release justification: high benefit change to existing functionality (affecting only multi-tenant clusters).
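As a rough illustration of the option (d) read path, the following self-contained Go sketch walks adjacent entries within the queried key bounds and only reports a split key where the config actually changes; a neighbor with an identical config contributes no split point. The entry layout, keys, and function signature are simplified assumptions for the example, not the actual spanconfigstore implementation (which keeps uncoalesced entries in an interval tree/btree).

```go
package main

import (
	"fmt"
	"sort"
)

// entry pairs a span start key with its config.
type entry struct {
	startKey string
	config   string // stand-in for roachpb.SpanConfig
}

// computeSplitKey sketches option (d): iterate over adjacent entries
// within [start, end) and return the first boundary whose config differs
// from its predecessor's; boundaries between identical configs are
// ignored as split points.
func computeSplitKey(entries []entry, start, end string) (string, bool) {
	sort.Slice(entries, func(i, j int) bool { return entries[i].startKey < entries[j].startKey })
	var prev *entry
	for i := range entries {
		e := &entries[i]
		if e.startKey < start || e.startKey >= end {
			continue
		}
		if prev != nil && prev.config != e.config {
			return e.startKey, true // configs differ: keep this split
		}
		prev = e
	}
	return "", false
}

func main() {
	entries := []entry{
		{"/Tenant/11/Table/106", "range default"},
		{"/Tenant/11/Table/107", "range default"}, // same config as predecessor: no split here
		{"/Tenant/11/Table/111", "num_replicas=42"},
		{"/Tenant/11/Table/112", "range default"},
	}
	if k, ok := computeSplitKey(entries, "/Tenant/11", "/Tenant/12"); ok {
		fmt.Println("next split key:", k) // next split key: /Tenant/11/Table/111
	}
}
```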
irfansharif added a commit to irfansharif/cockroach that referenced this pull request on Apr 27, 2022
craig bot pushed a commit that referenced this pull request on Apr 29, 2022
77975: sql,cli: implement `\password` cli command r=ZhouXing19 a=ZhouXing19

This commit adds support for the `\password` command, which securely changes the password for a user.

fixes #48543

Release note (cli change): add support for the `\password` cli command that enables secure alteration of the password for a user. The given password will always be pre-hashed with the password hash method obtained via the session variable `password-encryption`, i.e. `scram-sha-256` as the default hashing algorithm.

79700: spanconfig,kv: merge adjacent ranges with identical configs r=irfansharif a=irfansharif

**spanconfig,kv: merge adjacent ranges with identical configs**

Fixes #72389. Fixes #66063 (gated behind a cluster setting). This should drastically reduce the total number of ranges in the system, especially when running with a large number of tables and/or tenants. To understand what the new set of split points are, consider the following test snippet:

```
exec-sql tenant=11
CREATE DATABASE db;
CREATE TABLE db.t0();
CREATE TABLE db.t1();
CREATE TABLE db.t2();
CREATE TABLE db.t3();
CREATE TABLE db.t4();
CREATE TABLE db.t5();
CREATE TABLE db.t6();
CREATE TABLE db.t7();
CREATE TABLE db.t8();
CREATE TABLE db.t9();
ALTER TABLE db.t5 CONFIGURE ZONE using num_replicas = 42;
----

# If installing a unique zone config for a table in the middle, we
# should observe three splits (before, the table itself, and after).
diff offset=48
----
--- gossiped system config span (legacy)
+++ span config infrastructure (current)
...
 /Tenant/10                  database system (tenant)
 /Tenant/11                  database system (tenant)
+/Tenant/11/Table/106        range default
+/Tenant/11/Table/111        num_replicas=42
+/Tenant/11/Table/112        range default
```

This PR introduces two cluster settings to selectively opt into this optimization: spanconfig.{tenant,host}_coalesce_adjacent.enabled (defaulting to true and false respectively). We also don't coalesce system table ranges on the host tenant. We had a few implementation choices here:

(a) Down in KV, augment the spanconfig.Store to coalesce the in-memory state when updating entries. On every update, we'd check if the span we're writing to has the same config as the preceding and/or succeeding one, and if so, write out a larger span instead. We previously prototyped a form of this in #68491.

    Pros:
    - reduced memory footprint of spanconfig.Store
    - read path (computing splits) stays fast -- just have to look up the right span from the backing interval tree

    Cons:
    - uses more persisted storage than necessary
    - difficult to disable the optimization dynamically (though still possible -- we'd effectively have to restart the KVSubscriber and populate in-memory span config state using a store that does/doesn't coalesce configs)
    - difficult to implement; our original prototype did not have #73150, which is important for reducing reconciliation round trips

(b) Same as (a), but coalesce configs up in the spanconfig.Store maintained in the reconciler itself.

    Pros:
    - reduced storage use (both persisted and in-memory)
    - read path (computing splits) stays fast -- just have to look up the right span from the backing interval tree

    Cons:
    - very difficult to disable the optimization dynamically (each tenant process would need to be involved)
    - most difficult to implement

(c) Same as (a), but through another API on the spanconfig.Store interface that accepts only a single update at a time and does not generate deltas (not something we need down in KV). Removes the implementation complexity.

(d) Keep the contents of `system.span_configurations` and the in-memory state of spanconfig.Stores as they are today, uncoalesced. When determining split points, iterate through adjacent configs within the provided key bounds and see whether we could ignore certain split keys.

    Pros:
    - easiest to implement
    - easy to disable the optimization dynamically, for example through a cluster setting

    Cons:
    - uses more storage (persisted and in-memory) than necessary
    - read path (computing splits) is more expensive if iterating through adjacent configs

This PR implements option (d). For a benchmark of how slow (d) is going to be in practice with varying numbers of entries to scan over (10k, 100k, 1m):

```
$ dev bench pkg/spanconfig/spanconfigstore -f=BenchmarkStoreComputeSplitKey -v
BenchmarkStoreComputeSplitKey
BenchmarkStoreComputeSplitKey/num-entries=10000
BenchmarkStoreComputeSplitKey/num-entries=10000-10       1166842 ns/op
BenchmarkStoreComputeSplitKey/num-entries=100000
BenchmarkStoreComputeSplitKey/num-entries=100000-10     12273375 ns/op
BenchmarkStoreComputeSplitKey/num-entries=1000000
BenchmarkStoreComputeSplitKey/num-entries=1000000-10   140591766 ns/op
PASS
```

It's feasible that in the future we re-work this in favor of (c).

---

**spanconfig/store: use templatized btree impl instead**

```
$ dev bench pkg/spanconfig/spanconfigstore -f=BenchmarkStoreComputeSplitKey -v
BenchmarkStoreComputeSplitKey
BenchmarkStoreComputeSplitKey/num-entries=10000
BenchmarkStoreComputeSplitKey/num-entries=10000-10        431093 ns/op
BenchmarkStoreComputeSplitKey/num-entries=100000
BenchmarkStoreComputeSplitKey/num-entries=100000-10      4308200 ns/op
BenchmarkStoreComputeSplitKey/num-entries=1000000
BenchmarkStoreComputeSplitKey/num-entries=1000000-10    43827373 ns/op
PASS

$ benchstat old.txt new.txt # from previous commit
name                                         old time/op  new time/op  delta
StoreComputeSplitKey/num-entries=10000-10    1.17ms ± 0%  0.43ms ± 0%   ~     (p=1.000 n=1+1)
StoreComputeSplitKey/num-entries=100000-10   12.4ms ± 0%   4.3ms ± 0%   ~     (p=1.000 n=1+1)
StoreComputeSplitKey/num-entries=1000000-10   136ms ± 0%    44ms ± 0%   ~     (p=1.000 n=1+1)
```

---

**spanconfig/store: intern span configs**

- Avoid the expensive proto.Equal() when computing split keys;
- Reduce memory overhead of the data structure

```
$ dev bench pkg/spanconfig/spanconfigstore -f=BenchmarkStoreComputeSplitKey -v
BenchmarkStoreComputeSplitKey
BenchmarkStoreComputeSplitKey/num-entries=10000
BenchmarkStoreComputeSplitKey/num-entries=10000-10         90323 ns/op
BenchmarkStoreComputeSplitKey/num-entries=100000
BenchmarkStoreComputeSplitKey/num-entries=100000-10       915936 ns/op
BenchmarkStoreComputeSplitKey/num-entries=1000000
BenchmarkStoreComputeSplitKey/num-entries=1000000-10     9575781 ns/op

$ benchstat old.txt new.txt # from previous commit
name                                         old time/op  new time/op  delta
StoreComputeSplitKey/num-entries=10000-10     431µs ± 0%    90µs ± 0%   ~     (p=1.000 n=1+1)
StoreComputeSplitKey/num-entries=100000-10   4.31ms ± 0%  0.92ms ± 0%   ~     (p=1.000 n=1+1)
StoreComputeSplitKey/num-entries=1000000-10  43.8ms ± 0%   9.6ms ± 0%   ~     (p=1.000 n=1+1)
```

Release note: None
Release justification: high benefit change to existing functionality (affecting only multi-tenant clusters).

Co-authored-by: Jane Xing <zhouxing@uchicago.edu>
Co-authored-by: irfan sharif <irfanmahmoudsharif@gmail.com>
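The interning commit's core idea can be sketched in a few lines of Go: keep one canonical copy of each distinct config and hand out small integer IDs, so that adjacency checks compare IDs instead of calling proto.Equal on full configs. The type and method names below are hypothetical and simplified relative to the real spanconfigstore code.

```go
package main

import "fmt"

// interner deduplicates identical span configs so the store keeps one
// canonical copy and compares small integer IDs rather than invoking an
// expensive structural equality check on every adjacency comparison.
type interner struct {
	ids     map[string]int // keyed on a serialized form of the config
	configs []string       // canonical copies, indexed by ID
}

func newInterner() *interner {
	return &interner{ids: map[string]int{}}
}

// add returns a stable ID for the config, reusing the existing one if an
// identical config was interned before.
func (in *interner) add(serialized string) int {
	if id, ok := in.ids[serialized]; ok {
		return id
	}
	id := len(in.configs)
	in.configs = append(in.configs, serialized)
	in.ids[serialized] = id
	return id
}

func main() {
	in := newInterner()
	a := in.add(`num_replicas: 3`)
	b := in.add(`num_replicas: 3`) // identical config: same ID, no extra copy stored
	c := in.add(`num_replicas: 42`)

	// Adjacency checks now reduce to integer comparisons.
	fmt.Println(a == b, a == c) // true false
}
```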
blathers-crl bot pushed a commit that referenced this pull request on Apr 29, 2022