
[prototype] *: s/config.SystemConfig/spanconfig.Store #68491

Conversation


@irfansharif irfansharif commented Aug 5, 2021

In KV server, we abstract away the system config dependency through the
spanconfig.StoreReader interface (also implemented by
spanconfigstore.Store). The commit also:

  • Fixes span generations for pseudo table IDs
  • Introduces builtins to pretty-print spans

Within KV, references that were previously zonepb.ZoneConfig are now
stored as roachpb.SpanConfig. This applies both to the config stored on
the Replica itself and to the front end for accessing
config.SystemConfig; we now talk to it through the StoreReader
interface.

We gate the use of the new infrastructure behind a cluster setting.
When toggled, we switch the ConfReader source over to point to the
embedded spanconfigstore.Store instead.

Release note: None
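
For a sense of the shape of this abstraction, here's a minimal Go sketch
of a reader interface of this kind. The method set below is an
approximation for illustration, not the authoritative
spanconfig.StoreReader definition:

```go
package spanconfig

import (
	"context"

	"github.com/cockroachdb/cockroach/pkg/roachpb"
)

// StoreReader sketches a read-only view over span configs. KV components
// depend on an interface like this rather than on config.SystemConfig
// directly, so the backing source can be swapped out underneath them.
type StoreReader interface {
	// NeedsSplit returns whether a split point exists in [start, end).
	NeedsSplit(ctx context.Context, start, end roachpb.RKey) bool
	// ComputeSplitKey returns the first split key in [start, end), if any.
	ComputeSplitKey(ctx context.Context, start, end roachpb.RKey) roachpb.RKey
	// GetSpanConfigForKey returns the config applying to the given key.
	GetSpanConfigForKey(ctx context.Context, key roachpb.RKey) (roachpb.SpanConfig, error)
}
```

Because KV talks only to such an interface, the cluster-setting gate
reduces to choosing which implementation backs the ConfReader: the
gossiped config.SystemConfig front end or the embedded
spanconfigstore.Store.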

@cockroach-teamcity

This change is Reviewable

@irfansharif irfansharif force-pushed the 210804.replace-syscfgspan branch 7 times, most recently from d3e272d to 3652df2 on August 16, 2021 17:49
@irfansharif irfansharif changed the title [wip,prototype] kvserver: s/zonepb.ZoneConfig/roachpb.SpanConfig [prototype] *: s/config.SystemConfig/spanconfig.Store Aug 16, 2021
@irfansharif irfansharif marked this pull request as ready for review August 16, 2021 17:49
@irfansharif irfansharif requested a review from a team August 16, 2021 17:49
@irfansharif irfansharif requested review from a team as code owners August 16, 2021 17:49
@irfansharif irfansharif requested review from pbardea, erikgrinaker and arulajmani and removed request for a team, pbardea and erikgrinaker August 16, 2021 17:49
@irfansharif irfansharif force-pushed the 210804.replace-syscfgspan branch from 3652df2 to 764975f on August 17, 2021 15:00
@irfansharif

Cannibalized across #67679.

@irfansharif irfansharif deleted the 210804.replace-syscfgspan branch September 13, 2021 18:13
@irfansharif irfansharif restored the 210804.replace-syscfgspan branch April 8, 2022 18:42
irfansharif added a commit to irfansharif/cockroach that referenced this pull request Apr 14, 2022
Fixes cockroachdb#72389.
Fixes cockroachdb#66063 (gated behind a cluster setting).

This should drastically reduce the total number of ranges in the system,
especially when running with a large number of tables and/or tenants. To
understand what the new split points are, consider the following test
snippet:

    exec-sql tenant=11
    CREATE DATABASE db;
    CREATE TABLE db.t0();
    CREATE TABLE db.t1();
    CREATE TABLE db.t2();
    CREATE TABLE db.t3();
    CREATE TABLE db.t4();
    CREATE TABLE db.t5();
    CREATE TABLE db.t6();
    CREATE TABLE db.t7();
    CREATE TABLE db.t8();
    CREATE TABLE db.t9();
    ALTER TABLE db.t5 CONFIGURE ZONE USING num_replicas = 42;
    ----

    # If installing a unique zone config for a table in the middle, we
    # should observe three splits (before, the table itself, and after).

    diff offset=48
    ----
    --- gossiped system config span (legacy)
    +++ span config infrastructure (current)
    ...
     /Tenant/10                                 database system (tenant)
     /Tenant/11                                 database system (tenant)
    +/Tenant/11/Table/106                       range default
    +/Tenant/11/Table/111                       num_replicas=42
    +/Tenant/11/Table/112                       range default

This PR introduces two cluster settings to selectively opt into this
optimization: spanconfig.{tenant,host}_coalesce_adjacent.enabled
(defaulting to true and false respectively). We also don't coalesce
system table ranges on the host tenant. We had a few implementation
choices here:

(a) Down in KV, augment the spanconfig.Store to coalesce the in-memory
    state when updating entries. On every update, we'd check if the span
    we're writing to has the same config as the preceding and/or
    succeeding one, and if so, write out a larger span instead. We
    previously prototyped a form of this in cockroachdb#68491.

    Pros:
    - reduced memory footprint of spanconfig.Store
    - read path (computing splits) stays fast -- just have to look up
      the right span from the backing interval tree
    Cons:
    - uses more persisted storage than necessary
    - difficult to disable the optimization dynamically (though still
      possible -- we'd effectively have to restart the KVSubscriber and
      populate in-memory span config state using a store that
      does/doesn't coalesce configs)
    - difficult to implement; our original prototype did not have cockroachdb#73150,
      which is important for reducing reconciliation round trips

(b) Same as (a), but coalesce configs up in the spanconfig.Store
    maintained in the reconciler itself.

    Pros:
    - reduced storage use (both persisted and in-memory)
    - read path (computing splits) stays fast -- just have to look up
      the right span from the backing interval tree
    Cons:
    - very difficult to disable the optimization dynamically (each
      tenant process would need to be involved)
    - most difficult to implement

(c) Same as (a) but through another API on the spanconfig.Store
    interface that accepts only a single update at a time and does not
    generate deltas (not something we need down in KV). Removes the
    implementation complexity.

(d) Keep the contents of `system.span_configurations` and the in-memory
    state of spanconfig.Stores as they are today, uncoalesced. When
    determining split points, iterate through adjacent configs within
    the provided key bounds and see whether we could ignore certain
    split keys (see the sketch following this list).

    Pros:
    - easiest to implement
    - easy to disable the optimization dynamically, e.g. through a
      cluster setting
    Cons:
    - uses more storage (persisted and in-memory) than necessary
    - read path (computing splits) is more expensive if iterating
      through adjacent configs
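
To make option (d) concrete, here's a minimal, runnable Go sketch of the
read path. The names and types are illustrative stand-ins (a sorted slice
of entries, strings for configs) rather than the actual implementation,
which walks the store's interval tree; the boundary-skipping logic is the
point:

```go
package main

import (
	"bytes"
	"fmt"
)

// entry pairs a span's start key with its config. Entries are assumed
// sorted by start key and contiguous; configs are strings standing in
// for roachpb.SpanConfig.
type entry struct {
	start  []byte
	config string
}

// computeSplitKey returns the first boundary in (start, end) separating
// two entries with different configs. Per option (d), the stored state
// stays uncoalesced and identical-config boundaries are skipped at read
// time. Returns nil if no split is needed.
func computeSplitKey(entries []entry, start, end []byte, coalesce bool) []byte {
	for i := 1; i < len(entries); i++ {
		boundary := entries[i].start
		if bytes.Compare(boundary, start) <= 0 || bytes.Compare(boundary, end) >= 0 {
			continue // boundary falls outside the requested key range
		}
		if coalesce && entries[i].config == entries[i-1].config {
			continue // identical adjacent configs: ignore this split key
		}
		return boundary
	}
	return nil
}

func main() {
	entries := []entry{
		{[]byte("/Tenant/11/Table/106"), "range default"},
		{[]byte("/Tenant/11/Table/107"), "range default"},
		{[]byte("/Tenant/11/Table/111"), "num_replicas=42"},
		{[]byte("/Tenant/11/Table/112"), "range default"},
	}
	// /Table/107 is skipped (same config as /Table/106); /Table/111 is
	// the first real split point.
	fmt.Printf("%s\n", computeSplitKey(entries,
		[]byte("/Tenant/11/Table/106"), []byte("/Tenant/12"), true))
}
```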

----

This PR implements option (d). For a benchmark of how slow (d) is in
practice with varying numbers of entries to scan over (10k, 100k, 1m):

    $ dev bench pkg/spanconfig/spanconfigstore -f=BenchmarkStoreComputeSplitKey -v

    BenchmarkStoreComputeSplitKey
    BenchmarkStoreComputeSplitKey/num-entries=10000
    BenchmarkStoreComputeSplitKey/num-entries=10000-10               1166842 ns/op
    BenchmarkStoreComputeSplitKey/num-entries=100000
    BenchmarkStoreComputeSplitKey/num-entries=100000-10             12273375 ns/op
    BenchmarkStoreComputeSplitKey/num-entries=1000000
    BenchmarkStoreComputeSplitKey/num-entries=1000000-10           140591766 ns/op
    PASS

It's feasible that we rework this in favor of (c) in the future.

Release note: None
Release justification: high benefit change to existing functionality
(affecting only multi-tenant clusters).
@irfansharif irfansharif deleted the 210804.replace-syscfgspan branch April 14, 2022 21:17
irfansharif added a commit to irfansharif/cockroach that referenced this pull request Apr 27, 2022
craig bot pushed a commit that referenced this pull request Apr 29, 2022
77975: sql,cli: implement `\password` cli command r=ZhouXing19 a=ZhouXing19

This commit adds support for the `\password` command, which securely
changes the password for a user.

fixes #48543

Release note (cli change): add support for the `\password` CLI command,
which enables secure alteration of a user's password. The given password
is always pre-hashed with the password hash method obtained via the
session variable `password_encryption`, i.e. `scram-sha-256` as the
default hashing algorithm.

79700: spanconfig,kv: merge adjacent ranges with identical configs  r=irfansharif a=irfansharif

**spanconfig/store: use templatized btree impl instead**

```
    $ dev bench pkg/spanconfig/spanconfigstore -f=BenchmarkStoreComputeSplitKey -v
    BenchmarkStoreComputeSplitKey
    BenchmarkStoreComputeSplitKey/num-entries=10000
    BenchmarkStoreComputeSplitKey/num-entries=10000-10                431093 ns/op
    BenchmarkStoreComputeSplitKey/num-entries=100000
    BenchmarkStoreComputeSplitKey/num-entries=100000-10              4308200 ns/op
    BenchmarkStoreComputeSplitKey/num-entries=1000000
    BenchmarkStoreComputeSplitKey/num-entries=1000000-10            43827373 ns/op
    PASS

    $ benchstat old.txt new.txt # from previous commit
    name                                         old time/op  new time/op  delta
    StoreComputeSplitKey/num-entries=10000-10    1.17ms ± 0%  0.43ms ± 0%   ~     (p=1.000 n=1+1)
    StoreComputeSplitKey/num-entries=100000-10   12.4ms ± 0%   4.3ms ± 0%   ~     (p=1.000 n=1+1)
    StoreComputeSplitKey/num-entries=1000000-10   136ms ± 0%    44ms ± 0%   ~     (p=1.000 n=1+1)
```
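
The speedup comes from specializing the btree to its element type, so
comparisons are direct calls rather than dynamic dispatch through an
interface-based interval tree. As a loose illustration of the
specialization idea only (using Go generics and a toy sorted container,
not the actual templatized btree):

```go
package main

import (
	"fmt"
	"sort"
)

// lesser is the ordering constraint. With a concrete type argument, the
// Less calls below compile to direct method calls, which is where a
// type-specialized container gains over an interface-based one.
type lesser[T any] interface{ Less(T) bool }

// sortedSlice is a toy stand-in for a type-specialized btree: a sorted
// slice with binary-search insertion, generic over the element type.
type sortedSlice[E lesser[E]] struct{ elems []E }

func (s *sortedSlice[E]) insert(e E) {
	// Find the first index i with e < elems[i]; insert just before it.
	i := sort.Search(len(s.elems), func(i int) bool { return e.Less(s.elems[i]) })
	var zero E
	s.elems = append(s.elems, zero)
	copy(s.elems[i+1:], s.elems[i:])
	s.elems[i] = e
}

// spanEntry orders entries by start key, mirroring how span config
// entries are keyed.
type spanEntry struct{ start string }

func (a spanEntry) Less(b spanEntry) bool { return a.start < b.start }

func main() {
	var s sortedSlice[spanEntry]
	for _, k := range []string{"/Table/111", "/Table/106", "/Table/112"} {
		s.insert(spanEntry{start: k})
	}
	fmt.Println(s.elems) // sorted by start key
}
```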

---

**spanconfig/store: intern span configs**

- Avoid the expensive proto.Equal() when computing split keys;
- Reduce memory overhead of the data structure
```
    $ dev bench pkg/spanconfig/spanconfigstore -f=BenchmarkStoreComputeSplitKey -v
    BenchmarkStoreComputeSplitKey
    BenchmarkStoreComputeSplitKey/num-entries=10000
    BenchmarkStoreComputeSplitKey/num-entries=10000-10                 90323 ns/op
    BenchmarkStoreComputeSplitKey/num-entries=100000
    BenchmarkStoreComputeSplitKey/num-entries=100000-10               915936 ns/op
    BenchmarkStoreComputeSplitKey/num-entries=1000000
    BenchmarkStoreComputeSplitKey/num-entries=1000000-10             9575781 ns/op

    $ benchstat old.txt new.txt # from previous commit
    name                                         old time/op  new time/op  delta
    StoreComputeSplitKey/num-entries=10000-10     431µs ± 0%    90µs ± 0%   ~     (p=1.000 n=1+1)
    StoreComputeSplitKey/num-entries=100000-10   4.31ms ± 0%  0.92ms ± 0%   ~     (p=1.000 n=1+1)
    StoreComputeSplitKey/num-entries=1000000-10  43.8ms ± 0%   9.6ms ± 0%   ~     (p=1.000 n=1+1)
```
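
A minimal sketch of the interning idea, with illustrative names (strings
stand in for encoded roachpb.SpanConfig protos): each distinct config is
stored once and handed out as a small integer ID, so the hot path
compares IDs instead of calling proto.Equal():

```go
package main

import "fmt"

// interner deduplicates configs: each distinct config is stored exactly
// once and referenced by a small integer ID. Comparing two IDs replaces
// a structural comparison (proto.Equal in the real code) on the read
// path, and shared configs are no longer stored per-entry.
type interner struct {
	ids     map[string]uint64 // encoded config -> ID
	configs []string          // ID -> canonical config
}

func newInterner() *interner {
	return &interner{ids: make(map[string]uint64)}
}

// intern returns a stable ID for the given config, allocating one the
// first time the config is seen.
func (in *interner) intern(config string) uint64 {
	if id, ok := in.ids[config]; ok {
		return id
	}
	id := uint64(len(in.configs))
	in.configs = append(in.configs, config)
	in.ids[config] = id
	return id
}

func main() {
	in := newInterner()
	a := in.intern("num_replicas=3")
	b := in.intern("num_replicas=3")
	c := in.intern("num_replicas=42")
	fmt.Println(a == b, a == c) // true false: cheap ID equality
}
```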

Release note: None
Release justification: high benefit change to existing functionality
(affecting only multi-tenant clusters).

Co-authored-by: Jane Xing <zhouxing@uchicago.edu>
Co-authored-by: irfan sharif <irfanmahmoudsharif@gmail.com>
blathers-crl bot pushed a commit that referenced this pull request Apr 29, 2022