Skip to content

Commit

Permalink
Browse files Browse the repository at this point in the history
137319: vecstore: implement partition manipulation functions r=mw5h a=mw5h

Stub out the persistent storage vecstore implementation. Implement the methods for partition creation, retrieval and deletion. Add a simple test to ensure that these functions work properly.

137426: sql: make memo.IsStale more efficient when schema objects aren't modified r=fqazi a=fqazi

This patch speeds up memo staleness checks for prepared statements by:

1. Updates the lease manager to expose a generation ID which allows you to know if any descriptors changed
2. Updates the table stats cache to add a generation ID to determine if any new stats were added
3. Updates the IsStale check to confirm if either the generation IDs have changed, zone configs, search_path, changed or current database has changed. If everything was the same in the last execution, the full CheckDependencies can be skipped. This is only used for prepared statements.
5. Updates the schema changer logic to lease new descriptors if the schema is already leased. This allows invalidation to work within a search path if new objects are made.

Fixes: #105867


Numbers from sysbench (before these changes) using `rpsak.sh repeated`:
```
Before the changes (geomean TPS): 1121.75
After the changes (geomean TPS): 1165.47
Before the changes (geomean QPS): 22435.06
After the changes (geomean QPS): 23309.41
Percent improvement: 3.89%
```

Also from the sysbench workload numbers (from `@tbg)` write_only:
```
benchdiff --old lastmerge  ./pkg/sql/tests -b -r 'Sysbench/SQL/3node/oltp_write_only' -d 1000x -c 10

name                                   old time/op    new time/op    delta
Sysbench/SQL/3node/oltp_write_only-24    4.38ms ± 1%    4.23ms ± 1%  -3.38%  (p=0.000 n=10+10)

name                                   old alloc/op   new alloc/op   delta
Sysbench/SQL/3node/oltp_write_only-24     935kB ± 6%     933kB ± 6%    ~     (p=0.739 n=10+10)

name                                   old allocs/op  new allocs/op  delta
Sysbench/SQL/3node/oltp_write_only-24     6.08k ± 2%     6.04k ± 3%    ~     (p=0.060 n=10+10)
```

read_only:
```
benchdiff --old lastmerge  ./pkg/sql/tests -b -r 'Sysbench/SQL/3node/oltp_read_only' -d 1000x -c 10
test binaries already exist for d2bd29e: Merge #137750
test binaries already exist for 3cc7a6c: sql: add avoid_catalog_generation_for_staleness to

  pkg=1/1 iter=10/10 cockroachdb/cockroach/pkg/sql/tests \

name                                  old time/op    new time/op    delta
Sysbench/SQL/3node/oltp_read_only-24    7.88ms ± 1%    7.44ms ± 2%  -5.62%  (p=0.000 n=10+10)

name                                  old alloc/op   new alloc/op   delta
Sysbench/SQL/3node/oltp_read_only-24    1.20MB ± 4%    1.19MB ± 5%    ~     (p=0.353 n=10+10)

name                                  old allocs/op  new allocs/op  delta
Sysbench/SQL/3node/oltp_read_only-24     4.71k ± 3%     4.64k ± 3%  -1.48%  (p=0.022 n=10+10)
```

Co-authored-by: Matt White <matt.white@cockroachlabs.com>
Co-authored-by: Faizan Qazi <faizan@cockroachlabs.com>
  • Loading branch information
3 people committed Jan 14, 2025
3 parents 30a6267 + 780f5a6 + 3c749bc commit f29c65b
Show file tree
Hide file tree
Showing 38 changed files with 1,628 additions and 247 deletions.
1 change: 1 addition & 0 deletions docs/generated/metrics/metrics.html
Original file line number Diff line number Diff line change
Expand Up @@ -1720,6 +1720,7 @@
<tr><td>APPLICATION</td><td>sql.insights.anomaly_detection.memory</td><td>Current memory used to support anomaly detection</td><td>Memory</td><td>GAUGE</td><td>BYTES</td><td>AVG</td><td>NONE</td></tr>
<tr><td>APPLICATION</td><td>sql.leases.active</td><td>The number of outstanding SQL schema leases.</td><td>Outstanding leases</td><td>GAUGE</td><td>COUNT</td><td>AVG</td><td>NONE</td></tr>
<tr><td>APPLICATION</td><td>sql.leases.expired</td><td>The number of outstanding session based SQL schema leases expired.</td><td>Leases expired because of a new version</td><td>GAUGE</td><td>COUNT</td><td>AVG</td><td>NONE</td></tr>
<tr><td>APPLICATION</td><td>sql.leases.long_wait_for_initial_version</td><td>The number of wait for initial version routines taking more than the lease duration.</td><td>Number of wait for initial version routines executing</td><td>GAUGE</td><td>COUNT</td><td>AVG</td><td>NONE</td></tr>
<tr><td>APPLICATION</td><td>sql.leases.long_wait_for_no_version</td><td>The number of wait for no versions that are taking more than the lease duration.</td><td>Number of wait for long wait for no version routines executing</td><td>GAUGE</td><td>COUNT</td><td>AVG</td><td>NONE</td></tr>
<tr><td>APPLICATION</td><td>sql.leases.long_wait_for_one_version</td><td>The number of wait for one versions that are taking more than the lease duration.</td><td>Number of wait for long wait for one version routines executing</td><td>GAUGE</td><td>COUNT</td><td>AVG</td><td>NONE</td></tr>
<tr><td>APPLICATION</td><td>sql.leases.long_wait_for_two_version_invariant</td><td>The number of two version invariant waits that are taking more than the lease duration.</td><td>Number of two version invariant wait routines executing</td><td>GAUGE</td><td>COUNT</td><td>AVG</td><td>NONE</td></tr>
Expand Down
10 changes: 5 additions & 5 deletions pkg/bench/rttanalysis/testdata/benchmark_expectations
Original file line number Diff line number Diff line change
Expand Up @@ -27,7 +27,7 @@ exp,benchmark
8,AlterTableUnsplit/alter_table_unsplit_at_1_value
11,AlterTableUnsplit/alter_table_unsplit_at_2_values
14,AlterTableUnsplit/alter_table_unsplit_at_3_values
5,Audit/select_from_an_audit_table
2-4,Audit/select_from_an_audit_table
26,CreateRole/create_role_with_1_option
29,CreateRole/create_role_with_2_options
32,CreateRole/create_role_with_3_options
Expand All @@ -52,7 +52,7 @@ exp,benchmark
17,DropView/drop_2_views
17,DropView/drop_3_views
5,GenerateObjects/generate_1000_tables_-_this_test_should_use_the_same_number_of_RTTs_as_for_10_tables
11,GenerateObjects/generate_100_tables_from_template
8-10,GenerateObjects/generate_100_tables_from_template
5,GenerateObjects/generate_10_tables
16,GenerateObjects/generate_10x10_schemas_and_tables_in_existing_db
5,GenerateObjects/generate_50000_tables
Expand Down Expand Up @@ -125,9 +125,9 @@ exp,benchmark
20,Truncate/truncate_2_column_0_rows
20,Truncate/truncate_2_column_1_rows
20,Truncate/truncate_2_column_2_rows
3,UDFResolution/select_from_udf
5,UseManyRoles/use_2_roles
3,UseManyRoles/use_50_roles
0,UDFResolution/select_from_udf
2,UseManyRoles/use_2_roles
0,UseManyRoles/use_50_roles
1,VirtualTableQueries/select_crdb_internal.invalid_objects_with_1_fk
1,VirtualTableQueries/select_crdb_internal.tables_with_1_fk
5,VirtualTableQueries/virtual_table_cache_with_point_lookups
Expand Down
3 changes: 3 additions & 0 deletions pkg/sql/catalog/descpb/lease.proto
Original file line number Diff line number Diff line change
Expand Up @@ -32,4 +32,7 @@ message WaitStats {
(gogoproto.customtype) =
"github.com/cockroachdb/cockroach/pkg/util/uuid.UUID"
];
// TargetCount the number of leases we expected to observe, which will be
// non-zero for wait for initial version.
int32 target_count = 7;
}
7 changes: 7 additions & 0 deletions pkg/sql/catalog/descs/collection.go
Original file line number Diff line number Diff line change
Expand Up @@ -210,6 +210,13 @@ func (tc *Collection) ReleaseAll(ctx context.Context) {
tc.skipValidationOnWrite = false
}

// GetLeaseGeneration provides an integer which will change whenever new
// descriptor versions are available. This can be used for fast comparisons
// to make sure previously looked up information is still valid.
func (tc *Collection) GetLeaseGeneration() int64 {
return tc.leased.lm.GetLeaseGeneration()
}

// HasUncommittedTables returns true if the Collection contains uncommitted
// tables.
func (tc *Collection) HasUncommittedTables() (has bool) {
Expand Down
2 changes: 2 additions & 0 deletions pkg/sql/catalog/descs/leased_descriptors.go
Original file line number Diff line number Diff line change
Expand Up @@ -38,6 +38,8 @@ type LeaseManager interface {
) (decrAfterWait func())

GetSafeReplicationTS() hlc.Timestamp

GetLeaseGeneration() int64
}

type deadlineHolder interface {
Expand Down
46 changes: 29 additions & 17 deletions pkg/sql/catalog/lease/count.go
Original file line number Diff line number Diff line change
Expand Up @@ -31,6 +31,8 @@ import (
type countDetail struct {
// count is the number of unexpired leases
count int
// targetCount is the target number we are trying to reach.
targetCount int
// numSQLInstances is the number of distinct SQL instances with unexpired leases.
numSQLInstances int
// sampleSQLInstanceID is one of the sql_instance_id values we are waiting on,
Expand Down Expand Up @@ -292,23 +294,7 @@ func countLeasesByRegion(
} else {
err = queryRegionRows(ctx)
}
if err != nil {
if regionliveness.IsQueryTimeoutErr(err) {
// Probe and mark the region potentially.
probeErr := prober.ProbeLiveness(ctx, region)
if probeErr != nil {
err = errors.WithSecondaryError(err, probeErr)
return err
}
return errors.Wrapf(err, "count-lease timed out reading from a region")
} else if regionliveness.IsMissingRegionEnumErr(err) {
// Skip this region because we were unable to find region in
// type descriptor. Since the database regions are cached, they
// may be stale and have dropped regions.
log.Infof(ctx, "count-lease is skipping region %s because of the "+
"following error %v", region, err)
return nil
}
if err := handleRegionLivenessErrors(ctx, prober, region, err); err != nil {
return err
}
if values == nil {
Expand Down Expand Up @@ -343,3 +329,29 @@ func getCountLeaseColumns(usesOldSchema bool) string {
sb.WriteString(`, count(distinct sql_instance_id), ifnull(min(sql_instance_id),0)`)
return sb.String()
}

// handleRegionLivenessErrors handles errors that are linked to region liveness
// timeouts.
func handleRegionLivenessErrors(
ctx context.Context, prober regionliveness.Prober, region string, err error,
) error {
if err != nil {
if regionliveness.IsQueryTimeoutErr(err) {
// Probe and mark the region potentially.
probeErr := prober.ProbeLiveness(ctx, region)
if probeErr != nil {
err = errors.WithSecondaryError(err, probeErr)
return err
}
return errors.Wrapf(err, "count-lease timed out reading from a region")
} else if regionliveness.IsMissingRegionEnumErr(err) {
// Skip this region because we were unable to find region in
// type descriptor. Since the database regions are cached, they
// may be stale and have dropped regions.
log.Infof(ctx, "count-lease skipping region %s due to error: %v", region, err)
return nil
}
return err
}
return err
}
Loading

0 comments on commit f29c65b

Please sign in to comment.