-
Notifications
You must be signed in to change notification settings - Fork 4k
sql: support collecting and adding single-column partial table statistics USING EXTREMES to system.table_statistics
#91228
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql: support collecting and adding single-column partial table statistics USING EXTREMES to system.table_statistics
#91228
Conversation
24be4da to
623f005
Compare
USING EXTREMES to system.table_statistics
623f005 to
cc480a2
Compare
cc480a2 to
25dac52
Compare
rytaft
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work!
Reviewed 10 of 10 files at r1, all commit messages.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @faizaanmadhani and @michae2)
pkg/jobs/jobspb/jobs.proto line 981 at r1 (raw file):
bool delete_other_stats = 8; // If true, will collect partial table statistics at extreme values
nit: end comment with a period.
pkg/sql/create_stats.go line 266 at r1 (raw file):
var multiColEnabled bool // Disable multi-column stats if partial statistics at the extremes are requested // TODO (faizaanmadhani): Add support for multi-column stats
nit: end with period.
pkg/sql/create_stats.go line 269 at r1 (raw file):
if n.Options.UsingExtremes { multiColEnabled = false } else {
nit: you don't need an else here, I'd just say
if !n.Options.UsingExtremes {
multiColEnabled = ...
}
pkg/sql/distsql_plan_stats.go line 46 at r1 (raw file):
const histogramSamples = 10000 // constructContraintsForExtremes(
looks like a leftover
pkg/sql/distsql_plan_stats.go line 61 at r1 (raw file):
) func (dsp *DistSQLPlanner) createPartialStatsPlan(
It looks like a lot of the code in this function is duplicated from createStatsPlan. Can you extract the common logic into a helper function that is called by both?
pkg/sql/distsql_plan_stats.go line 75 at r1 (raw file):
return nil, err } // Map the ColumnIDs to their histograms if they exist
nit: end with period.
here and comments below
pkg/sql/distsql_plan_stats.go line 88 at r1 (raw file):
} if len(reqStat.columns) > 1 {
nit: check this at the top of the function so you don't waste time computing the tableStatsMap if it's not needed
In general you should check for valid input at the top of functions if needed
pkg/sql/distsql_plan_stats.go line 90 at r1 (raw file):
if len(reqStat.columns) > 1 { // TODO (faizaanmadhani): Add support for creating multi-column stats return nil, errors.Newf("multi-column partial statistics are not currently supported")
Let's use pgerror.Newf with pgcode FeatureNotSupported
pkg/sql/distsql_plan_stats.go line 122 at r1 (raw file):
sb.Init(planCtx.EvalContext(), planCtx.ExtendedEvalCtx.Codec, desc, scan.index) var sketchSpec, invSketchSpec []execinfrapb.SketchSpec
nit: move this closer to where it's used below
pkg/sql/distsql_plan_stats.go line 124 at r1 (raw file):
var sketchSpec, invSketchSpec []execinfrapb.SketchSpec histogram, ok := tableStatsMap[column.GetID()]
It looks like this is the only place you're using the tableStatsMap, so I wouldn't bother creating a map at all -- just loop through the stats here to find the one stat you care about. You also don't need to bother fetching from the cache until you get here either.
pkg/sql/distsql_plan_stats.go line 126 at r1 (raw file):
histogram, ok := tableStatsMap[column.GetID()] if !ok { return nil, errors.Newf("column %s does not have a prior statistic", column.GetName())
Let's try to find an appropriate pgcode for these errors. MaybeObjectNotInPrerequisiteState?
pkg/sql/distsql_plan_stats.go line 150 at r1 (raw file):
return nil, err } scan.isFull = false
nit: the default value of booleans is false, so you don't need to set this explicitly
pkg/sql/distsql_plan_stats.go line 209 at r1 (raw file):
invSketchSpec = append(invSketchSpec, spec) } else { // Append the new sketch specification to the slice of specifications.
nit: this kind of comment is not especially useful -- it's easy to see what's happening just from looking at the code.
pkg/sql/distsql_plan_stats.go line 574 at r1 (raw file):
func constructUpperBoundSpan(histogram []cat.HistogramBucket) constraint.Span { var ubSpan constraint.Span if len(histogram) == 0 {
but you already checked this before calling this function. Do you want to allow an empty histogram or not?
pkg/sql/distsql_plan_stats.go line 592 at r1 (raw file):
} // convertUsingExtremesToPredicate returns string of a predicate identifying
nit: convertUsingExtremesToPredicate -> constructUsingExtremesPredicate
pkg/sql/distsql_plan_stats.go line 597 at r1 (raw file):
lower constraint.Span, upper constraint.Span, columnName string, ) string { predicate := fmt.Sprintf("%s <= %s OR %s >= %s", columnName, lower.EndKey().Value(0).String(), columnName, upper.StartKey().Value(0).String())
should this be <= and >= or < and >? Or maybe you should check whether the boundaries are constraint.IncludeBoundary or constraint.ExcludeBoundary?
pkg/sql/scan.go line 236 at r1 (raw file):
func (n *scanNode) initDescSpecificCol(colCfg scanColumnsConfig, prefixCol catalog.Column) error { n.colCfg = colCfg indexes := n.desc.AllIndexes()
I think we probably want ActiveIndexes
pkg/sql/scan.go line 242 at r1 (raw file):
// table where prefixCol is the key column. foundIndex := false for _, idx := range indexes {
We should probably check that this is a forward index (i.e., GetType() == IndexDescriptor_FORWARD)
pkg/sql/scan.go line 253 at r1 (raw file):
} if !foundIndex { return errors.Newf("table %s does not contain an index with %s as a prefix column", n.desc.GetName(), prefixCol.GetName())
Let's use a pgcode error for this with pgerror.Newf. Maybe pgcode.InvalidColumnReference or pgcode.InvalidParameterValue?
pkg/sql/scan.go line 260 at r1 (raw file):
return err } // Set up the rest of the scanNode
nit: end with period.
pkg/sql/stats/new_stat.go line 98 at r1 (raw file):
if partialPredicate == "" { predicateValue = nil } else {
nit: you don't need else here, since the default constructed value should already be nil. just do:
if partialPredicate != "" {
...
}
pkg/sql/stats/new_stat.go line 113 at r1 (raw file):
"avgSize", histogram, "partialPredicate"
formatting seems messed up
pkg/sql/stats/table_statistic.proto line 59 at r1 (raw file):
// The average row size of the columns in ColumnIDs. uint64 avg_size = 10; // Is a predicate representing the range of a partial statistic,
nit: "Is a predicate" -> "A predicate"
pkg/sql/logictest/testdata/logic_test/distsql_stats line 1904 at r1 (raw file):
u_defaults {d,c,b,a} # Test single column partial statistics creation
nit: end with a period here and below.
pkg/sql/logictest/testdata/logic_test/distsql_stats line 2028 at r1 (raw file):
statement error pq: table xy does not contain an index with y as a prefix column CREATE STATISTICS xy_y_partial ON y FROM xy USING EXTREMES;
it would be good to add a couple of tests with null values too -- e.g., if the only value in a table is null, or if there exists a null value along with the other values
79327b0 to
7d39975
Compare
DrewKimball
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great job! I'm not super familiar with the stats code, but I left a few questions and nits. By the way, you can respond to comments you've addressed with Done. in order to mark them as resolved.
Reviewed 3 of 10 files at r1, 7 of 7 files at r2, all commit messages.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @faizaanmadhani, @michae2, and @rytaft)
pkg/sql/create_stats.go line 273 at r2 (raw file):
return nil, err } deleteOtherStats = true
Do we want to be deleting old stats when we collect partial statistics?
Also, will there ever be a case where we don't specify column names for partial stats? It looks from the RFC like this is possible, but that we only collect for the first column in each index, is that correct?
pkg/sql/distsql_plan_stats.go line 234 at r2 (raw file):
extremesPredicate := constructUsingExtremesPredicate(lowerBoundSpan, upperBoundSpan, column.GetName()) var extremesSpans constraint.Spans extremesSpans.InitSingleSpan(&lowerBoundSpan)
Shouldn't upperBoundSpan be added here?
pkg/sql/distsql_plan_stats.go line 461 at r2 (raw file):
tableDesc := tabledesc.NewBuilder(&details.Table).BuildImmutableTable() if details.UsingExtremes { // Currently, we limit the number of requested for partial statistics
[nit] requested -> requests
pkg/sql/scan.go line 242 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
We should probably check that this is a forward index (i.e.,
GetType() == IndexDescriptor_FORWARD)
Should we also check that the index isn't partial? I guess the index predicate could match the partial stats predicate, but that seems complicated to handle.
Similarly, I think the span generation logic needs to know if the index is ascending or descending - we should either handle it there or filter out the case here.
pkg/sql/rowexec/sample_aggregator.go line 492 at r2 (raw file):
for i, c := range si.spec.Columns { columnIDs[i] = s.sampledCols[c]
[super nit] did you mean to add this newline?
pkg/sql/rowexec/sample_aggregator.go line 496 at r2 (raw file):
// Delete old stats that have been superseded. if err := stats.DeleteOldStatsForColumns(
Similar question to above, do we want to be deleting old stats after gathering partial statistics?
pkg/sql/span/span_builder.go line 246 at r2 (raw file):
// spans. A span.Splitter can be used to generate more specific // family spans from constraints, datums and encdatums. func (s *Builder) SpansFromConstraintSpan(
[nit] Should we just unexport appendSpansFromConstraintSpan? It seems like that's what you really want to use - you'd just call it once for each constraint span since it already has appending behavior.
pkg/sql/logictest/testdata/logic_test/distsql_stats line 2028 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
it would be good to add a couple of tests with null values too -- e.g., if the only value in a table is null, or if there exists a null value along with the other values
Probably also want tests with a partial index and a descending index.
7d39975 to
650b723
Compare
15ceba1 to
b337823
Compare
faizaanmadhani
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the reviews! I've addressed all the comments. Adding support for some of the things we discussed, like making sure we would only use the latest full stat collection when doing the next partial stat collection required updating the show statistics code to include a partial_predicate column to avoid some errors being thrown, which I intended to do later but have done in this PR. Tests for it have been included as well.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball, @michae2, and @rytaft)
pkg/sql/distsql_plan_stats.go line 461 at r2 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
[nit] requested -> requests
Done.
pkg/sql/rowexec/sample_aggregator.go line 496 at r2 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
Similar question to above, do we want to be deleting old stats after gathering partial statistics?
Done.
pkg/sql/stats/new_stat.go line 113 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
formatting seems messed up
Done.
pkg/sql/logictest/testdata/logic_test/distsql_stats line 2028 at r1 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
Probably also want tests with a partial index and a descending index.
Done.
b337823 to
4bfe553
Compare
michae2
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some of this needs to be conditioned on the V23_1AddPartialStatisticsPredicateCol version gate. I'll try to identify all the spots.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball and @rytaft)
rytaft
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 6 files at r3, 10 of 10 files at r4, all commit messages.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball and @faizaanmadhani)
-- commits line 10 at r4:
this should also be added to the release note
pkg/sql/distsql_plan_stats.go line 81 at r4 (raw file):
// since we only support one reqStat at a time. for _, s := range reqStats { sampler.MaxFractionIdle = details.MaxFractionIdle
You can remove this since you're doing it above the loop
pkg/sql/distsql_plan_stats.go line 171 at r4 (raw file):
) (*PhysicalPlan, error) { // We can only process one statistic
nit: end with period.
Also -- should you return an error if len(reqStats) != 1?
pkg/sql/distsql_plan_stats.go line 420 at r4 (raw file):
} // Estimate the expected number of rows based on existing stats in the cache.
This comment should be moved inside createAndAttachSamplers, just above the declaration of the rowsExpected variable
pkg/sql/distsql_plan_stats.go line 469 at r4 (raw file):
if len(reqStats) > 1 { return nil, errors.New("cannot process multiple partial statistics at once") }
Let's put this check at the top of createPartialStatsPlan
pkg/sql/distsql_plan_stats.go line 518 at r4 (raw file):
predicate = fmt.Sprintf("%s < %s OR %s > %s", columnName, extremesSpans.Get(0).EndKey().Value(0).String(), columnName, extremesSpans.Get(1).StartKey().Value(0).String()) } else { predicate = fmt.Sprintf("%s < %s OR %s > %s", columnName, extremesSpans.Get(1).StartKey().Value(0).String(), columnName, extremesSpans.Get(0).EndKey().Value(0).String())
I realized thanks to your new tests that this predicate is technically not correct right now, since the spans include null values. Either we should change the predicate to col < %s OR col > %s OR col IS NULL, or we should update the spans to exclude NULL values. I'm not sure which is best, but I don't really see any harm in collecting nulls, so for now I think you can just update the predicate here to include IS NULL.
pkg/sql/distsql_plan_stats.go line 524 at r4 (raw file):
// constructExtremesSpans returns a constraint.Spans consisting of a lowerbound and upperbound span // covering the extremes of an index
nit: comments should be <= 80 columns wide and end with a period.
pkg/sql/distsql_plan_stats.go line 532 at r4 (raw file):
lowerBound := histogram[0].UpperBound upperBound := constraint.MakeKey(histogram[len(histogram)-1].UpperBound) // Pick the latest lowerBound that is not null,
nit: latest -> earliest ?
pkg/sql/distsql_plan_stats.go line 534 at r4 (raw file):
// Pick the latest lowerBound that is not null, // but if none exist, return error for _, hist := range histogram {
instead of copying each bucket, this would be more efficient:
for i := range histogram {
if histogram[i].UpperBound.Compare ...
}
or
for i := range histogram {
hist := &histogram[i]
if hist.UpperBound.Compare ...
}
pkg/sql/scan.go line 234 at r4 (raw file):
} // initDescSpecificCol initializes the column structures with the index that the provided column
nit: comments should be <= 80 columns wide
pkg/sql/scan.go line 235 at r4 (raw file):
// initDescSpecificCol initializes the column structures with the index that the provided column // is the prefix for, and returns that index.
doesn't look like it returns the index....
pkg/sql/show_stats.go line 97 at r4 (raw file):
"nullCount", "avgSize", "partialPredicate",
nit: formatting is messed up
pkg/sql/rowexec/sample_aggregator.go line 497 at r4 (raw file):
// if the new statistic is not partial if si.spec.PartialPredicate == "" { if err := stats.DeleteOldStatsForColumns(
FYI -- Inside this DeleteOldStatsForColumns function, I think we'll probably also want to add a check for the partial predicate (i.e., the few old stats we keep should be full, not partial). This won't really matter until you add automatic partial stats, but just wanted to mention it now so I don't forget later. No need to make the change now, but just FYI.
pkg/sql/stats/forecast.go line 248 at r4 (raw file):
NullCount: uint64(nullCount), AvgSize: uint64(avgSize), IsForecast: true,
Is this needed? We're already using the ForecastStatsName for this purpose
pkg/sql/stats/stats_cache.go line 720 at r4 (raw file):
"nullCount", "avgSize", "partialPredicate",
nit: formatting messed up
pkg/sql/stats/table_statistic.proto line 64 at r4 (raw file):
// A boolean which is true if the statistic has been // forecasted, and false if not bool is_forecast = 12;
Like I mentioned above -- I'm not convinced we need this
pkg/sql/logictest/testdata/logic_test/distsql_stats line 2028 at r1 (raw file):
Previously, faizaanmadhani (Faizaan Madhani) wrote…
Done.
I don't think you added any tests with partial indexes
pkg/sql/logictest/testdata/logic_test/distsql_stats line 2044 at r4 (raw file):
# Test descending indexes. statement ok create table d_desc (a int, b int, index (a desc, b));
nit: use all caps for SQL keywords. here and below
pkg/sql/logictest/testdata/logic_test/distsql_stats line 2094 at r4 (raw file):
# Test if requesting a partial stat again uses the previous full stat and not the previous partial stat. statement ok CREATE STATISTICS sdnp2 ON a FROM d_desc USING EXTREMES
I don't think this is testing what you want, since the previous partial stat didn't actually scan any non-null values.
86ef2db to
3f61eb9
Compare
faizaanmadhani
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball and @rytaft)
Previously, rytaft (Rebecca Taft) wrote…
this should also be added to the release note
Done.
pkg/sql/distsql_plan_stats.go line 81 at r4 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
You can remove this since you're doing it above the loop
Done.
pkg/sql/distsql_plan_stats.go line 171 at r4 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
nit: end with period.
Also -- should you return an error if
len(reqStats) != 1?
Right now I'm doing it on line 467 before calling this function. As per your comment below, I've moved it inside this function 🙂
pkg/sql/distsql_plan_stats.go line 420 at r4 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
This comment should be moved inside
createAndAttachSamplers, just above the declaration of therowsExpectedvariable
Done.
pkg/sql/distsql_plan_stats.go line 469 at r4 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
Let's put this check at the top of
createPartialStatsPlan
Done.
pkg/sql/distsql_plan_stats.go line 518 at r4 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
I realized thanks to your new tests that this predicate is technically not correct right now, since the spans include null values. Either we should change the predicate to
col < %s OR col > %s OR col IS NULL, or we should update the spans to exclude NULL values. I'm not sure which is best, but I don't really see any harm in collecting nulls, so for now I think you can just update the predicate here to include IS NULL.
Done.
pkg/sql/distsql_plan_stats.go line 534 at r4 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
instead of copying each bucket, this would be more efficient:
for i := range histogram { if histogram[i].UpperBound.Compare ... }or
for i := range histogram { hist := &histogram[i] if hist.UpperBound.Compare ... }
Done.
pkg/sql/scan.go line 235 at r4 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
doesn't look like it returns the index....
Yes, that was an accident. I wrote the comment when I started editing the function but missed changing it when I was done.
pkg/sql/stats/table_statistic.proto line 64 at r4 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
Like I mentioned above -- I'm not convinced we need this
Removed.
pkg/sql/logictest/testdata/logic_test/distsql_stats line 2028 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
I don't think you added any tests with partial indexes
Yup, sorry. Should have added an error test for partial indexes. It's been added now.
pkg/sql/logictest/testdata/logic_test/distsql_stats line 2094 at r4 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
I don't think this is testing what you want, since the previous partial stat didn't actually scan any non-null values.
Yeah, I see. I removed this test and made a similar one higher up to an earlier table without null values, where the previous partial stat does retrieve not null columns.
pkg/sql/stats/forecast.go line 248 at r4 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
Is this needed? We're already using the
ForecastStatsNamefor this purpose
Oh, I didn't realize that :( !
Hence, I added this additional field to account for it. I'll use ForecastStatsName instead.
3f61eb9 to
7b8919c
Compare
rytaft
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice job working through all the comments! No other comments for now, but I'll take another look once you add checks for V23_1AddPartialStatisticsPredicateCol.
Reviewed 8 of 8 files at r5, all commit messages.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball and @faizaanmadhani)
pkg/sql/distsql_plan_stats.go line 171 at r4 (raw file):
Previously, faizaanmadhani (Faizaan Madhani) wrote…
Right now I'm doing it on line 467 before calling this function. As per your comment below, I've moved it inside this function 🙂
Thanks. I think you can get rid of this comment now.
c8445a9 to
e3e1cc6
Compare
rytaft
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 7 of 7 files at r6, 1 of 1 files at r7, all commit messages.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @DrewKimball, @faizaanmadhani, and @michae2)
pkg/sql/alter_table.go line 1396 at r7 (raw file):
"nullCount", "avgSize", "partialPredicate",
nit: formatting is messed up
pkg/sql/stats/stats_cache.go line 618 at r7 (raw file):
var err error if sc.Settings.Version.IsActive(ctx, clusterversion.V23_1AddPartialStatisticsPredicateCol) { tsp, err = NewTableStatisticProto(datums, true)
nit: true /* partialPredColVerActive */ (add the comment here and next to false below)
295cbca to
6f7dd2e
Compare
This commit adds support to manually create partial statistics at the extremes of indexes for individual columns. Columns must be the key of their index, otherwise an error will be returned. Partial statistics are stored in `system.table_statistics` and are identified by a predicate indicating their coverage. Additionally, this PR also adds a partial_predicate column to the output to `SHOW STATISTICS`. Epic: CRDB-19449 Release note (sql change): users can now manually create partial single-column statistics at the extreme values on columns that are prefixes of their index. The output of SHOW STATISTICS now includes a column indicating the partial predicate for a partial statistic, or NULL for a full statistic.
6f7dd2e to
0c2c583
Compare
|
TFTRs!! 🎉 bors r=rytaft |
rytaft
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball, @faizaanmadhani, and @michae2)
pkg/sql/stats/stats_cache.go line 621 at r9 (raw file):
} else { tsp, err = NewTableStatisticProto(datums, false /* partialPredColVerActive */) }
(No need to stop bors from running since this is just a nit, but you might fix in a future PR...)
nit: instead of passing true and false -- you can just pass partialPredicateVerColActive
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball, @michae2, and @rytaft)
pkg/sql/stats/stats_cache.go line 621 at r9 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
(No need to stop bors from running since this is just a nit, but you might fix in a future PR...)
nit: instead of passing true and false -- you can just pass
partialPredicateVerColActive
I see, I'll fix this in my next merging histograms PR!
|
Build succeeded: |
michae2
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I'm coming to the late! I realize this has already been merged, but I've noticed a few small things that we should fix in another PR.
Nice work, Faizaan! This is awesome!
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale)
pkg/sql/distsql_plan_stats.go line 227 at r9 (raw file):
// cache. for _, t := range tableStats { if len(t.ColumnIDs) == 1 && column.GetID() == t.ColumnIDs[0] && t.PartialPredicate == "" && t.Name != jobspb.ForecastStatsName {
There's currently an odd situation with STRING and ARRAY types, where they can have two stats for the same column, one for an inverted index and one for a normal (forward) index. We only want to look at histograms for forward indexes. The way to check for this is to add && (!colinfo.ColumnTypeIsInvertedIndexable(column.GetType()) || t.HistogramType().Family() != types.BytesFamily) to the conditions.
pkg/sql/distsql_plan_stats.go line 284 at r9 (raw file):
} var sketchSpec, invSketchSpec []execinfrapb.SketchSpec if reqStat.inverted {
If reqStat.inverted is true, we should return an error for now. We're already skipping over inverted indexes in scan.initDescSpecificCol(colCfg, column). Supporting USING EXTREMES on inverted indexes will require some more work.
pkg/sql/distsql_plan_stats.go line 515 at r9 (raw file):
var predicate string if index.GetKeyColumnDirection(0) == catpb.IndexColumn_ASC { predicate = fmt.Sprintf("%s < %s OR %s > %s OR %s is NULL", columnName, extremesSpans.Get(0).EndKey().Value(0).String(), columnName, extremesSpans.Get(1).StartKey().Value(0).String(), columnName)
When saving the column names and literal values to disk, we have to be extra careful to write them in a way that will be parsable later on (possibly even by a different version of the software). We have a special way to do this, by calling foo.Format(tree.NewFmtCtx(tree.FmtSerializable)) on any object foo that is a NodeFormatter. This should be guaranteed to be parsable (foo.String() is not).
I believe all tree.Datum types are NodeFormatter. And I think you can call column.ColName() to get a tree.Node that is also a NodeFormatter.
pkg/sql/stats/new_stat.go line 95 at r9 (raw file):
} if !settings.Version.IsActive(ctx, clusterversion.V23_1AddPartialStatisticsPredicateCol) {
This isn't quite right. If we've collected partial stats, we definitely don't want to insert them without a partialPredicate, as if they were full stats. It would probably be better to disallow partial stats collection altogether.
I'm going to do a little more research on upgrades. It's possible we might still be able to insert into the new version of the table even if we haven't crossed the version gate.
michae2
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale)
pkg/sql/stats/new_stat.go line 95 at r9 (raw file):
Previously, michae2 (Michael Erickson) wrote…
This isn't quite right. If we've collected partial stats, we definitely don't want to insert them without a partialPredicate, as if they were full stats. It would probably be better to disallow partial stats collection altogether.
I'm going to do a little more research on upgrades. It's possible we might still be able to insert into the new version of the table even if we haven't crossed the version gate.
After a little more research: before the version gate the table still has the old schema. So we cannot insert the predicate, and we cannot simply omit it, either. I think we should throw an error here if there is a predicate.
To make it less expensive, we should also add an additional version check earlier, before we've actually collected the statistics. Probably somewhere in distsql_plan_stats.go.
This commit adds support to manually create partial statistics
at the extremes of indexes for individual columns. Columns must
be the key of their index, otherwise an error will be returned.
Partial statistics are stored in
system.table_statisticsand areidentified by a predicate indicating their coverage. Additionally,
this PR also adds a partial_predicate column to the output to
SHOW STATISTICS.Epic: CRDB-19449
Release note (sql change): users can now manually create partial
single-column statistics at the extreme values on
columns that are prefixes of their index. The output of SHOW
STATISTICS now includes a column indicating the partial predicate
for a partial statistic, or NULL for a full statistic.