-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
builtins: use ST_Union as aggregate #53127
Conversation
c87b15c
to
d48518a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 13 of 13 files at r1.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @otan and @sumeerbhola)
pkg/sql/opt/operator.go, line 406 at r1 (raw file):
case ArrayAggOp, AvgOp, ConcatAggOp, CorrOp, JsonAggOp, JsonbAggOp, JsonObjectAggOp, JsonbObjectAggOp, PercentileContOp, PercentileDiscOp, SqrDiffOp, StdDevOp, StringAggOp, VarianceOp, StdDevPopOp, STUnionOp, VarPopOp:
Unless I'm misunderstanding the function, seems like it can merge.
pkg/sql/opt/operator.go, line 425 at r1 (raw file):
SumOp, SqrDiffOp, VarianceOp, StdDevOp, XorAggOp, JsonAggOp, JsonbAggOp, StringAggOp, PercentileDiscOp, PercentileContOp, StdDevPopOp, STMakeLineOp, STUnionOp, VarPopOp, JsonObjectAggOp, JsonbObjectAggOp:
Unless I'm misunderstanding the function, seems like it does ignore duplicates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @rytaft and @sumeerbhola)
pkg/sql/opt/operator.go, line 406 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
Unless I'm misunderstanding the function, seems like it can merge.
hmm, i think iwas paranoid about order of ops. will do!
pkg/sql/opt/operator.go, line 425 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
Unless I'm misunderstanding the function, seems like it does ignore duplicates.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 7 of 13 files at r1, 6 of 6 files at r2.
Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @otan and @rytaft)
pkg/sql/sem/builtins/aggregate_builtins.go, line 701 at r2 (raw file):
} var err error agg.ewkb, err = geos.Union(agg.ewkb, geomArg.EWKB())
We are allocating a slice for the result each time we call geos.Union
, in cStringToSafeGoBytes
. We could change geos.Union
to accept the existing slice. A TODO is fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 5 of 6 files at r2.
Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @otan)
pkg/sql/distsql/columnar_operators_test.go, line 187 at r2 (raw file):
aggFnInputTypes[j] = sqlbase.RandType(rng) } // There is a special case for these functions when at
[nit] these -> some?
bors r=rytaft,sumeerbhola |
46407: kvcoord: small fixes r=andreimatei a=andreimatei See individual commits. 52740: sql: owning a schema gives privilege to drop tables in schema r=RichardJCai a=RichardJCai sql: owning a schema gives privilege to drop tables in schema Release note (sql change): Schema owners can drop tables inside the schema without explicit DROP privilege on the table. Fixes #51931 53127: builtins: use ST_Union as aggregate r=rytaft,sumeerbhola a=otan Add the ST_Union aggregate, removing the two-argument version temporarily as we cannot currently have an aggregate and non-aggregate at the same time. This is ok since we haven't released yet, and from reading it seems more likely people will use the aggregate version. Release note (sql change): Implement the ST_Union builtin as an aggregate. The previous alpha-available ST_Union for two arguments is deprecated. 53162: builtins: implement ST_GeomFromGeoHash/ST_Box2DFromGeoHash r=sumeerbhola a=otan Resolves #48806 Release note (sql change): Implemented the ST_GeomFromGeoHash and ST_Box2DFromGeoHash builtins. 53181: sql/kv: avoid expensive heap allocations with inverted joins r=nvanbenschoten a=nvanbenschoten This PR contains three optimizations that reduce heap allocations by inverted join operations. These were found by profiling the following geospatial query: ```sql SELECT Count(census.blkid), Count(subways.name) FROM nyc_census_blocks AS census JOIN nyc_subway_stations AS subways ON ST_Intersects(subways.geom, census.geom); ``` Combined, these three commits speed up the query by a little over **3%**. ``` name old ms new ms delta Test/postgis_geometry_tutorial/nyc 139 ±12% 135 ±11% -3.04% (p=0.000 n=241+243) ``` #### sql/rowexec: avoid heap allocations in batchedInvertedExprEvaluator This commit restructures the slice manipulation performed in `(*batchedInvertedExprEvaluator).init` to avoid heap allocations. Before the change, this method used to contain the first and fifth most expensive memory allocations by total allocation size (`alloc_space`) in a geospatial query of interest: ``` ----------------------------------------------------------+------------- 3994.06MB 100% | github.com/cockroachdb/cockroach/pkg/sql/rowexec.(*invertedJoiner).readInput /go/src/github.com/cockroachdb/cockroach/pkg/sql/rowexec/inverted_joiner.go:418 3994.06MB 10.70% 10.70% 3994.06MB 10.70% | github.com/cockroachdb/cockroach/pkg/sql/rowexec.(*batchedInvertedExprEvaluator).init /go/src/github.com/cockroachdb/cockroach/pkg/sql/rowexec/inverted_expr_evaluator.go:366 ----------------------------------------------------------+------------- ----------------------------------------------------------+------------- 954.07MB 100% | github.com/cockroachdb/cockroach/pkg/sql/rowexec.(*invertedJoiner).readInput /go/src/github.com/cockroachdb/cockroach/pkg/sql/rowexec/inverted_joiner.go:418 954.07MB 4.61% 41.71% 954.07MB 4.61% | github.com/cockroachdb/cockroach/pkg/sql/rowexec.(*batchedInvertedExprEvaluator).init /go/src/github.com/cockroachdb/cockroach/pkg/sql/rowexec/inverted_expr_evaluator.go:409 ----------------------------------------------------------+------------- ``` The largest offender here was the `routingSpans` slice, which made no attempt to recycle memory. The other offender was the `pendingSpans` slice. We made an attempt to recycle the memory in this slice, except we truncated it from the front instead of the back, so we ended us not actually recycling anything. This commit addresses this in two ways. First, it updates the `pendingSpans` slice to point into the `routingSpans` slice so the two can share memory. `pendingSpans` becomes a mutable window into `routingSpans`, so it no longer requires its own heap allocation. Next, the commit updates the memory recycling to recycle `routingSpans` instead of `pendingSpans`. Since we never truncate `routingSpans` from the front, the recycling now works. With these two changes, the two heap allocations, which used to account for **15.31%** of total allocated space combined drops to a single heap allocation that accounts for only **2.43%** of total allocated space on the geospatial query. ``` ----------------------------------------------------------+------------- 182.62MB 100% | github.com/cockroachdb/cockroach/pkg/sql/rowexec.(*invertedJoiner).readInput /go/src/github.com/cockroachdb/cockroach/pkg/sql/rowexec/inverted_joiner.go:418 182.62MB 2.43% 58.36% 182.62MB 2.43% | github.com/cockroachdb/cockroach/pkg/sql/rowexec.(*batchedInvertedExprEvaluator).init /go/src/github.com/cockroachdb/cockroach/pkg/sql/rowexec/inverted_expr_evaluator.go:366 ----------------------------------------------------------+------------- ``` #### sql: properly copy span slices This commit fixes two locations where we not optimally performing copies from one slice to another. In `invertedJoiner.generateSpans`, we were not pre-allocating the destination slice and were using append instead of direct assignment, which is measurably slower due to the extra writes to the slice header. In `getProtoSpans`, we again were appending to a slice with zero length and sufficient capacity instead of assigning to a slice with sufficient length. #### kv: check for expensive logging before VEventf in appendRefreshSpans Before this change, we called VEventf directly with the refresh span corresponding to each request in a batch. This caused each span object to allocate while passing through an `interface{}`. This could be seen in heap profiles, where it accounted for **1.89%** of total allocated memory (`alloc_space`) in a geospatial query of interest: ``` ----------------------------------------------------------+------------- 142.01MB 100% | github.com/cockroachdb/cockroach/pkg/roachpb.(*BatchRequest).RefreshSpanIterate /go/src/github.com/cockroachdb/cockroach/pkg/roachpb/batch.go:407 142.01MB 1.89% 68.71% 142.01MB 1.89% | github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord.(*txnSpanRefresher).appendRefreshSpans.func1 /go/src/github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/txn_interceptor_span_refresher.go:510 ----------------------------------------------------------+------------- ``` This commit adds a guard around the VEventf to avoid the allocation. 53196: sqlliveness/slstorage,timeutil: various improvements r=spaskob a=ajwerner See individual commits. This PR addresses some anxieties I had about the implementation of slstorage. In particular, it makes the testing deterministic by giving it a new table independent of the host cluster's state and allows injecting time. It then changes the GC to be less frequent and instead relies on failing out sessions when it determines they are expired. This is important to avoid spinning between when a session is deemed expired and when it would later be deleted by a gc interval. It also jitters that GC interval to avoid problems due to contention in larger clusters. 53214: serverpb: add docstrings to `HotRangesRequest` and `HotRangesResponse` r=mjibson a=knz This is part of a project to document the public HTTP APIs. @mjibson you'll notice in this PR that the doc generator doesn't deal well with newline characters in docstrings. Co-authored-by: Andrei Matei <andrei@cockroachlabs.com> Co-authored-by: richardjcai <caioftherichard@gmail.com> Co-authored-by: Oliver Tan <otan@cockroachlabs.com> Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com> Co-authored-by: Andrew Werner <ajwerner@cockroachlabs.com> Co-authored-by: Raphael 'kena' Poss <knz@thaumogen.net>
Build failed (retrying...): |
Merge conflict. |
bors r- |
Add the ST_Union aggregate, removing the two-argument version temporarily as we cannot currently have an aggregate and non-aggregate at the same time. This is ok since we haven't released yet, and from reading it seems more likely people will use the aggregate version. Release note (sql change): Implement the ST_Union builtin as an aggregate. The previous alpha-available ST_Union for two arguments is deprecated.
bors r=rytaft,sumeerbhola |
This PR was included in a batch that was canceled, it will be automatically retried |
Build failed (retrying...): |
Build succeeded: |
Add the ST_Union aggregate, removing the two-argument version
temporarily as we cannot currently have an aggregate and non-aggregate
at the same time. This is ok since we haven't released yet, and from
reading it seems more likely people will use the aggregate version.
Resolves #51248
Release note (sql change): Implement the ST_Union builtin as an
aggregate. The previous alpha-available ST_Union for two arguments is
deprecated.