-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql: output RU estimate for EXPLAIN ANALYZE on tenants #89256
Conversation
0bc21b1
to
238ae9d
Compare
c4cad32
to
29969e8
Compare
TODO: squash commits before merging. |
22c2ab2
to
26b0658
Compare
26b0658
to
b45a63a
Compare
This comment was marked as resolved.
This comment was marked as resolved.
b45a63a
to
227c501
Compare
Alright, the only test failures I see now are this:
and this:
both of which seem unrelated. |
071137b
to
5c788b9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work!
Reviewed 1 of 13 files at r1, 1 of 16 files at r2, 5 of 27 files at r6, 24 of 24 files at r13, 18 of 18 files at r14, 16 of 16 files at r15, 3 of 3 files at r16, all commit messages.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball and @DrewKimball)
pkg/ccl/multitenantccl/tenantcostclient/tenant_side.go
line 878 at r14 (raw file):
c.mu.Lock() cpuAvg = c.mu.avgCPUPerSec c.mu.Unlock()
nit: I believe in one of the recent Go releases the price of the defer
has gone done significantly, so we switched to mostly Unlock
ing in a defer
in situations like this, even in performance sensitive code. But up to you.
pkg/multitenant/multitenantcpu/cpu_usage.go
line 9 at r14 (raw file):
// https://github.com/cockroachdb/cockroach/blob/master/licenses/CCL.txt package multitenantcpu
Should this be somewhere in pkg/ccl
subtree? Alternatively, should this use BSL license (similar to multitenantio
)?
pkg/multitenant/multitenantcpu/cpu_usage.go
line 63 at r14 (raw file):
// It is used for measuring tenant RU consumption. func GetCPUSeconds(ctx context.Context) (cpuSecs float64) { userTimeMillis, sysTimeMillis, err := status.GetCPUTime(ctx)
GetCPUTime
doesn't actually use ctx
argument. We could remove it which will reduce the amount of plumbing needed here.
pkg/sql/distsql_running.go
line 617 at r13 (raw file):
// Only collect the network egress estimate for a tenant that is running // EXPLAIN ANALYZE, since the overhead is non-negligible. recv.isTenantExplainAnalyze = instrumentation.collectExecStats &&
nit: I think the second part of the AND is sufficient, no?
pkg/sql/colflow/stats.go
line 234 at r15 (raw file):
execstats.PopulateKVMVCCStats(&s.KV, &scanStats) if vsc.isTenant { // Only set the RUs consumed if this is a tenant.
nit: would it be ok to set it for single tenant deployments too (since RuConsumed
would be zero)? This would then be ignored in instrumentation.go
.
pkg/sql/colflow/vectorized_flow.go
line 506 at r14 (raw file):
MaxMemUsage: optional.MakeUint(uint64(flowCtx.Mon.MaximumBytes())), MaxDiskUsage: optional.MakeUint(uint64(flowCtx.DiskMonitor.MaximumBytes())), ConsumedRU: optional.MakeUint(uint64(cpuStatsCollector.EndCollection(ctx))),
We also need to make a similar change to flowinfra/outbox.go
for the row-based flows.
pkg/sql/colflow/vectorized_flow.go
line 750 at r14 (raw file):
stream *execinfrapb.StreamEndpointSpec, factory coldata.ColumnFactory, getStats func(ctx2 context.Context) []*execinfrapb.ComponentStats,
nit: can omit the variable name altogether.
pkg/sql/execstats/stats.go
line 77 at r15 (raw file):
// RuConsumed is the number of RUs that were consumed during the course of a // scan. RuConsumed uint64
nit: maybe s/RuConsumed/ConsumedRU/
for symmetry with the proto?
pkg/ccl/multitenantccl/tenantcostclient/query_ru_estimate_test.go
line 67 at r16 (raw file):
} params.DisableDefaultTestTenant = true
nit: this is already set above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice! I'm not so familiar with this code so someone else should approve, but I just had a couple of small questions.
Reviewed 1 of 13 files at r1, 1 of 16 files at r2, 5 of 27 files at r6, 24 of 24 files at r13, 18 of 18 files at r14, 16 of 16 files at r15, 3 of 3 files at r16, all commit messages.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball, @DrewKimball, and @yuzefovich)
pkg/multitenant/multitenantcpu/cpu_usage.go
line 52 at r16 (raw file):
cpuDelta := GetCPUSeconds(ctx) - h.startCPU timeElapsed := timeutil.Since(h.startTime) expectedCPUDelta := timeElapsed.Seconds() * h.avgCPUPerSecond
I guess this won't work well when there are many concurrent queries?
pkg/sql/rowexec/inverted_joiner.go
line 782 at r15 (raw file):
} if ij.FlowCtx.Cfg.TenantCostController != nil { // Don't populate the consumed RU unless this is a tenant.
Would it be bad to collect this for non-tenants? I know someone was asking if it would be possible to surface RU estimates even for non-tenants so users could know what their workload might cost on CRDB Serverless. Then you could also avoid a lot of plumbing.
(maybe the same question @yuzefovich had)
pkg/ccl/multitenantccl/tenantcostclient/query_ru_estimate_test.go
line 43 at r16 (raw file):
// consumption is within reasonable distance from the actual measured RUs for // the tenant. func TestEstimateQueryRUConsumption(t *testing.T) {
Nice test!
pkg/ccl/multitenantccl/tenantcostclient/query_ru_estimate_test.go
line 136 at r16 (raw file):
if strings.Contains(val, "estimated RUs consumed") { substr := strings.Split(val, " ") if len(substr) == 4 {
if this is false should you return an error?
d1ba637
to
783dcb7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball, @rytaft, and @yuzefovich)
pkg/ccl/multitenantccl/tenantcostclient/tenant_side.go
line 796 at r1 (raw file):
Previously, andy-kimball (Andy Kimball) wrote…
This should go above the lock, since we want to release the lock as quickly as possible.
Done.
pkg/ccl/multitenantccl/tenantcostclient/tenant_side.go
line 878 at r14 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: I believe in one of the recent Go releases the price of the
defer
has gone done significantly, so we switched to mostlyUnlock
ing in adefer
in situations like this, even in performance sensitive code. But up to you.
Done. That's good to know
pkg/multitenant/multitenantcpu/cpu_usage.go
line 9 at r14 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
Should this be somewhere in
pkg/ccl
subtree? Alternatively, should this use BSL license (similar tomultitenantio
)?
Oops, just copied that license blindly. I'm not really sure of where things are supposed to be located WRT multitenant
vs multitenantccl
but for now I've just changed the license. Done.
pkg/multitenant/multitenantcpu/cpu_usage.go
line 63 at r14 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
GetCPUTime
doesn't actually usectx
argument. We could remove it which will reduce the amount of plumbing needed here.
That's a good point. How would you handle the logging here without the context though? Returning the error would just run into the same problem further up the stack. I guess an option would be to just return 0 on error, but that seems problematic.
pkg/multitenant/multitenantcpu/cpu_usage.go
line 52 at r16 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
I guess this won't work well when there are many concurrent queries?
Yes, I'll add a comment. Done.
pkg/sql/distsql_running.go
line 617 at r13 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: I think the second part of the AND is sufficient, no?
Done.
pkg/sql/colflow/stats.go
line 234 at r15 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: would it be ok to set it for single tenant deployments too (since
RuConsumed
would be zero)? This would then be ignored ininstrumentation.go
.
Good point, that cleans up some plumbing I had to do. Done.
pkg/sql/colflow/vectorized_flow.go
line 506 at r14 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
We also need to make a similar change to
flowinfra/outbox.go
for the row-based flows.
Done. I've moved the cpu usage collector for remote flows to the FlowCtx
since I didn't see a good way to propagate it to the row-based outbox otherwise.
pkg/sql/colflow/vectorized_flow.go
line 750 at r14 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: can omit the variable name altogether.
Done.
pkg/sql/execstats/stats.go
line 77 at r15 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: maybe
s/RuConsumed/ConsumedRU/
for symmetry with the proto?
Done.
pkg/sql/rowexec/inverted_joiner.go
line 782 at r15 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
Would it be bad to collect this for non-tenants? I know someone was asking if it would be possible to surface RU estimates even for non-tenants so users could know what their workload might cost on CRDB Serverless. Then you could also avoid a lot of plumbing.
(maybe the same question @yuzefovich had)
Good point, Done.
pkg/ccl/multitenantccl/tenantcostclient/query_ru_estimate_test.go
line 43 at r16 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
Nice test!
Thanks! Took some doing to figure out how to make it run.
pkg/ccl/multitenantccl/tenantcostclient/query_ru_estimate_test.go
line 67 at r16 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: this is already set above.
Done.
pkg/ccl/multitenantccl/tenantcostclient/query_ru_estimate_test.go
line 136 at r16 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
if this is false should you return an error?
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 16 files at r2, 31 of 31 files at r17, 21 of 21 files at r18, 14 of 14 files at r19, 5 of 5 files at r20, all commit messages.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @andy-kimball, @DrewKimball, and @rytaft)
pkg/ccl/multitenantccl/tenantcostclient/tenant_side.go
line 878 at r18 (raw file):
c.mu.Lock() defer c.mu.Unlock() cpuAvg = c.mu.avgCPUPerSec
nit: could remove the named return argument and just return the mutex-protected thing.
pkg/multitenant/multitenantcpu/cpu_usage.go
line 63 at r14 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
That's a good point. How would you handle the logging here without the context though? Returning the error would just run into the same problem further up the stack. I guess an option would be to just return 0 on error, but that seems problematic.
Right, I didn't realize that we also use ctx
in the current function, nvm.
pkg/sql/conn_executor.go
line 16 at r18 (raw file):
"context" "fmt" "github.com/cockroachdb/cockroach/pkg/multitenant/multitenantcpu"
nit: the ordering is off in the second commit.
pkg/sql/execinfra/flow_context.go
line 17 at r18 (raw file):
import ( "context" "github.com/cockroachdb/cockroach/pkg/multitenant/multitenantcpu"
nit: ditto for the second commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @DrewKimball and @rytaft)
pkg/ccl/multitenantccl/tenantcostclient/tenant_side.go
line 406 at r12 (raw file):
// Keep track of an exponential moving average of CPU usage. avg := deltaCPU / deltaTime.Seconds() c.mu.Lock()
I'd move this code down into the lock that we're already taking, so that we don't have to acquire/release twice.
pkg/ccl/multitenantccl/tenantcostclient/tenant_side.go
line 871 at r12 (raw file):
} c.mu.Lock()
Don't we need to add RU here as well? Could it show up in an EXPLAIN ANALYZE
of an EXPORT or some other statement?
pkg/multitenant/multitenantcpu/cpu_usage.go
line 60 at r20 (raw file):
expectedCPUDelta := timeElapsed.Seconds() * h.avgCPUPerSecond cpuUsageSeconds := cpuDelta - expectedCPUDelta if cpuUsageSeconds < 0 {
I think we should also bound the consumed CPU on the upper side by checking if cpuUsageSeconds
exceeds timeElapsed.Seconds()
. I don't believe that we do much parallel CPU work for a single query, and so if it's greater, it's almost always going to be due to other queries interfering with the CPU measurement. I'd rather bound it to reduce that possibility of gross error.
pkg/sql/rowexec/inverted_joiner.go
line 782 at r15 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
Good point, Done.
Note that we don't calculate RU for non-tenants today, so ConsumedRU
should be zero in those cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @andy-kimball, @rytaft, and @yuzefovich)
pkg/ccl/multitenantccl/tenantcostclient/tenant_side.go
line 406 at r12 (raw file):
Previously, andy-kimball (Andy Kimball) wrote…
I'd move this code down into the lock that we're already taking, so that we don't have to acquire/release twice.
Done.
pkg/ccl/multitenantccl/tenantcostclient/tenant_side.go
line 871 at r12 (raw file):
Previously, andy-kimball (Andy Kimball) wrote…
Don't we need to add RU here as well? Could it show up in an
EXPLAIN ANALYZE
of an EXPORT or some other statement?
I added a TODO for that above, but you make a good point that we probably want to handle such cases now. It's not obvious to me how to propagate this information for a query. Do you think it would be ok to just disable RU consumption estimation for "non-simple" statements?
pkg/ccl/multitenantccl/tenantcostclient/tenant_side.go
line 878 at r18 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: could remove the named return argument and just return the mutex-protected thing.
Done.
pkg/multitenant/multitenantcpu/cpu_usage.go
line 63 at r14 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
Right, I didn't realize that we also use
ctx
in the current function, nvm.
Done.
pkg/multitenant/multitenantcpu/cpu_usage.go
line 60 at r20 (raw file):
Previously, andy-kimball (Andy Kimball) wrote…
I think we should also bound the consumed CPU on the upper side by checking if
cpuUsageSeconds
exceedstimeElapsed.Seconds()
. I don't believe that we do much parallel CPU work for a single query, and so if it's greater, it's almost always going to be due to other queries interfering with the CPU measurement. I'd rather bound it to reduce that possibility of gross error.
Done.
pkg/sql/conn_executor.go
line 16 at r18 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: the ordering is off in the second commit.
Ah, forgot I haven't set up crlfmt for goland on this laptop yet. Done.
pkg/sql/execinfra/flow_context.go
line 17 at r18 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: ditto for the second commit.
Done.
783dcb7
to
6ad8dd8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 31 of 31 files at r17, 12 of 21 files at r18, 16 of 16 files at r21, 10 of 11 files at r22, 2 of 2 files at r23, all commit messages.
Reviewable status:complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @andy-kimball, @DrewKimball, and @yuzefovich)
Thanks for the reviews so far! I think the network RU estimation is broken for the row-wise path, since the types don't get set (the batch route directly uses the types from the batch). I'm also going to look into how complicated it'll be to account for external IOs as well, since it's not obvious how to prevent RU estimation for queries that perform external IO. |
6ad8dd8
to
7bf3af4
Compare
Alright, I made a few changes that should hopefully tie up the last few loose ends for this PR.
I'm leaving two items for a future PR(s) for the sake of decreasing the burden of reviewing:
|
Could someone take a look at the most recent changes? Sorry for all the updates, hopefully this is the end of it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have some nits, but nothing blocking.
Curious though what prompted to make the change number 1?
Reviewed 33 of 33 files at r24, 21 of 21 files at r25, 16 of 16 files at r26, 3 of 3 files at r27, all commit messages.
Reviewable status:complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @andy-kimball and @DrewKimball)
pkg/sql/plan_node_to_row_source.go
line 160 at r26 (raw file):
ctx = p.StartInternal(ctx, nodeName(p.node)) p.params.ctx = ctx if execstats.ShouldCollectStats(ctx, p.FlowCtx.CollectStats) {
nit: let's move this to occur earlier. All other processors do this in their constructor, and I think we could do it in newPlanNodeToRowSource
too. If not, after InitWithEvalCtx
in Init
should work.
pkg/sql/plan_node_to_row_source.go
line 258 at r26 (raw file):
scanStats := execstats.GetScanStats(p.Ctx, p.ExecStatsTrace) if scanStats.ConsumedRU == 0 { return nil
nit: what would happen if we returned a non-nil object with ConsumedRU of 0? I'd guess it'd be ignored later down the line. This would allow us to not add a non-nil check in the columnarizer. Up to you.
pkg/sql/pgwire/conn.go
line 1418 at r24 (raw file):
var conv sessiondatapb.DataConversionConfig buf := newWriteBuffer(nil /* byteCount */)
nit: we could reuse this writeBuffer
between different invocations to save on some of the allocations. I'm thinking about introducing a struct that would hold on the write buffer and would export two methods (i.e. implement an interface defined in sql
package). We'd probably then need to inject a constructor method into sql
package. However, the write buffer already has some scratch space that the allocations might not happen, and this might not be a big deal after all. Up to you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 33 of 33 files at r24, 21 of 21 files at r25, 16 of 16 files at r26, 3 of 3 files at r27, all commit messages.
Reviewable status:complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @andy-kimball and @DrewKimball)
This commit adds a top-level field to the output of `EXPLAIN ANALYZE` that shows the estimated number of RUs that would be consumed due to network egress to the client. The estimate is obtained by buffering each value from the query result in text format and then measuring the size of the buffer before resetting it. The result is used to get the RU consumption with the tenant cost config's `PGWireEgressCost` method. **sql: surface query request units consumed due to cpu usage** This commit adds the ability for clients to estimate the number of RUs consumed by a query due to CPU usage. This is accomplished by keeping a moving average of the CPU usage for the entire tenant process, then using that to obtain an estimate for what the CPU usage *would* be if the query wasn't running. This is then compared against the actual measured CPU usage during the query's execution to get the estimate. For local flows this is done at the `connExecutor` level; for remote flows this is handled by the last outbox on the node (which gathers and sends the flow's metadata). The resulting RU estimate is added to the existing estimate from network egress and displayed in the output of `EXPLAIN ANALYZE`. **sql: surface query request units consumed by IO** This commit adds tracking for request units consumed by IO operations for all execution operators that perform KV operations. The corresponding RU count is recorded in the span and later aggregated with the RU consumption due to network egress and CPU usage. The resulting query RU consumption estimate is visible in the output of `EXPLAIN ANALYZE`. **multitenantccl: add sanity testing for ru estimation** This commit adds a sanity test for the RU estimates produced by running queries with `EXPLAIN ANALYZE` on a tenant. The test runs each test query several times with `EXPLAIN ANALYZE`, then runs all test queries without `EXPLAIN ANALYZE` and compares the resulting actual RU measurement to the aggregated estimates. For now, this test is disabled during builds because it is flaky in the presence of background activity. For this reason it should only be used as a manual sanity test. Informs cockroachdb#74441 Release note (sql change): Added an estimate for the number of request units consumed by a query to the output of `EXPLAIN ANALYZE` for tenant sessions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found that a NPE was possible when the row implementation was used because the DistSQLReceiver
isn't set up with the column types of the queried rows for EXPLAIN ANALYZE
. I could have changed the function interface to also pass a types slice, but it seemed safer to explicitly set everything up for the purpose of buffering and counting bytes to ensure no more nasty surprises.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @andy-kimball and @yuzefovich)
pkg/sql/plan_node_to_row_source.go
line 160 at r26 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: let's move this to occur earlier. All other processors do this in their constructor, and I think we could do it in
newPlanNodeToRowSource
too. If not, afterInitWithEvalCtx
inInit
should work.
Moved it to InitWithOutput
, since we need the FlowCtx
. Done.
pkg/sql/plan_node_to_row_source.go
line 258 at r26 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: what would happen if we returned a non-nil object with ConsumedRU of 0? I'd guess it'd be ignored later down the line. This would allow us to not add a non-nil check in the columnarizer. Up to you.
Yeah that would be fine, but the comment for ProcessorBaseNoHelper.ExecStatsForTrace
says it can return nil, so it seems like we should do this anyway. Is there something special about the columnarizer so that it doesn't have to worry about that?
pkg/sql/pgwire/conn.go
line 1418 at r24 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: we could reuse this
writeBuffer
between different invocations to save on some of the allocations. I'm thinking about introducing a struct that would hold on the write buffer and would export two methods (i.e. implement an interface defined insql
package). We'd probably then need to inject a constructor method intosql
package. However, the write buffer already has some scratch space that the allocations might not happen, and this might not be a big deal after all. Up to you.
I like that idea, won't be significantly more complex and will save allocations. Done.
pkg/sql/rowexec/inverted_joiner.go
line 782 at r15 (raw file):
Previously, andy-kimball (Andy Kimball) wrote…
Note that we don't calculate RU for non-tenants today, so
ConsumedRU
should be zero in those cases.
Done.
7bf3af4
to
f95b046
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 7 of 7 files at r28, all commit messages.
Reviewable status:complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @andy-kimball and @DrewKimball)
pkg/sql/plan_node_to_row_source.go
line 258 at r26 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
Yeah that would be fine, but the comment for
ProcessorBaseNoHelper.ExecStatsForTrace
says it can return nil, so it seems like we should do this anyway. Is there something special about the columnarizer so that it doesn't have to worry about that?
Oh yeah, you're right, let's keep the nil check in. For some reason I assumed that the meaning of "Can return nil" was equivalent to "This field can be nil".
TFTRs! |
bors r+ |
Build succeeded: |
Here's an example of what the output looks like:
|
sql: surface query request units consumed by network egress
This commit adds a top-level field to the output of
EXPLAIN ANALYZE
that shows the estimated number of RUs that would be consumed due to
network egress to the client. The estimate is obtained by measuring
the in-memory size of the query result, and passing that to the
tenant cost config's
PGWireEgressCost
method.sql: surface query request units consumed due to cpu usage
This commit adds the ability for clients to estimate the number of RUs
consumed by a query due to CPU usage. This is accomplished by keeping a
moving average of the CPU usage for the entire tenant process, then using
that to obtain an estimate for what the CPU usage would be if the query
wasn't running. This is then compared against the actual measured CPU usage
during the query's execution to get the estimate. For local flows this is
done at the
connExecutor
level; for remote flows this is handled by thelast outbox on the node (which gathers and sends the flow's metadata).
The resulting RU estimate is added to the existing estimate from network
egress and displayed in the output of
EXPLAIN ANALYZE
.sql: surface query request units consumed by IO
This commit adds tracking for request units consumed by IO operations
for all execution operators that perform KV operations. The corresponding
RU count is recorded in the span and later aggregated with the RU consumption
due to network egress and CPU usage. The resulting query RU consumption
estimate is visible in the output of
EXPLAIN ANALYZE
.multitenantccl: add sanity testing for ru estimation
This commit adds a sanity test for the RU estimates produced by running
queries with
EXPLAIN ANALYZE
on a tenant. The test runs each test queryseveral times with
EXPLAIN ANALYZE
, then runs all test queries withoutEXPLAIN ANALYZE
and compares the resulting actual RU measurement to theaggregated estimates.
Informs #74441
Release note (sql change): Added an estimate for the number of request units
consumed by a query to the output of
EXPLAIN ANALYZE
for tenant sessions.