-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Surface a Request Unit metric per statement #74441
Comments
This seems very hard given the current architecture of how we compute RUs. Potentially depends on #60589, see also golang/go#41554. Even that wouldn't necessarily be enough, but without something like that, it's hard to imagine a path. |
I've been doing a lot of work in this area. Most queries are dominated by Storage Layer requests, not by CPU. We can also estimate CPU pretty well, especially when there isn't background work going on. I think as long as we made it clear that the RUs reported by |
This commit adds a top-level field to the output of `EXPLAIN ANALYZE` that shows the estimated number of RUs that would be consumed due to network egress to the client. The estimate is obtained by buffering each value from the query result in text format and then measuring the size of the buffer before resetting it. The result is used to get the RU consumption with the tenant cost config's `PGWireEgressCost` method. **sql: surface query request units consumed due to cpu usage** This commit adds the ability for clients to estimate the number of RUs consumed by a query due to CPU usage. This is accomplished by keeping a moving average of the CPU usage for the entire tenant process, then using that to obtain an estimate for what the CPU usage *would* be if the query wasn't running. This is then compared against the actual measured CPU usage during the query's execution to get the estimate. For local flows this is done at the `connExecutor` level; for remote flows this is handled by the last outbox on the node (which gathers and sends the flow's metadata). The resulting RU estimate is added to the existing estimate from network egress and displayed in the output of `EXPLAIN ANALYZE`. **sql: surface query request units consumed by IO** This commit adds tracking for request units consumed by IO operations for all execution operators that perform KV operations. The corresponding RU count is recorded in the span and later aggregated with the RU consumption due to network egress and CPU usage. The resulting query RU consumption estimate is visible in the output of `EXPLAIN ANALYZE`. **multitenantccl: add sanity testing for ru estimation** This commit adds a sanity test for the RU estimates produced by running queries with `EXPLAIN ANALYZE` on a tenant. The test runs each test query several times with `EXPLAIN ANALYZE`, then runs all test queries without `EXPLAIN ANALYZE` and compares the resulting actual RU measurement to the aggregated estimates. For now, this test is disabled during builds because it is flaky in the presence of background activity. For this reason it should only be used as a manual sanity test. Informs cockroachdb#74441 Release note (sql change): Added an estimate for the number of request units consumed by a query to the output of `EXPLAIN ANALYZE` for tenant sessions.
89256: sql: output RU estimate for EXPLAIN ANALYZE on tenants r=DrewKimball a=DrewKimball **sql: surface query request units consumed by network egress** This commit adds a top-level field to the output of `EXPLAIN ANALYZE` that shows the estimated number of RUs that would be consumed due to network egress to the client. The estimate is obtained by measuring the in-memory size of the query result, and passing that to the tenant cost config's `PGWireEgressCost` method. **sql: surface query request units consumed due to cpu usage** This commit adds the ability for clients to estimate the number of RUs consumed by a query due to CPU usage. This is accomplished by keeping a moving average of the CPU usage for the entire tenant process, then using that to obtain an estimate for what the CPU usage *would* be if the query wasn't running. This is then compared against the actual measured CPU usage during the query's execution to get the estimate. For local flows this is done at the `connExecutor` level; for remote flows this is handled by the last outbox on the node (which gathers and sends the flow's metadata). The resulting RU estimate is added to the existing estimate from network egress and displayed in the output of `EXPLAIN ANALYZE`. **sql: surface query request units consumed by IO** This commit adds tracking for request units consumed by IO operations for all execution operators that perform KV operations. The corresponding RU count is recorded in the span and later aggregated with the RU consumption due to network egress and CPU usage. The resulting query RU consumption estimate is visible in the output of `EXPLAIN ANALYZE`. **multitenantccl: add sanity testing for ru estimation** This commit adds a sanity test for the RU estimates produced by running queries with `EXPLAIN ANALYZE` on a tenant. The test runs each test query several times with `EXPLAIN ANALYZE`, then runs all test queries without `EXPLAIN ANALYZE` and compares the resulting actual RU measurement to the aggregated estimates. Informs #74441 Release note (sql change): Added an estimate for the number of request units consumed by a query to the output of `EXPLAIN ANALYZE` for tenant sessions. Co-authored-by: Drew Kimball <drewk@cockroachlabs.com>
This patch adds a cluster setting, `sql.tenant_ru_estimation.enabled`, which is used to determine whether tenants collect an RU estimate for queries run with `EXPLAIN ANALYZE`. This is an escape hatch so that the RU estimation logic can be more safely backported. Informs cockroachdb#74441 Release note: None
This patch adds a cluster setting, `sql.tenant_ru_estimation.enabled`, which is used to determine whether tenants collect an RU estimate for queries run with `EXPLAIN ANALYZE`. This is an escape hatch so that the RU estimation logic can be more safely backported. Informs cockroachdb#74441 Release note: None
92952: sql: fix statement bundle creation when memo isn't detached r=cucaroach a=cucaroach When a memo is deemed not resuable we don't detach it from the factory and this causes problems later when we execute SetIndexRecommendations which resets the optimizer context which will reset the memo. This causes the schema.sql and opt*.txt files to be empty. Fixes: #92920 Release note: None 92968: sql: add cluster setting to disable RU estimation r=DrewKimball a=DrewKimball This patch adds a cluster setting, `sql.tenant_ru_estimation.enabled`, which is used to determine whether tenants collect an RU estimate for queries run with `EXPLAIN ANALYZE`. This is an escape hatch so that the RU estimation logic can be more safely backported. Informs #74441 Release note: None 93131: rttanalysis: don't fail when benchmarking, do skip r=ajwerner a=ajwerner Fixes: #92770 Release note: None Co-authored-by: Tommy Reilly <treilly@cockroachlabs.com> Co-authored-by: Drew Kimball <drewk@cockroachlabs.com> Co-authored-by: Andrew Werner <awerner32@gmail.com>
This commit adds a top-level field to the output of `EXPLAIN ANALYZE` that shows the estimated number of RUs that would be consumed due to network egress to the client. The estimate is obtained by buffering each value from the query result in text format and then measuring the size of the buffer before resetting it. The result is used to get the RU consumption with the tenant cost config's `PGWireEgressCost` method. **sql: surface query request units consumed due to cpu usage** This commit adds the ability for clients to estimate the number of RUs consumed by a query due to CPU usage. This is accomplished by keeping a moving average of the CPU usage for the entire tenant process, then using that to obtain an estimate for what the CPU usage *would* be if the query wasn't running. This is then compared against the actual measured CPU usage during the query's execution to get the estimate. For local flows this is done at the `connExecutor` level; for remote flows this is handled by the last outbox on the node (which gathers and sends the flow's metadata). The resulting RU estimate is added to the existing estimate from network egress and displayed in the output of `EXPLAIN ANALYZE`. **sql: surface query request units consumed by IO** This commit adds tracking for request units consumed by IO operations for all execution operators that perform KV operations. The corresponding RU count is recorded in the span and later aggregated with the RU consumption due to network egress and CPU usage. The resulting query RU consumption estimate is visible in the output of `EXPLAIN ANALYZE`. **multitenantccl: add sanity testing for ru estimation** This commit adds a sanity test for the RU estimates produced by running queries with `EXPLAIN ANALYZE` on a tenant. The test runs each test query several times with `EXPLAIN ANALYZE`, then runs all test queries without `EXPLAIN ANALYZE` and compares the resulting actual RU measurement to the aggregated estimates. For now, this test is disabled during builds because it is flaky in the presence of background activity. For this reason it should only be used as a manual sanity test. Informs cockroachdb#74441 Release note (sql change): Added an estimate for the number of request units consumed by a query to the output of `EXPLAIN ANALYZE` for tenant sessions.
This patch adds a cluster setting, `sql.tenant_ru_estimation.enabled`, which is used to determine whether tenants collect an RU estimate for queries run with `EXPLAIN ANALYZE`. This is an escape hatch so that the RU estimation logic can be more safely backported. Informs cockroachdb#74441 Release note: None
This commit adds a top-level field to the output of `EXPLAIN ANALYZE` that shows the estimated number of RUs that would be consumed due to network egress to the client. The estimate is obtained by buffering each value from the query result in text format and then measuring the size of the buffer before resetting it. The result is used to get the RU consumption with the tenant cost config's `PGWireEgressCost` method. **sql: surface query request units consumed due to cpu usage** This commit adds the ability for clients to estimate the number of RUs consumed by a query due to CPU usage. This is accomplished by keeping a moving average of the CPU usage for the entire tenant process, then using that to obtain an estimate for what the CPU usage *would* be if the query wasn't running. This is then compared against the actual measured CPU usage during the query's execution to get the estimate. For local flows this is done at the `connExecutor` level; for remote flows this is handled by the last outbox on the node (which gathers and sends the flow's metadata). The resulting RU estimate is added to the existing estimate from network egress and displayed in the output of `EXPLAIN ANALYZE`. **sql: surface query request units consumed by IO** This commit adds tracking for request units consumed by IO operations for all execution operators that perform KV operations. The corresponding RU count is recorded in the span and later aggregated with the RU consumption due to network egress and CPU usage. The resulting query RU consumption estimate is visible in the output of `EXPLAIN ANALYZE`. **multitenantccl: add sanity testing for ru estimation** This commit adds a sanity test for the RU estimates produced by running queries with `EXPLAIN ANALYZE` on a tenant. The test runs each test query several times with `EXPLAIN ANALYZE`, then runs all test queries without `EXPLAIN ANALYZE` and compares the resulting actual RU measurement to the aggregated estimates. For now, this test is disabled during builds because it is flaky in the presence of background activity. For this reason it should only be used as a manual sanity test. Informs cockroachdb#74441 Release note (sql change): Added an estimate for the number of request units consumed by a query to the output of `EXPLAIN ANALYZE` for tenant sessions.
This patch adds a cluster setting, `sql.tenant_ru_estimation.enabled`, which is used to determine whether tenants collect an RU estimate for queries run with `EXPLAIN ANALYZE`. This is an escape hatch so that the RU estimation logic can be more safely backported. Informs cockroachdb#74441 Release note: None
This commit adds a top-level field to the output of `EXPLAIN ANALYZE` that shows the estimated number of RUs that would be consumed due to network egress to the client. The estimate is obtained by buffering each value from the query result in text format and then measuring the size of the buffer before resetting it. The result is used to get the RU consumption with the tenant cost config's `PGWireEgressCost` method. **sql: surface query request units consumed due to cpu usage** This commit adds the ability for clients to estimate the number of RUs consumed by a query due to CPU usage. This is accomplished by keeping a moving average of the CPU usage for the entire tenant process, then using that to obtain an estimate for what the CPU usage *would* be if the query wasn't running. This is then compared against the actual measured CPU usage during the query's execution to get the estimate. For local flows this is done at the `connExecutor` level; for remote flows this is handled by the last outbox on the node (which gathers and sends the flow's metadata). The resulting RU estimate is added to the existing estimate from network egress and displayed in the output of `EXPLAIN ANALYZE`. **sql: surface query request units consumed by IO** This commit adds tracking for request units consumed by IO operations for all execution operators that perform KV operations. The corresponding RU count is recorded in the span and later aggregated with the RU consumption due to network egress and CPU usage. The resulting query RU consumption estimate is visible in the output of `EXPLAIN ANALYZE`. **multitenantccl: add sanity testing for ru estimation** This commit adds a sanity test for the RU estimates produced by running queries with `EXPLAIN ANALYZE` on a tenant. The test runs each test query several times with `EXPLAIN ANALYZE`, then runs all test queries without `EXPLAIN ANALYZE` and compares the resulting actual RU measurement to the aggregated estimates. For now, this test is disabled during builds because it is flaky in the presence of background activity. For this reason it should only be used as a manual sanity test. Informs cockroachdb#74441 Release note (sql change): Added an estimate for the number of request units consumed by a query to the output of `EXPLAIN ANALYZE` for tenant sessions.
This patch adds a cluster setting, `sql.tenant_ru_estimation.enabled`, which is used to determine whether tenants collect an RU estimate for queries run with `EXPLAIN ANALYZE`. This is an escape hatch so that the RU estimation logic can be more safely backported. Informs cockroachdb#74441 Release note: None
This commit adds a top-level field to the output of `EXPLAIN ANALYZE` that shows the estimated number of RUs that would be consumed due to network egress to the client. The estimate is obtained by buffering each value from the query result in text format and then measuring the size of the buffer before resetting it. The result is used to get the RU consumption with the tenant cost config's `PGWireEgressCost` method. **sql: surface query request units consumed due to cpu usage** This commit adds the ability for clients to estimate the number of RUs consumed by a query due to CPU usage. This is accomplished by keeping a moving average of the CPU usage for the entire tenant process, then using that to obtain an estimate for what the CPU usage *would* be if the query wasn't running. This is then compared against the actual measured CPU usage during the query's execution to get the estimate. For local flows this is done at the `connExecutor` level; for remote flows this is handled by the last outbox on the node (which gathers and sends the flow's metadata). The resulting RU estimate is added to the existing estimate from network egress and displayed in the output of `EXPLAIN ANALYZE`. **sql: surface query request units consumed by IO** This commit adds tracking for request units consumed by IO operations for all execution operators that perform KV operations. The corresponding RU count is recorded in the span and later aggregated with the RU consumption due to network egress and CPU usage. The resulting query RU consumption estimate is visible in the output of `EXPLAIN ANALYZE`. **multitenantccl: add sanity testing for ru estimation** This commit adds a sanity test for the RU estimates produced by running queries with `EXPLAIN ANALYZE` on a tenant. The test runs each test query several times with `EXPLAIN ANALYZE`, then runs all test queries without `EXPLAIN ANALYZE` and compares the resulting actual RU measurement to the aggregated estimates. For now, this test is disabled during builds because it is flaky in the presence of background activity. For this reason it should only be used as a manual sanity test. Informs cockroachdb#74441 Release note (sql change): Added an estimate for the number of request units consumed by a query to the output of `EXPLAIN ANALYZE` for tenant sessions.
This patch adds a cluster setting, `sql.tenant_ru_estimation.enabled`, which is used to determine whether tenants collect an RU estimate for queries run with `EXPLAIN ANALYZE`. This is an escape hatch so that the RU estimation logic can be more safely backported. Informs cockroachdb#74441 Release note: None
@DrewKimball What work remains here? |
One thing we realized after rolling out the RU estimation - it's being measured in |
The first two bullet points are still left:
There are also some smaller things to do like reporting RUs in float64 instead of int64 as mentioned above and using grunning to measure CPU for the estimate, but I'll open separate issues for those. |
How come we'd want to return RU consumed as part of the result of statement execution? I thought we only compute RUs consumed for |
We do, but that doesn't necessarily have to be the case. Do you think we shouldn't measure it for each statement? |
I'd be worried about extra overhead to gather information that will probably almost never be used. Also, we can only get an estimated amount right now, not an exact amount. @kevin-v-ngo, do you see important use cases for returning RUs from every statement execution, as opposed to just from |
This issue was not synced from JIRA and was scoped to only EXPLAIN ANALYZE. It should be closed as Done. There are use cases for understanding workloads (e.g., statements and transactions) that take up the most RUs over time; however, I haven't validate or yet seen explicit use cases expressed from users. It'd take it incrementally for now and to also understand how accurate and useful our existing RU estimation before capturing this in our other observability interfaces. |
Thanks for the extra context, I'll close this out. |
This issue tracks surfacing how many RUs are consumed per statement for CockroachDB Serverless.
Epic: CRDB-12080
The text was updated successfully, but these errors were encountered: