-
Notifications
You must be signed in to change notification settings - Fork 3.9k
admission: SET default_transaction_quality_of_service=[background,regular,critical] #76512
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was some discussion in #70295 about when There was also discussion of the |
One proposal is that we call this session variable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @msirek, @otan, and @sumeerbhola)
pkg/sql/execstats/traceanalyzer_test.go, line 208 at r1 (raw file):
} func panicSeen(g func()) (retval bool) {
nit: you can use something like require.Panics
for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 4 of 29 files at r1.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @msirek, and @otan)
pkg/kv/txn.go, line 1659 at r1 (raw file):
// BatchRequests evaluated by KV had been assigned high priority due to // locking. h.Priority = int32(admission.HighPri)
since you are adding additional levels to admission: I think in hindsight using HighPri was a mistake that limits flexibility for the future. Since this is the only case we currently use HighPri, I think we should change the usage here to admission.LockingPri and define that as +100, which gives us some space above it for future use.
pkg/util/admission/work_queue.go, line 77 at r1 (raw file):
TTLLowPri WorkPriority = -100 // UserLowPri is low priority work from user submissions (SQL). UserLowPri WorkPriority = -5
nit: let's change these to -50 and 50.
(Based on recent experiments etc., I think we don't really know how to use priority effectively, so leaving more gaps is worthwhile.)
@ajwerner @vy-ton |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @otan, @sumeerbhola, and @yuzefovich)
pkg/kv/txn.go, line 1659 at r1 (raw file):
Previously, sumeerbhola wrote…
since you are adding additional levels to admission: I think in hindsight using HighPri was a mistake that limits flexibility for the future. Since this is the only case we currently use HighPri, I think we should change the usage here to admission.LockingPri and define that as +100, which gives us some space above it for future use.
Done
Code quote:
int32(admission.HighPri)
pkg/util/admission/work_queue.go, line 77 at r1 (raw file):
Previously, sumeerbhola wrote…
nit: let's change these to -50 and 50.
(Based on recent experiments etc., I think we don't really know how to use priority effectively, so leaving more gaps is worthwhile.)
Done
pkg/sql/execstats/traceanalyzer_test.go, line 208 at r1 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: you can use something like
require.Panics
for this.
Thanks for the tip! Done
bfb7ef7
to
3db80b3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for getting this ready! naming looks ok. some nits on typing
} | ||
|
||
// NewQoSLevelFromString converts a string into a QoSLevel | ||
func NewQoSLevelFromString(val string) (_ QoSLevel, ok bool) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: i'd say we should call this ParseQoSLevelFromString
@@ -226,6 +226,9 @@ message LocalOnlySessionData { | |||
// and joins using the same default number of bytes per column instead of | |||
// column sizes from the AvgSize table statistic. | |||
bool cost_scans_with_default_col_size = 61; | |||
// QualityOfService is the session settings interface to admission control, | |||
// to control resource overload by SQL queries. | |||
sint32 quality_of_service = 62; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we make this [(gogoproto.casttype)="github.com/cockroachdb/cockroach/pkg/util/admission.WorkPriority"]
, which means all these int32s in the code are typed in, e.g. NewTxnWithSteppingEnabled
Just to confirm, I would expect to see the setting value has change but it does not take affect?
For |
Note that In postgres;
|
Just so that it's extra extra explicit, I don't think implementing a variant that allows you to change the priority of the current transaction if you're in a transaction should be in scope for this PR. We should file a follow-up issue for that. I think we should just implement |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to confirm, I would expect to see the setting value has change but it does not take affect?
@vy-ton So, the setting of the session flag will always take immediate effect, which is what the SHOW
statement prints out. The session flag is like a default value to use every time you start a new transaction (issue a new SQL statement or BEGIN statement). SQL outside of a BEGIN/END
block always starts a new transaction, so picks up that session setting, but SQL inside a BEGIN/END
block just re-uses the transaction created by the BEGIN statement. So, the session flag setting at the time the BEGIN
statement is issued will be the setting in effect for all SQLs in that BEGI/END
block. The only real way to see that setting value in the transaction though is to go to the database console on the Advanced Debug
page and click on the All Sessions
link:
That would show the quality_of_service in the active transactions:
example: #70295 (comment)
Regarding SET transaction_read_only=true;
, that seems to be taking the place of a SET TRANSACTION
statement according to the docs: "The current transaction access mode is also exposed as the session variable transaction_read_only." So, that seems to have the same behavior as SET TRANSACTION READ ONLY;
. I don't know if we'd want to mimic similar behavior with a SET transaction_quality_of_service=;
statement or not.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @otan, @sumeerbhola, and @yuzefovich)
pkg/sql/sessiondatapb/local_only_session_data.go, line 301 at r3 (raw file):
ParseQoSLevelFromString
Renamed as suggested
pkg/sql/sessiondatapb/local_only_session_data.proto, line 231 at r3 (raw file):
Previously, otan (Oliver Tan) wrote…
can we make this
[(gogoproto.casttype)="github.com/cockroachdb/cockroach/pkg/util/admission.WorkPriority"]
, which means all these int32s in the code are typed in, e.g.NewTxnWithSteppingEnabled
Great tip! I used the casttype of QoSLevel
here because everywhere we're calling it qualityOfService and use methods whose receiver is QoSLevel. Only in single location, in the cast to an int32 in NewTxnWithSteppingEnabled
, do we make the leap over to WorkType. There's a comment on that function describing the 1-to-1 correspondence between the types:
// qualityOfService is the QoSLevel level to use in admission control, whose
// value also corresponds exactly with the admission.WorkPriority to use.
If we ever want to map more than one QoSLevelQoSLevel to a given WorkType, we could introduce a function to do that and call it here.
BTW, I had to change the type of quality_of_service
in local_only_session_data.proto
from sint32
to int32
because of these error messages from dev generate
:
compilepkg: nogo: errors found by nogo during build-time code analysis:
bazel-out/k8-opt-exec-2B5CBBC6/bin/pkg/sql/sessiondatapb/sessiondatapb_go_proto_/github.com/cockroachdb/cockroach/pkg/sql/sessiondatapb/local_only_session_data.pb.go:480:96: m.QualityOfService (8 bits) too small for shift of 31 (shift)
bazel-out/k8-opt-exec-2B5CBBC6/bin/pkg/sql/sessiondatapb/sessiondatapb_go_proto_/github.com/cockroachdb/cockroach/pkg/sql/sessiondatapb/local_only_session_data.pb.go:2801:43: ((v & 1) << 31) (8 bits) too small for shift of 31 (shift)
bazel-out/k8-opt-exec-2B5CBBC6/bin/pkg/sql/sessiondatapb/sessiondatapb_go_proto_/github.com/cockroachdb/cockroach/pkg/sql/sessiondatapb/local_only_session_data.pb.go:2801:44: (v & 1) (8 bits) too small for shift of 31 (shift)
Target //pkg/gen:gen failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 87.469s, Critical Path: 54.79s
INFO: 3974 processes: 579 remote cache hit, 581 internal, 2814 linux-sandbox.
FAILED: Build did NOT complete successfully
FAILED: Build did NOT complete successfully
ERROR: generating target //pkg/gen: exit status 1compilepkg: nogo: errors found by nogo during build-time code analysis:
bazel-out/k8-opt-exec-2B5CBBC6/bin/pkg/sql/sessiondatapb/sessiondatapb_go_proto_/github.com/cockroachdb/cockroach/pkg/sql/sessiondatapb/local_only_session_data.pb.go:480:96: m.QualityOfService (8 bits) too small for shift of 31 (shift)
bazel-out/k8-opt-exec-2B5CBBC6/bin/pkg/sql/sessiondatapb/sessiondatapb_go_proto_/github.com/cockroachdb/cockroach/pkg/sql/sessiondatapb/local_only_session_data.pb.go:2801:43: ((v & 1) << 31) (8 bits) too small for shift of 31 (shift)
bazel-out/k8-opt-exec-2B5CBBC6/bin/pkg/sql/sessiondatapb/sessiondatapb_go_proto_/github.com/cockroachdb/cockroach/pkg/sql/sessiondatapb/local_only_session_data.pb.go:2801:44: (v & 1) (8 bits) too small for shift of 31 (shift)
Target //pkg/gen:gen failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 87.469s, Critical Path: 54.79s
INFO: 3974 processes: 579 remote cache hit, 581 internal, 2814 linux-sandbox.
FAILED: Build did NOT complete successfully
FAILED: Build did NOT complete successfully
ERROR: generating target //pkg/gen: exit status 1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ajwerner Thanks for clarifying! I've opened #76624 as a follow-up and updated the session flag name to default_transaction_quality_of_service
.
It may be good to consider now if the QoS level names should be changed. For example, the default orNormal
priority setting is currently called default
. But this is sort of a circular definition. Q: "What's the default setting?" A: "default"
The word default doesn't really describe a service level. Default just means, "a selection automatically used by a program in the absence of a choice made by the user". Also, when comparing the name default
withbest_effort
, it's not immediately clear which level is higher than the other, and best_effort
may sound like we're really giving our best effort in running the query, where in reality we've given it the lowest user priority. I don't necessarily have recommendations for change here. Just pointing out it may be possible to make the hierarchy of levels more obvious.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @otan, @sumeerbhola, and @yuzefovich)
pkg/kv/txn.go, line 1659 at r1 (raw file):
quality_of_service
@sumeerbhola The externalquality_of_service
level for such a transaction in the List Sessions API is currentlyquality_of_service: locking
. If you can think of a better name for this besideslocking
, let me know and I'll update it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vy-ton Thanks for researching this, those sound like good options. My only comment is that we may want to keep all 3 levels/adjectives having a common theme. For example background
sounds like it's in the same theme as expedited
, but not top
, which describes the position of the setting relative to others. LOW
, NORMAL
, and HIGH
transaction priorities all have a common theme. Would it be possible to still use LOW
, NORMAL
, and HIGH
with QoS to keep our naming consistent as long as we describe the differences between priorities and QoS in docs? For example, highlight that priorities are about making other transactions wait or abort while QoS deals with servicing of work queues, which may slow down query execution times to keep the system from becoming overloaded, but won't cause transaction aborts. If we still don't like LOW
, NORMAL
, and HIGH
, my preferred triple would be background
, best_effort
, expedited
, because the names all have a common theme. @ajwerner @vy-ton Thoughts?
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @otan, @sumeerbhola, and @yuzefovich)
+1 to a common theme. I want to encourage users to think about QoS at the higher abstraction of workload/work compared to transaction priorities. Another reason besides naming overlap that I want to avoid
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @vy-ton ! I will use those level names.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @otan, @sumeerbhola, and @yuzefovich)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
QoS level names have been updated to background
, regular
, critical
. The code could be merged as soon as it's approved.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @otan, @sumeerbhola, and @yuzefovich)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work! I have some nits, and it but it'd be good to get an approval from Andrew / Schema as well.
Reviewed 7 of 29 files at r1, 2 of 7 files at r2, 1 of 1 files at r3, 17 of 24 files at r4, 1 of 1 files at r5, 6 of 6 files at r6, all commit messages.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @ajwerner, @msirek, @otan, and @sumeerbhola)
pkg/sql/conn_executor.go, line 2846 at r6 (raw file):
defer ex.state.mu.Unlock() userPriority := ex.state.mu.txn.UserPriority() // Should the QualityOfService be increased here since we already had to wait?
I'd guess no, but I'll defer to @cockroachdb/sql-schema.
pkg/sql/conn_executor_exec.go, line 828 at r6 (raw file):
userPriority := ex.state.mu.txn.UserPriority() // Should the QualityOfService be increased here to try to avoid more // retries or errors?
Also will defer to Schema / @ajwerner.
pkg/sql/conn_fsm.go, line 119 at r6 (raw file):
// makeEventTxnStartPayload creates an eventTxnStartPayload. // qualityOfService must be the same type as // sessiondatapb.LocalOnlySessionData.DefaultTxnQualityOfService
nit: missing period. Also isn't the correct type already enforced by Go? (I feel like this comment is redundant.)
pkg/sql/sessiondatapb/local_only_session_data.proto, line 1 at r6 (raw file):
// Copyright 2022 The Cockroach Authors.
nit: we don't update the year when updating a file.
pkg/sql/logictest/testdata/logic_test/txn, line 1168 at r6 (raw file):
statement ok SET default_transaction_quality_of_service=regular
nit: we probably could do RESET default_transaction_quality_of_service
to verify the cluster setting default.
pkg/sql/execstats/traceanalyzer_test.go, line 121 at r6 (raw file):
qosLevel := sessiondatapb.QoSLevel(122) // Only defined QoSLevels are currently allowed. require.Panics(t, func() {
nit: this seems like it should be a separate unit test in a different package because TestTraceAnalyzer
is about a totally different thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Andrew is out through Friday and @otan has had a pending request since Monday that I merge this as soon as I can. I pinged the Schema team to see if they would like to review it. If not, I may just rely on your approval.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @ajwerner, @msirek, @otan, and @sumeerbhola)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm new here at CRL, but I took a brief look and it looks like the blocking question on SQL-Schema is do we want to increase the QoS for handleWaitingForConcurrentSchemaChanges
and checkDescriptorTwoVersionInvariant
, I think we can leave it as it is for now unless there’s a stronger reason how it would make it better. so I gave a green light. We may always tune it after.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @ajwerner, @msirek, @otan, and @sumeerbhola)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @chengxiong-ruan ! Yes, those were more like TODO questions for possible improvement. But I really don't know if those are useful comments, so I've removed them.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner, @otan, @sumeerbhola, and @yuzefovich)
pkg/sql/conn_executor.go, line 2846 at r6 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
I'd guess no, but I'll defer to @cockroachdb/sql-schema.
This was more like a TODO question, as a way to improve responsiveness. I don't know if this is a useful TODO, so I have removed it.
pkg/sql/conn_executor_exec.go, line 828 at r6 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
Also will defer to Schema / @ajwerner.
Another potential TODO that might not be applicable. Removed it.
Code quote:
// Should the QualityOfService be increased here to try to avoid more
pkg/sql/conn_fsm.go, line 119 at r6 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: missing period. Also isn't the correct type already enforced by Go? (I feel like this comment is redundant.)
This is a stale comment applicable to old code. Removed the comment.
Code quote:
makeEventTxnStartPayload creates an eventTxnStartPayload.
pkg/sql/sessiondatapb/local_only_session_data.proto, line 1 at r6 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: we don't update the year when updating a file.
Good to know. Reverted it.
pkg/sql/logictest/testdata/logic_test/txn, line 1168 at r6 (raw file):
RESET default_transaction_quality_of_service
Changed this to RESET default_transaction_quality_of_service
pkg/sql/execstats/traceanalyzer_test.go, line 121 at r6 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: this seems like it should be a separate unit test in a different package because
TestTraceAnalyzer
is about a totally different thing.
OK, moved these into pkg/sql/internal_test.go
.
Code quote:
qosLevel := sessiondatapb.QoSLevel(122)
b622485
to
e74b1b8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, ship it!
Reviewed 21 of 21 files at r7, all commit messages.
Reviewable status:complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner, @otan, and @sumeerbhola)
qosLevel := sessiondatapb.TTLLow | ||
_, err := ie.ExecEx( | ||
ctx, "defined_quality_of_service_level_does_not_panic", nil, | ||
sessiondata.InternalExecutorOverride{User: security.RootUserName(), QualityOfService: &qosLevel}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a reason to pass this in by pointer? the default is 0 which is normal pri which seems reasonable to me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner, @otan, and @sumeerbhola)
pkg/sql/internal_test.go, line 597 at r7 (raw file):
Previously, otan (Oliver Tan) wrote…
is there a reason to pass this in by pointer? the default is 0 which is normal pri which seems reasonable to me
If it were not a pointer and the current session setting is not normal priority, then an uninitialized value for InternalExecutorOverride.QualityOfService (0) would get interpreted as overriding the setting back to normal priority, when the caller never set that field and never intended to override it.
…ular,critical] This commit adds support for a `default_transaction_quality_of_service` session setting which when set to `critical`, increases the priority of work submitted to admission queues from subsequent SQL requests, and when set to `background`, decreases admission queue priority of subsequent SQL requests. This commit also adds a method of controlling the admission queue priority of internal SQL requests. Now, if InternalExecutor.ExecEx is called with a nil txn and the InternalExecutorOverride parameter is passed with a valid QualityOfService setting, the new transaction will be created with the specified QualityOfService. This commit adds a `qualityOfService` field to the `activeTxn` in the [List Sessions API](https://www.cockroachlabs.com/docs/api/cluster/v2#operation/listSessions). This commit also includes the current setting of default_transaction_quality_of_service in the statement bundle built via EXPLAIN (OPT, ENV). Fixes cockroachdb#70295 Release note (sql change): Adds session setting `default_transaction_quality_of_service` which controls the priority of work submitted to the different admission control queues on behalf of SQL requests submitted in a session. Admission control must be enabled for this setting to have an effect, see: https://www.cockroachlabs.com/docs/v21.2/architecture/admission-control.html To increase admission control priority of subsequent SQL requests: `SET default_transaction_quality_of_service=critical;` To decrease admission control priority of subsequent SQL requests: `SET default_transaction_quality_of_service=background;` To reset admission control priority to the default session setting (in between background and critical): `SET default_transaction_quality_of_service=regular;`
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 TFTR!
bors r+
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner, @otan, @sumeerbhola, and @yuzefovich)
Build succeeded: |
This commit adds support for a
default_transaction_quality_of_service
session setting which when set to
critical
, increases the priorityof work submitted to admission queues from subsequent SQL requests,
and when set to
background
, decreases admission queue priority ofsubsequent SQL requests.
This commit also adds a method of controlling the admission queue
priority of internal SQL requests. Now, if InternalExecutor.ExecEx is
called with a nil txn and the InternalExecutorOverride parameter is
passed with a valid QualityOfService setting, the new transaction will
be created with the specified QualityOfService.
This commit adds a
qualityOfService
field to theactiveTxn
in theList Sessions API.
This commit also includes the current setting of
default_transaction_quality_of_service in the statement bundle built via
EXPLAIN (OPT, ENV).
Fixes #70295
Release note (sql change): Adds session setting
default_transaction_quality_of_service
which controls the priority ofwork submitted to the different admission control queues on behalf of
SQL requests submitted in a session. Admission control must be enabled
for this setting to have an effect, see:
https://www.cockroachlabs.com/docs/v21.2/architecture/admission-control.html
To increase admission control priority of subsequent SQL requests:
SET default_transaction_quality_of_service=critical;
To decrease admission control priority of subsequent SQL requests:
SET default_transaction_quality_of_service=background;
To reset admission control priority to the default session setting
(in between background and critical):
SET default_transaction_quality_of_service=regular;