Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kvserver,backupccl: make export, gc subject to admission control #71109

Merged
merged 1 commit into from
Oct 13, 2021

Conversation

sumeerbhola
Copy link
Collaborator

@sumeerbhola sumeerbhola commented Oct 4, 2021

They are now marked with AdmissionHeader_ROOT_KV, which stops
them from bypassing admission control. The priority is set to
admission.NormalPri since there are cases where they do need
to complete in a very timely manner. There are TODOs in the
integration code to make more sophisticated priority
assignments and use LowPri when possible.

Informs #65957

Release note: None

@sumeerbhola sumeerbhola requested review from ajwerner, RaduBerinde and a team October 4, 2021 21:29
@sumeerbhola sumeerbhola requested a review from a team as a code owner October 4, 2021 21:29
@sumeerbhola sumeerbhola requested review from dt and removed request for a team October 4, 2021 21:29
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@dt
Copy link
Member

dt commented Oct 5, 2021

Nice!

Any chance you're interested in splitting this up, to put GC and Export changes into separate commits, or maybe even separate PRs?

Also I'm not quite sure about this: "The priority is set to admission.LowPri since these are background activities that should not be preferred over interactive user traffic". First, sometimeExportRequests are sent in the foreground, namely when fetching schema revisions where it is used a revision-capable, albeit harder to use, alternative to ScanRequest, and in these cases it is sometimes in a foreground, user transaction (indeed, when used for this purpose during backup planning, that txn actually tends to lock the jobs table for all new adoption and introspection, so starvation would be bad there).

But also, even when used by background jobs (backups) on row data, the priority of of individual ExportRequests is currently explicitly managed: they can start at lower priority but with a lock policy of error to cause them to be put on the back of the queue for retry, but then on retry, if sufficient (and configurable) time has elapsed since the read timestamp, they actually invoke maximum priority, since based on our user feedback, complying with retention/rpo policies is often important.

@tbg
Copy link
Member

tbg commented Oct 5, 2021

Are you sure about wanting to apply admission control to GC requests, especially on low (I could see why you'd want to put them through admission control, I'm specifically commenting on ranking it lower than user traffic)? The GC queue is on a time budget and failure to perform significant amounts of work within the budget can lead to ranges beginning to backpressure as well as foreground traffic performance degradation.

@sumeerbhola sumeerbhola force-pushed the kv_bg_admission branch 2 times, most recently from 5c3be6f to 993d9e3 Compare October 6, 2021 01:52
Copy link
Collaborator Author

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTRs!

Any chance you're interested in splitting this up, to put GC and Export changes into separate commits, or maybe even separate PRs?

I was hoping not to. I am possibly an outlier, but find it harder cognitively to handle a sequence of small commits in a PR (this would be 3 commits (a) the refactor in node.go and store.go, (b) the change for export (c) the change for GC).
If you have a strong opinion, I will do so.

Also I'm not quite sure about this: "The priority is set to admission.LowPri since these are background activities that should not be preferred over interactive user traffic". First, sometimeExportRequests are sent in the foreground, namely when fetching schema revisions where it is used a revision-capable, albeit harder to use, alternative to ScanRequest, and in these cases it is sometimes in a foreground, user transaction (indeed, when used for this purpose during backup planning, that txn actually tends to lock the jobs table for all new adoption and introspection, so starvation would be bad there).

I didn't realize there were cases that used a foreground user txn or that it locks the jobs table.
I've changed this to NormalPri and added a TODO(bulkio) with a summary of some of the things you said here.

Are you sure about wanting to apply admission control to GC requests, especially on low (I could see why you'd want to put them through admission control, I'm specifically commenting on ranking it lower than user traffic)? The GC queue is on a time budget and failure to perform significant amounts of work within the budget can lead to ranges beginning to backpressure as well as foreground traffic performance degradation.

@tbg I've changed this to NormalPri, and added a TODO(kv) with a summary of your comment.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @dt, and @RaduBerinde)

@dt
Copy link
Member

dt commented Oct 6, 2021

Now that I read the diff and not just the summary: The ExportRequests being sent there, in backup_processor.go, are indeed all background; I'd be fine with either priority on them as long as we have something like the TODO you added about varying it based on retry status to be just like we do with the existing UserPriority header field (I wonder if we should unify these?).

The ExportRequests that are sent in foreground user transactions are sent by storageccl/revision_reader.go's GetAllRevisions, which a couple places use in lieu of db.Scan if they want revisions, but I don't think that call is touched here.

Copy link
Member

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 5 files at r1, 2 of 2 files at r2, 3 of 3 files at r3, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @dt, @RaduBerinde, and @sumeerbhola)


pkg/server/node.go, line 347 at r3 (raw file):

		spanConfigAccessor: spanConfigAccessor,
	}
	n.storeCfg.KVAdmissionController = n

Now store references node references store etc. Could we pass something in here that is scoped more tightly? It doesn't seem appealing to me that all of the methods sit on Node directly. Can't we carve out kvAdmissionQ and storeGrantCoords into a type that we can then pass in here?

They are now marked with AdmissionHeader_ROOT_KV, which stops
them from bypassing admission control. The priority is set to
admission.NormalPri since there are cases where they do need
to complete in a very timely manner. There are TODOs in the
integration code to make more sophisticated priority
assignments and use LowPri when possible.

Informs cockroachdb#65957

Release note: None
Copy link
Collaborator Author

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be fine with either priority on them as long as we have something like the TODO you added about varying it based on retry status to be just like we do with the existing UserPriority header field (I wonder if we should unify these?).

Thanks. Updated the comment, including mentioning that this could be a function of the UserPriority.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @dt, @RaduBerinde, and @tbg)


pkg/server/node.go, line 347 at r3 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Now store references node references store etc. Could we pass something in here that is scoped more tightly? It doesn't seem appealing to me that all of the methods sit on Node directly. Can't we carve out kvAdmissionQ and storeGrantCoords into a type that we can then pass in here?

Done, by moving the implementation to kvserver in store.go.

Copy link
Member

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 3 files at r4, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @dt, and @RaduBerinde)


pkg/server/node.go, line 347 at r3 (raw file):

Previously, sumeerbhola wrote…

Done, by moving the implementation to kvserver in store.go.

Thank you! Looks good. I did not review the method impls on the assumption that they were pure code movement with the exception of accounting for the receiver name etc.

Copy link
Collaborator Author

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @dt, and @RaduBerinde)


pkg/server/node.go, line 347 at r3 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Thank you! Looks good. I did not review the method impls on the assumption that they were pure code movement with the exception of accounting for the receiver name etc.

Yes, they were purely code movement.

@sumeerbhola
Copy link
Collaborator Author

bors r+

@craig
Copy link
Contributor

craig bot commented Oct 13, 2021

Build succeeded:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants