Skip to content
This repository has been archived by the owner on Jan 8, 2024. It is now read-only.

runner: static runners accept multiple jobs in parallel #3300

Merged
merged 5 commits into from
May 1, 2022

Conversation

mitchellh
Copy link
Contributor

This modifies internal/runner to support accepting multiple jobs in
parallel. Not much work here since we always designed the runner struct
from the beginning to support this so there are no data races.

This modifies internal/cli so that runners in non-ODR run in parallel
mode by default. ODR doesn't make sense to have any parallelism since
they always run exactly one job. Non-ODR runners typically ONLY launch
ODR tasks, which are highly IO-bound, so we default to a multiple above
CPU count for concurrency.

This is a necessary pre-requisite for pipelines since they'll likely
perform blocking jobs on the static runners to "watch" tasks. Today,
tasks are launched and stopped, but not watched so this is not an issue.

This modifies `internal/runner` to support accepting multiple jobs in
parallel. Not much work here since we always designed the runner struct
from the beginning to support this so there are no data races.

This modifies `internal/cli` so that runners in non-ODR run in parallel
mode by default. ODR doesn't make sense to have any parallelism since
they always run exactly one job. Non-ODR runners typically ONLY launch
ODR tasks, which are highly IO-bound, so we default to a multiple above
CPU count for concurrency.

This is a necessary pre-requisite for pipelines since they'll likely
perform blocking jobs on the static runners to "watch" tasks. Today,
tasks are launched and stopped, but not watched so this is not an issue.
@mitchellh mitchellh requested review from evanphx, briancain and a team April 29, 2022 19:18
@github-actions github-actions bot added the core label Apr 29, 2022
Copy link
Contributor

@izaaklauer izaaklauer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wohoo!

Copy link
Member

@briancain briancain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Look great!

Copy link
Contributor

@evanphx evanphx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just one question about the semantics of canceling in the worker goroutine.

wg.Add(count)
for i := 0; i < count; i++ {
go func() {
defer cancel()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just saying this out loud, should the exit of ANY of the goroutines cause them all to exit?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So actually, yes I think this needs to do slightly better, although its edge case-y. Here is the thought:

  1. If the user cancels the context, then this is effectively a no-op and all of the goroutines are canceled anyways. No issue.

  2. If a goroutine has an error, its likely the error will impact all, because there is considerable retry logic already in each Accept call -- including reconnection -- so if it actually errors it is likely unrecoverable. So we DO want to exit all the goroutines.

However, for #2, right now we're canceling the context which just causes a cascade effect to cancel ASAP. I think we can do better by just letting each existing job try to finish gracefully, and then say "don't accept any more jobs thereafter."

I'll work on this Monday.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I think this is okay for now. Its a bit non-trivial to get this fix in and looking at the possible reasons for a return from AcceptMany, i do think things are really broken if they exit so cancelling all is okay for now. We can improve this later. I've added a TODO to note it.

@mitchellh mitchellh merged commit c1dcceb into main May 1, 2022
@mitchellh mitchellh deleted the static-runner-par branch May 1, 2022 23:12
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants