Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Limit concurrent builds #6847

Merged
merged 7 commits into from
Apr 6, 2020
Merged

Conversation

humitos
Copy link
Member

@humitos humitos commented Apr 1, 2020

Add a Feature flag to be able to limit concurrent builds for specific
projects. This will allow us to protect ourselves for projects
triggering a lot of builds via the API or for a few very long builds
of the same project blocking all the rest of users.

If the limit is reached, the Build is marked as "Queue" and an error
message is shown in the build's detail page saying that the concurrent
limit was reached and the build is re-triggered to be executed in 5
minutes again.

Screenshot_2020-04-01_22-48-07

@humitos humitos requested a review from a team April 1, 2020 20:48
Add a Feature flag to be able to limit concurrent builds for specific
projects. This will allow us to protect ourselves for projects
triggering a lot of builds via the API or for a few very long builds
of the same project blocking all the rest of users.

If the limit is reached, the Build is marked as "Queue" and an error
message is shown in the build's detail page saying that the concurrent
limit was reached and the build is re-triggered to be executed in 5
minutes again.
@humitos humitos force-pushed the humitos/max-builds-concurrency branch from 051bf17 to 4313e49 Compare April 1, 2020 20:55
@@ -510,6 +512,26 @@ def run(
self.commit = commit
self.config = None

if self.project.has_feature(Feature.LIMIT_CONCURRENT_BUILDS):
response = api_v2.build.running.get(project__slug=self.project.slug)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could use APIv3 here (https://docs.readthedocs.io/en/stable/api/v3.html#builds-listing) if we add a new permission class to allow build user to list builds. As this will require extra work and we are only using APIv2 from builders, I didn't want to mix them here.

Copy link
Member

@ericholscher ericholscher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a good approach to test. I'd like to be explicit in our call to retry though, so we aren't depending on global settings that might change how this works randomly in the future.

readthedocs/doc_builder/exceptions.py Outdated Show resolved Hide resolved
readthedocs/projects/tasks.py Outdated Show resolved Hide resolved
humitos and others added 4 commits April 2, 2020 16:18
Co-Authored-By: Eric Holscher <25510+ericholscher@users.noreply.github.com>
…cs/readthedocs.org into humitos/max-builds-concurrency
Copy link
Contributor

@agjohnson agjohnson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd still be +1 on removing "Queued" state and just using "Triggered". From a user's perspective, there isn't a difference here, and adding a queued state could throw off any queries that were just looking for state = 'triggered'.

queryset = (
self.get_queryset()
.filter(project__slug=project_slug)
.exclude(state__in=[BUILD_STATE_FINISHED, BUILD_STATE_QUEUED])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This case is where I had issues with overloading the build state to communicate build step to the user in the UI. There are likely other queries/methods we need to update whenever we add build states like this -- for instance, anything checking just for BUILD_STATE_TRIGGERED and should be also checking for our new state, BUILD_STATE_QUEUED?

This code is correct, but it's hard to know what fallout could be without some thorough testing. Queries like these should probably be QuerySet methods so that we're not continually reproducing logic that we need to update.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a strong opinion here, but a slight preference to add a new state. I just wanted to mention why I thought it could be useful.

The difference in UI is that going to the Build list page, you can have a big picture about what's happening: Triggered, Queued, Building, Passed, Failed. Where Triggered means that you are waiting for your turn to build, while Queued means that you have reached the concurrency limit: "those builds are waiting "because of you" (in some way), not us.

Having too many builds in Triggered could cause the user to contact us because they think we are not processing their builds: "the waiting time is too high", where in fact, they have reached the limit.

On the other hand, we use Build.is_stale to show a small "Warning" icon in the build list if the build has been in Triggered state for more than 5 minutes. I'd not show this icon for Queued state.

Finally, I did a quick grep trying to find filters via BUILD_STATE_TRIGGERED and I didn't find anything important. Although, for BUILD_STATE_FINISHED I found this,

def get_running(self, queryset, name, value):
if value:
return queryset.exclude(state=BUILD_STATE_FINISHED)
return queryset.filter(state=BUILD_STATE_FINISHED)
but I'm not sure it will be affected, because it considered Triggered as running anyways.

I agree we could move them to a QuerySet and avoid repetition, though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I say lets keep it Triggered for now, and we can adjust as we go.

There are ways to change the UX for the user without using the state, if needed, or audit all the queries. We need to do additional work before we ship it for users anyway.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good points! Some of this felt like core team opinions, so I got feedback on the UI changes. What it came back to was that builds should be in "Queued" state if the user has to wait for the build -- regardless of whether we introduce a delay due to concurrency limits, or the build is queued because of build backup. "Triggered" doesn't communicate the same thing as "Queued".

So, I think my opinion is that adding "Queued" is not a problem, but all builds should become "Queued" state when they are put into the build queue. This might leave no room for "Triggered" state if so, and so probably back to my original point, should be dropped in favor of "Queued".

For now, I'd agree we can leave it "Triggered" and move more deliberately to clearer language for the users. We can add more UI later and guide our UI decisions based on our technical implementation.

Here are the feedback notes:

  • "Triggered" vs "Queued" does communicate the difference between build state. "Triggered" implies there is something RTD needs to do still, while "Queued" implies the build will be grabbed eventually.
  • "I don't need to worry about builds in a queued state, and would have more patience with these builds vs triggered state"
  • "Queued" state, when builds go over concurrency, makes sense and is mostly obvious. "I'd have more patience with builds in queued state"
  • Expectation is that builds should be picked up and move quickly from a "Triggered" state into "Building" state
  • Builds stuck in "queued" for a long time would eventually be a problem
  • A build stuck in "Triggered" state does not imply that we will ever start the build. If the queue is temporarily backed up, it was expected that the build would enter a "Queued" state, not stay in "Triggered" state. "Triggered" state is more worrying.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 --I've updated this PR and removed the QUEUED state for now. We can come back to this discussion later if needed.

humitos added 2 commits April 6, 2020 12:42
We decided we are going to keep the build in Triggered state for now.
@ericholscher
Copy link
Member

@humitos I'm 👍 to merge once the tests are fixed.

@ericholscher
Copy link
Member

Test failure looks unrelated.

@ericholscher ericholscher merged commit 253273e into master Apr 6, 2020
@ericholscher ericholscher deleted the humitos/max-builds-concurrency branch April 6, 2020 18:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants