-
-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Limit concurrent builds #6847
Limit concurrent builds #6847
Conversation
Add a Feature flag to be able to limit concurrent builds for specific projects. This will allow us to protect ourselves for projects triggering a lot of builds via the API or for a few very long builds of the same project blocking all the rest of users. If the limit is reached, the Build is marked as "Queue" and an error message is shown in the build's detail page saying that the concurrent limit was reached and the build is re-triggered to be executed in 5 minutes again.
051bf17
to
4313e49
Compare
@@ -510,6 +512,26 @@ def run( | |||
self.commit = commit | |||
self.config = None | |||
|
|||
if self.project.has_feature(Feature.LIMIT_CONCURRENT_BUILDS): | |||
response = api_v2.build.running.get(project__slug=self.project.slug) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could use APIv3 here (https://docs.readthedocs.io/en/stable/api/v3.html#builds-listing) if we add a new permission class to allow build user to list builds. As this will require extra work and we are only using APIv2 from builders, I didn't want to mix them here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a good approach to test. I'd like to be explicit in our call to retry though, so we aren't depending on global settings that might change how this works randomly in the future.
Co-Authored-By: Eric Holscher <25510+ericholscher@users.noreply.github.com>
…cs/readthedocs.org into humitos/max-builds-concurrency
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd still be +1 on removing "Queued" state and just using "Triggered". From a user's perspective, there isn't a difference here, and adding a queued state could throw off any queries that were just looking for state = 'triggered'.
queryset = ( | ||
self.get_queryset() | ||
.filter(project__slug=project_slug) | ||
.exclude(state__in=[BUILD_STATE_FINISHED, BUILD_STATE_QUEUED]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This case is where I had issues with overloading the build state to communicate build step to the user in the UI. There are likely other queries/methods we need to update whenever we add build states like this -- for instance, anything checking just for BUILD_STATE_TRIGGERED
and should be also checking for our new state, BUILD_STATE_QUEUED
?
This code is correct, but it's hard to know what fallout could be without some thorough testing. Queries like these should probably be QuerySet methods so that we're not continually reproducing logic that we need to update.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have a strong opinion here, but a slight preference to add a new state. I just wanted to mention why I thought it could be useful.
The difference in UI is that going to the Build list page, you can have a big picture about what's happening: Triggered, Queued, Building, Passed, Failed. Where Triggered means that you are waiting for your turn to build, while Queued means that you have reached the concurrency limit: "those builds are waiting "because of you" (in some way), not us.
Having too many builds in Triggered could cause the user to contact us because they think we are not processing their builds: "the waiting time is too high", where in fact, they have reached the limit.
On the other hand, we use Build.is_stale
to show a small "Warning" icon in the build list if the build has been in Triggered state for more than 5 minutes. I'd not show this icon for Queued state.
Finally, I did a quick grep trying to find filters via BUILD_STATE_TRIGGERED
and I didn't find anything important. Although, for BUILD_STATE_FINISHED
I found this,
readthedocs.org/readthedocs/api/v3/filters.py
Lines 41 to 45 in ee403c5
def get_running(self, queryset, name, value): | |
if value: | |
return queryset.exclude(state=BUILD_STATE_FINISHED) | |
return queryset.filter(state=BUILD_STATE_FINISHED) |
I agree we could move them to a QuerySet and avoid repetition, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I say lets keep it Triggered for now, and we can adjust as we go.
There are ways to change the UX for the user without using the state, if needed, or audit all the queries. We need to do additional work before we ship it for users anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good points! Some of this felt like core team opinions, so I got feedback on the UI changes. What it came back to was that builds should be in "Queued" state if the user has to wait for the build -- regardless of whether we introduce a delay due to concurrency limits, or the build is queued because of build backup. "Triggered" doesn't communicate the same thing as "Queued".
So, I think my opinion is that adding "Queued" is not a problem, but all builds should become "Queued" state when they are put into the build queue. This might leave no room for "Triggered" state if so, and so probably back to my original point, should be dropped in favor of "Queued".
For now, I'd agree we can leave it "Triggered" and move more deliberately to clearer language for the users. We can add more UI later and guide our UI decisions based on our technical implementation.
Here are the feedback notes:
- "Triggered" vs "Queued" does communicate the difference between build state. "Triggered" implies there is something RTD needs to do still, while "Queued" implies the build will be grabbed eventually.
- "I don't need to worry about builds in a queued state, and would have more patience with these builds vs triggered state"
- "Queued" state, when builds go over concurrency, makes sense and is mostly obvious. "I'd have more patience with builds in queued state"
- Expectation is that builds should be picked up and move quickly from a "Triggered" state into "Building" state
- Builds stuck in "queued" for a long time would eventually be a problem
- A build stuck in "Triggered" state does not imply that we will ever start the build. If the queue is temporarily backed up, it was expected that the build would enter a "Queued" state, not stay in "Triggered" state. "Triggered" state is more worrying.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 --I've updated this PR and removed the QUEUED state for now. We can come back to this discussion later if needed.
We decided we are going to keep the build in Triggered state for now.
@humitos I'm 👍 to merge once the tests are fixed. |
Test failure looks unrelated. |
Add a Feature flag to be able to limit concurrent builds for specific
projects. This will allow us to protect ourselves for projects
triggering a lot of builds via the API or for a few very long builds
of the same project blocking all the rest of users.
If the limit is reached, the Build is marked as "Queue" and an error
message is shown in the build's detail page saying that the concurrent
limit was reached and the build is re-triggered to be executed in 5
minutes again.