Limit concurrent builds #6847

Merged: 7 commits, Apr 6, 2020
21 changes: 20 additions & 1 deletion readthedocs/api/v2/views/model_views.py
@@ -10,7 +10,13 @@
from rest_framework.renderers import BaseRenderer, JSONRenderer
from rest_framework.response import Response

from readthedocs.builds.constants import BRANCH, TAG, INTERNAL
from readthedocs.builds.constants import (
    BRANCH,
    TAG,
    INTERNAL,
    BUILD_STATE_QUEUED,
    BUILD_STATE_FINISHED,
)
from readthedocs.builds.models import Build, BuildCommandResult, Version
from readthedocs.core.utils import trigger_build
from readthedocs.core.utils.extend import SettingsOverrideObject
@@ -276,6 +282,19 @@ class BuildViewSetBase(UserSelectViewSet):
    model = Build
    filterset_fields = ('project__slug', 'commit')

    @decorators.action(
        detail=False,
        permission_classes=[permissions.IsAdminUser],
        methods=['get'],
    )
    def running(self, request, **kwargs):
        project_slug = request.GET.get('project__slug')
        queryset = (
            self.get_queryset()
            .filter(project__slug=project_slug)
            .exclude(state__in=[BUILD_STATE_FINISHED, BUILD_STATE_QUEUED])
Contributor:

This case is where I had issues with overloading the build state to communicate the build step to the user in the UI. There are likely other queries/methods we need to update whenever we add build states like this. For instance, is anything checking only for BUILD_STATE_TRIGGERED that should also be checking for our new state, BUILD_STATE_QUEUED?

This code is correct, but it's hard to know what the fallout could be without thorough testing. Queries like these should probably be QuerySet methods so that we're not continually reproducing logic that we need to update.
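A minimal, framework-free sketch of that suggestion, assuming the state strings from `readthedocs/builds/constants.py` in this PR: the set of states that do not count as running is defined once, and both a `running`-endpoint-style count and a `get_running`-style filter derive from it. `NOT_RUNNING_STATES`, `is_running`, `count_running`, and `filter_running` are hypothetical names; in Django the same idea would typically live on a custom `QuerySet` exposed through `as_manager()`.

```python
# State strings as defined in readthedocs/builds/constants.py in this PR.
BUILD_STATE_TRIGGERED = 'triggered'
BUILD_STATE_QUEUED = 'queued'
BUILD_STATE_BUILDING = 'building'
BUILD_STATE_FINISHED = 'finished'

# Hypothetical single source of truth for states that do NOT count as running.
# Introducing another "waiting" state only requires updating this set.
NOT_RUNNING_STATES = frozenset({BUILD_STATE_FINISHED, BUILD_STATE_QUEUED})


def is_running(state):
    """Return True if a build in ``state`` counts as running."""
    return state not in NOT_RUNNING_STATES


def count_running(builds, project_slug):
    """Count a project's running builds, like the ``running`` endpoint."""
    return sum(
        1 for build in builds
        if build['project'] == project_slug and is_running(build['state'])
    )


def filter_running(builds, value):
    """Mimic a ``get_running``-style filter: running builds if ``value``."""
    return [b for b in builds if is_running(b['state']) == bool(value)]


example_builds = [
    {'project': 'docs', 'state': BUILD_STATE_TRIGGERED},
    {'project': 'docs', 'state': BUILD_STATE_QUEUED},
    {'project': 'docs', 'state': BUILD_STATE_BUILDING},
    {'project': 'other', 'state': BUILD_STATE_BUILDING},
]
print(count_running(example_builds, 'docs'))  # prints 2
```

Unlike the `get_running` filter quoted later in this thread, which excludes only `BUILD_STATE_FINISHED`, this version would also treat queued builds as not running; centralizing the definition makes that kind of behavior change explicit in one place.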

Member Author:

I don't have a strong opinion here, but I have a slight preference for adding a new state. I just wanted to mention why I thought it could be useful.

The difference in the UI is that on the Build list page you get a big-picture view of what's happening: Triggered, Queued, Building, Passed, Failed. Triggered means you are waiting for your turn to build, while Queued means you have reached the concurrency limit: those builds are waiting "because of you" (in some sense), not because of us.

Having too many builds in Triggered could cause users to contact us because they think we are not processing their builds ("the waiting time is too high"), when in fact they have reached the limit.

On the other hand, we use Build.is_stale to show a small "Warning" icon in the build list if a build has been in the Triggered state for more than 5 minutes. I wouldn't show this icon for the Queued state.
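A sketch of the staleness rule described above, assuming a build exposes its `state` and a timezone-aware creation time. The 5-minute threshold comes from the comment; `is_stale` here is a hypothetical standalone function, not the actual `Build.is_stale` implementation. Queued builds are deliberately never flagged.

```python
from datetime import datetime, timedelta, timezone

BUILD_STATE_TRIGGERED = 'triggered'

# From the comment: warn once a build has sat in Triggered for over 5 minutes.
STALE_AFTER = timedelta(minutes=5)


def is_stale(state, created, now=None):
    """Return True if a build has been Triggered for more than 5 minutes.

    Any other state, including Queued (waiting on the project's own
    concurrency limit), is never reported as stale.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return state == BUILD_STATE_TRIGGERED and now - created > STALE_AFTER
```

Keeping the check keyed on the Triggered state alone is what lets Queued builds wait quietly without the warning icon.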

Finally, I did a quick grep for filters using BUILD_STATE_TRIGGERED and didn't find anything important. However, for BUILD_STATE_FINISHED I found this:

def get_running(self, queryset, name, value):
    if value:
        return queryset.exclude(state=BUILD_STATE_FINISHED)
    return queryset.filter(state=BUILD_STATE_FINISHED)
but I'm not sure it will be affected, because it already treats Triggered as running anyway.

I agree we could move them to a QuerySet and avoid repetition, though.

Member:

I say let's keep it Triggered for now, and we can adjust as we go.

If needed, there are ways to change the UX for users without using the state, or we can audit all the queries. We need to do additional work before we ship this to users anyway.

Contributor:

Good points! Some of this felt like core team opinions, so I got feedback on the UI changes. What it came back to was that builds should be in "Queued" state if the user has to wait for the build -- regardless of whether we introduce a delay due to concurrency limits, or the build is queued because of build backup. "Triggered" doesn't communicate the same thing as "Queued".

So my opinion is that adding "Queued" is not a problem, but all builds should enter the "Queued" state when they are put into the build queue. If so, this might leave no room for a "Triggered" state, which brings me back to my original point: "Triggered" should probably be dropped in favor of "Queued".

For now, I agree we can leave it "Triggered" and move more deliberately toward clearer language for users. We can add more UI later and guide our UI decisions based on our technical implementation.

Here are the feedback notes:

  • "Triggered" vs "Queued" does communicate the difference between build states. "Triggered" implies there is something RTD still needs to do, while "Queued" implies the build will be picked up eventually.
  • "I don't need to worry about builds in a queued state, and would have more patience with these builds vs triggered state"
  • "Queued" state, when builds go over concurrency, makes sense and is mostly obvious. "I'd have more patience with builds in queued state"
  • Expectation is that builds should be picked up and move quickly from a "Triggered" state into "Building" state
  • Builds stuck in "Queued" for a long time would eventually be a problem
  • A build stuck in "Triggered" state does not imply that we will ever start the build. If the queue is temporarily backed up, it was expected that the build would enter a "Queued" state, not stay in "Triggered" state. "Triggered" state is more worrying.

Member Author:

👍 I've updated this PR and removed the QUEUED state for now. We can come back to this discussion later if needed.

        )
        return Response({'count': queryset.count()})

class BuildViewSet(SettingsOverrideObject):

2 changes: 2 additions & 0 deletions readthedocs/builds/constants.py
@@ -5,6 +5,7 @@


BUILD_STATE_TRIGGERED = 'triggered'
BUILD_STATE_QUEUED = 'queued'
BUILD_STATE_CLONING = 'cloning'
BUILD_STATE_INSTALLING = 'installing'
BUILD_STATE_BUILDING = 'building'
@@ -13,6 +14,7 @@

BUILD_STATE = (
    (BUILD_STATE_TRIGGERED, _('Triggered')),
    (BUILD_STATE_QUEUED, _('Queued')),
    (BUILD_STATE_CLONING, _('Cloning')),
    (BUILD_STATE_INSTALLING, _('Installing')),
    (BUILD_STATE_BUILDING, _('Building')),
18 changes: 18 additions & 0 deletions readthedocs/builds/migrations/0016_add-queued-state.py
@@ -0,0 +1,18 @@
# Generated by Django 2.2.11 on 2020-04-01 20:44

from django.db import migrations, models


class Migration(migrations.Migration):

    dependencies = [
        ('builds', '0015_uploading_build_state'),
    ]

    operations = [
        migrations.AlterField(
            model_name='build',
            name='state',
            field=models.CharField(choices=[('triggered', 'Triggered'), ('queued', 'Queued'), ('cloning', 'Cloning'), ('installing', 'Installing'), ('building', 'Building'), ('uploading', 'Uploading'), ('finished', 'Finished')], default='finished', max_length=55, verbose_name='State'),
        ),
    ]
4 changes: 4 additions & 0 deletions readthedocs/doc_builder/exceptions.py
@@ -53,6 +53,10 @@ class BuildTimeoutError(BuildEnvironmentError):
    message = ugettext_noop('Build exited due to time out')


class BuildMaxConcurrencyError(BuildEnvironmentError):
    message = ugettext_noop('Concurrent limit reached ({limit}), retrying in 5 minutes.')


class BuildEnvironmentWarning(BuildEnvironmentException):
    pass

5 changes: 5 additions & 0 deletions readthedocs/projects/models.py
@@ -1515,6 +1515,7 @@ def add_features(sender, **kwargs):
    SKIP_SYNC_TAGS = 'skip_sync_tags'
    SKIP_SYNC_BRANCHES = 'skip_sync_branches'
    CACHED_ENVIRONMENT = 'cached_environment'
    LIMIT_CONCURRENT_BUILDS = 'limit_concurrent_builds'

    FEATURES = (
        (USE_SPHINX_LATEST, _('Use latest version of Sphinx')),
@@ -1585,6 +1586,10 @@ def add_features(sender, **kwargs):
            CACHED_ENVIRONMENT,
            _('Cache the environment (virtualenv, conda, pip cache, repository) in storage'),
        ),
        (
            LIMIT_CONCURRENT_BUILDS,
            _('Limit the amount of concurrent builds'),
        ),
    )

    projects = models.ManyToManyField(
22 changes: 22 additions & 0 deletions readthedocs/projects/tasks.py
@@ -29,6 +29,7 @@

from readthedocs.api.v2.client import api as api_v2
from readthedocs.builds.constants import (
    BUILD_STATE_QUEUED,
    BUILD_STATE_BUILDING,
    BUILD_STATE_CLONING,
    BUILD_STATE_FINISHED,
@@ -57,6 +58,7 @@
from readthedocs.doc_builder.exceptions import (
    BuildEnvironmentError,
    BuildEnvironmentWarning,
    BuildMaxConcurrencyError,
    BuildTimeoutError,
    MkDocsYAMLParseError,
    ProjectBuildsSkippedError,
@@ -510,6 +512,26 @@ def run(
        self.commit = commit
        self.config = None

        if self.project.has_feature(Feature.LIMIT_CONCURRENT_BUILDS):
            response = api_v2.build.running.get(project__slug=self.project.slug)
Member Author:

We could use APIv3 here (https://docs.readthedocs.io/en/stable/api/v3.html#builds-listing) if we added a new permission class allowing the build user to list builds. Since that requires extra work and we only use APIv2 from the builders, I didn't want to mix the two here.

            if response.get('count', 0) >= settings.RTD_MAX_CONCURRENT_BUILDS:
                log.warning(
                    'Delaying tasks due to concurrency limit. project=%s version=%s',
                    self.project.slug,
                    self.version.slug,
                )

                # This is done automatically on the environment context, but
                # we are executing this code before creating one
                api_v2.build(self.build['id']).patch({
                    'error': BuildMaxConcurrencyError.message.format(
                        limit=settings.RTD_MAX_CONCURRENT_BUILDS,
                    ),
                    'state': BUILD_STATE_QUEUED,
                })
                self.task.retry(exc=BuildMaxConcurrencyError, throw=False)
                return False

        # Build process starts here
        setup_successful = self.run_setup(record=record)
        if not setup_successful:
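A framework-free sketch of the concurrency gate in `run()` above, assuming hypothetical stand-ins: `running_count(slug)` for `api_v2.build.running.get(...)` and `mark_queued(message)` for the `patch()` call. It keeps the same `>=` comparison against the limit and returns whether the build may start; the real task then calls `self.task.retry()`. If the task's default retry delay is not already five minutes, Celery's `Task.retry()` accepts a `countdown` argument (in seconds) that could keep the delay in line with the "retrying in 5 minutes" message.

```python
# Mirrors RTD_MAX_CONCURRENT_BUILDS in readthedocs/settings/base.py.
MAX_CONCURRENT_BUILDS = 4

# Same text as BuildMaxConcurrencyError.message.
ERROR_TEMPLATE = 'Concurrent limit reached ({limit}), retrying in 5 minutes.'


def can_start_build(project_slug, running_count, mark_queued,
                    limit=MAX_CONCURRENT_BUILDS):
    """Return True if a new build for ``project_slug`` may start now.

    ``running_count(project_slug)`` returns how many of the project's builds
    are currently running; ``mark_queued(message)`` records why the build
    was delayed. Both are hypothetical stand-ins for the API v2 calls.
    """
    if running_count(project_slug) >= limit:
        mark_queued(ERROR_TEMPLATE.format(limit=limit))
        return False  # the caller is expected to retry the task later
    return True


# Usage: with 4 builds already running, the new build is delayed.
messages = []
allowed = can_start_build('docs', lambda slug: 4, messages.append)
```

Passing the lookups in as callables keeps the decision logic testable without a running API or Celery worker.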
1 change: 1 addition & 0 deletions readthedocs/settings/base.py
@@ -98,6 +98,7 @@ class CommunityBaseSettings(Settings):
    RTD_STABLE = 'stable'
    RTD_STABLE_VERBOSE_NAME = 'stable'
    RTD_CLEAN_AFTER_BUILD = False
    RTD_MAX_CONCURRENT_BUILDS = 4

    # Database and API hitting settings
    DONT_HIT_API = False