Remove files after build #5680
Conversation
Mock new task
Makes sense to me.
We need to decide how to handle this on Corporate before merging and deploying this PR, though.
Notes:
But I think we can just not cache anything, since we are going to remove it anyway.
This happens now, but isn't noticeable because we don't remove the whole checkout; now we do (and
I'm removing the files from all the build servers; if we don't want to be so aggressive, we can just remove the files from the current one.
Also, I think a general solution for the race conditions is having a way to query the build queue of the project; that way we can limit it to building one version at a time (if the version is the same) and get rid of the lock when cloning a repo.
If two builds (same version) are triggered almost at the same time on different builders, the first one that finishes will remove the files of the other one, which is still building. Won't this be a problem?
Removing only from the same builder will have the same problems with race conditions #5680 (comment)
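A minimal sketch of the "query the build queue" idea discussed above (class and method names are hypothetical; the real queue lives in Celery/RabbitMQ): before enqueuing a build, check whether the same version is already pending and skip the duplicate, so two builds of one version never run concurrently.

```python
from collections import deque


class BuildQueue:
    """Hypothetical per-project build queue that deduplicates versions.

    Illustration only: shows the "one build per version at a time" idea,
    not the actual Read the Docs task queue.
    """

    def __init__(self):
        self._pending = deque()

    def trigger_build(self, version_slug):
        # Skip enqueuing if this version is already queued or building.
        if version_slug in self._pending:
            return False
        self._pending.append(version_slug)
        return True

    def finish_build(self):
        # Oldest pending version finished; remove it from the queue.
        return self._pending.popleft()
```

With this in place, the second of two near-simultaneous webhooks for the same version is a no-op instead of a second racing build.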
readthedocs/projects/tasks.py
Outdated
```python
del_dirs = [
    os.path.join(version.project.doc_path, dir_, version.slug)
    for dir_ in ('checkouts', 'envs', 'conda')
]
for del_dir in del_dirs:
    broadcast(type='build', task=remove_dirs, args=[(del_dir,)])
```
This definitely shouldn't be a broadcast. We only do locking per-server, so doing this will definitely break things.
We decided to extend the lock; right now we release the lock after the clone step. Also, we can rely on the DB for this. I'll see if this can be fixed in a separate PR, because it's useful for the current code too :)
Just a note for myself: we need to remove the checkout after we do the versions sync when a new branch is created. This is another task, separate from update_docs.
I'm not adding the clean_build task in the old webhooks because they are going to be removed (#5479). I could add them if you want.
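Since the review objects to the broadcast, here is a hedged sketch of the alternative: remove the version's directories on the current builder only (paths follow the snippet above; the helper name is hypothetical).

```python
import os
import shutil


def clean_build_dirs(doc_path, version_slug):
    """Remove the checkout, virtualenv, and conda directories for one
    version on this builder only -- no broadcast to other servers,
    so the per-server locking assumption still holds."""
    del_dirs = [
        os.path.join(doc_path, dir_, version_slug)
        for dir_ in ('checkouts', 'envs', 'conda')
    ]
    removed = []
    for del_dir in del_dirs:
        if os.path.exists(del_dir):
            shutil.rmtree(del_dir)
            removed.append(del_dir)
    return removed
```

A directory that does not exist on this builder (e.g. no conda env) is simply skipped.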
readthedocs/projects/tasks.py
Outdated
```python
del_dirs = [
    os.path.join(version.project.doc_path, dir_, version.slug)
    for dir_ in ('checkouts', 'envs', 'conda')
]
# TODO: the max_lock_age can be lower than the default
```
Actually, the default is fine, because a build can take up to x minutes to time out. If this task were triggered earlier, it would start deleting everything and the build would fail, when maybe it only needed a few more minutes.
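The `max_lock_age` discussion above can be sketched with a file-based lock (hypothetical helper; the real project uses its own lock utility, this only illustrates the age check): an existing lock is only stolen once it is older than `max_lock_age`, so a slow-but-alive build is not interrupted.

```python
import os
import time


def try_acquire_lock(lock_path, max_lock_age):
    """Acquire a file-based lock, stealing an existing lock only when
    it is older than max_lock_age seconds (i.e. presumed stale)."""
    if os.path.exists(lock_path):
        age = time.time() - os.path.getmtime(lock_path)
        if age < max_lock_age:
            return False  # another task still holds a fresh lock
        os.remove(lock_path)  # stale lock, likely from a dead task
    with open(lock_path, 'w') as fh:
        fh.write(str(os.getpid()))
    return True
```

Setting `max_lock_age` below the build timeout is exactly the risk the comment describes: the cleanup would steal the lock from a build that just needed a few more minutes.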
Artifacts are copied to rtd_builds after build
So, I was able to reduce the percentage of race conditions. The trick is to keep using locks: one lock per task that modifies files (build, clean build, sync versions). Right now the biggest problem is that build_docs acquires 3 locks over the course of the task. That leaves windows of time where another task could take the lock and start modifying files that the first task expects to be unchanged from before it released the lock. I'm reducing the number of locks in build_docs to 2 in #5695. We can have only one after #5672. So far, no changes are required in the corporate site.
Note for myself: I still need to add a lock in the sync versions step, but it will be easier after #5695
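The "one lock per file-modifying task" idea above can be sketched as a context manager held for the whole task (names hypothetical; the real lock is cross-process, this in-process version only shows the shape): holding a single lock across all steps removes the in-between windows the comment describes.

```python
import threading
from contextlib import contextmanager

# One lock per (project, version): any task that modifies files on disk
# (build, clean build, sync versions) must hold it for its whole duration.
_locks = {}


@contextmanager
def version_lock(project_slug, version_slug):
    lock = _locks.setdefault((project_slug, version_slug), threading.Lock())
    lock.acquire()
    try:
        yield
    finally:
        lock.release()


def update_docs(project_slug, version_slug, steps):
    # A single lock across clone + build: no gap where another task
    # could modify the checkout between steps.
    with version_lock(project_slug, version_slug):
        return [step() for step in steps]
```

Compare this with acquiring and releasing three separate locks during the task, where any gap is a chance for clean_build to delete a checkout the build still needs.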
readthedocs/core/views/hooks.py
Outdated
```python
    args=(version.pk,),
    immutable=True,
),
clean_build_task.signature(
```
I have a doubt here.
I understand that when we receive a webhook, this task is called to sync the versions. So, if the default_version is triggered for a build as well, it could happen that this clean_build_task removes the files while the build is still running.
In other words, since calling sync_versions may trigger a build for the stable version, I think there is one case where this could break. Consider this scenario:
1. the project default_version is set to stable
2. we receive a webhook that calls sync_versions and it triggers a build for stable
3. the build for the stable version starts on the same builder
4. clean_build_task is called with the stable version because it was the default_version of the project
5. files for the stable version are removed from the builder while it was building
I understand that if we merge #5695 as well (lock the whole build), step 5 won't happen because clean_build_task tries to acquire the lock at this line: https://github.com/rtfd/readthedocs.org/pull/5680/files/0a6445edb9621256f28f673e1a9ae8d4a3831704..b613bbb98f99fb507fbe84826d8fa90de2db4358#diff-b9399e1d3499066c5564f98a620e8881R1434
Please double-check this and confirm whether I'm right or wrong. Although, it seems to be a very improbable case :)
Presumably we should only be cleaning up the files for the specific version we're building?
This is another vote for single build per builder in my mind. We're wasting a lot of time worrying about this.
I'm acquiring a lock for the clean build task in
And skipping the deletion if there is another task with a lock. That avoids the problem of deleting stuff if we have another task expecting files to be there.
I only need to update the code to acquire a lock in the sync_versions task after #5695; that way clean_build will skip the deletion.
Presumably we should only be cleaning up the files for the specific version we're building?
We are doing that
And skipping the deletion if there is another task with a lock. That avoids the problem of deleting stuff if we have another task expecting files to be there.
Since we are not locking the whole build, this can still happen in between the locks, right?
Good question, honestly I'm not sure. I should take a look at how the rest of the tasks handle that, or check the Celery docs.
Yes, but only in one case (between the clone step and the build step the lock is released for some seconds). That is blocked by #5672. We used to have something like 3 lock releases here :)
I don't understand this one. We are using a feature flag, so only projects with the flag will trigger the task.
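The feature-flag gating mentioned above can be sketched as follows (the flag name and both classes are hypothetical; the PR only says the task is gated behind a feature flag): projects without the flag simply never trigger the cleanup task.

```python
class Project:
    """Minimal stand-in for the Django model, for illustration only."""

    def __init__(self, features):
        self._features = set(features)

    def has_feature(self, feature_id):
        return feature_id in self._features


def maybe_trigger_clean_build(project, version_pk, trigger):
    # 'CLEAN_AFTER_BUILD' is a hypothetical flag name used for this
    # sketch; only flagged projects run the cleanup task.
    if project.has_feature('CLEAN_AFTER_BUILD'):
        trigger(version_pk)
        return True
    return False
```

Shipping behind a flag like this lets the behavior be rolled out per project and rolled back without a deploy.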
readthedocs/projects/tasks.py
Outdated
```python
try:
    step = UpdateDocsTaskStep(task=self)
    return step.run(project_id, *args, **kwargs)
finally:
    clean_build_task(kwargs.get('version_pk'))
```
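The snippet above runs the cleanup in a `finally` block; a small sketch of why that matters (function names hypothetical): the cleanup runs even when the build step raises, so failed builds don't leave stale files on the builder.

```python
def run_build(build_step, clean_step, version_pk):
    """Run the build; always clean afterwards, even on failure.

    Mirrors the try/finally pattern in the snippet above."""
    try:
        return build_step(version_pk)
    finally:
        clean_step(version_pk)


events = []


def failing_build(pk):
    events.append('build')
    raise RuntimeError('build failed')


try:
    run_build(failing_build, lambda pk: events.append('clean'), 42)
except RuntimeError:
    pass
# events is now ['build', 'clean']: the cleanup ran despite the failure
```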
So, to make sure we always have a version_pk, we need #5765
@stsewd I think merging the other PR for locks caused some conflicts here. Sorry about that.
@ericholscher that was expected, actually :)
The fix for the tests on master is in #5162
This code looks way simpler than what we had before :)
This is a much better implementation. ❤️ I'm 👍 to ship it, especially since it's behind a feature flag.
Needs #5678
I need to put this under a feature flag and test locally. Tests are passing for now :)
Closes #5646