-
Notifications
You must be signed in to change notification settings - Fork 968
Configurable timeouts for celery workers #2993
Configurable timeouts for celery workers #2993
Conversation
Thanks for the pull request, @e-kolpakov! It looks like you're a member of a company that does contract work for edX. If you're doing this work as part of a paid contract with edX, you should talk to edX about who will review this pull request. If this work is not part of a paid contract with edX, then you should ensure that there is an OSPR issue to track this work in JIRA, so that we don't lose track of your pull request. To automatically create an OSPR issue for this pull request, just visit this link: https://openedx-webhooks.herokuapp.com/github/process_pr?repo=edx%2Fconfiguration&number=2993 |
Thanks for the pull request, @e-kolpakov! I've created OSPR-1252 to keep track of it in JIRA. JIRA is a place for product owners to prioritize feature reviews by the engineering development teams. Feel free to add as much of the following information to the ticket:
All technical communication about the code itself will still be done via the GitHub pull request interface. As a reminder, our process documentation is here. If you like, you can add yourself to the AUTHORS file for this repo, though that isn't required. Please see the CONTRIBUTING file for more information. |
b26a42b
to
e6f33a8
Compare
Changes look good to me. 👍 once the sandbox provisioning using the test branch succeeds. Once we have the option to configure the timeout, we can still decide on our side what a good timeout to use would be. |
@@ -599,6 +599,7 @@ edxapp_git_ssh: "/tmp/edxapp_git_ssh.sh" | |||
edxapp_legacy_course_data_dir: "{{ edxapp_app_dir }}/data" | |||
|
|||
edxapp_workers: "{{ EDXAPP_CELERY_WORKERS }}" | |||
edxapp_worker_default_stopwaitsecs: 432000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since it is meant to be externally configurable let's make this an ALL_CAPS variable name.
e6f33a8
to
230be33
Compare
@feanil thank you for review - note addressed. |
@edx/devops change looks good to me, can I get a second review? |
👍 |
Description: This PR makes edxapp celery worker timeouts configurable (both global default and per-worker).
Background: Provisions often fail with
or just timeout around the same lines. Discoveries point out that
edxapp_worker:cms_default_1
fails to restart properly (or fails to signal supervisor that they are restarted). Unlike other approaches to fix this (listed below), lowering restart timeout seems to reliably fix/workaround the problem.Related:
First noticed: comment thread.
Previous attempts to fix the issue: #2557, #2871, #2875
JIRA: https://openedx.atlassian.net/browse/OSPR-1252
EMail threads: https://groups.google.com/forum/#!topic/openedx-ops/5hblv9LGeR8 - potentailly related.
Author concerns:
Major concern is that the problem this patch aims to fix/workaround was an intermittent one - about 50% builds succeeded without the patch. So it might be that probabilities aligned in my favor this time and all the deployments I did to verify the fix just never actually had a root cause expressed.