Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Only instantiate one celery app per process. #25822

Merged
merged 1 commit into from
Dec 9, 2020
Merged

Conversation

feanil
Copy link
Contributor

@feanil feanil commented Dec 9, 2020

Ticket: BOM-2086

Currently there are parts of the LMS that import content from the CMS
APP and vice-versa. When this happens, we end up with 2 instances of
the celery app and some tasks get registered to the wrong one. The
tasks that were getting registered to the wrong one are never able to
run and result in lots of production errors on celery workers.

The timing of the CMS celery app instantiation is non deterministic
so different tasks get lost depending on when it's imported by some
code in the LMS.

As long as SERVICE_VARIANT is set, this code should prevent the
instantiation of both celery apps.

Ticket: BOM-2086

Currently there are parts of the LMS that import content from the CMS
APP and vice-versa.  When this happens, we end up with 2 instances of
the celery app and some tasks get registered to the wrong one. The
tasks that were getting registered to the wrong one are never able to
run and result in lots of production errors on celery workers.

The timing of the CMS celery app instantiation is non deterministic
so different tasks get lost depending on when it's imported by some
code in the LMS.

As long as SERVICE_VARIANT is set, this code should prevent the
instantiation of both celery apps.
@edx-status-bot
Copy link

Your PR has finished running tests. There were no failures.

@feanil feanil merged commit 1984751 into master Dec 9, 2020
@feanil feanil deleted the feanil/celery_worker_fix branch December 9, 2020 19:45
timmc-edx added a commit that referenced this pull request Dec 9, 2020
In #25822 we protect against the case where *any* `cms.**` module is
imported by the running LMS process, causing cms.celery to be imported
and a second celery to be instantiated (leading to tasks being registered
to the wrong one and effectively lost).

This commit protects against the more restricted case of `cms.celery` or
`lms.celery` being imported explicitly, rather than implicitly by their
respective root `__init__.py`.
@edx-pipeline-bot
Copy link
Contributor

EdX Release Notice: This PR has been deployed to the staging environment in preparation for a release to production.

feanil referenced this pull request Dec 9, 2020
This change was reverted along with the downgrade of super-csv because
it was believed that this change was causing issues with celery task
registration.

It was reverted in https://github.com/edx/edx-platform/pull/25762

The fix turned out to be something different and this and the
super-csv library should both be same to upgrade.  The library
upgrade will come in a follow-on make upgrade with some other
constraints that also need to be lifted.

Actual fix: https://github.com/edx/edx-platform/pull/25822
feanil referenced this pull request Dec 9, 2020
This should no longer be necessary now that we have found a root cause
fix for the issue. This was removed because it was causing the case
of celery tasks not getting registered to happen more often.

Undoing https://github.com/edx/edx-platform/pull/25787

Actual Fixes:
- https://github.com/edx/edx-platform/pull/25822
- https://github.com/edx/edx-platform/pull/25825
@edx-pipeline-bot
Copy link
Contributor

EdX Release Notice: This PR has been deployed to the production environment.

@edx-pipeline-bot
Copy link
Contributor

EdX Release Notice: This PR has been rolled back from the production environment.

timmc-edx added a commit that referenced this pull request Dec 10, 2020
This should prevent the issues we've seen recently where cms modules are
imported by the running lms process, resulting in two celery instances
being created and tasks intermittently being registered to the wrong
instance (and therefore effectively lost.)

In commit ab6bf34/PR #25822 we tried to ensure that only one or the
other of the instances was created by adding a startup check.
Unfortunately, there's an external shared library that refers directly
to the lms celery, causing a startup failure in cms, so we had to revert
it. Rather than waiting to fix that library, this commit collapses
the two instances together so that there is only ever one.
@edx-pipeline-bot
Copy link
Contributor

EdX Release Notice: This PR has been deployed to the production environment.

timmc-edx added a commit that referenced this pull request Dec 10, 2020
…25840)

This should prevent the issues we've seen recently where cms modules are
imported by the running lms process, resulting in two celery instances
being created and tasks intermittently being registered to the wrong
instance (and therefore effectively lost.)

In commit ab6bf34/PR #25822 we tried to ensure that only one or the
other of the instances was created by adding a startup check.
Unfortunately, there's an external shared library that refers directly
to the lms celery, causing a startup failure in cms, so we had to revert
it. Rather than waiting to fix that library, this commit collapses
the two instances together so that there is only ever one.
feanil referenced this pull request Dec 10, 2020
This change was reverted along with the downgrade of super-csv because
it was believed that this change was causing issues with celery task
registration.

It was reverted in https://github.com/edx/edx-platform/pull/25762

The fix turned out to be something different and this and the
super-csv library should both be same to upgrade.  The library
upgrade will come in a follow-on make upgrade with some other
constraints that also need to be lifted.

Actual fix: https://github.com/edx/edx-platform/pull/25822
feanil referenced this pull request Dec 10, 2020
This should no longer be necessary now that we have found a root cause
fix for the issue. This was removed because it was causing the case
of celery tasks not getting registered to happen more often.

Undoing https://github.com/edx/edx-platform/pull/25787

Actual Fixes:
- https://github.com/edx/edx-platform/pull/25822
- https://github.com/edx/edx-platform/pull/25825
nedbat pushed a commit that referenced this pull request Dec 11, 2020
…25840)

This should prevent the issues we've seen recently where cms modules are
imported by the running lms process, resulting in two celery instances
being created and tasks intermittently being registered to the wrong
instance (and therefore effectively lost.)

In commit ab6bf34/PR #25822 we tried to ensure that only one or the
other of the instances was created by adding a startup check.
Unfortunately, there's an external shared library that refers directly
to the lms celery, causing a startup failure in cms, so we had to revert
it. Rather than waiting to fix that library, this commit collapses
the two instances together so that there is only ever one.

(cherry picked from commit 0c57a02)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants