We've had a production issue: whenever we deploy new code (which restarts all processes, including beat), the scheduled periodic tasks are not dispatched for exactly 1 hour after the restart. After that, they run as scheduled with no further delays (until the next deploy, unfortunately).
Celery Version: 5.4.0
Celery-Beat Version: 2.7.0
Exact steps to reproduce the issue:
Set USE_TZ=False in your Django settings
Change the time zone configuration to use Europe/London
Detailed information
We found the exact root cause of this and it is a complex combination of:
The Django timezone settings
The last_run_at field of the PeriodicTask model
The Celery code that determines whether the task "is before the last run"
With USE_TZ=False and the Europe/London time zone configured, this leads to the following:
Datetime objects that are passed within the Django app are timezone-naive
The datetime objects are stored in the DB in the London timezone
Something to note here: London and UTC normally share the same offset, but because of DST they currently differ by one hour.
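The one-hour difference can be checked with the standard library alone. This is a minimal sketch; the two dates are arbitrary examples picked for illustration (one in winter, one in summer) and do not come from the report.

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

london = ZoneInfo("Europe/London")

# Illustrative dates only: one outside DST (GMT) and one inside DST (BST).
winter = datetime(2024, 1, 15, 12, 0, tzinfo=london)
summer = datetime(2024, 7, 15, 12, 0, tzinfo=london)

# utcoffset() returns how far local wall-clock time is ahead of UTC.
winter_offset = winter.utcoffset()
summer_offset = summer.utcoffset()

print("winter offset:", winter_offset)
print("summer offset:", summer_offset)
```

During DST the offset is one hour, which matches the length of the delay described in this report.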
I have a periodic task that runs every minute. If I start the task for the first time, everything goes as expected and the task is dispatched every minute.
However, if I kill the beat process and run it again, the task is not dispatched until exactly 1 hour and 1 minute after that.
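We found that this happens because the last_run_at field of the PeriodicTask objects is saved as timezone-naive (which is expected, because USE_TZ is set to False). When checking the last run time, the beat process does not interpret that value in the London timezone; it converts it to UTC instead. The conversion is done by maybe_make_aware, which makes the passed datetime object timezone-aware but defaults to UTC rather than the specified timezone: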
    def maybe_make_aware(dt, tz=None):
        """Convert dt to aware datetime, do nothing if dt is already aware."""
        if is_naive(dt):
            dt = to_utc(dt)
            return localize(
                dt, timezone.utc if tz is None else timezone.tz_or_local(tz),
            )
        return dt
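To make the effect concrete, here is a small sketch that uses only the standard library instead of Celery's helpers. The functions `read_as_utc` and `read_as_local` are hypothetical names introduced for illustration, and the timestamps are made up. It assumes, as described above, that a naive London wall-clock value ends up being read as if it were UTC.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

LONDON = ZoneInfo("Europe/London")

def read_as_utc(naive):
    """Treat a naive value as UTC (the behaviour this report describes)."""
    return naive.replace(tzinfo=timezone.utc)

def read_as_local(naive, tz=LONDON):
    """Treat a naive value as wall-clock time in the configured zone."""
    return naive.replace(tzinfo=tz)

# Naive last_run_at as stored with USE_TZ=False: London wall-clock time in summer.
last_run_at = datetime(2024, 7, 1, 12, 0, 0)
# The beat process restarts one minute later (real, timezone-aware instant).
now = datetime(2024, 7, 1, 12, 1, 0, tzinfo=LONDON)

# How far the UTC reading is shifted from the real instant of the last run.
skew = read_as_utc(last_run_at) - read_as_local(last_run_at)
# Time since the last run under each interpretation.
elapsed_read_as_utc = now - read_as_utc(last_run_at)
elapsed_read_as_local = now - read_as_local(last_run_at)

print("skew:", skew)
print("elapsed (read as UTC):", elapsed_read_as_utc)
print("elapsed (read as London):", elapsed_read_as_local)
```

Under the UTC reading the last run appears to lie in the future, so a schedule that compares against it keeps waiting until the wall clock catches up.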
This is most probably also the root cause of other related issues.
The same issue presents itself in the opposite way when you use a timezone that is behind UTC, for example America/New_York (which is currently 4 hours behind UTC).
If you do that, the task is dispatched immediately after the process starts, even though it is scheduled to run every minute. This makes sense, because the same comparison functions are executed, but the comparison is made against UTC only.
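The mirrored case can be sketched the same way. The `is_due` helper below is a hypothetical stand-in for the schedule check, not Celery's actual implementation, and the timestamps are invented for illustration.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

NEW_YORK = ZoneInfo("America/New_York")
INTERVAL = timedelta(minutes=1)  # the task is scheduled every minute

def is_due(last_run, now, interval=INTERVAL):
    """Hypothetical check: due once a full interval has passed since last_run."""
    return now - last_run >= interval

# Naive last_run_at stored as New York wall-clock time (summer, UTC-4).
stored = datetime(2024, 7, 1, 8, 0, 0)
# The beat process restarts 30 seconds after the last run.
now = datetime(2024, 7, 1, 8, 0, 30, tzinfo=NEW_YORK)

# Read as UTC, the last run looks about four hours old, so the task fires at once.
due_if_read_as_utc = is_due(stored.replace(tzinfo=timezone.utc), now)
# Read in the configured zone, only 30 seconds have passed, so it is not due yet.
due_if_read_as_local = is_due(stored.replace(tzinfo=NEW_YORK), now)

print("due when read as UTC:", due_if_read_as_utc)
print("due when read as New York time:", due_if_read_as_local)
```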
The workaround we found for our case is to reset last_run_at to None each time we do a new deployment. This way there is no past datetime to compare with, so the tasks begin execution as normal. After that, further scheduled executions are correct.
PeriodicTask.objects.update(last_run_at=None)
This is actually what the documentation suggests if you change the timezone configuration. In our case, however, we did not change the configuration at all; it has been the same from the start. We need to do this to trick the beat process into treating each task as if it had never run before, which makes the tasks dispatch as expected.
If you use USE_TZ=True and TIME_ZONE="UTC", you won't have this issue.
However, changing our settings to these ☝🏻 default values is impossible at this moment, so I think this should be carefully considered and possibly fixed.
The fix itself should be relatively easy: when comparing a datetime with last_run_at, respect the configured timezone and make the passed object aware in that timezone, not strictly in UTC.
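As a rough illustration of that idea, the sketch below attaches the configured zone to naive values instead of assuming UTC. It is not a patch for Celery: `make_aware_in_zone` is a hypothetical name, and the code assumes `zoneinfo`-style tzinfo objects, for which `datetime.replace(tzinfo=...)` is sufficient.

```python
from datetime import datetime, timezone, tzinfo
from typing import Optional
from zoneinfo import ZoneInfo

def make_aware_in_zone(dt: datetime, tz: Optional[tzinfo] = None) -> datetime:
    """Hypothetical variant: read naive values in the configured zone.

    Aware datetimes are returned unchanged. Naive ones are interpreted as
    wall-clock time in tz, falling back to UTC only when no zone is given.
    """
    if dt.tzinfo is None or dt.tzinfo.utcoffset(dt) is None:
        return dt.replace(tzinfo=tz if tz is not None else timezone.utc)
    return dt

# Example: a naive London wall-clock value stored while DST is in effect.
stored = datetime(2024, 7, 1, 12, 0, 0)
fixed = make_aware_in_zone(stored, ZoneInfo("Europe/London"))

print(fixed.isoformat())
print(fixed.astimezone(timezone.utc).isoformat())
```

With this interpretation, the stored value maps to the real instant of the last run, so the comparison against the current time is no longer skewed by the UTC offset.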
Code references:
Code that converts the last run time when checking it (celery/schedules.py): https://github.com/celery/celery/blob/f3a2cf45a69b443cac6c79a5c85583c8bd91b0a3/celery/schedules.py#L470-L473
maybe_make_aware, which defaults to UTC rather than the specified timezone (celery/utils/time.py): https://github.com/celery/celery/blob/f3a2cf45a69b443cac6c79a5c85583c8bd91b0a3/celery/utils/time.py#L308