Airflow scheduler does not trigger when schedule_interval is @weekly or (monthly) "0 3 1 * *". #10824

Closed
ana-carolina-januario opened this issue Sep 9, 2020 · 11 comments
Labels
area:Scheduler including HA (high availability) scheduler kind:bug This is a clearly a bug

Comments

@ana-carolina-januario

ana-carolina-januario commented Sep 9, 2020

Apache Airflow version: 1.10.5

Kubernetes version (if you are using kubernetes) (use kubectl version): NA

Environment: CentOS Linux release 7.7.1908 (Core), without Airflow environment defined variables.

  • Cloud provider or hardware configuration: Azure, B4ms standard (4 cpu, 16GiB memory, 30GiB ssd disk)
  • OS (e.g. from /etc/os-release): CentOS Linux release 7.7.1908 (Core)
  • Kernel (e.g. uname -a): Linux prd-azure-ddp-airflow 3.10.0-1062.1.2.el7.x86_64 #1 SMP Mon Sep 30 14:19:46 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
  • Install tools:
    $ pip list
    Package Version

alembic 1.2.1
apache-airflow 1.10.5
apispec 3.0.0
attrs 19.3.0
azure-common 1.1.25
azure-cosmos 3.1.2
azure-datalake-store 0.0.48
azure-mgmt-containerinstance 1.5.0
azure-mgmt-hdinsight 1.4.0
azure-mgmt-resource 9.0.0
azure-nspkg 3.0.2
azure-storage 0.36.0
azure-storage-blob 2.1.0
azure-storage-common 2.1.0
Babel 2.7.0
boto 2.49.0
boto3 1.12.31
botocore 1.15.31
cached-property 1.5.1
certifi 2019.9.11
cffi 1.13.0
chardet 3.0.4
Click 7.0
colorama 0.4.1
colorlog 4.0.2
configparser 3.5.3
croniter 0.3.30
cryptography 2.8
defusedxml 0.6.0
dill 0.2.9
docutils 0.15.2
dumb-init 1.2.2
elasticsearch 5.5.3
elasticsearch-dsl 5.4.0
Flask 1.1.1
Flask-Admin 1.5.3
Flask-AppBuilder 1.13.1
Flask-Babel 0.12.2
Flask-Caching 1.3.3
Flask-JWT-Extended 3.24.0
Flask-Login 0.4.1
Flask-OpenID 1.2.5
Flask-SQLAlchemy 2.4.1
flask-swagger 0.2.13
Flask-WTF 0.14.2
funcsigs 1.0.0
future 0.16.0
gunicorn 19.9.0
idna 2.8
importlib-metadata 0.23
iso8601 0.1.12
itsdangerous 1.1.0
JayDeBeApi 1.1.1
Jinja2 2.10.3
jmespath 0.9.5
JPype1 0.6.3
json-merge-patch 0.2
jsonschema 3.1.1
lazy-object-proxy 1.4.2
lockfile 0.12.2
Mako 1.1.0
Markdown 2.6.11
MarkupSafe 1.1.1
marshmallow 2.19.5
marshmallow-enum 1.5.1
marshmallow-sqlalchemy 0.19.0
more-itertools 7.2.0
mysql-connector 2.2.9
mysqlclient 1.3.14
ndg-httpsclient 0.5.1
numpy 1.17.3
ordereddict 1.1
pandas 0.25.2
pendulum 1.4.4
pip 20.2
prison 0.1.0
psutil 5.6.3
psycopg2 2.8.4
pyasn1 0.4.7
pycparser 2.19
Pygments 2.4.2
PyJWT 1.7.1
pyOpenSSL 19.0.0
pyrsistent 0.15.4
python-daemon 2.1.2
python-dateutil 2.8.0
python-editor 1.0.4
python3-openid 3.1.0
pytz 2019.3
pytzdata 2019.3
PyYAML 5.1.2
requests 2.22.0
s3transfer 0.3.3
setproctitle 1.1.10
setuptools 41.4.0
six 1.12.0
SQLAlchemy 1.3.10
tabulate 0.8.5
tenacity 4.12.0
termcolor 1.1.0
text-unidecode 1.2
thrift 0.11.0
tzlocal 1.5.1
unicodecsv 0.14.1
urllib3 1.25.6
Werkzeug 0.16.0
wheel 0.33.6
WTForms 2.2.1
zipp 0.6.0
zope.deprecation 4.4.0

  • Others:

What happened: I have a DAG scheduled to be triggered weekly using '@weekly' at DAG creation, but the DAG was never triggered according to this schedule.
Before the weekly schedule, I also tried to trigger this same DAG monthly with "0 3 1 * *", which didn't work either.

What you expected to happen: I expected the DAG to be triggered.

How to reproduce it:
This can be reproduced by simply creating a couple of DAGs scheduled with the schedule intervals mentioned above (weekly and monthly).

The relevant code would be:

from datetime import timedelta

import airflow.utils.dates
from airflow import DAG

default_args = {
    'owner': 'daai',
    'depends_on_past': False,
    'start_date': airflow.utils.dates.days_ago(1),
    'retries': 1,
    'retry_delay': timedelta(minutes=2),
    'provide_context': True,
}

# dag_name is defined elsewhere in the DAG file
dag = DAG(dag_name, concurrency=6, schedule_interval="@weekly", default_args=default_args, max_active_runs=1, dagrun_timeout=timedelta(hours=3))

Anything else we need to know:
I don't think the content of the DAGs affects this issue, but let me know if you think it does.

@ana-carolina-januario ana-carolina-januario added the kind:bug This is a clearly a bug label Sep 9, 2020
@boring-cyborg

boring-cyborg bot commented Sep 9, 2020

Thanks for opening your first issue here! Be sure to follow the issue template!

@turbaszek turbaszek added the area:Scheduler including HA (high availability) scheduler label Sep 10, 2020
@ana-carolina-januario
Author

Hi there,

I ended up figuring this out. Basically, the problem was in the value of the DAG's "start_date" field. It should point to a date earlier than the (first) execution_date, which depends on schedule_interval. If you want to trigger a DAG on a daily basis, your schedule_interval will be something like "@daily" and your start_date should be 1 day ago. For a schedule_interval of "@weekly", your start_date should be 8 days ago (= 1 week + 1 day), etc.
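
To make the reasoning concrete, here is a minimal sketch using only the Python standard library. It assumes the Airflow 1.10 scheduling rule works roughly like this: the first execution_date is the first schedule point on or after start_date, and the run is created only once the full interval after that execution_date has elapsed. The helpers first_weekly_run and days_ago below are hypothetical stand-ins written for illustration, not Airflow APIs, and the model ignores time zones and catchup.

```python
from datetime import datetime, timedelta


def first_weekly_run(start_date, now):
    """Return the execution_date of the first "@weekly" run that is due at
    `now`, or None if no run is due yet. Hypothetical helper, not Airflow API."""
    # "@weekly" is shorthand for cron "0 0 * * 0": midnight on Sunday.
    midnight = start_date.replace(hour=0, minute=0, second=0, microsecond=0)
    if midnight < start_date:
        midnight += timedelta(days=1)  # first midnight on or after start_date
    days_until_sunday = (6 - midnight.weekday()) % 7  # Monday=0 ... Sunday=6
    execution_date = midnight + timedelta(days=days_until_sunday)
    # Assumption: a run is created only once its whole interval has elapsed.
    period_end = execution_date + timedelta(weeks=1)
    return execution_date if period_end <= now else None


def days_ago(n, now):
    """Simplified stand-in for airflow.utils.dates.days_ago:
    midnight n days before `now`."""
    shifted = now - timedelta(days=n)
    return shifted.replace(hour=0, minute=0, second=0, microsecond=0)


# Rolling start_date: re-evaluated every day for 60 days in a row.
rolling = [
    first_weekly_run(days_ago(1, now), now)
    for now in (datetime(2020, 9, 9, 12) + timedelta(days=i) for i in range(60))
]

# Fixed start_date: the first Sunday on or after 2020-09-01 is 2020-09-06,
# and that run becomes due one week later, on 2020-09-13.
fixed = first_weekly_run(datetime(2020, 9, 1), datetime(2020, 9, 13))

print(all(r is None for r in rolling))
print(fixed)
```

Under these assumptions, the rolling start_date never yields a due run on any of the 60 days, because the end of the first interval is always at least six days in the future. The fixed start_date yields the run for 2020-09-06 as soon as 2020-09-13 arrives.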

I hope this helps.

Cheers.

@akanshajainnice

akanshajainnice commented Feb 24, 2022

Hi, I am facing the same problem. Can you help me understand how to set the start date for monthly jobs, as you described above?
One of my DAGs has a monthly cron set as "0 11 1 * *", but it does not trigger according to the schedule interval. The DAG was created on 06/12/2021. With this cron schedule, the job was expected to trigger automatically on 1st Jan 2022 and 1st Feb 2022, but it did not trigger on time.

default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': airflow.utils.dates.days_ago(1),
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
}

@potiuk
Member

potiuk commented Feb 24, 2022

'start_date': airflow.utils.dates.days_ago(1),

Don't use "days_ago": it is confusing, and you just experienced this very confusion. We are actually deprecating days_ago and have already removed it from all examples. Use a fixed date for it (6/12/2021); see our examples.

The way you specified it "days_ago(1)" is yesterday. EVERY SINGLE DAY IT IS YESTERDAY.
This means that start_date literally changes every single day.

When you want a DAG that should start running on the 6th of December, make it so explicitly: look at our fresh examples where the date is fixed, and set it to 6/12/2021 explicitly. See how to do it, for example, here: https://airflow.apache.org/docs/apache-airflow/stable/best-practices.html?highlight=start_date
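
As a rough illustration of the fixed-date approach, here is a sketch that reuses the default_args from the question with start_date pinned to 6 December 2021, and then works out when the monthly cron "0 11 1 * *" would first fire. The helper next_monthly is hypothetical and written only for this example. It assumes Airflow creates a run at the end of the interval it covers, and it uses a plain datetime, which, as far as I understand, Airflow interprets in its configured default timezone (UTC unless changed).

```python
from datetime import datetime, timedelta

# Same default_args as in the question, but with a fixed start_date.
default_args = {
    "owner": "airflow",
    "depends_on_past": False,
    "start_date": datetime(2021, 12, 6),
    "email_on_failure": False,
    "email_on_retry": False,
    "retries": 1,
    "retry_delay": timedelta(minutes=5),
}


def next_monthly(after, hour=11):
    """First 'day 1 at hour:00' strictly after `after`, mimicking the cron
    expression "0 11 1 * *". Hypothetical helper, not part of Airflow."""
    candidate = after.replace(day=1, hour=hour, minute=0, second=0, microsecond=0)
    while candidate <= after:
        if candidate.month == 12:
            candidate = candidate.replace(year=candidate.year + 1, month=1)
        else:
            candidate = candidate.replace(month=candidate.month + 1)
    return candidate


# First schedule point on or after start_date (hence the 1 microsecond step back).
first_execution_date = next_monthly(
    default_args["start_date"] - timedelta(microseconds=1)
)
# Assumption: the run for that execution_date starts once its interval has ended.
first_trigger_time = next_monthly(first_execution_date)

print(first_execution_date)
print(first_trigger_time)
```

Under this model the first run has an execution_date of 1 January 2022 at 11:00 but actually starts on 1 February 2022 at 11:00, which is worth keeping in mind when checking whether a monthly DAG "triggered on time".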

@akanshajainnice

akanshajainnice commented Feb 25, 2022

Won't it have an impact if I set a specific start date, since the DAG execution depends on schedule_interval, which we pass as a cron expression that is different for every DAG?

@potiuk
Member

potiuk commented Feb 25, 2022

Won't it have an impact if I set a specific start date, since the DAG execution depends on schedule_interval, which we pass as a cron expression that is different for every DAG?

What impact do you think it will have? I do not understand. Using days_ago in this case is just wrong.

@akanshajainnice

I mixed it up with execution_date. Thanks.

@akanshajainnice

I am using Airflow version 1.10.10. Using a fixed start date with the tz argument gave me the error below. I couldn't find any documentation for this version stating whether it supports this parameter. Can you please help with the valid format for a fixed date?
'start_date': pendulum.datetime(2022, 1, 1, tz="UTC")
TypeError: new() got an unexpected keyword argument 'tz'

@potiuk
Member

potiuk commented Feb 28, 2022

Please migrate to Airflow 2. Airflow 1.10 reached end-of-life and has not been supported for over 8 months already.

@akanshajainnice

OK. But is there a workaround for now, or does it take the timezone as UTC by default?

@potiuk
Member

potiuk commented Feb 28, 2022

OK. But is there a workaround for now, or does it take the timezone as UTC by default?

No idea. This is the standard pendulum library: check which version you have and read its docs.
