The execution time of an operator can not be changed by simply changing the start_date #1232

Closed
WesleyBatista opened this issue Mar 28, 2016 · 0 comments


@WesleyBatista
Contributor

Dear Airflow Maintainers,

Before I tell you about my issue, let me describe my environment:

Environment

  • Version of Airflow (e.g. a release version, running your own fork, running off master -- provide a git log snippet) : v1.5.1
  • Operating System: Linux airflow 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt11-1+deb8u5 (2015-10-09) x86_64 GNU/Linux
  • Python Version: Python 2.7.9

Now that you know a little about me, let me tell you about the issue I am having:

Description of Issue

The scheduler does not take the start_date parameter set at the operator level (in this case on a PythonOperator) into account when scheduling the task.

  • What did you expect to happen?
    The task to be scheduled at a different time.
  • What happened instead?
    In the task details I can see that the start_date changed on the PythonOperator; however, the execution log still shows the time set in default_args.
  • Here is how you can reproduce this issue on your machine:
from airflow import DAG
from airflow.operators import PythonOperator
from datetime import datetime, timedelta

default_args = {
    'owner': 'airflow',
    'start_date': datetime(2016, 3, 21, 9, 0, 0),
    'depends_on_past': False,
    'wait_for_downstream': True,
    'sla': timedelta(minutes=30)
}
dag = DAG(
    'sample_dag',
    default_args=default_args,
    schedule_interval=timedelta(hours=24)
)

def python_callable(**context):
    return True

taskParameters = {
    "dag": dag,
    "task_id": "send_email",
    "python_callable": python_callable,
    "provide_context": True,
    # 'start_date': datetime(2016, 3, 22, 12, 0, 0),  # uncommented in step 3 below
}

task = PythonOperator(**taskParameters)

Reproduction Steps

  1. Set start_date in default_args in the DAG definition.
  2. Leave the DAG running until some task runs have succeeded.
  3. Set a different start_date on the PythonOperator.
  4. Clear some of the most recent successful executions to force the scheduler to rerun the task.
  5. In the task log I get the following: [2016-03-28 09:00:50,997] {models.py:974} INFO - Executing <Task(PythonOperator): send_email> on 2016-03-26 09:00:00

I guess this happens because of the "last execution date + schedule interval" rule.
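
To make that guess concrete, here is a minimal sketch of the rule as I understand it. It uses only the standard library and is not Airflow's actual scheduler code; the helper name next_execution_date and the previous execution date of 2016-03-25 09:00:00 are assumptions made purely for illustration.

```python
from datetime import datetime, timedelta


def next_execution_date(last_execution_date, schedule_interval, start_date):
    """Hypothetical helper that mimics the guessed rule (not Airflow's real code)."""
    if last_execution_date is None:
        # No previous run exists, so the first run would begin at start_date.
        return start_date
    # A previous run exists: start_date is ignored and the interval is
    # simply added to the last execution date.
    return last_execution_date + schedule_interval


# Assumed values for illustration only.
last_run = datetime(2016, 3, 25, 9, 0, 0)     # assumed last execution date
new_start = datetime(2016, 3, 22, 12, 0, 0)   # start_date set on the operator
interval = timedelta(hours=24)                # schedule_interval from the DAG

next_run = next_execution_date(last_run, interval, new_start)
print(next_run)  # 2016-03-26 09:00:00
```

Under this assumed rule, the 12:00 time from the new start_date never appears once a previous run exists, which would be consistent with the 09:00:00 execution time in the log above.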

I think the common expectation here is that, after start_date is changed, newly scheduled runs should be based on the new date.
However, I do not know whether that goes against what the project intends.

What do you guys think about it?
