
Conversation

@ashb (Member) commented Oct 7, 2020

Spawning a whole new Python process and then re-loading all of Airflow
is expensive. Although this time fades to insignificance for long-running
tasks, the delay gives a "bad" experience for new users when they are
just trying out Airflow for the first time.

For the LocalExecutor this cuts the "queued time" down from 1.5s to 0.1s
on average.

Data on this for a simple 10-task sequential DAG:

```sql
SELECT execution_date,
    min(start_date - queued_dttm) AS min_queued_delay,
    max(start_date - queued_dttm) AS max_queued_delay
FROM task_instance
WHERE dag_id = 'scenario1_case2_03_1'
GROUP BY execution_date;
```

| execution_date         | min_queued_delay | max_queued_delay | with change? |
|------------------------|------------------|------------------|--------------|
| 2020-10-07 02:00:00+01 | 00:00:01.329747  | 00:00:01.475085  | No           |
| 2020-10-07 01:00:00+01 | 00:00:00.083668  | 00:00:00.11244   | Yes          |

This was discovered in my general benchmarking and profiling of the scheduler for AIP-15, but it isn't tied to any of that work. There are more of these kinds of improvements coming; each is unrelated, but they all add up.
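The mechanism can be sketched roughly like this: fork the already-running process so the child inherits the imported Airflow modules, instead of spawning a fresh interpreter that has to re-import everything. This is a minimal illustration only, not the actual LocalExecutor code; the function name `run_task_via_fork` is hypothetical.

```python
import os


def run_task_via_fork(task_argv):
    """Run a task in a forked child rather than a freshly spawned interpreter.

    Forking reuses the parent's already-imported modules, so the child skips
    the expensive interpreter start-up and import phase that dominates the
    queued-to-start delay for short tasks.
    (Hypothetical sketch -- the real executor code differs.)
    """
    pid = os.fork()
    if pid:
        # Parent: wait for the child and report its exit status.
        _, status = os.waitpid(pid, 0)
        return os.WEXITSTATUS(status)

    # Child: run the task in-process, then exit hard so control never
    # returns into the parent's executor loop.
    ret = 0
    try:
        print(f"running task with args: {task_argv}")
    except Exception:
        ret = 1
    finally:
        os._exit(ret)
```

By contrast, the old path is roughly equivalent to `subprocess.Popen([sys.executable, "-m", "airflow", ...])`, which pays interpreter start-up plus a full `import airflow` on every single task.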



@ashb ashb added area:Scheduler including HA (high availability) scheduler area:performance labels Oct 7, 2020
@ashb ashb requested review from kaxil, potiuk and turbaszek October 7, 2020 09:52
@ashb (Member Author) commented Oct 7, 2020

[image attachment]

@ashb ashb requested a review from turbaszek October 7, 2020 13:23
@ashb (Member Author) commented Oct 7, 2020

I was thinking I'd follow up with a separate PR to add the same to CeleryExecutor.

@ashb (Member Author) commented Oct 7, 2020

I just noticed that we still need a fork in here — `task_run` in `cli.commands` calls `logging.shutdown()` (and we need that, as it performs the remote log upload), so I'll change this to fork first.
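The fork-first point matters because `logging.shutdown()` closes and flushes all logging handlers process-wide; if it ran in the executor process itself, it would break the parent's logging. Forking first confines the shutdown to the child. A minimal sketch of that shape (the helper name `execute_in_child` is hypothetical, not Airflow's API):

```python
import logging
import os


def execute_in_child(fn, *args):
    """Fork first, run fn in the child, and let the child call
    logging.shutdown() safely: the flush/close happens in a separate
    process, so the parent's logging handlers are untouched.
    (Illustrative sketch; the actual cli.commands code differs.)
    """
    pid = os.fork()
    if pid:
        # Parent: wait and return the child's exit code.
        _, status = os.waitpid(pid, 0)
        return os.WEXITSTATUS(status)

    code = 0
    try:
        fn(*args)
    except Exception:
        logging.exception("Task failed")
        code = 1
    finally:
        # Flushes and closes handlers; in Airflow this is also where the
        # remote log upload is triggered.
        logging.shutdown()
        os._exit(code)
```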

@ashb ashb marked this pull request as draft October 7, 2020 19:14
@ashb ashb force-pushed the speedup-local-executor branch from cddc189 to af20de4 Compare October 8, 2020 09:14
@ashb ashb marked this pull request as ready for review October 8, 2020 09:15
@ashb ashb requested review from kaxil and turbaszek October 8, 2020 09:15
@ashb (Member Author) commented Oct 8, 2020

Updated benchmarks -- the fork is basically free.

@ashb (Member Author) commented Oct 8, 2020

Hmmm, something is broken in all the kube tests, unrelated to this:

./scripts/ci/kubernetes/ci_run_kubernetes_tests.sh: /home/runner/work/airflow/airflow/.build/.kubernetes_venv/bin/pip: /home/runner/work/airflow/airflow/.build/.kubernetes_venv/bin/python: bad interpreter: No such file or directory

@ashb ashb force-pushed the speedup-local-executor branch from af20de4 to 3429c2d Compare October 8, 2020 13:43
@ashb (Member Author) commented Oct 8, 2020

Oh, those kube tests will fail because I haven't re-triggered the image build. If the rest of the tests pass, I'll merge this.

@ashb ashb merged commit 4839a5b into apache:master Oct 8, 2020
@ashb ashb deleted the speedup-local-executor branch October 8, 2020 16:37
ashb added a commit to astronomer/airflow that referenced this pull request Oct 9, 2020
This is similar to apache#11327, but for Celery this time.

The impact is not quite as pronounced here (for simple dags at least)
but takes the average queued to start delay from 1.5s to 0.4s
ashb added a commit that referenced this pull request Oct 9, 2020
This is similar to #11327, but for Celery this time.

The impact is not quite as pronounced here (for simple dags at least)
but takes the average queued to start delay from 1.5s to 0.4s