Member

@jason810496 jason810496 commented Dec 25, 2025

related: https://lists.apache.org/thread/4mc41hws0655b2p4s88f1t83klppdmwq

Why

Before this refactor, no matter which airflow command was executed, cli_parser imported the configured AuthManager and Executor just to call get_cli_commands and collect their optional commands.

This means that every CLI command, even airflow --help, imported heavy modules such as kubernetes and flask_appbuilder (depending on the AIRFLOW__CORE__AUTH_MANAGER and AIRFLOW__CORE__EXECUTOR config). In the worst case (FabAuthManager + CeleryKubernetesExecutor), it took approximately 5 seconds just to show the airflow --help output, based on the benchmark!

How

The refactor includes:

  1. Adding a cli section to provider metadata (provider.yaml / def get_provider_info) that points to get_cli_commands
  2. Moving get_cli_commands into a clean module that does not import any heavy dependencies
    • It should only import from airflow.cli.cli_config
    • It should rely on lazy_load_command
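
To illustrate why relying on lazy_load_command keeps the definition module cheap, here is a minimal sketch of the deferred-import idea. It is a simplified stand-in written for this example, not Airflow's actual implementation, and `colorsys.rgb_to_hsv` is used only as a placeholder target because it ships with the standard library.

```python
import importlib
from typing import Any, Callable


def lazy_load_command(import_path: str) -> Callable[..., Any]:
    """Return a wrapper that imports ``import_path`` only when it is called.

    Simplified stand-in for illustration; not the Airflow implementation.
    """
    module_path, _, attr_name = import_path.rpartition(".")

    def command(*args: Any, **kwargs: Any) -> Any:
        # The (potentially heavy) module is imported here, at call time,
        # not when the CLI definition module itself is imported.
        module = importlib.import_module(module_path)
        return getattr(module, attr_name)(*args, **kwargs)

    command.__name__ = attr_name
    return command


# Placeholder target standing in for a heavy provider function.
convert = lazy_load_command("colorsys.rgb_to_hsv")
print(convert(1.0, 0.0, 0.0))
```

Building the wrapper costs almost nothing, so a parser can register many such commands and only pay the import cost for the one that actually runs.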

What

Introduce cli section in provider metadata

The main behavioral change is that, after this refactor, any installed provider that exposes CLI commands will have those commands available in the Airflow CLI, even if it is not configured as the active AuthManager or Executor.
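
A rough sketch of that behavior, assuming a hypothetical provider and a hypothetical helper name (`collect_cli_entry_points`): CLI function paths are read from every installed provider's metadata, without consulting the active executor or auth manager.

```python
def get_provider_info() -> dict:
    """Metadata for a hypothetical provider (all names are illustrative)."""
    return {
        "package-name": "apache-airflow-providers-example",
        "name": "Example",
        # Dotted path to a function returning the provider's CLI commands.
        "cli": ["airflow.providers.example.cli.definition.get_example_cli_commands"],
    }


def collect_cli_entry_points(provider_infos: list[dict]) -> list[str]:
    """Gather CLI function paths from every installed provider's metadata.

    Nothing here depends on which executor or auth manager is configured.
    """
    entry_points: list[str] = []
    for info in provider_infos:
        entry_points.extend(info.get("cli", []))
    return entry_points


# The second provider has no "cli" section and simply contributes nothing.
installed = [get_provider_info(), {"package-name": "apache-airflow-providers-other"}]
print(collect_cli_entry_points(installed))
```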

Benchmark Result

Summary:

  • Overall average: 3.117s, down 0.931s from 4.048s (22.999% improvement).
  • Fastest time: 3.092s, down 0.474s from 3.566s (13.292% improvement).
  • Slowest time: 3.155s, down 1.851s from 5.006s (36.976% improvement).

Full Airflow CLI Latency Benchmark - After Refactor

Benchmark results for airflow --help command with different Auth Manager and Executor combinations.

Total combinations tested: 32

Results Table

| Auth Manager | Executor | Avg Time (s) | Min Time (s) | Status |
|---|---|---|---|---|
| Default | Default | 3.133 | 3.072 | |
| Default | LocalExecutor | 3.112 | 3.075 | |
| Default | SequentialExecutor | 3.116 | 3.084 | |
| Default | AwsEcsExecutor | 3.151 | 3.141 | |
| Default | CeleryExecutor | 3.119 | 3.072 | |
| Default | CeleryKubernetesExecutor | 3.111 | 3.074 | |
| Default | KubernetesExecutor | 3.115 | 3.083 | |
| Default | EdgeExecutor | 3.096 | 3.077 | |
| AwsAuthManager | Default | 3.107 | 3.085 | |
| AwsAuthManager | LocalExecutor | 3.112 | 3.071 | |
| AwsAuthManager | SequentialExecutor | 3.135 | 3.123 | |
| AwsAuthManager | AwsEcsExecutor | 3.111 | 3.073 | |
| AwsAuthManager | CeleryExecutor | 3.104 | 3.082 | |
| AwsAuthManager | CeleryKubernetesExecutor | 3.099 | 3.076 | |
| AwsAuthManager | KubernetesExecutor | 3.124 | 3.105 | |
| AwsAuthManager | EdgeExecutor | 3.122 | 3.107 | |
| FabAuthManager | Default | 3.130 | 3.114 | |
| FabAuthManager | LocalExecutor | 3.115 | 3.076 | |
| FabAuthManager | SequentialExecutor | 3.102 | 3.082 | |
| FabAuthManager | AwsEcsExecutor | 3.108 | 3.082 | |
| FabAuthManager | CeleryExecutor | 3.110 | 3.073 | |
| FabAuthManager | CeleryKubernetesExecutor | 3.111 | 3.080 | |
| FabAuthManager | KubernetesExecutor | 3.101 | 3.071 | |
| FabAuthManager | EdgeExecutor | 3.128 | 3.086 | |
| KeycloakAuthManager | Default | 3.139 | 3.082 | |
| KeycloakAuthManager | LocalExecutor | 3.110 | 3.076 | |
| KeycloakAuthManager | SequentialExecutor | 3.092 | 3.073 | |
| KeycloakAuthManager | AwsEcsExecutor | 3.113 | 3.081 | |
| KeycloakAuthManager | CeleryExecutor | 3.112 | 3.079 | |
| KeycloakAuthManager | CeleryKubernetesExecutor | 3.120 | 3.106 | |
| KeycloakAuthManager | KubernetesExecutor | 3.121 | 3.074 | |
| KeycloakAuthManager | EdgeExecutor | 3.155 | 3.123 | |

Summary Statistics

  • Successful combinations: 32/32
  • Overall average time: 3.117s
  • Fastest time: 3.092s
  • Slowest time: 3.155s

Note: Each combination was run 3 times and averaged.
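
For readers who want to reproduce this kind of measurement, the sketch below shows one way to time a command several times and report the average and minimum. It is not the benchmark script from this PR; `time_command` is a hypothetical helper, and a trivial Python subprocess stands in for `airflow --help` so the example runs without Airflow installed.

```python
import os
import statistics
import subprocess
import sys
import time


def time_command(cmd: list[str], env_overrides: dict[str, str], runs: int = 3) -> tuple[float, float]:
    """Run ``cmd`` ``runs`` times and return (average, minimum) wall-clock seconds."""
    env = {**os.environ, **env_overrides}
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, env=env, check=True, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return statistics.mean(timings), min(timings)


# Stand-in command so the sketch runs anywhere; on an Airflow installation
# this would be ["airflow", "--help"] with the desired config overrides.
avg_s, min_s = time_command(
    [sys.executable, "-c", "pass"],
    {"AIRFLOW__CORE__EXECUTOR": "LocalExecutor"},
)
print(f"avg={avg_s:.3f}s min={min_s:.3f}s")
```

Each run starts a fresh process, so import costs are paid every time, which is what the latency numbers above are meant to capture.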

Full Airflow CLI Latency Benchmark - Before Refactor

Benchmark results for airflow --help command with different Auth Manager and Executor combinations.
Total combinations tested: 32

Results Table

| Auth Manager | Executor | Avg Time (s) | Min Time (s) | Status |
|---|---|---|---|---|
| Default | Default | 3.610 | 3.570 | |
| Default | LocalExecutor | 3.566 | 3.556 | |
| Default | SequentialExecutor | 3.617 | 3.561 | |
| Default | AwsEcsExecutor | 3.746 | 3.741 | |
| Default | CeleryExecutor | 3.578 | 3.567 | |
| Default | CeleryKubernetesExecutor | 4.761 | 4.748 | |
| Default | KubernetesExecutor | 4.715 | 4.687 | |
| Default | EdgeExecutor | 3.968 | 3.919 | |
| AwsAuthManager | Default | 3.760 | 3.727 | |
| AwsAuthManager | LocalExecutor | 3.721 | 3.718 | |
| AwsAuthManager | SequentialExecutor | 3.717 | 3.712 | |
| AwsAuthManager | AwsEcsExecutor | 3.762 | 3.739 | |
| AwsAuthManager | CeleryExecutor | 3.785 | 3.729 | |
| AwsAuthManager | CeleryKubernetesExecutor | 4.954 | 4.923 | |
| AwsAuthManager | KubernetesExecutor | 4.915 | 4.890 | |
| AwsAuthManager | EdgeExecutor | 4.067 | 4.042 | |
| FabAuthManager | Default | 3.813 | 3.783 | |
| FabAuthManager | LocalExecutor | 3.796 | 3.790 | |
| FabAuthManager | SequentialExecutor | 3.784 | 3.774 | |
| FabAuthManager | AwsEcsExecutor | 3.960 | 3.952 | |
| FabAuthManager | CeleryExecutor | 3.813 | 3.804 | |
| FabAuthManager | CeleryKubernetesExecutor | 5.006 | 4.982 | |
| FabAuthManager | KubernetesExecutor | 4.981 | 4.965 | |
| FabAuthManager | EdgeExecutor | 4.108 | 4.095 | |
| KeycloakAuthManager | Default | 3.646 | 3.626 | |
| KeycloakAuthManager | LocalExecutor | 3.654 | 3.625 | |
| KeycloakAuthManager | SequentialExecutor | 3.632 | 3.617 | |
| KeycloakAuthManager | AwsEcsExecutor | 3.802 | 3.799 | |
| KeycloakAuthManager | CeleryExecutor | 3.637 | 3.625 | |
| KeycloakAuthManager | CeleryKubernetesExecutor | 4.853 | 4.820 | |
| KeycloakAuthManager | KubernetesExecutor | 4.829 | 4.802 | |
| KeycloakAuthManager | EdgeExecutor | 3.994 | 3.962 | |

Summary Statistics

  • Successful combinations: 32/32
  • Overall average time: 4.048s
  • Fastest time: 3.566s
  • Slowest time: 5.006s

Note: Each combination was run 3 times and averaged.
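
The improvement figures quoted in the benchmark summary follow directly from the two sets of summary statistics. A small sketch of the arithmetic, where `improvement` is a helper name introduced only for this example:

```python
def improvement(before_s: float, after_s: float) -> tuple[float, float]:
    """Return (seconds saved, percentage improvement relative to ``before_s``)."""
    saved = before_s - after_s
    return round(saved, 3), round(100 * saved / before_s, 3)


# (before, after) pairs taken from the two "Summary Statistics" sections.
summary = {
    "Overall average": (4.048, 3.117),
    "Fastest time": (3.566, 3.092),
    "Slowest time": (5.006, 3.155),
}

for label, (before, after) in summary.items():
    saved, pct = improvement(before, after)
    print(f"{label}: {after:.3f}s, down {saved:.3f}s from {before:.3f}s ({pct:.3f}% improvement)")
```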

Output Difference

`airflow --help` output after refactor
Usage: airflow [-h] GROUP_OR_COMMAND ...

Positional Arguments:
  GROUP_OR_COMMAND

    Groups
      assets            Manage assets
      aws-auth-manager  Manage resources used by AWS auth manager
      backfill          Manage backfills
      celery            Celery components
      config            View configuration
      connections       Manage connections
      dags              Manage DAGs
      db                Database operations
      db-manager        Manage externally connected database managers
      edge              Edge Worker components
      fab-db            Manage FAB
      jobs              Manage jobs
      keycloak-auth-manager
                        Manage resources used by Keycloak auth manager
      kubernetes        Tools to help run the KubernetesExecutor
      pools             Manage pools
      providers         Display providers
      roles             Manage roles
      tasks             Manage tasks
      teams             Manage teams
      users             Manage users
      variables         Manage variables

    Commands:
      api-server        Start an Airflow API server instance
      cheat-sheet       Display cheat sheet
      dag-processor     Start a dag processor instance
      info              Show information about current Airflow and environment
      kerberos          Start a kerberos ticket renewer
      permissions-cleanup
                        Clean up DAG permissions in Flask-AppBuilder tables
      plugins           Dump information about loaded plugins
      rotate-fernet-key
                        Rotate encrypted connection credentials and variables
      scheduler         Start a scheduler instance
      standalone        Run an all-in-one copy of Airflow
      sync-perm         Update permissions for existing roles and optionally DAGs
      triggerer         Start a triggerer instance
      version           Show the version

Options:
  -h, --help            show this help message and exit

`airflow --help` output before refactor
Usage: airflow [-h] GROUP_OR_COMMAND ...

Positional Arguments:
  GROUP_OR_COMMAND

    Groups
      assets         Manage assets
      backfill       Manage backfills
      config         View configuration
      connections    Manage connections
      dags           Manage DAGs
      db             Database operations
      db-manager     Manage externally connected database managers
      jobs           Manage jobs
      pools          Manage pools
      providers      Display providers
      tasks          Manage tasks
      teams          Manage teams
      variables      Manage variables

    Commands:
      api-server     Start an Airflow API server instance
      cheat-sheet    Display cheat sheet
      dag-processor  Start a dag processor instance
      info           Show information about current Airflow and environment
      kerberos       Start a kerberos ticket renewer
      plugins        Dump information about loaded plugins
      rotate-fernet-key
                     Rotate encrypted connection credentials and variables
      scheduler      Start a scheduler instance
      standalone     Run an all-in-one copy of Airflow
      triggerer      Start a triggerer instance
      version        Show the version

Options:
  -h, --help         show this help message and exit

@boring-cyborg boring-cyborg bot added area:CLI area:providers provider:amazon AWS/Amazon - related issues provider:celery provider:cncf-kubernetes Kubernetes (k8s) provider related issues provider:edge Edge Executor / Worker (AIP-69) / edge3 provider:fab provider:keycloak labels Dec 25, 2025
@jason810496 jason810496 force-pushed the refactor/cli/add-cli-section-get-provider-info branch from 0f807ee to aabcf78 Compare December 28, 2025 08:44
@jason810496 jason810496 changed the title: [WIP] Introduce "cli" section in provider metadata to eliminate executor/auth manager imports from cli_parser → Introduce a "cli" section in provider metadata Dec 28, 2025
@jason810496 jason810496 force-pushed the refactor/cli/add-cli-section-get-provider-info branch 2 times, most recently from 1e179c4 to 7d5d2c6 Compare December 31, 2025 05:59
Member Author

@jason810496 jason810496 left a comment

Yes, I had exactly the same idea. It could be done in ProvidersManager, and I will add the same validation for AuthManager as well.

Some updates on how to keep imports backward compatible without hurting performance (in other words, how to achieve lazy loading).

Previous approach: 1e179c4

Maintain an executor_class_not_defined_cli_command_names set during initialize_providers_executors, and only call ExecutorLoader.get_executor_names if that set is non-empty. However, I found that the _correctness_check in initialize_providers_executors actually imports those heavy modules, because _correctness_check uses import_string to verify that each class can be imported.

Current approach: 7d5d2c6

Introduce a ProvidersManager.executor_without_check property that skips the _correctness_check; the property returns a set of (executor_name, executor_provider_package_name) tuples. Only if some executor provider does not define the cli section do we call ExecutorLoader.get_executor_names, which achieves the "real lazy loading".
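
The decision described in the current approach boils down to a set difference. The sketch below assumes the tuple shape mentioned above; the function name `providers_needing_fallback` and the sample package names are hypothetical and only serve the illustration.

```python
def providers_needing_fallback(
    executors_without_check: set[tuple[str, str]],
    providers_with_cli: set[str],
) -> set[str]:
    """Return executor provider packages that do not declare a 'cli' section.

    Only when this set is non-empty does the (expensive) legacy path that
    imports the executor classes need to run.
    """
    executor_providers = {package for _, package in executors_without_check}
    return executor_providers - providers_with_cli


# Illustrative data: one executor provider declares 'cli', one does not.
executors = {
    ("example.executors.FastExecutor", "provider-with-cli"),
    ("legacy.executors.OldExecutor", "provider-without-cli"),
}
print(providers_needing_fallback(executors, {"provider-with-cli"}))
```

When every executor provider declares the section, the result is an empty set and no executor class has to be imported at all.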

Member

potiuk commented Dec 31, 2025

> Introduce a ProvidersManager.executor_without_check property that skips the _correctness_check; the property returns a set of (executor_name, executor_provider_package_name) tuples. Only if some executor provider does not define the cli section do we call ExecutorLoader.get_executor_names, which achieves the "real lazy loading".

Nice. It needs a bit of refactoring (extracting functions), but this is cool.

@potiuk potiuk linked an issue Jan 1, 2026 that may be closed by this pull request
2 tasks
@potiuk potiuk mentioned this pull request Jan 1, 2026
2 tasks
@jason810496 jason810496 force-pushed the refactor/cli/add-cli-section-get-provider-info branch 5 times, most recently from 176bf10 to 23d391b Compare January 5, 2026 08:35
@jason810496 jason810496 force-pushed the refactor/cli/add-cli-section-get-provider-info branch from c2f765e to d73d63b Compare January 5, 2026 15:08
@jason810496 jason810496 marked this pull request as ready for review January 5, 2026 15:09
Fix auth manager unused assign
…ary condition for Airflow 3.2+

Try fix provider cli test

Refactor CLI test skipping logic for Airflow version compatibility
Fix parser attribute not found

Fix executors test error

Fix exeuctors compat test

Fix compat test for k8s cli
@jason810496 jason810496 force-pushed the refactor/cli/add-cli-section-get-provider-info branch from d73d63b to 44b88ae Compare January 5, 2026 15:10
Contributor

@vincbeck vincbeck left a comment

Looks good overall

Member Author

@jason810496 jason810496 left a comment

Here are the key points of the changes in this PR:

  1. Unify the module path of CLI definitions; the convention is airflow.providers.<provider_name>.cli.definition.
    • Only the module path is unified for now
    • The function names are not unified (currently get_<name>_cli for each provider; we can unify them if necessary)
  2. Add a prek check to avoid importing heavy dependencies in community providers that support the 'cli' section.
    • Only CLI-related built-in modules are allowed
  3. Introduce the ProvidersManager.executor_without_check and ProvidersManager.auth_manager_without_check properties to avoid loading all possible Executors and AuthManagers by skipping _correctness_check
  4. Compatible loading for providers that support Executors or AuthManagers but do not support the 'cli' section, in airflow-core/src/airflow/cli/cli_parser.py
    • Only load Executors (AuthManagers) if necessary, by comparing the set of providers that support CLI with the set of providers that support Executors (AuthManagers)
    • The detailed scenario is described in #59805 (comment)
  5. User-facing documentation for the 'cli' section
  6. Benchmark script at scripts/in_container/benchmark_cli_latency.py
    • Overall average time: 3.558s
    • Fastest time: 3.329s
    • Slowest time: 4.148s
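
As an illustration of the kind of static check mentioned in point 2, the sketch below scans a module's source with `ast` and reports imports outside an allow-list. It is not the prek hook from this PR; the allow-list contents and the name `find_disallowed_imports` are assumptions made for this example.

```python
import ast

# Assumed allow-list for illustration; the real check may differ.
ALLOWED_PREFIXES = ("airflow.cli.cli_config", "__future__", "typing", "argparse")


def find_disallowed_imports(source: str, allowed: tuple[str, ...] = ALLOWED_PREFIXES) -> list[str]:
    """Return imported module names in ``source`` that are not on the allow-list."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            # Relative imports are reported with their leading dots.
            names = ["." * node.level + (node.module or "")]
        else:
            continue
        for name in names:
            if not any(name == prefix or name.startswith(prefix + ".") for prefix in allowed):
                found.append(name)
    return found


sample = (
    "from airflow.cli.cli_config import lazy_load_command\n"
    "import kubernetes\n"
    "from flask_appbuilder import Model\n"
)
print(find_disallowed_imports(sample))
```

Because the check works on the syntax tree, it never executes the module, so it also flags imports guarded by conditions such as `TYPE_CHECKING`; a stricter or more lenient policy would need extra handling.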

Thanks in advance for the review!

Contributor

@jscheffl jscheffl left a comment

In general this looks good. I am a bit (negatively) surprised that (thanks for adding the benchmark tool!) in breeze locally the time for airflow --help "only" drops from 4.0s to 2.7s on my machine (Python 3.12/Linux/x86). I would have expected more.
I assume this is related to the time the plugin manager needs to initialize 97 providers when running in breeze? Or have you checked where most of the time is still spent?

Without needing to touch the providers manager, would it make sense to speculatively ("into the blue") load the cli.definition module based on the configured executor/auth manager and, if it is not there, fall back to the previous logic to speed things up?
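
One way to picture this suggestion is a speculative import with a fallback. The sketch below is only an interpretation of the idea, not code from the PR; `load_cli_definition` and its `fallback` callable are hypothetical.

```python
import importlib
from typing import Any, Callable


def load_cli_definition(provider_package: str, fallback: Callable[[], Any]) -> Any:
    """Try ``<provider_package>.cli.definition``; use ``fallback`` if it is absent."""
    module_name = f"{provider_package}.cli.definition"
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError as exc:
        missing = exc.name or ""
        # Fall back only when the definition module (or one of its parent
        # packages) is missing; re-raise if an unrelated dependency is missing.
        if module_name == missing or module_name.startswith(missing + "."):
            return fallback()
        raise


# A package that does not exist triggers the fallback path.
print(load_cli_definition("no_such_provider_package", lambda: "previous logic"))
```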

Member

potiuk commented Jan 5, 2026

Lots and lots of imports:

[Image: airflow_help]

Contributor

jscheffl commented Jan 5, 2026

> Lots and lots of imports:

Okay, yeah, the import airflow problem :-(

Contributor

jscheffl commented Jan 5, 2026

> > Lots and lots of imports:
>
> Okay, yeah, the import airflow problem :-(

Interesting to see that loading pendulum indirectly initializes time-machine, which even loads pytest... which alone takes 8% of the time... Crazy.

Member

potiuk commented Jan 5, 2026

Yes. This is what I mean when I talk about "explicit initialization" of the things we need in the commands that need them.

The biggest problem is that we absolutely do not control what is getting imported, and touching anything there now introduces cyclic imports or god-knows-what. And we already do about a million semi-random lazy initializations to make things faster (???)

That's "explicit is better than implicit" in its purest form.

Member

potiuk commented Jan 5, 2026

BTW:

[Image: Screenshot 2026-01-06 at 00 17 04]

@jason810496
Member Author

> Assuming this is related to the time that plugin manager needs to initialize 97 providers when running in breeze? Or have you checked where the most time still is spend?

This is where skipping _correctness_check comes into play. Before we skipped the _correctness_check, we imported all available executors for every CLI command!
(I also used py-spy to find where most of the time was spent. Before the refactor, importing those heavy modules took most of the time. After the refactor, yes, import airflow takes most of the time, as Jarek described.)
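
The profiling above was done with py-spy. As a different, standard-library-only way to get a rough picture of import cost, the sketch below uses CPython's `-X importtime` option and ranks imports by cumulative time; `slowest_imports` is a helper name made up for this example, and `json` stands in for `airflow`.

```python
import subprocess
import sys


def slowest_imports(module: str, top: int = 5) -> list[tuple[int, str]]:
    """Return the ``top`` imports by cumulative microseconds when importing ``module``."""
    result = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", f"import {module}"],
        capture_output=True,
        text=True,
        check=True,
    )
    rows = []
    # -X importtime writes lines like "import time: <self> | <cumulative> | <name>" to stderr.
    for line in result.stderr.splitlines():
        if not line.startswith("import time:"):
            continue
        parts = line[len("import time:"):].split("|")
        if len(parts) != 3:
            continue
        _self_us, cumulative_us, name = (part.strip() for part in parts)
        if not cumulative_us.isdigit():
            continue  # skip the header row
        rows.append((int(cumulative_us), name))
    return sorted(rows, reverse=True)[:top]


for cumulative_us, name in slowest_imports("json"):
    print(f"{cumulative_us:>8} us  {name}")
```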

Member

potiuk commented Jan 6, 2026

He he - initializing all providers takes about the same time as import pendulum :D

Member

potiuk commented Jan 6, 2026

Shall we merge ?

@jason810496
Member Author

> Shall we merge ?

I’m definitely good to go 😄. Not sure if anyone has any additional comments.

@potiuk potiuk merged commit c1ecd30 into apache:main Jan 6, 2026
240 of 241 checks passed
Member

potiuk commented Jan 6, 2026

We can always make another PR ;)

chirodip98 pushed a commit to chirodip98/airflow-contrib that referenced this pull request Jan 9, 2026
* Initial plan

* Add CLI section to provider system - schema and core implementation

Co-authored-by: jason810496 <68415893+jason810496@users.noreply.github.com>

* Add CLI functions for Keycloak and Amazon auth managers

Co-authored-by: jason810496 <68415893+jason810496@users.noreply.github.com>

* Simplify CLI schema and move functions to definition.py

- Change CLI schema from object with 'function' property to simple string array
- Update ProvidersManager to parse CLI items as strings directly
- Move get_xxx_cli_commands functions from __init__.py to definition.py
- Update all provider.yaml files to use simplified format: `cli: [path.to.function]`

Co-authored-by: jason810496 <68415893+jason810496@users.noreply.github.com>

* Add CLI latency benchmark script

* Final refactor for ProviderManager

* Add CLI command definitions to provider info for Amazon, Fab, and Keycloak

* Refactor CLI commands for Celery executor and add provider info section

* Refactor Kubernetes CLI integration and add provider info section

* Add CLI commands for Edge Worker and integrate provider info retrieval

* Fix CLI command definitions for Celery and Kubernetes providers

* Fix static check

* Refactor CLI tests with ProviderManger instead of AuthManger +
ExecutorLoader

* Add unit tests for Celery, Kubernetes, and Edge CLI command definitions

* Fix k8s, edge compat test

Fix conf_vars import in keycloak

* Move cli_commands.definition to cli.definition for FAB

Fix FAB get_provider_info

* Add prek check to avoid import heavy module in cli.definition

* Remove auth_manager prefix for CLI definition for AWS, FAB, and Keycloak

* Doc: Add CLI commands directive and template for provider-level CLI commands

* Doc: Move generate doc get_parser to .cli.definition for each provider

- also unifiy the the provider CLI doc as cli-ref.rst

* Doc: mention provider-level CLI in airflow-core doc

* Doc: add CLI section to provider documentation and clarify CLI command usage

* Improve cli_parser speed by skipping _correctness_check for AuthManager
and Executor

Refactor ProvidersManager to consolidate executor and auth manager tracking without correctness checks

* Fix mypy error and import for test

* Enhance CLI warnings for missing 'cli' sections in provider info for executors and auth managers

* Fix tests

* Fix list has no add attribute error

Fix auth manager unused assign

* Refactor skip_cli_test function to simplify logic and remove unnecessary condition for Airflow 3.2+

Try fix provider cli test

Refactor CLI test skipping logic for Airflow version compatibility

* Finialize compatibility test

Fix parser attribute not found

Fix executors test error

Fix exeuctors compat test

Fix compat test for k8s cli

* Add test for ProviderManager change

* fixup! Remove unused __future__.annotations

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
stegololz pushed a commit to stegololz/airflow that referenced this pull request Jan 9, 2026

Development

Successfully merging this pull request may close these issues.

Airflow CLI extensions via plugins

4 participants