-
Notifications
You must be signed in to change notification settings - Fork 16.3k
Introduce a "cli" section in provider metadata #59805
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce a "cli" section in provider metadata #59805
Conversation
0f807ee to
aabcf78
Compare
1e179c4 to
7d5d2c6
Compare
jason810496
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that is the exact same idea for me. It could be done in ProviderManager, and I will add the same validation for AuthManager as well.
Just some updates about how to achieve compatible import while not impacting performance ( or achieve lazy loading in another word ).
Previous approach: 1e179c4
Maintaining executor_class_not_defined_cli_command_names set
while initialize_providers_executors. Only call ExecutorLoader.get_executor_names if len of executor_class_not_defined_cli_command_names set is not zero. However, I found that the _correctness_check in initialize_providers_executors will actually import those heavy module as _correctness_check is using import_string to check whether we are able to import those class.
Current approach: 7d5d2c6
Introducing ProviderManager.executor_without_check property to skip the _correctness_check, the property will return set of (executor_name, executor_provider_package_name). Only if there is any executor provider not defined the cli section, then we will call ExecutorLoader.get_executor_names to achieve the "real lazy loading".
Nice. needs a bit refactoring (extracting functions) - but this is cool |
176bf10 to
23d391b
Compare
c2f765e to
d73d63b
Compare
…executors and auth managers
Fix auth manager unused assign
…ary condition for Airflow 3.2+ Try fix provider cli test Refactor CLI test skipping logic for Airflow version compatibility
Fix parser attribute not found Fix executors test error Fix exeuctors compat test Fix compat test for k8s cli
d73d63b to
44b88ae
Compare
vincbeck
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good overall
providers/amazon/src/airflow/providers/amazon/aws/auth_manager/cli/__init__.py
Outdated
Show resolved
Hide resolved
providers/fab/src/airflow/providers/fab/auth_manager/cli_commands/__init__.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here are some key point of current PR change:
- Unify module path of CLI definition, the convention will be
airflow.providers.<provider_name>.cli.definition.- Only the module path is unified now
- The function names are not unified (
get_<name>_clifor each provider currently, we can unify them if necessary )
- Add
prekcheck to avoid importing heavy dependencies in community providers that support 'cli' section.- Only CLI related built-in module are allowed
- Introduce
ProviderManager.executor_without_checkandProviderManager.auth_manager_without_checkproperties to avoid loading all the possible Executors and AuthManger by skipping_correctness_check- discussed trade-off of different approaches in #59805 (review)
- Compatible loading for providers that support Executors or AuthManger but not support 'cli' section - in
airflow-core/src/airflow/cli/cli_parser.py- Only load Executors (AuthManger) if necessary by comparing the set of providers that supports CLI and set of providers that support Executor (AuthManger)
- detailed scenario mentioned in #59805 (comment)
- User-facing documentation for 'cli' section
- Benchmark script at
scripts/in_container/benchmark_cli_latency.py- Overall average time: 3.558s
- Fastest time: 3.329s
- Slowest time: 4.148s
Thanks in advance for the review!
providers/keycloak/src/airflow/providers/keycloak/auth_manager/cli/__init__.py
Show resolved
Hide resolved
jscheffl
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general looks good. I am a bit (negative) surprised that (thanks for adding the benchmark tool!) in breeze locally the time for airflow --help "only" drops from 4.0s to 2.7s on my machine (Python 3.12/Linux/x86). I would have expected more.
Assuming this is related to the time that plugin manager needs to initialize 97 providers when running in breeze? Or have you checked where the most time still is spend?
w/o needing to touch providers manager would it make sense "into the blue" depending on executor/auth manager specified loading the cli.definition module and if not there fall-back to the previous logic to speed things-up?
Okay, yeah, the |
Interesting to see that indirectly loading pendulum initialized time-machine which loads even pytest... which takes alone 8% of time... Crazy. |
|
Yes This is what I am talking about when telling about "explicit initialization" of things that we need in the commands we need. The biggest problem is that we absolutely do not control what is getting imported and touching anything there now introduces cyclic imports or god-knowsl-what. And we already do about a million of semi-random lazy initializations to make things faster (???) That's |
This is the part where skipping |
|
He he - initialzing all providers takes about the same time as |
|
Shall we merge ? |
I’m definitely good to go 😄. Not sure if anyone has any additional comments. |
|
We can always make another PR ;) |
* Initial plan * Add CLI section to provider system - schema and core implementation Co-authored-by: jason810496 <68415893+jason810496@users.noreply.github.com> * Add CLI functions for Keycloak and Amazon auth managers Co-authored-by: jason810496 <68415893+jason810496@users.noreply.github.com> * Simplify CLI schema and move functions to definition.py - Change CLI schema from object with 'function' property to simple string array - Update ProvidersManager to parse CLI items as strings directly - Move get_xxx_cli_commands functions from __init__.py to definition.py - Update all provider.yaml files to use simplified format: `cli: [path.to.function]` Co-authored-by: jason810496 <68415893+jason810496@users.noreply.github.com> * Add CLI latency benchmark script * Final refactor for ProviderManager * Add CLI command definitions to provider info for Amazon, Fab, and Keycloak * Refactor CLI commands for Celery executor and add provider info section * Refactor Kubernetes CLI integration and add provider info section * Add CLI commands for Edge Worker and integrate provider info retrieval * Fix CLI command definitions for Celery and Kubernetes providers * Fix static check * Refactor CLI tests with ProviderManger instead of AuthManger + ExecutorLoader * Add unit tests for Celery, Kubernetes, and Edge CLI command definitions * Fix k8s, edge compat test Fix conf_vars import in keycloak * Move cli_commands.definition to cli.definition for FAB Fix FAB get_provider_info * Add prek check to avoid import heavy module in cli.definition * Remove auth_manager prefix for CLI definition for AWS, FAB, and Keycloak * Doc: Add CLI commands directive and template for provider-level CLI commands * Doc: Move generate doc get_parser to .cli.definition for each provider - also unifiy the the provider CLI doc as cli-ref.rst * Doc: mention provider-level CLI in airflow-core doc * Doc: add CLI section to provider documentation and clarify CLI command usage * Improve cli_parser speed by skipping _correctness_check for AuthManager and Executor Refactor ProvidersManager to consolidate executor and auth manager tracking without correctness checks * Fix mypy error and import for test * Enhance CLI warnings for missing 'cli' sections in provider info for executors and auth managers * Fix tests * Fix list has no add attribute error Fix auth manager unused assign * Refactor skip_cli_test function to simplify logic and remove unnecessary condition for Airflow 3.2+ Try fix provider cli test Refactor CLI test skipping logic for Airflow version compatibility * Finialize compatibility test Fix parser attribute not found Fix executors test error Fix exeuctors compat test Fix compat test for k8s cli * Add test for ProviderManager change * fixup! Remove unused __future__.annotations --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
* Initial plan * Add CLI section to provider system - schema and core implementation Co-authored-by: jason810496 <68415893+jason810496@users.noreply.github.com> * Add CLI functions for Keycloak and Amazon auth managers Co-authored-by: jason810496 <68415893+jason810496@users.noreply.github.com> * Simplify CLI schema and move functions to definition.py - Change CLI schema from object with 'function' property to simple string array - Update ProvidersManager to parse CLI items as strings directly - Move get_xxx_cli_commands functions from __init__.py to definition.py - Update all provider.yaml files to use simplified format: `cli: [path.to.function]` Co-authored-by: jason810496 <68415893+jason810496@users.noreply.github.com> * Add CLI latency benchmark script * Final refactor for ProviderManager * Add CLI command definitions to provider info for Amazon, Fab, and Keycloak * Refactor CLI commands for Celery executor and add provider info section * Refactor Kubernetes CLI integration and add provider info section * Add CLI commands for Edge Worker and integrate provider info retrieval * Fix CLI command definitions for Celery and Kubernetes providers * Fix static check * Refactor CLI tests with ProviderManger instead of AuthManger + ExecutorLoader * Add unit tests for Celery, Kubernetes, and Edge CLI command definitions * Fix k8s, edge compat test Fix conf_vars import in keycloak * Move cli_commands.definition to cli.definition for FAB Fix FAB get_provider_info * Add prek check to avoid import heavy module in cli.definition * Remove auth_manager prefix for CLI definition for AWS, FAB, and Keycloak * Doc: Add CLI commands directive and template for provider-level CLI commands * Doc: Move generate doc get_parser to .cli.definition for each provider - also unifiy the the provider CLI doc as cli-ref.rst * Doc: mention provider-level CLI in airflow-core doc * Doc: add CLI section to provider documentation and clarify CLI command usage * Improve cli_parser speed by skipping _correctness_check for AuthManager and Executor Refactor ProvidersManager to consolidate executor and auth manager tracking without correctness checks * Fix mypy error and import for test * Enhance CLI warnings for missing 'cli' sections in provider info for executors and auth managers * Fix tests * Fix list has no add attribute error Fix auth manager unused assign * Refactor skip_cli_test function to simplify logic and remove unnecessary condition for Airflow 3.2+ Try fix provider cli test Refactor CLI test skipping logic for Airflow version compatibility * Finialize compatibility test Fix parser attribute not found Fix executors test error Fix exeuctors compat test Fix compat test for k8s cli * Add test for ProviderManager change * fixup! Remove unused __future__.annotations --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>

related: https://lists.apache.org/thread/4mc41hws0655b2p4s88f1t83klppdmwq
Why
Before current refactor, no matter which
airflowcommand we execute,cli_parserwill import actual AuthManager and Executor we use just to callget_cli_commandsfor getting optional commands.Which means no matter what CLI commands we run, even we run
airflow --help, we will import heavy module like kubernetes, flask_appbuilder, etc (based onAIRFLOW__CORE__AUTH_MANAGERandAIRFLOW__CORE__EXECUTORconfig). In the worse case ( FabAuthManager + CeleryKubernetesExecutor ), it will took approximately 5 seconds just to showairflow --helpcommand based on the benchmark!How
The refactor includes:
clisection to provider metadata (provider.yaml/def get_provider_info) that points toget_cli_commandsget_cli_commandsinto a clean module that does not import any heavy dependenciesIt should only import from
airflow.cli.cli_configIt should rely on
lazy_load_commandWhat
Introduce
clisection in provider metadataThe main behavioral change is that, after this refactor, any installed provider that exposes CLI commands will have those commands available in the Airflow CLI, even if it is not configured as the active AuthManager or Executor.
Benchmark Result
Summary:
Full Airflow CLI Latency Benchmark - After Refactor
Benchmark results for
airflow --helpcommand with different Auth Manager and Executor combinations.Total combinations tested: 32
Results Table
Summary Statistics
Note: Each combination was run 3 times and averaged.
Full Airflow CLI Latency Benchmark - Before Refactor
Benchmark results for
airflow --helpcommand with different Auth Manager and Executor combinations.Total combinations tested: 32
Results Table
Summary Statistics
Note: Each combination was run 3 times and averaged.
Output Difference
`airflow --help` output after refactor
`airflow --help` output before refactor