Skip to content

Conversation

@yannlambret
Copy link
Contributor

When instantiating a PythonVirtualenvOperator, part of the context is propagated based on the requirements constructor parameter. The current logic implementation prevents 'pendulum' or 'apache-airflow' keys to be added to the Airflow context as soon as a version specifier is used.

For instance this requirement list will have the expected behavior:

        op = PythonVirtualenvOperator(
            task_id="task",
            python_callable=func,
            requirements=["pendulum"],
            system_site_packages=False,
        )

Whereas this one will cause the pendulum objects to be stripped out of the context:

        op = PythonVirtualenvOperator(
            task_id="task",
            python_callable=func,
            requirements=["pendulum<3.0"],
            system_site_packages=False,
        )

This PR aims to fix these limitations.

@boring-cyborg
Copy link

boring-cyborg bot commented May 10, 2025

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contributors' Guide (https://github.com/apache/airflow/blob/main/contributing-docs/README.rst)
Here are some useful points:

  • Pay attention to the quality of your code (ruff, mypy and type annotations). Our pre-commits will help you with that.
  • In case of a new feature add useful documentation (in docstrings or in docs/ directory). Adding a new operator? Check this short guide Consider adding an example DAG that shows how users should use it.
  • Consider using Breeze environment for testing locally, it's a heavy docker but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
  • Be sure to read the Airflow Coding style.
  • Always keep your Pull Requests rebased, otherwise your build might fail due to changes not related to your commits.
    Apache Airflow is a community-driven project and together we are making it better 🚀.
    In case of doubts contact the developers at:
    Mailing List: dev@airflow.apache.org
    Slack: https://s.apache.org/airflow-slack

Copy link
Member

@potiuk potiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice one - however, maybe instead of regexp you should parse the requirement with packaging requirements and take name of distribution from there: https://packaging.pypa.io/en/stable/requirements.html

This has additional advantage of actually parsing the requirements and will fail early in case any of the requirements are not following the specification.

@yannlambret yannlambret force-pushed the virtualenv-operator-serializable-context-keys branch from 931cc1b to d974c4b Compare May 11, 2025 13:09
@yannlambret
Copy link
Contributor Author

Hello @potiuk, thank you for the suggestion. I modified the code to use packaging.requirements. I did my best to implement a reasonably robust parsing logic, by taking care of a few edge cases (comments, etc.). Let me know what you think.

@yannlambret yannlambret force-pushed the virtualenv-operator-serializable-context-keys branch from d974c4b to 451914c Compare May 11, 2025 14:17
@yannlambret yannlambret changed the title Improve context propagation for custom operators Preserve all context keys during serialization May 11, 2025
@yannlambret yannlambret force-pushed the virtualenv-operator-serializable-context-keys branch from 451914c to 1794d68 Compare May 11, 2025 17:17
Copy link
Member

@potiuk potiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! One, very small nit. Good points with comment removals.

When instantiating a PythonVirtualenvOperator (or subclass), part of the
context is propagated based on the requirements constructor parameter.
The current implementation only adds the 'pendulum' and 'apache-airflow'
context keys for bare package names.

For instance:

    * requirements=["pendulum"] -> works

    * requirements=["pendulum<3.0"] -> does not work

This change uses 'packaging.requirements' to ensure all version
specifiers are handled correctly.
@yannlambret yannlambret force-pushed the virtualenv-operator-serializable-context-keys branch from 1794d68 to 857d2cb Compare May 12, 2025 06:35
@shahar1 shahar1 merged commit 56633e7 into apache:main May 12, 2025
2 checks passed
@boring-cyborg
Copy link

boring-cyborg bot commented May 12, 2025

Awesome work, congrats on your first merged pull request! You are invited to check our Issue Tracker for additional contributions.

@yannlambret yannlambret deleted the virtualenv-operator-serializable-context-keys branch May 12, 2025 20:43
Comment on lines +1519 to +1526
# invalid version format
"pendulum==3..0",
# invalid operator (=< instead of <=)
"apache-airflow=<2.0",
# same invalid operator on pendulum
"pendulum=<3.0",
# totally malformed
"invalid requirement",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do not account for the case when more than one are passed.

yannlambret added a commit to yannlambret/airflow that referenced this pull request May 13, 2025
Follow-up to apache#50446 and apache#50521:

- Clean up requirements-flattening logic.
- Add tests for requirements passed as multi-line strings.
yannlambret added a commit to yannlambret/airflow that referenced this pull request May 14, 2025
Follow-up to apache#50446 and apache#50521:

- Clean up requirements-flattening logic.
- Add tests for requirements passed as multi-line strings.
yannlambret added a commit to yannlambret/airflow that referenced this pull request May 14, 2025
Follow-up to apache#50446 and apache#50521:

- Clean up requirements-flattening logic.
- Add tests for requirements passed as multi-line strings.
yannlambret added a commit to yannlambret/airflow that referenced this pull request May 16, 2025
Follow-up to apache#50446 and apache#50521:

- Clean up requirements-flattening logic.
- Add tests for requirements passed as multi-line strings.
shahar1 pushed a commit that referenced this pull request May 18, 2025
Follow-up to #50446 and #50521:

- Clean up requirements-flattening logic.
- Add tests for requirements passed as multi-line strings.
dadonnelly316 pushed a commit to dadonnelly316/airflow that referenced this pull request May 26, 2025
Follow-up to apache#50446 and apache#50521:

- Clean up requirements-flattening logic.
- Add tests for requirements passed as multi-line strings.
sanederchik pushed a commit to sanederchik/airflow that referenced this pull request Jun 7, 2025
When instantiating a PythonVirtualenvOperator (or subclass), part of the
context is propagated based on the requirements constructor parameter.
The current implementation only adds the 'pendulum' and 'apache-airflow'
context keys for bare package names.

For instance:

    * requirements=["pendulum"] -> works

    * requirements=["pendulum<3.0"] -> does not work

This change uses 'packaging.requirements' to ensure all version
specifiers are handled correctly.
sanederchik pushed a commit to sanederchik/airflow that referenced this pull request Jun 7, 2025
Follow-up to apache#50446 and apache#50521:

- Clean up requirements-flattening logic.
- Add tests for requirements passed as multi-line strings.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants