Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

core operator python: check virtualenv is importable by importing #43403

Closed
wants to merge 3 commits into from

Conversation

thisiswhereitype
Copy link

@thisiswhereitype thisiswhereitype commented Oct 26, 2024

This imports the module to check for ModuleNotFoundError. A soft import is possible but then will then import it later anyway and this will be most robust. This change is needed as a system virtualenv is not always available in a venv that airflow is not running in.

Closes #40420

This imports the module to check for `ModuleNotFoundError`. A soft import is possible but then we importlib anyway and this will be most robust.
@boring-cyborg boring-cyborg bot added the area:core-operators Operators, Sensors and hooks within Core Airflow label Oct 26, 2024
Copy link

boring-cyborg bot commented Oct 26, 2024

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contributors' Guide (https://github.com/apache/airflow/blob/main/contributing-docs/README.rst)
Here are some useful points:

  • Pay attention to the quality of your code (ruff, mypy and type annotations). Our pre-commits will help you with that.
  • In case of a new feature add useful documentation (in docstrings or in docs/ directory). Adding a new operator? Check this short guide Consider adding an example DAG that shows how users should use it.
  • Consider using Breeze environment for testing locally, it's a heavy docker but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
  • Be sure to read the Airflow Coding style.
  • Always keep your Pull Requests rebased, otherwise your build might fail due to changes not related to your commits.
    Apache Airflow is a community-driven project and together we are making it better 🚀.
    In case of doubts contact the developers at:
    Mailing List: dev@airflow.apache.org
    Slack: https://s.apache.org/airflow-slack

@potiuk
Copy link
Member

potiuk commented Oct 26, 2024

Actually - this is not correct/complete - is really a breaking change.

So far PythonVirtualenv operator used virtualenv package to create the venvs, not the built-in venv module. Also there is a dedicated [virtualenv] extra to install the package. The package is here: https://pypi.org/project/virtualenv/

Note that virtualenv is stil maintained till today, even if venv became available

This is mostly due to historical reasons, but also because virtualenv created venv behaves slightly differently than the built-in venv one (one of the differences for example is that it cannot be relocatable).

I do not think we are bound to any specifics of venv vs virtualenv - however when customers use PythonVirtualenvOperat (and virtualenv under the hood) they might rely on some internals and differences of virtualenv vs venv.

So while I sympathise with the change - since venv is now built-in in Python since 3.3 - I think a change to use venv instead of virtualenv needs to be discussed in devlist, ending up as a potentially breaking change in Airflow 3 and newsfragment.

Also this change is plainly wrong - creating virtualenv currently actually uses "virtualenv" command line not venv module - so if we accept it as it is, it will check for venv but actually will use for virtualenv.

Copy link
Member

@potiuk potiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Requesting changes until the actual scope of the change and whether we want to change from virtualenv to venv (and it should be complete change if we do).

@thisiswhereitype
Copy link
Author

thisiswhereitype commented Oct 27, 2024

Thanks for taking a look, I, for some reason though venv and virtualenv were the same. As you say this would be breaking.

https://peps.python.org/pep-0405/ - is relevant for any future readers.

Copy link
Member

@potiuk potiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is better (and I do recognise a need of synchronizing the check with the venv creation command), but not necessary ideal.

The actual virtualenv creation happens by

cmd = [sys.executable, "-m", "virtualenv", tmp_dir]

While the check for importing virtualenv is good and "sys.executable" should generally provide good results, but I can imagine some cases where your PYTHONPATH is modified after running the interpreter where it can still provide different results.

Maybe a better check will be checking if that works:

python -m virtualenv --version

which means runing this:

cmd = [sys.executable, "-m", "virtualenv", "--help"]

WDYT?

Copy link
Member

@potiuk potiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

approved accidentally

@potiuk
Copy link
Member

potiuk commented Oct 31, 2024

Switching to venv in #43568 - closing that one

@potiuk potiuk closed this Oct 31, 2024
@thisiswhereitype thisiswhereitype deleted the patch-1 branch November 2, 2024 12:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:core-operators Operators, Sensors and hooks within Core Airflow
Projects
None yet
Development

Successfully merging this pull request may close these issues.

PythonVirtualOperator fails silently when virtualenv is not installed.
2 participants