experimental_index_url fails to import TensorFlow package correctly in the build context #1996

Closed
sundaram-lnti opened this issue Jun 20, 2024 · 8 comments · Fixed by #2007

@sundaram-lnti

🐞 bug report

I have a Bazel Python project that depends on TensorFlow. I recently started using the newly introduced experimental_index_url to cache the TensorFlow dependency and hopefully save time during CI when building the package. However, I noticed that with recent versions of rules_python, this package does not seem to be imported or included correctly in the build context, resulting in ModuleNotFoundError failures. It would be great to identify and fix this issue.

Affected Rule

py_binary or any rules_python rule, I guess.

Is this a regression?

Yes, I believe this is a regression. When the experimental_index_url is not set, the TensorFlow package appears to be imported correctly in the build context, and the target runs fine. I recall that an earlier version of rules_python with experimental_index_url did not have this issue. Therefore, it seems that one of the recent changes introduced is causing this problem.

Description

I am experiencing issues importing the tensorflow package on macOS when using the experimental_index_url feature.

🔬 Minimal Reproduction

  1. Clone https://github.com/sundaram-lnti/rules-py-example

     git clone git@github.com:sundaram-lnti/rules-py-example.git

  2. Run the dataset target. You should see it fail with ModuleNotFoundError: No module named 'tensorflow'.

     bazel run //:dataset

  3. Now remove the experimental_index_url option in MODULE.bazel and run the same target again. It should no longer produce the error.

     bazel run //:dataset

🔥 Exception or Error


INFO: Analyzed target //:dataset (34 packages loaded, 1614 targets configured).
INFO: Found 1 target...
Target //:dataset up-to-date:
  bazel-bin/dataset
INFO: Elapsed time: 0.372s, Critical Path: 0.01s
INFO: 1 process: 1 internal.
INFO: Build completed successfully, 1 total action
INFO: Running command line: bazel-bin/dataset
Traceback (most recent call last):
  File "/private/var/tmp/_bazel_sundaramananthanarayanan/083e5cf8bb86e91a4da0a228060acc1c/execroot/_main/bazel-out/darwin_arm64-fastbuild/bin/dataset.runfiles/_main/dataset.py", line 2, in 
    import tensorflow as tf
ModuleNotFoundError: No module named 'tensorflow'

🌍 Your Environment

Operating System:

Darwin / M2 / arm64

Output of bazel version:

Bazelisk version: development
Build label: 7.2.0
Build target: @@//src/main/java/com/google/devtools/build/lib/bazel:BazelServer
Build time: Mon Jun 10 13:04:55 2024 (1718024695)
Build timestamp: 1718024695
Build timestamp as int: 1718024695

Rules_python version:

0.33.1

Anything else relevant?

@aignas
Collaborator

aignas commented Jun 20, 2024

In order to use the feature you need to lock the requirements file using rules_uv or rules_python. The experimental_index_url flag currently does not support requirements files without hashes.

@sundaram-lnti
Author

@aignas Thank you for taking a moment to reply to the issue and contribute this feature to the project. Here's some additional context on the matter.

I am currently locking the requirements file, as shown here, which was generated using compile_pip_requirements provided by rules_python.

To clarify, the experimental index URL works with other packages such as numpy or pandas, but it does not work with tensorflow. This is because tensorflow pulls in platform-specific packages depending on the operating system. For example, tensorflow brings in tensorflow-macos when running on darwin.

tensorboard==2.15.2 \
    --hash=sha256:a6f6443728064d962caea6d34653e220e34ef8df764cb06a8212c17e1a8f0622
    # via tensorflow-macos

Interestingly, or I guess expectedly, the "module not found" error disappears if you write import tensorflow_macos as tf instead. However, this workaround is not practical, since the import would need to change based on the platform.
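To illustrate why a hard-coded import is impractical, the application code would have to dispatch on the platform itself, along the lines of the sketch below (tf_module_name and load_tf are hypothetical helpers, not part of any library; which module name actually resolves is exactly the behaviour under discussion):

```python
import importlib
import platform

def tf_module_name(system: str) -> str:
    """Pick the distribution-specific TensorFlow module name for a platform.

    Illustrative only: 'tensorflow_macos' is what the broken build
    context exposes on Darwin in this report.
    """
    return "tensorflow_macos" if system == "Darwin" else "tensorflow"

def load_tf():
    # Hypothetical helper: import whichever module the current platform got.
    return importlib.import_module(tf_module_name(platform.system()))
```

This kind of dispatch is exactly what users should not have to write; the dependency resolution is supposed to make plain `import tensorflow` work everywhere.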

@aignas
Collaborator

aignas commented Jun 20, 2024

Ah yes, sorry for not noticing the file in the repo via the GitHub UI. Until I get time to look at this, you could use bazel cquery on the failing target to filter its py_library deps and post them here. It would also help to include the MODULE.bazel.lock file in the example.

The cquery command might look something like

bazel cquery "kind(py_library, deps(foo))"

Give or take a few quotes. This will at least tell you whether the dependency is included in the sandbox.

@sundaram-lnti
Author

Here's the output of bazel cquery. I have also pushed MODULE.bazel.lock to the repo as requested. You can find it here.

> ~/P/py on main  bazel cquery "kind(py_library, deps(//:dataset))"                                                                                                                         (base) 21:33:14
INFO: Options provided by the client:
  Inherited 'common' options: --isatty=1 --terminal_columns=204
INFO: Reading rc options for 'cquery' from /Users/sundaramananthanarayanan/Projects/py/.bazelrc:
  Inherited 'build' options: --announce_rc --color=yes
INFO: Analyzed target //:dataset (0 packages loaded, 0 targets configured).
INFO: Found 1 target...
@@rules_python~~pip~pypi_39_tensorflow_cp39_cp39_macosx_12_0_arm64_896bda03//:pkg (096dcc8)
@@rules_python~~pip~pypi_39_numpy_cp39_cp39_macosx_11_0_arm64_9667575f//:pkg (096dcc8)
INFO: Elapsed time: 0.269s, Critical Path: 0.00s
INFO: 0 processes.
INFO: Build completed successfully, 0 total actions

@sundaram-lnti
Author

FWIW, when I execute the same command on an earlier rules_python commit, I am able to use tensorflow with the experimental_index_url, and I notice different results: the old version seems to pull in more transitive dependencies. I'm not certain whether that's a good thing (it could simply have fallen back to pip in the earlier case), since I'm not very familiar with how wheels work (I come from a JVM background).

> ~/P/py on main ⨯ bazel cquery "kind(py_library, deps(//:dataset))"                                                                                                                        (base) 21:40:03
INFO: Options provided by the client:
  Inherited 'common' options: --isatty=1 --terminal_columns=204
INFO: Reading rc options for 'cquery' from /Users/sundaramananthanarayanan/Projects/py/.bazelrc:
  Inherited 'build' options: --announce_rc --color=yes
INFO: Analyzed target //:dataset (35 packages loaded, 332 targets configured).
INFO: Found 1 target...
@@rules_python~~pip~pypi_39_tensorflow//:pkg (096dcc8)
@@rules_python~~pip~pypi_39_numpy//:pkg (096dcc8)
@@rules_python~~pip~pypi_39_tensorflow_macos//:pkg (096dcc8)
@@rules_python~~pip~pypi_39_six//:pkg (096dcc8)
@@rules_python~~pip~pypi_39_tensorflow_io_gcs_filesystem//:pkg (096dcc8)
@@rules_python~~pip~pypi_39_wrapt//:pkg (096dcc8)
@@rules_python~~pip~pypi_39_gast//:pkg (096dcc8)
@@rules_python~~pip~pypi_39_keras//:pkg (096dcc8)
@@rules_python~~pip~pypi_39_absl_py//:pkg (096dcc8)
@@rules_python~~pip~pypi_39_tensorflow_estimator//:pkg (096dcc8)
@@rules_python~~pip~pypi_39_astunparse//:pkg (096dcc8)
@@rules_python~~pip~pypi_39_google_pasta//:pkg (096dcc8)
@@rules_python~~pip~pypi_39_libclang//:pkg (096dcc8)
@@rules_python~~pip~pypi_39_flatbuffers//:pkg (096dcc8)
@@rules_python~~pip~pypi_39_packaging//:pkg (096dcc8)
@@rules_python~~pip~pypi_39_tensorboard//:pkg (096dcc8)
@@rules_python~~pip~pypi_39_setuptools//:pkg (096dcc8)
@@rules_python~~pip~pypi_39_termcolor//:pkg (096dcc8)
@@rules_python~~pip~pypi_39_opt_einsum//:pkg (096dcc8)
@@rules_python~~pip~pypi_39_protobuf//:pkg (096dcc8)
@@rules_python~~pip~pypi_39_ml_dtypes//:pkg (096dcc8)
@@rules_python~~pip~pypi_39_typing_extensions//:pkg (096dcc8)
@@rules_python~~pip~pypi_39_grpcio//:pkg (096dcc8)
@@rules_python~~pip~pypi_39_h5py//:pkg (096dcc8)
@@rules_python~~pip~pypi_39_tensorboard_data_server//:pkg (096dcc8)
@@rules_python~~pip~pypi_39_requests//:pkg (096dcc8)
@@rules_python~~pip~pypi_39_google_auth//:pkg (096dcc8)
@@rules_python~~pip~pypi_39_werkzeug//:pkg (096dcc8)
@@rules_python~~pip~pypi_39_markdown//:pkg (096dcc8)
@@rules_python~~pip~pypi_39_wheel//:pkg (096dcc8)
@@rules_python~~pip~pypi_39_google_auth_oauthlib//:pkg (096dcc8)
@@rules_python~~pip~pypi_39_urllib3//:pkg (096dcc8)
@@rules_python~~pip~pypi_39_certifi//:pkg (096dcc8)
@@rules_python~~pip~pypi_39_charset_normalizer//:pkg (096dcc8)
@@rules_python~~pip~pypi_39_rsa//:pkg (096dcc8)
@@rules_python~~pip~pypi_39_idna//:pkg (096dcc8)
@@rules_python~~pip~pypi_39_requests_oauthlib//:pkg (096dcc8)
@@rules_python~~pip~pypi_39_cachetools//:pkg (096dcc8)
@@rules_python~~pip~pypi_39_pyasn1_modules//:pkg (096dcc8)
@@rules_python~~pip~pypi_39_importlib_metadata//:pkg (096dcc8)
@@rules_python~~pip~pypi_39_markupsafe//:pkg (096dcc8)
@@rules_python~~pip~pypi_39_oauthlib//:pkg (096dcc8)
@@rules_python~~pip~pypi_39_pyasn1//:pkg (096dcc8)
@@rules_python~~pip~pypi_39_zipp//:pkg (096dcc8)
INFO: Elapsed time: 1.844s, Critical Path: 0.00s
INFO: 0 processes.
INFO: Build completed successfully, 0 total actions

@aignas
Collaborator

aignas commented Jun 21, 2024

Yup, I think you've found a bug. The issue is in how we generate the deps cycles from METADATA. A quick workaround in your case would be to add:

# .bazelrc contents
build --@rules_python//python/config_settings:python_version=3.9

This works around the problem.

The basic flow is currently:

  1. The whl_library repo rule gets experimental_target_platforms = ["osx_aarch64", "osx_x86_64"].
  2. It notices that the wheel targets a specific platform and overrides the value of experimental_target_platforms to ["cp39_osx_aarch64"].
  3. Then we add the correct dependencies, but not for the case where the python_version config setting is not set via the .bazelrc example above or the transitions API.
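The platform override in step 2 can be illustrated with a rough sketch. This is not the actual rules_python implementation; the function name and the tag-to-platform mapping below are simplified assumptions based on the wheel filenames visible in the cquery output above:

```python
from typing import Optional

def target_platform_from_whl(filename: str) -> Optional[str]:
    """Derive a 'cp39_osx_aarch64'-style platform from a wheel filename.

    Hypothetical sketch of the override described above; the real code
    in rules_python handles many more tags and edge cases.
    """
    # Wheel filenames look like:
    #   tensorflow-2.15.0-cp39-cp39-macosx_12_0_arm64.whl
    stem = filename[:-len(".whl")]
    name, version, python_tag, abi_tag, platform_tag = stem.split("-")
    if platform_tag == "any":
        return None  # pure-Python wheel: no platform-specific override
    if platform_tag.startswith("macosx") and platform_tag.endswith("arm64"):
        return f"{python_tag}_osx_aarch64"
    return f"{python_tag}_{platform_tag}"
```

The point of the sketch is that the version-qualified platform (cp39_...) is derived inside the repo rule, which is why the surrounding config setting must agree with it.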

I'll rubber duck a little here.

The way we could fix it is as follows: move the overrides based on the whl filename to the pip.bzl extension and do the correct setting there. This allows us to pass ["osx_aarch64", "cp39_osx_aarch64"] to the cp39_osx_aarch64 wheels when 3.9 is the default version, at the cost of making whl_library less smart. I wanted to say that this could break the cases where a whl is built on a host platform, but in reality the code will just work. I think we should clear out the experimental_target_platforms when we build the wheel from sdist in the whl_library context, if the wheel we get out of it is not multiplatform. This may break some cases where people rely on this experimental API, but since it is broken already and experimental, maybe we should make this change.

However, the drawback is that passing osx_aarch64 to whl_library means "infer the minor Python version from the host Python", and this will only work if the Python interpreter we pass to whl_library is of the default version. This will be fine most of the time but could break if that assumption is violated. I think in this case it would be better to be explicit.

An alternative fix would involve changing how we parse the METADATA and passing the default_version to whl_library. This mimics how pip_repository in bzlmod receives the data and could be a nice way to unify the attr values. The Python code would then get an extra --default-python-version argument, so that it can interpret --default-python-version 3.9 --target-platform osx_aarch64 correctly in all cases. If the default version changes, we will correctly rebuild the BUILD.bazel files and everything should be fine.
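The extra argument described above could be wired in roughly like this. This is a sketch only: the real flag handling lives in //python/pip_install/tools/wheel_installer:arguments.py, and the exact flag names there may differ from this assumption:

```python
import argparse

def make_parser() -> argparse.ArgumentParser:
    # Sketch of the wheel_installer argument handling described above.
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--default-python-version",
        help="Default Python minor version, e.g. '3.9'.",
    )
    parser.add_argument(
        "--target-platform",
        action="append",
        default=[],
        help="Target platform, e.g. 'osx_aarch64'; may be repeated.",
    )
    return parser

# Example invocation matching the flags discussed above.
args = make_parser().parse_args(
    ["--default-python-version", "3.9", "--target-platform", "osx_aarch64"]
)
```

With both flags available, the tool can unambiguously expand osx_aarch64 into a cp39_osx_aarch64-style platform instead of inferring the version from the host interpreter.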

The alternative means that users who rely on the current behaviour keep getting it, and we fix the design moving forward. I did not like the idea of passing default_python_version to each instance of whl_library, but that might be the lesser evil, as it leaves the code with fewer assumptions.

End of rubber ducking.

I think my plan to fix this would be:

  • Add a default_python_version attr to common_attrs in //python/pip_install:pip_repository.bzl.
  • Add a default-python-version arg to //python/pip_install/tools/wheel_installer:arguments.py.
  • Pass default_python_version to whl_library when the version is the default; otherwise leave it unset.
  • (maybe optional) Force the code to pass around cp39_osx_aarch64-style target_platform flags everywhere to reduce the chance of errors. Right now we pass osx_aarch64 from the pip extension. If neither the version nor default_python_version is specified, error out. This may require adjusting //python/private:parse_requirements.bzl, where we currently strip the cp39 part from the platforms and always return osx_aarch64-style values; it should start returning cp39_osx_aarch64 values, and the function should accept a python_version argument.
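The last bullet could be sketched as a small normalization helper. Here expand_platforms is a hypothetical name, not the actual parse_requirements.bzl API; it only illustrates the proposed behaviour of returning version-qualified platforms and erroring out when the version cannot be determined:

```python
def expand_platforms(platforms, python_version=None):
    """Return 'cp39_osx_aarch64'-style values.

    Hypothetical sketch: version-qualified platforms pass through,
    bare platforms get prefixed from python_version, and missing
    version information is an error, as proposed above.
    """
    out = []
    for p in platforms:
        if p.startswith("cp"):
            out.append(p)  # already version-qualified
        elif python_version:
            abi = "cp" + python_version.replace(".", "")
            out.append(f"{abi}_{p}")
        else:
            raise ValueError(f"no python_version given for platform {p!r}")
    return out
```

Making the version explicit at this boundary is what removes the "infer from the host interpreter" assumption discussed earlier in the thread.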

Thanks for reporting this bug!

@sundaram-lnti
Author

I really appreciate your detailed response regarding the bug. Although I may not understand all the technical details, the plan sounds good to me. I am willing to assist with testing or anything else you might need from me.

Currently, I am no longer facing any blockers and I am looking forward to seeing significant improvements in build times with this fix.

Thanks once again for your prompt responses.

@aignas aignas self-assigned this Jun 22, 2024
aignas added a commit to aignas/rules_python that referenced this issue Jun 23, 2024
It seems that there was a single-line bug that did not allow us to
handle arch-specific deps and since the percentage of such wheels is
extremely low, it went under the radar for quite some time.

I am going to not implement any explicit passing of the default python
version to `whl_library` as it is a much bigger change. Just doing this
could be sufficient for the time being.

Fixes bazelbuild#1996
@aignas
Copy link
Collaborator

aignas commented Jun 25, 2024

I was wrong about one thing: the change did not have to be invasive. There was a one-line issue that affected system_platform annotations from the METADATA, but nothing else. @sundaram-lnti, could you please check whether the attached PR fixes your issue?

github-merge-queue bot pushed a commit that referenced this issue Jun 25, 2024
It seems that there was a single-line bug that did not allow us to
handle arch-specific deps and since the percentage of such wheels is
extremely low, it went under the radar for quite some time.

I am going to not implement any explicit passing of the default python
version to `whl_library` as it is a much bigger change. Just doing this
could be sufficient for the time being.

Fixes #1996