-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
genrules with py_binary
in exec_tools
broken by TF python toolchain
#4862
Comments
py_binary
in exec_tools
are broken by TF python toolchainpy_binary
in exec_tools
broken by TF python toolchain
py_binary
in exec_tools
broken by TF python toolchainpy_binary
in exec_tools
broken by TF python toolchain
I've written up a repro script that consistently demonstrates the issue here: repro.sh.txt It creates 2 virtualenvs, builds a genrule once in the first venv, then switches venvs, builds the genrule again, and shows whether the genrule still uses the python binary from the first venv or not. Finally it switches the bazel output base (equivalent to Sample executions: # Run in an empty WORKSPACE - not affected
$ repro.sh
# Run in a WORKSPACE using `tf_workspace()` - reproduces the problem
$ repro.sh --tf-workspace
# Same as above but with patch to make Python toolchain reconfigure when `PATH` is changed - fixes the problem
$ repro.sh --tf-workspace-patched Full contents is included below as well. Full `repro.sh` script#!/bin/bash
# Repro script for TensorFlow Python toolchain resolution problems.
set -e
set -o pipefail
# Whether to use the TF workspace rules. If not, uses an empty WORKSPACE.
tf_workspace=0
while [ "$#" -gt 0 ]; do
case "$1" in
--tf-workspace)
tf_workspace=1
shift
;;
--tf-workspace-patched)
tf_workspace_patched=1
shift
;;
*)
printf >&2 'fatal: unknown argument "%s"' "$1"
exit 1
;;
esac
done
tmpdir="$(mktemp -d)"
workspace="${tmpdir}/src"
output_base_1="${tmpdir}/bazel_output_base_1"
output_base_2="${tmpdir}/bazel_output_base_2"
venv1="${tmpdir}/venv1"
venv2="${tmpdir}/venv2"
results="${tmpdir}/results"
# Helper to make a virtualenv and install numpy (needed for TF build).
function make-venv() {
name="$1"
(
virtualenv -q --python python3 "${name}"
source "${name}/bin/activate"
pip -q install numpy
)
}
# Helper to run bazel with a bunch of debugging flags.
function bazel-with-args() {
output_base="$1"
shift
command="$1"
shift
(
set -x
bazel --output_base="${output_base}" "${command}" \
--noshow_progress \
--toolchain_resolution_debug=true \
--experimental_repo_remote_exec \
"$@"
)
}
# Helper to run a single trial build.
function run_trial() {
trial="$1"
shift
output_base="$1"
shift
venv="$1"
shift
(
source "${venv}/bin/activate"
bazel-with-args "${output_base}" build //:foo.txt
bazel_bin="$(bazel-with-args "${output_base}" info bazel-bin)"
cat "${bazel_bin}"/foo.txt | tee "${results}/${trial}_foo.txt"
)
}
# Set up the example Bazel workspace.
mkdir "${workspace}"
cd "${workspace}"
touch WORKSPACE
touch tensorflow.patch
cat >optional_tensorflow.patch <<'EOF'
--- third_party/py/python_configure.bzl 1970-01-01 00:00:00.000000000 +0000
+++ third_party/py/python_configure.bzl 1970-01-01 00:00:00.000000000 +0000
@@ -268,6 +268,7 @@
_ENVIRONS = [
BAZEL_SH,
+ "PATH",
PYTHON_BIN_PATH,
PYTHON_LIB_PATH,
]
EOF
cat >TF_WORKSPACE <<'EOF'
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")
http_archive(
name = "io_bazel_rules_closure",
sha256 = "6a900831c1eb8dbfc9d6879b5820fd614d4ea1db180eb5ff8aedcb75ee747c1f",
strip_prefix = "rules_closure-db4683a2a1836ac8e265804ca5fa31852395185b",
urls = [
"http://mirror.tensorflow.org/github.com/bazelbuild/rules_closure/archive/db4683a2a1836ac8e265804ca5fa31852395185b.tar.gz",
"https://github.com/bazelbuild/rules_closure/archive/db4683a2a1836ac8e265804ca5fa31852395185b.tar.gz", # 2020-01-15
],
)
load("@io_bazel_rules_closure//closure:repositories.bzl", "rules_closure_dependencies")
rules_closure_dependencies(
omit_bazel_skylib = True,
omit_com_google_protobuf = True,
omit_com_google_protobuf_js = True,
)
http_archive(
name = "org_tensorflow",
sha256 = "2595a5c401521f20a2734c4e5d54120996f8391f00bb62a57267d930bce95350",
strip_prefix = "tensorflow-2.3.0",
urls = [
"http://mirror.tensorflow.org/github.com/tensorflow/tensorflow/archive/v2.3.0.tar.gz", # 2020-07-23
"https://github.com/tensorflow/tensorflow/archive/v2.3.0.tar.gz",
],
patches = ["//:tensorflow.patch"],
)
load("@org_tensorflow//tensorflow:workspace.bzl", "tf_workspace")
tf_workspace()
EOF
cat >BUILD <<'EOF'
py_binary(
name = "foo",
srcs = ["foo.py"],
python_version = "PY3",
srcs_version = "PY3",
)
genrule(
name = "gen_foo",
outs = ["foo.txt"],
cmd = "$(execpath :foo) $(execpath :foo) >$@",
exec_tools = [":foo"],
)
EOF
cat >foo.py <<'EOF'
from __future__ import print_function
import sys
print(sys.version)
print(sys.executable)
print(sys.argv)
if len(sys.argv) >= 2:
path_to_exec_py_binary = sys.argv[1]
with open(path_to_exec_py_binary) as f:
for line in f:
if line.startswith("PYTHON_BINARY"):
print(line.rstrip())
EOF
if [ "${tf_workspace}" -eq 1 ]; then
# Run trials with TF workspace.
mv TF_WORKSPACE WORKSPACE
fi
if [ "${tf_workspace_patched}" -eq 1 ]; then
# Run trials with patched TF workspace.
mv TF_WORKSPACE WORKSPACE
mv optional_tensorflow.patch tensorflow.patch
fi
mkdir "${results}"
# Create the virtualenvs.
make-venv "${venv1}"
make-venv "${venv2}"
# Run trials. Switch output base without switching venv to simulate expunge;
# it's better than expunge since then we can still inspect sandbox contents.
run_trial "1-venv1" "${output_base_1}" "${venv1}"
run_trial "2-venv2" "${output_base_1}" "${venv2}"
run_trial "3-venv2-expunged" "${output_base_2}" "${venv2}"
# Print out results in consolidated format.
printf "\n\n\n"
printf "=================== RESULTS =====================\n"
for file in $(ls -1 "${results}"); do
printf "\n===== %s =====\n" "${file}"
cat "${results}/${file}"
done
|
The problem:
py_binary
targets listed inexec_tools
.//tensorboard/plugins/debugger_v2/tf_debugger_v2_plugin:generate_dtypes
exec_tools
thesepy_binary
targets get built in exec mode, using a special exec build configuration (rather than the host mode, which is what happens if you pass them intools
, and rather than the target mode, which is what happens if you just build thepy_binary
directly).python
binary on the build machine, and then "bakes" this into the Python toolchain configuration, which results in the exec-mode build of thepy_binary
rule's wrapper script hardcoding the path to whateverpython
was in use when it was first built.PATH
- but unlike other build artifacts that rebuild wheneverPATH
changes (Bazel build recompiles protoc every time PATH changes #3213), this build artifact doesn't get marked as stale ifPATH
changes (possibly because it's assumed to be an exec mode artifact and thus "independent" of the host machine, even though it's certainly not), and consequently seems to persist until you dobazel clean --expunge
or possibly some other scenarios (in practice it seemed to occasionally get cleared for no apparent reason at all).PATH
changes - for example, switching to a new virtualenv - everything else in our build will reflect the newpython
interpreter path (since it uses the Bazel auto-detecting python toolchain), except for thesepy_binary
targets, which continue to be hardcoded to point to the old path.numpy
gets uninstalled) you have a breakage that is not reliably reproducible and generally maddening to debug.Background on
exec_tools
: Bazel genrules have historically always run in host mode, and as such, thetools
attribute to genrules always builds its binaries in host mode. This was a problem for python 2 -> 3 since Bazel's host mode can only run either python 2 or python 3, but can't do both. So genrules had anexec_tools
attribute introduced in bazelbuild/bazel#6443 to allow tool deps for genrules to be build in exec mode and vary whether they used python 2 or python 3 on a per-target basis. In preparation for switching over the Bazel host mode default from 2 to 3, we were asked to migrate all our genrules fromtools
toexec_tools
, which we did in #4031. Now that they've cut over, nothing really prevents us from switchingexec_tools
back totools
to simplify our lives, except the note from Bazel team here.How do we actually inherit the TF toolchain?
tensorboard/WORKSPACE
Lines 83 to 97 in 1501f46
local_python_configure
is more-or-less the root of the issue; it's arepository_rule
that bakes many attributes of the local python environment into a synthetic repository using this BUILD file template, including thepy_toolchain
rule that's causing trouble for us.repository_rule
to pass different arguments that make it less of a footgun: https://docs.bazel.build/versions/master/skylark/lib/globals.html#repository_rulePATH
in itsenviron
argument, Bazel claims "If an environment variable in that list change, the repository will be refetched." And it does indeed usePATH
to resolve the python binary path it bakes into its python toolchain, as made clear by theget_python_bin()
helper error message.local = True
to be regenerated on every fetch, though this repo rule does enough setup that it's probably a non-trivial amount of time to rerun it on every single build.The text was updated successfully, but these errors were encountered: