
Refactor venv export #17098

Merged: 6 commits, Oct 5, 2022
164 changes: 83 additions & 81 deletions src/python/pants/backend/python/goals/export.py
@@ -12,9 +12,8 @@
from pants.backend.python.subsystems.setup import PythonSetup
from pants.backend.python.target_types import PythonResolveField
from pants.backend.python.util_rules.interpreter_constraints import InterpreterConstraints
from pants.backend.python.util_rules.pex import Pex, PexRequest
from pants.backend.python.util_rules.pex import Pex, PexProcess, PexRequest
from pants.backend.python.util_rules.pex_cli import PexPEX
from pants.backend.python.util_rules.pex_environment import PythonExecutable
from pants.backend.python.util_rules.pex_from_targets import RequirementsPexRequest
from pants.core.goals.export import (
ExportError,
@@ -28,8 +27,8 @@
from pants.engine.environment import EnvironmentName
from pants.engine.internals.native_engine import AddPrefix, Digest, MergeDigests
from pants.engine.internals.selectors import Get, MultiGet
from pants.engine.process import Process, ProcessResult
from pants.engine.rules import collect_rules, rule
from pants.engine.process import ProcessResult
from pants.engine.rules import collect_rules, rule, rule_helper
from pants.engine.target import Target
from pants.engine.unions import UnionMembership, UnionRule, union
from pants.util.docutil import bin_name
@@ -78,9 +77,69 @@ def debug_hint(self) -> str | None:
return self.resolve_name


@rule_helper
async def _do_export(
requirements_pex: Pex,
pex_pex: PexPEX,
dest: str,
resolve_name: str,
qualify_path_with_python_version: bool,
) -> ExportResult:
# Get the path to the interpreter, and the full python version (including patch #).
res = await Get(
ProcessResult,
PexProcess(
pex=requirements_pex,
description="Get interpreter path and version",
argv=[
"-c",
"import sys; print(sys.executable); print('.'.join(str(x) for x in sys.version_info[0:3]))",
[Review thread on the line above]

Contributor Author: This seemed like a more robust way to get a kosher interpreter than the previous line `interpreter = cast(PythonExecutable, requirements_pex.python)` and its clarifying comment below.

Contributor: I'm not so sure about that. The internal-onlyness is a critical thing here that is just as obscure still.

Contributor Author: The internal-onlyness was critical in the previous method of using `requirements_pex.python`. Why is it critical here? This must give us some compatible interpreter in any case, surely?

Contributor: I'm trying to point out that you replaced a comment and re-use of an already computed, correct interpreter with another comment elsewhere about internal-onlyness, plus a re-compute.

As far as criticality goes, it's critical always, until creating and storing a fully zipped PEX is fast and efficient. It never has been either. Zipping is always slow, and packed layout is needed for remote caching to work at all, rather than storing both a wheel zip and a copy of that zipped up in 18 different non-packed PEX zips in LMDB.

Contributor Author: Ah, fair enough. In this case I interpreted "critical" to mean "for correctness" - i.e., `requirements_pex.python` is only guaranteed to be populated because of internal-onlyness, and this code will blow up otherwise. Getting the path to an interpreter by running the pex instead, as this now does, sidesteps the need for that assumption and won't blow up if the Pex is not internal-only.

You're referring to internal-onlyness being critical for performance, which it definitely is.

But since performance is important, it may be better to blow up, as that indicates a code bug. I'll change this.
],
extra_env={"PEX_INTERPRETER": "1"},
),
)
interpreter_path, py_version = res.stdout.strip().decode().split("\n")
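The probe above can be reproduced outside the engine. A minimal sketch, assuming only the standard library and substituting the current interpreter for the PEX-selected one (in the rule itself, `PEX_INTERPRETER=1` makes the requirements PEX execute the `-c` one-liner under its resolved interpreter instead of its entry point):

```python
import subprocess
import sys

# The same one-liner the rule passes via `-c`: print the interpreter path,
# then the full dotted version (major.minor.patch).
code = (
    "import sys; print(sys.executable); "
    "print('.'.join(str(x) for x in sys.version_info[0:3]))"
)
out = subprocess.run(
    [sys.executable, "-c", code], capture_output=True, text=True, check=True
)
# Two lines of stdout: interpreter path, then e.g. "3.9.13".
interpreter_path, py_version = out.stdout.strip().split("\n")
```

This returns a usable interpreter regardless of whether the PEX was built internal-only, which is the robustness point debated in the thread above.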

# NOTE: We add a unique prefix to the pex_pex path to avoid conflicts when multiple
# venvs are concurrently exporting. Without this prefix all the invocations write
# the pex_pex to `python/virtualenvs/tools/pex`, and the `rm -f` of the pex_pex
# path in one export will delete the binary out from under the others.

[Review thread on the comment above]

Contributor Author: We should probably materialize the Pex PEX once, to some location under `.pants.d` say, rather than once per export, in a temporary location. But that can wait.
pex_pex_dir = f".{resolve_name}.tmp"
pex_pex_digest = await Get(Digest, AddPrefix(pex_pex.digest, pex_pex_dir))
pex_pex_dest = os.path.join("{digest_root}", pex_pex_dir)

merged_digest = await Get(Digest, MergeDigests([pex_pex_digest, requirements_pex.digest]))
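The collision that the unique prefix guards against is easy to demonstrate in isolation. A small sketch (the `pex_dest` helper and resolve names are hypothetical, for illustration only):

```python
import os

def pex_dest(resolve_name: str, unique: bool) -> str:
    """Destination of the materialized pex binary for one export."""
    # Without a unique prefix, every export shares one "tools" directory.
    d = f".{resolve_name}.tmp" if unique else "tools"
    return os.path.join("{digest_root}", d, "pex")

# Two concurrent exports without the prefix collide on a single path,
# so one export's `rm` deletes the binary out from under the other.
shared = {pex_dest(r, unique=False) for r in ("black", "flake8")}
prefixed = {pex_dest(r, unique=True) for r in ("black", "flake8")}
```

With the prefix, each export owns its own copy of the pex binary and can remove it safely.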

description = f"for {resolve_name} " if resolve_name else ""
return ExportResult(
f"virtualenv {description}(using Python {py_version})",
dest,
digest=merged_digest,
post_processing_cmds=[
PostProcessingCommand(
[
interpreter_path,
os.path.join(pex_pex_dest, pex_pex.exe),
os.path.join("{digest_root}", requirements_pex.name),
"venv",
"--pip",
"--collisions-ok",
"--remove=all",
f"{{digest_root}}/{py_version if qualify_path_with_python_version else ''}",
],
{"PEX_MODULE": "pex.tools"},
),
# Remove the PEX pex, to avoid confusion.
PostProcessingCommand(["rm", "-rf", pex_pex_dest]),
],
)
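For illustration, the argv that this `PostProcessingCommand` assembles can be sketched as a plain function. The helper name and sample values below are hypothetical; in the real rule, `{digest_root}` is substituted with the absolute export directory before the command runs:

```python
import os

def venv_argv(
    interpreter_path: str,
    pex_pex_dest: str,
    pex_exe: str,
    requirements_pex_name: str,
    dest_dir: str,
) -> list:
    """Build the `pex.tools venv` invocation used as a post-processing step."""
    return [
        interpreter_path,
        os.path.join(pex_pex_dest, pex_exe),
        os.path.join("{digest_root}", requirements_pex_name),
        "venv",
        "--pip",
        "--collisions-ok",
        "--remove=all",
        dest_dir,
    ]

# Sample values, purely illustrative.
argv = venv_argv(
    "/usr/bin/python3",
    "{digest_root}/.mytool.tmp",
    "pex",
    "requirements.pex",
    "{digest_root}/3.9.13",
)
```

The command is run with `PEX_MODULE=pex.tools` in the environment, which makes the Pex PEX dispatch to its tools entry point rather than running as an application.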


@rule
async def export_virtualenv(
request: _ExportVenvRequest, python_setup: PythonSetup, pex_pex: PexPEX
async def export_virtualenv_for_targets(
request: _ExportVenvRequest,
python_setup: PythonSetup,
pex_pex: PexPEX,
) -> ExportResult:
if request.resolve:
interpreter_constraints = InterpreterConstraints(
@@ -101,99 +160,42 @@ async def export_virtualenv(
),
)

# Note that an internal-only pex will always have the `python` field set.
# See the build_pex() rule in pex.py.
interpreter = cast(PythonExecutable, requirements_pex.python)

# Get the full python version (including patch #), so we can use it as the venv name.
res = await Get(
ProcessResult,
Process(
description="Get interpreter version",
argv=[
interpreter.path,
"-c",
"import sys; print('.'.join(str(x) for x in sys.version_info[0:3]))",
],
),
)
py_version = res.stdout.strip().decode()

dest = (
os.path.join("python", "virtualenvs", path_safe(request.resolve))
if request.resolve
else os.path.join("python", "virtualenv")
)

merged_digest = await Get(Digest, MergeDigests([pex_pex.digest, requirements_pex.digest]))
pex_pex_path = os.path.join("{digest_root}", pex_pex.exe)
maybe_resolve_str = f"for the resolve '{request.resolve}' " if request.resolve else ""
return ExportResult(
f"virtualenv {maybe_resolve_str}(using Python {py_version})",
export_result = await _do_export(
requirements_pex,
pex_pex,
dest,
digest=merged_digest,
post_processing_cmds=[
PostProcessingCommand(
[
interpreter.path,
pex_pex_path,
os.path.join("{digest_root}", requirements_pex.name),
"venv",
"--pip",
"--collisions-ok",
"--remove=all",
f"{{digest_root}}/{py_version}",
],
{"PEX_MODULE": "pex.tools"},
),
PostProcessingCommand(["rm", "-f", pex_pex_path]),
],
request.resolve or "",
qualify_path_with_python_version=True,
)
return export_result


@rule
async def export_tool(request: ExportPythonTool, pex_pex: PexPEX) -> ExportResult:
assert request.pex_request is not None

# TODO: Unify export_virtualenv() and export_tool(), since their implementations mostly overlap.
dest = os.path.join("python", "virtualenvs", "tools")
pex = await Get(Pex, PexRequest, request.pex_request)
if not request.pex_request.internal_only:
raise ExportError(f"The PexRequest for {request.resolve_name} must be internal_only.")

# Note that an internal-only pex will always have the `python` field set.
# See the build_pex() rule in pex.py.
interpreter = cast(PythonExecutable, pex.python)

# NOTE: We add a unique-per-tool prefix to the pex_pex path to avoid conflicts when
# multiple tools are concurrently exporting. Without this prefix all the `export_tool`
# invocations write the pex_pex to `python/virtualenvs/tools/pex`, and the `rm -f` of
# the pex_pex path in one export will delete the binary out from under the others.
pex_pex_dir = f".{request.resolve_name}.tmp"
pex_pex_dest = os.path.join("{digest_root}", pex_pex_dir)
pex_pex_digest = await Get(Digest, AddPrefix(pex_pex.digest, pex_pex_dir))

merged_digest = await Get(Digest, MergeDigests([pex_pex_digest, pex.digest]))
return ExportResult(
f"virtualenv for the tool '{request.resolve_name}'",
# TODO: It seems unnecessary to qualify with "tools", since the tool resolve names don't collide
# with user resolve names. We should get rid of this via a deprecation cycle.
dest = os.path.join("python", "virtualenvs", "tools", request.resolve_name)
pex = await Get(Pex, PexRequest, request.pex_request)
export_result = await _do_export(
pex,
pex_pex,
dest,
digest=merged_digest,
post_processing_cmds=[
PostProcessingCommand(
[
interpreter.path,
os.path.join(pex_pex_dest, pex_pex.exe),
os.path.join("{digest_root}", pex.name),
"venv",
"--collisions-ok",
"--remove=all",
f"{{digest_root}}/{request.resolve_name}",
],
{"PEX_MODULE": "pex.tools"},
),
PostProcessingCommand(["rm", "-rf", pex_pex_dest]),
],
request.resolve_name,
# TODO: It is pretty ad-hoc that we do add the interpreter version for resolves but not for tools.
# We should pick one and deprecate the other.
qualify_path_with_python_version=False,
)
return export_result


@rule
12 changes: 8 additions & 4 deletions src/python/pants/backend/python/goals/export_integration_test.py
@@ -29,6 +29,7 @@ class _ToolConfig:
version: str
experimental: bool = False
backend_prefix: str | None = "lint"
takes_ics: bool = True

@property
def package(self) -> str:
@@ -38,16 +39,16 @@ def package(self) -> str:
EXPORTED_TOOLS: List[_ToolConfig] = [
_ToolConfig(name="add-trailing-comma", version="2.2.3", experimental=True),
_ToolConfig(name="autoflake", version="1.3.1", experimental=True),
_ToolConfig(name="bandit", version="1.6.2"),
_ToolConfig(name="bandit", version="1.6.2", takes_ics=False),
_ToolConfig(name="black", version="22.3.0"),
_ToolConfig(name="docformatter", version="1.3.1"),
_ToolConfig(name="flake8", version="4.0.1"),
_ToolConfig(name="flake8", version="4.0.1", takes_ics=False),
_ToolConfig(name="isort", version="5.10.1"),
_ToolConfig(name="pylint", version="2.13.1"),
_ToolConfig(name="pylint", version="2.13.1", takes_ics=False),
_ToolConfig(name="pyupgrade", version="2.31.1", experimental=True),
_ToolConfig(name="yapf", version="0.32.0"),
_ToolConfig(name="mypy", version="0.940", backend_prefix="typecheck"),
_ToolConfig(name="pytest", version="7.1.0", backend_prefix=None),
_ToolConfig(name="pytest", version="7.1.0", backend_prefix=None, takes_ics=False),
]


@@ -70,6 +71,8 @@ def build_config(tmpdir: str) -> Mapping:
"version": f"{tool_config.name}=={tool_config.version}",
"lockfile": f"{tmpdir}/3rdparty/{tool_config.name}.lock",
}
if tool_config.takes_ics:
cfg[tool_config.name]["interpreter_constraints"] = (f"=={platform.python_version()}",)

if not tool_config.backend_prefix:
continue
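The `takes_ics` switch above can be sketched in isolation. A minimal sketch, where the `tool_cfg` helper is hypothetical and mirrors the pinning done in `build_config`: tools that accept interpreter constraints are pinned to the exact interpreter running the test, while the rest get only a version and lockfile:

```python
import platform

def tool_cfg(name: str, version: str, takes_ics: bool) -> dict:
    """Build a per-tool config section, optionally pinning interpreter constraints."""
    cfg = {
        "version": f"{name}=={version}",
        "lockfile": f"3rdparty/{name}.lock",
    }
    if takes_ics:
        # Pin to the exact interpreter running the test, e.g. "==3.9.13".
        cfg["interpreter_constraints"] = (f"=={platform.python_version()}",)
    return cfg

cfg = tool_cfg("black", "22.3.0", takes_ics=True)
```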
@@ -110,6 +113,7 @@ def test_export() -> None:

# NOTE: Not every tool implements --version so this is the best we can do.
lib_dir = os.path.join(export_dir, "lib", f"python{py_minor_version}", "site-packages")

expected_tool_dir = os.path.join(
lib_dir, f"{tool_config.package}-{tool_config.version}.dist-info"
)
8 changes: 4 additions & 4 deletions src/python/pants/backend/python/goals/export_test.py
@@ -72,14 +72,14 @@ def run(enable_resolves: bool) -> ExportResults:
)
all_results = rule_runner.request(ExportResults, [ExportVenvsRequest(targets)])

for result in all_results:
for result, resolve in zip(all_results, ["a", "b"] if enable_resolves else [""]):
assert len(result.post_processing_cmds) == 2

ppc0 = result.post_processing_cmds[0]
assert ppc0.argv[1:] == (
# The first arg is the full path to the python interpreter, which we
# don't easily know here, so we ignore it in this comparison.
os.path.join("{digest_root}", ".", "pex"),
os.path.join("{digest_root}", f".{resolve}.tmp", ".", "pex"),
os.path.join("{digest_root}", "requirements.pex"),
"venv",
"--pip",
@@ -92,8 +92,8 @@ def run(enable_resolves: bool) -> ExportResults:
ppc1 = result.post_processing_cmds[1]
assert ppc1.argv == (
"rm",
"-f",
os.path.join("{digest_root}", ".", "pex"),
"-rf",
os.path.join("{digest_root}", f".{resolve}.tmp"),
)
assert ppc1.extra_env == FrozenDict()

7 changes: 1 addition & 6 deletions src/python/pants/core/goals/export.py
@@ -43,7 +43,7 @@ class ExportRequest:
@frozen_after_init
@dataclass(unsafe_hash=True)
class PostProcessingCommand:
"""A command to run as a local processe after an exported digest is materialized."""
"""A command to run as a local process after an exported digest is materialized."""

# Values in the argv tuple can contain the format specifier "{digest_root}", which will be
# substituted with the (absolute) path to the location under distdir in which the
@@ -71,11 +71,6 @@ class ExportResult:
# Materialize this digest.
digest: Digest
# Run these commands as local processes after the digest is materialized.
# Values in each args string tuple can contain the format specifier "{digest_root}", which
# will be substituted with the (absolute) path to the location under distdir in which the
# digest is materialized.
# Each command will be run with an environment consistent of just PATH, set to the Pants
# process's own PATH env var.
post_processing_cmds: tuple[PostProcessingCommand, ...]

def __init__(