Respect precise file arguments with V2 `./pants test` #9120

Eric-Arellano · 2020-02-12T23:26:42Z

Problem

When running ./pants test src/python/pants/rules/core/run_test.py, someone who has never used Pants before would expect Pants to only test run_test.py. However, Pants will end up testing all of rules/core:tests and users must specify ./pants test src/python/pants/rules/core/run_test.py --pytest-args='-k RunTest' to get what they want.

At the same time, we must still use the metadata from the owning target in order to prepare the test, such as resolving dependencies.

Solution

Preserve the OriginSpec and pass it down all the way to python_test_runner.py so that it may pass more precise arguments to Pytest. Previously, we always passed every single file belonging to the sources of the target. Now, when given a filesystem spec, we will pass the files explicitly specified (which are a subset of the target's sources).

In order to preserve the OriginSpec, we make several changes.

These changes are clunky due to the lack of multiple params in await Get (#7490). This PR helps to guide that design, though, while adding a useful feature needed today.

`AddressesWithOrigins`

The test @goal_rule now requests AddressesWithOrigins, rather than Addresses, so that we may preserve the original specs.

Here, we add a method to go from AddressesWithOrigins -> AddressWithOrigin.

`HydratedTargetWithOrigin`

This allows us to have a hydrated target without losing the OriginSpec information.

We need to be able to use HydratedTarget because this is how we access the underlying TargetAdaptor necessary to get the TestTarget union working. The original AddressWithOrigin from the @goal_rule isn't hydrated enough.

`TargetAdaptorWithOrigin`

Finally, we add TargetAdaptorWithOrigin, along with a type for each distinct TargetAdaptor, such as PythonTestsAdaptorWithOrigin.

We need so many new classes for the union mechanism to work properly. python_test_runner.py must have some way to signal that it only works with python_tests targets, which we previously did by registering UnionRule(TestTarget, PythonTestsAdaptor). But, we still need to preserve the OriginSpec, so we introduce PythonTestsAdaptorWithOrigin.

Rejected alternative: field on `TargetAdaptor`

One alternative design is to insert the OriginSpec proactively on the TargetAdaptor. This means that we would not need to introduce any TargetAdaptorWithOrigin types.

However, this approach suffers from not being typed. Given the type-driven API of rules, we want some structured way to specify that python_test_runner.py must have the OriginSpec preserved, rather than checking for some optional field defined on the TargetAdaptor.

Result

./pants test src/python/pants/rules/core/run_test.py only runs over run_test.py now.

In a followup, we will add precise file args to fmt2 and lint2.

This allows us to preserve the original spec and still be able to get the underlying TargetAdaptor, which we need for the sake of unions.

Eric-Arellano · 2020-02-12T23:27:33Z

src/python/pants/backend/python/rules/python_test_runner.py

+    # belonging to this target - those must be filtered out.
+    precise_files_specified = set(test_files_snapshot.files).intersection(origin.resolved_files)
+    test_files_snapshot = await Get[Snapshot](
+      SnapshotSubset(


+1 to @gshuflin for adding this intrinsic!

Eric-Arellano · 2020-02-12T23:29:28Z

src/python/pants/backend/python/rules/python_test_runner_integration_test.py

+    v1_target = self.target(f"{self.source_root}:target")
+    adaptor = PythonTestsAdaptor(
+      address=v1_target.address.to_address(), sources=v1_target._sources_field.sources,
    )


Previously, the engine would automatically populate the sources for the PythonTestsAdaptor. It doesn't do that anymore and I'm not sure why. But it does still auto-populate other fields like dependencies and compatibility?

HydratedTarget (used to?) depend on HydratedField instances: SourcesField and BundlesField: those would capture Snapshots for the sources.

I think it still does. But for some reason, when wrapping the PythonTestsAdaptor in PythonTestsAdaptorWithOrigin, we end up with this exception:

AttributeError: 'PythonTestsAdaptor' object has no attribute 'sources'

I'll add a TODO to figure out why this is happening. It's not worth spending too much time IMO because this is a single test and this works. I'm more curious than anything.

Eric-Arellano · 2020-02-12T23:30:17Z

src/python/pants/engine/addressable.py

+def _assert_single_address(addresses: Sequence[Address]) -> None:
+  """Assert that exactly one address must be contained in the collection."""
+  if len(addresses) == 0:
+    raise ResolveError("No targets were matched.")
+  if len(addresses) > 1:
+    output = '\n  * '.join(address.spec for address in addresses)
+    raise ResolveError(
+      "Expected a single target, but was given multiple targets.\n\n"
+      f"Did you mean one of:\n  * {output}"
+    )


This is simply generalized out from Addresses.expect_single() so that we may reuse it for AddressesWithOrigin.expect_single().

Eric-Arellano · 2020-02-12T23:31:31Z

src/python/pants/engine/legacy/structs.py

+# TODO(#4535): Find a less clunky solution for precise file args as part of the Target API. The
+# trick here is that we must have some way to preserve the OriginSpec while also having a distinct
+# type for each target so that unions work properly.


As mentioned in the PR description, I do not like the changes to this file at all. It's clunky due to the lack of the target API. But it helps to guide the design for when Stu and I work on this in March!

Based on the description (which is great: thanks!), it sounds like we are attempting to project origin information along with addresses in some cases. Rather than being related to #4535, I think that this might be more related to #7490?

What #7490 would allow for would be approximately:

.. = await Get[TestResult](Params(Address(..), Origin(..)))

...or doing the same thing here to project a HydratedTarget without adding a new HydratedTargetWithOrigin type.

Moreover, I think that #7490 is probably fairly straightforward, aside from deciding on the syntax... there should be relatively little FFI code, I think. I don't want to suggest blocking this on that... but... you might want to?

it sounds like we are attempting to project origin information along with addresses in some cases

Yes, this is correct.

Rather than being related to #4535, I think that this might be more related to #7490?

Interesting!! I like it! I'll update the TODOs to point to this.

Eric-Arellano · 2020-02-12T23:32:55Z

src/python/pants/rules/core/test.py

+
+  @staticmethod
+  def is_testable(
+    adaptor_with_origin: TargetAdaptorWithOrigin, *, union_membership: UnionMembership,


We no longer need to use address_origin_map because the TargetAdaptorWithOrigin has all the information we need about the OriginSpec.

In a followup, we likely want to remove address_origin_map. Generally, the rule could instead request an AddressWithOrigin if they care about what the original spec was.

Reasonable (with the caveat that we probably want to project an Address and an Origin as two separate params: see above.)

Eric-Arellano · 2020-02-12T23:34:28Z

src/python/pants/rules/core/test.py

+  adaptor_with_origin = TargetAdaptorWithOrigin.create(
+    adaptor=target.adaptor, origin=target_with_origin.origin
+  )


This is key to how the union works. adaptor_with_origin will have a specific type like PythonTestsAdaptorWithOrigin. That then gets passed to await Get[TestResult](TestTarget, adaptor_with_origin) and the union does its magic.

stuhood

I would strongly suggest taking a swing at #7490 (see below): if you have time to do it before landing this, I expect that it would significantly clean things up... if not, I would adjust all the TODOs to point there instead.

stuhood · 2020-02-13T03:16:56Z

src/python/pants/backend/python/rules/python_test_runner.py

+    # NB: we ensure that `precise_files_specified` is a subset of the original target's `sources`.
+    # It's possible when given a glob filesystem spec that the spec will have resolved files not
+    # belonging to this target - those must be filtered out.
+    precise_files_specified = set(test_files_snapshot.files).intersection(origin.resolved_files)


I don't think that it is strictly necessary to do this intersection? The Target's snapshot can only contain files relevant to the target, so it should be possible to apply the globs captured on the CLI directly, which has the same effect as the intersection.

And in fact, I think that the same thing applies in general here? The interesection will only be non-empty if the Target owned any of the files on the CLI. And if it did, they should be included, regardless of how many other targets own them.

So... do you actually need to associate addresses with Origins at all for this usecase...?

The Target's snapshot can only contain files relevant to the target,

Agreed.

so it should be possible to apply the globs captured on the CLI directly, which has the same effect as the intersection

Generally, yes, but the one foil is FilesystemIgnoreSpec. Those are not preserved in the TargetAdaptorWithOrigin, where Origin is only one single FilesystemResolvedSpec. Ignores apply to every single include glob, e.g. ./pants fmt2 '*.java' '*.py' '!test_*', so for us to have access to the FilesystemIgnoreGlobs passed, we would need to start storing that as a sequence in the FilesystemResolvedSpec.

Much simpler IMO is to preserve what literal files were resolved at the time that resolution happens:

pants/src/python/pants/engine/legacy/graph.py

Lines 781 to 789 in 93c3567

# We preserve what literal files any globs resolved to. This allows downstream goals to be

# more precise in which files they operate on.

origin = (

spec

if isinstance(spec, FilesystemLiteralSpec) else

FilesystemResolvedGlobSpec(glob=spec.glob, _snapshot=snapshot)

)

result.extend(AddressWithOrigin(address=address, origin=origin) for address in owners.addresses)

return AddressesWithOrigins(result)

With this design, a FilesystemResolvedSpec has a field resolved_files as a simple tuple of file names:

pants/src/python/pants/base/specs.py

Lines 326 to 331 in 93c3567

class FilesystemResolvedSpec(FilesystemSpec, metaclass=ABCMeta):

@property

@abstractmethod

def resolved_files(self) -> Tuple[str, ...]:

"""The literal files this spec refers to after resolving all globs and excludes."""

--

Why can't we simply pass origin.resolved_files / why do we have to do this intersection? Globs, as explained in the comment. If the original spec is *.py, but the target only owns test_strutil.py, we need to ensure that we don't also try telling Pytest to run test_dirutil.py, test_osutil.py, etc.

I don't think I understand this. It feels like if we're not preserving the information from the CLI in a way that allows us to have both the excludes and the includes available here to apply as PathGlobs, then we could change that, rather than relying on getting the resolved files. But I could be misunderstanding.

I understand the reason for the intersection, but with the complete set of globs from the CLI I'm pretty sure you can do the intersection directly between the globs and the target.

stuhood · 2020-02-13T03:23:02Z

src/python/pants/backend/python/rules/python_test_runner_integration_test.py

+    v1_target = self.target(f"{self.source_root}:target")
+    adaptor = PythonTestsAdaptor(
+      address=v1_target.address.to_address(), sources=v1_target._sources_field.sources,
    )


HydratedTarget (used to?) depend on HydratedField instances: SourcesField and BundlesField: those would capture Snapshots for the sources.

stuhood · 2020-02-13T03:26:18Z

src/python/pants/engine/legacy/structs.py

+# TODO(#4535): Find a less clunky solution for precise file args as part of the Target API. The
+# trick here is that we must have some way to preserve the OriginSpec while also having a distinct
+# type for each target so that unions work properly.


Based on the description (which is great: thanks!), it sounds like we are attempting to project origin information along with addresses in some cases. Rather than being related to #4535, I think that this might be more related to #7490?

What #7490 would allow for would be approximately:

.. = await Get[TestResult](Params(Address(..), Origin(..)))

...or doing the same thing here to project a HydratedTarget without adding a new HydratedTargetWithOrigin type.

Moreover, I think that #7490 is probably fairly straightforward, aside from deciding on the syntax... there should be relatively little FFI code, I think. I don't want to suggest blocking this on that... but... you might want to?

stuhood · 2020-02-13T03:29:16Z

src/python/pants/rules/core/test.py

+
+  @staticmethod
+  def is_testable(
+    adaptor_with_origin: TargetAdaptorWithOrigin, *, union_membership: UnionMembership,


Reasonable (with the caveat that we probably want to project an Address and an Origin as two separate params: see above.)

Eric-Arellano

I would strongly suggest taking a swing at #7490 (see below): if you have time to do it before landing this, I expect that it would significantly clean things up... if not, I would adjust all the TODOs to point there instead.

I really like the proposal. Thank you.

If it's okay, I'd like to punt on implementing #7490 for now. I absolutely agree it's the right way to go, but am hoping to land this now for a couple of reasons:

Sounds like @cosmicexplorer is starting work on this, per Slack.
The s/BuildFileAddresses/Addresses thread was entirely meant to be prework for this PR. Generally, I try to balance the amount of refactoring I do before landing new features and am afraid also doing allow specifying multiple Params to a Get() #7490 would be deviating too far from my original task.
This is a feature that's important for Toolchain to land now.
Landing this now ensures that the implementation for allow specifying multiple Params to a Get() #7490 (and for the target API) must work with this feature, i.e. this constrains those designs in a helpful way.

I'm very happy to help Danny with #7490, though! Danny, let me know how I can help, such as pairing on it or taking it over if other priorities come up.

Eric-Arellano · 2020-02-13T15:10:01Z

src/python/pants/backend/python/rules/python_test_runner_integration_test.py

+    v1_target = self.target(f"{self.source_root}:target")
+    adaptor = PythonTestsAdaptor(
+      address=v1_target.address.to_address(), sources=v1_target._sources_field.sources,
    )


I think it still does. But for some reason, when wrapping the PythonTestsAdaptor in PythonTestsAdaptorWithOrigin, we end up with this exception:

AttributeError: 'PythonTestsAdaptor' object has no attribute 'sources'

I'll add a TODO to figure out why this is happening. It's not worth spending too much time IMO because this is a single test and this works. I'm more curious than anything.

Eric-Arellano · 2020-02-13T15:21:11Z

src/python/pants/backend/python/rules/python_test_runner.py

+    # NB: we ensure that `precise_files_specified` is a subset of the original target's `sources`.
+    # It's possible when given a glob filesystem spec that the spec will have resolved files not
+    # belonging to this target - those must be filtered out.
+    precise_files_specified = set(test_files_snapshot.files).intersection(origin.resolved_files)


The Target's snapshot can only contain files relevant to the target,

Agreed.

so it should be possible to apply the globs captured on the CLI directly, which has the same effect as the intersection

Generally, yes, but the one foil is FilesystemIgnoreSpec. Those are not preserved in the TargetAdaptorWithOrigin, where Origin is only one single FilesystemResolvedSpec. Ignores apply to every single include glob, e.g. ./pants fmt2 '*.java' '*.py' '!test_*', so for us to have access to the FilesystemIgnoreGlobs passed, we would need to start storing that as a sequence in the FilesystemResolvedSpec.

Much simpler IMO is to preserve what literal files were resolved at the time that resolution happens:

pants/src/python/pants/engine/legacy/graph.py

Lines 781 to 789 in 93c3567

# We preserve what literal files any globs resolved to. This allows downstream goals to be

# more precise in which files they operate on.

origin = (

spec

if isinstance(spec, FilesystemLiteralSpec) else

FilesystemResolvedGlobSpec(glob=spec.glob, _snapshot=snapshot)

)

result.extend(AddressWithOrigin(address=address, origin=origin) for address in owners.addresses)

return AddressesWithOrigins(result)

With this design, a FilesystemResolvedSpec has a field resolved_files as a simple tuple of file names:

pants/src/python/pants/base/specs.py

Lines 326 to 331 in 93c3567

class FilesystemResolvedSpec(FilesystemSpec, metaclass=ABCMeta):

@property

@abstractmethod

def resolved_files(self) -> Tuple[str, ...]:

"""The literal files this spec refers to after resolving all globs and excludes."""

--

Why can't we simply pass origin.resolved_files / why do we have to do this intersection? Globs, as explained in the comment. If the original spec is *.py, but the target only owns test_strutil.py, we need to ensure that we don't also try telling Pytest to run test_dirutil.py, test_osutil.py, etc.

Eric-Arellano · 2020-02-13T15:22:09Z

src/python/pants/engine/legacy/structs.py

+# TODO(#4535): Find a less clunky solution for precise file args as part of the Target API. The
+# trick here is that we must have some way to preserve the OriginSpec while also having a distinct
+# type for each target so that unions work properly.


it sounds like we are attempting to project origin information along with addresses in some cases

Yes, this is correct.

Rather than being related to #4535, I think that this might be more related to #7490?

Interesting!! I like it! I'll update the TODOs to point to this.

benjyw

Definitely more fuel for a target API design... :)

But this feature is awesome, and worth the temporary clunkiness!

Eric-Arellano added 4 commits February 12, 2020 11:27

Change test.py to take AddressesWithOrigins rather than Addresses

8882c3e

Add HydratedTargetWithOrigin

9f71ea5

This allows us to preserve the original spec and still be able to get the underlying TargetAdaptor, which we need for the sake of unions.

Add TargetAdaptorWithOrigin

ffbb700

Add TargetAdaptorWithOrigin

2e95e88

Eric-Arellano requested review from stuhood, benjyw, illicitonion, cosmicexplorer, codealchemy and gshuflin February 12, 2020 23:26

Eric-Arellano commented Feb 12, 2020

View reviewed changes

stuhood approved these changes Feb 13, 2020

View reviewed changes

Eric-Arellano commented Feb 13, 2020

View reviewed changes

Update TODOs

dfd2821

benjyw approved these changes Feb 13, 2020

View reviewed changes

Eric-Arellano merged commit 98bedac into pantsbuild:master Feb 13, 2020

Eric-Arellano deleted the precise-test branch February 13, 2020 16:56

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Respect precise file arguments with V2 `./pants test` #9120

Respect precise file arguments with V2 `./pants test` #9120

Eric-Arellano commented Feb 12, 2020 •

edited

Loading

Eric-Arellano Feb 12, 2020

Eric-Arellano Feb 12, 2020

stuhood Feb 13, 2020

Eric-Arellano Feb 13, 2020

Eric-Arellano Feb 12, 2020

Eric-Arellano Feb 12, 2020

stuhood Feb 13, 2020

Eric-Arellano Feb 13, 2020

Eric-Arellano Feb 12, 2020

stuhood Feb 13, 2020

Eric-Arellano Feb 12, 2020

stuhood left a comment •

edited

Loading

stuhood Feb 13, 2020

stuhood Feb 13, 2020

Eric-Arellano Feb 13, 2020

stuhood Feb 15, 2020

stuhood Feb 13, 2020

stuhood Feb 13, 2020

stuhood Feb 13, 2020

Eric-Arellano left a comment

Eric-Arellano Feb 13, 2020

Eric-Arellano Feb 13, 2020

Eric-Arellano Feb 13, 2020

benjyw left a comment

	# We preserve what literal files any globs resolved to. This allows downstream goals to be
	# more precise in which files they operate on.
	origin = (
	spec
	if isinstance(spec, FilesystemLiteralSpec) else
	FilesystemResolvedGlobSpec(glob=spec.glob, _snapshot=snapshot)
	)
	result.extend(AddressWithOrigin(address=address, origin=origin) for address in owners.addresses)
	return AddressesWithOrigins(result)

	class FilesystemResolvedSpec(FilesystemSpec, metaclass=ABCMeta):

	@property
	@abstractmethod
	def resolved_files(self) -> Tuple[str, ...]:
	"""The literal files this spec refers to after resolving all globs and excludes."""

Respect precise file arguments with V2 ./pants test #9120

Respect precise file arguments with V2 ./pants test #9120

Conversation

Eric-Arellano commented Feb 12, 2020 • edited Loading

Problem

Solution

AddressesWithOrigins

HydratedTargetWithOrigin

TargetAdaptorWithOrigin

Rejected alternative: field on TargetAdaptor

Result

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

stuhood left a comment • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Eric-Arellano left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

benjyw left a comment

Choose a reason for hiding this comment

Respect precise file arguments with V2 `./pants test` #9120

Respect precise file arguments with V2 `./pants test` #9120

Eric-Arellano commented Feb 12, 2020 •

edited

Loading

`AddressesWithOrigins`

`HydratedTargetWithOrigin`

`TargetAdaptorWithOrigin`

Rejected alternative: field on `TargetAdaptor`

stuhood left a comment •

edited

Loading