Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Respect precise file arguments with V2 ./pants test #9120

Merged
merged 5 commits into from
Feb 13, 2020

Conversation

Eric-Arellano
Copy link
Contributor

@Eric-Arellano Eric-Arellano commented Feb 12, 2020

Problem

When running ./pants test src/python/pants/rules/core/run_test.py, someone who has never used Pants before would expect Pants to only test run_test.py. However, Pants will end up testing all of rules/core:tests and users must specify ./pants test src/python/pants/rules/core/run_test.py --pytest-args='-k RunTest' to get what they want.

At the same time, we must still use the metadata from the owning target in order to prepare the test, such as resolving dependencies.

Solution

Preserve the OriginSpec and pass it down all the way to python_test_runner.py so that it may pass more precise arguments to Pytest. Previously, we always passed every single file belonging to the sources of the target. Now, when given a filesystem spec, we will pass the files explicitly specified (which are a subset of the target's sources).

In order to preserve the OriginSpec, we make several changes.

These changes are clunky due to the lack of multiple params in await Get (#7490). This PR helps to guide that design, though, while adding a useful feature needed today.

AddressesWithOrigins

The test @goal_rule now requests AddressesWithOrigins, rather than Addresses, so that we may preserve the original specs.

Here, we add a method to go from AddressesWithOrigins -> AddressWithOrigin.

HydratedTargetWithOrigin

This allows us to have a hydrated target without losing the OriginSpec information.

We need to be able to use HydratedTarget because this is how we access the underlying TargetAdaptor necessary to get the TestTarget union working. The original AddressWithOrigin from the @goal_rule isn't hydrated enough.

TargetAdaptorWithOrigin

Finally, we add TargetAdaptorWithOrigin, along with a type for each distinct TargetAdaptor, such as PythonTestsAdaptorWithOrigin.

We need so many new classes for the union mechanism to work properly. python_test_runner.py must have some way to signal that it only works with python_tests targets, which we previously did by registering UnionRule(TestTarget, PythonTestsAdaptor). But, we still need to preserve the OriginSpec, so we introduce PythonTestsAdaptorWithOrigin.

Rejected alternative: field on TargetAdaptor

One alternative design is to insert the OriginSpec proactively on the TargetAdaptor. This means that we would not need to introduce any TargetAdaptorWithOrigin types.

However, this approach suffers from not being typed. Given the type-driven API of rules, we want some structured way to specify that python_test_runner.py must have the OriginSpec preserved, rather than checking for some optional field defined on the TargetAdaptor.

Result

./pants test src/python/pants/rules/core/run_test.py only runs over run_test.py now.

In a followup, we will add precise file args to fmt2 and lint2.

This allows us to preserve the original spec and still be able to get the underlying TargetAdaptor, which we need for the sake of unions.
# belonging to this target - those must be filtered out.
precise_files_specified = set(test_files_snapshot.files).intersection(origin.resolved_files)
test_files_snapshot = await Get[Snapshot](
SnapshotSubset(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to @gshuflin for adding this intrinsic!

Comment on lines +137 to 140
v1_target = self.target(f"{self.source_root}:target")
adaptor = PythonTestsAdaptor(
address=v1_target.address.to_address(), sources=v1_target._sources_field.sources,
)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously, the engine would automatically populate the sources for the PythonTestsAdaptor. It doesn't do that anymore and I'm not sure why. But it does still auto-populate other fields like dependencies and compatibility?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HydratedTarget (used to?) depend on HydratedField instances: SourcesField and BundlesField: those would capture Snapshots for the sources.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it still does. But for some reason, when wrapping the PythonTestsAdaptor in PythonTestsAdaptorWithOrigin, we end up with this exception:

AttributeError: 'PythonTestsAdaptor' object has no attribute 'sources'

I'll add a TODO to figure out why this is happening. It's not worth spending too much time IMO because this is a single test and this works. I'm more curious than anything.

Comment on lines +17 to +26
def _assert_single_address(addresses: Sequence[Address]) -> None:
"""Assert that exactly one address must be contained in the collection."""
if len(addresses) == 0:
raise ResolveError("No targets were matched.")
if len(addresses) > 1:
output = '\n * '.join(address.spec for address in addresses)
raise ResolveError(
"Expected a single target, but was given multiple targets.\n\n"
f"Did you mean one of:\n * {output}"
)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is simply generalized out from Addresses.expect_single() so that we may reuse it for AddressesWithOrigin.expect_single().

Comment on lines 314 to 316
# TODO(#4535): Find a less clunky solution for precise file args as part of the Target API. The
# trick here is that we must have some way to preserve the OriginSpec while also having a distinct
# type for each target so that unions work properly.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned in the PR description, I do not like the changes to this file at all. It's clunky due to the lack of the target API. But it helps to guide the design for when Stu and I work on this in March!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on the description (which is great: thanks!), it sounds like we are attempting to project origin information along with addresses in some cases. Rather than being related to #4535, I think that this might be more related to #7490?

What #7490 would allow for would be approximately:

.. = await Get[TestResult](Params(Address(..), Origin(..)))

...or doing the same thing here to project a HydratedTarget without adding a new HydratedTargetWithOrigin type.

Moreover, I think that #7490 is probably fairly straightforward, aside from deciding on the syntax... there should be relatively little FFI code, I think. I don't want to suggest blocking this on that... but... you might want to?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it sounds like we are attempting to project origin information along with addresses in some cases

Yes, this is correct.

Rather than being related to #4535, I think that this might be more related to #7490?

Interesting!! I like it! I'll update the TODOs to point to this.


@staticmethod
def is_testable(
adaptor_with_origin: TargetAdaptorWithOrigin, *, union_membership: UnionMembership,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We no longer need to use address_origin_map because the TargetAdaptorWithOrigin has all the information we need about the OriginSpec.

In a followup, we likely want to remove address_origin_map. Generally, the rule could instead request an AddressWithOrigin if they care about what the original spec was.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reasonable (with the caveat that we probably want to project an Address and an Origin as two separate params: see above.)

Comment on lines +187 to +189
adaptor_with_origin = TargetAdaptorWithOrigin.create(
adaptor=target.adaptor, origin=target_with_origin.origin
)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is key to how the union works. adaptor_with_origin will have a specific type like PythonTestsAdaptorWithOrigin. That then gets passed to await Get[TestResult](TestTarget, adaptor_with_origin) and the union does its magic.

Copy link
Member

@stuhood stuhood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would strongly suggest taking a swing at #7490 (see below): if you have time to do it before landing this, I expect that it would significantly clean things up... if not, I would adjust all the TODOs to point there instead.

# NB: we ensure that `precise_files_specified` is a subset of the original target's `sources`.
# It's possible when given a glob filesystem spec that the spec will have resolved files not
# belonging to this target - those must be filtered out.
precise_files_specified = set(test_files_snapshot.files).intersection(origin.resolved_files)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that it is strictly necessary to do this intersection? The Target's snapshot can only contain files relevant to the target, so it should be possible to apply the globs captured on the CLI directly, which has the same effect as the intersection.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And in fact, I think that the same thing applies in general here? The interesection will only be non-empty if the Target owned any of the files on the CLI. And if it did, they should be included, regardless of how many other targets own them.

So... do you actually need to associate addresses with Origins at all for this usecase...?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Target's snapshot can only contain files relevant to the target,

Agreed.

so it should be possible to apply the globs captured on the CLI directly, which has the same effect as the intersection

Generally, yes, but the one foil is FilesystemIgnoreSpec. Those are not preserved in the TargetAdaptorWithOrigin, where Origin is only one single FilesystemResolvedSpec. Ignores apply to every single include glob, e.g. ./pants fmt2 '*.java' '*.py' '!test_*', so for us to have access to the FilesystemIgnoreGlobs passed, we would need to start storing that as a sequence in the FilesystemResolvedSpec.

Much simpler IMO is to preserve what literal files were resolved at the time that resolution happens:

# We preserve what literal files any globs resolved to. This allows downstream goals to be
# more precise in which files they operate on.
origin = (
spec
if isinstance(spec, FilesystemLiteralSpec) else
FilesystemResolvedGlobSpec(glob=spec.glob, _snapshot=snapshot)
)
result.extend(AddressWithOrigin(address=address, origin=origin) for address in owners.addresses)
return AddressesWithOrigins(result)

With this design, a FilesystemResolvedSpec has a field resolved_files as a simple tuple of file names:

class FilesystemResolvedSpec(FilesystemSpec, metaclass=ABCMeta):
@property
@abstractmethod
def resolved_files(self) -> Tuple[str, ...]:
"""The literal files this spec refers to after resolving all globs and excludes."""

--

Why can't we simply pass origin.resolved_files / why do we have to do this intersection? Globs, as explained in the comment. If the original spec is *.py, but the target only owns test_strutil.py, we need to ensure that we don't also try telling Pytest to run test_dirutil.py, test_osutil.py, etc.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think I understand this. It feels like if we're not preserving the information from the CLI in a way that allows us to have both the excludes and the includes available here to apply as PathGlobs, then we could change that, rather than relying on getting the resolved files. But I could be misunderstanding.

I understand the reason for the intersection, but with the complete set of globs from the CLI I'm pretty sure you can do the intersection directly between the globs and the target.

Comment on lines +137 to 140
v1_target = self.target(f"{self.source_root}:target")
adaptor = PythonTestsAdaptor(
address=v1_target.address.to_address(), sources=v1_target._sources_field.sources,
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HydratedTarget (used to?) depend on HydratedField instances: SourcesField and BundlesField: those would capture Snapshots for the sources.

Comment on lines 314 to 316
# TODO(#4535): Find a less clunky solution for precise file args as part of the Target API. The
# trick here is that we must have some way to preserve the OriginSpec while also having a distinct
# type for each target so that unions work properly.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on the description (which is great: thanks!), it sounds like we are attempting to project origin information along with addresses in some cases. Rather than being related to #4535, I think that this might be more related to #7490?

What #7490 would allow for would be approximately:

.. = await Get[TestResult](Params(Address(..), Origin(..)))

...or doing the same thing here to project a HydratedTarget without adding a new HydratedTargetWithOrigin type.

Moreover, I think that #7490 is probably fairly straightforward, aside from deciding on the syntax... there should be relatively little FFI code, I think. I don't want to suggest blocking this on that... but... you might want to?


@staticmethod
def is_testable(
adaptor_with_origin: TargetAdaptorWithOrigin, *, union_membership: UnionMembership,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reasonable (with the caveat that we probably want to project an Address and an Origin as two separate params: see above.)

Copy link
Contributor Author

@Eric-Arellano Eric-Arellano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would strongly suggest taking a swing at #7490 (see below): if you have time to do it before landing this, I expect that it would significantly clean things up... if not, I would adjust all the TODOs to point there instead.

I really like the proposal. Thank you.

If it's okay, I'd like to punt on implementing #7490 for now. I absolutely agree it's the right way to go, but am hoping to land this now for a couple of reasons:

  1. Sounds like @cosmicexplorer is starting work on this, per Slack.
  2. The s/BuildFileAddresses/Addresses thread was entirely meant to be prework for this PR. Generally, I try to balance the amount of refactoring I do before landing new features and am afraid also doing allow specifying multiple Params to a Get() #7490 would be deviating too far from my original task.
  3. This is a feature that's important for Toolchain to land now.
  4. Landing this now ensures that the implementation for allow specifying multiple Params to a Get() #7490 (and for the target API) must work with this feature, i.e. this constrains those designs in a helpful way.

I'm very happy to help Danny with #7490, though! Danny, let me know how I can help, such as pairing on it or taking it over if other priorities come up.

Comment on lines +137 to 140
v1_target = self.target(f"{self.source_root}:target")
adaptor = PythonTestsAdaptor(
address=v1_target.address.to_address(), sources=v1_target._sources_field.sources,
)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it still does. But for some reason, when wrapping the PythonTestsAdaptor in PythonTestsAdaptorWithOrigin, we end up with this exception:

AttributeError: 'PythonTestsAdaptor' object has no attribute 'sources'

I'll add a TODO to figure out why this is happening. It's not worth spending too much time IMO because this is a single test and this works. I'm more curious than anything.

# NB: we ensure that `precise_files_specified` is a subset of the original target's `sources`.
# It's possible when given a glob filesystem spec that the spec will have resolved files not
# belonging to this target - those must be filtered out.
precise_files_specified = set(test_files_snapshot.files).intersection(origin.resolved_files)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Target's snapshot can only contain files relevant to the target,

Agreed.

so it should be possible to apply the globs captured on the CLI directly, which has the same effect as the intersection

Generally, yes, but the one foil is FilesystemIgnoreSpec. Those are not preserved in the TargetAdaptorWithOrigin, where Origin is only one single FilesystemResolvedSpec. Ignores apply to every single include glob, e.g. ./pants fmt2 '*.java' '*.py' '!test_*', so for us to have access to the FilesystemIgnoreGlobs passed, we would need to start storing that as a sequence in the FilesystemResolvedSpec.

Much simpler IMO is to preserve what literal files were resolved at the time that resolution happens:

# We preserve what literal files any globs resolved to. This allows downstream goals to be
# more precise in which files they operate on.
origin = (
spec
if isinstance(spec, FilesystemLiteralSpec) else
FilesystemResolvedGlobSpec(glob=spec.glob, _snapshot=snapshot)
)
result.extend(AddressWithOrigin(address=address, origin=origin) for address in owners.addresses)
return AddressesWithOrigins(result)

With this design, a FilesystemResolvedSpec has a field resolved_files as a simple tuple of file names:

class FilesystemResolvedSpec(FilesystemSpec, metaclass=ABCMeta):
@property
@abstractmethod
def resolved_files(self) -> Tuple[str, ...]:
"""The literal files this spec refers to after resolving all globs and excludes."""

--

Why can't we simply pass origin.resolved_files / why do we have to do this intersection? Globs, as explained in the comment. If the original spec is *.py, but the target only owns test_strutil.py, we need to ensure that we don't also try telling Pytest to run test_dirutil.py, test_osutil.py, etc.

Comment on lines 314 to 316
# TODO(#4535): Find a less clunky solution for precise file args as part of the Target API. The
# trick here is that we must have some way to preserve the OriginSpec while also having a distinct
# type for each target so that unions work properly.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it sounds like we are attempting to project origin information along with addresses in some cases

Yes, this is correct.

Rather than being related to #4535, I think that this might be more related to #7490?

Interesting!! I like it! I'll update the TODOs to point to this.

Copy link
Contributor

@benjyw benjyw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely more fuel for a target API design... :)

But this feature is awesome, and worth the temporary clunkiness!

@Eric-Arellano Eric-Arellano merged commit 98bedac into pantsbuild:master Feb 13, 2020
@Eric-Arellano Eric-Arellano deleted the precise-test branch February 13, 2020 16:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants