-
-
Notifications
You must be signed in to change notification settings - Fork 636
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Respect precise file arguments with V2 ./pants test
#9120
Conversation
This allows us to preserve the original spec and still be able to get the underlying TargetAdaptor, which we need for the sake of unions.
# belonging to this target - those must be filtered out. | ||
precise_files_specified = set(test_files_snapshot.files).intersection(origin.resolved_files) | ||
test_files_snapshot = await Get[Snapshot]( | ||
SnapshotSubset( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to @gshuflin for adding this intrinsic!
v1_target = self.target(f"{self.source_root}:target") | ||
adaptor = PythonTestsAdaptor( | ||
address=v1_target.address.to_address(), sources=v1_target._sources_field.sources, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously, the engine would automatically populate the sources
for the PythonTestsAdaptor
. It doesn't do that anymore and I'm not sure why. But it does still auto-populate other fields like dependencies
and compatibility
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
HydratedTarget
(used to?) depend on HydratedField
instances: SourcesField
and BundlesField
: those would capture Snapshots
for the sources.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it still does. But for some reason, when wrapping the PythonTestsAdaptor
in PythonTestsAdaptorWithOrigin
, we end up with this exception:
AttributeError: 'PythonTestsAdaptor' object has no attribute 'sources'
I'll add a TODO to figure out why this is happening. It's not worth spending too much time IMO because this is a single test and this works. I'm more curious than anything.
def _assert_single_address(addresses: Sequence[Address]) -> None: | ||
"""Assert that exactly one address must be contained in the collection.""" | ||
if len(addresses) == 0: | ||
raise ResolveError("No targets were matched.") | ||
if len(addresses) > 1: | ||
output = '\n * '.join(address.spec for address in addresses) | ||
raise ResolveError( | ||
"Expected a single target, but was given multiple targets.\n\n" | ||
f"Did you mean one of:\n * {output}" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is simply generalized out from Addresses.expect_single()
so that we may reuse it for AddressesWithOrigin.expect_single()
.
# TODO(#4535): Find a less clunky solution for precise file args as part of the Target API. The | ||
# trick here is that we must have some way to preserve the OriginSpec while also having a distinct | ||
# type for each target so that unions work properly. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned in the PR description, I do not like the changes to this file at all. It's clunky due to the lack of the target API. But it helps to guide the design for when Stu and I work on this in March!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on the description (which is great: thanks!), it sounds like we are attempting to project origin information along with addresses in some cases. Rather than being related to #4535, I think that this might be more related to #7490?
What #7490 would allow for would be approximately:
.. = await Get[TestResult](Params(Address(..), Origin(..)))
...or doing the same thing here to project a HydratedTarget
without adding a new HydratedTargetWithOrigin
type.
Moreover, I think that #7490 is probably fairly straightforward, aside from deciding on the syntax... there should be relatively little FFI code, I think. I don't want to suggest blocking this on that... but... you might want to?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
||
@staticmethod | ||
def is_testable( | ||
adaptor_with_origin: TargetAdaptorWithOrigin, *, union_membership: UnionMembership, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We no longer need to use address_origin_map
because the TargetAdaptorWithOrigin
has all the information we need about the OriginSpec
.
In a followup, we likely want to remove address_origin_map
. Generally, the rule could instead request an AddressWithOrigin
if they care about what the original spec was.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reasonable (with the caveat that we probably want to project an Address
and an Origin
as two separate params: see above.)
adaptor_with_origin = TargetAdaptorWithOrigin.create( | ||
adaptor=target.adaptor, origin=target_with_origin.origin | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is key to how the union works. adaptor_with_origin
will have a specific type like PythonTestsAdaptorWithOrigin
. That then gets passed to await Get[TestResult](TestTarget, adaptor_with_origin)
and the union does its magic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would strongly suggest taking a swing at #7490 (see below): if you have time to do it before landing this, I expect that it would significantly clean things up... if not, I would adjust all the TODOs to point there instead.
# NB: we ensure that `precise_files_specified` is a subset of the original target's `sources`. | ||
# It's possible when given a glob filesystem spec that the spec will have resolved files not | ||
# belonging to this target - those must be filtered out. | ||
precise_files_specified = set(test_files_snapshot.files).intersection(origin.resolved_files) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that it is strictly necessary to do this intersection? The Target's snapshot can only contain files relevant to the target, so it should be possible to apply the globs captured on the CLI directly, which has the same effect as the intersection.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And in fact, I think that the same thing applies in general here? The interesection will only be non-empty if the Target owned any of the files on the CLI. And if it did, they should be included, regardless of how many other targets own them.
So... do you actually need to associate addresses with Origins at all for this usecase...?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Target's snapshot can only contain files relevant to the target,
Agreed.
so it should be possible to apply the globs captured on the CLI directly, which has the same effect as the intersection
Generally, yes, but the one foil is FilesystemIgnoreSpec
. Those are not preserved in the TargetAdaptorWithOrigin
, where Origin
is only one single FilesystemResolvedSpec
. Ignores apply to every single include glob, e.g. ./pants fmt2 '*.java' '*.py' '!test_*'
, so for us to have access to the FilesystemIgnoreGlob
s passed, we would need to start storing that as a sequence in the FilesystemResolvedSpec
.
Much simpler IMO is to preserve what literal files were resolved at the time that resolution happens:
pants/src/python/pants/engine/legacy/graph.py
Lines 781 to 789 in 93c3567
# We preserve what literal files any globs resolved to. This allows downstream goals to be | |
# more precise in which files they operate on. | |
origin = ( | |
spec | |
if isinstance(spec, FilesystemLiteralSpec) else | |
FilesystemResolvedGlobSpec(glob=spec.glob, _snapshot=snapshot) | |
) | |
result.extend(AddressWithOrigin(address=address, origin=origin) for address in owners.addresses) | |
return AddressesWithOrigins(result) |
With this design, a FilesystemResolvedSpec
has a field resolved_files
as a simple tuple of file names:
pants/src/python/pants/base/specs.py
Lines 326 to 331 in 93c3567
class FilesystemResolvedSpec(FilesystemSpec, metaclass=ABCMeta): | |
@property | |
@abstractmethod | |
def resolved_files(self) -> Tuple[str, ...]: | |
"""The literal files this spec refers to after resolving all globs and excludes.""" |
--
Why can't we simply pass origin.resolved_files
/ why do we have to do this intersection? Globs, as explained in the comment. If the original spec is *.py
, but the target only owns test_strutil.py
, we need to ensure that we don't also try telling Pytest to run test_dirutil.py
, test_osutil.py
, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think I understand this. It feels like if we're not preserving the information from the CLI in a way that allows us to have both the excludes and the includes available here to apply as PathGlobs, then we could change that, rather than relying on getting the resolved files. But I could be misunderstanding.
I understand the reason for the intersection, but with the complete set of globs from the CLI I'm pretty sure you can do the intersection directly between the globs and the target.
v1_target = self.target(f"{self.source_root}:target") | ||
adaptor = PythonTestsAdaptor( | ||
address=v1_target.address.to_address(), sources=v1_target._sources_field.sources, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
HydratedTarget
(used to?) depend on HydratedField
instances: SourcesField
and BundlesField
: those would capture Snapshots
for the sources.
# TODO(#4535): Find a less clunky solution for precise file args as part of the Target API. The | ||
# trick here is that we must have some way to preserve the OriginSpec while also having a distinct | ||
# type for each target so that unions work properly. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on the description (which is great: thanks!), it sounds like we are attempting to project origin information along with addresses in some cases. Rather than being related to #4535, I think that this might be more related to #7490?
What #7490 would allow for would be approximately:
.. = await Get[TestResult](Params(Address(..), Origin(..)))
...or doing the same thing here to project a HydratedTarget
without adding a new HydratedTargetWithOrigin
type.
Moreover, I think that #7490 is probably fairly straightforward, aside from deciding on the syntax... there should be relatively little FFI code, I think. I don't want to suggest blocking this on that... but... you might want to?
|
||
@staticmethod | ||
def is_testable( | ||
adaptor_with_origin: TargetAdaptorWithOrigin, *, union_membership: UnionMembership, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reasonable (with the caveat that we probably want to project an Address
and an Origin
as two separate params: see above.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would strongly suggest taking a swing at #7490 (see below): if you have time to do it before landing this, I expect that it would significantly clean things up... if not, I would adjust all the TODOs to point there instead.
I really like the proposal. Thank you.
If it's okay, I'd like to punt on implementing #7490 for now. I absolutely agree it's the right way to go, but am hoping to land this now for a couple of reasons:
- Sounds like @cosmicexplorer is starting work on this, per Slack.
- The
s/BuildFileAddresses/Addresses
thread was entirely meant to be prework for this PR. Generally, I try to balance the amount of refactoring I do before landing new features and am afraid also doing allow specifying multiple Params to a Get() #7490 would be deviating too far from my original task. - This is a feature that's important for Toolchain to land now.
- Landing this now ensures that the implementation for allow specifying multiple Params to a Get() #7490 (and for the target API) must work with this feature, i.e. this constrains those designs in a helpful way.
I'm very happy to help Danny with #7490, though! Danny, let me know how I can help, such as pairing on it or taking it over if other priorities come up.
v1_target = self.target(f"{self.source_root}:target") | ||
adaptor = PythonTestsAdaptor( | ||
address=v1_target.address.to_address(), sources=v1_target._sources_field.sources, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it still does. But for some reason, when wrapping the PythonTestsAdaptor
in PythonTestsAdaptorWithOrigin
, we end up with this exception:
AttributeError: 'PythonTestsAdaptor' object has no attribute 'sources'
I'll add a TODO to figure out why this is happening. It's not worth spending too much time IMO because this is a single test and this works. I'm more curious than anything.
# NB: we ensure that `precise_files_specified` is a subset of the original target's `sources`. | ||
# It's possible when given a glob filesystem spec that the spec will have resolved files not | ||
# belonging to this target - those must be filtered out. | ||
precise_files_specified = set(test_files_snapshot.files).intersection(origin.resolved_files) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Target's snapshot can only contain files relevant to the target,
Agreed.
so it should be possible to apply the globs captured on the CLI directly, which has the same effect as the intersection
Generally, yes, but the one foil is FilesystemIgnoreSpec
. Those are not preserved in the TargetAdaptorWithOrigin
, where Origin
is only one single FilesystemResolvedSpec
. Ignores apply to every single include glob, e.g. ./pants fmt2 '*.java' '*.py' '!test_*'
, so for us to have access to the FilesystemIgnoreGlob
s passed, we would need to start storing that as a sequence in the FilesystemResolvedSpec
.
Much simpler IMO is to preserve what literal files were resolved at the time that resolution happens:
pants/src/python/pants/engine/legacy/graph.py
Lines 781 to 789 in 93c3567
# We preserve what literal files any globs resolved to. This allows downstream goals to be | |
# more precise in which files they operate on. | |
origin = ( | |
spec | |
if isinstance(spec, FilesystemLiteralSpec) else | |
FilesystemResolvedGlobSpec(glob=spec.glob, _snapshot=snapshot) | |
) | |
result.extend(AddressWithOrigin(address=address, origin=origin) for address in owners.addresses) | |
return AddressesWithOrigins(result) |
With this design, a FilesystemResolvedSpec
has a field resolved_files
as a simple tuple of file names:
pants/src/python/pants/base/specs.py
Lines 326 to 331 in 93c3567
class FilesystemResolvedSpec(FilesystemSpec, metaclass=ABCMeta): | |
@property | |
@abstractmethod | |
def resolved_files(self) -> Tuple[str, ...]: | |
"""The literal files this spec refers to after resolving all globs and excludes.""" |
--
Why can't we simply pass origin.resolved_files
/ why do we have to do this intersection? Globs, as explained in the comment. If the original spec is *.py
, but the target only owns test_strutil.py
, we need to ensure that we don't also try telling Pytest to run test_dirutil.py
, test_osutil.py
, etc.
# TODO(#4535): Find a less clunky solution for precise file args as part of the Target API. The | ||
# trick here is that we must have some way to preserve the OriginSpec while also having a distinct | ||
# type for each target so that unions work properly. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely more fuel for a target API design... :)
But this feature is awesome, and worth the temporary clunkiness!
Problem
When running
./pants test src/python/pants/rules/core/run_test.py
, someone who has never used Pants before would expect Pants to only testrun_test.py
. However, Pants will end up testing all ofrules/core:tests
and users must specify./pants test src/python/pants/rules/core/run_test.py --pytest-args='-k RunTest'
to get what they want.At the same time, we must still use the metadata from the owning target in order to prepare the test, such as resolving
dependencies
.Solution
Preserve the
OriginSpec
and pass it down all the way topython_test_runner.py
so that it may pass more precise arguments to Pytest. Previously, we always passed every single file belonging to thesources
of the target. Now, when given a filesystem spec, we will pass the files explicitly specified (which are a subset of the target'ssources
).In order to preserve the
OriginSpec
, we make several changes.These changes are clunky due to the lack of multiple params in
await Get
(#7490). This PR helps to guide that design, though, while adding a useful feature needed today.AddressesWithOrigins
The test
@goal_rule
now requestsAddressesWithOrigins
, rather thanAddresses
, so that we may preserve the original specs.Here, we add a method to go from
AddressesWithOrigins -> AddressWithOrigin
.HydratedTargetWithOrigin
This allows us to have a hydrated target without losing the
OriginSpec
information.We need to be able to use
HydratedTarget
because this is how we access the underlyingTargetAdaptor
necessary to get theTestTarget
union working. The originalAddressWithOrigin
from the@goal_rule
isn't hydrated enough.TargetAdaptorWithOrigin
Finally, we add
TargetAdaptorWithOrigin
, along with a type for each distinctTargetAdaptor
, such asPythonTestsAdaptorWithOrigin
.We need so many new classes for the union mechanism to work properly.
python_test_runner.py
must have some way to signal that it only works withpython_tests
targets, which we previously did by registeringUnionRule(TestTarget, PythonTestsAdaptor)
. But, we still need to preserve theOriginSpec
, so we introducePythonTestsAdaptorWithOrigin
.Rejected alternative: field on
TargetAdaptor
One alternative design is to insert the
OriginSpec
proactively on theTargetAdaptor
. This means that we would not need to introduce anyTargetAdaptorWithOrigin
types.However, this approach suffers from not being typed. Given the type-driven API of rules, we want some structured way to specify that
python_test_runner.py
must have theOriginSpec
preserved, rather than checking for some optional field defined on theTargetAdaptor
.Result
./pants test src/python/pants/rules/core/run_test.py
only runs overrun_test.py
now.In a followup, we will add precise file args to
fmt2
andlint2
.