Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mark certain types as "side-effecting" #8922

Merged
merged 8 commits into from
Jan 11, 2020

Conversation

gshuflin
Copy link
Contributor

@gshuflin gshuflin commented Jan 9, 2020

Problem

Pants occasionally needs to perform some side-effecting operation that do not work well with the normal caching semantics provided by @rules. For instance, writing build artifacts to disk, or running a binary locally and interactively.

Over the course of discussing how to go about implementing these kinds of side-effecting operations, we've created a pattern: create a new python type that implements one of these side-effecting operations, and request it as the input to a @goal_rule. Right now, @goal_rules are top-level rules associated with a pants goal that also have the property of being marked as non-cacheable, so it is okay for side-effecting behavior to happen in this subset of rules.

Solution

This commit formalizes the notion of an side-effecting rule input type by giving the category a name - "side-effecting" - and establishing a check at rule registration time that only non-cacheable @rules are trying to request these types. Right now, the only type of @rule that is non-cacheable is a rule declared as a @goal_rule, but this might change in the future (and would require a corresponding update to the logic of this check).

@cosmicexplorer
Copy link
Contributor

Great idea! In #8917 I want to actually use the @rules around the ScmWrapper in this way -- as in, I want invoking the Scm tool to be uncacheable, since it's not clear how immediately to use the v2 engine with hermetic git invocations. That's for target roots calculation, so not a @console_rule -- but perhaps we could consider a more general concept than a @console_rule?

The two things we most often want uncacheably in a @console_rule (off the top of my head) might be Console and Workspace. With both of those, we also probably want mutex access to the Console and Workspace instances. Could we consider generalizing this into:

  1. an OwnedParams class (or maybe an OwnedParamRule subclass) which formalizes "this is a RootRule object we want mutex access to"
  2. making @console_rule implicitly/explicitly declare that specific params such as Console and Workspace are "owned" (which here just means mutex)
    ?

@gshuflin
Copy link
Contributor Author

gshuflin commented Jan 9, 2020

Yeah this commit explicitly marks this side_effecting flag on Workspace, Console, and InteractiveRunner, and I expect to use it on the HTTP requesting type I'm working on right now, which is why I want to get this commit in first.

Part of the point of this work is to link (conceptually as well as in code) the notion of a side-effecting type to a @console_rule, since the fundamental distinction between a @console_rule and a @rule is that the latter is cached but the former is not. In other words, this change forces developers doing something like what you're doing with Scm to push out the side-effecting work to a @console_rule, precisely to avoid having to reason about how disk-writing side-effects interact with cacheable @rules

I don't think it's a problem to use "subclass of a class OwnedClass" as the marker for a side-effecting type, but I don't see what that gets us over just using a class variable like side_effecting.

I'm not sure if we need to implement mutex access for Console or Workspace - we would if we wanted to run multiple console_rules simultaneously, but that's basically not something we do right now. If this is functionality we wanted to implement, I think we should think about exactly how to do it at that time, rather than trying to adjust this commit for that use case.

Copy link
Contributor

@Eric-Arellano Eric-Arellano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know what impact this would have on the new V2 python_test_runner integration tests for testing debug_python_test? https://github.com/pantsbuild/pants/pull/8924/files#r364545904. Would it possible for those self.request_single_product()-style tests to run the InteractiveProcessRequest even though the tests are not console rules?

Regardless, I think this is a good change but, hopefully, it will still be possible to test that debug_python_test works properly.


result = runner.run_local_interactive_process(run_request)
return TestDebugResult(result.process_exit_code)
return TestDebugRequest(run_request)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good change.

return _make_rule(return_type, parameter_types, cacheable=cacheable, name=name)(func)


def validate_parameter_types(func_id: str, parameter_types: Tuple[Type, ...], cacheable: bool):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: return type is -> None.

@@ -95,6 +95,20 @@ def test_run_rule_console_rule_generator(self):
)
self.assertEquals(res, Example(0))

def test_side_effecting_inputs(self) -> None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the type hint :)

@@ -212,6 +212,7 @@ class UrlToFetch:
@dataclass(frozen=True)
class Workspace:
"""Abstract handle for operations that touch the real local filesystem."""
side_effecting = True
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious, did you consider inheritance?

I strongly prefer using a class field as you do here because multiple inheritance confuses me and inheritance opens up a can of worms of the superclass being able to add methods. But, I want to stress test this decision.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cf. #8922 (comment) , I don't think marking this property via subclassing gets us anything marking it with a class variable doesn't

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either an annotation (@sideeffecting) or a marker trait would be more self-documenting... someone seeing this property in isolation would not have anything to reference to figure out what it does, whereas with either of those alternatives they could go and inspect the docs on the type/annotation.

Another consideration would be: if we do decide to allow types to be uncacheable, would we want to have a merged annotation to capture the fact that "sideeffecting, uncacheable, neither" is closer to being an enum? Or could that be handled with inheritance of the marker trait? etc.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to a class decorator. I agree with the benefits of the decorator being more formalized, such as being able to jump to the definition, whereas this field would require grepping for side_effecting to try to piece together what this does.

Ftr, the decorator could simply add the class property side_effecting = True. Your validation code would stay the same, only the call sites would be updated.

-1 on a trait mixin per my above comments on inheritance.

Copy link
Member

@stuhood stuhood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After a discussion with @benjyw yesterday about side-effecting process executions (which we can't currently prevent without banning network access), I'm left uncertain about whether annotations will be sufficient. Because without deep sandboxing of processes, there will always be ExecuteProcessRequests that are potentially sideeffecting, and marking them using this annotation probably isn't feasible. Additionally, the foreground flag (which I'd like to try and finish implementing this weekend) would benefit from the ability not to be cached.

I think that it would be good to think through how an annotation like this would interact with #6598, which I still think that we will need to do, for the above reasons. In particular: rather than a "side-effecting" flag, should this be "uncacheable"? And if it is uncacheable, is the restriction on location something that we can evolve into a usage of #6598?

@gshuflin
Copy link
Contributor Author

gshuflin commented Jan 9, 2020

After a discussion with @benjyw yesterday about side-effecting process executions (which we can't currently prevent without banning network access), I'm left uncertain about whether annotations will be sufficient. Because without deep sandboxing of processes, there will always be ExecuteProcessRequests that are potentially sideeffecting, and marking them using this annotation probably isn't feasible.

Right, I don't expect ExecuteProcessRequest to be a formally side-effecting type; we use it deep within @rules all over the place. A process in an EPR that makes a network request is not treated as side-effecting from pants' perspective, since its result gets cached just like the result of any other @rule or cacheable intrinsic. Nothing external to pants that depends on that subprocess making a successful request can depend on that process to actually run or that request to be made on every pants invocation. That is the sense in which this side_effecting flag is inappropriate for EPR.

Additionally, the foreground flag (which I'd like to try and finish implementing this weekend) would benefit from the ability not to be cached.

I think that it would be good to think through how an annotation like this would interact with #6598, which I still think that we will need to do, for the above reasons. In particular: rather than a "side-effecting" flag, should this be "uncacheable"? And if it is uncacheable, is the restriction on location something that we can evolve into a usage of #6598?

I don't think it's important whether we call this flag "uncacheable" or "side_effecting". The important thing is that we have a way to mark types that can only be requested by rules that permit side effects in a way that the engine actually checks. Right now, that's only @console_rules.

If we had #6598 implemented, I don't think we would need console rules at all, and that means that this commit that formalizes the notion of behavior restricted to @console_rules would of course be moot. In the case of the foreground flag you're talking about, that flag would allow a rule author to create a non-cacheable EPR in any @rule, forcing that rule to become non-cacheable no matter how deep in the rule it is or how many other cached rules depend on the output of that EPR. #6598 is an attempt to modify the entire graph model to make this case work, and be smart about re-running rules that depend on the outputs of a suddenly-non-cached rule. I'm not actually sure how the foreground flag would work without also implementing #6598 .

So in sum, I think this commit makes sense if we want to continue to have a notion of separate @rule and @console_rules; #6598 would allow us to get rid of that distinction, making this PR moot - but also #6598 is a big undertaking and there may be other reasons we want to permanently keep a @rule/@console_rule distinction. But either way, a flag like this that marks certain types as non-cacheable in a way the engine can verify is useful.

@gshuflin gshuflin force-pushed the console_rule_only branch 2 times, most recently from a07ba20 to 08ecb24 Compare January 9, 2020 20:57
@gshuflin gshuflin changed the title Mark certain types as "side-effecting" Mark certain types as "uncacheable" Jan 10, 2020
Copy link
Contributor

@Eric-Arellano Eric-Arellano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me, although probably want to wait for Stu to re-review.

@gshuflin gshuflin changed the title Mark certain types as "uncacheable" Mark certain types as "side-effecting" Jan 11, 2020
Copy link
Member

@stuhood stuhood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

The semantics of this are well understood. It might be worth taking one last look at the API to use here (property/annotation/marker-trait), but the meaning is clear.

@@ -212,6 +212,7 @@ class UrlToFetch:
@dataclass(frozen=True)
class Workspace:
"""Abstract handle for operations that touch the real local filesystem."""
side_effecting = True
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either an annotation (@sideeffecting) or a marker trait would be more self-documenting... someone seeing this property in isolation would not have anything to reference to figure out what it does, whereas with either of those alternatives they could go and inspect the docs on the type/annotation.

Another consideration would be: if we do decide to allow types to be uncacheable, would we want to have a merged annotation to capture the fact that "sideeffecting, uncacheable, neither" is closer to being an enum? Or could that be handled with inheritance of the marker trait? etc.

@gshuflin gshuflin merged commit 94e5634 into pantsbuild:master Jan 11, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants