Mark certain types as "side-effecting" #8922

gshuflin · 2020-01-09T00:18:04Z

Problem

Pants occasionally needs to perform some side-effecting operation that do not work well with the normal caching semantics provided by @rules. For instance, writing build artifacts to disk, or running a binary locally and interactively.

Over the course of discussing how to go about implementing these kinds of side-effecting operations, we've created a pattern: create a new python type that implements one of these side-effecting operations, and request it as the input to a @goal_rule. Right now, @goal_rules are top-level rules associated with a pants goal that also have the property of being marked as non-cacheable, so it is okay for side-effecting behavior to happen in this subset of rules.

Solution

This commit formalizes the notion of an side-effecting rule input type by giving the category a name - "side-effecting" - and establishing a check at rule registration time that only non-cacheable @rules are trying to request these types. Right now, the only type of @rule that is non-cacheable is a rule declared as a @goal_rule, but this might change in the future (and would require a corresponding update to the logic of this check).

cosmicexplorer · 2020-01-09T00:33:19Z

Great idea! In #8917 I want to actually use the @rules around the ScmWrapper in this way -- as in, I want invoking the Scm tool to be uncacheable, since it's not clear how immediately to use the v2 engine with hermetic git invocations. That's for target roots calculation, so not a @console_rule -- but perhaps we could consider a more general concept than a @console_rule?

The two things we most often want uncacheably in a @console_rule (off the top of my head) might be Console and Workspace. With both of those, we also probably want mutex access to the Console and Workspace instances. Could we consider generalizing this into:

an OwnedParams class (or maybe an OwnedParamRule subclass) which formalizes "this is a RootRule object we want mutex access to"
making @console_rule implicitly/explicitly declare that specific params such as Console and Workspace are "owned" (which here just means mutex)
?

gshuflin · 2020-01-09T00:47:19Z

Yeah this commit explicitly marks this side_effecting flag on Workspace, Console, and InteractiveRunner, and I expect to use it on the HTTP requesting type I'm working on right now, which is why I want to get this commit in first.

Part of the point of this work is to link (conceptually as well as in code) the notion of a side-effecting type to a @console_rule, since the fundamental distinction between a @console_rule and a @rule is that the latter is cached but the former is not. In other words, this change forces developers doing something like what you're doing with Scm to push out the side-effecting work to a @console_rule, precisely to avoid having to reason about how disk-writing side-effects interact with cacheable @rules

I don't think it's a problem to use "subclass of a class OwnedClass" as the marker for a side-effecting type, but I don't see what that gets us over just using a class variable like side_effecting.

I'm not sure if we need to implement mutex access for Console or Workspace - we would if we wanted to run multiple console_rules simultaneously, but that's basically not something we do right now. If this is functionality we wanted to implement, I think we should think about exactly how to do it at that time, rather than trying to adjust this commit for that use case.

Eric-Arellano

Do you know what impact this would have on the new V2 python_test_runner integration tests for testing debug_python_test? https://github.com/pantsbuild/pants/pull/8924/files#r364545904. Would it possible for those self.request_single_product()-style tests to run the InteractiveProcessRequest even though the tests are not console rules?

Regardless, I think this is a good change but, hopefully, it will still be possible to test that debug_python_test works properly.

Eric-Arellano · 2020-01-09T04:40:52Z

src/python/pants/backend/python/rules/python_test_runner.py

-
-  result = runner.run_local_interactive_process(run_request)
-  return TestDebugResult(result.process_exit_code)
+  return TestDebugRequest(run_request)


Good change.

Eric-Arellano · 2020-01-09T04:41:38Z

src/python/pants/engine/rules.py

  return _make_rule(return_type, parameter_types, cacheable=cacheable, name=name)(func)


+def validate_parameter_types(func_id: str, parameter_types: Tuple[Type, ...], cacheable: bool):


Nit: return type is -> None.

Eric-Arellano · 2020-01-09T04:43:18Z

tests/python/pants_test/engine/test_rules.py

@@ -95,6 +95,20 @@ def test_run_rule_console_rule_generator(self):
    )
    self.assertEquals(res, Example(0))

+  def test_side_effecting_inputs(self) -> None:


Thanks for the type hint :)

Eric-Arellano · 2020-01-09T04:45:05Z

src/python/pants/engine/fs.py

@@ -212,6 +212,7 @@ class UrlToFetch:
 @dataclass(frozen=True)
 class Workspace:
  """Abstract handle for operations that touch the real local filesystem."""
+  side_effecting = True


I'm curious, did you consider inheritance?

I strongly prefer using a class field as you do here because multiple inheritance confuses me and inheritance opens up a can of worms of the superclass being able to add methods. But, I want to stress test this decision.

cf. #8922 (comment) , I don't think marking this property via subclassing gets us anything marking it with a class variable doesn't

Either an annotation (@sideeffecting) or a marker trait would be more self-documenting... someone seeing this property in isolation would not have anything to reference to figure out what it does, whereas with either of those alternatives they could go and inspect the docs on the type/annotation.

Another consideration would be: if we do decide to allow types to be uncacheable, would we want to have a merged annotation to capture the fact that "sideeffecting, uncacheable, neither" is closer to being an enum? Or could that be handled with inheritance of the marker trait? etc.

+1 to a class decorator. I agree with the benefits of the decorator being more formalized, such as being able to jump to the definition, whereas this field would require grepping for side_effecting to try to piece together what this does.

Ftr, the decorator could simply add the class property side_effecting = True. Your validation code would stay the same, only the call sites would be updated.

-1 on a trait mixin per my above comments on inheritance.

stuhood

After a discussion with @benjyw yesterday about side-effecting process executions (which we can't currently prevent without banning network access), I'm left uncertain about whether annotations will be sufficient. Because without deep sandboxing of processes, there will always be ExecuteProcessRequests that are potentially sideeffecting, and marking them using this annotation probably isn't feasible. Additionally, the foreground flag (which I'd like to try and finish implementing this weekend) would benefit from the ability not to be cached.

I think that it would be good to think through how an annotation like this would interact with #6598, which I still think that we will need to do, for the above reasons. In particular: rather than a "side-effecting" flag, should this be "uncacheable"? And if it is uncacheable, is the restriction on location something that we can evolve into a usage of #6598?

gshuflin · 2020-01-09T18:55:59Z

After a discussion with @benjyw yesterday about side-effecting process executions (which we can't currently prevent without banning network access), I'm left uncertain about whether annotations will be sufficient. Because without deep sandboxing of processes, there will always be ExecuteProcessRequests that are potentially sideeffecting, and marking them using this annotation probably isn't feasible.

Right, I don't expect ExecuteProcessRequest to be a formally side-effecting type; we use it deep within @rules all over the place. A process in an EPR that makes a network request is not treated as side-effecting from pants' perspective, since its result gets cached just like the result of any other @rule or cacheable intrinsic. Nothing external to pants that depends on that subprocess making a successful request can depend on that process to actually run or that request to be made on every pants invocation. That is the sense in which this side_effecting flag is inappropriate for EPR.

Additionally, the foreground flag (which I'd like to try and finish implementing this weekend) would benefit from the ability not to be cached.

I think that it would be good to think through how an annotation like this would interact with #6598, which I still think that we will need to do, for the above reasons. In particular: rather than a "side-effecting" flag, should this be "uncacheable"? And if it is uncacheable, is the restriction on location something that we can evolve into a usage of #6598?

I don't think it's important whether we call this flag "uncacheable" or "side_effecting". The important thing is that we have a way to mark types that can only be requested by rules that permit side effects in a way that the engine actually checks. Right now, that's only @console_rules.

If we had #6598 implemented, I don't think we would need console rules at all, and that means that this commit that formalizes the notion of behavior restricted to @console_rules would of course be moot. In the case of the foreground flag you're talking about, that flag would allow a rule author to create a non-cacheable EPR in any @rule, forcing that rule to become non-cacheable no matter how deep in the rule it is or how many other cached rules depend on the output of that EPR. #6598 is an attempt to modify the entire graph model to make this case work, and be smart about re-running rules that depend on the outputs of a suddenly-non-cached rule. I'm not actually sure how the foreground flag would work without also implementing #6598 .

So in sum, I think this commit makes sense if we want to continue to have a notion of separate @rule and @console_rules; #6598 would allow us to get rid of that distinction, making this PR moot - but also #6598 is a big undertaking and there may be other reasons we want to permanently keep a @rule/@console_rule distinction. But either way, a flag like this that marks certain types as non-cacheable in a way the engine can verify is useful.

Eric-Arellano

This looks good to me, although probably want to wait for Stu to re-review.

stuhood

Thanks!

The semantics of this are well understood. It might be worth taking one last look at the API to use here (property/annotation/marker-trait), but the meaning is clear.

stuhood · 2020-01-11T00:31:40Z

src/python/pants/engine/fs.py

@@ -212,6 +212,7 @@ class UrlToFetch:
 @dataclass(frozen=True)
 class Workspace:
  """Abstract handle for operations that touch the real local filesystem."""
+  side_effecting = True


Either an annotation (@sideeffecting) or a marker trait would be more self-documenting... someone seeing this property in isolation would not have anything to reference to figure out what it does, whereas with either of those alternatives they could go and inspect the docs on the type/annotation.

Another consideration would be: if we do decide to allow types to be uncacheable, would we want to have a merged annotation to capture the fact that "sideeffecting, uncacheable, neither" is closer to being an enum? Or could that be handled with inheritance of the marker trait? etc.

gshuflin force-pushed the console_rule_only branch from 7522955 to 20d0ea0 Compare January 9, 2020 00:18

gshuflin requested review from stuhood and cosmicexplorer January 9, 2020 00:18

gshuflin force-pushed the console_rule_only branch from 20d0ea0 to 9f2b605 Compare January 9, 2020 02:54

Eric-Arellano approved these changes Jan 9, 2020

View reviewed changes

stuhood reviewed Jan 9, 2020

View reviewed changes

gshuflin force-pushed the console_rule_only branch 2 times, most recently from a07ba20 to 08ecb24 Compare January 9, 2020 20:57

gshuflin mentioned this pull request Jan 9, 2020

Fix confusing error for an @console_rule not returning a Goal #8931

Merged

gshuflin force-pushed the console_rule_only branch from 08ecb24 to d51a060 Compare January 10, 2020 19:59

gshuflin changed the title ~~Mark certain types as "side-effecting"~~ Mark certain types as "uncacheable" Jan 10, 2020

gshuflin force-pushed the console_rule_only branch from d51a060 to d77ac95 Compare January 10, 2020 22:57

Eric-Arellano approved these changes Jan 10, 2020

View reviewed changes

gshuflin added 6 commits January 10, 2020 15:54

Mark certain types as console-rule-only

5c0c2e1

Rename side_effecting -> uncacheable

188cd5e

Fix return type

22601f7

Make InteractiveRunner only used in @console_rule

37eca64

Fix test_test.py

50860c7

rename uncacheable -> side_effecting

e7d4498

gshuflin force-pushed the console_rule_only branch from d77ac95 to e7d4498 Compare January 11, 2020 00:17

Eric-Arellano approved these changes Jan 11, 2020

View reviewed changes

gshuflin changed the title ~~Mark certain types as "uncacheable"~~ Mark certain types as "side-effecting" Jan 11, 2020

stuhood approved these changes Jan 11, 2020

View reviewed changes

gshuflin added 2 commits January 10, 2020 16:45

Use a class annotation to mark side_effecting

71ed47e

Fix order

34e4854

gshuflin force-pushed the console_rule_only branch from e3c2fb5 to 34e4854 Compare January 11, 2020 00:48

gshuflin merged commit 94e5634 into pantsbuild:master Jan 11, 2020

gshuflin deleted the console_rule_only branch January 11, 2020 07:18

gshuflin mentioned this pull request Jan 16, 2020

Optionally execute processes exclusively in the foreground #8974

Closed

stuhood mentioned this pull request Feb 12, 2020

Limit requesting certain types to @console_rule #8336

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Mark certain types as "side-effecting" #8922

Mark certain types as "side-effecting" #8922

gshuflin commented Jan 9, 2020 •

edited

Loading

cosmicexplorer commented Jan 9, 2020

gshuflin commented Jan 9, 2020 •

edited

Loading

Eric-Arellano left a comment

Eric-Arellano Jan 9, 2020

Eric-Arellano Jan 9, 2020

Eric-Arellano Jan 9, 2020

Eric-Arellano Jan 9, 2020

gshuflin Jan 9, 2020

stuhood Jan 11, 2020

Eric-Arellano Jan 11, 2020

stuhood left a comment •

edited

Loading

gshuflin commented Jan 9, 2020 •

edited

Loading

Eric-Arellano left a comment

stuhood left a comment

stuhood Jan 11, 2020

		return _make_rule(return_type, parameter_types, cacheable=cacheable, name=name)(func)


		def validate_parameter_types(func_id: str, parameter_types: Tuple[Type, ...], cacheable: bool):

Mark certain types as "side-effecting" #8922

Mark certain types as "side-effecting" #8922

Conversation

gshuflin commented Jan 9, 2020 • edited Loading

Problem

Solution

cosmicexplorer commented Jan 9, 2020

gshuflin commented Jan 9, 2020 • edited Loading

Eric-Arellano left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

stuhood left a comment • edited Loading

Choose a reason for hiding this comment

gshuflin commented Jan 9, 2020 • edited Loading

Eric-Arellano left a comment

Choose a reason for hiding this comment

stuhood left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

gshuflin commented Jan 9, 2020 •

edited

Loading

gshuflin commented Jan 9, 2020 •

edited

Loading

stuhood left a comment •

edited

Loading

gshuflin commented Jan 9, 2020 •

edited

Loading