Per-run metrics for target roots, transitive target counts. #5651

kwlzn · 2018-04-03T19:10:44Z

This should help ensure a more stable sizing metric for runs.

[omerta pants (kwlzn/more_metrics)]$ PANTS_ENABLE_PANTSD=True ./pants -q list 3rdparty/python:psutil 3rdparty/python:requests --run-tracker-stats-local-json-file=test.json | wc -l ; cat test.json | json.tool | grep -A5 pantsd_stats

       2
    "pantsd_stats": {
        "affected_targets_size": 2,
        "preceding_graph_size": 0,
        "resulting_graph_size": 27,
        "target_root_size": 2
    },
[omerta pants (kwlzn/more_metrics)]$ PANTS_ENABLE_PANTSD=True ./pants -q list : --run-tracker-stats-local-json-file=test.json | wc -l ; cat test.json | json.tool | grep -A5 pantsd_stats

       8
    "pantsd_stats": {
        "affected_targets_size": 15,
        "preceding_graph_size": 5,
        "resulting_graph_size": 99,
        "target_root_size": 8
    },
[omerta pants (kwlzn/more_metrics)]$ PANTS_ENABLE_PANTSD=True ./pants -q list :: --run-tracker-stats-local-json-file=test.json | wc -l ; cat test.json | json.tool | grep -A5 pantsd_stats

    1558
    "pantsd_stats": {
        "affected_targets_size": 1562,
        "preceding_graph_size": 19,
        "resulting_graph_size": 15256,
        "target_root_size": 1558
    },
[omerta pants (kwlzn/more_metrics)]$ PANTS_ENABLE_PANTSD=True ./pants -q list : --run-tracker-stats-local-json-file=test.json | wc -l ; cat test.json | json.tool | grep -A5 pantsd_stats

       8
    "pantsd_stats": {
        "affected_targets_size": 15,
        "preceding_graph_size": 5086,
        "resulting_graph_size": 5166,
        "target_root_size": 8
    },
[omerta pants (kwlzn/more_metrics)]$ ./pants -q list : --run-tracker-stats-local-json-file=test.json | wc -l ; cat test.json | json.tool | grep -A5 pantsd_stats

       8
    "pantsd_stats": {
        "affected_targets_size": 15,
        "preceding_graph_size": -1,
        "resulting_graph_size": 95,
        "target_root_size": 8
    },
[omerta pants (kwlzn/more_metrics)]$ ./pants -q list :: --run-tracker-stats-local-json-file=test.json | wc -l ; cat test.json | json.tool | grep -A5 pantsd_stats

    1558
    "pantsd_stats": {
        "affected_targets_size": 1562,
        "preceding_graph_size": -1,
        "resulting_graph_size": 15254,
        "target_root_size": 1558
    },
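
For comparing runs without the `json.tool | grep` pipeline, the same stats can be read programmatically. The following is a minimal sketch, not part of this change: the helper names `read_pantsd_stats` and `graph_growth` are hypothetical, it assumes `pantsd_stats` is a top-level key of the stats file (as the indentation in the output above suggests), and it assumes that a `preceding_graph_size` of `-1` means "not available", as seen in the runs above without `PANTS_ENABLE_PANTSD`.

```python
import json


def read_pantsd_stats(path):
    """Return the pantsd_stats mapping from a run-tracker stats JSON file."""
    with open(path) as f:
        stats = json.load(f)
    # Assumption: "pantsd_stats" sits at the top level of the stats document.
    return stats.get("pantsd_stats", {})


def graph_growth(pantsd_stats):
    """Return how many graph nodes a run added, or None if that is unknown."""
    preceding = pantsd_stats.get("preceding_graph_size", -1)
    resulting = pantsd_stats.get("resulting_graph_size", -1)
    # Assumption: a negative size means the value was not recorded for this run.
    if preceding < 0 or resulting < 0:
        return None
    return resulting - preceding
```

Applied to the fourth run above (5086 before, 5166 after), `graph_growth` would report 80 added nodes, while `target_root_size` and `affected_targets_size` stay at 8 and 15 regardless of what the daemon had already cached.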

stuhood

Thanks Kris. I think that we should avoid directly tracking targets in favor of tracking files.

If you have the appetite for diving into the Rust code, there is a relatively small solution available there... otherwise, you can get similar information from the BuildGraph for now.

stuhood · 2018-04-03T19:16:24Z

src/python/pants/goal/context.py

+    self.run_tracker.pantsd_stats.set_target_root_size(target_count)
+    return target_count
+
+  def set_affected_target_count_in_runtracker(self):


Rather than exposing these as public methods on Context, is there somewhere else we could put them that would not potentially give the impression that they are intended for Task developers to call? We should be aiming to shrink the Context API if possible.

I'd assumed making them non-:API: public would suffice for that - but I've made them private and wrapped that with a contextmanager to tighten this up a bit. let me know what you think.
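
The shape described here (private setters, exposed only through a context manager) could look roughly like the sketch below. This is an illustration under assumptions, not the code in this PR: `SketchStats`, `SketchContext`, and `executing` are hypothetical names, and the choice to record the root count on entry and the realized target count on exit is a guess at one reasonable ordering.

```python
from contextlib import contextmanager


class SketchStats:
    """Hypothetical stand-in for the run tracker's daemon stats object."""

    def __init__(self):
        self.target_root_size = 0
        self.affected_targets_size = 0


class SketchContext:
    """Hypothetical stand-in for Context; only the metrics plumbing is shown."""

    def __init__(self, target_roots, build_graph):
        self._target_roots = target_roots
        self._build_graph = build_graph
        self.stats = SketchStats()

    def _set_target_root_count(self):
        # Private: not something Task authors are expected to call.
        self.stats.target_root_size = len(self._target_roots)

    def _set_affected_target_count(self):
        # Private: records the realized target count once the graph is built.
        self.stats.affected_targets_size = len(self._build_graph)

    @contextmanager
    def executing(self):
        """Record the root count on entry and the affected count on exit."""
        self._set_target_root_count()
        try:
            yield
        finally:
            self._set_affected_target_count()
```

A caller would then write `with context.executing(): ...` around the run, so the setters never need to appear in the public surface of the class.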

stuhood · 2018-04-03T19:37:50Z

src/python/pants/goal/context.py

+
+  def set_affected_target_count_in_runtracker(self):
+    """Sets the realized target count in the run tracker's daemon stats object."""
+    target_count = len(self.build_graph)


This is going to break when we stop computing a BuildGraph in all cases via #5639 and #4769. And it is explicitly the goal of those tickets to avoid computing this.

@benjyw and I talked about this yesterday, but moving forward, we're going to need to make a decision about what the graph looks like to tasks like depmap and dependencies... but it's entirely possible that that will not be literally the graph used by a goal like compile or test, since those will request exactly the products they need in order to execute. This is also touched on in the blog post I sent out about execution models.

I think that a metric that would be more durable as the concept of "targets" evolves might be something like: "number of involved files". There are a few ways to compute that (and ideally it would be computed by the engine itself: see below), but one easy approach for now would be to continue to use the BuildGraph (temporarily), and sum the counts of all files owned by targets.

If you have time for the more forward-compatible approach, it would be to add a method similar to graph_trace that counts how many files are accessible below some roots by calling fs_subject. In pseudocode, it would be something like:

impl Graph {
  fn count_fs_nodes(&self, roots: Vec<NodeKey>) -> usize {
    self
      .walk(roots.into_iter().flat_map(|n| self.entry_id(EntryKey.Value(n))).collect(), false)
      // Count entries which have an fs_subject.
      .filter(|entry_id| self.entry_for_id.and_then(|e| e.node.content().fs_subject()).is_some())
      .count()
  }
}
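
The same idea, walking everything reachable from the roots and counting only the filesystem-backed nodes, can be sketched in plain Python. This is an illustration only: the adjacency-dict graph and the `fs_nodes` set are assumptions standing in for the engine's graph and its `fs_subject` check, and do not reflect the engine's actual API.

```python
def count_fs_nodes(edges, fs_nodes, roots):
    """Count distinct filesystem-backed nodes reachable from the given roots.

    edges: dict mapping a node to an iterable of its dependencies (assumed shape).
    fs_nodes: set of nodes that have a filesystem subject (assumed shape).
    """
    seen = set()
    stack = list(roots)
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        # Nodes without an entry in `edges` are treated as leaves.
        stack.extend(edges.get(node, ()))
    # Count each reachable filesystem node once, however many paths lead to it.
    return sum(1 for node in seen if node in fs_nodes)
```

Because the walk tracks visited nodes, a file shared by several dependents is counted a single time, and anything not reachable from the roots is ignored even if it is already present in a long-lived graph.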

yeah, I fully expect this to evolve over time as we focus on e.g. the runtracker/reporting aspect of the v2 pipeline port. for now, was just going for something basic to establish a better approximate baseline than the accumulating metric we have now for product graph size. I've gone ahead and implemented the file counting idea using BuildGraph for the moment w/ a TODO to circle back to do that in the engine.
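
The BuildGraph-based file count mentioned here amounts to summing source counts over targets. A minimal sketch follows, assuming a hypothetical `Target` record with a `sources` list; the real Target API is not reproduced here.

```python
from collections import namedtuple

# Hypothetical stand-in for a target: an address plus the files it owns.
Target = namedtuple("Target", ["address", "sources"])


def count_owned_files(targets):
    """Sum the number of files owned by each target."""
    return sum(len(target.sources) for target in targets)


def count_distinct_files(targets):
    """Count each file once, even if more than one target claims it."""
    return len({path for target in targets for path in target.sources})
```

The two functions differ only when several targets own the same file; which of them is the better sizing metric is a judgment call that this thread does not settle.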

stuhood

Thanks. Would still recommend dropping the target count metric entirely.

stuhood · 2018-04-03T20:43:43Z

src/python/pants/goal/context.py

+
+  def _set_affected_target_count_in_runtracker(self):
+    """Sets the realized target count in the run tracker's daemon stats object."""
+    target_count = len(self.build_graph)


We're going to be breaking this metric pretty soon, so I'm not sure we even want to begin collecting it.

hmm, but only for the python related tasks afaict. my thinking is that it's still a fairly relevant and interesting grouping metric and one that might make sense to be able to compare/contrast file count to. ftr: I'm still not completely convinced that "files" is that much better than "targets" since they can both be deceptive as to substantive size on the surface. so was thinking both can't hurt, initially - and it's cheap to collect/report on.

wdyt?

I think that targets will "mostly" be going away, and that it will not be free to compute this any longer. If we essentially pre-deprecate this metric and bake in the assumption that only v1 tasks report it, then ok.

> we essentially pre-deprecate this metric and bake in the assumption that only v1 tasks report it

yeah, that's the idea. this metric is exactly as "deprecated" as Target/BuildGraph is. once we erode Target et al in favor of pushing more into the v2 engine, I fully expect this metric to also erode/evolve alongside it - but think it might be useful to experiment with in the interim.

Ok. Fine with landing it.

> hmm, but only for the python related tasks afaict.

The first task it will be going away for is list, in #5639.

kwlzn requested review from stuhood, ity and illicitonion April 3, 2018 19:11

kwlzn added 2 commits April 3, 2018 12:30

Emit a metric for per-run literal target root count.

611504f

Emit a metric for target count per run.

0b936be

stuhood reviewed Apr 3, 2018

kwlzn added 2 commits April 3, 2018 12:46

Wrap metrics setters in a contextmanager.

aaba827

Add target->file count metric.

cd52c0c

kwlzn force-pushed the kwlzn/more_metrics branch from 9f0dcf0 to cd52c0c April 3, 2018 20:19

stuhood approved these changes Apr 3, 2018

Mystery fixups.

c9a88f1

kwlzn merged commit 7ffc115 into pantsbuild:master Apr 4, 2018
