-
-
Notifications
You must be signed in to change notification settings - Fork 649
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Per-run metrics for target roots, transitive target counts. #5651
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Kris. I think that we should avoid directly tracking targets in favor of tracking files.
If you have appetite for diving into the rust code, there is a relatively small solution available there... otherwise, can get similar information from the BuildGraph
for now.
src/python/pants/goal/context.py
Outdated
self.run_tracker.pantsd_stats.set_target_root_size(target_count) | ||
return target_count | ||
|
||
def set_affected_target_count_in_runtracker(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than exposing these as public methods on Context
, is there somewhere else we could put them that would not potentially give the impression that they are intended for Task developers to call? We should be aiming to shrink the Context API if possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd assumed making them non-:API: public
would suffice for that - but I've made them private and wrapped that with a contextmanager to tighten this up a bit. let me know what you think.
src/python/pants/goal/context.py
Outdated
|
||
def set_affected_target_count_in_runtracker(self): | ||
"""Sets the realized target count in the run tracker's daemon stats object.""" | ||
target_count = len(self.build_graph) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is going to break when we stop computing a BuildGraph
in all cases via #5639 and #4769. And it is explicitly the goal of those tickets to avoid computing this.
@benjyw and I talked about this yesterday, but moving forward, we're going to need to make a decision about what the graph looks like to tasks like depmap
and dependencies
... but it's entirely possible that that will not be literally the graph used by a goal like compile
or test
, since those will request exactly the products they need in order to execute. This is also touched on in the blog post I sent out about execution models.
I think that a metric that would be more durable as the concept of "targets" evolves might be something like: "number of involved files". There are a few ways to compute that (and ideally it would be computed by the engine itself: see below), but one easy approach for now would be to continue to use the BuildGraph
(temporarily), and sum the counts of all files owned by targets.
If you have time for the more forwards compatible approach, it would be to add a method similar to graph_trace that counted how many files were accessible below some roots by calling fs_subject: in pseudocode, it would be something like:
impl Graph {
fn count_fs_nodes(&self, roots: Vec<NodeKey>) -> usize {
self
.walk(roots.into_iter().flat_map(|n| self.entry_id(EntryKey.Value(n))).collect(), false)
// Count entries which have an fs_subject.
.filter(|entry_id| self.entry_for_id.and_then(|e| e.node.content().fs_subject()).is_some())
.count()
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, I fully expect this to evolve over time as we focus on e.g. the runtracker/reporting aspect of the v2 pipeline port. for now, was just going for something basic to establish a better approximate baseline than the accumulating metric we have now for product graph size. I've gone ahead and implemented the file counting idea using BuildGraph
for the moment w/ a TODO to circle back to do that in the engine.
9f0dcf0
to
cd52c0c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Would still recommend dropping the target count metric entirely.
|
||
def _set_affected_target_count_in_runtracker(self): | ||
"""Sets the realized target count in the run tracker's daemon stats object.""" | ||
target_count = len(self.build_graph) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're going to be breaking this metric pretty soon, so I'm not sure we even want to begin collecting it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, but only for the python related tasks afaict. my thinking is that it's still a fairly relevant and interesting grouping metric and one that might make sense to be able to compare/contrast file count to. ftr: I'm still not completely convinced that "files" is that much better than "targets" since they can both be deceptive as to substantive size on the surface. so was thinking both can't hurt, initially - and it's cheap to collect/report on.
wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that targets will "mostly" be going away, and that it will not be free to compute this any longer. If we essentially pre-deprecate this metric and bake in the assumption that only v1 tasks report it, then ok.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we essentially pre-deprecate this metric and bake in the assumption that only v1 tasks report it
yeah, that's the idea. this metric is exactly as "deprecated" as Target
/BuildGraph
is. once we erode Target
et al in favor of pushing more into the v2 engine, I fully expect this metric to also erode/evolve alongside it - but think it might be useful to experiment with in the interim.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. Fine with landing it.
hmm, but only for the python related tasks afaict.
The first task it will be going away for is list
in #5639
This should help ensure a more stable sizing metric for runs.