Better instrumentation for Worker.gather_dep
#7217
Comments
There are two fundamental task journeys: execute and gather. A gather is triggered either by a dependent in waiting state or by the Active Memory Manager.
A few observations:
1. When gathering, both the successful network transfer time and the deserialization time need to be apportioned across the keys bundled together in the same gather_dep call (a hedged sketch of such apportioning is included below).
2. When gathering, it's probably a lot more interesting to know why you're gathering (read: group by prefix of the dependent) than what you're gathering (group by prefix of the task itself).
3. It's not straightforward to break down the waiting time when more than one dependency needs fetching. Again, this is imperfect, but it should handle two big use cases reasonably well, e.g. B) fetching two dependencies from different workers, which are transferred at the same time.
4. The above algorithm is fairly complex and requires collecting a wealth of new information in the worker state machine.
5. Those are... a lot of states to break down by task prefix. It would generate several pages' worth of plain text in the Prometheus output, and I'm very concerned the volume would be far from trivial. Not breaking everything down by task prefix is probably wiser, so I would pragmatically suggest a coarser breakdown.

The above would produce less overwhelming data, and would make both the apportioning of deserialization times and the apportioning of waiting times unnecessary.
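As a concrete illustration of observation 1, here is a minimal sketch of how the measured transfer and deserialization wall time of a single gather_dep call could be split across the bundled keys proportionally to their sizes. The helper name `apportion_gather_metrics` and the assumption that per-key `nbytes` estimates are available are mine; this is not how distributed implements it, only a sketch of the apportioning idea.

```python
def apportion_gather_metrics(
    nbytes_by_key: dict[str, int],
    transfer_seconds: float,
    deserialize_seconds: float,
) -> dict[str, dict[str, float]]:
    """Split the wall time of one gather_dep call across the keys that were
    bundled together, proportionally to their (estimated) size in bytes.

    Hypothetical helper for illustration only.
    """
    total_bytes = sum(nbytes_by_key.values()) or 1  # avoid division by zero
    out = {}
    for key, nbytes in nbytes_by_key.items():
        share = nbytes / total_bytes
        out[key] = {
            "transfer_seconds": transfer_seconds * share,
            "deserialize_seconds": deserialize_seconds * share,
        }
    return out


# Example: two keys fetched in one call, 80 MiB and 20 MiB
print(apportion_gather_metrics(
    {"x-1": 80 * 2**20, "x-2": 20 * 2**20},
    transfer_seconds=2.0,
    deserialize_seconds=0.5,
))
```

Splitting by bytes is only an approximation for deserialization time, but it keeps the bookkeeping trivial.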
The breakdown of execute described above is out of scope for this ticket and is discussed in #7565.
#7665 lists a wealth of follow-up tickets. We should decide which of them we need to deliver in order to reach the Definition of Done on this ticket. Candidates:
I suggest we "park" this ticket for now, wrap up what we're currently doing around spans and metrics, and reassess what we want on top of that. I actually think that most of what I was looking for with this ticket will be implemented by then.
Task queuing has been proven to significantly improve performance by reducing root task overproduction.
In recent benchmarks and tests I noticed that one major source of root task overproduction is not necessarily that reducers are not assigned to the workers fast enough, but that the workers are unable to run these tasks because they need to fetch dependencies first. If the average root task runtime is much smaller than the time it takes to fetch dependencies, a worker can end up running many data producers before it has a chance to run a reducer.
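To make that ratio concrete, here is a tiny back-of-envelope calculation with made-up numbers (the runtimes are illustrative, not measured anywhere in this issue):

```python
# Illustrative numbers only; not taken from any benchmark.
root_task_runtime = 0.05     # seconds per root task (data producer)
dependency_fetch_time = 2.0  # seconds to gather a reducer's dependencies

# While one reducer waits for its dependencies, the worker keeps executing
# root tasks, each of which produces more data that has to be held in memory.
extra_root_tasks = dependency_fetch_time / root_task_runtime
print(f"~{extra_root_tasks:.0f} root tasks executed per dependency fetch")
# ~40 root tasks executed per dependency fetch
```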
Right now we're almost blind to this situation, but we could expose much better metrics on the dashboard (or via Prometheus).
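For instance, even a handful of coarse, worker-level counters (not broken down by task prefix, in the spirit of the observations above) would already expose the situation. The sketch below uses the generic prometheus_client API with metric names and a `gather_done` hook that I made up; it is not distributed's actual Prometheus integration.

```python
from prometheus_client import Counter

# Hypothetical metric names; distributed's real exporter is structured differently.
GATHER_BYTES = Counter(
    "worker_gather_dep_bytes_total",
    "Total bytes received by Worker.gather_dep",
)
GATHER_TRANSFER_SECONDS = Counter(
    "worker_gather_dep_transfer_seconds_total",
    "Wall time spent on network transfer in Worker.gather_dep",
)
GATHER_DESERIALIZE_SECONDS = Counter(
    "worker_gather_dep_deserialize_seconds_total",
    "Wall time spent deserializing gathered dependencies",
)


def gather_done(nbytes: int, transfer_seconds: float, deserialize_seconds: float) -> None:
    """Assumed hook, called once per completed gather_dep call."""
    GATHER_BYTES.inc(nbytes)
    GATHER_TRANSFER_SECONDS.inc(transfer_seconds)
    GATHER_DESERIALIZE_SECONDS.inc(deserialize_seconds)
```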
Specifically, I'm interested in
Ideally, I would love to get data for a Task X with dependencies deps that tells me
Some of this information is already available; other information we still need to collect. I don't think we have anything that can break it up this way and/or group it by TaskGroups or individual tasks.
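As a rough shape for that per-task data, the following is a hypothetical record with field names I made up, limited to quantities already discussed in this issue (waiting time, transfer time, deserialization time, and which workers the dependencies came from); it is not an existing distributed structure.

```python
from dataclasses import dataclass, field


@dataclass
class TaskGatherReport:
    """Hypothetical per-task summary of what it cost to make a task runnable."""
    key: str
    dependencies: list[str] = field(default_factory=list)
    source_workers: list[str] = field(default_factory=list)  # addresses fetched from
    waiting_seconds: float = 0.0      # time spent in waiting state
    transfer_seconds: float = 0.0     # network transfer time (apportioned)
    deserialize_seconds: float = 0.0  # deserialization time (apportioned)
```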
I think this kind of visibility would help us significantly with making decisions about optimizations, e.g. should we prioritize STA? Should we focus on getting a sendfile implementation up and running? Do connection attempts take way too long because event loops are blocked?