[graph] visualization to indicate where a dataflow is stuck #14

Open
utaal opened this issue Jan 1, 2020 · 3 comments
Labels
enhancement New feature or request

Comments

@utaal
Member

utaal commented Jan 1, 2020

@frankmcsherry:

It is hard to diagnose a "stuck" timely dataflow computation, where for some reason there is a capability (or perhaps message) in the system that prevents forward progress. In the system there is fairly clear information (in the progress tracking) about which pointstamps have non-zero accumulation, and although perhaps not strictly speaking a "visualization" we could imagine extracting and presenting this information.
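As a rough illustration of "extracting and presenting" that information, the sketch below accumulates progress updates and lists the pointstamps whose net count is non-zero. The `Pointstamp` struct and the `outstanding` function are hypothetical stand-ins written with the standard library only; they are not timely's own types or API.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a pointstamp: a location in the dataflow graph
/// (operator index and port) paired with a timestamp. Not timely's own type.
#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
struct Pointstamp {
    operator: usize,
    port: usize,
    time: u64,
}

/// Sums the `(pointstamp, delta)` updates and returns the pointstamps whose
/// net accumulation is non-zero, in sorted order.
fn outstanding(updates: &[(Pointstamp, i64)]) -> Vec<(Pointstamp, i64)> {
    let mut counts: BTreeMap<Pointstamp, i64> = BTreeMap::new();
    for (pointstamp, delta) in updates {
        *counts.entry(pointstamp.clone()).or_insert(0) += delta;
    }
    // Anything left with a non-zero count is still holding the computation open.
    counts.into_iter().filter(|(_, count)| *count != 0).collect()
}
```

A held capability would show up here as a `+1` that is never matched by a `-1`, so its pointstamp stays in the returned list.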

@antiguru recently had a similar issue: he wanted to "complete" a dataflow without simply exiting the worker (to take some measurements), but when he attempted this the dataflow never reported completion. The root cause was ultimately a forgotten input that was left unclosed.

One idiom that seemed helpful here was to imagine a version of the dataflow graph that reports e.g. whether operators have been tombstoned or not (closed completely, memory reclaimed). This would reveal who was keeping a dataflow open, which is a rougher version of what is holding a dataflow back. We might also look for similar idioms that allow people to ask, for a given timestamp/frontier, which operators have moved past that frontier and which have not, revealing where in the dataflow graph a time is "stuck".
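A minimal sketch of both queries, assuming totally ordered `u64` timestamps and a per-operator snapshot that some logging layer would have to supply. `OperatorState`, `still_open`, and `stuck_at` are hypothetical names for illustration, not part of timely.

```rust
/// Hypothetical per-operator snapshot; not a type from timely itself.
#[derive(Debug, Clone, PartialEq)]
struct OperatorState {
    name: String,
    /// `None` models a tombstoned operator (closed completely).
    /// `Some(t)` means the operator may still act at times `>= t`.
    frontier: Option<u64>,
}

/// Operators that have not been tombstoned, i.e. that keep the dataflow open.
fn still_open(ops: &[OperatorState]) -> Vec<&str> {
    ops.iter()
        .filter(|op| op.frontier.is_some())
        .map(|op| op.name.as_str())
        .collect()
}

/// Operators that have not yet moved past `time`: their frontier is at or
/// before it. Tombstoned operators count as having moved past every time.
fn stuck_at(ops: &[OperatorState], time: u64) -> Vec<&str> {
    ops.iter()
        .filter(|op| matches!(op.frontier, Some(t) if t <= time))
        .map(|op| op.name.as_str())
        .collect()
}
```

With partially ordered timestamps, each frontier would be a set of times rather than a single value, and the comparison in `stuck_at` would need to check every element.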

@utaal added the enhancement label Jan 1, 2020
@quentusrex

Any more thoughts here?

@frankmcsherry
Member

The closest is an open issue in the timely repo for logging progress computation, which (ideally) would allow one to track where there remains outstanding work (usually that incriminates some operators). It is languishing a bit for lack of requirements (it was formed in support of a research project that wanted lots of information, but should we actually aim at minimizing the information to e.g. the frontier of available work?).
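To make "minimizing the information to the frontier" concrete, here is a sketch that reduces a set of outstanding times to its minimal elements. It assumes hypothetical two-coordinate timestamps (say, epoch and iteration) under the product partial order; `less_equal` and `frontier` are illustrative helpers, not timely functions.

```rust
/// Product partial order on hypothetical two-coordinate timestamps,
/// e.g. (epoch, iteration): `a <= b` only if both coordinates agree.
fn less_equal(a: (u64, u64), b: (u64, u64)) -> bool {
    a.0 <= b.0 && a.1 <= b.1
}

/// Reduces a set of outstanding times to its frontier: the minimal elements,
/// none of which is less than or equal to another.
fn frontier(times: &[(u64, u64)]) -> Vec<(u64, u64)> {
    let mut result: Vec<(u64, u64)> = Vec::new();
    for &t in times {
        // Skip `t` if something already in the frontier is <= it
        // (this also discards duplicates).
        if result.iter().any(|&f| less_equal(f, t)) {
            continue;
        }
        // `t` is minimal so far: drop every element it dominates, then keep it.
        result.retain(|&f| !less_equal(t, f));
        result.push(t);
    }
    result
}
```

Reporting only these minimal times keeps the output small while still identifying the earliest work that remains outstanding.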

@quentusrex

When I think about when I'd want a feature like this, two groups of situations come to mind: 1. a computation that does complete, or at least makes progress, but behaves unexpectedly (too fast or too slow) for the current data set; 2. a computation that, as the author mentions, just gets 'stuck' unexpectedly, usually when a larger input set is used than the development data set.

For the first group of cases, it has generally been a computation that has been through many development iterations and needs a refactor or a holistic review. Being able to visualize the computational flow, along with some info about the relative memory and CPU resources for each stage, would be both useful and actionable. In my experience the cause has often been an incorrect variable being used, where the variable names are too close to each other for the mistake to be obvious in a code review.

For the second case, I'm not sure what would be most helpful. Being able to see something about the available work seems like it would be actionable in tracking down the issue, but it would also help to see how much of the computation remains (if there are 10 stages, and only the operators for stages 2 and 3 are available, that would help narrow it down). I get the sense that seeing some approximation of the resources needed for each operator, based on some sample (or profiled) data set, would point out clearly where there is an unexpected order(s) of magnitude difference.
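One way the order-of-magnitude comparison might look, assuming per-operator numbers are already available from a profiled sample run (scaled to the full input) and from the real run. The `Measurement` struct and `outliers` function are hypothetical, and what the numbers count (records, bytes, CPU time) is left open.

```rust
/// Hypothetical per-operator measurement, e.g. records processed or bytes held.
struct Measurement {
    operator: &'static str,
    /// Estimate derived from a sample (or profiled) data set.
    expected: u64,
    /// Value seen in the run under investigation.
    observed: u64,
}

/// Names of operators whose observed value differs from the estimate by at
/// least `factor` in either direction (use `factor = 10` for one order of
/// magnitude).
fn outliers(measurements: &[Measurement], factor: u64) -> Vec<&'static str> {
    measurements
        .iter()
        .filter(|m| {
            // Exact matches are never outliers; this also covers zero == zero.
            m.observed != m.expected
                && (m.observed >= m.expected.saturating_mul(factor)
                    || m.expected >= m.observed.saturating_mul(factor))
        })
        .map(|m| m.operator)
        .collect()
}
```

Flagging both directions matters here: an operator that sees far less data than estimated can be as informative as one that sees far more.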

Not sure if the above are possible.


3 participants