

Create system so that users can measure the performance of their task #215

Closed
nathanielc opened this issue Feb 5, 2016 · 11 comments

@nathanielc
Contributor

Beyond benchmarks of static tasks, we need to make it easy for end users to directly measure the performance of their own tasks so they can provision resources appropriately.

@nathanielc nathanielc added this to the v0.11 milestone Feb 5, 2016
@nhproject

+1
Such a system would be fantastic!

@qindj

qindj commented Feb 6, 2016

+1

@nathanielc
Contributor Author

@nhproject @qindj Here is my current thinking: calculate the throughput per second for each node and expose it both as an internal stat and in the output of the `show` command.

Currently the output of the `show` command displays the node pipeline, with counts of how many points have passed along each edge. Throughput would be shown on each node and would look like this:

```dot
digraph cpu_alert {
stream0 [label="stream0 2.4/s"];
stream1 [label="stream1 1.4/s"];
stream0 -> stream1 [label="12"];
alert2 [label="alert2 1.4/s"];
stream1 -> alert2 [label="12"];
}
```

This is valid DOT syntax, but it is starting to get hard to read. It produces this diagram:

![Alt text](http://g.gravizo.com/g?
digraph cpu_alert {
graph [rankdir=LR];
stream0 [label="stream0 2.4/s"];
stream1 [label="stream1 1.4/s"];
stream0 -> stream1 [label="12"];
alert2 [label="alert2 1.4/s"];
stream1 -> alert2 [label="12"];
})

Thoughts? Do you only care about the throughput of the root of the pipeline, or of each node?

@nathanielc
Contributor Author

Another option is to compute a single throughput for the entire task, plus the average execution time for each node. That makes it apparent which node is the bottleneck in the DAG.

Something like this:

```dot
digraph cpu_alert {
stream0 [label="stream0 105ns"];
stream1 [label="stream1 50ns"];
stream0 -> stream1 [label="12"];
alert2 [label="alert2 200ns"];
stream1 -> alert2 [label="12"];
}
```

![Alt text](http://g.gravizo.com/g?
digraph cpu_alert {
graph [rankdir=LR];
stream0 [label="stream0 105ns"];
stream1 [label="stream1 50ns"];
stream0 -> stream1 [label="12"];
alert2 [label="alert2 200ns"];
stream1 -> alert2 [label="12"];
})

@panda87

panda87 commented Feb 11, 2016

@nathanielc Just a question: will the computed average execution times differ if I have one tick file versus 100?

@nathanielc
Contributor Author

@panda87 There should not be a difference whether you are running 1 task or 100s, but if you are hitting resource limits on your box you might see one. The important thing here is to expose the right actionable information so that, if having multiple tasks does slow things down, you will know it and be able to take appropriate action.

@panda87

panda87 commented Feb 12, 2016

But if I have 100 tick files, every data point received will go through each one, right?
I mean that if I get 100 data points per second, there will be 100 * 100 computations, so my resources will have to handle 10,000 computations per second, which is more like receiving 10,000 points per second with a single tick file. Is that true?
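Worked out under the assumption that every one of the 100 tasks consumes every incoming point and that the tasks have similar pipelines (in practice this depends on how each task's stream is filtered), the load in this question is:

$$
100\ \tfrac{\text{points}}{\text{s}} \times 100\ \text{tasks} = 10{,}000\ \tfrac{\text{point evaluations}}{\text{s}}
$$

which is the same number of per-point evaluations as one such task receiving 10,000 points per second.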

@nhproject

@nathanielc I think that getting the average execution time of each node (plus throughput for the entire task) is more informative.
In general, knowing about a bottleneck, or about a node that executes relatively slowly, is important.
Both ideas seem great, by the way. Thanks!

@panda87

panda87 commented Feb 13, 2016

👍 @nhproject very important feedback

@yosiat
Contributor

yosiat commented Feb 14, 2016

@nathanielc It would be nice to have both "points processed" and "points processed per second", something like:

```dot
digraph cpu_alert {
stream0 [processed="105",per_second="10"];
}
```

That way it stays parseable, and you can add more metrics later if you want.
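To illustrate why separate attributes are easier to consume than a combined label, here is a small Go sketch that renders and parses node statements in the proposed style. `Attr`, `RenderNode`, and `ParseAttrs` are hypothetical helpers; the regex-based parser only handles simple `key="value"` pairs without escaped quotes and is not a full DOT parser.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Attr is one DOT attribute; values are assumed not to contain quotes.
type Attr struct {
	Key, Value string
}

// RenderNode writes a DOT node statement, preserving attribute order, e.g.
// stream0 [processed="105",per_second="10"];
func RenderNode(name string, attrs []Attr) string {
	parts := make([]string, len(attrs))
	for i, a := range attrs {
		parts[i] = a.Key + `="` + a.Value + `"`
	}
	return name + " [" + strings.Join(parts, ",") + "];"
}

// attrRe matches simple key="value" pairs (no escaped quotes).
var attrRe = regexp.MustCompile(`(\w+)="([^"]*)"`)

// ParseAttrs extracts the key="value" pairs from a node statement.
func ParseAttrs(line string) map[string]string {
	out := make(map[string]string)
	for _, m := range attrRe.FindAllStringSubmatch(line, -1) {
		out[m[1]] = m[2]
	}
	return out
}

func main() {
	line := RenderNode("stream0", []Attr{{"processed", "105"}, {"per_second", "10"}})
	fmt.Println(line)
	fmt.Println(ParseAttrs(line)["per_second"])
}
```

Adding a new metric later only means appending another `Attr`; existing consumers that look up keys by name keep working.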

@nathanielc
Copy link
Contributor Author

@nhproject @qindj @panda87 @yosiat I feel like #248 is ready to go. I plan to merge it tomorrow; any last comments?


5 participants