Create system so that users can measure the performance of their task #215

nathanielc · 2016-02-05T17:18:34Z

Beyond benchmarks of static tasks, we need to make it easy for end users to directly measure the performance of their own tasks so they can provision resources appropriately.

nhproject · 2016-02-05T18:55:54Z

+1
Such a system would be fantastic!

qindj · 2016-02-06T03:50:29Z

+1

nathanielc · 2016-02-11T17:26:15Z

@nhproject @qindj This is my current thinking: calculate the throughput per second for each node and add it both as an internal stat and to the output of the show command.

Currently the output of the show command displays the node pipeline with counts for how many points have passed along each edge. Throughput would be added to each node and would look like this:

digraph cpu_alert {
stream0 [label="stream0 2.4/s"];
stream1 [label="stream1 1.4/s"];
stream0 -> stream1 [label="12"];
alert2 [label="alert2 1.4/s"];
stream1 -> alert2 [label="12"];
}

Which is valid dot syntax but is starting to get hard to read. It produces this diagram:

![Alt text](http://g.gravizo.com/g?
digraph cpu_alert {
graph [rankdir=LR];
stream0 [label="stream0 2.4/s"];
stream1 [label="stream1 1.4/s"];
stream0 -> stream1 [label="12"];
alert2 [label="alert2 1.4/s"];
stream1 -> alert2 [label="12"];
})

Thoughts? Do you only care about the throughput of the root of the pipeline or each node?
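To make the per-node throughput idea concrete, here is a minimal Python sketch. It assumes each node keeps a simple count of processed points and that throughput is that count divided by elapsed time. `NodeStat` and `render_dot` are hypothetical names for illustration, not Kapacitor's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class NodeStat:
    """Hypothetical per-node counter; not Kapacitor's actual data structure."""
    name: str
    points: int = 0  # points this node has processed so far

    def throughput(self, elapsed_seconds: float) -> float:
        """Average points per second over the elapsed window."""
        if elapsed_seconds <= 0:
            return 0.0
        return self.points / elapsed_seconds


def render_dot(task, nodes, edges, elapsed_seconds):
    """Render the pipeline as DOT, labelling each node with its throughput."""
    lines = [f"digraph {task} {{"]
    for node in nodes:
        rate = node.throughput(elapsed_seconds)
        lines.append(f'{node.name} [label="{node.name} {rate:.1f}/s"];')
    for src, dst, count in edges:
        # Edge labels keep the existing "points passed along the edge" count.
        lines.append(f'{src} -> {dst} [label="{count}"];')
    lines.append("}")
    return "\n".join(lines)


nodes = [NodeStat("stream0", 24), NodeStat("stream1", 14), NodeStat("alert2", 14)]
edges = [("stream0", "stream1", 12), ("stream1", "alert2", 12)]
dot = render_dot("cpu_alert", nodes, edges, elapsed_seconds=10.0)
print(dot)
```

With the made-up counts above over a 10 second window, the node labels come out as `stream0 2.4/s`, `stream1 1.4/s`, and `alert2 1.4/s`, matching the example graph.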

nathanielc · 2016-02-11T18:29:08Z

Another thought is to compute a single throughput for the entire task and then compute average execution times for each node. Then it is apparent which node is a bottleneck in the DAG.

Something like this:

digraph cpu_alert {
stream0 [label="stream0 105ns"];
stream1 [label="stream1 50ns"];
stream0 -> stream1 [label="12"];
alert2 [label="alert2 200ns"];
stream1 -> alert2 [label="12"];
}

![Alt text](http://g.gravizo.com/g?
digraph cpu_alert {
graph [rankdir=LR];
stream0 [label="stream0 105ns"];
stream1 [label="stream1 50ns"];
stream0 -> stream1 [label="12"];
alert2 [label="alert2 200ns"];
stream1 -> alert2 [label="12"];
})
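As a rough illustration of the average execution time approach, the Python sketch below accumulates the time spent in each node and reports the node with the highest average as the likely bottleneck. `NodeTimer` and its methods are hypothetical helpers, and the timings in the usage example are made up.

```python
import time
from collections import defaultdict


class NodeTimer:
    """Hypothetical helper that accumulates per-node execution time."""

    def __init__(self):
        self.total_ns = defaultdict(int)  # summed execution time per node
        self.calls = defaultdict(int)     # number of recorded executions per node

    def record(self, node, elapsed_ns):
        self.total_ns[node] += elapsed_ns
        self.calls[node] += 1

    def timed(self, node, fn, *args):
        """Run fn(*args), recording how long it took under the given node."""
        start = time.perf_counter_ns()
        try:
            return fn(*args)
        finally:
            self.record(node, time.perf_counter_ns() - start)

    def average_ns(self, node):
        calls = self.calls.get(node, 0)
        return self.total_ns.get(node, 0) / calls if calls else 0.0

    def bottleneck(self):
        """Return the node with the highest average execution time, or None."""
        if not self.calls:
            return None
        return max(self.calls, key=self.average_ns)


timer = NodeTimer()
for node, ns in [("stream0", 100), ("stream0", 110), ("stream1", 50), ("alert2", 200)]:
    timer.record(node, ns)
print(timer.average_ns("stream0"))  # 105.0
print(timer.bottleneck())           # alert2
```

In a real pipeline the `timed` wrapper would sit around each node's per-point work, so the averages reflect actual processing cost rather than recorded sample values.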

panda87 · 2016-02-11T21:46:04Z

@nathanielc just a question: will there be a difference in the computed average execution times if I have one tick file versus 100?

nathanielc · 2016-02-11T21:48:44Z

@panda87 There should not be a difference whether you are running 1 task or 100s, but if you are hitting resource limits on your box you might see one. The important thing here is to expose the right actionable information so that if having multiple tasks does slow things down you will know it and be able to take appropriate action.

panda87 · 2016-02-12T07:04:55Z

But if I have 100 tick files, every data point received will go through each one, right?
I mean that if I get 100 data points per second, they will be processed 100 * 100 times, so my resources will have to handle 100 * 100 computations per second, which is more like having 10,000 points per second and one tick file. Is that true?

nhproject · 2016-02-13T18:27:23Z

@nathanielc I think that getting the average execution time for each node (plus throughput for the entire task) is more informative.
In general, knowing about a bottleneck or about a node that is executing relatively slowly is important.
Both ideas seem great btw. Thanks!

panda87 · 2016-02-13T20:17:34Z

👍 @nhproject very important feedback

yosiat · 2016-02-14T18:37:07Z

@nathanielc It would be nice to have both "points processed" and "points processed per second", something like:

digraph cpu_alert {
stream0 [processed="105",per_second="10"];
}

So it will be parseable and you can add more metrics later if you want.
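To show why separate attributes are easier to consume than a combined label, here is a small Python sketch that pulls per-node stats out of output shaped like the example above. It is a line-based regex reader for this exact layout, not a full DOT parser, and `parse_node_stats` is a hypothetical name.

```python
import re

# Matches node statements such as: stream0 [processed="105",per_second="10"];
NODE_RE = re.compile(r'^\s*(\w+)\s*\[(.*)\];?\s*$')
# Matches individual key="value" attribute pairs inside the brackets.
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_node_stats(dot_text):
    """Return {node: {attr: value}} for node statements; other lines are skipped."""
    stats = {}
    for line in dot_text.splitlines():
        match = NODE_RE.match(line)
        if match is None:
            continue  # graph header, closing brace, or an edge statement
        name, attrs = match.groups()
        stats[name] = dict(ATTR_RE.findall(attrs))
    return stats


example = """digraph cpu_alert {
stream0 [processed="105",per_second="10"];
stream0 -> stream1 [label="12"];
}"""
print(parse_node_stats(example))
# {'stream0': {'processed': '105', 'per_second': '10'}}
```

Adding a new metric later only adds another key to each node's dictionary, so existing consumers keep working.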

nathanielc · 2016-02-24T00:30:31Z

@nhproject @qindj @panda87 @yosiat I feel like #248 is ready to go. I plan to merge it tomorrow, any last comments?

nathanielc added this to the v0.11 milestone Feb 5, 2016

This was referenced Feb 19, 2016

Add performance metrics to tasks and node #248

Merged

Add buffering to InfluxDBOutNode #250

Closed

nathanielc closed this as completed Feb 22, 2016

nathanielc reopened this Feb 22, 2016

nathanielc closed this as completed in #248 Feb 24, 2016
