Add aggregation and display of metrics obtained from Spark #293
Conversation
This change adds a new command-line option called "print_metrics", which causes a listener to be registered with Spark; the listener accumulates metrics as each task and stage ends. When a command completes, the metrics are logged to stdout.
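For readers less familiar with Spark's listener API, a rough sketch of this pattern is shown below; the class and field names are illustrative, not the ones used in this PR.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.scheduler.{SparkListener, SparkListenerStageCompleted, SparkListenerTaskEnd}

// Illustrative only: a listener that runs on the driver and accumulates
// metrics as task-end and stage-completed events arrive from the cluster.
class MetricsListener extends SparkListener {

  var taskCount = 0L
  var totalExecutorRunTime = 0L // milliseconds

  override def onTaskEnd(taskEnd: SparkListenerTaskEnd) {
    // taskMetrics can be null for failed tasks
    Option(taskEnd.taskMetrics).foreach { metrics =>
      taskCount += 1
      totalExecutorRunTime += metrics.executorRunTime
    }
  }

  override def onStageCompleted(stageCompleted: SparkListenerStageCompleted) {
    // A per-stage hook is also available; stage-level metrics could be recorded here.
  }
}

// Registration, gated on the new command-line flag described above (hypothetical variable name):
// if (printMetrics) sc.addSparkListener(new MetricsListener)
```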
Can one of the admins verify this patch? |
Jenkins, add to whitelist and test this please. |
Can one of the admins verify this patch? |
Jenkins, add to whitelist. |
Can one of the admins verify this patch? |
add to whitelist |
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/ADAM-prb/71/ |
@nfergu this looks great! It looks like Jenkins is complaining about the source formatting; you should be able to fix that on your end by running
Sounds reasonable!
I'd nominally prefer it to be logged via the Spark Logger (org.apache.spark.Logging). Logging to stdout can be a bit of a hassle when running distributed. The text will still get logged, but you then need to fetch the output files from the distributed hosts.
Normally, we prefer not to include binary files in the repo (the JAR in this case); however, I'm not sure what the best practice is here. Thoughts, @massie @heuermh? |
Thanks for the comments @fnothaft. Seems I was reading an old version of the contributing docs and hadn't run format-source. Should be fixed in the latest commit. I'll change the logging to use the Spark logger now. |
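For reference, switching to the Spark logger is a small change; a minimal sketch, assuming the Spark 1.x org.apache.spark.Logging trait mentioned above (the class name here is hypothetical):

```scala
import org.apache.spark.Logging

// Sketch only: mixing in Spark's Logging trait provides a log4j-backed logger
// named after the class, so metrics output follows the normal Spark log
// configuration instead of going straight to stdout.
class MetricsPrinter extends Logging {
  def printSummary(summary: String) {
    logInfo("Overall metrics:\n" + summary)
  }
}
```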
All automated tests passed. |
@@ -154,6 +154,14 @@
      <groupId>org.apache.httpcomponents</groupId>
      <artifactId>httpclient</artifactId>
    </dependency>
    <dependency>
      <groupId>com.netflix.servo</groupId>
      <artifactId>servo-core</artifactId>
What is the license for the servo-core and btc-ascii-table dependencies?
Both are Apache 2.
w00t
Any reason not to use Coda Hale's Metrics library? Asking because Spark chose that library for their monitoring.
@carlyeks -- there's a discussion about which metrics library to use on the mailing list here: https://groups.google.com/forum/#!topic/adam-developers/52tqcKD1IdU. However, the main reasons for not using Coda Hale's Metrics are that a) it doesn't provide access to the raw data, so it's not possible to aggregate data across a cluster, and b) it doesn't provide total times for metrics.
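To make the first point concrete, keeping raw counts and durations (rather than pre-aggregated rates or percentiles) means measurements taken on different nodes can simply be summed, and a total time falls out for free; a hypothetical sketch:

```scala
// Hypothetical sketch: raw measurements merge cleanly across a cluster and
// still yield both a total and a mean, which pre-computed snapshots
// (rates, percentiles) generally do not.
case class Timing(count: Long, totalNanos: Long) {
  def +(other: Timing): Timing = Timing(count + other.count, totalNanos + other.totalNanos)
  def meanNanos: Double = if (count == 0) 0.0 else totalNanos.toDouble / count
}

object Timing {
  def combine(timings: Seq[Timing]): Timing = timings.foldLeft(Timing(0, 0))(_ + _)
}
```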
@nfergu this looks great! I've left a few small comments inline. |
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/ADAM-prb/73/ |
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/ADAM-prb/74/ |
Just applying code review comments, but have to go away for a while. Will push them later today. |
@nfergu no problem; take your time, and thanks for going through the comments! There may be a few more folks who take a look tomorrow. |
The Hadoop 2.3.0 build gives: "FATAL: Could not checkout null with start point 5f5ceb1efb1f20430cf6bc9ef0505bb9b7505c34", which is a bit mysterious. Does anyone know why this might happen? |
Just pushed changes based on @fnothaft's comments. I'm hoping that this will trigger a new build in Jenkins and fix the mysterious build issue. |
All automated tests passed. |
Neil, it looks like this PR includes a JAR file -- is that intentional, or accidental? |
@tdanford That's intentional:
|
Ah, I missed that bullet point above -- sorry. How crucial is this table formatting, that we want to statically include a JAR? Any chance that just outputting tab-separated lines would do the trick too? |
We have been trying to avoid checking in binary files to the repo since git doesn't handle them well. Even if we remove it later, it's still in our history. Could we just remove the dependency and create the output ourselves? |
Also (and this comment is directed mostly at @massie I guess) if we're adding new dependencies with each PR, we should probably be monitoring how close we're coming to the magic "64k file" limit on the zip archive, right? |
It may make more sense to cut over to appassembler, and to stop worrying about the 64k limit (I'm doing this already with a downstream app). We just need to make sure we can easily use appassembler across a cluster, if we're going to do that. Essentially, worrying about the 64k file limit is like sucking in our bellies; it's not a long term solution... |
Agreed, but since we haven't done the cut-over yet... |
I've removed the JAR file, but don't want to push at the moment as ADAM doesn't build for me. Jenkins is complaining as well (https://amplab.cs.berkeley.edu/jenkins/job/ADAM/49/), so it's not just me! From a quick look it seems a change in the bdg-formats repository broke it (removal of the group ID from ADAMRecord). As a side note regarding the 64K file limit on JARs, I think this has gone away in Java 7. Apparently Java 7 supports zip64: https://blogs.oracle.com/xuemingshen/entry/zip64_support_for_4g_zipfile |
@nfergu Go ahead and push the changes. I've reverted the commit that broke the build in bdg-formats. Thanks for your contribution, Neil. Good stuff. |
OK, latest changes pushed (removal of the ASCII table JAR file and addition of a simple class to render ASCII tables). |
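For anyone curious what replacing the JAR involves, a dependency-free ASCII table can be rendered in a handful of lines; a rough sketch, not the actual class added in this commit:

```scala
// Rough sketch of a dependency-free ASCII table renderer; the class added
// in this PR may differ in naming and layout details.
object AsciiTable {
  def render(header: Seq[String], rows: Seq[Seq[String]]): String = {
    // Each column is as wide as its widest cell (header included)
    val widths = (header +: rows).transpose.map(col => col.map(_.length).max)
    def formatRow(row: Seq[String]): String =
      row.zip(widths).map { case (cell, w) => cell.padTo(w, ' ') }.mkString("| ", " | ", " |")
    val separator = widths.map("-" * _).mkString("+-", "-+-", "-+")
    (Seq(separator, formatRow(header), separator) ++ rows.map(formatRow) :+ separator).mkString("\n")
  }
}

// Example usage:
// println(AsciiTable.render(Seq("Metric", "Total"), Seq(Seq("Task time", "1234 ms"))))
```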
All automated tests passed. |
Add aggregation and display of metrics obtained from Spark
Merged! Awesome work @nfergu! |
@@ -278,6 +278,11 @@
      <id>hadoop-bam</id>
      <url>http://hadoop-bam.sourceforge.net/maven/</url>
    </repository>
    <repository>
This can be removed now, correct?
Ah, yes.
Exemplary open-source work, Neil! Thanks for the contribution. |