Add aggregation and display of metrics obtained from Spark #293

nfergu · 2014-07-06T14:51:08Z

This change adds an new command-line option called "print_metrics" which causes a listener to be registered with Spark, which accumulates metrics when each task and stage end. When a command is complete the metrics are logged to stdout.

Some notes:

The "print_metrics" option is added in the Args4jBase class. I thought about adding it in SparkArgs, but the intention is that it can be used for non Spark-related metrics in the future.
The metrics are currently logged to stdout. They could be logged to SLF4J instead if that's better -- let me know and I'll change it if necessary.
I've added a dependency on this library: http://code.google.com/p/java-ascii-table/ for printing ascii tables (Apache-licensed). It's not on any public maven repos, so I've added a local maven repo at /repo which contains the JAR file and POM. This seems to work fine both with the command line and Intellij, but let me know if there are any issues.

This change adds an new command-line option called "print_metrics" which causes a listener to be registered with Spark, which accumulates metrics when each task and stage end. When a command is complete the metrics are logged to stdout.

AmplabJenkins · 2014-07-06T14:51:53Z

Can one of the admins verify this patch?

fnothaft · 2014-07-06T14:52:07Z

Jenkins, add to whitelist and test this please.

AmplabJenkins · 2014-07-06T14:54:53Z

Can one of the admins verify this patch?

fnothaft · 2014-07-06T14:55:15Z

Jenkins, add to whitelist.

AmplabJenkins · 2014-07-06T14:55:54Z

Can one of the admins verify this patch?

fnothaft · 2014-07-06T14:56:53Z

add to whitelist

AmplabJenkins · 2014-07-06T15:04:04Z

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/ADAM-prb/71/

fnothaft · 2014-07-06T15:09:49Z

@nfergu this looks great! It looks like Jenkins is complaining about the source formatting; you should be able to fix that on your end by running ./scripts/format-source, and pushing an update.

The "print_metrics" option is added in the Args4jBase class. I thought about adding it in SparkArgs, but the intention is that it can be used for non Spark-related metrics in the future.

Sounds reasonable!

The metrics are currently logged to stdout. They could be logged to SLF4J instead if that's better -- let me know and I'll change it if necessary.

I'd nominally prefer it to be logged via the Spark Logger (org.apache.spark.Logging). Logging to stdout can be a bit of a hassle when running distributed. The text will still get logged, but you then need to fetch the output files from the distributed hosts.

I've added a dependency on this library: http://code.google.com/p/java-ascii-table/ for printing ascii tables (Apache-licensed). It's not on any public maven repos, so I've added a local maven repo at /repo which contains the JAR file and POM. This seems to work fine both with the command line and Intellij, but let me know if there are any issues.

Normally, we prefer not to include binary files in the repo (the JAR in this case), however, I'm not sure what's the best practice for this. Thoughts, @massie @heuermh?

498d837)

nfergu · 2014-07-06T15:24:27Z

Thanks for the comments @fnothaft.

Seems I was reading an old version of the contributing docs and hadn't run format-source. Should be fixed in the latest commit.

I'll change the logging to use the Spark logger now.

AmplabJenkins · 2014-07-06T15:29:16Z

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/ADAM-prb/72/

massie · 2014-07-06T15:32:38Z

adam-core/pom.xml

@@ -154,6 +154,14 @@
            <groupId>org.apache.httpcomponents</groupId>
            <artifactId>httpclient</artifactId>
        </dependency>
+        <dependency>
+            <groupId>com.netflix.servo</groupId>
+            <artifactId>servo-core</artifactId>


What is the license for the servo-core and btc-ascii-table dependencies?

Both are Apache 2.

Any reason not to use Coda Hale's Metrics library? Asking because Spark chose that library for their monitoring.

@carlyeks -- there's a discussion about which metrics library to use on the mailing list here: https://groups.google.com/forum/#!topic/adam-developers/52tqcKD1IdU. However, the main reasons for not using Coda Hale's Metrics are that a) it doesn't provide access to the raw data, so it's not possible to aggregate data across a cluster, b) it doesn't provide total time for metrics.

fnothaft · 2014-07-06T15:47:18Z

@nfergu this looks great! I've left a few small comments inline.

AmplabJenkins · 2014-07-06T16:02:02Z

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/ADAM-prb/73/

AmplabJenkins · 2014-07-06T16:09:47Z

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/ADAM-prb/74/

nfergu · 2014-07-06T16:20:54Z

Just applying code review comments, but have to go away for a while. Will push them later today.

fnothaft · 2014-07-06T16:22:22Z

@nfergu no problem; take your time, and thanks for going through the comments! There may be a few more folks who take a look tomorrow.

nfergu · 2014-07-06T17:14:20Z

The Hadoop 2.3.0 build gives:

FATAL: Could not checkout null with start point 5f5ceb1efb1f20430cf6bc9ef0505bb9b7505c34
hudson.plugins.git.GitException: Could not checkout null with start point 5f5ceb1efb1f20430cf6bc9ef0505bb9b7505c34

Which is a bit mysterious. Anyone know why this might happen?

…rics obtained from Spark)

nfergu · 2014-07-07T17:40:29Z

Just pushed changes based on @fnothaft's comments. I'm hoping that this will trigger a new build in Jenkins and fix the mysterious build issue.

AmplabJenkins · 2014-07-07T17:49:28Z

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/ADAM-prb/75/

tdanford · 2014-07-07T18:58:01Z

Neil, it looks like this PR includes a JAR file -- is that intentional, or accidental?

fnothaft · 2014-07-07T18:58:38Z

@tdanford That's intentional:

I've added a dependency on this library: http://code.google.com/p/java-ascii-table/ for printing ascii tables (Apache-licensed). It's not on any public maven repos, so I've added a local maven repo at /repo which contains the JAR file and POM. This seems to work fine both with the command line and Intellij, but let me know if there are any issues.

tdanford · 2014-07-07T19:00:00Z

Ah, I missed that bullet point above -- sorry.

How crucial is this table formatting, that we want to statically include a JAR? Any chance that just outputting tab-separated lines would do the trick too?

massie · 2014-07-07T21:46:58Z

We have been trying to avoid checking in binary files to the repo since git doesn't handle them well. Even if we remove it later, it's still in our history.

Could we just remove the dependency and create the output ourselves?

nfergu · 2014-07-07T22:26:06Z

@tdanford @massie -- I can certainly write something to format the output rather than using the library. Not sure that simple tab-separated lines will be readable enough, but shouldn't be too difficult.

tdanford · 2014-07-08T10:23:55Z

Also (and this comment is directed mostly at @massie I guess) if we're adding new dependencies with each PR, we should probably be monitoring how close we're coming to the magic "64k file" limit on the zip archive, right?

fnothaft · 2014-07-08T12:16:59Z

Also (and this comment is directed mostly at @massie I guess) if we're adding new dependencies with each PR, we should probably be monitoring how close we're coming to the magic "64k file" limit on the zip archive, right?

It may make more sense to cut over to appassembler, and to stop worrying about the 64k limit (I'm doing this already with a downstream app). We just need to make sure we can easily use appassembler across a cluster, if we're going to do that.

Essentially, worrying about the 64k file limit is like sucking in our bellies; it's not a long term solution...

tdanford · 2014-07-08T13:06:57Z

Essentially, worrying about the 64k file limit is like sucking in our bellies; it's not a long term solution...

Agreed, but since we haven't done the cut-over yet...

nfergu · 2014-07-08T22:43:17Z

I've removed the JAR file, but don't want to push at the moment as ADAM doesn't build for me. Jenkins is complaining as well (https://amplab.cs.berkeley.edu/jenkins/job/ADAM/49/) so it's not just me! From a quick look it looks like a change bdg-formats repository broke it (removal of the group ID from ADAMRecord).

As a side note regarding the 64K file limit on JARs, I think this has gone away in Java 7. Apparently Java 7 supports zip64: https://blogs.oracle.com/xuemingshen/entry/zip64_support_for_4g_zipfile

massie · 2014-07-08T23:34:22Z

@nfergu Go ahead and push the changes. I've reverted the commit that broke the build in bdg-formats.

Thanks for your contribution, Neil. Good stuff.

nfergu · 2014-07-09T06:56:51Z

OK, latest changes pushed (removal of ascii table JAR file and addition of a simple class to render ascii tables)

AmplabJenkins · 2014-07-09T07:04:36Z

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/ADAM-prb/77/

Add aggregation and display of metrics obtained from Spark

fnothaft · 2014-07-09T15:35:07Z

Merged! Awesome work @nfergu!

massie · 2014-07-09T15:42:15Z

pom.xml

@@ -278,6 +278,11 @@
            <id>hadoop-bam</id>
            <url>http://hadoop-bam.sourceforge.net/maven/</url>
        </repository>
+        <repository>


This can be removed now correct?

massie · 2014-07-09T15:50:20Z

Exemplary open-source work, Neil! Thanks for the contribution.

Running ./scripts/format-source on files checked-in by previous commit (

79797fb

498d837)

massie reviewed Jul 6, 2014
View reviewed changes

nfergu added 2 commits July 6, 2014 16:48

Logging metrics to the Spark logger instead of stdout

587809d

Set a larger output buffer for printing metrics

2ec39c0

Changes based on review of PR 292 (Add aggregation and display of met…

d31e641

…rics obtained from Spark)

Removed btc ascii table library and added class to render ascii tables

5bbf42c

fnothaft added a commit that referenced this pull request Jul 9, 2014

Merge pull request #293 from nfergu/spark-metrics

9c31cda

Add aggregation and display of metrics obtained from Spark

fnothaft merged commit 9c31cda into bigdatagenomics:master Jul 9, 2014

massie reviewed Jul 9, 2014
View reviewed changes

Add aggregation and display of metrics obtained from Spark #293

Add aggregation and display of metrics obtained from Spark #293

Conversation

nfergu commented Jul 6, 2014

AmplabJenkins commented Jul 6, 2014

fnothaft commented Jul 6, 2014

AmplabJenkins commented Jul 6, 2014

fnothaft commented Jul 6, 2014

AmplabJenkins commented Jul 6, 2014

fnothaft commented Jul 6, 2014

AmplabJenkins commented Jul 6, 2014

fnothaft commented Jul 6, 2014

nfergu commented Jul 6, 2014

AmplabJenkins commented Jul 6, 2014

massie Jul 6, 2014

Choose a reason for hiding this comment

fnothaft Jul 6, 2014

Choose a reason for hiding this comment

massie Jul 7, 2014

Choose a reason for hiding this comment

carlyeks Jul 7, 2014

Choose a reason for hiding this comment

nfergu Jul 7, 2014

Choose a reason for hiding this comment

fnothaft commented Jul 6, 2014

AmplabJenkins commented Jul 6, 2014

AmplabJenkins commented Jul 6, 2014

nfergu commented Jul 6, 2014

fnothaft commented Jul 6, 2014

nfergu commented Jul 6, 2014

nfergu commented Jul 7, 2014

AmplabJenkins commented Jul 7, 2014

tdanford commented Jul 7, 2014

fnothaft commented Jul 7, 2014

tdanford commented Jul 7, 2014

massie commented Jul 7, 2014

nfergu commented Jul 7, 2014

tdanford commented Jul 8, 2014

fnothaft commented Jul 8, 2014

tdanford commented Jul 8, 2014

nfergu commented Jul 8, 2014

massie commented Jul 8, 2014

nfergu commented Jul 9, 2014

AmplabJenkins commented Jul 9, 2014

fnothaft commented Jul 9, 2014

massie Jul 9, 2014

Choose a reason for hiding this comment

fnothaft Jul 9, 2014

Choose a reason for hiding this comment

massie commented Jul 9, 2014