Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add aggregation and display of metrics obtained from Spark #293

Merged
merged 6 commits into from
Jul 9, 2014

Conversation

nfergu
Copy link
Contributor

@nfergu nfergu commented Jul 6, 2014

This change adds an new command-line option called "print_metrics" which causes a listener to be registered with Spark, which accumulates metrics when each task and stage end. When a command is complete the metrics are logged to stdout.

Some notes:

  • The "print_metrics" option is added in the Args4jBase class. I thought about adding it in SparkArgs, but the intention is that it can be used for non Spark-related metrics in the future.
  • The metrics are currently logged to stdout. They could be logged to SLF4J instead if that's better -- let me know and I'll change it if necessary.
  • I've added a dependency on this library: http://code.google.com/p/java-ascii-table/ for printing ascii tables (Apache-licensed). It's not on any public maven repos, so I've added a local maven repo at /repo which contains the JAR file and POM. This seems to work fine both with the command line and Intellij, but let me know if there are any issues.

This change adds an new command-line option called "print_metrics" which causes a listener to be registered with Spark, which accumulates metrics when each task and stage end. When a command is complete the metrics are logged to stdout.
@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@fnothaft
Copy link
Member

fnothaft commented Jul 6, 2014

Jenkins, add to whitelist and test this please.

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@fnothaft
Copy link
Member

fnothaft commented Jul 6, 2014

Jenkins, add to whitelist.

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@fnothaft
Copy link
Member

fnothaft commented Jul 6, 2014

add to whitelist

@AmplabJenkins
Copy link

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/ADAM-prb/71/

@fnothaft
Copy link
Member

fnothaft commented Jul 6, 2014

@nfergu this looks great! It looks like Jenkins is complaining about the source formatting; you should be able to fix that on your end by running ./scripts/format-source, and pushing an update.

  • The "print_metrics" option is added in the Args4jBase class. I thought about adding it in SparkArgs, but the intention is that it can be used for non Spark-related metrics in the future.

Sounds reasonable!

  • The metrics are currently logged to stdout. They could be logged to SLF4J instead if that's better -- let me know and I'll change it if necessary.

I'd nominally prefer it to be logged via the Spark Logger (org.apache.spark.Logging). Logging to stdout can be a bit of a hassle when running distributed. The text will still get logged, but you then need to fetch the output files from the distributed hosts.

  • I've added a dependency on this library: http://code.google.com/p/java-ascii-table/ for printing ascii tables (Apache-licensed). It's not on any public maven repos, so I've added a local maven repo at /repo which contains the JAR file and POM. This seems to work fine both with the command line and Intellij, but let me know if there are any issues.

Normally, we prefer not to include binary files in the repo (the JAR in this case), however, I'm not sure what's the best practice for this. Thoughts, @massie @heuermh?

@nfergu
Copy link
Contributor Author

nfergu commented Jul 6, 2014

Thanks for the comments @fnothaft.

Seems I was reading an old version of the contributing docs and hadn't run format-source. Should be fixed in the latest commit.

I'll change the logging to use the Spark logger now.

@AmplabJenkins
Copy link

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/ADAM-prb/72/

@@ -154,6 +154,14 @@
<groupId>org.apache.httpcomponents</groupId>
<artifactId>httpclient</artifactId>
</dependency>
<dependency>
<groupId>com.netflix.servo</groupId>
<artifactId>servo-core</artifactId>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the license for the servo-core and btc-ascii-table dependencies?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both are Apache 2.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

w00t

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason not to use Coda Hale's Metrics library? Asking because Spark chose that library for their monitoring.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@carlyeks -- there's a discussion about which metrics library to use on the mailing list here: https://groups.google.com/forum/#!topic/adam-developers/52tqcKD1IdU. However, the main reasons for not using Coda Hale's Metrics are that a) it doesn't provide access to the raw data, so it's not possible to aggregate data across a cluster, b) it doesn't provide total time for metrics.

@fnothaft
Copy link
Member

fnothaft commented Jul 6, 2014

@nfergu this looks great! I've left a few small comments inline.

@AmplabJenkins
Copy link

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/ADAM-prb/73/

@AmplabJenkins
Copy link

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/ADAM-prb/74/

@nfergu
Copy link
Contributor Author

nfergu commented Jul 6, 2014

Just applying code review comments, but have to go away for a while. Will push them later today.

@fnothaft
Copy link
Member

fnothaft commented Jul 6, 2014

@nfergu no problem; take your time, and thanks for going through the comments! There may be a few more folks who take a look tomorrow.

@nfergu
Copy link
Contributor Author

nfergu commented Jul 6, 2014

The Hadoop 2.3.0 build gives:

FATAL: Could not checkout null with start point 5f5ceb1efb1f20430cf6bc9ef0505bb9b7505c34
hudson.plugins.git.GitException: Could not checkout null with start point 5f5ceb1efb1f20430cf6bc9ef0505bb9b7505c34

Which is a bit mysterious. Anyone know why this might happen?

@nfergu
Copy link
Contributor Author

nfergu commented Jul 7, 2014

Just pushed changes based on @fnothaft's comments. I'm hoping that this will trigger a new build in Jenkins and fix the mysterious build issue.

@AmplabJenkins
Copy link

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/ADAM-prb/75/

@tdanford
Copy link
Contributor

tdanford commented Jul 7, 2014

Neil, it looks like this PR includes a JAR file -- is that intentional, or accidental?

@fnothaft
Copy link
Member

fnothaft commented Jul 7, 2014

@tdanford That's intentional:

  • I've added a dependency on this library: http://code.google.com/p/java-ascii-table/ for printing ascii tables (Apache-licensed). It's not on any public maven repos, so I've added a local maven repo at /repo which contains the JAR file and POM. This seems to work fine both with the command line and Intellij, but let me know if there are any issues.

@tdanford
Copy link
Contributor

tdanford commented Jul 7, 2014

Ah, I missed that bullet point above -- sorry.

How crucial is this table formatting, that we want to statically include a JAR? Any chance that just outputting tab-separated lines would do the trick too?

@massie
Copy link
Member

massie commented Jul 7, 2014

We have been trying to avoid checking in binary files to the repo since git doesn't handle them well. Even if we remove it later, it's still in our history.

Could we just remove the dependency and create the output ourselves?

@nfergu
Copy link
Contributor Author

nfergu commented Jul 7, 2014

@tdanford @massie -- I can certainly write something to format the output rather than using the library. Not sure that simple tab-separated lines will be readable enough, but shouldn't be too difficult.

@tdanford
Copy link
Contributor

tdanford commented Jul 8, 2014

Also (and this comment is directed mostly at @massie I guess) if we're adding new dependencies with each PR, we should probably be monitoring how close we're coming to the magic "64k file" limit on the zip archive, right?

@fnothaft
Copy link
Member

fnothaft commented Jul 8, 2014

Also (and this comment is directed mostly at @massie I guess) if we're adding new dependencies with each PR, we should probably be monitoring how close we're coming to the magic "64k file" limit on the zip archive, right?

It may make more sense to cut over to appassembler, and to stop worrying about the 64k limit (I'm doing this already with a downstream app). We just need to make sure we can easily use appassembler across a cluster, if we're going to do that.

Essentially, worrying about the 64k file limit is like sucking in our bellies; it's not a long term solution...

@tdanford
Copy link
Contributor

tdanford commented Jul 8, 2014

Essentially, worrying about the 64k file limit is like sucking in our bellies; it's not a long term solution...

Agreed, but since we haven't done the cut-over yet...

@nfergu
Copy link
Contributor Author

nfergu commented Jul 8, 2014

I've removed the JAR file, but don't want to push at the moment as ADAM doesn't build for me. Jenkins is complaining as well (https://amplab.cs.berkeley.edu/jenkins/job/ADAM/49/) so it's not just me! From a quick look it looks like a change bdg-formats repository broke it (removal of the group ID from ADAMRecord).

As a side note regarding the 64K file limit on JARs, I think this has gone away in Java 7. Apparently Java 7 supports zip64: https://blogs.oracle.com/xuemingshen/entry/zip64_support_for_4g_zipfile

@massie
Copy link
Member

massie commented Jul 8, 2014

@nfergu Go ahead and push the changes. I've reverted the commit that broke the build in bdg-formats.

Thanks for your contribution, Neil. Good stuff.

@nfergu
Copy link
Contributor Author

nfergu commented Jul 9, 2014

OK, latest changes pushed (removal of ascii table JAR file and addition of a simple class to render ascii tables)

@AmplabJenkins
Copy link

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/ADAM-prb/77/

fnothaft added a commit that referenced this pull request Jul 9, 2014
Add aggregation and display of metrics obtained from Spark
@fnothaft fnothaft merged commit 9c31cda into bigdatagenomics:master Jul 9, 2014
@fnothaft
Copy link
Member

fnothaft commented Jul 9, 2014

Merged! Awesome work @nfergu!

@@ -278,6 +278,11 @@
<id>hadoop-bam</id>
<url>http://hadoop-bam.sourceforge.net/maven/</url>
</repository>
<repository>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be removed now correct?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yes.

@massie
Copy link
Member

massie commented Jul 9, 2014

Exemplary open-source work, Neil! Thanks for the contribution.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants