[ADAM-1501] Compute coverage using Dataset API. #1528

Merged

Member

@fnothaft fnothaft commented May 14, 2017

Resolves #1501. Depends on #1391. Perf numbers forthcoming.

@fnothaft fnothaft added this to the 0.23.0 milestone May 14, 2017
@coveralls

coveralls commented May 14, 2017

Coverage Status

Coverage decreased (-6.4%) to 75.579% when pulling e1d4159 on fnothaft:issues/1501-coverage-dataset into 18191f9 on bigdatagenomics:master.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/2007/
Test PASSed.

@coveralls

coveralls commented May 15, 2017

Coverage Status

Coverage decreased (-6.4%) to 75.579% when pulling 39ce835 on fnothaft:issues/1501-coverage-dataset into 18191f9 on bigdatagenomics:master.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/2008/
Test PASSed.

@fnothaft
Member Author

This provides a 2x speedup on a high coverage WGS dataset.

@fnothaft fnothaft force-pushed the issues/1501-coverage-dataset branch from 39ce835 to c65f144 May 24, 2017 06:28
@coveralls

coveralls commented May 24, 2017

Coverage Status

Coverage decreased (-7.1%) to 74.939% when pulling c65f144 on fnothaft:issues/1501-coverage-dataset into 2820e94 on bigdatagenomics:master.

@coveralls

coveralls commented May 24, 2017

Coverage Status

Coverage decreased (-6.5%) to 75.518% when pulling c65f144 on fnothaft:issues/1501-coverage-dataset into 2820e94 on bigdatagenomics:master.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/2049/
Test PASSed.

@fnothaft fnothaft force-pushed the issues/1501-coverage-dataset branch from c65f144 to ce0dcbf June 22, 2017 07:53
@coveralls

coveralls commented Jun 22, 2017

Coverage Status

Changes Unknown when pulling ce0dcbf on fnothaft:issues/1501-coverage-dataset into bigdatagenomics:master.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/2121/
Test PASSed.

@fnothaft fnothaft force-pushed the issues/1501-coverage-dataset branch from ce0dcbf to 8ab6e0a June 24, 2017 03:47
@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/2146/

Build result: FAILURE

[...truncated 15 lines...]
> /home/jenkins/git2/bin/git fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ # timeout=15
> /home/jenkins/git2/bin/git rev-parse origin/pr/1528/merge^{commit} # timeout=10
> /home/jenkins/git2/bin/git branch -a -v --no-abbrev --contains 2279526 # timeout=10
Checking out Revision 2279526 (origin/pr/1528/merge)
> /home/jenkins/git2/bin/git config core.sparsecheckout # timeout=10
> /home/jenkins/git2/bin/git checkout -f 2279526
> /home/jenkins/git2/bin/git rev-list 9b78f51ed5925f3542ac6eb5cfe67458e13348c4 # timeout=10
Triggering ADAM-prb » 2.3.0,2.10,1.6.1,centos
Triggering ADAM-prb » 2.6.0,2.11,1.6.1,centos
Triggering ADAM-prb » 2.6.0,2.10,2.0.0,centos
Triggering ADAM-prb » 2.6.0,2.10,1.6.1,centos
Triggering ADAM-prb » 2.6.0,2.11,2.0.0,centos
Triggering ADAM-prb » 2.3.0,2.11,1.6.1,centos
Triggering ADAM-prb » 2.3.0,2.11,2.0.0,centos
Triggering ADAM-prb » 2.3.0,2.10,2.0.0,centos
ADAM-prb » 2.3.0,2.10,1.6.1,centos completed with result FAILURE
ADAM-prb » 2.6.0,2.11,1.6.1,centos completed with result FAILURE
ADAM-prb » 2.6.0,2.10,2.0.0,centos completed with result FAILURE
ADAM-prb » 2.6.0,2.10,1.6.1,centos completed with result FAILURE
ADAM-prb » 2.6.0,2.11,2.0.0,centos completed with result FAILURE
ADAM-prb » 2.3.0,2.11,1.6.1,centos completed with result FAILURE
ADAM-prb » 2.3.0,2.11,2.0.0,centos completed with result FAILURE
ADAM-prb » 2.3.0,2.10,2.0.0,centos completed with result FAILURE
Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
Test FAILed.

@fnothaft fnothaft force-pushed the issues/1501-coverage-dataset branch from 8ab6e0a to 903facc July 11, 2017 03:58
@fnothaft
Member Author

Rebased.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/2195/

Build result: FAILURE

[...truncated 15 lines...]
> /home/jenkins/git2/bin/git fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ # timeout=15
> /home/jenkins/git2/bin/git rev-parse origin/pr/1528/merge^{commit} # timeout=10
> /home/jenkins/git2/bin/git branch -a -v --no-abbrev --contains b1b7e9a # timeout=10
Checking out Revision b1b7e9a (origin/pr/1528/merge)
> /home/jenkins/git2/bin/git config core.sparsecheckout # timeout=10
> /home/jenkins/git2/bin/git checkout -f b1b7e9a
> /home/jenkins/git2/bin/git rev-list 2279526 # timeout=10
Triggering ADAM-prb » 2.3.0,2.10,1.6.1,centos
Triggering ADAM-prb » 2.6.0,2.11,1.6.1,centos
Triggering ADAM-prb » 2.6.0,2.10,2.0.0,centos
Triggering ADAM-prb » 2.6.0,2.10,1.6.1,centos
Triggering ADAM-prb » 2.6.0,2.11,2.0.0,centos
Triggering ADAM-prb » 2.3.0,2.11,1.6.1,centos
Triggering ADAM-prb » 2.3.0,2.11,2.0.0,centos
Triggering ADAM-prb » 2.3.0,2.10,2.0.0,centos
ADAM-prb » 2.3.0,2.10,1.6.1,centos completed with result FAILURE
ADAM-prb » 2.6.0,2.11,1.6.1,centos completed with result FAILURE
ADAM-prb » 2.6.0,2.10,2.0.0,centos completed with result FAILURE
ADAM-prb » 2.6.0,2.10,1.6.1,centos completed with result FAILURE
ADAM-prb » 2.6.0,2.11,2.0.0,centos completed with result FAILURE
ADAM-prb » 2.3.0,2.11,1.6.1,centos completed with result FAILURE
ADAM-prb » 2.3.0,2.11,2.0.0,centos completed with result FAILURE
ADAM-prb » 2.3.0,2.10,2.0.0,centos completed with result FAILURE
Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
Test FAILed.

@fnothaft fnothaft force-pushed the issues/1501-coverage-dataset branch from 903facc to b885248 July 11, 2017 05:59
@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/2197/

Build result: FAILURE

[...truncated 15 lines...]
> /home/jenkins/git2/bin/git fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ # timeout=15
> /home/jenkins/git2/bin/git rev-parse origin/pr/1528/merge^{commit} # timeout=10
> /home/jenkins/git2/bin/git branch -a -v --no-abbrev --contains 358225c # timeout=10
Checking out Revision 358225c (origin/pr/1528/merge)
> /home/jenkins/git2/bin/git config core.sparsecheckout # timeout=10
> /home/jenkins/git2/bin/git checkout -f 358225c
> /home/jenkins/git2/bin/git rev-list b1b7e9a # timeout=10
Triggering ADAM-prb » 2.3.0,2.10,1.6.1,centos
Triggering ADAM-prb » 2.6.0,2.11,1.6.1,centos
Triggering ADAM-prb » 2.6.0,2.10,2.0.0,centos
Triggering ADAM-prb » 2.6.0,2.10,1.6.1,centos
Triggering ADAM-prb » 2.6.0,2.11,2.0.0,centos
Triggering ADAM-prb » 2.3.0,2.11,1.6.1,centos
Triggering ADAM-prb » 2.3.0,2.11,2.0.0,centos
Triggering ADAM-prb » 2.3.0,2.10,2.0.0,centos
ADAM-prb » 2.3.0,2.10,1.6.1,centos completed with result FAILURE
ADAM-prb » 2.6.0,2.11,1.6.1,centos completed with result FAILURE
ADAM-prb » 2.6.0,2.10,2.0.0,centos completed with result FAILURE
ADAM-prb » 2.6.0,2.10,1.6.1,centos completed with result FAILURE
ADAM-prb » 2.6.0,2.11,2.0.0,centos completed with result FAILURE
ADAM-prb » 2.3.0,2.11,1.6.1,centos completed with result FAILURE
ADAM-prb » 2.3.0,2.11,2.0.0,centos completed with result FAILURE
ADAM-prb » 2.3.0,2.10,2.0.0,centos completed with result FAILURE
Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
Test FAILed.

@fnothaft
Member Author

Jenkins, test this please.

@coveralls

coveralls commented Jul 11, 2017

Coverage Status

Coverage decreased (-0.02%) to 83.942% when pulling b885248 on fnothaft:issues/1501-coverage-dataset into 467db1f on bigdatagenomics:master.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/2199/
Test PASSed.

Member

@devin-petersohn devin-petersohn left a comment

A few nits, mostly conciseness. Let me know if there's a specific reason for the things I pointed out.

}
}

private case class AlignmentWindow(contigName: String, start: Long, end: Long) {
Member

What is our policy on brackets for case classes with no body?

Member Author

My general policy is to always put brackets on them, because you know that having brackets and an empty body will always be OK, but you could conceive that empty-body/no brackets could get deprecated someday...

Member

+1

I like to add an // empty comment between the squigglies so that it is obvious they are intentionally empty, but that convention isn't used in this code base.
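
For reference, a minimal sketch of the two spellings discussed in this thread. As of Scala 2 both compile to the same synthesized case class members, so the choice is purely stylistic. The names WindowWithBraces and WindowNoBraces are hypothetical stand-ins; only the field list mirrors AlignmentWindow from the diff.

```scala
object CaseClassBraces {
  // Hypothetical stand-ins for AlignmentWindow; only the braces differ.
  case class WindowWithBraces(contigName: String, start: Long, end: Long) {
    // empty
  }

  case class WindowNoBraces(contigName: String, start: Long, end: Long)

  def main(args: Array[String]): Unit = {
    val a = WindowWithBraces("chr1", 0L, 10L)
    val b = WindowNoBraces("chr1", 0L, 10L)
    // Both forms get the same synthesized members (equals, copy, toString, ...).
    println(a.copy(end = 20L))
    println(b.copy(end = 20L))
  }
}
```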


readMapped
}).flatMap(r => {
val t: List[Long] = List.range(r.getStart, r.getEnd)
Member

Rename t to positions (or something like that).

readMapped
}).flatMap(r => {
val t: List[Long] = List.range(r.getStart, r.getEnd)
t.map(n => (ReferenceRegion(r.getContigName, n, n + 1), 1))
Member

Is ReferencePosition more appropriate here? It is slightly more concise at least.

Member Author

+1
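
To illustrate the point about keying by a single position rather than a width-one region, here is a rough sketch using plain Scala collections. Read and Position are hypothetical stand-ins, not ADAM's AlignmentRecord or ReferencePosition, and the summing step is an assumption: the diff context above shows only the flatMap.

```scala
object PerBaseCoverage {
  // Hypothetical stand-ins; not ADAM's AlignmentRecord or ReferencePosition.
  case class Read(contigName: String, start: Long, end: Long)
  case class Position(contigName: String, pos: Long)

  // Emit one (position, 1) pair per base covered by the read; end is exclusive.
  def expand(r: Read): Seq[(Position, Int)] =
    (r.start until r.end).map(n => (Position(r.contigName, n), 1))

  // Sum the per-base counts; on an RDD this step would typically be a reduceByKey.
  def coverage(reads: Seq[Read]): Map[Position, Int] =
    reads.flatMap(expand)
      .groupBy(_._1)
      .map { case (p, hits) => (p, hits.map(_._2).sum) }
}
```

A position key carries two fields instead of three, which is where the conciseness comes from; the counts themselves are the same either way.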

.flatMap(w => {
val width = (w.end - w.start).toInt
val buffer = new Array[Coverage](width)
var idx = 0
Member

It would be more concise to do the following:
val positions = w.start until w.end
positions.map(f => Coverage(w.contigName, f, f + 1L, 1.0))

Member Author

This is for perf reasons. Makes a small (5%?) perf improvement to write it this way.
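
The trade-off in this thread can be sketched as follows. Window and Cov are hypothetical stand-ins for the classes in the diff, and the loop body is an assumption, since the diff context shows only the first three lines of the block. Both functions assume end >= start and a width that fits in an Int.

```scala
object WindowExpansion {
  // Hypothetical stand-ins mirroring the shapes in the diff, not ADAM's classes.
  case class Window(contigName: String, start: Long, end: Long)
  case class Cov(contigName: String, start: Long, end: Long, count: Double)

  // Concise form, along the lines of the reviewer's suggestion.
  def expandConcise(w: Window): Array[Cov] =
    (w.start until w.end).map(p => Cov(w.contigName, p, p + 1L, 1.0)).toArray

  // Preallocated buffer filled by a while loop, following the diff context.
  def expandLoop(w: Window): Array[Cov] = {
    val width = (w.end - w.start).toInt
    val buffer = new Array[Cov](width)
    var idx = 0
    var pos = w.start
    while (idx < width) {
      buffer(idx) = Cov(w.contigName, pos, pos + 1L, 1.0)
      idx += 1
      pos += 1L
    }
    buffer
  }
}
```

The loop form allocates the output array once and skips the intermediate range and closure, which is a plausible source of the small gain mentioned above; the actual difference would need to be measured on real data.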

@fnothaft fnothaft force-pushed the issues/1501-coverage-dataset branch from b885248 to 56dd89b July 11, 2017 16:39
@fnothaft
Member Author

Re-rebased and addressed review comments.

@coveralls

coveralls commented Jul 11, 2017

Coverage Status

Coverage decreased (-0.4%) to 83.664% when pulling 56dd89b on fnothaft:issues/1501-coverage-dataset into 324ae74 on bigdatagenomics:master.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/2204/

Build result: FAILURE

[...truncated 15 lines...]
> /home/jenkins/git2/bin/git fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ # timeout=15
> /home/jenkins/git2/bin/git rev-parse origin/pr/1528/merge^{commit} # timeout=10
> /home/jenkins/git2/bin/git branch -a -v --no-abbrev --contains d2726cbf553d735190f232d2b754b7247d4d26cf # timeout=10
Checking out Revision d2726cbf553d735190f232d2b754b7247d4d26cf (origin/pr/1528/merge)
> /home/jenkins/git2/bin/git config core.sparsecheckout # timeout=10
> /home/jenkins/git2/bin/git checkout -f d2726cbf553d735190f232d2b754b7247d4d26cf
> /home/jenkins/git2/bin/git rev-list 358225c # timeout=10
Triggering ADAM-prb » 2.3.0,2.10,1.6.1,centos
Triggering ADAM-prb » 2.3.0,2.10,2.1.0,centos
Triggering ADAM-prb » 2.6.0,2.10,2.1.0,centos
Triggering ADAM-prb » 2.6.0,2.11,1.6.1,centos
Triggering ADAM-prb » 2.6.0,2.11,2.1.0,centos
Triggering ADAM-prb » 2.3.0,2.11,2.1.0,centos
Triggering ADAM-prb » 2.6.0,2.10,1.6.1,centos
Triggering ADAM-prb » 2.3.0,2.11,1.6.1,centos
ADAM-prb » 2.3.0,2.10,1.6.1,centos completed with result SUCCESS
ADAM-prb » 2.3.0,2.10,2.1.0,centos completed with result FAILURE
ADAM-prb » 2.6.0,2.10,2.1.0,centos completed with result FAILURE
ADAM-prb » 2.6.0,2.11,1.6.1,centos completed with result SUCCESS
ADAM-prb » 2.6.0,2.11,2.1.0,centos completed with result SUCCESS
ADAM-prb » 2.3.0,2.11,2.1.0,centos completed with result SUCCESS
ADAM-prb » 2.6.0,2.10,1.6.1,centos completed with result SUCCESS
ADAM-prb » 2.3.0,2.11,1.6.1,centos completed with result SUCCESS
Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
Test FAILed.

@devin-petersohn
Member

I am not getting these build issues locally. I'm not sure why it's failing with Jenkins, but not locally.

@fnothaft
Member Author

fnothaft commented Jul 11, 2017

@devin-petersohn the jenkins-test script needs to move to Spark 2.1.0 for #1397, so until #1397 merges, I have to manually toggle Jenkins between 2.0.0 and 2.1.0, hence this failure.

@fnothaft
Member Author

Jenkins, test this please.

@coveralls

coveralls commented Jul 11, 2017

Coverage Status

Coverage decreased (-0.08%) to 84.015% when pulling 56dd89b on fnothaft:issues/1501-coverage-dataset into 324ae74 on bigdatagenomics:master.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/2205/
Test PASSed.

@fnothaft
Member Author

Ping for merge?

@devin-petersohn devin-petersohn merged commit 238e044 into bigdatagenomics:master Jul 11, 2017
@devin-petersohn
Member

Thanks @fnothaft
