Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support locus predicate in Transform #1540

Merged

Conversation

fnothaft
Copy link
Member

@fnothaft fnothaft commented May 18, 2017

Resolves #1047, #1539. H/T to @ryan-williams for the really useful genomic-loci library. One pending task:

  • Update Transform markdown docs.
  • Add NucleotideContigFragment

Code is ready for review.

@fnothaft fnothaft added this to the 0.23.0 milestone May 18, 2017
@coveralls
Copy link

coveralls commented May 18, 2017

Coverage Status

Coverage increased (+0.7%) to 82.709% when pulling a50c351 on fnothaft:issues/1047-1539-locus-predicate into fe1cba2 on bigdatagenomics:master.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/2026/
Test PASSed.

@heuermh
Copy link
Member

heuermh commented May 22, 2017

@ryan-williams Would adding this dependency complicate your builds too much? E.g. if ADAM depends on genomic-loci that may peg any projects of yours downstream of ADAM to a version you don't like.

Locus/loci as terms aren't used very consistently across our API and docs, might region predicate be better?

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/2048/

Build result: FAILURE

[...truncated 15 lines...] > /home/jenkins/git2/bin/git fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ # timeout=15 > /home/jenkins/git2/bin/git rev-parse origin/pr/1540/merge^{commit} # timeout=10 > /home/jenkins/git2/bin/git branch -a -v --no-abbrev --contains a12689ff51f47565fec54ba02070e64e3970c07d # timeout=10Checking out Revision a12689ff51f47565fec54ba02070e64e3970c07d (origin/pr/1540/merge) > /home/jenkins/git2/bin/git config core.sparsecheckout # timeout=10 > /home/jenkins/git2/bin/git checkout -f a12689ff51f47565fec54ba02070e64e3970c07dFirst time build. Skipping changelog.Triggering ADAM-prb ? 2.3.0,2.11,2.0.0,centosTriggering ADAM-prb ? 2.6.0,2.11,1.6.1,centosTriggering ADAM-prb ? 2.6.0,2.11,2.0.0,centosTriggering ADAM-prb ? 2.6.0,2.10,1.6.1,centosTriggering ADAM-prb ? 2.6.0,2.10,2.0.0,centosTriggering ADAM-prb ? 2.3.0,2.11,1.6.1,centosTriggering ADAM-prb ? 2.3.0,2.10,1.6.1,centosTriggering ADAM-prb ? 2.3.0,2.10,2.0.0,centosADAM-prb ? 2.3.0,2.11,2.0.0,centos completed with result FAILUREADAM-prb ? 2.6.0,2.11,1.6.1,centos completed with result FAILUREADAM-prb ? 2.6.0,2.11,2.0.0,centos completed with result FAILUREADAM-prb ? 2.6.0,2.10,1.6.1,centos completed with result FAILUREADAM-prb ? 2.6.0,2.10,2.0.0,centos completed with result FAILUREADAM-prb ? 2.3.0,2.11,1.6.1,centos completed with result FAILUREADAM-prb ? 2.3.0,2.10,1.6.1,centos completed with result FAILUREADAM-prb ? 2.3.0,2.10,2.0.0,centos completed with result FAILURENotifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
Test FAILed.

@fnothaft fnothaft force-pushed the issues/1047-1539-locus-predicate branch from fb52ef4 to e368695 Compare May 24, 2017 07:17
@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/2051/

Build result: FAILURE

[...truncated 15 lines...] > /home/jenkins/git2/bin/git fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ # timeout=15 > /home/jenkins/git2/bin/git rev-parse origin/pr/1540/merge^{commit} # timeout=10 > /home/jenkins/git2/bin/git branch -a -v --no-abbrev --contains 597f25a69ce56cc05f7923b7fbf7e2a1041d3d1d # timeout=10Checking out Revision 597f25a69ce56cc05f7923b7fbf7e2a1041d3d1d (origin/pr/1540/merge) > /home/jenkins/git2/bin/git config core.sparsecheckout # timeout=10 > /home/jenkins/git2/bin/git checkout -f 597f25a69ce56cc05f7923b7fbf7e2a1041d3d1dFirst time build. Skipping changelog.Triggering ADAM-prb ? 2.3.0,2.11,2.0.0,centosTriggering ADAM-prb ? 2.6.0,2.11,1.6.1,centosTriggering ADAM-prb ? 2.6.0,2.11,2.0.0,centosTriggering ADAM-prb ? 2.6.0,2.10,1.6.1,centosTriggering ADAM-prb ? 2.6.0,2.10,2.0.0,centosTriggering ADAM-prb ? 2.3.0,2.11,1.6.1,centosTriggering ADAM-prb ? 2.3.0,2.10,1.6.1,centosTriggering ADAM-prb ? 2.3.0,2.10,2.0.0,centosADAM-prb ? 2.3.0,2.11,2.0.0,centos completed with result FAILUREADAM-prb ? 2.6.0,2.11,1.6.1,centos completed with result FAILUREADAM-prb ? 2.6.0,2.11,2.0.0,centos completed with result FAILUREADAM-prb ? 2.6.0,2.10,1.6.1,centos completed with result FAILUREADAM-prb ? 2.6.0,2.10,2.0.0,centos completed with result FAILUREADAM-prb ? 2.3.0,2.11,1.6.1,centos completed with result FAILUREADAM-prb ? 2.3.0,2.10,1.6.1,centos completed with result FAILUREADAM-prb ? 2.3.0,2.10,2.0.0,centos completed with result FAILURENotifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
Test FAILed.

@fnothaft fnothaft force-pushed the issues/1047-1539-locus-predicate branch from e368695 to 3cb7dd1 Compare May 24, 2017 07:59
@fnothaft
Copy link
Member Author

This is good to go on my end.

@coveralls
Copy link

coveralls commented May 24, 2017

Coverage Status

Coverage increased (+0.7%) to 82.754% when pulling 3cb7dd1 on fnothaft:issues/1047-1539-locus-predicate into 2820e94 on bigdatagenomics:master.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/2053/
Test PASSed.

@heuermh
Copy link
Member

heuermh commented Jun 6, 2017

Locus/loci as terms aren't used very consistently across our API and docs, might region predicate be better?

@fnothaft @devin-petersohn @akmorrow13 Thoughts on this question?

@fnothaft
Copy link
Member Author

fnothaft commented Jun 6, 2017

Locus/loci as terms aren't used very consistently across our API and docs, might region predicate be better?

Thoughts on this question?

Either way is fine with me.

@fnothaft fnothaft force-pushed the issues/1047-1539-locus-predicate branch from 3cb7dd1 to dc74620 Compare June 22, 2017 07:56
@coveralls
Copy link

coveralls commented Jun 22, 2017

Coverage Status

Changes Unknown when pulling dc74620 on fnothaft:issues/1047-1539-locus-predicate into ** on bigdatagenomics:master**.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/2120/
Test PASSed.

@fnothaft
Copy link
Member Author

Ping for review...

@heuermh
Copy link
Member

heuermh commented Jul 7, 2017

I'll vote for region predicate. Haven't heard from @ryan-williams so I assume the circular dependency will be fine. I looked at the dependency tree resulting from adding the dependency and it looks fine to me.

@fnothaft fnothaft force-pushed the issues/1047-1539-locus-predicate branch from dc74620 to 5e5cf8f Compare July 7, 2017 04:17
@fnothaft
Copy link
Member Author

fnothaft commented Jul 7, 2017

Rebased, and changed locus predicate to region predicate.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/2178/

Build result: FAILURE

[...truncated 15 lines...] > /home/jenkins/git2/bin/git fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ # timeout=15 > /home/jenkins/git2/bin/git rev-parse origin/pr/1540/merge^{commit} # timeout=10 > /home/jenkins/git2/bin/git branch -a -v --no-abbrev --contains 2cab2548ad207f8ab34c8dcb31688f0bd0696e67 # timeout=10Checking out Revision 2cab2548ad207f8ab34c8dcb31688f0bd0696e67 (origin/pr/1540/merge) > /home/jenkins/git2/bin/git config core.sparsecheckout # timeout=10 > /home/jenkins/git2/bin/git checkout -f 2cab2548ad207f8ab34c8dcb31688f0bd0696e67First time build. Skipping changelog.Triggering ADAM-prb ? 2.3.0,2.10,1.6.1,centosTriggering ADAM-prb ? 2.6.0,2.11,1.6.1,centosTriggering ADAM-prb ? 2.6.0,2.10,2.0.0,centosTriggering ADAM-prb ? 2.6.0,2.10,1.6.1,centosTriggering ADAM-prb ? 2.6.0,2.11,2.0.0,centosTriggering ADAM-prb ? 2.3.0,2.11,1.6.1,centosTriggering ADAM-prb ? 2.3.0,2.11,2.0.0,centosTriggering ADAM-prb ? 2.3.0,2.10,2.0.0,centosADAM-prb ? 2.3.0,2.10,1.6.1,centos completed with result FAILUREADAM-prb ? 2.6.0,2.11,1.6.1,centos completed with result FAILUREADAM-prb ? 2.6.0,2.10,2.0.0,centos completed with result FAILUREADAM-prb ? 2.6.0,2.10,1.6.1,centos completed with result FAILUREADAM-prb ? 2.6.0,2.11,2.0.0,centos completed with result FAILUREADAM-prb ? 2.3.0,2.11,1.6.1,centos completed with result FAILUREADAM-prb ? 2.3.0,2.11,2.0.0,centos completed with result FAILUREADAM-prb ? 2.3.0,2.10,2.0.0,centos completed with result FAILURENotifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
Test FAILed.

Copy link
Member

@heuermh heuermh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor doc nits, then looks good to me

@@ -146,6 +146,14 @@ These options fall into several general categories:
not load the `attributes` or `origQual` fields of the `AlignmentRecord`.
* `-aligned_read_predicate`: If loading as Parquet, only loads aligned
reads.
* `-locus_predicate`: A string indicating that reads should be filtered on
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missed replace here...

separated list, where each element in the list takes the form:
* `contig:pos` for a single genomic position, or
* `contig:start-end` for a genomic range with closed start and open end
E.g., `-locus_predicate 1:100,2:1000-2000` would filter all reads that
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...and here

@fnothaft fnothaft force-pushed the issues/1047-1539-locus-predicate branch from 5e5cf8f to 9f339d7 Compare July 7, 2017 06:25
@fnothaft
Copy link
Member Author

fnothaft commented Jul 7, 2017

Fixed the doc nits.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/2179/

Build result: FAILURE

[...truncated 15 lines...] > /home/jenkins/git2/bin/git fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ # timeout=15 > /home/jenkins/git2/bin/git rev-parse origin/pr/1540/merge^{commit} # timeout=10 > /home/jenkins/git2/bin/git branch -a -v --no-abbrev --contains 8390f86 # timeout=10Checking out Revision 8390f86 (origin/pr/1540/merge) > /home/jenkins/git2/bin/git config core.sparsecheckout # timeout=10 > /home/jenkins/git2/bin/git checkout -f 8390f86fd0645ac71582b358189401c6a934887bFirst time build. Skipping changelog.Triggering ADAM-prb ? 2.3.0,2.10,1.6.1,centosTriggering ADAM-prb ? 2.6.0,2.11,1.6.1,centosTriggering ADAM-prb ? 2.6.0,2.10,2.0.0,centosTriggering ADAM-prb ? 2.6.0,2.10,1.6.1,centosTriggering ADAM-prb ? 2.6.0,2.11,2.0.0,centosTriggering ADAM-prb ? 2.3.0,2.11,1.6.1,centosTriggering ADAM-prb ? 2.3.0,2.11,2.0.0,centosTriggering ADAM-prb ? 2.3.0,2.10,2.0.0,centosADAM-prb ? 2.3.0,2.10,1.6.1,centos completed with result FAILUREADAM-prb ? 2.6.0,2.11,1.6.1,centos completed with result FAILUREADAM-prb ? 2.6.0,2.10,2.0.0,centos completed with result FAILUREADAM-prb ? 2.6.0,2.10,1.6.1,centos completed with result FAILUREADAM-prb ? 2.6.0,2.11,2.0.0,centos completed with result FAILUREADAM-prb ? 2.3.0,2.11,1.6.1,centos completed with result FAILUREADAM-prb ? 2.3.0,2.11,2.0.0,centos completed with result FAILUREADAM-prb ? 2.3.0,2.10,2.0.0,centos completed with result FAILURENotifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
Test FAILed.

@heuermh
Copy link
Member

heuermh commented Jul 8, 2017

Jenkins, retest this please

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/2181/

Build result: FAILURE

[...truncated 15 lines...] > /home/jenkins/git2/bin/git fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ # timeout=15 > /home/jenkins/git2/bin/git rev-parse origin/pr/1540/merge^{commit} # timeout=10 > /home/jenkins/git2/bin/git branch -a -v --no-abbrev --contains 1aeb4d8db31bc9cc7c33b6625b61164cbd7c016b # timeout=10Checking out Revision 1aeb4d8db31bc9cc7c33b6625b61164cbd7c016b (origin/pr/1540/merge) > /home/jenkins/git2/bin/git config core.sparsecheckout # timeout=10 > /home/jenkins/git2/bin/git checkout -f 1aeb4d8db31bc9cc7c33b6625b61164cbd7c016bFirst time build. Skipping changelog.Triggering ADAM-prb ? 2.3.0,2.10,1.6.1,centosTriggering ADAM-prb ? 2.6.0,2.11,1.6.1,centosTriggering ADAM-prb ? 2.6.0,2.10,2.0.0,centosTriggering ADAM-prb ? 2.6.0,2.10,1.6.1,centosTriggering ADAM-prb ? 2.6.0,2.11,2.0.0,centosTriggering ADAM-prb ? 2.3.0,2.11,1.6.1,centosTriggering ADAM-prb ? 2.3.0,2.11,2.0.0,centosTriggering ADAM-prb ? 2.3.0,2.10,2.0.0,centosADAM-prb ? 2.3.0,2.10,1.6.1,centos completed with result FAILUREADAM-prb ? 2.6.0,2.11,1.6.1,centos completed with result FAILUREADAM-prb ? 2.6.0,2.10,2.0.0,centos completed with result FAILUREADAM-prb ? 2.6.0,2.10,1.6.1,centos completed with result FAILUREADAM-prb ? 2.6.0,2.11,2.0.0,centos completed with result FAILUREADAM-prb ? 2.3.0,2.11,1.6.1,centos completed with result FAILUREADAM-prb ? 2.3.0,2.11,2.0.0,centos completed with result FAILUREADAM-prb ? 2.3.0,2.10,2.0.0,centos completed with result FAILURENotifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
Test FAILed.

@@ -70,4 +70,24 @@ class TransformAlignmentsSuite extends ADAMFunSuite {
assert(qualityScoreCounts(30) === 92899)
assert(qualityScoreCounts(10) === 7101)
}

sparkTest("run locus predicate") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, missed these before

val outputPath2 = tmpFile("predicate.2.adam")
val outputPath3 = tmpFile("predicate.3.adam")
TransformAlignments(Array(inputPath, outputPath1,
"-locus_predicate", "1:0-200,chr2:0-1000",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-locus_predicate-region_predicate, and below

@fnothaft fnothaft force-pushed the issues/1047-1539-locus-predicate branch from 9f339d7 to e10723b Compare July 8, 2017 04:00
@fnothaft
Copy link
Member Author

fnothaft commented Jul 8, 2017

Fixed tests and rebased @heuermh.

@coveralls
Copy link

coveralls commented Jul 8, 2017

Coverage Status

Coverage increased (+0.7%) to 84.165% when pulling e10723b on fnothaft:issues/1047-1539-locus-predicate into 0e92cfe on bigdatagenomics:master.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/2183/
Test PASSed.

@heuermh heuermh merged commit acb0c07 into bigdatagenomics:master Jul 8, 2017
@heuermh
Copy link
Member

heuermh commented Jul 8, 2017

Thank you, @fnothaft!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants