Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ADAM-461] Fix ReferenceRegion and ReferencePosition impl #469

Merged
merged 1 commit into from
Nov 17, 2014

Conversation

laserson
Copy link
Contributor

@laserson laserson commented Nov 7, 2014

This commit mainly does 3 things:

  1. Fixes impl of ReferenceRegion.width.
  2. Removes Options from ReferencePosition.
  3. Corrected impl of liftOver. However, it now has different semantics: in all cases, the list of regions must be provided in genome order.
  4. Simplifies structure of case classes.

Fixes #461.

@laserson
Copy link
Contributor Author

laserson commented Nov 7, 2014

@tdanford please review closely :)

@laserson
Copy link
Contributor Author

laserson commented Nov 7, 2014

also, having a list of exons for a reverse-strand transcript still makes more sense to me to be in genome-order, and traversing it backwards (from 5' to 3'). What do you think?

}
}

case class ReferencePositionWithOrientation(refPos: Option[ReferencePosition],
case class ReferencePositionWithOrientation(refPos: ReferencePosition,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1; this shouldn't have been an option.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 (did I write this code originally? if so, it was me being even more of a Scala-n00b than I am now.)

@fnothaft
Copy link
Member

fnothaft commented Nov 7, 2014

also, having a list of exons for a reverse-strand transcript still makes more sense to me to be in genome-order, and traversing it backwards (from 5' to 3'). What do you think?

Both ways seem reasonable to me. Why do you prefer them to be genome-ordered, instead of transcript-ordered?

@@ -29,10 +29,6 @@ class IndelTableSuite extends SparkFunSuite {
assert(indelTable.getIndelsInRegion(ReferenceRegion("1", 0L, 2000L)).length === 1)
}

test("check for indels in a contig that doesn't exist") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why'd this test get dropped?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What exactly is this testing? Does it make sense to instantiate an empty ReferenceRegion?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The important thing this is testing is that no indels exist on a contig ("0") that doesn't exist. You are correct though; it shouldn't be empty.

@laserson
Copy link
Contributor Author

laserson commented Nov 7, 2014

ReferenceRegion enforces that start < end, so it seems like big/little endian-style confusion. To avoid that, my preference is that all objects are oriented in a way so that if you just glue them together, they are correct. Put another way, it's confusing to traverse the list of exons in order, but the sequence of the exon backwards.

@fnothaft
Copy link
Member

fnothaft commented Nov 7, 2014

Put another way, it's confusing to traverse the list of exons in order, but the sequence of the exon backwards.

That's a good TL;DR. I would agree with your point.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/384/
Test PASSed.

@@ -30,7 +30,7 @@ class IndelTableSuite extends SparkFunSuite {
}

test("check for indels in a contig that doesn't exist") {
assert(indelTable.getIndelsInRegion(ReferenceRegion("0", 0L, 0L)).length === 0)
assert(indelTable.getIndelsInRegion(ReferenceRegion("0", 0L, 1L)).length === 0)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, test is back, and passes now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent. Thank you for catching the empty region!

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/385/
Test PASSed.

* strand, else it will be on the positive strand.
* @return Returns an oriented reference region.
* @param referenceName The name of the sequence (chromosome) in the reference genome
* @param start The 0-based residue-coordinate for the start of the region
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@massie and I were talking about this in IRC, we really need to explain somewhere not in code what our coordinate conventions are. I mean, I like the comments, but we also need non-code versions of the same comments.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strongly agree with that.

Note, btw, that in my mental model, region.start should always be <
region.end.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed -- the convention of "end < start" to designate negative-strand regions is messed up

@tdanford
Copy link
Contributor

I've started a rudimentary non-code writeup of the coordinate convention question here:
https://github.com/bigdatagenomics/adam/wiki/Coordinate-Conventions

Please feel free to (a) edit it, (b) tell me I'm wrong and get me to edit it some more, or (c) adapt it as content in some other location (the docs/ directory, Matt?)

@fnothaft
Copy link
Member

Please feel free to (a) edit it, (b) tell me I'm wrong and get me to edit it some more, or (c) adapt it as content in some other location (the docs/ directory, Matt?)

I think the docs directory would be the best approach. We're trying to move documentation to there, so that it is autogenerated on each release.

@fnothaft
Copy link
Member

@laserson can you rebase? Then, I'll merge.

@laserson
Copy link
Contributor Author

Done.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/402/

Build result: FAILURE

GitHub pull request #469 of commit 619ec42 automatically merged.[EnvInject] - Loading node environment variables.Building remotely on amp-jenkins-slave-01 (centos) in workspace /home/jenkins/workspace/ADAM-prb > git rev-parse --is-inside-work-tree # timeout=10Fetching changes from the remote Git repository > git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10Fetching upstream changes from https://github.com/bigdatagenomics/adam.git > git --version # timeout=10 > git fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ > git rev-parse origin/pr/469/merge^{commit} # timeout=10Checking out Revision 207a1b3277d39d915e80d7e2f49fe5e5979a07f4 (detached) > git config core.sparsecheckout # timeout=10 > git checkout -f 207a1b3277d39d915e80d7e2f49fe5e5979a07f4 > git rev-list cc683a5 # timeout=10Triggering ADAM-prb » 2.3.0,centosTriggering ADAM-prb » 1.0.4,centosTriggering ADAM-prb » 2.2.0,centosADAM-prb » 2.3.0,centos completed with result FAILUREADAM-prb » 1.0.4,centos completed with result FAILUREADAM-prb » 2.2.0,centos completed with result FAILURE
Test FAILed.

@fnothaft
Copy link
Member

Can you run the format-source script?

@laserson
Copy link
Contributor Author

Done.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/403/

Build result: ABORTED

GitHub pull request #469 of commit 9b8c7b0 automatically merged.[EnvInject] - Loading node environment variables.Building remotely on amp-jenkins-slave-01 (centos) in workspace /home/jenkins/workspace/ADAM-prb > git rev-parse --is-inside-work-tree # timeout=10Fetching changes from the remote Git repository > git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10Fetching upstream changes from https://github.com/bigdatagenomics/adam.git > git --version # timeout=10 > git fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ > git rev-parse origin/pr/469/merge^{commit} # timeout=10Checking out Revision ba61a5e (detached) > git config core.sparsecheckout # timeout=10 > git checkout -f ba61a5e > git rev-list cc683a5 # timeout=10Triggering ADAM-prb » 2.3.0,centosTriggering ADAM-prb » 1.0.4,centosTriggering ADAM-prb » 2.2.0,centosADAM-prb » 2.3.0,centos completed with result ABORTEDADAM-prb » 1.0.4,centos completed with result ABORTEDADAM-prb » 2.2.0,centos completed with result ABORTED
Test FAILed.

@fnothaft
Copy link
Member

Jenkins, retest this please.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/404/

Build result: ABORTED

GitHub pull request #469 of commit 9b8c7b0 automatically merged.[EnvInject] - Loading node environment variables.Building remotely on amp-jenkins-slave-01 (centos) in workspace /home/jenkins/workspace/ADAM-prb > git rev-parse --is-inside-work-tree # timeout=10Fetching changes from the remote Git repository > git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10Fetching upstream changes from https://github.com/bigdatagenomics/adam.git > git --version # timeout=10 > git fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ > git rev-parse origin/pr/469/merge^{commit} # timeout=10Checking out Revision ba61a5e (detached) > git config core.sparsecheckout # timeout=10 > git checkout -f ba61a5e > git rev-list cc683a5 # timeout=10Triggering ADAM-prb » 2.3.0,centosTriggering ADAM-prb » 1.0.4,centosTriggering ADAM-prb » 2.2.0,centosADAM-prb » 2.3.0,centos completed with result ABORTEDADAM-prb » 1.0.4,centos completed with result ABORTEDADAM-prb » 2.2.0,centos completed with result ABORTED
Test FAILed.

@fnothaft
Copy link
Member

Jenkins, retest this please.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/405/

Build result: ABORTED

GitHub pull request #469 of commit 9b8c7b0 automatically merged.[EnvInject] - Loading node environment variables.Building remotely on amp-jenkins-slave-01 (centos) in workspace /home/jenkins/workspace/ADAM-prb > git rev-parse --is-inside-work-tree # timeout=10Fetching changes from the remote Git repository > git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10Fetching upstream changes from https://github.com/bigdatagenomics/adam.git > git --version # timeout=10 > git fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ > git rev-parse origin/pr/469/merge^{commit} # timeout=10Checking out Revision ba61a5e (detached) > git config core.sparsecheckout # timeout=10 > git checkout -f ba61a5e > git rev-list cc683a5 # timeout=10Triggering ADAM-prb » 2.3.0,centosTriggering ADAM-prb » 1.0.4,centosTriggering ADAM-prb » 2.2.0,centosADAM-prb » 2.3.0,centos completed with result ABORTEDADAM-prb » 1.0.4,centos completed with result ABORTEDADAM-prb » 2.2.0,centos completed with result ABORTED
Test FAILed.

@fnothaft
Copy link
Member

@sknapp Can you take a look? This build has timed out several times. I'll test on my local machine to see if there's a problem with this changeset.

@laserson
Copy link
Contributor Author

Builds fine on my local machine.

@laserson
Copy link
Contributor Author

If it's come to building locally :( any chance we can get this in as well?

@fnothaft
Copy link
Member

OK, I've upped the build timeout. Can you rebase this?

@laserson
Copy link
Contributor Author

Rebased.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/412/

Build result: ABORTED

GitHub pull request #469 of commit 4ec66c7 automatically merged.[EnvInject] - Loading node environment variables.Building remotely on amp-jenkins-slave-01 (centos) in workspace /home/jenkins/workspace/ADAM-prb > git rev-parse --is-inside-work-tree # timeout=10Fetching changes from the remote Git repository > git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10Fetching upstream changes from https://github.com/bigdatagenomics/adam.git > git --version # timeout=10 > git fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ > git rev-parse origin/pr/469/merge^{commit} # timeout=10Checking out Revision 28dc0d1063cc42a77951c19afe5841252369016e (detached) > git config core.sparsecheckout # timeout=10 > git checkout -f 28dc0d1063cc42a77951c19afe5841252369016e > git rev-list f211b54 # timeout=10Triggering ADAM-prb » 2.3.0,centosTriggering ADAM-prb » 1.0.4,centosTriggering ADAM-prb » 2.2.0,centosADAM-prb » 2.3.0,centos completed with result ABORTEDADAM-prb » 1.0.4,centos completed with result ABORTEDADAM-prb » 2.2.0,centos completed with result ABORTED
Test FAILed.

@fnothaft
Copy link
Member

Jenkins, retest this please.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/413/

Build result: FAILURE

GitHub pull request #469 of commit 4ec66c7 automatically merged.[EnvInject] - Loading node environment variables.Building remotely on amp-jenkins-slave-01 (centos) in workspace /home/jenkins/workspace/ADAM-prb > git rev-parse --is-inside-work-tree # timeout=10Fetching changes from the remote Git repository > git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10Fetching upstream changes from https://github.com/bigdatagenomics/adam.git > git --version # timeout=10 > git fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ > git rev-parse origin/pr/469/merge^{commit} # timeout=10Checking out Revision 28dc0d1063cc42a77951c19afe5841252369016e (detached) > git config core.sparsecheckout # timeout=10 > git checkout -f 28dc0d1063cc42a77951c19afe5841252369016e > git rev-list f211b54 # timeout=10Triggering ADAM-prb » 2.3.0,centosTriggering ADAM-prb » 1.0.4,centosTriggering ADAM-prb » 2.2.0,centosADAM-prb » 2.3.0,centos completed with result FAILUREADAM-prb » 1.0.4,centos completed with result FAILUREADAM-prb » 2.2.0,centos completed with result FAILURE
Test FAILed.

@fnothaft
Copy link
Member

@laserson can you run scripts/format-source?

This commit mainly does 3 things:

1. Fixes impl of `ReferenceRegion.width`.

2. Removes `Option`s from `ReferencePosition`.

3. Corrected impl of liftOver.  However, it now has different semantics: in all cases, the list of regions must be provided in genome order.

4. Simplifies structure of case classes.

Fixes bigdatagenomics#461.
@laserson
Copy link
Contributor Author

argh! sorry...fixed it

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/414/
Test PASSed.

fnothaft added a commit that referenced this pull request Nov 17, 2014
[ADAM-461] Fix ReferenceRegion and ReferencePosition impl
@fnothaft fnothaft merged commit 51610d1 into bigdatagenomics:master Nov 17, 2014
@fnothaft
Copy link
Member

Merged! Thanks @laserson, and thanks for the patience with the build timeouts...

@laserson laserson deleted the ADAM-461 branch December 2, 2014 03:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ReferenceRegion implements width incorrectly
4 participants