Replace hard-coded sequence utils with an Alphabet model #410

Closed
tdanford opened this issue Oct 9, 2014 · 6 comments

Comments

@tdanford
Contributor

tdanford commented Oct 9, 2014

As part of the work on Gene models, we introduced the SequenceUtils object (in adam.utils), which is just a hacky container for a few simple "complement" operations on DNA sequences.

This should be replaced by an actual Alphabet model, with support for ambiguity codes, different alphabets, etc.
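
To make "support for ambiguity codes" concrete, here is a minimal Scala sketch of what such a model might need to know. The object and method names (IupacDna, expansions, complement, reverseComplement) are hypothetical and are not taken from ADAM; the table itself is the standard IUPAC nucleotide code set.

```scala
// Hypothetical sketch: none of these names come from ADAM.
object IupacDna {
  // Standard IUPAC nucleotide codes, each mapped to the set of bases it stands for.
  val expansions: Map[Char, Set[Char]] = Map(
    'A' -> Set('A'), 'C' -> Set('C'), 'G' -> Set('G'), 'T' -> Set('T'),
    'R' -> Set('A', 'G'), 'Y' -> Set('C', 'T'),
    'S' -> Set('C', 'G'), 'W' -> Set('A', 'T'),
    'K' -> Set('G', 'T'), 'M' -> Set('A', 'C'),
    'B' -> Set('C', 'G', 'T'), 'D' -> Set('A', 'G', 'T'),
    'H' -> Set('A', 'C', 'T'), 'V' -> Set('A', 'C', 'G'),
    'N' -> Set('A', 'C', 'G', 'T')
  )

  private val baseComplement = Map('A' -> 'T', 'C' -> 'G', 'G' -> 'C', 'T' -> 'A')

  // Reverse lookup: from a set of bases back to the code that denotes it.
  private val codeFor: Map[Set[Char], Char] = expansions.map(_.swap)

  // The complement of a code is the code whose expansion is the set of
  // complements of the original expansion (for example, R = {A, G} maps to Y = {T, C}).
  def complement(code: Char): Char =
    codeFor(expansions(code).map(baseComplement))

  def reverseComplement(seq: String): String =
    seq.reverse.map(complement)
}
```

With this table, complementing an ambiguity code needs no special cases: the result falls out of complementing each base the code can stand for.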

@ansalaza

I am interested in picking up this issue. I was thinking of creating an abstract class, extending it with the appropriate methods for DNA and RNA sequences, and then adding IUPAC codes. Would this approach address the issue? It would also mean changing some code in /adam-core/src/main/scala/org/bdgenomics/adam/models/Gene.scala
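
One way to picture the abstract-class approach proposed here is the rough Scala sketch below. All names (Alphabet, DNAAlphabet, RNAAlphabet, pairs) are assumptions made for illustration; this is not the design that was eventually merged, and it leaves out IUPAC codes.

```scala
// Hypothetical names throughout; this is not ADAM's actual API.
abstract class Alphabet {
  // (symbol, complement) pairs that define the alphabet; supplied by subclasses.
  protected def pairs: Seq[(Char, Char)]

  // Lazy so that it is built only after the subclass has initialized `pairs`.
  private lazy val complements: Map[Char, Char] = pairs.toMap

  def contains(c: Char): Boolean = complements.contains(c)

  def complement(c: Char): Char = complements(c)

  def reverseComplement(seq: String): String = seq.reverse.map(complement)
}

object DNAAlphabet extends Alphabet {
  protected val pairs = Seq('A' -> 'T', 'T' -> 'A', 'C' -> 'G', 'G' -> 'C', 'N' -> 'N')
}

object RNAAlphabet extends Alphabet {
  protected val pairs = Seq('A' -> 'U', 'U' -> 'A', 'C' -> 'G', 'G' -> 'C', 'N' -> 'N')
}
```

Under this sketch, a caller would pick an alphabet and call, for example, DNAAlphabet.reverseComplement("AACG") instead of a hard-coded utility method.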

@fnothaft
Member

There had been discussion about using the Alphabet classes from BioJava. However, BioJava is LGPL so we can't package it under Apache 2... @heuermh are you aware of any non-LGPL toolkits that implement Alphabet functionality?

@heuermh
Member

heuermh commented Dec 26, 2014

You're right, BioJava is LGPL and unfortunately is not explicit as to which version is in use. I tried to add "either version XXX of the License, or (at your option) any later version" language at some point but was unsuccessful.

I don't understand, though: what is the conflict with packaging?

Would LGPL version 3 or later be compatible with packaging under Apache 2?

@Agent007

@fnothaft
Member

@Agent007 nice! Thanks for the heads up there; I wasn't aware of that, but it's definitely worth a look.

bryanjj added a commit to bryanjj/adam that referenced this issue Apr 30, 2015
…cs#410

removed the SequenceUtils class and updated all references.  ran mvn verify and all tests pass.

ran ./scripts/format-source

make changes as suggested in the pull request:
1. unknown characters should throw IllegalArgumentException instead of a key-not-found error
2. case-insensitive now means that both lower- and upper-case characters map to the same symbol, instead of lower mapping to lower and upper mapping to upper

clean up import

squashed commits
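
The two review changes listed in the commit message above can be illustrated with a small Scala sketch. This is not the code from the commit; the class name SimpleAlphabet and its parameters are hypothetical, and only the two described behaviors are modeled.

```scala
// Hypothetical sketch of the two behaviors described above; not the committed code.
class SimpleAlphabet(complements: Map[Char, Char], caseSensitive: Boolean = false) {
  // When case-insensitive, normalize keys to upper case so that 'a' and 'A'
  // resolve to the same entry (and therefore to the same symbol).
  private val table: Map[Char, Char] =
    if (caseSensitive) complements
    else complements.map { case (k, v) => k.toUpper -> v }

  def complement(c: Char): Char = {
    val key = if (caseSensitive) c else c.toUpper
    // Unknown characters raise IllegalArgumentException rather than the
    // NoSuchElementException ("key not found") a bare Map lookup would throw.
    table.getOrElse(
      key,
      throw new IllegalArgumentException(s"Character '$c' is not in this alphabet"))
  }
}
```

In this sketch, both complement('a') and complement('A') return the same upper-case symbol, and a character outside the alphabet fails with an error that names the offending character.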
fnothaft pushed a commit that referenced this issue May 4, 2015
@fnothaft
Member

fnothaft commented May 4, 2015

Closed by #653.
