Performance improvements to SAM reading and processing #2280
Can one of the admins verify this patch?
Jenkins, test this please
Jenkins, add to whitelist
Hello @benraha! Thank you for this pull request, that sounds like an impressive gain! Please give me a heads up when you believe this is ready for review and merge.
Test FAILed.
Build result: FAILURE
[...truncated 5 lines...]
Cloning the remote Git repository
Cloning repository https://github.com/bigdatagenomics/adam.git
 > git init /home/jenkins/workspace/ADAM-prb # timeout=10
Fetching upstream changes from https://github.com/bigdatagenomics/adam.git
 > git --version # timeout=10
 > git fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/heads/:refs/remotes/origin/ # timeout=15
 > git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/:refs/remotes/origin/ # timeout=10
 > git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10
Fetching upstream changes from https://github.com/bigdatagenomics/adam.git
 > git fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ # timeout=15
 > git rev-parse origin/pr/2280/merge^{commit} # timeout=10
 > git branch -a -v --no-abbrev --contains 61a1585 # timeout=10
Checking out Revision 61a1585 (origin/pr/2280/merge)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 61a1585bacd0628cb688f5810739383657c2d6b5
First time build. Skipping changelog.
Triggering ADAM-prb ? 2.7.5,2.12,3.0.1,ubuntu
Triggering ADAM-prb ? 2.7.5,2.11,2.4.7,ubuntu
Triggering ADAM-prb ? 2.7.5,2.12,2.4.7,ubuntu
ADAM-prb ? 2.7.5,2.12,3.0.1,ubuntu completed with result SUCCESS
ADAM-prb ? 2.7.5,2.11,2.4.7,ubuntu completed with result SUCCESS
ADAM-prb ? 2.7.5,2.12,2.4.7,ubuntu completed with result FAILURE
Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
Test FAILed.
Jenkins, test this please
Test PASSed.
@heuermh Hi! This one is ready for review and merge. Let me know if you prefer two separate PRs, since I'm touching two unrelated parts of the code.
@@ -184,7 +184,7 @@ class SequenceDictionary(val records: Vector[SequenceRecord]) extends Serializab
 * @return Returns a SAM formatted sequence dictionary.
 */
def toSAMSequenceDictionary: SAMSequenceDictionary = {
new SAMSequenceDictionary(records.iterator.map(_.toSAMSequenceRecord).toList)
new SAMSequenceDictionary(records.map(_.toSAMSequenceRecord).asJava)
This is a good catch, did you have profiling results leading you here, or were you able to eyeball it?
Profiler, of course!
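For readers following the `toSAMSequenceDictionary` change above, here is a minimal Scala sketch of the `asJava` pattern. It is an illustration only: `Rec`, `SamRec`, `toSam`, and `AsJavaSketch` are hypothetical stand-ins, not ADAM or htsjdk types, and the sketch assumes the Scala 2.11/2.12 `scala.collection.JavaConverters` import.

```scala
import scala.collection.JavaConverters._

// Hypothetical stand-ins for ADAM's SequenceRecord and htsjdk's SAMSequenceRecord.
final case class Rec(name: String, length: Long)
final case class SamRec(name: String, length: Int)

object AsJavaSketch {
  // Hypothetical per-record conversion, analogous to toSAMSequenceRecord.
  def toSam(r: Rec): SamRec = SamRec(r.name, r.length.toInt)

  // Map the Vector once, then expose the result as a java.util.List
  // through the explicit asJava converter.
  def convert(records: Vector[Rec]): java.util.List[SamRec] =
    records.map(toSam).asJava
}
```

Calling `AsJavaSketch.convert(Vector(Rec("chr1", 100L)))` returns a `java.util.List` that a Java constructor can accept directly.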
@@ -36,16 +36,20 @@ case class Attribute(tag: String, tagType: TagType.Value, value: Any) {
val byteSequenceTypes = Array(TagType.NumericByteSequence, TagType.NumericUnsignedByteSequence)
val intSequenceTypes = Array(TagType.NumericIntSequence, TagType.NumericUnsignedIntSequence)
val shortSequenceTypes = Array(TagType.NumericShortSequence, TagType.NumericUnsignedShortSequence)
val sb = new StringBuilder
sb.append(tag)
sb.append(":")
These could be all one line for code style purposes, but I don't mind either way
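As a sketch of the one-line style mentioned here, Scala's `StringBuilder.append` returns the builder, so the calls can be chained. The `render` helper and its `typeCode` and `value` parameters are hypothetical and are not the actual `Attribute.toString` code from this PR.

```scala
object AttributeStringSketch {
  // Hypothetical helper: builds a "tag:type:value" string with chained appends.
  def render(tag: String, typeCode: String, value: String): String = {
    val sb = new StringBuilder
    // Each append returns the same builder, so the calls fit on one line.
    sb.append(tag).append(":").append(typeCode).append(":").append(value)
    sb.toString
  }
}
```

The chained and the line-per-call forms build the same string, so the choice is purely stylistic.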
I am ok with that, will squash and merge the commits. Thank you again, @benraha!
Thanks! My pleasure!
These changes significantly improve the performance of processing SAM/BAM files (a 60x speedup in my test cases).
The changes:
Keep up the great work!