New, improved, rewritten assembler #127

Closed
wants to merge 43 commits into from


fnothaft
Member

@fnothaft fnothaft commented Jan 3, 2015

Resolves #97. Eliminates most of the old assembler code. Adds a new assembler that emits observations directly from the de Bruijn graph. In the old assembler, we generated haplotypes and realigned the reads to the haplotypes before calculating genotype likelihoods. In this assembler, we can handle longer haplotypes, and we eliminate the need to realign reads.
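
For readers unfamiliar with the data structure, here is a minimal sketch of how a de Bruijn graph can be built from reads. It is not the assembler from this pull request: the object and method names (`DeBruijnSketch`, `kmers`, `buildGraph`) are hypothetical, and it only covers graph construction, not how observations are emitted from the graph.

```scala
// Illustrative sketch only; none of these names come from the avocado codebase.
object DeBruijnSketch {
  // Slide a window of length k over the sequence to get its k-mers.
  // Sequences shorter than k contribute no k-mers.
  def kmers(sequence: String, k: Int): List[String] =
    if (sequence.length < k) List.empty else sequence.sliding(k).toList

  // Each k-mer becomes an edge from its (k-1)-prefix to its (k-1)-suffix.
  // The map value counts how often that edge was observed across all reads.
  def buildGraph(reads: Seq[String], k: Int): Map[(String, String), Int] =
    reads
      .flatMap(read => kmers(read, k))
      .map(kmer => (kmer.take(k - 1), kmer.drop(1)))
      .groupBy(identity)
      .map { case (edge, occurrences) => (edge, occurrences.size) }
}
```

Calling `DeBruijnSketch.buildGraph(Seq("ACGT", "CGTA"), 3)` gives three edges, with the shared k-mer `CGT` counted twice on the edge from `CG` to `GT`.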

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/avocado-prb/36/
Test PASSed.

@pgrosu

pgrosu commented Jan 3, 2015

Hi Frank,

Happy New Year! Thank you for the nice updates, though I seem to be getting some weird errors when I run mvn package:

$ mvn package
[INFO] Scanning for projects...
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Build Order:
[INFO]
[INFO] avocado: A Variant Caller, Distributed
[INFO] avocado: sufficient statistics packaging code/format
[INFO] avocado-core: A Variant Caller, Distributed
[INFO]
[INFO] Using the builder org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder with a thread count of 1
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building avocado: A Variant Caller, Distributed 0.0.2
[INFO] ------------------------------------------------------------------------
Downloading: http://hadoop-bam.sourceforge.net/maven/edu/berkeley/cs/amplab/adam/adam-format/0.7.1-SNAPSHOT/maven-metadata.xml
Downloading: http://oss.sonatype.org/content/repositories/snapshots/edu/berkeley/cs/amplab/adam/adam-format/0.7.1-SNAPSHOT/maven-metadata.xml
Downloading: http://people.apache.org/repo/m2-snapshot-repository/edu/berkeley/cs/amplab/adam/adam-format/0.7.1-SNAPSHOT/maven-metadata.xml
Downloading: http://oss.sonatype.org/content/repositories/snapshots/edu/berkeley/cs/amplab/adam/adam-format/0.7.1-SNAPSHOT/adam-format-0.7.1-SNAPSHOT.pom
Downloading: http://hadoop-bam.sourceforge.net/maven/edu/berkeley/cs/amplab/adam/adam-format/0.7.1-SNAPSHOT/adam-format-0.7.1-SNAPSHOT.pom
Downloading: http://people.apache.org/repo/m2-snapshot-repository/edu/berkeley/cs/amplab/adam/adam-format/0.7.1-SNAPSHOT/adam-format-0.7.1-SNAPSHOT.pom
[WARNING] The POM for edu.berkeley.cs.amplab.adam:adam-format:jar:0.7.1-SNAPSHOT is missing, no dependency information available
Downloading: http://oss.sonatype.org/content/repositories/snapshots/edu/berkeley/cs/amplab/adam/adam-core/0.7.1-SNAPSHOT/maven-metadata.xml
Downloading: http://hadoop-bam.sourceforge.net/maven/edu/berkeley/cs/amplab/adam/adam-core/0.7.1-SNAPSHOT/maven-metadata.xml
Downloading: http://people.apache.org/repo/m2-snapshot-repository/edu/berkeley/cs/amplab/adam/adam-core/0.7.1-SNAPSHOT/maven-metadata.xml
Downloading: http://oss.sonatype.org/content/repositories/snapshots/edu/berkeley/cs/amplab/adam/adam-core/0.7.1-SNAPSHOT/adam-core-0.7.1-SNAPSHOT.pom
Downloading: http://hadoop-bam.sourceforge.net/maven/edu/berkeley/cs/amplab/adam/adam-core/0.7.1-SNAPSHOT/adam-core-0.7.1-SNAPSHOT.pom
Downloading: http://people.apache.org/repo/m2-snapshot-repository/edu/berkeley/cs/amplab/adam/adam-core/0.7.1-SNAPSHOT/adam-core-0.7.1-SNAPSHOT.pom
[WARNING] The POM for edu.berkeley.cs.amplab.adam:adam-core:jar:0.7.1-SNAPSHOT is missing, no dependency information available
Downloading: http://oss.sonatype.org/content/repositories/snapshots/edu/berkeley/cs/amplab/adam/adam-cli/0.7.1-SNAPSHOT/maven-metadata.xml
Downloading: http://people.apache.org/repo/m2-snapshot-repository/edu/berkeley/cs/amplab/adam/adam-cli/0.7.1-SNAPSHOT/maven-metadata.xml
Downloading: http://hadoop-bam.sourceforge.net/maven/edu/berkeley/cs/amplab/adam/adam-cli/0.7.1-SNAPSHOT/maven-metadata.xml
Downloading: http://oss.sonatype.org/content/repositories/snapshots/edu/berkeley/cs/amplab/adam/adam-cli/0.7.1-SNAPSHOT/adam-cli-0.7.1-SNAPSHOT.pom
Downloading: http://hadoop-bam.sourceforge.net/maven/edu/berkeley/cs/amplab/adam/adam-cli/0.7.1-SNAPSHOT/adam-cli-0.7.1-SNAPSHOT.pom
Downloading: http://people.apache.org/repo/m2-snapshot-repository/edu/berkeley/cs/amplab/adam/adam-cli/0.7.1-SNAPSHOT/adam-cli-0.7.1-SNAPSHOT.pom
[WARNING] The POM for edu.berkeley.cs.amplab.adam:adam-cli:jar:0.7.1-SNAPSHOT is missing, no dependency information available
Downloading: http://oss.sonatype.org/content/repositories/snapshots/args4j/args4j/2.0.23/args4j-2.0.23.pom
Downloading: http://hadoop-bam.sourceforge.net/maven/args4j/args4j/2.0.23/args4j-2.0.23.pom
Downloading: http://people.apache.org/repo/m2-snapshot-repository/args4j/args4j/2.0.23/args4j-2.0.23.pom
Downloading: http://repo.maven.apache.org/maven2/args4j/args4j/2.0.23/args4j-2.0.23.pom
Downloaded: http://repo.maven.apache.org/maven2/args4j/args4j/2.0.23/args4j-2.0.23.pom (2 KB at 14.7 KB/sec)
Downloading: http://oss.sonatype.org/content/repositories/snapshots/args4j/args4j-site/2.0.23/args4j-site-2.0.23.pom
Downloading: http://hadoop-bam.sourceforge.net/maven/args4j/args4j-site/2.0.23/args4j-site-2.0.23.pom
Downloading: http://people.apache.org/repo/m2-snapshot-repository/args4j/args4j-site/2.0.23/args4j-site-2.0.23.pom
Downloading: http://repo.maven.apache.org/maven2/args4j/args4j-site/2.0.23/args4j-site-2.0.23.pom
Downloaded: http://repo.maven.apache.org/maven2/args4j/args4j-site/2.0.23/args4j-site-2.0.23.pom (5 KB at 149.3 KB/sec)
Downloading: http://oss.sonatype.org/content/repositories/snapshots/org/kohsuke/pom/3/pom-3.pom
Downloading: http://hadoop-bam.sourceforge.net/maven/org/kohsuke/pom/3/pom-3.pom
Downloading: http://people.apache.org/repo/m2-snapshot-repository/org/kohsuke/pom/3/pom-3.pom
Downloading: http://repo.maven.apache.org/maven2/org/kohsuke/pom/3/pom-3.pom
Downloaded: http://repo.maven.apache.org/maven2/org/kohsuke/pom/3/pom-3.pom (4 KB at 124.9 KB/sec)
Downloading: http://oss.sonatype.org/content/repositories/snapshots/edu/berkeley/cs/amplab/adam/adam-format/0.7.1-SNAPSHOT/adam-format-0.7.1-SNAPSHOT.jar
Downloading: http://oss.sonatype.org/content/repositories/snapshots/edu/berkeley/cs/amplab/adam/adam-core/0.7.1-SNAPSHOT/adam-core-0.7.1-SNAPSHOT.jar
Downloading: http://oss.sonatype.org/content/repositories/snapshots/edu/berkeley/cs/amplab/adam/adam-cli/0.7.1-SNAPSHOT/adam-cli-0.7.1-SNAPSHOT.jar
Downloading: http://oss.sonatype.org/content/repositories/snapshots/args4j/args4j/2.0.23/args4j-2.0.23.jar
Downloading: http://hadoop-bam.sourceforge.net/maven/edu/berkeley/cs/amplab/adam/adam-format/0.7.1-SNAPSHOT/adam-format-0.7.1-SNAPSHOT.jar
Downloading: http://hadoop-bam.sourceforge.net/maven/edu/berkeley/cs/amplab/adam/adam-core/0.7.1-SNAPSHOT/adam-core-0.7.1-SNAPSHOT.jar
Downloading: http://hadoop-bam.sourceforge.net/maven/edu/berkeley/cs/amplab/adam/adam-cli/0.7.1-SNAPSHOT/adam-cli-0.7.1-SNAPSHOT.jar
Downloading: http://hadoop-bam.sourceforge.net/maven/args4j/args4j/2.0.23/args4j-2.0.23.jar
Downloading: http://people.apache.org/repo/m2-snapshot-repository/edu/berkeley/cs/amplab/adam/adam-format/0.7.1-SNAPSHOT/adam-format-0.7.1-SNAPSHOT.jar
Downloading: http://people.apache.org/repo/m2-snapshot-repository/edu/berkeley/cs/amplab/adam/adam-core/0.7.1-SNAPSHOT/adam-core-0.7.1-SNAPSHOT.jar
Downloading: http://people.apache.org/repo/m2-snapshot-repository/edu/berkeley/cs/amplab/adam/adam-cli/0.7.1-SNAPSHOT/adam-cli-0.7.1-SNAPSHOT.jar
Downloading: http://people.apache.org/repo/m2-snapshot-repository/args4j/args4j/2.0.23/args4j-2.0.23.jar
Downloading: http://repo.maven.apache.org/maven2/args4j/args4j/2.0.23/args4j-2.0.23.jar
Downloaded: http://repo.maven.apache.org/maven2/args4j/args4j/2.0.23/args4j-2.0.23.jar (66 KB at 1251.3 KB/sec)
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] avocado: A Variant Caller, Distributed ............ FAILURE [ 12.523 s]
[INFO] avocado: sufficient statistics packaging code/format  SKIPPED
[INFO] avocado-core: A Variant Caller, Distributed ....... SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 12.768 s
[INFO] Finished at: 2015-01-02T23:25:00-05:00
[INFO] Final Memory: 8M/1042M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal on project avocado: Could not resolve dependencies for project edu.berkeley.cs.amplab.avocado:avocado:pom:0.0.2: The following artifacts could not be resolved: edu.berkeley.cs.amplab.adam:adam-format:jar:0.7.1-SNAPSHOT, edu.berkeley.cs.amplab.adam:adam-core:jar:0.7.1-SNAPSHOT, edu.berkeley.cs.amplab.adam:adam-cli:jar:0.7.1-SNAPSHOT: Could not find artifact edu.berkeley.cs.amplab.adam:adam-format:jar:0.7.1-SNAPSHOT in Sonatype (http://oss.sonatype.org/content/repositories/snapshots/) -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException
$

Thanks,
~p

@fnothaft
Member Author

fnothaft commented Jan 3, 2015

@pgrosu thanks; happy New Year to you too!

As for the compile issue, it looks like you're building from an out-of-date repository. The edu.berkeley.cs.amplab dependencies moved to org.bdgenomics a bit back; if you pull and update your branch you should be able to compile fine.

@pgrosu

pgrosu commented Jan 5, 2015

I think I did a little too much celebrating during the New Year, and only ran the clone without the pull. Running the following steps made it all better :)

$ git clone https://github.com/fnothaft/avocado threaded-assembler
$ cd threaded-assembler/
$ git pull https://github.com/fnothaft/avocado threaded-assembler
$ mvn package

Thanks Frank!
~p

@fnothaft
Member Author

fnothaft commented Jan 5, 2015

Ah! No worries; glad to hear things are working now.

@fnothaft
Member Author

I added documentation, so this now depends on #129.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/avocado-prb/40/

Build result: FAILURE

GitHub pull request #127 of commit f22c078 automatically merged.
[EnvInject] - Loading node environment variables.
Building remotely on amp-jenkins-worker-02 (centos) in workspace /home/jenkins/workspace/avocado-prb
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/bigdatagenomics/avocado.git # timeout=10
Fetching upstream changes from https://github.com/bigdatagenomics/avocado.git
 > git --version # timeout=10
 > git fetch --tags --progress https://github.com/bigdatagenomics/avocado.git +refs/pull/:refs/remotes/origin/pr/
 > git rev-parse origin/pr/127/merge^{commit} # timeout=10
Checking out Revision d16281d (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f d16281d
 > git rev-list 343b70d # timeout=10
Triggering avocado-prb ? 1.0.4,centos
Triggering avocado-prb ? 2.3.0,centos
Triggering avocado-prb ? 2.2.0,centos
avocado-prb ? 1.0.4,centos completed with result FAILURE
avocado-prb ? 2.3.0,centos completed with result FAILURE
avocado-prb ? 2.2.0,centos completed with result FAILURE
Test FAILed.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/avocado-prb/42/

Build result: FAILURE

GitHub pull request #127 of commit 5b81229 automatically merged.
[EnvInject] - Loading node environment variables.
Building remotely on amp-jenkins-worker-02 (centos) in workspace /home/jenkins/workspace/avocado-prb
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/bigdatagenomics/avocado.git # timeout=10
Fetching upstream changes from https://github.com/bigdatagenomics/avocado.git
 > git --version # timeout=10
 > git fetch --tags --progress https://github.com/bigdatagenomics/avocado.git +refs/pull/:refs/remotes/origin/pr/
 > git rev-parse origin/pr/127/merge^{commit} # timeout=10
Checking out Revision d9a4a2d (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f d9a4a2d
 > git rev-list 343b70d # timeout=10
Triggering avocado-prb ? 2.3.0,centos
Triggering avocado-prb ? 1.0.4,centos
Triggering avocado-prb ? 2.2.0,centos
avocado-prb ? 2.3.0,centos completed with result FAILURE
avocado-prb ? 1.0.4,centos completed with result FAILURE
avocado-prb ? 2.2.0,centos completed with result FAILURE
Test FAILed.

</dependency>
<dependency>
<groupId>org.seqdoop</groupId>
<artifactId>cofoja</artifactId>
Contributor

Is this a different distribution of the Cofoja library from the one we were depending on elsewhere? Do we know whether Cofoja ever got itself into working order for Java 1.8? That was an issue a while ago...

Member Author

Nah, this is the same Cofoja. Cofoja is a won't-fix for Java 8, IIRC, but it was removed from the latest version of HTSJDK/Picard. However, no one has cut a new release of HTSJDK yet. See samtools/htsjdk#115 (comment).

@fnothaft
Member Author

fnothaft commented Feb 3, 2015

Jenkins, retest this please.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/avocado-prb/45/

Build result: FAILURE

GitHub pull request #127 of commit 5b81229 automatically merged.
[EnvInject] - Loading node environment variables.
Building remotely on amp-jenkins-worker-02 (centos) in workspace /home/jenkins/workspace/avocado-prb
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/bigdatagenomics/avocado.git # timeout=10
Fetching upstream changes from https://github.com/bigdatagenomics/avocado.git
 > git --version # timeout=10
 > git fetch --tags --progress https://github.com/bigdatagenomics/avocado.git +refs/pull/:refs/remotes/origin/pr/
 > git rev-parse origin/pr/127/merge^{commit} # timeout=10
Checking out Revision d9a4a2d (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f d9a4a2d
 > git rev-list 63afcba # timeout=10
Triggering avocado-prb ? 2.3.0,centos
Triggering avocado-prb ? 1.0.4,centos
Triggering avocado-prb ? 2.2.0,centos
avocado-prb ? 2.3.0,centos completed with result FAILURE
avocado-prb ? 1.0.4,centos completed with result FAILURE
avocado-prb ? 2.2.0,centos completed with result FAILURE
Test FAILed.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/avocado-prb/46/
Test PASSed.

val kmerSequences = KmerGraph.getKmerSequences(startFlank, kmerLength) ++ KmerGraph.getKmerSequences(endFlank, kmerLength)
addSequencesToGraph(kmerSequences, true)
override def toString(): String = {
"Sources: " + sourceKmers.map(_.kmerSeq).reduce(_ + ", " + _) + "\n" +
Contributor

same mkString comment as above.
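
The earlier comment referred to here is not part of this page; presumably the suggestion is to replace the `reduce(_ + ", " + _)` join with `mkString(", ")`. A minimal sketch of the difference, using a plain `Seq[String]` as a hypothetical stand-in for `sourceKmers.map(_.kmerSeq)`:

```scala
// Hypothetical stand-in for the reviewed code: a plain list of k-mer strings.
object MkStringSketch {
  // reduce throws UnsupportedOperationException when the collection is empty.
  def joinWithReduce(kmers: Seq[String]): String =
    kmers.reduce(_ + ", " + _)

  // mkString produces the same text for non-empty input and "" for empty input.
  def joinWithMkString(kmers: Seq[String]): String =
    kmers.mkString(", ")
}
```

Both methods return the same string for non-empty input, but only the `mkString` version is safe to call from `toString` on a graph that has no source k-mers.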

@fnothaft
Member Author

fnothaft commented Feb 5, 2015

Rebased on #131 and added an active region threshold.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/avocado-prb/48/
Test PASSed.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/avocado-prb/49/

Build result: FAILURE

GitHub pull request #127 of commit fea3a9c automatically merged.
[EnvInject] - Loading node environment variables.
Building remotely on amp-jenkins-worker-02 (centos) in workspace /home/jenkins/workspace/avocado-prb
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/bigdatagenomics/avocado.git # timeout=10
Fetching upstream changes from https://github.com/bigdatagenomics/avocado.git
 > git --version # timeout=10
 > git fetch --tags --progress https://github.com/bigdatagenomics/avocado.git +refs/pull/:refs/remotes/origin/pr/
 > git rev-parse origin/pr/127/merge^{commit} # timeout=10
Checking out Revision a6dc5f0 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f a6dc5f0
 > git rev-list 63afcba # timeout=10
Triggering avocado-prb ? 1.0.4,centos
Triggering avocado-prb ? 2.2.0,centos
Triggering avocado-prb ? 2.3.0,centos
avocado-prb ? 1.0.4,centos completed with result FAILURE
avocado-prb ? 2.2.0,centos completed with result FAILURE
avocado-prb ? 2.3.0,centos completed with result FAILURE
Test FAILed.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/avocado-prb/53/
Test PASSed.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/avocado-prb/56/
Test PASSed.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/avocado-prb/57/
Test PASSed.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/avocado-prb/58/
Test PASSed.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/avocado-prb/89/
Test PASSed.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/avocado-prb/90/
Test PASSed.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/avocado-prb/91/
Test PASSed.

Successfully merging this pull request may close these issues.

Add back the local assembler
4 participants