Writing to a BAM file with adamSAMSave consistently fails #721

Closed
danvk opened this issue Jul 1, 2015 · 3 comments

Comments

@danvk

danvk commented Jul 1, 2015

I'm running this code on a YARN cluster. It filters a BAM file down to the alignments that are either on chr22 or have a mate on chr22.

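override def run(args: Arguments, sc: SparkContext): Unit = {
  // Assumes ADAM's implicits (org.bdgenomics.adam.rdd.ADAMContext._) are in scope
  // for loadAlignments and adamSAMSave.
  val filterContig = args.filterContig
  val alignments = sc.loadAlignments(args.reads)
  // Keep reads on filterContig, or whose mate is on filterContig.
  val matchingAlignments = alignments.filter(matchesContig(_, filterContig))
  matchingAlignments.persist()
  println("Found " + matchingAlignments.count() + " alignments with one pair in " + filterContig)
  // Write as BAM (asSam = false) from 10 partitions.
  matchingAlignments.coalesce(10).adamSAMSave(args.outputPath, asSam = false)
}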
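
The snippet does not show `matchesContig`. As a rough, self-contained illustration of the intended predicate (keep a read if it or its mate is on the target contig), here is a plain-Scala sketch. The `Read` case class and its fields are hypothetical stand-ins, not ADAM's `AlignmentRecord` schema; the contig name "22" follows the `--filter-contig 22` argument in the command line.

```scala
// Hypothetical stand-in for an alignment record; NOT ADAM's AlignmentRecord schema.
final case class Read(name: String, contig: Option[String], mateContig: Option[String])

object ContigFilter {
  // Keep a read if it, or its mate, is aligned to the target contig.
  // Unmapped reads/mates (None) never match.
  def matchesContig(read: Read, filterContig: String): Boolean =
    read.contig.contains(filterContig) || read.mateContig.contains(filterContig)

  def main(args: Array[String]): Unit = {
    val reads = Seq(
      Read("r1", Some("22"), Some("22")), // both on 22
      Read("r2", Some("1"), Some("22")),  // mate on 22
      Read("r3", Some("22"), None),       // read on 22, mate unmapped
      Read("r4", Some("5"), Some("7")),   // neither on 22
      Read("r5", None, None)              // fully unmapped
    )
    val kept = reads.filter(matchesContig(_, "22")).map(_.name)
    println(kept.mkString(",")) // prints r1,r2,r3
  }
}
```

In the real job the same predicate would run inside `alignments.filter(...)` on an RDD rather than a local `Seq`.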

I'm consistently getting this error:

Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 5 in stage 5.0 failed 4 times, most recent failure: Lost task 5.3 in stage 5.0 (TID 706, demeter-csmaz08-10.demeter.hpc.mssm.edu): java.lang.AssertionError: assertion failed: Cannot return header if not attached.

My command line is this:

spark-submit --master yarn --deploy-mode client --executor-memory 16g --driver-memory 10g --num-executors 1000 --executor-cores 1 --driver-java-options "-Dyarn.resourcemanager.am.max-attempts=1 -Dlog4j.configuration=scripts/log4j.properties" --class org.hammerlab.guacamole.Guacamole --verbose target/guacamole-with-dependencies-0.0.1-SNAPSHOT.jar structural-variant --reads hdfs:///datasets/dream/data/synthetic-challenge-4/synthetic.challenge.set4.tumour.bam --filter-contig 22 --out hdfs:///user/vanded03/synth4.tumor.chr22+mate.bam

(The input is from the DREAM challenge.)

Would this be expected to work? cc @ryan-williams

@arahuja
Contributor

arahuja commented Jul 6, 2015

I was seeing the same issue in #676, which was apparently fixed, but I haven't checked since.

@danvk
Author

danvk commented Jul 6, 2015

@arahuja I believe that issue was specifically when you used .coalesce(1). I ran out of memory when I tried that, so I'm using .coalesce(10) and running into this issue.

@ryan-williams
Member

Closing as a near-duplicate of #676.
