Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ADAM-353] Fixing issue with SAM/BAM/VCF header attachment when running distributed #354

Merged

Conversation

fnothaft
Copy link
Member

Fixes issue #353. Runs header attachment per partition; not the cleanest fix, but it'll do. Modifies the SAMHeaderWritable class; without this modification, I was running into serialization issues.

@ryan-williams Can you give this a shot on your end? I've tried this on EC2 and it seems to work OK but would like to confirm that it works OK for you before merging it.

}
protected val rgs = RecordGroupDictionary.fromSAMHeader(hdr)

// recreate header when requested to get around header not being serializable,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove trailing comma?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yes! Thanks @hammer!

@fnothaft
Copy link
Member Author

I've added a commit here that fixes #351 as well; we weren't handing symbolic alleles correctly. I'm glad to move that into it's own PR if people would prefer, LMK.

@fnothaft
Copy link
Member Author

Rebased and squashed down to one commit per issue (#353 and #351).

@fnothaft fnothaft force-pushed the fix-distributed-legacy-output branch from c588a34 to a6f04c4 Compare August 29, 2014 22:32
@fnothaft
Copy link
Member Author

Rebased on master.

@@ -24,7 +24,7 @@
<spark.version>1.0.1</spark.version>
<parquet.version>1.4.3</parquet.version>
<!-- Edit the following line to configure the Hadoop (HDFS) version. -->
<hadoop.version>2.2.0</hadoop.version>
<hadoop.version>1.0.4</hadoop.version>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this intensional?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

*intentional

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, sorry, did that for testing on EC2. I'll move it back.

@fnothaft fnothaft force-pushed the fix-distributed-legacy-output branch from a6f04c4 to 8672534 Compare September 2, 2014 20:37
@fnothaft
Copy link
Member Author

fnothaft commented Sep 2, 2014

Rebased, and fixed hadoop version issue.

assert(allele.isNonReference && !allele.isSymbolic)
val variant = createADAMVariant(vc, Some(allele.getBaseString))
val genotypes = extractNonReferenceGenotypes(vc, variant, calling_annotations)
assert(allele.isNonReference,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you mean to drop the check if the allele is not symbolic?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes; this causes the problem in #351 with structural variants. I've added bigdatagenomics/bdg-formats#27 and #358 which are necessary to cleanly fix issue #351.

massie added a commit that referenced this pull request Sep 2, 2014
[ADAM-353] Fixing issue with SAM/BAM/VCF header attachment when running distributed
@massie massie merged commit f50af87 into bigdatagenomics:master Sep 2, 2014
@massie
Copy link
Member

massie commented Sep 2, 2014

Thanks, Frank!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants