Vcf work rdd master merge #124

Merged
merged 22 commits into from
Feb 24, 2014

@nealsid nealsid commented Feb 20, 2014

This is a merge of master into the vcf-work branch. It contains all the changes of vcf-work-rdd, so the diff is larger than what actually needs review; it should drop in size once vcf-work-rdd is merged into vcf-work.

fnothaft and others added 19 commits February 11, 2014 09:19
Adding ability to convert reference FASTA files for nucleotide sequences
Add initial documentation on contributing
…mmand.

This commit fixes issue 92 (#92).

The old style of encoding the "optional fields" from the SAM/BAM was to store
them as key=value pairs in the ADAMRecord.attributes string. However, this
loses information about the _type_ of the tag/value, which is necessary if
we want to reconstruct the original value type (for example, for re-exporting
BAM files from ADAM files).

This update is non-backwards-compatible, changing the format of the attributes
field to tag:type:value and introducing a new Attribute class for parsing and
handling these values.  It also adds functions to AdamRDDFunctions to allow for
filtering and subsetting of reads based on their tags, or to count the number of
distinct tags or tag-values across a set of reads.
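
As a rough illustration of the tag:type:value encoding described above, the following is a minimal sketch, not the Attribute class from this commit. The names `TaggedAttribute`, `parse`, and `parseAll` are hypothetical, the tab separator between attributes is an assumption, and only a few of the SAM type codes (`A`, `i`, `f`, `Z`) are handled.

```scala
// Hypothetical sketch: none of these names come from the ADAM source.
case class TaggedAttribute(tag: String, typeCode: Char, value: Any)

object TaggedAttribute {
  // Parse a single "tag:type:value" triple.
  // The split limit of 3 keeps any colons inside the value intact.
  def parse(encoded: String): TaggedAttribute = {
    val parts = encoded.split(":", 3)
    require(parts.length == 3, s"expected tag:type:value, got '$encoded'")
    val Array(tag, tpe, raw) = parts
    require(tpe.length == 1, s"type code must be one character, got '$tpe'")
    // Recover a typed value from the SAM type code.
    val value: Any = tpe.head match {
      case 'i' => raw.toInt
      case 'f' => raw.toFloat
      case 'A' =>
        require(raw.length == 1, s"'A' values must be one character, got '$raw'")
        raw.head
      case 'Z' => raw
      case other =>
        throw new IllegalArgumentException(s"unsupported type code '$other'")
    }
    TaggedAttribute(tag, tpe.head, value)
  }

  // Assumption: attributes within one record are separated by tabs.
  def parseAll(attributes: String): Seq[TaggedAttribute] =
    attributes.split("\t").filter(_.nonEmpty).map(parse).toSeq
}
```

With the type code preserved, something like `TaggedAttribute.parseAll(attrs).map(_.tag).distinct` could list the distinct tags of a read, which is the kind of query the old key=value form could answer but without the value types.
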
Encoding tag types in the ADAMRecord attributes, adding the 'tags' command
Cleaning up change documentation.
We've been getting intermittent errors, with respect to Spark being unable to bind to a port,
in the context of repeated unit tests.  This apparently is a known problem, see the thread here:
  http://blog.quantifind.com/posts/spark-unit-test/
and the follow-up from Matei here:
  https://groups.google.com/forum/#!topic/spark-users/MeVzgoJXm8I

The upshot is that we need to clearProperty('spark.driver.port') when we shut down our sparkContext
after a sparkTest.
Added the port erasure to SparkFunSuite's cleanup.
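
The cleanup described above might look roughly like the sketch below. It is not the SparkFunSuite code from this commit: `SparkPortCleanup` and `withClearedDriverPort` are hypothetical names, and the Spark context itself is left out so the example stays self-contained. The only real call is the JVM's `System.clearProperty`.

```scala
// Hypothetical helper: the name and shape are assumptions, not ADAM code.
object SparkPortCleanup {
  // Run `body`, then always clear the driver port property,
  // even when the body throws.
  def withClearedDriverPort[T](body: => T): T = {
    try body
    finally {
      // In a real test fixture the SparkContext would be stopped first.
      // Clearing the property keeps the next context from trying to
      // rebind to the port used by the previous one.
      System.clearProperty("spark.driver.port")
    }
  }
}
```

Putting the call in a `finally` block means a failing test still leaves the property cleared for the next one.
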
- A plugin class can be defined outside of the Adam jar, but run through
  the normal AdamMain
- An example plugin, the "Take10Plugin" is included in the test
  directory
- Adds a test suite to the cli module, which can reference the items
  available in the core module
- Adds notion of AccessControl to control the records which can be
  accessed
- Functional test
- More comments
Adding new PluginExecutor command
… vcf-work-rdd-master-merge

Conflicts:
	adam-cli/src/main/scala/edu/berkeley/cs/amplab/adam/cli/Adam2Vcf.scala
	adam-core/src/main/scala/edu/berkeley/cs/amplab/adam/converters/VariantContextConverter.scala
	adam-core/src/main/scala/edu/berkeley/cs/amplab/adam/models/ADAMVariantContext.scala
	adam-core/src/main/scala/edu/berkeley/cs/amplab/adam/rdd/AdamContext.scala
	adam-core/src/main/scala/edu/berkeley/cs/amplab/adam/rdd/AdamRDDFunctions.scala
	adam-core/src/test/scala/edu/berkeley/cs/amplab/adam/converters/VariantContextConverterSuite.scala
	adam-core/src/test/scala/edu/berkeley/cs/amplab/adam/models/ADAMVariantContextSuite.scala
	adam-format/src/main/resources/avro/adam.avdl
@AmplabJenkins

One or more automated tests failed
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/ADAM-prb/141/

@hammer
Contributor

hammer commented Feb 20, 2014

Can we file an issue to improve the filter representation after the topic branch is merged into master?

@AmplabJenkins

One or more automated tests failed
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/ADAM-prb/143/

@massie
Member

massie commented Feb 20, 2014

@nealsid The tests are failing

[ERROR] /root/workspace/ADAM-prb/adam-core/src/main/scala/edu/berkeley/cs/amplab/adam/models/ReferencePosition.scala:108: value getReferenceId is not a member of edu.berkeley.cs.amplab.adam.avro.ADAMVariant
[ERROR]     new ReferencePosition(variant.getReferenceId, variant.getPosition)
[ERROR]                                   ^
[ERROR] /root/workspace/ADAM-prb/adam-core/src/main/scala/edu/berkeley/cs/amplab/adam/models/ReferencePosition.scala:121: value getReferenceId is not a member of edu.berkeley.cs.amplab.adam.avro.ADAMGenotype
[ERROR]     new ReferencePosition(genotype.getReferenceId, genotype.getPosition)
[ERROR]                                    ^
[ERROR] /root/workspace/ADAM-prb/adam-core/src/main/scala/edu/berkeley/cs/amplab/adam/models/ReferencePosition.scala:121: value getPosition is not a member of edu.berkeley.cs.amplab.adam.avro.ADAMGenotype
[ERROR]     new ReferencePosition(genotype.getReferenceId, genotype.getPosition)
[ERROR]                                                             ^
[ERROR] /root/workspace/ADAM-prb/adam-core/src/main/scala/edu/berkeley/cs/amplab/adam/rdd/AdamContext.scala:296: value remapReferenceId is not a member of org.apache.spark.rdd.RDD[edu.berkeley.cs.amplab.adam.avro.ADAMRecord]
[ERROR]             else v._2.remapReferenceId(v._1.mapTo(head._1).toMap)(sc)
[ERROR]                       ^
[ERROR] /root/workspace/ADAM-prb/adam-core/src/main/scala/edu/berkeley/cs/amplab/adam/rdd/AdamRDDFunctions.scala:215: not found: value MapTools
[ERROR]         MapTools.add(map1, map2)
[ERROR]         ^

Is that an expected failure? I know that some of the intermediate work may not pass all tests.

@nealsid
Author

nealsid commented Feb 20, 2014

I think it makes sense to fix the build now, since it's close to being ready to be merged back to master, so I'll ping the pull request when it's green and ready for merging into vcf-work.

@nealsid
Author

nealsid commented Feb 21, 2014

Still needs work, unfortunately. Merging ReferenceRegion with the new Variant schema took a little work, and the tests still aren't passing.

@AmplabJenkins

One or more automated tests failed
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/ADAM-prb/146/

… and left outer join bug)

Update code to use integers for reference/contig ids
@nealsid
Author

nealsid commented Feb 24, 2014

Jenkins, test this please

@AmplabJenkins

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/ADAM-prb/158/

@massie
Member

massie commented Feb 24, 2014

Nice! All the tests pass now.

Thanks, Neal!

massie added a commit that referenced this pull request Feb 24, 2014
@massie massie merged commit d59aeaa into bigdatagenomics:vcf-work Feb 24, 2014
@nealsid nealsid deleted the vcf-work-rdd-master-merge branch February 25, 2014 03:37
7 participants