Add schema for sample #84

heuermh · 2016-06-06T18:36:57Z

AmplabJenkins · 2016-06-06T18:37:18Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/bdg-formats-prb/101/

jpdna · 2016-06-07T00:34:09Z

See my comment on bigdatagenomics/adam#1039 about moving sampleDescription and processingDescription into this Sample.

jpdna · 2016-06-07T00:51:56Z

@heuermh - if you want to keep this PR focused on the schema, you can assign an issue to me to update the ETL code to load and save using the new Sample, with these fields moved out of Genotype.

heuermh · 2016-06-07T00:53:09Z

How are those fields populated? What do they mean? Do they map to SRA metadata fields, which per earlier conversation and the doc comments all go into attributes?

jpdna · 2016-06-07T01:11:11Z

> How are those fields populated?

The VCF spec, in the context of "5.4.10 Sample mixtures", has

##SAMPLE=<ID=Blood,Genomes=Germline,Mixture=1.,Description="Patient germline genome">

but I'm not so sure that is going into one of these fields.
I suspect they don't get populated now.
However, these two fields, sampleDescription and processingDescription, seem reasonable enough as currently described in the comments on Genotype in bdg-formats. They would not be expensive if moved to Sample, and they can be null.

heuermh · 2016-06-07T01:23:06Z

The cardinality for Genomes, Mixture, and Description is 0..*, so they would need to go in array fields. Or alternatively a single array of Genome records, which would have name, mixture, and description fields.

However, the doc already specifically recommends those go in attributes, where cardinality doesn't matter (repeated keys are ok).
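As an illustration of the "array of Genome records" alternative mentioned above, here is a minimal Python sketch that turns a ##SAMPLE header line into a list of records with name, mixture, and description fields. The `Genome` class and `parse_sample_line` helper are hypothetical names, not part of bdg-formats or ADAM, and the sketch assumes that multiple values within Genomes, Mixture, and Description are separated by semicolons, as in the VCF specification's sample mixture example.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Genome:
    """Hypothetical record: one entry per genome listed in a ##SAMPLE line."""
    name: str
    mixture: Optional[float]
    description: Optional[str]


def parse_sample_line(line: str) -> Tuple[str, List[Genome]]:
    """Return (sample id, genomes) for a VCF ##SAMPLE header line."""
    # Drop the leading '##SAMPLE=<' and the trailing '>'.
    body = line.strip()[len("##SAMPLE=<"):-1]
    # A value is either a double-quoted string or runs up to the next comma.
    fields = dict(re.findall(r'(\w+)=("[^"]*"|[^,]*)', body))

    def split(key: str) -> List[str]:
        # Assumption: repeated values are semicolon-separated.
        raw = fields.get(key, "").strip('"')
        return raw.split(";") if raw else []

    names = split("Genomes")
    mixtures = split("Mixture")
    descriptions = split("Description")

    genomes = []
    for i, name in enumerate(names):
        mixture = float(mixtures[i]) if i < len(mixtures) else None
        description = descriptions[i] if i < len(descriptions) else None
        genomes.append(Genome(name, mixture, description))
    return fields["ID"], genomes


sample_id, genomes = parse_sample_line(
    '##SAMPLE=<ID=Blood,Genomes=Germline,Mixture=1.,'
    'Description="Patient germline genome">'
)
```

Because each genome becomes its own record, the 0..* cardinality is carried by the length of the list rather than by three parallel array fields.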

jpdna · 2016-06-07T01:25:18Z

> the doc already specifically recommends those go in attributes

putting in attributes is fine with me

heuermh · 2016-06-07T01:26:17Z

Although now that I've said that repeated keys in attributes are OK, they are not, since we're using an Avro map<string>, in which case last in wins.
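The "last in wins" behavior can be seen in a small Python sketch, where a plain dict stands in for the Avro map<string>. The `to_map` helper and the sample pairs are hypothetical, and the assumption is that the loader assigns each key/value pair into the map in order.

```python
def to_map(pairs):
    """Collapse (key, value) pairs into a dict, one assignment per pair."""
    result = {}
    for key, value in pairs:
        # A repeated key overwrites whatever was stored for it earlier.
        result[key] = value
    return result


pairs = [("Genomes", "Germline"), ("Genomes", "Tumor"), ("Mixture", "0.3")]
attributes = to_map(pairs)
# attributes now holds a single "Genomes" entry; "Germline" has been lost.
```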

jpdna · 2016-06-07T01:43:38Z

I'm not so convinced that these two fields come from this sample-mixture tag anyhow, or that they have any source in VCF currently, so maybe hold off on modeling the Genomes, Mixture, and Description cardinality until we know more.

I don't think these fields get populated anyhow. I am more concerned with the principle: if sample-level metadata like this appears in the future and is somehow shoe-horned into these fields, it should not appear in our inner-inner loop inside Genotype.

fnothaft · 2016-06-15T18:38:17Z

Do we want to get this into the next bdg-formats release? CC @heuermh @jpdna

heuermh · 2016-06-15T19:39:44Z

Yes, and #83. I'll rebase to resolve conflicts in a sec

heuermh · 2016-06-15T19:46:09Z

Attributes with duplicate keys are going to get clobbered. That wasn't a major issue in the Feature record, where this also came up, but it will be here. The cardinality of the VCF sample attributes and of nearly all of the SRA attributes is 0..*.

Is this a reasonable workaround?

record Sample {
  // ...
  array<Attribute> attributes = [];
}

record Attribute {
  string key;
  string value;
}

Or do we want to reopen discussion about a bunch of array fields?
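To illustrate how the array<Attribute> workaround keeps repeated keys, here is a minimal Python sketch that mirrors the Attribute record as a dataclass and groups values by key on read. The `values_by_key` helper and the example values are hypothetical and are not part of the schema in this PR.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Attribute:
    """Mirrors the proposed Avro record: one key/value pair per entry."""
    key: str
    value: str


def values_by_key(attributes: List[Attribute]) -> Dict[str, List[str]]:
    """Group attribute values by key, preserving insertion order per key."""
    grouped = defaultdict(list)
    for attribute in attributes:
        grouped[attribute.key].append(attribute.value)
    return dict(grouped)


sample_attributes = [
    Attribute("Genomes", "Germline"),
    Attribute("Genomes", "Tumor"),
    Attribute("Mixture", "0.3"),
    Attribute("Mixture", "0.7"),
]
grouped = values_by_key(sample_attributes)
```

The trade-off is that a lookup by key becomes a scan or a grouping step on the reader's side, in exchange for not losing values whose cardinality is 0..*.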

AmplabJenkins · 2016-06-15T19:47:18Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/bdg-formats-prb/102/

AmplabJenkins · 2016-06-21T22:42:19Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/bdg-formats-prb/104/

heuermh · 2016-06-27T19:48:59Z

Thanks!

heuermh changed the title ~~Adding schema for sample~~ Add schema for sample Jun 6, 2016

heuermh force-pushed the sample branch from 80a9b26 to a3203f4 Compare June 15, 2016 19:44

Adding schema for sample

bb85178

heuermh force-pushed the sample branch from a3203f4 to bb85178 Compare June 21, 2016 22:40

heuermh modified the milestone: 0.8.1 Jun 22, 2016

fnothaft merged commit 244e736 into bigdatagenomics:master Jun 27, 2016

heuermh deleted the sample branch June 27, 2016 19:48

heuermh mentioned this pull request Jun 27, 2016

VCF sample metadata - proposal for a GenotypedSampleMetadata object bigdatagenomics/adam#1039

Closed
