Use .adam/_{seq,rg}dict.avro paths for Avro-formatted dictionaries #978

Merged: 1 commit, Mar 29, 2016

@heuermh
Member

heuermh commented Mar 24, 2016

Fixes #945

This results in a Parquet folder structure of

$ ls -ls /var/folders/3y/61r1w_cs4hbdr_34nrdrbhww0000gn/T/3911097172870281306/reads12.adam
total 176
 8 -rw-r--r--   1 user  staff      8 Mar 24 17:38 ._SUCCESS.crc
 8 -rw-r--r--   1 user  staff     92 Mar 24 17:38 ._common_metadata.crc
 8 -rw-r--r--   1 user  staff    120 Mar 24 17:38 ._metadata.crc
 8 -rw-r--r--   1 user  staff     20 Mar 24 17:38 ._rgdict.avro.crc
 8 -rw-r--r--   1 user  staff     20 Mar 24 17:38 ._seqdict.avro.crc
 8 -rw-r--r--   1 user  staff    204 Mar 24 17:38 .part-r-00000.gz.parquet.crc
 0 -rw-r--r--   1 user  staff      0 Mar 24 17:38 _SUCCESS
24 -rw-r--r--   1 user  staff  10494 Mar 24 17:38 _common_metadata
32 -rw-r--r--   1 user  staff  14304 Mar 24 17:38 _metadata
 8 -rw-r--r--   1 user  staff   1247 Mar 24 17:38 _rgdict.avro
 8 -rw-r--r--   1 user  staff   1450 Mar 24 17:38 _seqdict.avro
56 -rw-r--r--   1 user  staff  24716 Mar 24 17:38 part-r-00000.gz.parquet

Another option would be .adam/.{seq,rg}dict.avro.

With other file names (no leading _ or .), Parquet mistakes the dictionaries for Parquet-formatted files and throws a RuntimeException, e.g.

RuntimeException: file:/var/folders/3y/61r1w_cs4hbdr_34nrdrbhww0000gn/T/7766788398861766842/reads12.adam/seqdict.avro
is not a Parquet file. expected magic number at tail [80, 65, 82, 49] but found [-104, -80, 71, -108]
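
The bytes in that message are worth decoding: [80, 65, 82, 49] is the ASCII encoding of PAR1, the magic number at the tail of a Parquet file, and the "found" values are signed bytes from the end of the Avro file. Below is a minimal Python sketch of the two checks involved. It assumes, as the file names chosen in this PR suggest, that entries whose names start with _ or . are skipped when the directory is scanned for part files. The function names are hypothetical and are not Parquet or ADAM APIs.

```python
import os
import tempfile

# [80, 65, 82, 49] from the exception message, i.e. ASCII "PAR1".
PARQUET_MAGIC = bytes([80, 65, 82, 49])


def is_hidden(name):
    """Assumed convention: names starting with '_' or '.' are skipped."""
    return name.startswith("_") or name.startswith(".")


def has_parquet_magic_at_tail(path):
    """Return True if the last four bytes of the file are PAR1."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        if f.tell() < len(PARQUET_MAGIC):
            return False
        f.seek(-len(PARQUET_MAGIC), os.SEEK_END)
        return f.read() == PARQUET_MAGIC


def candidate_part_files(directory):
    """List the entries that would be treated as Parquet part files."""
    return sorted(n for n in os.listdir(directory) if not is_hidden(n))
```

Under that assumption, a file named seqdict.avro is a candidate part file and fails the tail check, while _seqdict.avro or .seqdict.avro is never inspected.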

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1117/

Build result: FAILURE

[...truncated 24 lines...]
Triggering ADAM-prb » 2.6.0,2.10,1.3.1,centos
Triggering ADAM-prb » 2.6.0,2.10,1.6.0,centos
Triggering ADAM-prb » 2.6.0,2.10,1.4.1,centos
Triggering ADAM-prb » 2.6.0,2.11,1.3.1,centos
Triggering ADAM-prb » 2.3.0,2.10,1.4.1,centos
Triggering ADAM-prb » 2.3.0,2.10,1.3.1,centos
Triggering ADAM-prb » 2.3.0,2.11,1.4.1,centos
Triggering ADAM-prb » 2.3.0,2.11,1.3.1,centos
Triggering ADAM-prb » 2.3.0,2.10,1.5.2,centos
ADAM-prb » 2.6.0,2.11,1.6.0,centos completed with result FAILURE
ADAM-prb » 2.3.0,2.11,1.5.2,centos completed with result FAILURE
ADAM-prb » 2.6.0,2.11,1.4.1,centos completed with result FAILURE
ADAM-prb » 2.3.0,2.10,1.6.0,centos completed with result FAILURE
ADAM-prb » 2.3.0,2.11,1.6.0,centos completed with result FAILURE
ADAM-prb » 2.6.0,2.10,1.3.1,centos completed with result FAILURE
ADAM-prb » 2.6.0,2.10,1.6.0,centos completed with result FAILURE
ADAM-prb » 2.6.0,2.10,1.4.1,centos completed with result FAILURE
ADAM-prb » 2.6.0,2.11,1.3.1,centos completed with result FAILURE
ADAM-prb » 2.3.0,2.10,1.4.1,centos completed with result FAILURE
ADAM-prb » 2.3.0,2.10,1.3.1,centos completed with result FAILURE
ADAM-prb » 2.3.0,2.11,1.4.1,centos completed with result FAILURE
ADAM-prb » 2.3.0,2.11,1.3.1,centos completed with result FAILURE
ADAM-prb » 2.3.0,2.10,1.5.2,centos completed with result FAILURE
Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
Test FAILed.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1118/

Build result: FAILURE

[...truncated 24 lines...]
Triggering ADAM-prb » 2.6.0,2.10,1.3.1,centos
Triggering ADAM-prb » 2.6.0,2.10,1.6.0,centos
Triggering ADAM-prb » 2.6.0,2.10,1.4.1,centos
Triggering ADAM-prb » 2.6.0,2.11,1.3.1,centos
Triggering ADAM-prb » 2.3.0,2.10,1.4.1,centos
Triggering ADAM-prb » 2.3.0,2.10,1.3.1,centos
Triggering ADAM-prb » 2.3.0,2.11,1.4.1,centos
Triggering ADAM-prb » 2.3.0,2.11,1.3.1,centos
Triggering ADAM-prb » 2.3.0,2.10,1.5.2,centos
ADAM-prb » 2.6.0,2.11,1.6.0,centos completed with result FAILURE
ADAM-prb » 2.3.0,2.11,1.5.2,centos completed with result FAILURE
ADAM-prb » 2.6.0,2.11,1.4.1,centos completed with result FAILURE
ADAM-prb » 2.3.0,2.10,1.6.0,centos completed with result FAILURE
ADAM-prb » 2.3.0,2.11,1.6.0,centos completed with result FAILURE
ADAM-prb » 2.6.0,2.10,1.3.1,centos completed with result FAILURE
ADAM-prb » 2.6.0,2.10,1.6.0,centos completed with result FAILURE
ADAM-prb » 2.6.0,2.10,1.4.1,centos completed with result FAILURE
ADAM-prb » 2.6.0,2.11,1.3.1,centos completed with result FAILURE
ADAM-prb » 2.3.0,2.10,1.4.1,centos completed with result FAILURE
ADAM-prb » 2.3.0,2.10,1.3.1,centos completed with result FAILURE
ADAM-prb » 2.3.0,2.11,1.4.1,centos completed with result FAILURE
ADAM-prb » 2.3.0,2.11,1.3.1,centos completed with result FAILURE
ADAM-prb » 2.3.0,2.10,1.5.2,centos completed with result FAILURE
Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
Test FAILed.

@heuermh
Member Author

heuermh commented Mar 24, 2016

Jenkins, retest this please

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1119/
Test PASSed.

@fnothaft
Member

Oh, savvy! Looks great @heuermh!

OOC, why's there a big deletion in FieldEnumerationSuite?

@heuermh
Member Author

heuermh commented Mar 25, 2016

why's there a big deletion in FieldEnumerationSuite?

I started digging in there because of a failing unit test; it turns out that saveAsParquet is order dependent. As it was, the saveAvro calls would create the parent directory, which would then blow up in adamParquetSave.

That test was goofy though, in that it was creating the parquet folder in adam-core/target/scala-2.10.4/test-classes instead of somewhere reasonable, so I kept the changes in.
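
The order dependence described above can be sketched with stand-ins. This is a hedged Python illustration, not ADAM code: it assumes the Parquet save fails when its output directory already exists, which is what the comment implies, and every function name here is hypothetical. It also writes under a fresh temporary directory rather than a build output folder.

```python
import os
import tempfile


def save_parquet_stub(directory):
    """Stand-in for the Parquet save; assumed to fail if the directory exists."""
    os.mkdir(directory)  # raises FileExistsError when already present
    with open(os.path.join(directory, "part-r-00000.gz.parquet"), "wb") as f:
        f.write(b"placeholder")  # not real Parquet data


def save_avro_stub(directory, name):
    """Stand-in for a dictionary save; creates the parent directory if needed."""
    os.makedirs(directory, exist_ok=True)
    with open(os.path.join(directory, name), "wb") as f:
        f.write(b"placeholder")  # not real Avro data


def save_reads(directory):
    """Write the Parquet data first, then the dictionaries alongside it."""
    save_parquet_stub(directory)
    save_avro_stub(directory, "_seqdict.avro")
    save_avro_stub(directory, "_rgdict.avro")


def fresh_output_path(name="reads12.adam"):
    """Return a not-yet-existing output path under a new temporary directory."""
    return os.path.join(tempfile.mkdtemp(), name)
```

Calling save_avro_stub before save_parquet_stub on the same path raises FileExistsError in this sketch, which mirrors the failure described in the comment.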

@fnothaft
Member

why's there a big deletion in FieldEnumerationSuite?

I started digging in there because of a failing unit test; it turns out that saveAsParquet is order dependent. As it was, the saveAvro calls would create the parent directory, which would then blow up in adamParquetSave.

That test was goofy though, in that it was creating the parquet folder in adam-core/target/scala-2.10.4/test-classes instead of somewhere reasonable, so I kept the changes in.

Ah, that makes sense. Thanks for the follow up.

This looks good to merge to me, but I will keep this open until tomorrow in case anyone else wants to chime in.

@fnothaft fnothaft merged commit 65f893f into bigdatagenomics:master Mar 29, 2016
@fnothaft
Member

Thanks @heuermh! Merged.

@heuermh heuermh deleted the dict-in-adam-dir branch March 29, 2016 15:22
@heuermh
Member Author

heuermh commented Mar 29, 2016

Thanks!
