
HBase genotypes backend - revised #1335

Closed
wants to merge 5 commits into from

Conversation


@jpdna jpdna commented Jan 3, 2017

supersedes #1246
and addresses comments from there.

Please make a review pass on this.

I'll add instructions soon as to how to run on cluster for reviewers to try live.

Added HBaseFunctions

Added VCF genotype save and load Hbase functions

Added saveHBaseGenotypesSingleSample, packing multiple genotypes at the same position and sample allele into the same HBase value

Added multisample capable loadHBaseGenotypes()

Removed commented-out earlier versions of save and load genotypes

removed more dead code

Clean up formatting - limit line length

Added saveHbaseSampleMetadata function

Added save and load SequenceDictionary functions to HBaseFunctions

Added createHbaseGenotypeTable

Adding loadGenotypeRddFromHBase

in progress updates to multi-sample hbase save

multi sample VCF save and load now working

Added repartitioning parameter to HBase genotype load

Added comments identifying the public api vs helper functions

COB Aug 25

Added genomic start and stop parameters to loadGenotypesFromHBaseToGenotypeRDD

Added boolean saveSequenceDictionary toggle parameter to saveVariantContextRddToHBase

fixed start, stop null ptr exception

first steps in adding hbaseToVariantcontextRDD

Changed region query to use ADAM ReferenceRegion

Added custom HBaseEncoder1 save function

Added custom Encoder1 Hbase loader

Added Encoder1 hbase variant context loader

Working - before rowkey int

Changed end in key to be size, added data block encoding in create table

Added create table splits

Removed dead code of encoder2

Added option to repartition vcrdd before saving to HBase

Added bulk load save to Hbase option

changed to cdh hbase api dependencies in POM

allow sample name file list as input to load functions

made sample_ids list parameter in load optional

Added deleteSamplesFromHBase function

Fixed bulk delete and made loadVariantContext work even when requested sample ids are missing

Removed code from failed version of sample delete function

Moved delete function up with test of genotype code

Fixed errors after rebase against master

small formatting cleanup

first pass hbase cli demo

second pass hbase cli demo

remove saveSeqDict check

add seq dict id to cli

import clean up

removed unneeded demo and temp.js code

Ran ./scripts/format-source due to build failure on Jenkins

changed Hbase dependencies to provided

addressed some review issues

Changed HBase dependencies back to NOT being provided

fixed None.get

first step key Strategy pattern

second step key Strategy pattern

more cleanup from review

made private functions package private for hbase

Factored out hbase connection

Factored out hbase connection - fix1

Factored out hbase connection - cleanup

Removed the non-bulk load hbase function

step one adding tests

first step implement DAO

step2 DAO

step3 DAO

Added first loadHBase Test

Added some javadoc

use dao everywhere

added more javadoc

step 1 in adding save hbase test

step 2 in adding save hbase test

clean up hbase doc

add more assertions to hbase load test

improved POM and imports in HBaseFunctions

fixed more PR suggestions
@jpdna jpdna mentioned this pull request Jan 3, 2017
3 tasks
@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1714/

Build result: FAILURE

[...truncated 3 lines...]Building remotely on amp-jenkins-worker-05 (centos spark-test) in workspace /home/jenkins/workspace/ADAM-prbWiping out workspace first.Cloning the remote Git repositoryCloning repository https://github.com/bigdatagenomics/adam.git > /home/jenkins/git2/bin/git init /home/jenkins/workspace/ADAM-prb # timeout=10Fetching upstream changes from https://github.com/bigdatagenomics/adam.git > /home/jenkins/git2/bin/git --version # timeout=10 > /home/jenkins/git2/bin/git -c core.askpass=true fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/heads/:refs/remotes/origin/ # timeout=15 > /home/jenkins/git2/bin/git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10 > /home/jenkins/git2/bin/git config --add remote.origin.fetch +refs/heads/:refs/remotes/origin/ # timeout=10 > /home/jenkins/git2/bin/git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10Fetching upstream changes from https://github.com/bigdatagenomics/adam.git > /home/jenkins/git2/bin/git -c core.askpass=true fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ # timeout=15 > /home/jenkins/git2/bin/git rev-parse origin/pr/1335/merge^{commit} # timeout=10 > /home/jenkins/git2/bin/git branch -a --contains 5903417 # timeout=10 > /home/jenkins/git2/bin/git rev-parse remotes/origin/pr/1335/merge^{commit} # timeout=10Checking out Revision 5903417 (origin/pr/1335/merge) > /home/jenkins/git2/bin/git config core.sparsecheckout # timeout=10 > /home/jenkins/git2/bin/git checkout -f 5903417ce134e3726b6d0df89b1a4eeab3394a3dFirst time build. Skipping changelog.Triggering ADAM-prb ? 2.6.0,2.11,1.5.2,centosTriggering ADAM-prb ? 2.6.0,2.10,1.5.2,centosTouchstone configurations resulted in FAILURE, so aborting...Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
Test FAILed.


@fnothaft fnothaft left a comment


Looks really good, definitely heading in the right direction!

package org.bdgenomics.adam.cli

import org.apache.spark.SparkContext


Nit: remove whitespace.

@Argument(required = true, metaVar = "VCF", usage = "The VCF file to convert", index = 0)
var vcfPath: String = _

@Argument(required = true, metaVar = "-hbase_table", usage = "HBase table name in which to load VCF file")

For arguments, should be allcaps, no -, i.e., HBASE_TABLE

@Argument(required = true, metaVar = "-hbase_table", usage = "HBase table name in which to load VCF file")
var hbaseTable: String = null
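
For illustration, a hypothetical corrected declaration following the ALL_CAPS, no-dash metaVar convention suggested above (a sketch, not the actual committed code):

```scala
// metaVar names the positional argument in the generated usage text;
// the convention requested here is ALL_CAPS with no leading dash.
@Argument(required = true, metaVar = "HBASE_TABLE", usage = "HBase table name in which to load VCF file")
var hbaseTable: String = null
```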

@Args4jOption(required = true, name = "-seq_dict_id", usage = "User defined name to apply to the sequence dictionary create from this VCF, or name of existing sequence dictionary to be used")

Ditto: SEQ_DICT_ID


@Argument(required = true, metaVar = "-staging_folder", usage = "Location for temporary files during bulk load")
var stagingFolder: String = null


Nit: remove whitespace.

}

class Vcf2HBase(protected val args: Vcf2HBaseArgs) extends BDGSparkCommand[Vcf2HBaseArgs] with Logging {
val companion = Transform

wrong companion object

@@ -0,0 +1,111 @@
package org.bdgenomics.adam.hbase

import org.bdgenomics.adam.util.ADAMFunSuite

Nit: sort imports.

assert(genoData.getAlleles.get(1) === GenotypeAllele.ALT)
}

sparkAfter("Done with HBase Tests") {

Do we need this?

@@ -1,5 +1,15 @@
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">



Nit: extra whitespace before/after.

<version>${spark.version}</version>
<scope>provided</scope>
<exclusions>
<exclusion>

We shouldn't need these exclusions, IIRC.

</dependencies>
</dependencyManagement>


Nit: keep whitespace.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1734/

Build result: FAILURE

[...truncated 3 lines...]Building remotely on amp-jenkins-worker-05 (centos spark-test) in workspace /home/jenkins/workspace/ADAM-prbWiping out workspace first.Cloning the remote Git repositoryCloning repository https://github.com/bigdatagenomics/adam.git > /home/jenkins/git2/bin/git init /home/jenkins/workspace/ADAM-prb # timeout=10Fetching upstream changes from https://github.com/bigdatagenomics/adam.git > /home/jenkins/git2/bin/git --version # timeout=10 > /home/jenkins/git2/bin/git -c core.askpass=true fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/heads/:refs/remotes/origin/ # timeout=15 > /home/jenkins/git2/bin/git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10 > /home/jenkins/git2/bin/git config --add remote.origin.fetch +refs/heads/:refs/remotes/origin/ # timeout=10 > /home/jenkins/git2/bin/git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10Fetching upstream changes from https://github.com/bigdatagenomics/adam.git > /home/jenkins/git2/bin/git -c core.askpass=true fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ # timeout=15 > /home/jenkins/git2/bin/git rev-parse origin/pr/1335/merge^{commit} # timeout=10 > /home/jenkins/git2/bin/git branch -a --contains b96c96f # timeout=10 > /home/jenkins/git2/bin/git rev-parse remotes/origin/pr/1335/merge^{commit} # timeout=10Checking out Revision b96c96f (origin/pr/1335/merge) > /home/jenkins/git2/bin/git config core.sparsecheckout # timeout=10 > /home/jenkins/git2/bin/git checkout -f b96c96f78dd117968ba25dfe2044613e7dd10ce4First time build. Skipping changelog.Triggering ADAM-prb ? 2.6.0,2.11,1.5.2,centosTriggering ADAM-prb ? 2.6.0,2.10,1.5.2,centosTouchstone configurations resulted in FAILURE, so aborting...Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
Test FAILed.


jpdna commented Jan 12, 2017

This branch at 0ab45cd builds and passes tests locally for me after cloning from my repo.
Any ideas on why it fails when running on Jenkins? I did run the format script.


jpdna commented Jan 12, 2017

@heuermh suggests this could be a JDK version issue, given the byte-level comparisons I am making.

I am running Java 1.8.0_111 locally,
and Jenkins is running JDK_7u60.

Do we intend (or need) to not use Java 8 for Jenkins?


heuermh commented Jan 12, 2017

Jenkins must be using JDK8, since our build requires it. I don't know the exact version though.


jpdna commented Jan 12, 2017

What, then, is the
JAVA_HOME /home/jenkins/tools/hudson.model.JDK/JDK_7u60
that I see here:
https://amplab.cs.berkeley.edu/jenkins/job/ADAM-prb/1734/injectedEnvVars/


fnothaft commented Jan 12, 2017

There's a long boring story to why the JAVA_HOME values are configured the way they are, which mostly has to do with Jenkins specific oddities as @shaneknapp can attest to. If you grep later in the build logs for export JAVA_HOME, you'll see that we do indeed set the Java version to jdk1.8.0_60. This happens later in the build than the environment variable injection stage.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1735/

Build result: FAILURE

[...truncated 38 lines...]Triggering ADAM-prb ? 2.6.0,2.11,1.3.1,centosTriggering ADAM-prb ? 2.3.0,2.11,1.4.1,centosTriggering ADAM-prb ? 2.6.0,2.11,2.0.0,centosTriggering ADAM-prb ? 2.3.0,2.10,1.4.1,centosTriggering ADAM-prb ? 2.6.0,2.11,1.6.1,centosADAM-prb ? 2.3.0,2.11,1.5.2,centos completed with result SUCCESSADAM-prb ? 2.3.0,2.10,1.5.2,centos completed with result SUCCESSADAM-prb ? 2.3.0,2.10,2.0.0,centos completed with result FAILUREADAM-prb ? 2.6.0,2.10,1.4.1,centos completed with result SUCCESSADAM-prb ? 2.6.0,2.10,1.6.1,centos completed with result SUCCESSADAM-prb ? 2.6.0,2.10,2.0.0,centos completed with result FAILUREADAM-prb ? 2.6.0,2.11,1.4.1,centos completed with result SUCCESSADAM-prb ? 2.3.0,2.10,1.3.1,centos completed with result SUCCESSADAM-prb ? 2.6.0,2.10,1.3.1,centos completed with result SUCCESSADAM-prb ? 2.3.0,2.11,1.6.1,centos completed with result SUCCESSADAM-prb ? 2.3.0,2.11,1.3.1,centos completed with result SUCCESSADAM-prb ? 2.3.0,2.11,2.0.0,centos completed with result FAILUREADAM-prb ? 2.3.0,2.10,1.6.1,centos completed with result SUCCESSADAM-prb ? 2.6.0,2.11,1.3.1,centos completed with result SUCCESSADAM-prb ? 2.3.0,2.11,1.4.1,centos completed with result SUCCESSADAM-prb ? 2.6.0,2.11,2.0.0,centos completed with result FAILUREADAM-prb ? 2.3.0,2.10,1.4.1,centos completed with result SUCCESSADAM-prb ? 2.6.0,2.11,1.6.1,centos completed with result SUCCESSNotifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
Test FAILed.


jpdna commented Jan 12, 2017

Argh - now the remaining problem is that the HBase-Spark library version I am using seems not to support Spark 2.0, so we are getting complaints about the missing Spark Logging class that went away in Spark 2.0. I can check whether the newest version of the HBase/Spark connector supports Spark 2.0, as it seems to here: https://github.com/apache/hbase/blob/master/hbase-spark/pom.xml
However, I am not sure whether that will be compatible with the CDH version we'd want for the cluster.

Any ideas about how best to manage these issues of differing library versions and Jenkins?
Maven profiles?

@jpdna
Copy link
Member Author

jpdna commented Jan 12, 2017

As you can see here:
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1735/

My HBase PR is failing only the 2.0.0 column of the matrix
due to:
"[ERROR] error: bad symbolic reference. A signature in HBaseContext.class refers to type Logging"

I take this to be the Spark 2.0 Logging issue.

The first target I see for the HBase work is making it work as we have thus far with CDH, and CDH Spark/HBase support clearly does not yet support Spark 2.0, given its continued use of
org.apache.spark.Logging

https://github.com/cloudera/hbase/blob/cdh5-1.2.0_5.9.0/hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/HBaseContext.scala#L41

In general, it looks like the Spark 2.0 HBase connector may not exist at all:
http://stackoverflow.com/questions/40908891/which-hbase-connector-for-spark-2-0-should-i-use

So - it may for now be impossible for this PR to pass against Spark 2.0 (without rewriting the connector).
(The needed changes might not be so major, but I'm not sure that would be a good use of my time. Also, we might want to contact the person at Cloudera who would work on this - does anyone know https://github.com/tedyu ?
What would be the right forum to inquire with Cloudera about running Spark 2.0 with HBase on CDH?)

Assuming no immediate solution, how should we manage this wrt Jenkins? Thoughts @heuermh and @fnothaft ?


jpdna commented Jan 16, 2017

Ping for suggestions about how I should deal with this Spark 2.0 compatibility and Jenkins.

@fnothaft

We could move it into a module that is only accessible via a profile and then disable that profile when building for Spark 2. That would be simple.


heuermh commented Jan 16, 2017

We could move it into a module that is only accessible via a profile and then disable that profile when building for Spark 2.

Yep, that would work. And this would be a good opportunity to bother upstream about it.


shaneknapp commented Jan 16, 2017 via email


jpdna commented Jan 17, 2017

Alright, so do you suggest I make an HBase module at the same level as the adam-core and adam-cli modules?

It seems like the approach needed in the POM would then be something like what I see here:
http://maven.40175.n5.nabble.com/Excluding-certain-modules-in-a-profile-td81566.html#a81577

<project>
  ... 
  <modules>
    <module>module3</module>
    <module>module4</module>
  </modules>

  <profiles>
    <profile>
      <activation>
        <property>
          <name>!exclude-cpp-qa</name>
        </property>
      </activation>
      <modules>
        <module>module1</module>
        <module>module2</module>
      </modules>
    </profile>
  </profiles>
  ... 
</project>

Can you comment on what would be needed in the main project POM or elsewhere to selectively disable HBase module for the Spark 2 build?


heuermh commented Jan 17, 2017

There are a few different ways to activate a profile, see
http://books.sonatype.com/mvnref-book/reference/profiles-sect-activation.html


jpdna commented Jan 17, 2017

http://books.sonatype.com/mvnref-book/reference/profiles-sect-activation.html

Thanks, but I still feel like I'll be futzing around in the dark here - so to pick your brain more:

It looks like in all of these I'd use the <activation> section to activate, or with ! deactivate, the module mentioned in the profile.

Some questions:

  1. Assuming we want this to work with Jenkins based on which Spark version is active (which I assume the Jenkins configuration determines, not by specifying the name of a profile or setting a special env variable directly), what parameter/property will I be able to test inside <activation> to determine which version of Spark is active? I'm not clear whether this property is itself allowed to be a POM-defined property.

  2. Example 5.3.1 in the doc you pointed to seems to use an AND combination of values, whereas we'd possibly need an OR, like "activate for Spark 1.6.2, 1.6.3, 1.6.4". I'm not sure how I'd include or exclude these multiple possible versions. If Jenkins could set an OS env variable like Spark2=True for any Spark version greater than 1.x, then <activation> could test just for that.
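
For what it's worth, a minimal sketch of property-based activation (the `spark2` property name and `adam-hbase` module name are assumptions for illustration, not the project's actual configuration): the profile is active by default and is disabled when `-Dspark2` is passed on the command line, which would let the Spark 2 builds skip the HBase module.

```xml
<!-- In the parent POM: adam-hbase builds only when -Dspark2 is NOT set. -->
<profiles>
  <profile>
    <id>hbase</id>
    <activation>
      <property>
        <!-- '!' activates the profile when the named property is absent. -->
        <name>!spark2</name>
      </property>
    </activation>
    <modules>
      <module>adam-hbase</module>
    </modules>
  </profile>
</profiles>
```

A Spark 2 build would then be invoked as `mvn package -Dspark2`, while the default (Spark 1.x) build needs no extra flags.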

@fnothaft

I would just activate it through a profile as we do with the distribution module.


jpdna commented Feb 6, 2017

I could use another set of eyes if someone could look at below:

I'm attempting to move HBase into a separate module in this new branch dev/b6 in my repo here:
https://github.com/jpdna/adam/tree/dev/b6

Trying to build after moving the module (skipping tests for the moment), I get this error:

[ERROR] /home/paschallj/2_4_2017/adam/adam-hbase/src/main/scala/org/bdgenomics/adam/hbase/HBaseFunctions.scala:52: error: object rdd is not a member of package org.bdgenomics.adam
[ERROR] import org.bdgenomics.adam.rdd.variant.VariantContextRDD
[ERROR] ^
[ERROR] /home/paschallj/2_4_2017/adam/adam-hbase/src/main/scala/org/bdgenomics/adam/hbase/HBaseFunctions.scala:53: error: object rich is not a member of package org.bdgenomics.adam
[ERROR] import org.bdgenomics.adam.rich.RichVariant
[ERROR] ^
[ERROR] /home/paschallj/2_4_2017/adam/adam-hbase/src/main/scala/org/bdgenomics/adam/hbase/HBaseFunctions.scala:55: error: object models is not a member of package org.bdgenomics.adam
[ERROR] import org.bdgenomics.adam.models.{ ReferencePosition, ReferenceRegion, SequenceDictionary, VariantContext }

coming from:
https://github.com/jpdna/adam/blob/dev/b6/adam-hbase/src/main/scala/org/bdgenomics/adam/hbase/HBaseFunctions.scala#L52

Be sure to skip tests for the moment if you try to replicate from:
https://github.com/jpdna/adam/tree/dev/b6

I've got more POM cleanup to do, but first I need to figure out why it currently can't see:
org.bdgenomics.adam.rdd.variant.VariantContextRDD
from the above. Thanks!
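
For what it's worth, errors of the form "object rdd is not a member of package org.bdgenomics.adam" when compiling a newly split-out module usually mean the module's POM lacks a dependency on the module that provides those packages (adam-core here). A hedged sketch of what the adam-hbase POM might need (the artifactId suffix and property names are guesses based on common Maven/Scala conventions, not verified against the ADAM build):

```xml
<!-- adam-hbase/pom.xml: depend on adam-core so org.bdgenomics.adam.rdd.* resolves. -->
<dependency>
  <groupId>org.bdgenomics.adam</groupId>
  <!-- Scala projects typically suffix the artifactId with the Scala binary version. -->
  <artifactId>adam-core_${scala.version.prefix}</artifactId>
  <version>${project.version}</version>
</dependency>
```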


fnothaft commented Mar 3, 2017

Continued in #1388. Closing.

@fnothaft fnothaft closed this Mar 3, 2017