Protobuf as bytearray for spark conf #76

mlathara · 2019-12-18T23:21:43Z

We were passing the protobuf to Spark as a JSON string. It turns out that Google's Python protobuf library serializes int64 values as strings in JSON (for reasons explained in many places).
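
As an illustration of the usual reason for the string encoding (not code from this PR): JSON consumers such as JavaScript parse bare numbers as IEEE-754 doubles, which cannot represent every 64-bit integer exactly. A minimal sketch, where the class and method names are hypothetical:

```java
import java.math.BigDecimal;

// Hypothetical helper, not part of GenomicsDB or protobuf.
public class Int64JsonPrecision {
    // Every integer up to 2^53 in magnitude is exactly representable as a double.
    static final long MAX_SAFE = 1L << 53;

    // True if v survives a trip through a double unchanged.
    // BigDecimal is used for the comparison because casting an
    // out-of-range double back to long saturates and would hide the loss.
    static boolean fitsInDouble(long v) {
        return new BigDecimal((double) v).compareTo(BigDecimal.valueOf(v)) == 0;
    }

    public static void main(String[] args) {
        long exact = MAX_SAFE;
        long lossy = MAX_SAFE + 1;
        System.out.println(exact + " fits in a double: " + fitsInDouble(exact));
        System.out.println(lossy + " fits in a double: " + fitsInDouble(lossy));
        System.out.println(lossy + " becomes: "
                + new BigDecimal((double) lossy).toPlainString());
    }
}
```

Values above 2^53 can silently change when read as a double, which is why the JSON mapping quotes int64 fields and why a JSON consumer expecting a number then fails or needs special handling.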

So this PR switches to expecting a serialized protobuf byte stream instead. Since we use the Hadoop configuration to pass the config, the sender encodes the bytes as a base64 string, and the receiver decodes that string first before parsing the protobuf object.
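
A rough sketch of that round trip, using only the JDK so it runs standalone. It is not the code from this PR: a `Map<String, String>` stands in for the string-only Hadoop configuration, the key name is made up, and the payload is a hand-encoded protobuf message (field 1 = 150) rather than a GenomicsDB message. With real protobuf classes the bytes would come from `toByteArray()` and be read back with the generated `parseFrom(byte[])`.

```java
import java.util.Arrays;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of passing binary protobuf through a string-only config.
public class ProtobufConfRoundTrip {
    // Made-up key name, chosen only for this example.
    static final String CONF_KEY = "example.genomicsdb.query.protobuf";

    // Sender side: base64-encode the serialized message so it can be stored as a String.
    static void putBytes(Map<String, String> conf, String key, byte[] serialized) {
        conf.put(key, Base64.getEncoder().encodeToString(serialized));
    }

    // Receiver side: decode the base64 string back into the original bytes.
    static byte[] getBytes(Map<String, String> conf, String key) {
        return Base64.getDecoder().decode(conf.get(key));
    }

    public static void main(String[] args) {
        // Stand-in for Hadoop's Configuration, which only holds string values.
        Map<String, String> conf = new HashMap<>();

        // Protobuf wire format for "field 1, varint, value 150".
        byte[] serialized = {0x08, (byte) 0x96, 0x01};

        putBytes(conf, CONF_KEY, serialized);
        byte[] received = getBytes(conf, CONF_KEY);

        System.out.println("stored string: " + conf.get(CONF_KEY));
        System.out.println("round trip ok: " + Arrays.equals(serialized, received));
    }
}
```

Base64 is used here because arbitrary protobuf bytes are not guaranteed to be valid text in any charset, so converting them directly to a String could corrupt the message.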

Also includes a few annoying Travis macOS changes.

codecov · 2019-12-18T23:47:16Z

Codecov Report

Merging #76 into develop will decrease coverage by 28.36%.
The diff coverage is 63.63%.

@@              Coverage Diff               @@
##             develop      #76       +/-   ##
==============================================
- Coverage      77.49%   49.13%   -28.37%     
- Complexity         0      412      +412     
==============================================
  Files            117      111        -6     
  Lines          16062    14974     -1088     
  Branches         330      330               
==============================================
- Hits           12447     7357     -5090     
- Misses          3426     7431     +4005     
+ Partials         189      186        -3

| Impacted Files | Coverage Δ | Complexity Δ | |
|---|---|---|---|
| ...a/org/genomicsdb/reader/GenomicsDBQueryStream.java | `48.71% <ø> (ø)` | `10 <0> (+10)` | ⬆️ |
| ...g/genomicsdb/reader/GenomicsDBFeatureIterator.java | `79.26% <ø> (ø)` | `17 <0> (+17)` | ⬆️ |
| ...csdb/importer/extensions/CallSetMapExtensions.java | `40% <ø> (-21.34%)` | `8 <0> (+8)` | |
| ...ain/java/org/genomicsdb/spark/GenomicsDBInput.java | `62.31% <100%> (-4.98%)` | `41 <0> (+41)` | |
| .../org/genomicsdb/spark/GenomicsDBConfiguration.java | `57.33% <50%> (-2.81%)` | `20 <0> (+20)` | |
| ...va/org/genomicsdb/spark/GenomicsDBInputFormat.java | `63.33% <60%> (ø)` | `8 <0> (+8)` | ⬆️ |
| src/main/cpp/include/api/genomicsdb.h | `0% <0%> (-100%)` | `0% <0%> (ø)` | |
| src/main/cpp/src/api/genomicsdb.cc | `0.41% <0%> (-95.4%)` | `0% <0%> (ø)` | |
| ...rc/main/cpp/src/genomicsdb/genomicsdb_iterators.cc | `0.27% <0%> (-94.45%)` | `0% <0%> (ø)` | |
| ...cpp/include/genomicsdb/genomicsdb_columnar_field.h | `0% <0%> (-93.45%)` | `0% <0%> (ø)` | |

... and 70 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 49e2773...e464055.

src/main/java/org/genomicsdb/spark/GenomicsDBInput.java

nalinigans

Looks good to me.

kgururaj

Looks good to me

I'm assuming the reason to convert back and forth between a bytearray and String is that Spark configuration doesn't accept binary values.
Facepalm for the string int64 representation in JSON to deal with JavaScript.

mlathara · 2019-12-20T18:59:08Z

Yup - Spark conf doesn't take binary values.

mlathara added 2 commits December 17, 2019 14:22

change to take protobuf as bytearray instead of jsonstring

89ef2a0

fix java docs and java version plus misc changes for travis mac issues

5438d0b

mlathara requested review from kgururaj and nalinigans December 18, 2019 23:21

mlathara changed the base branch from master to develop December 18, 2019 23:22

nalinigans reviewed Dec 19, 2019

mlathara and others added 2 commits December 19, 2019 09:48

resolve review comments

f567fea

Merge branch 'develop' into ml_proto_bytearray

e464055

mlathara requested a review from nalinigans December 19, 2019 17:49

nalinigans approved these changes Dec 19, 2019

kgururaj approved these changes Dec 20, 2019

mlathara merged commit 31d4020 into develop Dec 20, 2019

mlathara deleted the ml_proto_bytearray branch December 20, 2019 18:59
