Protobuf as bytearray for spark conf #76
Codecov Report
@@ Coverage Diff @@
## develop #76 +/- ##
==============================================
- Coverage 77.49% 49.13% -28.37%
- Complexity 0 412 +412
==============================================
Files 117 111 -6
Lines 16062 14974 -1088
Branches 330 330
==============================================
- Hits 12447 7357 -5090
- Misses 3426 7431 +4005
+ Partials 189 186 -3
Looks good to me.
Looks good to me
- I'm assuming the reason to convert back and forth between a bytearray and String is that Spark configuration doesn't accept binary values.
- Facepalm for the string representation of int64 in JSON, which exists to accommodate JavaScript
Yup - Spark conf doesn't take binary values.
We were passing the protobuf to Spark as a JSON string. It turns out that Google's Python protobuf library emits int64 values as strings in JSON (for reasons explained in many places).
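For context, a minimal sketch of why an int64 tends to be written as a string in JSON. It uses only the Python standard library, not the protobuf library itself, and the field name `value` is made up for illustration. A JavaScript Number is an IEEE-754 double, the same type as Python's `float`, so integers above 2**53 cannot all be represented exactly.

```python
import json

# 2**53 + 1 fits comfortably in an int64 but has no exact double representation.
big = 2**53 + 1

# Converting to a double (what a JavaScript JSON parser would do) loses the low bit.
as_double = float(big)
lost_precision = int(as_double) != big

# Writing the value as a JSON string keeps every digit intact.
# "value" is a hypothetical field name used only for this illustration.
encoded = json.dumps({"value": str(big)})
decoded = int(json.loads(encoded)["value"])

print(lost_precision, decoded == big)
```

The cost is that every consumer of the JSON has to parse the string back into an integer, which is the friction this PR avoids by switching to the binary form.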
So this PR switches to passing the serialized protobuf bytes instead. Since we use the Hadoop configuration to pass the config, the sender encodes the bytes as a base64 string, and the receiver decodes that string before parsing the protobuf object.
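The round trip described above can be sketched roughly as follows. This is an illustration under assumptions, not the code in this PR: a plain `dict` stands in for the Hadoop/Spark configuration, the key name `example.proto.config` and both helper functions are hypothetical, and the payload is a hand-written three-byte protobuf wire-format message rather than a real config object.

```python
import base64

# Hypothetical configuration key; the real key used by the project is not shown here.
CONF_KEY = "example.proto.config"


def put_proto_bytes(conf: dict, payload: bytes) -> None:
    """Sender side: store serialized protobuf bytes as a base64 string."""
    conf[CONF_KEY] = base64.b64encode(payload).decode("ascii")


def get_proto_bytes(conf: dict) -> bytes:
    """Receiver side: decode the base64 string back into the original bytes."""
    return base64.b64decode(conf[CONF_KEY])


# A dict stands in for the string-only configuration object (assumption).
conf = {}

# Stand-in for message.SerializeToString(): field 1 set to the varint 150.
payload = bytes([0x08, 0x96, 0x01])

put_proto_bytes(conf, payload)
restored = get_proto_bytes(conf)

print(isinstance(conf[CONF_KEY], str), restored == payload)
```

In real code the receiver would hand `restored` to the generated message class, for example via `ParseFromString` in the Python protobuf API, so int64 fields never pass through a JSON representation.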
Also, a few annoying Travis macOS changes.