[BUG]: Class serializer writes map key as optional #396
Comments
I'm failing to see what this library has to do with the issue. It seems the protobuf generator makes those class properties optional. Parquet is not protobuf, so you'll probably have to manually adjust the output to make the two incompatible formats work together?
hi @aloneguid - I'm not sure protobuf generator does make them optional. this comes out OPTIONAL in parquet
but this is REQUIRED in parquet:
only way I've found to get strings working is like so
but I can't get Maps to work - this lib is definitely allowing a map key to be optional which breaks parquet:
Fair enough, map keys should (but not must) be required.
according to the spec https://github.com/apache/parquet-format/blob/master/LogicalTypes.md
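For readers following along, the rule in the linked spec is that a MAP is an outer group (required or optional) containing a single repeated group whose key field has repetition "required", while the value may be optional. Below is a minimal sketch of that check in plain Python. The `SchemaField` class and `map_key_problems` function are hypothetical names made up for illustration; they are not part of Parquet.Net, PyArrow, or parquet-mr, and the sketch assumes the key is the first field of the repeated group.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SchemaField:
    """Hypothetical stand-in for one node of a Parquet schema tree."""
    name: str
    repetition: str  # "required", "optional", or "repeated"
    children: List["SchemaField"] = field(default_factory=list)


def map_key_problems(map_field: SchemaField) -> List[str]:
    """Return a list of violations of the MAP layout described in the spec."""
    problems = []
    # The outer group annotated as MAP must not itself be repeated.
    if map_field.repetition not in ("required", "optional"):
        problems.append("outer MAP group must be required or optional")
    # It must contain exactly one repeated group (conventionally "key_value").
    if len(map_field.children) != 1 or map_field.children[0].repetition != "repeated":
        problems.append("MAP must contain exactly one repeated group")
        return problems
    key_value = map_field.children[0]
    if not key_value.children:
        problems.append("repeated group has no key field")
        return problems
    # Assumption: the first field of the repeated group is the key.
    key = key_value.children[0]
    if key.repetition != "required":
        problems.append(f"map key must be required, found {key.repetition}")
    return problems


# The shape reported in this issue: an optional key inside the map.
bad_map = SchemaField("tags", "optional", [
    SchemaField("key_value", "repeated", [
        SchemaField("key", "optional"),
        SchemaField("value", "optional"),
    ]),
])

# The shape the spec asks for: a required key, value may stay optional.
good_map = SchemaField("tags", "optional", [
    SchemaField("key_value", "repeated", [
        SchemaField("key", "required"),
        SchemaField("value", "optional"),
    ]),
])

print(map_key_problems(bad_map))
print(map_key_problems(good_map))
```

This only models the schema shape; it does not read or write Parquet files.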
Strange, as parquet-mr (used in Spark) allows it. I might add integration tests with PyArrow.
I've just added a PyArrow integration test, and it proves that PyArrow requires keys to be marked as "required", whereas the identical test with parquet-mr (Java) does not. Will fix in a bit.
pyarrow now reads schema as
fixed in latest release
Thanks, map issue is definitely working now.
Thanks for confirming
Library Version
4.16.3
OS
Any
OS Architecture
64 bit
How to reproduce?
test.proto
protoc -I=./protobuf --csharp_out=. test.proto
When writing to parquet, the schema incorrectly makes all strings, including the map key, optional. Bool, float, int etc. are all required by default.
I can manually add Parquet.Serialization.Attributes.ParquetRequired as an attribute to the proto class to fix strings, but this doesn't work for map keys. Map keys are not allowed to be nullable in parquet, so my output does not work in the consumers. Any other workarounds?
Failing test
No response