Skip to content

PARQUET-286: Update String support to match upstream Avro.#201

Closed
rdblue wants to merge 2 commits intoapache:masterfrom
rdblue:PARQUET-286-avro-utf8-support
Closed

PARQUET-286: Update String support to match upstream Avro.#201
rdblue wants to merge 2 commits intoapache:masterfrom
rdblue:PARQUET-286-avro-utf8-support

Conversation

@rdblue
Copy link
Contributor

@rdblue rdblue commented May 26, 2015

This adds getStringableClass, which determines what String
representation upstream Avro would use. Specific and reflect will use an
alternative String class if java-class is set that is instantiated using
a constructor that takes a String. Otherwise, reflect will always use
String and both specific and generic will use Utf8 or String depending
on whether avro.java.string is set to "string".

The new string representations required two new converters: one for Utf8
and one for stringable classes (those with constructors that take a
single String). The converters have also been refactored so that all
binary converters now implement dictionary support.

@rdblue
Copy link
Contributor Author

rdblue commented May 26, 2015

@SinghAsDev could you take a look at this?

@rdblue rdblue force-pushed the PARQUET-286-avro-utf8-support branch from be50c40 to 33634c1 Compare June 3, 2015 20:48
@rdblue
Copy link
Contributor Author

rdblue commented Jun 3, 2015

Thanks for the review, @SinghAsDev. Good call on adding tests that verify both Avro and Parquet behavior. I've done that and added comments to clarify the method that determines the String type. I didn't separate the behavior into specific, generic, and reflect blocks because I thought that would lead to a lot of duplication, though.

@isnotinvain or @tomwhite, could you take a look at this to get it merged? It is a blocker for 1.8.0.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do these three keys come from avro, or can we choose any name we'd like for them? If we can choose the name, they should probably be "parquet.avro.*" and be dot separated right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those are from Avro. I was avoiding having them in here, but I needed them to mimic Avro's String deserialization. Most of the time, we use a call to Avro's getClass(Schema) rather than using them directly, but the method we need isn't public.

@isnotinvain
Copy link
Contributor

+1, just one minor question

I'm not very familiar w/ avro's APIs but everything LGTM as far as I can tell.

This adds getStringableClass, which determines what String
representation upstream Avro would use. Specific and reflect will use an
alternative String class if java-class is set that is instantiated using
a constructor that takes a String. Otherwise, reflect will always use
String and both specific and generic will use Utf8 or String depending
on whether avro.java.string is set to "string".

The new string representations required two new converters: one for Utf8
and one for stringable classes (those with constructors that take a
single String). The converters have also been refactored so that all
binary converters now implement dictionary support.
@rdblue rdblue force-pushed the PARQUET-286-avro-utf8-support branch from 33634c1 to ee71cc7 Compare June 4, 2015 16:19
The new test validates that the behavior for generic, specific, and
reflect matches Avro's behavior.

This required a fix to make Stringable work. When a reflected class uses
the Stringable annotation, Avro doesn't include the type in the field's
schema. Instead, the accessor is built using the expected type. Parquet
needs to mimic this behavior, so the record converter will detect the
Stringable annotation and replace a FieldStringConverter with a
FieldStringableConverter.
@rdblue rdblue force-pushed the PARQUET-286-avro-utf8-support branch from ee71cc7 to beb5a44 Compare June 4, 2015 17:14
@asfgit asfgit closed this in 918609f Jun 4, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants