-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor to use a NestedField builder and remove nested defaults. #1
Conversation
67e8cd8
to
53c1b7b
Compare
return new Builder(true, name); | ||
} | ||
|
||
public static class Builder { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wmoustafa, I'm interested to get your feedback on the builder vs new factory methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Ryan. Left comment in the main conversation.
private static Object castDefault(Object defaultValue, Type type) { | ||
if (type.isNestedType() && defaultValue != null) { | ||
throw new IllegalArgumentException( | ||
String.format("Invalid default value for %s: %s (must be null)", type, defaultValue)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right now, we don't support non-primitive literals so this removes support for nested types. We can add that back but I think this is a good start.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once we support non-primitive types, this would be heavy-weight. Do we want to do such a heavy-weight casting in the constructor? This was the main reason I did not include it in the original PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. The reason not to extend this to non-primitives right now is that we will need to do even more heavy-weight work for nested types
public Builder withDefault(Object defaultValue) { | ||
this.initialDefault = defaultValue; | ||
this.writeDefault = defaultValue; | ||
return this; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would skip this method since semantics of initial and write default are too different. Further, initial cannot change but write can. Also, we may get unexpected results of chaining this with the other two methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about these issues originally, but ended up not adding validations to mirror the behavior of the original version that used factory methods. I think it's fine to have a method that sets both as a convenience.
The question is whether to enforce default value behavior here or elsewhere. I think the validation should be done elsewhere because:
- As I noted above, the builder is doing the same thing as the factory method -- there was no validation before
- There isn't enough information to validate here. This could only allow setting the
initialDefault
once, but what does that accomplish? It only limits the builder API by assuming what the caller is doing when constructing a field. It would be perfectly valid to use this builder to copy another field (which setsinitialDefault
) and then change the initial default to create a new field with a different one. Schema validation is not the builder's responsibility.
public Builder withId(int id) { | ||
this.id = id; | ||
return this; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is already withFieldId()
. How about deprecating/removing it before it spreads?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's part of the public API. We can deprecate it, but we can't remove it until v2.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @rdblue for the suggestions. Overall I am fine with adding a Builder
. I also thought that the NestedField
constructor started to become bloated, but also did not feel strongly about exposing both constructor and builder.
My first preference is to continue consistently with the existing API (e.g., NestedField.of
) and revise the API in a major version, but I am okay with the current refactor in this PR as well. I see value in both.
I left a few comments around Literals and non-primitives. It is okay to postpone non-primitives as long as we are clear on the path forward.
private static Object castDefault(Object defaultValue, Type type) { | ||
if (type.isNestedType() && defaultValue != null) { | ||
throw new IllegalArgumentException( | ||
String.format("Invalid default value for %s: %s (must be null)", type, defaultValue)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once we support non-primitive types, this would be heavy-weight. Do we want to do such a heavy-weight casting in the constructor? This was the main reason I did not include it in the original PR.
* @return a Literal for the given value | ||
* @throws IllegalArgumentException if the value has no literal implementation | ||
*/ | ||
public static <T> Literal<T> lit(T value) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not quite clear on the need for literals. Literals are part of expressions but not sure if default values are first-class expressions. I see we could use them in expressions, but as part of the NestedField
API, I do not see a need to bring in expressions yet.
Also note that the input value comes from SingleValueParser
which returns Iceberg data. I am guessing the input to Literals
is weakly defined now (since it is constrained to primitives) and it is not clear if it will be Iceberg data or Java data, etc?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The purpose of this is to support the same set of values as expressions do.
Adding the default value methods to the API creates a weird situation in which we need to coerce values from Object
to a value of the appropriate type and internal representation. Rather than create completely different code to do that, I'm reusing what we already support for expressions by creating a literal (which handles coercing from Object
) and then converting it to a specific type using to
(which handles conversion to the right Iceberg representation).
I think it's important to have consistency here.
throw new IllegalArgumentException( | ||
String.format("Invalid default value for %s: %s (must be null)", type, defaultValue)); | ||
} else if (defaultValue != null) { | ||
return Expressions.lit(defaultValue).to(type).value(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the description you mentioned that literals help with validations. Cannot we validated without bringing in literals?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should support the same values and coercion that we have implemented for literals, so I think this is the right way forward.
Thanks @rdblue for the PR. Let us follow up with nested types in a separate PR. For now I will merge this PR to my branch and apply minor fixes to the |
While reviewing apache#9502, I accumulated some changes:
NestedField
. I think this is a bit cleaner since we are now at the point where the combinations (with/without docs, with/without default, etc.) are going to get much larger.NestedField
. To do this, I create aLiteral
and convert to the field's type usingto
. This gives behavior for interpreting default values that is identical to the expressions library. The drawback is that it doesn't support nested types.