Simplify serialization by removing redundant PrimitiveScalarValue
#3612
PrimitiveScalarType null_value = 19;
// was PrimitiveScalarType null_value = 19;
// Null value of any type
ArrowType null_value = 33;
This might be a dumb question, but why are nulls typed? This encoding of ScalarValue seems to conflate encoding the schema with encoding the values, which seems unfortunate.
Perhaps we could take a look at what substrait does and copy that?
You mean why are nulls typed in general? Basically because of how ScalarValue:: is implemented (as an Option<> around the underlying native type). I think @jimexist tried to clean it up at some point and make ScalarValue::None and then have all the variants like ScalarValue::Int8 hold values like i8 rather than Option<i8>.
I can't remember what the problem was, but it didn't work easily.
In my opinion, at least the serialization should follow how they are implemented in ScalarValue, and if we improve ScalarValue then we can also improve the serialization code.
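To make the "typed null" point concrete, here is a minimal sketch. The `Scalar` and `Type` enums are hypothetical, heavily simplified stand-ins for DataFusion's `ScalarValue` and Arrow's `DataType` (they are not the real definitions), and `null_of`, `data_type` and `is_null` are illustrative helper names. The sketch only shows why a null still carries a type when every variant wraps an `Option`.

```rust
/// Hypothetical, simplified stand-in for DataFusion's `ScalarValue`:
/// every variant wraps an `Option`, so a null still carries its type.
#[derive(Debug, Clone, PartialEq)]
enum Scalar {
    Int8(Option<i8>),
    Int64(Option<i64>),
    Utf8(Option<String>),
}

/// Hypothetical stand-in for Arrow's `DataType`.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Type {
    Int8,
    Int64,
    Utf8,
}

impl Scalar {
    /// Build the null value of a given type.
    fn null_of(t: Type) -> Scalar {
        match t {
            Type::Int8 => Scalar::Int8(None),
            Type::Int64 => Scalar::Int64(None),
            Type::Utf8 => Scalar::Utf8(None),
        }
    }

    /// The type is recoverable from the variant alone, even for nulls.
    fn data_type(&self) -> Type {
        match self {
            Scalar::Int8(_) => Type::Int8,
            Scalar::Int64(_) => Type::Int64,
            Scalar::Utf8(_) => Type::Utf8,
        }
    }

    /// True when the wrapped `Option` is `None`.
    fn is_null(&self) -> bool {
        match self {
            Scalar::Int8(v) => v.is_none(),
            Scalar::Int64(v) => v.is_none(),
            Scalar::Utf8(v) => v.is_none(),
        }
    }
}
```

With this shape, `Scalar::null_of(Type::Int8)` and `Scalar::null_of(Type::Int64)` are different values, so a serialized null has to record which type it belongs to.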
ecf9925 to d754b94
@@ -803,48 +804,9 @@ message Decimal128{
int64 s = 3;
}

// Contains all valid datafusion scalar type except for
// List
enum PrimitiveScalarType{
All of these types are already handled in ArrowType
@@ -1359,184 +1246,6 @@ fn vec_to_array<T, const N: usize>(v: Vec<T>) -> [T; N] {
})
}

//Does not typecheck lists
fn typechecked_scalar_value_conversion(
this is all redundant with code that (now) exists in ScalarValue
This looks fine to me, though I'm not sure what stability guarantees we provide on this API. Theoretically we should go through a period of supporting both, but I don't know if this is worth the effort.
The current answer, I believe, is "we provide no guarantees" (even though in theory we could provide backwards-compatible APIs). For example, #3547 was likely not backward compatible, yet did not seem to cause changes. cc @andygrove and @yahoNanJing, I think Ballista is the other major user of this API.
I plan to merge this tomorrow unless someone has additional comments or would like more time to review.
Benchmark runs are scheduled for baseline = 48f73c6 and contender = 54d2870. 54d2870 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
…pache#3612) * Simplify serialization by removing redundant PrimitiveScalarValue * comments * it compiles * Add additional scalar value null construction * reserve old field name
Draft as it builds on #3547
Which issue does this PR close?
Re #3531
This scratches an itch I had while last working with this code
Rationale for this change
While working on #3531 I noticed that PrimitiveScalarType duplicated the functionality of DataType, which makes working with the protobuf serialization code, already hard, even harder.
As the DataFusion community grows, we need to keep our code as easy to work with as possible, so that both reviews and new contributions stay easy.
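As a rough illustration of the simplification, the sketch below serializes a null by reusing the one general type description instead of a second, parallel enum. `Type`, `Scalar`, `WireScalar`, `to_wire` and `from_wire` are all hypothetical names invented for this example; they are not the generated protobuf types or DataFusion's actual conversion code. The `i32` payload for `Int8Value` reflects only that protobuf has no 8-bit integer type.

```rust
use std::convert::TryFrom;

/// Hypothetical stand-in for the single, general type description.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Type {
    Int8,
    Int64,
}

/// Hypothetical in-memory scalar with typed nulls.
#[derive(Debug, Clone, PartialEq)]
enum Scalar {
    Int8(Option<i8>),
    Int64(Option<i64>),
}

/// Hypothetical wire form: a null is encoded as "null of this Type",
/// so no separate enum of primitive scalar types is needed.
#[derive(Debug, Clone, PartialEq)]
enum WireScalar {
    NullValue(Type),
    Int8Value(i32),
    Int64Value(i64),
}

/// Convert an in-memory scalar to its wire form.
fn to_wire(s: &Scalar) -> WireScalar {
    match s {
        Scalar::Int8(Some(v)) => WireScalar::Int8Value(i32::from(*v)),
        Scalar::Int8(None) => WireScalar::NullValue(Type::Int8),
        Scalar::Int64(Some(v)) => WireScalar::Int64Value(*v),
        Scalar::Int64(None) => WireScalar::NullValue(Type::Int64),
    }
}

/// Convert back, rejecting payloads that do not fit the target type.
fn from_wire(w: &WireScalar) -> Result<Scalar, String> {
    match w {
        WireScalar::NullValue(Type::Int8) => Ok(Scalar::Int8(None)),
        WireScalar::NullValue(Type::Int64) => Ok(Scalar::Int64(None)),
        WireScalar::Int8Value(v) => i8::try_from(*v)
            .map(|x| Scalar::Int8(Some(x)))
            .map_err(|_| format!("{} does not fit in i8", v)),
        WireScalar::Int64Value(v) => Ok(Scalar::Int64(Some(*v))),
    }
}
```

The design point is that adding a new type means touching one type enum and one pair of conversion functions, rather than keeping two type enums in sync.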
What changes are included in this PR?
Remove PrimitiveScalarValue and use the existing Arrow DataType
Are there any user-facing changes?
No