Add serialization of ScalarValue::Binary and ScalarValue::LargeBinary, ScalarValue::Time64 #3534
fa2c10c to 813b56b
@@ -796,6 +799,9 @@ enum PrimitiveScalarType{
TIME_MILLISECOND = 22;
INTERVAL_YEARMONTH = 23;
INTERVAL_DAYTIME = 24;

BINARY = 25;
LARGE_BINARY = 26;
Interestingly TIME_NANOSECOND already existed
I am not a huge fan of the duplication between PrimitiveScalarType and ArrowType -- I am just following the existing patterns in this PR, but I will attempt to fix this in a follow-on PR.
Turns out this was a bug in my original implementation (which was caught by #3537)
813b56b to 677df35
@@ -786,16 +789,21 @@ enum PrimitiveScalarType{
UTF8 = 11;
LARGE_UTF8 = 12;
DATE32 = 13;
TIME_MICROSECOND = 14;
TIME_NANOSECOND = 15;
TIMESTAMP_MICROSECOND = 14;
I renamed these fields because they are for Timestamp, not actually Time (which are different in Arrow).
DataType::Time64(TimeUnit::Nanosecond)
}
protobuf::PrimitiveScalarType::TimestampMicrosecond => {
I changed the names of PrimitiveScalarType from Time here to Timestamp to be consistent with the ScalarValue variants as well as the arrow type system.
PrimitiveScalarType::Decimal128 => Self::Decimal128(None, 0, 0),
PrimitiveScalarType::Date64 => Self::Date64(None),
PrimitiveScalarType::TimeSecond => Self::TimestampSecond(None, None),
PrimitiveScalarType::TimeMillisecond => {
PrimitiveScalarType::TimestampSecond => Self::TimestampSecond(None, None),
These were previously incorrectly set to Time rather than Timestamp.
677df35 to cc15db5
@@ -1098,7 +1098,7 @@ impl TryFrom<&ScalarValue> for protobuf::ScalarValue {
})
}
datafusion::scalar::ScalarValue::TimestampMicrosecond(val, tz) => {
create_proto_scalar(val, PrimitiveScalarType::TimeMicrosecond, |s| {
These names were super confusing, as the protobuf definition used Time while DataType and ScalarValue used Timestamp.
Making it more confusing, ScalarValue::Time64 is not a timestamp (it is the time of day!)
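To make the Time vs Timestamp distinction concrete, here is a minimal std-only sketch. The enum `Temporal` and the helper `describe` are hypothetical names for illustration, not DataFusion or Arrow APIs. The only assumption is the Arrow semantics that a Time64 value counts units since midnight while a Timestamp value counts units since the UNIX epoch.

```rust
/// Hypothetical stand-ins for the two Arrow concepts; not the real DataFusion types.
#[derive(Debug, PartialEq)]
enum Temporal {
    /// Time of day: microseconds elapsed since midnight (like Time64 with a microsecond unit).
    Time64Micros(i64),
    /// Instant: microseconds elapsed since the UNIX epoch (like TimestampMicrosecond).
    TimestampMicros(i64),
}

const MICROS_PER_SEC: i64 = 1_000_000;
const SECS_PER_DAY: i64 = 86_400;

/// Render a value so that the different meaning of the same i64 payload is visible.
fn describe(v: &Temporal) -> String {
    match v {
        Temporal::Time64Micros(us) => {
            let secs = us / MICROS_PER_SEC;
            format!(
                "{:02}:{:02}:{:02} (time of day)",
                secs / 3600,
                (secs / 60) % 60,
                secs % 60
            )
        }
        Temporal::TimestampMicros(us) => {
            // Euclidean division keeps the time-of-day part non-negative
            // for instants before the epoch.
            let secs = us.div_euclid(MICROS_PER_SEC);
            let days = secs.div_euclid(SECS_PER_DAY);
            let rem = secs.rem_euclid(SECS_PER_DAY);
            format!(
                "day {} since epoch, {:02}:{:02}:{:02} UTC",
                days,
                rem / 3600,
                (rem / 60) % 60,
                rem % 60
            )
        }
    }
}
```

Calling `describe(&Temporal::Time64Micros(3_723_000_000))` gives a clock reading only, whereas `describe(&Temporal::TimestampMicros(90_061_000_000))` also carries a day offset from the epoch, which is why reusing the name Time for both was misleading.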
I think TimeMicrosecond stands for Timestamp with the time unit TimeUnit::Microsecond, which is why it was named TimeMicrosecond.
rename to TimestampMicrosecond LGTM
@alamb do you know of any historical reason this field is still named TimeMillisecond?
https://github.com/apache/arrow-datafusion/blob/master/datafusion/proto/proto/datafusion.proto#L669-L674
I think the original naming comes from here.
> @alamb do you know any history reason this field still named as TimeMilliSecond?
I do not know why that field is called TimeMillisecond -- it is called Millisecond in the arrow schema, so I think we could do the same in DataFusion: https://docs.rs/arrow/23.0.0/arrow/datatypes/enum.TimeUnit.html
Here is a PR to make the naming of TimeUnit consistent: #3575
LGTM as far as extending existing patterns. I very much agree with you @alamb that the whole situation is very confusing however. One day I hope someone has enough high-level knowledge to clean it up in a sensible way.
I have plans (see #3547), but it has somewhat turned into "I think I can remove the entire […]"
Benchmark runs are scheduled for baseline = 6be3301 and contender = 0a2b0a7. 0a2b0a7 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Which issue does this PR close?
Draft as it builds on:
[…] ScalarValue::Dictionary to datafusion-proto #3532
[…] ScalarValues are the same after round trip serialization #3537
Part of #3531
Rationale for this change
See #3531
What changes are included in this PR?
ScalarValue::{,Large}Binary
ScalarValue::Time64
ScalarValue::Timestamp*
Are there any user-facing changes?
Better serialization support
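As a rough illustration of what round-trip serialization support for these variants means, here is a simplified std-only sketch. It is not the datafusion-proto implementation: `Scalar`, `WireScalar`, `round_trip`, and the byte layout are hypothetical. The tags 25 and 26 are borrowed from the BINARY and LARGE_BINARY enum values in the diff above, and the tag for Time64 is an assumption.

```rust
use std::convert::TryFrom;

/// Hypothetical, simplified scalar type (not datafusion's ScalarValue).
#[derive(Debug, Clone, PartialEq)]
enum Scalar {
    Binary(Option<Vec<u8>>),
    LargeBinary(Option<Vec<u8>>),
    /// Time of day as a raw i64; the unit is left out to keep the sketch short.
    Time64(Option<i64>),
}

/// Hypothetical wire form: a type tag plus an optional payload (None encodes NULL).
#[derive(Debug, Clone, PartialEq)]
struct WireScalar {
    tag: u8,
    payload: Option<Vec<u8>>,
}

const TAG_BINARY: u8 = 25;
const TAG_LARGE_BINARY: u8 = 26;
const TAG_TIME64: u8 = 27; // assumed value, for illustration only

impl From<&Scalar> for WireScalar {
    fn from(s: &Scalar) -> Self {
        match s {
            Scalar::Binary(v) => WireScalar { tag: TAG_BINARY, payload: v.clone() },
            Scalar::LargeBinary(v) => WireScalar { tag: TAG_LARGE_BINARY, payload: v.clone() },
            Scalar::Time64(v) => WireScalar {
                tag: TAG_TIME64,
                // Encode the i64 as 8 little-endian bytes.
                payload: v.map(|n| n.to_le_bytes().to_vec()),
            },
        }
    }
}

impl TryFrom<&WireScalar> for Scalar {
    type Error = String;

    fn try_from(w: &WireScalar) -> Result<Self, Self::Error> {
        match w.tag {
            TAG_BINARY => Ok(Scalar::Binary(w.payload.clone())),
            TAG_LARGE_BINARY => Ok(Scalar::LargeBinary(w.payload.clone())),
            TAG_TIME64 => match &w.payload {
                None => Ok(Scalar::Time64(None)),
                Some(bytes) => {
                    let arr = <[u8; 8]>::try_from(bytes.as_slice())
                        .map_err(|_| format!("expected 8 bytes, got {}", bytes.len()))?;
                    Ok(Scalar::Time64(Some(i64::from_le_bytes(arr))))
                }
            },
            other => Err(format!("unknown tag {}", other)),
        }
    }
}

/// Serialize and deserialize; the result should equal the input.
fn round_trip(s: &Scalar) -> Result<Scalar, String> {
    Scalar::try_from(&WireScalar::from(s))
}
```

A check in the spirit of #3537 would assert that `round_trip(&value)` returns `Ok(value)` for every variant, including NULL values such as `Scalar::LargeBinary(None)`. Keeping distinct tags for Binary and LargeBinary is what lets the variant survive the trip even though the payload bytes are identical.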