-
Notifications
You must be signed in to change notification settings - Fork 838
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Format Timestamps as RFC3339 #2939
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly just some minor nits, that can also be addressed in subsequent cleanup PRs.
I would like to double-check with @alamb as this will likely churn tests in downstreams
make_string_datetime!(array::TimestampSecondArray, column, row) | ||
DataType::Timestamp(unit, tz_string_opt) if *unit == TimeUnit::Second => { | ||
match tz_string_opt { | ||
Some(tz_string) => make_string_datetime_with_tz!( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be nice to combine the make_string_datetime_with_tz and make_string_datetime macros together?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would also be awesome to avoid having to make a string at all (like return a impl Display
) but that is definitely not something to do in this PR 😆
/// Generate an array with type $ARRAYTYPE with a numeric value of | ||
/// $VALUE, and compare $EXPECTED_RESULT to the output of | ||
/// formatting that array with `pretty_format_batches` | ||
macro_rules! check_datetime_with_timezone { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think passing an optional timezone to check_datetime and calling with_timezone_opt
would avoid some duplication.
On a related note, I don't think either need to be macros and therefore probably shouldn't be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As in "make them functions" 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the approach and results -- thank you @waitingkuo
I had some comments on the implementation, but this PR is quite close I think
/// // Corresponds to single element array with entry 1970-05-09T14:25:11+0:00 | ||
/// let arr = TimestampSecondArray::from(vec![11111111]); | ||
/// // OR | ||
/// let arr = TimestampSecondArray::from(vec![Some(11111111)]); | ||
/// let utc_offset = FixedOffset::east(0); | ||
/// let utc_tz: Tz = "+00:00".parse().unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be nice to show an example of going from a chrono
timezone (like FixedOffset
) to a Arrow::tz (though maybe that is already available -- I didn't check)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alamb
I can't add it here as inner data is private now
error[E0423]: cannot initialize a tuple struct which contains private fields
--> src/array/primitive_array.rs:143:14
|
12 | let utc_tz = Tz(FixedOffset::east(0));
| ^^
|
note: constructor is not visible here due to private fields
--> /Users/willy/willy/waitingkuo/arrow-rs/arrow-array/src/timezone.rs:246:19
|
246 | pub struct Tz(FixedOffset);
| ^^^^^^^^^^^ private field
arrow-rs/arrow-array/src/timezone.rs
Lines 249 to 251 in 87ac05b
/// An Arrow [`TimeZone`] | |
#[derive(Debug, Copy, Clone)] | |
pub struct Tz(FixedOffset); |
as #2909 is to add a timezone abstraction, do we still encourage user to use chrono api directly?
@tustvold do you have any comments?
make_string_datetime!(array::TimestampSecondArray, column, row) | ||
DataType::Timestamp(unit, tz_string_opt) if *unit == TimeUnit::Second => { | ||
match tz_string_opt { | ||
Some(tz_string) => make_string_datetime_with_tz!( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would also be awesome to avoid having to make a string at all (like return a impl Display
) but that is definitely not something to do in this PR 😆
.unwrap_or_else(|| "ERROR CONVERTING DATE".to_string()) | ||
}; | ||
|
||
Ok(s) | ||
}}; | ||
} | ||
|
||
macro_rules! make_string_datetime_with_tz { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it might also be worth commenting that date/times are formatted as rfc3339?
.unwrap_or_else(|| "ERROR CONVERTING DATE".to_string()), | ||
Err(_) => array | ||
.value_as_datetime($row) | ||
.map(|d| format!("{:?} (Unknown Time Zone '{}')", d, $tz_string)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't seem right -- isn't this error for the case when $ts_string.parse()
fails? As written I think it will be returned when the entry in array at $row is null (which should never happen).
I think we should remove the match statement
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the logic behind is:
while parsing TZ failed, parse datetime only
- if it works, show the datetime part with a unknown time zone warning
+--------------------------------------------------------+
| Timestamp(Second, Some("Asia/Taipei2")) |
+--------------------------------------------------------+
| 1970-01-01T00:00:00 (Unknown Time Zone 'Asia/Taipei2') |
+--------------------------------------------------------+
- if it doesn't work, display something (similar as
make_string_datetime
) like this
+--------------------------------------------------------+
| Timestamp(Second, Some("Asia/Taipei2")) |
+--------------------------------------------------------+
| ERROR CONVERTING DATE |
+--------------------------------------------------------+
@alamb @tustvold
I wonder if it's preferred. we could also show followings directly while failed to parse timezone
+--------------------------------------------------------+
| Timestamp(Second, Some("Asia/Taipei2")) |
+--------------------------------------------------------+
| Unknown Time Zone 'Asia/Taipei2' |
+--------------------------------------------------------+
i agree the logic here is confusing, i should've added some comments here
let s = if array.is_null($row) { | ||
"".to_string() | ||
} else { | ||
match $tz_string.parse::<Tz>() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as written this is going to parse the timezone string for every row -- perhaps we can parse it once per array 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general this display code is very inefficient, I think there is a broader issue to clean it up.
See the disclaimer on https://docs.rs/arrow/latest/arrow/util/display/fn.array_value_to_string.html
/// Generate an array with type $ARRAYTYPE with a numeric value of | ||
/// $VALUE, and compare $EXPECTED_RESULT to the output of | ||
/// formatting that array with `pretty_format_batches` | ||
macro_rules! check_datetime_with_timezone { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As in "make them functions" 👍
Lets get this in so that it can make the release, further cleanups can be performed as follow on PRs |
Benchmark runs are scheduled for baseline = 843a2e5 and contender = cbee739. cbee739 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
Which issue does this PR close?
Closes #2934
Closes #2937
2000-01-01T00:00:00+08:00
instead of2000-01-01 00:00:00 +08:00
this
outputs
Rationale for this change
What changes are included in this PR?
Are there any user-facing changes?
yes, timestamp format changed