-
Notifications
You must be signed in to change notification settings - Fork 819
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Timezone aware timestamp parsing (#3794) #3795
Timezone aware timestamp parsing (#3794) #3795
Conversation
d878113
to
e621c84
Compare
@@ -76,12 +42,14 @@ use chrono::prelude::*; | |||
/// "2023-01-01 040506 +07:30:00", | |||
/// "2023-01-01 04:05:06.789 PST", | |||
/// "2023-01-01 04:05:06.789 -08", | |||
#[inline] | |||
pub fn string_to_timestamp_nanos(s: &str) -> Result<i64, ArrowError> { | |||
pub fn string_to_datetime<T: TimeZone>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a question here:
if I want to parse a string with time zone, what should I do with string_to_datetime
? Do I have to use string_to_timestamp_nanos
?
The precondition is I do not know the format of timestamp-like string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't fully understand this question
I would think you could call string_to_datetime<Utc>(...)
which would give you a chrono DateTime<Utc>
and you can then then do whatever you want with the result (e.g. convert it into a nanosecond timestamp, etc)
let utc = date.naive_utc().to_string(); | ||
assert_eq!(utc, "2020-09-08 13:42:29"); | ||
let local = date.naive_local().to_string(); | ||
assert_eq!(local, "2020-09-08 15:42:29"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add some tests to check timestamp is correct?
assert_eq!(local, "2020-09-08 15:42:29"); | |
assert_eq!(local, "2020-09-08 15:42:29"); | |
let &tz = Local::now().offset(); | |
let date = string_to_datetime(&tz, "2020-09-08T13:42:29Z").unwrap(); | |
let dt = NaiveDateTime::parse_from_str("2020-09-08T13:42:29Z","%Y-%m-%dT%H:%M:%SZ").unwrap(); | |
let ts = date.timestamp(); | |
assert_eq!(dt.timestamp(), ts); | |
let date = string_to_datetime(&tz, "2020-09-08 13:42:29").unwrap(); | |
assert_eq!(dt.timestamp() - (tz.local_minus_utc() as i64), date.timestamp()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to avoid using Local::now
in the tests as this makes the tests not reproducible across machines. The tests above should cover this, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the tests above can run ok across machines, you can try.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They do, but they don't consistently test the behaviour. For example my machine (and CI) is set to UTC, which means it won't test the behaviour at all...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can use two different time zone to test the behavior. For example:
let tz1: Tz = "+02:00".parse().unwrap();
let tz2: Tz = "+08:00".parse().unwrap();
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a further test, let me know if that isn't what you meant
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is OK, thx.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes sense to me -- thank you @tustvold and thank you @MachaelLee for bringing this issue up.
I wonder if you could test that this change works for your project's usecase somehow?
@@ -76,12 +42,14 @@ use chrono::prelude::*; | |||
/// "2023-01-01 040506 +07:30:00", | |||
/// "2023-01-01 04:05:06.789 PST", | |||
/// "2023-01-01 04:05:06.789 -08", | |||
#[inline] | |||
pub fn string_to_timestamp_nanos(s: &str) -> Result<i64, ArrowError> { | |||
pub fn string_to_datetime<T: TimeZone>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't fully understand this question
I would think you could call string_to_datetime<Utc>(...)
which would give you a chrono DateTime<Utc>
and you can then then do whatever you want with the result (e.g. convert it into a nanosecond timestamp, etc)
@@ -112,34 +80,44 @@ pub fn string_to_timestamp_nanos(s: &str) -> Result<i64, ArrowError> { | |||
// without a timezone specifier as a local time, using T as a separator | |||
// Example: 2020-09-08T13:42:29.190855 | |||
if let Ok(ts) = NaiveDateTime::parse_from_str(s, "%Y-%m-%dT%H:%M:%S%.f") { | |||
return to_timestamp_nanos(ts); | |||
if let Some(offset) = timezone.offset_from_local_datetime(&ts).single() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// For example, both `1997-01-31 09:26:56.123Z`, `1997-01-31T09:26:56.123`, | ||
/// and `1997-01-31T14:26:56.123-05:00` will be parsed as the same value | ||
/// |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think some context got lost -- I only think this statement is true if the system timezone is set to UTC - 5:
/// For example, both `1997-01-31 09:26:56.123Z`, `1997-01-31T09:26:56.123`, | |
/// and `1997-01-31T14:26:56.123-05:00` will be parsed as the same value | |
/// | |
/// For example, `1997-01-31 09:26:56.123Z`, `1997-01-31T09:26:56.123`, | |
/// and `1997-01-31T14:26:56.123-05:00` will be parsed as the same value | |
/// if the system timezone is set to Americas/New_York (UTC-5). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope, that context was actually incorrect. They will always parse as the same regardless of the local timezone. If you want local timezone specific behaviour you should pass Local
to the function above
let dt = | ||
NaiveDateTime::parse_from_str("2020-09-08T13:42:29Z", "%Y-%m-%dT%H:%M:%SZ") | ||
.unwrap(); | ||
let local: Tz = "+08:00".parse().unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Benchmark runs are scheduled for baseline = 7fdd0d8 and contender = 6cd0917. 6cd0917 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
Which issue does this PR close?
Closes #3794
Rationale for this change
What changes are included in this PR?
Are there any user-facing changes?