Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#560: Unable to parse millisecond unix timestamps #561

Closed
wants to merge 2 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,7 @@ Versions with only mechanical changes will be omitted from the following list.
* Add support for microseconds timestamps serde serialization/deserialization (#304)
* Fix `DurationRound` is not TZ aware (#495)
* Implement `DurationRound` for `NaiveDateTime`
* Fix 'Unable to parse millisecond unix timestamps' (#560)

## 0.4.19

Expand Down
35 changes: 32 additions & 3 deletions src/format/parse.rs
Original file line number Diff line number Diff line change
Expand Up @@ -215,6 +215,20 @@ fn parse_rfc3339<'a>(parsed: &mut Parsed, mut s: &'a str) -> ParseResult<(&'a st
Ok((s, ()))
}

fn get_fixed_item_len<'a, B>(item: Option<&B>) -> Option<usize>
where
B: Borrow<Item<'a>>,
{
use super::Fixed::*;

item.map(|i| match *i.borrow() {
Item::Fixed(Internal(InternalFixed { val: InternalInternal::Nanosecond3NoDot })) => 3,
Item::Fixed(Internal(InternalFixed { val: InternalInternal::Nanosecond6NoDot })) => 6,
Item::Fixed(Internal(InternalFixed { val: InternalInternal::Nanosecond9NoDot })) => 9,
_ => 0,
})
}

/// Tries to parse given string into `parsed` with given formatting items.
/// Returns `Ok` when the entire string has been parsed (otherwise `parsed` should not be used).
/// There should be no trailing string after parsing;
Expand Down Expand Up @@ -260,7 +274,17 @@ where
}};
}

for item in items {
// convert items to a pair (current, Option(next_item))
// so we can have information about the next item
let items: Vec<B> = items.collect::<Vec<_>>();
let last_item = items.last();
let mut items: Vec<(&B, Option<&B>)> =
items.windows(2).map(|arr| (&arr[0], Some(&arr[1]))).collect::<Vec<_>>();
if let Some(last_item) = last_item {
items.push((last_item, None));
}

for (item, next_item) in items {
match *item.borrow() {
Item::Literal(prefix) => {
if s.len() < prefix.len() {
Expand Down Expand Up @@ -316,8 +340,10 @@ where
Minute => (2, false, Parsed::set_minute),
Second => (2, false, Parsed::set_second),
Nanosecond => (9, false, Parsed::set_nanosecond),
Timestamp => (usize::MAX, false, Parsed::set_timestamp),

Timestamp => {
let next_size = get_fixed_item_len(next_item).unwrap_or(0);
(s.len() - next_size, false, Parsed::set_timestamp)
Copy link

@meiamsome meiamsome Dec 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using s.len() here is problematic for parsing any formats where the timestamp does not terminate at the end of the string.

For example, I have seen a format that is like %s%3f%z - so 123456789+0000 - in this case you hit %s consuming all of 123456789 because it is limited to the max of 123456789+0.

Another case I've seen is Date(%s%3f) - so Date(123456789) - in this case you hit %s consuming all of 1234567 because that is the max, but %3f is left with three extra characters.

It seems the only real way would be to do something like scan::number to read however many chars Timestamp would capture and then go from there. Sadly, the return of scan::number doesn't currently include the number of characters read, instead returning a new slice pointer so that is not a trivial fix.

Alternatively you could do something like this:

Suggested change
(s.len() - next_size, false, Parsed::set_timestamp)
let numeric_bytes_available =
s.as_bytes().iter().take_while(|&&c| b'0' <= c && c <= b'9').count();
(numeric_bytes_available - next_size, false, Parsed::set_timestamp)

But it seems like leaking the implementation details of scan::number that shouldn't really be leaked here.

I don't know if it's really necessary to fix this for those formats, if they should just be a separate issue, or if they should just not be fixed by the library and left to consumers to implement tricky logic.

}
// for the future expansion
Internal(ref int) => match int._dummy {},
};
Expand Down Expand Up @@ -802,6 +828,9 @@ fn test_parse() {
check!("12345678901234.56789",
[num!(Timestamp), fix!(Nanosecond)];
nanosecond: 567_890_000, timestamp: 12_345_678_901_234);
check!("12345678901234567",
[num!(Timestamp), internal_fix!(Nanosecond3NoDot)];
nanosecond: 567_000_000, timestamp: 12_345_678_901_234);
}

#[cfg(test)]
Expand Down