#560: Unable to parse millisecond unix timestamps #561

jgoday · 2021-05-14T21:33:10Z

Thanks for contributing to chrono!

Have you added yourself and the change to the changelog? (Don't worry
about adding the PR number)
If this pull request fixes a bug, does it add a test that verifies that
we can't reintroduce it?

This PR tries to solve #560.
When parsing Timestamps, it allows to check the next item fixed length to avoid Timestamp consuming all the remain digits (see #560 example with nanoseconds "%s%3f").
It groups items in tuples (current and next item) inside parse_internal, using slice windows.

jszwedko · 2021-12-06T16:13:28Z

Circling back to this. This seems like a reasonable approach to me.

meiamsome · 2021-12-08T21:14:08Z

src/format/parse.rs

-
+                    Timestamp => {
+                        let next_size = get_fixed_item_len(next_item).unwrap_or(0);
+                        (s.len() - next_size, false, Parsed::set_timestamp)


Using s.len() here is problematic for parsing any formats where the timestamp does not terminate at the end of the string.

For example, I have seen a format that is like %s%3f%z - so 123456789+0000 - in this case you hit %s consuming all of 123456789 because it is limited to the max of 123456789+0.

Another case I've seen is Date(%s%3f) - so Date(123456789) - in this case you hit %s consuming all of 1234567 because that is the max, but %3f is left with three extra characters.

It seems the only real way would be to do something like scan::number to read however many chars Timestamp would capture and then go from there. Sadly, the return of scan::number doesn't currently include the number of characters read, instead returning a new slice pointer so that is not a trivial fix.

Alternatively you could do something like this:

Suggested change

(s.len() - next_size, false, Parsed::set_timestamp)

let numeric_bytes_available =

s.as_bytes().iter().take_while(|&&c| b'0' <= c && c <= b'9').count();

(numeric_bytes_available - next_size, false, Parsed::set_timestamp)

But it seems like leaking the implementation details of scan::number that shouldn't really be leaked here.

I don't know if it's really necessary to fix this for those formats, if they should just be a separate issue, or if they should just not be fixed by the library and left to consumers to implement tricky logic.

pitdicker · 2023-04-26T05:31:10Z

I agree this is worth fixing. Combining %s%3f to parse a millisecond timestamp is clever. On the other hand it feels just a bit too magical.

An alternative is to add new specifiers to allow parsing millisecond, microsecond, and nanosecond timestamps as a single integer. @djc any preferences?

Also @meiamsome raised a good issue about the implementation here. @jgoday This PR is near the end of its second year... Would you still be available to push it forwards?

pitdicker · 2023-04-26T06:17:15Z

The approach in this PR seems wordt persuing to me, but in a more general way.

It would be nice to be able to parse 63015 to 6:30:15 with %-H%M%S. I.e. support parsing one variable-length number followed by one or more fixed-length numbers.

djc · 2023-05-26T09:44:33Z

Given the lack of response from the original submitter, going to just close this for now. IMO the original issue isn't a high priority.

jtmoon79 · 2023-05-28T02:45:53Z

An alternative is to add new specifiers to allow parsing millisecond, microsecond, and nanosecond timestamps as a single integer. @djc any preferences?

This is a worthy improvement.

djc · 2023-05-28T09:47:29Z

Worthy how? Have you personally had a use case for this? Given the relative lack of resources I would like to avoid adding features people just think of.

jszwedko · 2024-01-10T14:46:45Z

The lack of support for parsing string millisecond timestamps does come up in https://vector.dev/ occasionally, where we use chrono for date/timestamp parsing. Recent example: vectordotdev/vector#19577

djc · 2024-01-10T15:51:40Z

@jszwedko please open a new issue explaining your use case.

jszwedko · 2024-01-10T17:52:40Z

@jszwedko please open a new issue explaining your use case.

I'd actually opened the original issue 🙂 It's here: #560

djc · 2024-01-11T08:42:13Z

Okay, I'm open to reviewing a new PR that takes into account the feedback on this PR so far.

jgoday added 2 commits May 14, 2021 20:31

chronotope#560: Unable to parse millisecond unix timestamps

c44fd78

Update changelog

642e416

meiamsome reviewed Dec 8, 2021

View reviewed changes

djc closed this May 26, 2023

pitdicker mentioned this pull request Feb 3, 2024

Parsing numeric literals can cause TooShort/Invalid #1399

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

#560: Unable to parse millisecond unix timestamps #561

#560: Unable to parse millisecond unix timestamps #561

jgoday commented May 14, 2021

jszwedko commented Dec 6, 2021

meiamsome Dec 8, 2021 •

edited

Loading

pitdicker commented Apr 26, 2023

pitdicker commented Apr 26, 2023 •

edited

Loading

djc commented May 26, 2023

jtmoon79 commented May 28, 2023

djc commented May 28, 2023

jszwedko commented Jan 10, 2024

djc commented Jan 10, 2024

jszwedko commented Jan 10, 2024

djc commented Jan 11, 2024

-                        (s.len() - next_size, false, Parsed::set_timestamp)
+                        let numeric_bytes_available =
+                            s.as_bytes().iter().take_while(|&&c| b'0' <= c && c <= b'9').count();
+                        (numeric_bytes_available - next_size, false, Parsed::set_timestamp)

#560: Unable to parse millisecond unix timestamps #561

#560: Unable to parse millisecond unix timestamps #561

Conversation

jgoday commented May 14, 2021

Thanks for contributing to chrono!

jszwedko commented Dec 6, 2021

meiamsome Dec 8, 2021 • edited Loading

Choose a reason for hiding this comment

pitdicker commented Apr 26, 2023

pitdicker commented Apr 26, 2023 • edited Loading

djc commented May 26, 2023

jtmoon79 commented May 28, 2023

djc commented May 28, 2023

jszwedko commented Jan 10, 2024

djc commented Jan 10, 2024

jszwedko commented Jan 10, 2024

djc commented Jan 11, 2024

meiamsome Dec 8, 2021 •

edited

Loading

pitdicker commented Apr 26, 2023 •

edited

Loading