Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#560: Unable to parse millisecond unix timestamps #561

Closed
wants to merge 2 commits into from

Conversation

jgoday
Copy link

@jgoday jgoday commented May 14, 2021

Thanks for contributing to chrono!

  • Have you added yourself and the change to the changelog? (Don't worry
    about adding the PR number)
  • If this pull request fixes a bug, does it add a test that verifies that
    we can't reintroduce it?

This PR tries to solve #560.
When parsing Timestamps, it allows to check the next item fixed length to avoid Timestamp consuming all the remain digits (see #560 example with nanoseconds "%s%3f").
It groups items in tuples (current and next item) inside parse_internal, using slice windows.

@jszwedko
Copy link

jszwedko commented Dec 6, 2021

Circling back to this. This seems like a reasonable approach to me.


Timestamp => {
let next_size = get_fixed_item_len(next_item).unwrap_or(0);
(s.len() - next_size, false, Parsed::set_timestamp)
Copy link

@meiamsome meiamsome Dec 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using s.len() here is problematic for parsing any formats where the timestamp does not terminate at the end of the string.

For example, I have seen a format that is like %s%3f%z - so 123456789+0000 - in this case you hit %s consuming all of 123456789 because it is limited to the max of 123456789+0.

Another case I've seen is Date(%s%3f) - so Date(123456789) - in this case you hit %s consuming all of 1234567 because that is the max, but %3f is left with three extra characters.

It seems the only real way would be to do something like scan::number to read however many chars Timestamp would capture and then go from there. Sadly, the return of scan::number doesn't currently include the number of characters read, instead returning a new slice pointer so that is not a trivial fix.

Alternatively you could do something like this:

Suggested change
(s.len() - next_size, false, Parsed::set_timestamp)
let numeric_bytes_available =
s.as_bytes().iter().take_while(|&&c| b'0' <= c && c <= b'9').count();
(numeric_bytes_available - next_size, false, Parsed::set_timestamp)

But it seems like leaking the implementation details of scan::number that shouldn't really be leaked here.

I don't know if it's really necessary to fix this for those formats, if they should just be a separate issue, or if they should just not be fixed by the library and left to consumers to implement tricky logic.

@pitdicker
Copy link
Collaborator

I agree this is worth fixing. Combining %s%3f to parse a millisecond timestamp is clever. On the other hand it feels just a bit too magical.

An alternative is to add new specifiers to allow parsing millisecond, microsecond, and nanosecond timestamps as a single integer. @djc any preferences?

Also @meiamsome raised a good issue about the implementation here. @jgoday This PR is near the end of its second year... Would you still be available to push it forwards?

@pitdicker
Copy link
Collaborator

pitdicker commented Apr 26, 2023

The approach in this PR seems wordt persuing to me, but in a more general way.

It would be nice to be able to parse 63015 to 6:30:15 with %-H%M%S. I.e. support parsing one variable-length number followed by one or more fixed-length numbers.

@djc
Copy link
Member

djc commented May 26, 2023

Given the lack of response from the original submitter, going to just close this for now. IMO the original issue isn't a high priority.

@djc djc closed this May 26, 2023
@jtmoon79
Copy link
Contributor

An alternative is to add new specifiers to allow parsing millisecond, microsecond, and nanosecond timestamps as a single integer. @djc any preferences?

This is a worthy improvement.

@djc
Copy link
Member

djc commented May 28, 2023

Worthy how? Have you personally had a use case for this? Given the relative lack of resources I would like to avoid adding features people just think of.

@jszwedko
Copy link

The lack of support for parsing string millisecond timestamps does come up in https://vector.dev/ occasionally, where we use chrono for date/timestamp parsing. Recent example: vectordotdev/vector#19577

@djc
Copy link
Member

djc commented Jan 10, 2024

@jszwedko please open a new issue explaining your use case.

@jszwedko
Copy link

@jszwedko please open a new issue explaining your use case.

I'd actually opened the original issue 🙂 It's here: #560

@djc
Copy link
Member

djc commented Jan 11, 2024

Okay, I'm open to reviewing a new PR that takes into account the feedback on this PR so far.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants