Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add functionality to parse datetimes according C standard format codes. #165

Merged
merged 28 commits into from
Nov 25, 2023

Conversation

bendichter
Copy link
Contributor

@bendichter bendichter commented Nov 9, 2023

fix #150

@wimglenn
Copy link
Collaborator

wimglenn commented Nov 9, 2023

No f-string please

@bendichter
Copy link
Contributor Author

@wimglenn done.

@wimglenn
Copy link
Collaborator

wimglenn commented Nov 9, 2023

Thanks. I just blackened the code in #166 which made some conflicts for you, sorry about that

@bendichter
Copy link
Contributor Author

@wimglenn fixed

@bendichter
Copy link
Contributor Author

@wimglenn is merging this on the roadmap? Are there any more changes you would like to request?

@wimglenn
Copy link
Collaborator

@bendichter This looks like a great idea to me, I'll give a careful review when I have a spare moment.

@bendichter
Copy link
Contributor Author

Sounds good. I'd be happy to answer any questions

@wimglenn
Copy link
Collaborator

I do have a question - I'd really like anything that works with strftime to work here, e.g. this should be able to roundtrip:

>>> dt = datetime.now()
>>> f"{dt:%Y-%m-%d %H:%M:%S}"
'2023-11-21 13:23:27'
>>> parse.parse("{dt:%Y-%m-%d %H:%M:%S}", "2023-11-21 13:23:27")
...
# ValueError: too many values to unpack (expected 2)

So on line 616 where there is name, format = field.split(":") I think it should be name, format = field.split(":", 1). Does that sound reasonable to you?

@bendichter
Copy link
Contributor Author

Yes, good catch! I'll make that change and add a test

parse.py Outdated
"Y": "[0-9]{4}",
"H": "[0-9]{1,2}",
"B": "(?:January|February|March|April|May|June|July|August|September|October|November|December)",
"b": "(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if these should be made locale-aware...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah good idea I'll look into it

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can be addressed at same time as #1

README.rst Outdated Show resolved Hide resolved
@bendichter
Copy link
Contributor Author

I gave it more flexibility for the %z directive. %Z gets complicated

strptime() only accepts certain values for %Z:
any value in time.tzname for your machine’s locale
the hard-coded values UTC and GMT

(source)

If using a timezone out of that range, strptime will fail, even if the timezone is valid:

datetime.strptime("2023-11-21 13:23:27 PST", "%Y-%m-%d %H:%M:%S %Z")
ValueError: time data '2023-11-21 13:23:27 PST' does not match format '%Y-%m-%d %H:%M:%S %Z'

If using a timezone within that range, strptime will succeed however the output datetime is not timezone-aware!

datetime.strptime("2023-11-21 13:23:27 EST", "%Y-%m-%d %H:%M:%S %Z")
datetime.datetime(2023, 11, 21, 13, 23, 27)

This seems to me like a bug. And in any case things are getting messy so I'd rather not support this.

@wimglenn
Copy link
Collaborator

Yes the zone stuff looks tricky due to Python itself (e.g. https://bugs.python.org/issue22377)
I suppose users can always plug in a dateutil.parser if they want any better than strptime is capable of..

@codecov-commenter
Copy link

codecov-commenter commented Nov 22, 2023

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

Attention: Patch coverage is 94.11765% with 3 lines in your changes missing coverage. Please review.

Project coverage is 94.59%. Comparing base (29e5f8e) to head (58c998b).
Report is 24 commits behind head on master.

Files Patch % Lines
parse.py 94.11% 1 Missing and 2 partials ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #165      +/-   ##
==========================================
- Coverage   94.72%   94.59%   -0.13%     
==========================================
  Files           1        1              
  Lines         512      537      +25     
  Branches      123      131       +8     
==========================================
+ Hits          485      508      +23     
- Misses         17       18       +1     
- Partials       10       11       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

parse.py Outdated
@@ -271,6 +271,46 @@ def date_convert(
return d


dt_format_to_regex = {symbol: "[0-9]{2}" for symbol in "ymdIMSUW"}
dt_format_to_regex.update({"-" + symbol: "[0-9]{1,2}" for symbol in "ymdIMS"})
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For some reason I get a problem with these:

>>> datetime(2023, 1, 1).strftime("%Y/%-m/%-d")
'2023/1/1'

But

>>> parse("{:%Y/%-m/%-d}", "2023/1/1")
ValueError: '-' is a bad directive in format '%Y/%-m/%-d'

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm I'll take a look tomorrow

Copy link
Contributor Author

@bendichter bendichter Nov 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is now fixed and tested. It turns out strptime does not use the negative sign (I'm not quite sure why I thought it did). In strftime, %d outputs a zero-padded number e.g. "01". For strptime, %d matches a zero-padded number and can also match a non-zero padded number e.g. "1".

Copy link
Contributor Author

@bendichter bendichter Nov 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now the following works:

>>> parse("{:%Y/%m/%d}", "2023/01/01")
<Result (datetime.datetime(2023, 1, 1, 0, 0),) {}>
>>> parse("{:%Y/%m/%d}", "2023/1/1")
<Result (datetime.datetime(2023, 1, 1, 0, 0),) {}>

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's better. %j should get the same treatment, because it has the same issue (i.e. "%-j" doesn't work with strptime, and %j doesn't care if the number is zero-padded or not).

After removing the mapping for "-j", you can also remove the re.escape since the remaining characters are all letters and don't need escaping.

@bendichter
Copy link
Contributor Author

bendichter commented Nov 23, 2023

hold it, I think you want j to map to {1,3}. Let's also add a test for it

@bendichter
Copy link
Contributor Author

I tried to see if I could replace some of the existing datetime logic, but most of them allow for variants of each format, e.g. using either slashes or dashes, so this approach does not work well. I was able to change "tc" to use this approach though.

@wimglenn
Copy link
Collaborator

wimglenn commented Nov 23, 2023

Let's not change those, because there is little benefit, and I can't be sure it doesn't modify existing behaviour.

I'm also not sure about the condition for returning a datetime vs a time. I think we should either remove that, and always return a datetime, or extend it to return a datetime, time, or date (depending on whether the directives are solely time-related or solely date-related).

@bendichter
Copy link
Contributor Author

OK, I reverted to the old logic for "tc".

The datetime vs. date vs. time thing is tricky. I would like to avoid situations like having parse("{:%H:%M}", "10:26") resolve to datetime(1900, 1, 1, 10, 26, 0). There is precedent in this library for resolving missing year to the current year:

parse/parse.py

Line 194 in 04314d8

y = datetime.today().year

but I still think that only makes sense when a month is present as it is in

parse/tests/test_parse.py

Lines 575 to 589 in 04314d8

y(
"a {:ts} b",
"a Nov 21 10:21:36 b",
datetime(datetime.today().year, 11, 21, 10, 21, 36),
)
y(
"a {:ts} b",
"a Nov 1 10:21:36 b",
datetime(datetime.today().year, 11, 1, 10, 21, 36),
)
y(
"a {:ts} b",
"a Nov 1 03:21:36 b",
datetime(datetime.today().year, 11, 1, 3, 21, 36),
)

I suppose the logic could be:

  • If it contains day, month, or year, it's a date. Missing years default to the current year.
  • If it contains hours, minutes, seconds, or milliseconds, its a time.
  • If it is a date and a time it's a datetime.

@bendichter
Copy link
Contributor Author

Maybe consider configuring black as a precommit hook, e.g. https://github.com/catalystneuro/neuroconv/blob/1d3d58d4b53570d20b5e81abde7ada7f9f02fa47/.pre-commit-config.yaml#L8-L12

@wimglenn
Copy link
Collaborator

wimglenn commented Nov 23, 2023

Yes, that logic sounds good to me.
Directives indicative of dates: aAwdbBmyYjUW
Directives indicative of times: HIpMSfz
I've gotta sign off now so I'll leave it in your capable hands and check it out tomorrow

bendichter and others added 3 commits November 22, 2023 23:15
If it contains day, month, or year, it's a date. Missing years default to the current year.
If it contains hours, minutes, seconds, or milliseconds, its a time.
If it is a date and a time it's a datetime.
@bendichter
Copy link
Contributor Author

@wimglenn Thanks for all the collaboration on this. I am happy with its current state. Feel free to merge when you are ready.

@wimglenn wimglenn merged commit a2e1334 into r1chardj0n3s:master Nov 25, 2023
22 checks passed
@wimglenn
Copy link
Collaborator

Merged and released 1.20.0 to PyPI. Thank you for your contributions, Ben!

elif is_time:
return dt.time()
else:
ValueError("Datetime not a date nor a time?")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bendichter Somehow we both missed a raise here (ref #196)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops. Thanks for catching that

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

strftime formatting
3 participants