Add functionality to parse datetimes according to C standard format codes. #165

Merged
merged 28 commits into from
Nov 25, 2023
Changes from 1 commit
Commits
8237019
Add functionality to parse datetimes according to C standard format codes.
bendichter Nov 9, 2023
f4b0cbe
use regex to dynamically create specific datetime regex patterns
bendichter Nov 9, 2023
c1bee76
change from f-string to str.format syntax
bendichter Nov 9, 2023
28b6336
remove copy import
bendichter Nov 9, 2023
30173ed
Merge branch 'og_master' into parse_flexible_dates
bendichter Nov 9, 2023
cd31fc2
allow for colons in time format
bendichter Nov 21, 2023
3d2ea99
allow for more flexible parsing of %z
bendichter Nov 21, 2023
32d83fb
shield Python 2 from timezone features
bendichter Nov 21, 2023
acdfeb0
add time parsing
bendichter Nov 21, 2023
5cc7111
uglify code with black
wimglenn Nov 22, 2023
f71aa7a
bump version
wimglenn Nov 22, 2023
72591f7
handle and test single digits for day and month
bendichter Nov 22, 2023
c3a4f32
Merge remote-tracking branch 'origin/parse_flexible_dates' into parse…
bendichter Nov 22, 2023
5210e18
remove %-j handling
wimglenn Nov 23, 2023
fb6d2c0
make j flexible number of digits
bendichter Nov 23, 2023
550f5e5
simplify and reorder, to match docs
wimglenn Nov 23, 2023
8771d3f
just use the map directly
wimglenn Nov 23, 2023
fb0c8d9
readability improvements
wimglenn Nov 23, 2023
77328ef
change "tc" to use generic datetime parsing approach
bendichter Nov 23, 2023
11342e1
Merge remote-tracking branch 'origin/parse_flexible_dates' into parse…
bendichter Nov 23, 2023
34edfc8
use new conv variable
bendichter Nov 23, 2023
2c5f905
revert to old logic for tc
bendichter Nov 23, 2023
6af64dd
blacken again
wimglenn Nov 23, 2023
b08ed21
blacken again sorry
wimglenn Nov 23, 2023
4b1ba61
new logic:
bendichter Nov 23, 2023
fd1a414
Merge remote-tracking branch 'origin/parse_flexible_dates' into parse…
bendichter Nov 23, 2023
44c97bb
doc update
wimglenn Nov 23, 2023
58c998b
test roundtrip every directive
wimglenn Nov 25, 2023
Add functionality to parse datetimes according to C standard format codes.
bendichter committed Nov 9, 2023
commit 8237019b0915f99e0a8932e3e475448e06497c16
9 changes: 8 additions & 1 deletion README.rst
@@ -206,6 +206,10 @@ tt Time time
 e.g. 10:21:36 PM -5:30
 ===== =========================================== ========

+The type can also be a datetime format string, following the
+`1989 C standard format codes`_, e.g. %Y-%m-%d. Any type containing %Y
+or %y will be parsed and output as a ``datetime.datetime``.
+
 Some examples of typed parsing with ``None`` returned if the typing
 does not match:

@@ -244,7 +248,7 @@ a maximum. For example:
 >>> parse('{:2d}{:2d}', '0440') # parsing two contiguous numbers
 <Result (4, 40) {}>

-Some notes for the date and time types:
+Some notes for the special date and time types:

 - the presence of the time part is optional (including ISO 8601, starting
 at the "T"). A full datetime object will always be returned; the time
@@ -277,6 +281,9 @@ that this limit will be removed one day.
 https://docs.python.org/3/library/string.html#format-string-syntax
 .. _`Format Specification Mini-Language`:
 https://docs.python.org/3/library/string.html#format-specification-mini-language
+.. _`1989 C standard format codes`:
+https://docs.python.org/3/library/datetime.html#strftime-and-strptime-format-codes
+


 Result and Match Objects
33 changes: 32 additions & 1 deletion parse.py
@@ -5,6 +5,7 @@
 # yes, I now have two problems
 import re
 import sys
+from copy import copy
 from datetime import datetime, time, tzinfo, timedelta
 from decimal import Decimal
 from functools import partial
@@ -272,6 +273,33 @@ def date_convert(
     return d


+dt_format_to_regex = {symbol: "[0-9]{2}" for symbol in "ymdIMSUW"}
+dt_format_to_regex.update({"-" + symbol: "[0-9]{1,2}" for symbol in "ymdIMS"})
Collaborator

For some reason I get a problem with these:

>>> datetime(2023, 1, 1).strftime("%Y/%-m/%-d")
'2023/1/1'

But

>>> parse("{:%Y/%-m/%-d}", "2023/1/1")
ValueError: '-' is a bad directive in format '%Y/%-m/%-d'

Contributor Author

Hmmm I'll take a look tomorrow

Contributor Author
bendichter Nov 22, 2023

This is now fixed and tested. It turns out strptime does not use the negative sign (I'm not quite sure why I thought it did). In strftime, %d outputs a zero-padded number e.g. "01". For strptime, %d matches a zero-padded number and can also match a non-zero padded number e.g. "1".

Contributor Author
bendichter Nov 22, 2023

Now the following works:

>>> parse("{:%Y/%m/%d}", "2023/01/01")
<Result (datetime.datetime(2023, 1, 1, 0, 0),) {}>
>>> parse("{:%Y/%m/%d}", "2023/1/1")
<Result (datetime.datetime(2023, 1, 1, 0, 0),) {}>

Collaborator

That's better. %j should get the same treatment, because it has the same issue (i.e. "%-j" doesn't work with strptime, and %j doesn't care if the number is zero-padded or not).

After removing the mapping for "-j", you can also remove the re.escape since the remaining characters are all letters and don't need escaping.
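The behaviour discussed in this thread can be checked with the standard library alone (no `parse` involved): `datetime.strptime` accepts both zero-padded and unpadded values for `%m`, `%d` and `%j`, while the `-` (no padding) flag, which some platforms' `strftime` understands, is rejected outright by `strptime`.

```python
from datetime import datetime

# %m and %d accept both "01" and "1" when parsing.
padded = datetime.strptime("2023/01/01", "%Y/%m/%d")
unpadded = datetime.strptime("2023/1/1", "%Y/%m/%d")
print(padded == unpadded == datetime(2023, 1, 1))  # True

# %j (day of the year) is equally lenient: "001" and "1" both mean 1 January.
print(datetime.strptime("2023 001", "%Y %j") == datetime.strptime("2023 1", "%Y %j"))  # True

# The "-" (no padding) flag is not a strptime directive at all.
try:
    datetime.strptime("2023/1/1", "%Y/%-m/%-d")
except ValueError as exc:
    print(exc)  # the message reports '-' as a bad directive
```

This is why mapping `%-m`, `%-d` or `%-j` to their own regexes does not help: the regex may match, but the subsequent `strptime` call with the same format string fails.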

+
+dt_format_to_regex.update(
+    {
+        "a": "(?:Sun|Mon|Tue|Wed|Thu|Fri|Sat)",
+        "A": "(?:Sunday|Monday|Tuesday|Wednesday|Thursday|Friday|Saturday)",
+        "Y": "[0-9]{4}",
+        "H": "[0-9]{1,2}",
+        "B": "(?:January|February|March|April|May|June|July|August|September|October|November|December)",
+        "b": "(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)",
Collaborator

I wonder if these should be made locale-aware...

Contributor Author

yeah good idea I'll look into it

Collaborator

Can be addressed at the same time as #1
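A minimal sketch of one way to make the name directives locale-aware, assuming the current `LC_TIME` locale is the right source of names (the diff itself hard-codes English). The standard-library sequences `calendar.day_abbr`, `calendar.day_name`, `calendar.month_abbr` and `calendar.month_name` are rendered with `strftime` in the current locale, so alternations built from them follow `locale.setlocale()`. The helpers `name_alternation` and `locale_name_regexes` are hypothetical names used only here.

```python
import calendar
import re


def name_alternation(names):
    """Return a non-capturing regex alternation for a sequence of names.

    Empty entries are skipped (calendar.month_name[0] is ""), each name is
    escaped, and longer names come first so a full name is preferred over
    a shorter name that happens to be its prefix.
    """
    unique = sorted({name for name in names if name}, key=len, reverse=True)
    return "(?:" + "|".join(re.escape(name) for name in unique) + ")"


def locale_name_regexes():
    """Hypothetical locale-dependent replacements for %a, %A, %b and %B.

    calendar's name sequences are formatted under the current LC_TIME
    locale, so calling this after locale.setlocale() picks up that locale.
    """
    return {
        "a": name_alternation(calendar.day_abbr),
        "A": name_alternation(calendar.day_name),
        "b": name_alternation(calendar.month_abbr),
        "B": name_alternation(calendar.month_name),
    }


patterns = locale_name_regexes()
# Under the default "C" locale these are the English names.
print(bool(re.fullmatch(patterns["B"], "February")))  # True
```

Building the table at call time rather than at import time is the design choice that matters here, since the locale can change after the module is loaded. CPython's `_strptime` also compiles its patterns case-insensitively, so compiling these alternations with `re.IGNORECASE` would keep the matching step consistent with the conversion step.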

+        "f": "[0-9]{6}",
+        "p": "(?:AM|PM)",
+        "z": "[+|-][0-9]{4}",
+        "j": "[0-9]{3}",
+        "-j": "[0-9]{1,3}",
+    }
+)
+
+
+def get_regex_for_datetime_format(format_):
+    regex = copy(format_)
+    for k, v in dt_format_to_regex.items():
+        regex = regex.replace(f"%{k}", v)
+    return regex
+
+
 class TooManyFields(ValueError):
     pass

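For comparison, a minimal sketch of the same format-to-regex idea (not the code in this commit). It substitutes directives in a single pass and runs the literal text between them through `re.escape`, so characters such as `.` in a format string are matched literally rather than as regex metacharacters. It also writes the `%z` sign class as `[+-]`: inside a character class `|` is an ordinary character, so `[+|-]` above would also accept a `|`. The trimmed `DIRECTIVE_REGEX` table and the name `datetime_format_to_regex` are assumptions for illustration.

```python
import re
from datetime import datetime

# Trimmed directive table (an assumption for illustration). Patterns follow
# the diff, except %m/%d accept one or two digits, as discussed in the
# review, and %z writes its sign class as [+-].
DIRECTIVE_REGEX = {
    "Y": "[0-9]{4}",
    "m": "[0-9]{1,2}",
    "d": "[0-9]{1,2}",
    "H": "[0-9]{1,2}",
    "M": "[0-9]{2}",
    "S": "[0-9]{2}",
    "b": "(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)",
    "z": "[+-][0-9]{4}",
    "%": "%",  # "%%" is a literal percent sign
}

_DIRECTIVE = re.compile(r"%(.)")


def datetime_format_to_regex(fmt):
    """Translate a strftime-style format string into a regex (hypothetical)."""
    parts = []
    pos = 0
    for match in _DIRECTIVE.finditer(fmt):
        # Escape the literal text between directives, e.g. "." or "(".
        parts.append(re.escape(fmt[pos:match.start()]))
        directive = match.group(1)
        if directive not in DIRECTIVE_REGEX:
            raise ValueError("unsupported directive %" + directive)
        parts.append(DIRECTIVE_REGEX[directive])
        pos = match.end()
    parts.append(re.escape(fmt[pos:]))
    return "".join(parts)


# Usage: locate the field inside a larger string, then hand the matched
# text to strptime with the original format string.
fmt = "%Y-%b-%d"
found = re.search("a (" + datetime_format_to_regex(fmt) + ") b", "a 1997-Feb-16 b")
print(datetime.strptime(found.group(1), fmt))  # 1997-02-16 00:00:00
print(re.fullmatch(datetime_format_to_regex("%z"), "|0530"))  # None
```

The regex only has to find the right span of text; validating the values (month 13, day 32) is still left to `strptime`, which raises `ValueError` if the span does not fit the format.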
@@ -327,7 +355,7 @@ def extract_format(format, extra_types):

     # the rest is the type, if present
     type = format
-    if type and type not in ALLOWED_TYPES and type not in extra_types:
+    if type and type not in ALLOWED_TYPES and type not in extra_types and "%Y" not in type and "%y" not in type:
         raise ValueError('format spec %r not recognised' % type)

     return locals()
@@ -666,6 +694,9 @@ def _handle_field(self, field):
             self._type_conversions[
                 group
             ] = int_convert() # do not specify number base, determine it automatically
+        elif "%Y" in type or "%y" in type:
+            s = get_regex_for_datetime_format(type)
+            self._type_conversions[group] = lambda x, _: datetime.strptime(x, type)
         elif type == 'ti':
             s = r'(\d{4}-\d\d-\d\d)((\s+|T)%s)?(Z|\s*[-+]\d\d:?\d\d)?' % TIME_PAT
             n = self._group_index
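In the new branch the converter is `lambda x, _: datetime.strptime(x, type)`: it is called with the matched text and a second argument that it ignores. That closure is fine where it is, because every `_handle_field` call has its own local `type`. If converters like this were ever built in a loop, however, a bare lambda would only see the loop variable's final value. The sketch below contrasts that with a small factory that binds the format when the converter is created; `make_strptime_converter` is a hypothetical name used only here.

```python
from datetime import datetime


def make_strptime_converter(fmt):
    """Return a (text, match) converter bound to one strptime format.

    The second parameter mirrors the ignored "_" argument of the lambda in
    the diff; it is accepted and unused.
    """
    def convert(text, _match=None):
        return datetime.strptime(text, fmt)

    return convert


formats = ["%Y-%m-%d", "%d/%m/%Y"]

# Each lambda looks `fmt` up when it is called, i.e. after the loop has
# finished, so every one of them uses the last format in the list.
late = [lambda x, _: datetime.strptime(x, fmt) for fmt in formats]
# The factory captures the value of `fmt` at creation time instead.
bound = [make_strptime_converter(fmt) for fmt in formats]

print(bound[0]("2023-01-02", None))  # 2023-01-02 00:00:00
try:
    late[0]("2023-01-02", None)  # actually parsed with "%d/%m/%Y"
except ValueError:
    print("late-bound lambda used the wrong format")
```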
23 changes: 23 additions & 0 deletions test_parse.py
@@ -444,6 +444,29 @@ def test_two_datetimes(self):
         self.assertEqual(r[0], datetime(1997, 7, 16))
         self.assertEqual(r[1], datetime(2012, 8, 1))

+    def test_flexible_datetimes(self):
+        r = parse.parse('a {:%Y-%m-%d} b', "a 1997-07-16 b")
+        self.assertEqual(len(r.fixed), 1)
+        self.assertEqual(r[0], datetime(1997, 7, 16))
+
+        r = parse.parse('a {:%Y-%b-%d} b', "a 1997-Feb-16 b")
+        self.assertEqual(len(r.fixed), 1)
+        self.assertEqual(r[0], datetime(1997, 2, 16))
+
+        r = parse.parse('a {:%Y-%b-%d} {:d} b', "a 1997-Feb-16 8 b")
+        self.assertEqual(len(r.fixed), 2)
+        self.assertEqual(r[0], datetime(1997, 2, 16))
+
+        r = parse.parse('a {my_date:%Y-%b-%d} {num:d} b', "a 1997-Feb-16 8 b")
+        self.assertEqual((r.named["my_date"]), datetime(1997, 2, 16))
+        self.assertEqual((r.named["num"]), 8)
+
+        r = parse.parse('a {:%Y-%B-%d} b', "a 1997-February-16 b")
+        self.assertEqual(r[0], datetime(1997, 2, 16))
+
+        r = parse.parse('a {:%Y%m%d} b', "a 19970716 b")
+        self.assertEqual(r[0], datetime(1997, 7, 16))
+
     def test_datetimes(self):
         def y(fmt, s, e, tz=None):
             p = parse.compile(fmt)