parse arbitrary datetime strings #152

Closed
17 changes: 11 additions & 6 deletions README.rst
@@ -5,7 +5,7 @@ Parse strings using a specification based on the Python format() syntax.
The module is set up to only export ``parse()``, ``search()``, ``findall()``,
and ``with_pattern()`` when ``import \*`` is used:

->>> from parse import *
+>>> from fparse import *

From there it's a simple thing to parse a string:

@@ -35,7 +35,7 @@ compile it once:

.. code-block:: pycon

->>> from parse import compile
+>>> from fparse import compile
>>> p = compile("It's {}, I love it!")
>>> print(p)
<Parser "It's {}, I love it!">
@@ -132,7 +132,7 @@ format specification might have been used.

Most of `format()`'s `Format Specification Mini-Language`_ is supported:

-[[fill]align][sign][0][width][.precision][type]
+[[fill]align][0][width][.precision][type]

The differences between `parse()` and `format()` are:

@@ -143,8 +143,7 @@ The differences between `parse()` and `format()` are:
That is, the "#" format character is handled automatically by d, b, o
and x formats. For "d" any will be accepted, but for the others the correct
prefix must be present if at all.
-- Numeric sign is handled automatically. A sign specifier can be given, but
-has no effect.
+- Numeric sign is handled automatically.
- The thousands separator is handled automatically if the "n" type is used.
- The types supported are a slightly different mix to the format() types. Some
format() types come directly over: "d", "n", "%", "f", "e", "b", "o" and "x".
@@ -193,6 +192,10 @@ tt Time time
e.g. 10:21:36 PM -5:30
===== =========================================== ========

+The type can also be a datetime format string, following the
+`1989 C standard format codes`_, e.g. %Y-%m-%d. Any type containing %Y
+or %y will be parsed and output as a ``datetime.datetime``.
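
To make the sentence above concrete, here is a minimal sketch of the conversion step using only the standard library, not fparse itself. It assumes, as the ``_handle_field`` change in this diff suggests, that the matched text and the field type are handed to ``datetime.strptime`` unchanged. ``field_type`` and ``matched_text`` are illustrative names, not part of the library.

```python
from datetime import datetime

# Assumption: the field type is used verbatim as the strptime format.
field_type = "%Y-%m-%d"
matched_text = "2021-03-05"

# A date-only format still yields a full datetime, with the time at midnight.
value = datetime.strptime(matched_text, field_type)
print(value)  # 2021-03-05 00:00:00
```

Note that a date-only format still produces a ``datetime.datetime`` rather than a ``datetime.date``, which is consistent with the wording of the added paragraph.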

Some examples of typed parsing with ``None`` returned if the typing
does not match:

@@ -231,7 +234,7 @@ a maximum. For example:
>>> parse('{:2d}{:2d}', '0440') # parsing two contiguous numbers
<Result (4, 40) {}>

-Some notes for the date and time types:
+Some notes for the special date and time types:

- the presence of the time part is optional (including ISO 8601, starting
at the "T"). A full datetime object will always be returned; the time
@@ -264,6 +267,8 @@ that this limit will be removed one day.
http://docs.python.org/library/string.html#format-string-syntax
.. _`Format Specification Mini-Language`:
http://docs.python.org/library/string.html#format-specification-mini-language
+.. _`1989 C standard format codes`:
+https://docs.python.org/3/library/datetime.html#strftime-and-strptime-format-codes


Result and Match Objects
47 changes: 42 additions & 5 deletions parse.py → fparse.py
@@ -5,7 +5,7 @@
The module is set up to only export ``parse()``, ``search()``, ``findall()``,
and ``with_pattern()`` when ``import \*`` is used:

->>> from parse import *
+>>> from fparse import *

From there it's a simple thing to parse a string:

@@ -35,7 +35,7 @@

.. code-block:: pycon

->>> from parse import compile
+>>> from fparse import compile
>>> p = compile("It's {}, I love it!")
>>> print(p)
<Parser "It's {}, I love it!">
@@ -192,6 +192,10 @@
e.g. 10:21:36 PM -5:30
===== =========================================== ========

+The type can also be a datetime format string, following the
+`1989 C standard format codes`_, e.g. %Y-%m-%d. Any type containing %Y
+or %y will be parsed and output as a ``datetime.datetime``.

Some examples of typed parsing with ``None`` returned if the typing
does not match:

@@ -230,7 +234,7 @@
>>> parse('{:2d}{:2d}', '0440') # parsing two contiguous numbers
<Result (4, 40) {}>

-Some notes for the date and time types:
+Some notes for the special date and time types:

- the presence of the time part is optional (including ISO 8601, starting
at the "T"). A full datetime object will always be returned; the time
@@ -263,6 +267,8 @@
http://docs.python.org/library/string.html#format-string-syntax
.. _`Format Specification Mini-Language`:
http://docs.python.org/library/string.html#format-specification-mini-language
+.. _`1989 C standard format codes`:
+https://docs.python.org/3/library/datetime.html#strftime-and-strptime-format-codes


Result and Match Objects
@@ -469,11 +475,12 @@

from __future__ import absolute_import

-__version__ = '1.19.0'
+__version__ = '1.20.0'

# yes, I now have two problems
import re
import sys
+from copy import copy
from datetime import datetime, time, tzinfo, timedelta
from decimal import Decimal
from functools import partial
@@ -741,6 +748,33 @@ def date_convert(
return d


+dt_format_to_regex = {symbol: "[0-9]{2}" for symbol in "ymdIMSUW"}
+dt_format_to_regex.update({"-" + symbol: "[0-9]{1,2}" for symbol in "ymdIMS"})
+
+dt_format_to_regex.update(
+    {
+        "a": "(?:Sun|Mon|Tue|Wed|Thu|Fri|Sat)",
+        "A": "(?:Sunday|Monday|Tuesday|Wednesday|Thursday|Friday|Saturday)",
+        "Y": "[0-9]{4}",
+        "H": "[0-9]{1,2}",
+        "B": "(?:January|February|March|April|May|June|July|August|September|October|November|December)",
+        "b": "(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)",
+        "f": "[0-9]{6}",
+        "p": "(?:AM|PM)",
+        "z": "[+-][0-9]{4}",
+        "j": "[0-9]{3}",
+        "-j": "[0-9]{1,3}",
+    }
+)
+
+
+def get_regex_for_datetime_format(format_):
+    regex = copy(format_)
+    for k, v in dt_format_to_regex.items():
+        regex = regex.replace(f"%{k}", v)
+    return regex
+
+
 class TooManyFields(ValueError):
     pass
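
As written, the replace-based translation in ``get_regex_for_datetime_format`` leaves literal characters in the format unescaped, so the ``.`` in ``%d.%m.%Y`` would match any character. One possible alternative, sketched here under assumptions rather than proposed as the PR's implementation, walks the format once and escapes everything between directives. ``DIRECTIVE_PATTERNS`` is a hypothetical subset of ``dt_format_to_regex`` and ``datetime_format_to_regex`` is an illustrative name. A literal ``%%`` is not handled.

```python
import re

# Hypothetical subset of the diff's dt_format_to_regex table.
# Inside a character class "|" is a literal, so the sign class is "[+-]".
DIRECTIVE_PATTERNS = {
    "Y": "[0-9]{4}",
    "m": "[0-9]{2}",
    "d": "[0-9]{2}",
    "-d": "[0-9]{1,2}",
    "H": "[0-9]{1,2}",
    "M": "[0-9]{2}",
    "S": "[0-9]{2}",
    "z": "[+-][0-9]{4}",
}

# Matches "%x" and "%-x" directives.
_DIRECTIVE = re.compile(r"%(-?[A-Za-z])")


def datetime_format_to_regex(format_):
    """Translate a strftime-style format into a regex source string."""
    parts = []
    pos = 0
    for match in _DIRECTIVE.finditer(format_):
        # Escape the literal text between directives so "." matches only ".".
        parts.append(re.escape(format_[pos:match.start()]))
        key = match.group(1)
        if key not in DIRECTIVE_PATTERNS:
            raise ValueError("unsupported directive %%%s" % key)
        parts.append(DIRECTIVE_PATTERNS[key])
        pos = match.end()
    # Escape whatever literal text follows the last directive.
    parts.append(re.escape(format_[pos:]))
    return "".join(parts)


pattern = datetime_format_to_regex("%d.%m.%Y")
print(bool(re.fullmatch(pattern, "05.03.2021")))  # True
print(bool(re.fullmatch(pattern, "05x03x2021")))  # False
```

Raising on an unknown directive is a design choice of this sketch. The code in the diff instead leaves unknown directives in the regex untouched.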

@@ -796,7 +830,7 @@ def extract_format(format, extra_types):

     # the rest is the type, if present
     type = format
-    if type and type not in ALLOWED_TYPES and type not in extra_types:
+    if type and type not in ALLOWED_TYPES and type not in extra_types and "%Y" not in type and "%y" not in type:
         raise ValueError('format spec %r not recognised' % type)

     return locals()
@@ -1135,6 +1169,9 @@ def _handle_field(self, field):
             self._type_conversions[
                 group
             ] = int_convert()  # do not specify number base, determine it automatically
+        elif "%Y" in type or "%y" in type:
+            s = get_regex_for_datetime_format(type)
+            self._type_conversions[group] = lambda x, _: datetime.strptime(x, type)
         elif type == 'ti':
             s = r'(\d{4}-\d\d-\d\d)((\s+|T)%s)?(Z|\s*[-+]\d\d:?\d\d)?' % TIME_PAT
             n = self._group_index
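
One caveat with the converter added in this hunk: the ``%-d`` style keys in ``dt_format_to_regex`` are, to my knowledge, a platform-specific ``strftime`` extension that ``datetime.strptime`` does not accept, so a type such as ``%-d/%-m/%Y`` would match the regex and then fail in the conversion. The following is a hedged sketch of a converter that drops the ``-`` flag before parsing. ``make_datetime_converter`` is a hypothetical helper, not something in this PR.

```python
import re
from datetime import datetime


def make_datetime_converter(format_):
    """Return a converter that parses matched text according to format_."""
    # Assumption: strptime rejects "%-d" style directives, so strip the "-".
    # strptime's "%d", "%m", etc. already accept values without zero padding.
    strptime_format = re.sub(r"%-([A-Za-z])", r"%\1", format_)

    def convert(text, match=None):
        # The second argument mirrors the (string, match) converter signature
        # used by the lambda in the diff; it is unused here.
        return datetime.strptime(text, strptime_format)

    return convert


convert = make_datetime_converter("%-d/%-m/%Y")
print(convert("5/3/2021", None))  # 2021-03-05 00:00:00
```

Building the converter through a factory function also fixes the format at creation time instead of relying on a closure over the local name ``type``.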
4 changes: 2 additions & 2 deletions setup.py
@@ -4,14 +4,14 @@

from setuptools import setup

-from parse import __version__, __doc__
+from fparse import __version__, __doc__

 with open('README.rst', 'w') as f:
     f.write(__doc__)

 # perform the setup action
 setup(
-    name = "parse",
+    name = "fparse",
     version = __version__,
     description = "parse() is the opposite of format()",
     long_description = __doc__,