WIP: bpo-1100942: Add datetime.time.strptime and datetime.date.strptime #5578
This PR addresses a very old issue, open since 2005. Nice to finally have a PR for it in 2018 👍
I restarted the Travis job, but it still did not run the full CPython test suite, so please rebase :)
Thanks, I hadn't seen your message. I'm working on this issue today.
@Mariatta Rebased, and the tests pass on CI.
Lib/_strptime.py
  the number of microseconds based on the input string and the
- format string."""
+ format string, and the GMT offset."""
We should probably consistently use "UTC offset", though I suppose it doesn't matter much since it's not a public-facing docstring.
@pganssle I am not very confident with offsets and datetime. Could we keep it as is and open another bpo issue?
I just meant use UTC instead of GMT everywhere.
Yeah, just replace "GMT" with "UTC". datetime.tzinfo has a utcoffset() method, so "UTC" is preferred in datetime.
(And there are some subtle differences between GMT and UTC that I forgot.)
Lib/_strptime.py
@@ -565,6 +566,10 @@ def _strptime(data_string, format="%a %b %d %H:%M:%S %Y"):
hour, minute, second,
weekday, julian, tz, tzname, gmtoff), fraction, gmtoff_fraction


date_specs = ('%a', '%A', '%b', '%B', '%c', '%d', '%j', '%m', '%U',
I think this is missing %G, %u and %V, the ISO 8601 week calendar directives.
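For reference, datetime.strptime in Python 3.6 and later already accepts these three directives, and they only ever describe a date (ISO year, ISO week number, ISO weekday with 1 = Monday), so they would belong in the date-only group. A quick check with the public API:

```python
from datetime import datetime

# ISO 8601 week date: %G = ISO year, %V = ISO week number, %u = ISO weekday (1 = Monday).
# None of them carries any time-of-day information.
parsed = datetime.strptime("2018-W01-1", "%G-W%V-%u")
print(parsed)  # 2018-01-01 00:00:00

# The ISO calendar of the result gives back the same three components.
print(tuple(parsed.isocalendar()))
```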
Lib/_strptime.py
@@ -565,6 +566,10 @@ def _strptime(data_string, format="%a %b %d %H:%M:%S %Y"):
hour, minute, second,
weekday, julian, tz, tzname, gmtoff), fraction, gmtoff_fraction


date_specs = ('%a', '%A', '%b', '%B', '%c', '%d', '%j', '%m', '%U',
              '%w', '%W', '%x', '%y', '%Y',)
time_specs = ('%T', '%R', '%H', '%I', '%M', '%S', '%f', '%i', '%s',)
I don't see any references to %T, %R, %i or %s in the docs or elsewhere in the code. What do these represent?
Lib/_strptime.py
    _time = _strptime_datetime(datetime_datetime, data_string, format)
    return _time.time()


def _check_invalid_datetime_specs(fmt, specs, msg):
If I'm understanding this correctly, this function seems to have the inverted sense of what I would expect: it seems to check whether fmt is a valid spec.
From the names of the variables and functions, I was expecting a whitelist, not a blacklist. Does it make sense to switch to a whitelist approach? If not, can we rename specs to blacklist_specs or something similar?
Lib/_strptime.py
@@ -565,6 +566,10 @@ def _strptime(data_string, format="%a %b %d %H:%M:%S %Y"):
hour, minute, second,
weekday, julian, tz, tzname, gmtoff), fraction, gmtoff_fraction


date_specs = ('%a', '%A', '%b', '%B', '%c', '%d', '%j', '%m', '%U',
              '%w', '%W', '%x', '%y', '%Y',)
time_specs = ('%T', '%R', '%H', '%I', '%M', '%S', '%f', '%i', '%s',)
I don't have strong opinions about this, but my intuition is that these should be a set or frozenset rather than a tuple. Are these tuples for performance reasons? (I am not sure exactly when a set is faster than a tuple or list for membership lookups.)
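On the performance question: `in` on a frozenset is an average O(1) hash lookup, while `in` on a tuple is a linear scan. With about a dozen short strings the difference is unlikely to matter next to the cost of parsing itself, so readability (a frozenset says "unordered, immutable, used for membership") may be the stronger argument. A rough way to measure it locally; timings vary by machine, so no numbers are shown:

```python
import timeit

# The PR's date_specs contents, once as a tuple and once as a frozenset.
DATE_SPECS_TUPLE = ('%a', '%A', '%b', '%B', '%c', '%d', '%j', '%m', '%U',
                    '%w', '%W', '%x', '%y', '%Y')
DATE_SPECS_SET = frozenset(DATE_SPECS_TUPLE)

# Worst case for the tuple: the probe is absent, so every element is compared.
probe = '%H'
t_tuple = timeit.timeit(lambda: probe in DATE_SPECS_TUPLE, number=100_000)
t_set = timeit.timeit(lambda: probe in DATE_SPECS_SET, number=100_000)
print(f"tuple: {t_tuple:.4f}s  frozenset: {t_set:.4f}s")
```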
Lib/_strptime.py
def _strptime_datetime_date(data_string, format):
    """Return a date based on the input string and the format string."""
    if not format:
        raise ValueError("Date format is not valid.")
This line and its equivalent in _time are not being hit. If I understand correctly, this branch is only hit if format is empty?
Right, that check is useless. Thanks.
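For context on that branch: the underlying parser already handles an empty format. An empty string matches an empty format and produces the 1900-01-01 defaults, and any non-empty input is rejected because unconverted data remains. So the explicit `if not format` check mostly duplicates existing behaviour. The one visible difference, assuming date.strptime delegates to the same parser, is that `date.strptime("", "")` would quietly return 1900-01-01 without it.

```python
from datetime import datetime

# Empty input with an empty format: nothing to parse, so the defaults apply.
print(datetime.strptime("", ""))  # 1900-01-01 00:00:00

# Non-empty input with an empty format: the leftover text is reported as an error.
try:
    datetime.strptime("12:30:15", "")
except ValueError as exc:
    print(exc)
```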
tests = [('2004-12-01 13:02:47.197',
          '%Y-%m-%d %H:%M:%S.%f'),
         ('2004-12-01', '%Y-%m-%d'),]
for date_string, date_format in tests:
I think it would be good to use with self.subTest for these parametrized tests.
Also, per the other comment, I guess you need to add something like ('12:30:15', '') to get full coverage.
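A minimal sketch of the subTest pattern. It is written against datetime.strptime so that it runs on any Python 3; in the PR, the parser under test would be the new time.strptime instead, and the mixed date-and-time case ('1900-01-01 12:30', '%Y-%m-%d %H:%M') would belong in the invalid list only there. Each case is reported separately on failure, and the loop continues past a failing case. The class name is hypothetical.

```python
import unittest
from datetime import datetime, time


class StrptimeTimeCases(unittest.TestCase):
    # (input string, format, expected time of day)
    valid = [
        ('2004-12-01 13:02:47.197', '%Y-%m-%d %H:%M:%S.%f', time(13, 2, 47, 197000)),
        ('12:30:15', '%H:%M:%S', time(12, 30, 15)),
    ]
    # (input string, format) pairs that must be rejected.
    invalid = [
        ('12:30:15', ''),        # empty format leaves unconverted data
        ('12:30', '%H:%M:%S'),   # input too short for the format
    ]

    def test_valid(self):
        for data, fmt, expected in self.valid:
            with self.subTest(data=data, fmt=fmt):
                self.assertEqual(datetime.strptime(data, fmt).time(), expected)

    def test_invalid(self):
        for data, fmt in self.invalid:
            with self.subTest(data=data, fmt=fmt):
                with self.assertRaises(ValueError):
                    datetime.strptime(data, fmt)
```

Saved in a test module, this runs with `python -m unittest`.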
I think you need two more test cases:
('1900-01-01 12:30', '%Y-%m-%d %H:%M'),
('12:30:15', ''),
date.strptime has similarly missing tests.
The test with ('1900-01-01 12:30', '%Y-%m-%d %H:%M') does not raise an exception but returns datetime.time(12, 30).
For the other test, yep, there is an issue.
The fact that it doesn't raise an exception is an issue. I'm a bit surprised that it doesn't raise an exception in the pure Python implementation; that's a bug, because I'm pretty sure that:
datetime.time.strptime("1901-01-01 12:30", "%Y-%m-%d %H:%M")
does raise an exception.
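The pure Python path quoted earlier (`_strptime_datetime(...)` followed by `.time()`) explains the silent success: the date fields are parsed and then thrown away. The same effect, reproduced with the public API:

```python
from datetime import datetime, time

# Parsing a full timestamp succeeds, and .time() simply drops the date part.
parsed = datetime.strptime("1901-01-01 12:30", "%Y-%m-%d %H:%M").time()
print(parsed)  # 12:30:00

# Nothing in the returned time records that a date was present in the input,
# so the rejection has to happen before or during parsing (for example by
# inspecting the format string), not by looking at the result.
```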
Modules/_datetimemodule.c
 */
if (emptyDatetime == NULL) {
    PyObject *emptyStringPair = Py_BuildValue("ss", "", "");
    if (emptyStringPair == NULL)
If I'm interpreting PEP 7 correctly, these if statements need curly braces. The relevant section is Code layout.
Hi @pganssle, thank you for your review. I am going to fix it ASAP, but I am not the author of the code, just the author of the PR, so I may need your help. Thanks!
@pganssle I just rebased my branch on master and am going to work on this PR. Do you want to help? You are Mister dateutil, after all ;-)
@matrixise Sorry, this is on my list, but I probably can't get to it until the end of the month. 😟
@pganssle OK, in that case I will try to fix all the issues on my own ;-) but I am not worried ;-)
Hi, I just updated this PR with master.
@pganssle When you have time, could you review this PR? We started it together; just comment wherever you find a mistake. Thanks!
ping @pganssle ;-)
I haven't had a chance to look at the PR thoroughly, but I have a first-pass review of things that should be changed.
I also haven't checked exactly how you are doing it, but I believe we may want to (and be able to) refactor the tests a bit to take advantage of the existing test suite for datetime.strptime, by separating out the date-only and time-only formats and reusing the tests with date, time and datetime.
Additionally, it's unfortunate that, if this is merged, we'll have both time.strptime and datetime.time.strptime: the first returns a timetuple (which is actually more like a datetime), and the second returns a datetime.time object. I don't see any way around this, but it will add confusion. :(
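To make the naming clash concrete: the existing time.strptime returns a time.struct_time, a full calendar record whose date fields fall back to 1900-01-01, whereas a datetime.time carries only the time of day. The snippet below uses datetime.strptime(...).time() as a stand-in for the proposed datetime.time.strptime.

```python
import time
import datetime as dt

# time.strptime (time module): a struct_time with defaulted date fields.
st = time.strptime("12:30", "%H:%M")
print(st.tm_year, st.tm_mon, st.tm_mday, st.tm_hour, st.tm_min)  # 1900 1 1 12 30

# A datetime.time has no date fields at all.
t = dt.datetime.strptime("12:30", "%H:%M").time()
print(repr(t))  # datetime.time(12, 30)
```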
Doc/library/datetime.rst
Return a :class:`date` corresponding to *date_string*, parsed according to
*format*. :exc:`ValueError` is raised if the date string and format can't be
parsed by `time.strptime`, or if it returns a value where the time part is
nonzero.
Here and below, you have nonzero instead of non-zero. If we keep this wording, the hyphen needs to be added.
However, this PR does not parse to a timetuple (or similar) and check whether certain components are zero; it checks whether the format string contains time components, and fails in that case. (Edit: I was looking at just the pure Python implementation. I now realize that parsing and checking for zero components is precisely what the C implementation does, but I think it's the wrong thing to do anyway.) The way the docs are currently worded, you would expect this to work:
from datetime import date
date.strptime("2018-01-01 00:00", "%Y-%m-%d %H:%M")
But it will fail (rightly so, I think).
I believe you can change the last part of the last sentence as such:
- parsed by `time.strptime`, or if it returns a value where the time part is
- nonzero.
+ parsed by `time.strptime`, or if time components are present in the format string.
Also, I think it needs to be :meth:`time.strptime`.
One other note on the documentation: the datetime.strptime documentation links to :ref:`strftime-strptime-behavior`. I think these should too.
Modules/_datetimemodule.c
{
    return new_date(GET_YEAR(self),
                    GET_MONTH(self),
                    GET_DAY(self));
}

static PyObject *
datetime_gettime(PyDateTime_DateTime *self, PyObject *Py_UNUSED(ignored))
I don't know why this PyObject *Py_UNUSED(ignored) is here, so per Chesterton's Fence, I'm not comfortable removing it. Any insight into why it is here and what the consequences of removing it would be?
Possibly it would be a breaking change in the C ABI?
According to git blame, this is to silence a warning in gcc 8. Is this still relevant, @siddhesh @serhiy-storchaka?
Edit: Looking closer, the merged PR is from April, so I'm guessing it is, but now I'm wondering whether this will break the C ABI the other way.
Yes, datetime_gettime should have two arguments; the second is ignored.
    return NULL;
}

if (DATE_GET_HOUR(datetime) ||
Hm, if I understand correctly, this is inconsistent with the pure Python version. The pure Python version checks whether there are any time components in the format string, whereas the C version parses to a datetime and then checks whether any time components are set.
I think checking the format string is the better way to do this. As I mention in another comment, this approach implies that date.strptime("2018-01-01 00:00", "%Y-%m-%d %H:%M") would succeed, which is not the right thing to do. I will comment on the test suite to add a test for this.
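A Python sketch of the two strategies being compared. `date_by_value`, `date_by_format` and `TIME_DIRECTIVES` are hypothetical names for illustration, not code from the PR, and the directive set is not exhaustive. Checking the parsed value cannot tell "no time given" from "midnight given"; checking the format string can.

```python
import re
from datetime import datetime

# Hypothetical set of time-of-day directives (illustrative, not exhaustive).
TIME_DIRECTIVES = frozenset({"%H", "%I", "%p", "%M", "%S", "%f", "%X"})


def date_by_value(data, fmt):
    """Parse, then reject only if the resulting time of day is non-zero."""
    parsed = datetime.strptime(data, fmt)
    if parsed.time() != datetime.min.time():
        raise ValueError("time components present in parsed value")
    return parsed.date()


def date_by_format(data, fmt):
    """Reject any format that contains a time directive, then parse."""
    # re.findall(r"%.", ...) consumes "%%" as one token, so "%%H" is not a match.
    found = TIME_DIRECTIVES.intersection(re.findall(r"%.", fmt))
    if found:
        raise ValueError(f"{sorted(found)} not valid in date format specification")
    return datetime.strptime(data, fmt).date()


# Midnight slips through the value check but not the format check.
print(date_by_value("2018-01-01 00:00", "%Y-%m-%d %H:%M"))  # 2018-01-01
try:
    date_by_format("2018-01-01 00:00", "%Y-%m-%d %H:%M")
except ValueError as exc:
    print(exc)
```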
OK, I will check the Python implementation. But in that case, should we port the Python implementation to the C layer?
Yeah, port the Python implementation to the C layer.
lol ;-) I am not an expert with the C API, but I can try; it's a good exercise for my understanding of Python's C API.
Yeah, you have to reimplement _check_invalid_datetime_specs() in C.
Why does it have to be implemented in C? There are already calls out to the _strptime Python-language module here.
Not sure, but since there is another check in C for the date, maybe we could put the verification in a single function. @vstinner, can you confirm?
tests = [
    ('2004-12-01 13:02:47.197', '%Y-%m-%d %H:%M:%S.%f'),
    ('01', '%M'),
    ('02', '%H'),
This needs at least two more test cases:
('2018-01-01 00:00', '%Y-%m-%d %H:%M'),
('2018-01-01', ''),
Both should fail.
@pganssle Thanks for your review, I am going to update this PR ASAP.
OK, rebased on the latest master.
OK, I will check and try to fix that, but since I don't know this part and the main patch was not mine, it will be difficult. But OK for the fix, and thanks for the review.
@pganssle Are you ready for a new review of this PR? I will keep working on it ;-)
I suggest you check the discussion about the previous patches. I noticed I repeated some old comments and ideas.
Doc/library/datetime.rst
Return a :class:`date` corresponding to *date_string*, parsed according to
*format*. :exc:`ValueError` is raised if the date string and format can't be
parsed by :meth:`time.strptime`, or if time components are present in the
format string. For a complete list of formatting directives, see
Perhaps clarify that “time.strptime” here does not refer to the strptime function in the time module. The reader may not see the :meth: RST markup, HTML links, etc.
Doc/library/datetime.rst
Return a :class:`time` corresponding to *date_string, parsed according to
*format*. :exc:`ValueError` is raised if the date string and format can't be
parsed by :meth:`time.strptime`, if it returns a value which isn't a time
tuple, or if the date part is nonzero. For a complete list of formatting
“By time.strptime” is redundant, unless you mean to refer to the strptime function in the time module.
What does “the date part is nonzero” mean? I would expect the time class to work without specifying a date, zero or otherwise.
Doc/library/datetime.rst
is substituted for the year, and ``1`` for the month and day.
The :meth:`date.strptime` class method creates a :class:`date` object from a
string representing a date and a corresponding format string. :exc:`ValueError`
raised if the format codes for hours, minutes, seconds, and microseconds are used.
ValueError is raised . . .
And “and microseconds” should be “or microseconds”, unless you must combine all four codes to get the error.
But wouldn’t these details be better placed directly under the date and time classes, not in this common section about format codes in general?
I have fixed the grammar issues, but I'm not sure where the details should go.
Doc/library/datetime.rst
@@ -2023,13 +2046,13 @@ equivalent to ``datetime(*(time.strptime(date_string, format)[0:6]))``, except
when the format includes sub-second components or timezone offset information,
which are supported in ``datetime.strptime`` but are discarded by ``time.strptime``.
Need to clarify time.strptime refers to the time module, not your new strptime method.
👍
Doc/library/datetime.rst
.. classmethod:: time.strptime(date_string, format)

Return a :class:`time` corresponding to *date_string, parsed according to
Why `time` here, but `.time` below (with a dot)?
👍 And there was another error with *date_string* (the closing asterisk was missing).
datetime
--------

Added :func:`~datetime.date.strptime` and :func:`~datetime.time.strptime`.
I expect you lose the reference to date and time, so all you are saying is you added two functions with the same name repeated.
@vadmium Yep, here we just add the class methods datetime.date.strptime and datetime.time.strptime.
Modules/_datetimemodule.c
    return NULL;
}

assert(PyTuple_CheckExact(specs) || PyList_CheckExact(specs));
Why would it be a list? This seems like unused code.
The first version only accepted tuples. Later, I wanted to accept lists for my tests, but I am not against accepting only tuples.
    """Return a date based on the input string and the format string."""
    msg = "'{!s}' {} not valid in date format specification."
    from _datetime import _check_invalid_datetime_specs
    if _check_invalid_datetime_specs(format, time_specs, msg):
Why not move the msg string into the check function, and just pass the bit that varies ('date' or 'time') as an argument? It would make the code easier to read.
Looks like the check function either raises an exception or returns True. It would be clearer to not use an if statement here.
Originally, this PR was a patch submitted by other contributors; I just wanted to convert it into a PR. Now I am trying to fix all the issues.
@matrixise, @pganssle There's been a lot of work done on this one. Is it something we should try to move forward on? Thanks!
This PR is stale because it has been open for 30 days with no activity.
@matrixise, @pganssle, @csabella There has been a lot of work on this, so where are we at? The PR is a bit over two years old, but these new features would be great!
No activity since 2019. Someone has to step in and restart the work (e.g. create a new PR and update it).
I'm going to try to resolve the merge conflicts and make the requested changes.
Looks like I'll have to start over.
I'm closing this inactive PR.
Add datetime.date.strptime and datetime.time.strptime.
Fix the documentation of _strptime._strptime, which was wrong: it returns
a 3-tuple, not a 2-tuple.
Co-authored-by: Alexander Belopolsky <alexander.belopolsky@gmail.com>
Co-authored-by: Amaury Forgeot d'Arc <amauryfa@gmail.com>
Co-authored-by: Berker Peksag <berker.peksag@gmail.com>
Co-authored-by: Josh-sf <josh-sf@users.sourceforge.net>
Co-authored-by: Juarez Bochi <jbochi@gmail.com>
Co-authored-by: Maciej Szulik <soltysh@gmail.com>
Co-authored-by: Stéphane Wirtel <stephane@wirtel.be>
Co-authored-by: Matheus Vieira Portela <matheus.v.portela@gmail.com>
https://bugs.python.org/issue1100942