WIP: bpo-1100942: Add datetime.time.strptime and datetime.date.strptime #5578

Closed
wants to merge 16 commits into from

matrixise
Member

@matrixise matrixise commented Feb 7, 2018

Add datetime.date.strptime and datetime.time.strptime.

Fix the documentation of _strptime._strptime, which was wrong: the
function returns a 3-tuple, not a 2-tuple.

Co-authored-by: Alexander Belopolsky alexander.belopolsky@gmail.com
Co-authored-by: Amaury Forgeot d'Arc amauryfa@gmail.com
Co-authored-by: Berker Peksag berker.peksag@gmail.com
Co-authored-by: Josh-sf josh-sf@users.sourceforge.net
Co-authored-by: Juarez Bochi jbochi@gmail.com
Co-authored-by: Maciej Szulik soltysh@gmail.com
Co-authored-by: Stéphane Wirtel stephane@wirtel.be
Co-authored-by: Matheus Vieira Portela matheus.v.portela@gmail.com

https://bugs.python.org/issue1100942

@matrixise
Member Author

This PR addresses a very old issue, open since 2005. We finally have a PR in 2018 👍

@Mariatta
Member

Mariatta commented Feb 9, 2018

I restarted the travis job. It still did not do the full CPython test suite. So please rebase :)

@matrixise
Member Author

Thanks, I didn't see your message. I will work on this issue today.

@matrixise
Member Author

@Mariatta Rebased, and the tests now pass on the CIs.

Lib/_strptime.py Outdated
the number of microseconds based on the input string and the
format string."""
format string, and the GMT offset."""
Member

@pganssle pganssle May 15, 2018

We should probably consistently use "UTC offset", though I suppose it doesn't matter much since it's not a public-facing docstring.

Member Author

@pganssle I am not very confident with offsets and datetime. Do you think we could keep it like that and open another bpo issue?

Member

I just meant use UTC instead of GMT everywhere.

Member

Yeah, just replace "GMT" with "UTC". datetime.tzinfo has a utcoffset() method, so "UTC" is preferred in datetime.

(And there are some subtle differences between GMT and UTC that I forgot.)
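
To illustrate the naming point with the public API only: parsing with `%z` produces an aware datetime whose offset is read through `utcoffset()`. This is a small sketch; the sample timestamp is made up.

```python
from datetime import datetime, timedelta

# Parse a timestamp that carries a UTC offset via the %z directive.
parsed = datetime.strptime("2018-05-15 12:00 +0100", "%Y-%m-%d %H:%M %z")

# The offset is exposed through utcoffset(), which is why "UTC offset"
# is the consistent term inside the datetime module.
offset = parsed.utcoffset()
print(offset)
```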

Lib/_strptime.py Outdated
@@ -565,6 +566,10 @@ def _strptime(data_string, format="%a %b %d %H:%M:%S %Y"):
hour, minute, second,
weekday, julian, tz, tzname, gmtoff), fraction, gmtoff_fraction

date_specs = ('%a', '%A', '%b', '%B', '%c', '%d', '%j', '%m', '%U',
Member

I think this is missing %G, %u and %V, the ISO 8601 week calendar directives.
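
For context, a short sketch of the three ISO 8601 directives as `datetime.strptime` handles them (supported there since Python 3.6). The input string is an arbitrary example, not taken from the PR's tests.

```python
from datetime import datetime

# %G = ISO 8601 year, %V = ISO week number, %u = ISO weekday (1 = Monday).
parsed = datetime.strptime("2018 01 1", "%G %V %u")

# Round-trip through isocalendar() to confirm the three fields.
iso_year, iso_week, iso_weekday = parsed.isocalendar()
print(parsed.date(), iso_year, iso_week, iso_weekday)
```

All three describe a calendar date, so they would belong with the date directives rather than the time ones.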

Lib/_strptime.py Outdated
@@ -565,6 +566,10 @@ def _strptime(data_string, format="%a %b %d %H:%M:%S %Y"):
hour, minute, second,
weekday, julian, tz, tzname, gmtoff), fraction, gmtoff_fraction

date_specs = ('%a', '%A', '%b', '%B', '%c', '%d', '%j', '%m', '%U',
'%w', '%W', '%x', '%y', '%Y',)
time_specs = ('%T', '%R', '%H', '%I', '%M', '%S', '%f', '%i', '%s',)
Member

I don't see any references to %T, %R, %i or %s in the docs or elsewhere in the code. What do these represent?

Lib/_strptime.py Outdated
_time = _strptime_datetime(datetime_datetime, data_string, format)
return _time.time()

def _check_invalid_datetime_specs(fmt, specs, msg):
Member

If I'm understanding this correctly, this function seems to have the inverted sense of what I would expect. It seems that it checks if fmt is a valid spec.

From the names of the variables and functions, I was thinking that this would be a whitelist not a blacklist. Does it make sense to switch to a whitelist approach? If not, can we maybe change specs to be blacklist_specs or something?
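
A minimal sketch of what the whitelist variant could look like. The function name `check_only_allowed` and the contents of `ALLOWED_DATE_DIRECTIVES` are hypothetical, chosen for illustration; they are not the PR's code.

```python
import re

# Hypothetical allowlist of date-only directives (illustrative, not the PR's list).
ALLOWED_DATE_DIRECTIVES = frozenset(
    {"%Y", "%y", "%m", "%d", "%j", "%a", "%A", "%b", "%B"}
)


def check_only_allowed(fmt, allowed):
    """Raise ValueError if fmt uses a directive outside the allowlist."""
    # The regex consumes "%%" as one match, so an escaped percent sign
    # followed by a letter is not mistaken for a directive.
    for char in re.findall(r"%(.)", fmt, flags=re.DOTALL):
        directive = "%" + char
        if directive != "%%" and directive not in allowed:
            raise ValueError(f"{directive!r} is not allowed in this format")


check_only_allowed("%Y-%m-%d", ALLOWED_DATE_DIRECTIVES)  # passes silently
```

With this shape, a directive added to `_strptime` later is rejected by default until someone lists it explicitly, which is the main argument for the whitelist sense.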

Lib/_strptime.py Outdated
@@ -565,6 +566,10 @@ def _strptime(data_string, format="%a %b %d %H:%M:%S %Y"):
hour, minute, second,
weekday, julian, tz, tzname, gmtoff), fraction, gmtoff_fraction

date_specs = ('%a', '%A', '%b', '%B', '%c', '%d', '%j', '%m', '%U',
'%w', '%W', '%x', '%y', '%Y',)
time_specs = ('%T', '%R', '%H', '%I', '%M', '%S', '%f', '%i', '%s',)
Member

I don't have strong opinions about this, but my intuition is that these should be set or frozenset rather than tuple. Are these tuples for performance reasons? (I am not sure exactly when it's faster to use a set rather than a tuple or list for membership lookups.)
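
A small sketch of the two options side by side, using an illustrative subset of the directives from the diff. For a collection this small the speed difference is probably negligible; the frozenset mainly signals intent and enables set operations.

```python
# Illustrative subset of the time directives shown in the diff.
time_specs_tuple = ("%H", "%I", "%M", "%S", "%f")
time_specs_set = frozenset(time_specs_tuple)

# Membership gives the same answer either way; a tuple is scanned linearly,
# while a frozenset uses a hash lookup.
assert ("%S" in time_specs_tuple) == ("%S" in time_specs_set)

# A set also turns "which directives are offending?" into one operation.
used = {"%Y", "%m", "%d", "%H"}
offending = used & time_specs_set
print(sorted(offending))
```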

Lib/_strptime.py Outdated
def _strptime_datetime_date(data_string, format):
"""Return a date based on the input string and the format string."""
if not format:
raise ValueError("Date format is not valid.")
Member

This line and its equivalent in _time are not being hit. If I understand correctly this branch is only hit if format is empty?

Member Author

useless, thanks

tests = [('2004-12-01 13:02:47.197',
'%Y-%m-%d %H:%M:%S.%f'),
('2004-12-01', '%Y-%m-%d'),]
for date_string, date_format in tests:
Member

I think it would be good to use with self.subTest for these parametrized tests.

Also, per the other comment, I guess you need to add something like ('12:30:15', '') to get full coverage.
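
A sketch of the `subTest` pattern for a parametrized loop like the one above. It uses the existing `datetime.strptime` as a stand-in so it runs whether or not the proposed methods are available; the class and method names are made up.

```python
import unittest
from datetime import datetime


class StrptimeTests(unittest.TestCase):
    def test_strptime_cases(self):
        tests = [
            ("2004-12-01 13:02:47.197", "%Y-%m-%d %H:%M:%S.%f"),
            ("2004-12-01", "%Y-%m-%d"),
        ]
        for date_string, date_format in tests:
            # Each pair is reported separately if it fails, and the loop
            # keeps going instead of stopping at the first failure.
            with self.subTest(date_string=date_string, date_format=date_format):
                parsed = datetime.strptime(date_string, date_format)
                self.assertEqual(parsed.year, 2004)
```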

Member

I think you need two more test cases:

    ('1900-01-01 12:30', '%Y-%m-%d %H:%M'),
    ('12:30:15', ''),

date.strptime has similarly missing tests.

Member Author

The test with ('1900-01-01 12:30', '%Y-%m-%d %H:%M') does not raise an exception; it returns datetime.time(12, 30).

For the other test, yep, there is an issue.

Member

The fact that it doesn't raise an exception is an issue. I'm a bit surprised that it doesn't raise an exception on pure Python, that's a bug, because I'm pretty sure that:

datetime.time.strptime("1901-01-01 12:30", "%Y-%m-%d %H:%M") does raise an exception.
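
The behaviour described in this thread can be reproduced with the existing API. This sketch assumes an implementation that simply parses a full datetime and calls `.time()` on it, as the pure Python diff above appears to do.

```python
from datetime import datetime, time

# Delegating to datetime.strptime and calling .time() discards the date part,
# so a format containing date directives is accepted without complaint.
parsed = datetime.strptime("1900-01-01 12:30", "%Y-%m-%d %H:%M").time()
print(parsed)
```

Nothing in that path inspects the format string, which is why no exception is raised unless a separate check rejects the date directives first.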

*/
if (emptyDatetime == NULL) {
PyObject *emptyStringPair = Py_BuildValue("ss", "", "");
if (emptyStringPair == NULL)
Member

If I'm interpreting PEP 7 correctly, these if statements need curly braces. The relevant section is Code layout.

@matrixise
Member Author

Hi @pganssle

Thank you for your review. I am going to fix it ASAP, but I am not the author of the code, just the author of the PR, so I may need your help. Thanks.

@matrixise
Member Author

@pganssle I just rebased my branch on master and am going to work on this PR. Would you like to help me, since you are Mr. dateutil? ;-)

@pganssle
Member

pganssle commented Oct 5, 2018

@matrixise Sorry this is on my list but probably can't get to it until the end of the month. 😟

@matrixise
Member Author

@pganssle ok, in this case, I will try to fix all the issues alone ;-) but I am not worried ;-)

@matrixise
Member Author

Hi, I just updated this PR with master.

@matrixise
Member Author

@pganssle When you have time, could you review this PR? We started on it together, so just comment when you find a mistake. Thanks.

@matrixise
Member Author

ping @pganssle ;-)

Member

@pganssle pganssle left a comment

I haven't had a chance to look at the PR thoroughly, but I have a first-pass review of things that should be changed.

I also haven't checked exactly how you are doing it, but I believe we may want to / be able to refactor the tests a bit to take advantage of the existing test suite for datetime.strptime by separating out the date-only and time-only formats and reusing the tests with date, time and datetime.

Additionally, I should note that it's unfortunate that if this is merged, we'll have time.strptime and datetime.time.strptime, the first of which returns a timetuple (which is actually more like a datetime), and the second returning a datetime.time object. I don't see any way around this, but it will add confusion. :(
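
To make the naming clash concrete, here is a sketch using only existing APIs: the `time` module's `strptime` returns a `struct_time` that includes date fields, while a `datetime.time` object carries only the time of day. The second value is built via `datetime.strptime(...).time()` as a stand-in for the proposed method.

```python
import time
from datetime import datetime

# time.strptime (the time *module*) returns a struct_time with date fields too.
as_struct = time.strptime("2018-11-04 12:30", "%Y-%m-%d %H:%M")

# A datetime.time object only carries the time-of-day fields.
as_time = datetime.strptime("12:30", "%H:%M").time()

print(type(as_struct).__name__, as_struct.tm_year, as_struct.tm_hour)
print(type(as_time).__name__, as_time.hour, as_time.minute)
```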

Return a :class:`date` corresponding to *date_string*, parsed according to
*format*. :exc:`ValueError` is raised if the date string and format can't be
parsed by `time.strptime`, or if it returns a value where the time part is
nonzero.
Member

@pganssle pganssle Nov 4, 2018

Here and below, you have nonzero instead of non-zero. If we keep this wording, the hyphen needs to be added.

However, this PR does not parse to a timetuple or something and check if certain components were zero, it checks to see if the format string contains time components, and fails in that case (Edit: I was looking at just the pure python implementation - I now realize that this is precisely what the C implementation is doing, but I think it's the wrong thing to do anyway). The way the docs are currently worded, you would expect this to work:

from datetime import date
date.strptime("2018-01-01 00:00", "%Y-%m-%d %H:%M")

But it will fail (rightly so, I think).

I believe you can change the last part of the last sentence as such:

-     parsed by `time.strptime`, or if it returns a value where the time part is
-     nonzero.
+     parsed by `time.strptime`, or if time components are present in the format string.

Also, I think it needs to be

:meth:`time.strptime`

Member

One other note on the documentation, the datetime.strptime documentation links to

:ref:`strftime-strptime-behavior`.

I think these should too.

{
return new_date(GET_YEAR(self),
GET_MONTH(self),
GET_DAY(self));
}

static PyObject *
datetime_gettime(PyDateTime_DateTime *self, PyObject *Py_UNUSED(ignored))
Member

I don't know why this PyObject *Py_UNUSED(ignored) is here, so per Chesterton's Fence, I'm not comfortable removing it. Any insight as to why it is here and what the consequences will be in removing it?

Possibly it will be a breaking change in the C ABI?

Member

@pganssle pganssle Nov 4, 2018

According to git blame, this is to silence a warning in gcc8. Is this still relevant @siddhesh @serhiy-storchaka?

Edit: Looking closer, the merged PR is from April, so I'm guessing it is, but now I'm wondering if this will break the C ABI the other way.

Member

Yes, datetime_gettime should have two arguments, the second is ignored.

return NULL;
}

if (DATE_GET_HOUR(datetime) ||
Member

Hm, if I understand correctly, this is inconsistent with the pure Python version. The pure Python version checks if there are any time components in the format string, whereas the C version parses to a datetime and then checks to see if there are any time components.

I think checking the format string is the better way to do this, as I mention in another comment, this approach seems to indicate that date.strptime("2018-01-01 00:00", "%Y-%m-%d %H:%M") would succeed, which is not the right thing to do. I will comment on the test suite to add a test for this.
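
The difference between the two strategies can be sketched in pure Python. Both helper names and the `TIME_DIRECTIVES` set are hypothetical and only approximate the C and Python code paths discussed here.

```python
import re
from datetime import datetime

# Illustrative subset of time directives; not the PR's actual list.
TIME_DIRECTIVES = frozenset({"%H", "%I", "%M", "%S", "%f", "%p"})


def date_by_value_check(date_string, fmt):
    """Parse first, then reject only if the parsed time part is non-zero."""
    parsed = datetime.strptime(date_string, fmt)
    if (parsed.hour, parsed.minute, parsed.second, parsed.microsecond) != (0, 0, 0, 0):
        raise ValueError("parsed value has a non-zero time part")
    return parsed.date()


def date_by_format_check(date_string, fmt):
    """Reject any format that mentions a time directive, then parse."""
    used = {"%" + char for char in re.findall(r"%(.)", fmt, flags=re.DOTALL)}
    if used & TIME_DIRECTIVES:
        raise ValueError("time directives are not valid in a date format")
    return datetime.strptime(date_string, fmt).date()


# The value-based check lets a midnight timestamp through.
print(date_by_value_check("2018-01-01 00:00", "%Y-%m-%d %H:%M"))
```

The format-based version raises `ValueError` for the same input, so the outcome no longer depends on whether the time in the string happens to be midnight.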

Member Author

OK, I will check the Python implementation. In that case, could we migrate the Python implementation to the C layer?

Member

Yeah, port the Python implementation to the C layer.

Member Author

lol ;-) I am not an expert with the C-API, but I could try, it's a good exercise for my comprehension of the C-API of Python

Member

Yeah, you have to reimplement _check_invalid_datetime_specs() in C.

Member

Why does it have to be implemented in C? There are already calls out to the _strptime Python-language module here.

Member Author

Not sure, but there is another check in C for the date, so maybe we could put the verification in a single function. @vstinner, can you confirm?

tests = [
('2004-12-01 13:02:47.197', '%Y-%m-%d %H:%M:%S.%f'),
('01', '%M'),
('02', '%H'),
Member

This needs at least two more test cases:

    ('2018-01-01 00:00', '%Y-%m-%d %H:%M'),
    ('2018-01-01', ''),

Both should fail.

@matrixise
Member Author

@pganssle thanks for your review, I am going to update this PR asap.

@matrixise
Member Author

ok, rebased with the last master.

@matrixise
Member Author

matrixise commented Nov 11, 2018 via email

@matrixise
Member Author

@pganssle Are you ready for a new review of this PR? I will continue my PR ;-)

Member

@vadmium vadmium left a comment

I suggest you check the discussion about the previous patches. I noticed I repeated some old comments and ideas.

Return a :class:`date` corresponding to *date_string*, parsed according to
*format*. :exc:`ValueError` is raised if the date string and format can't be
parsed by :meth:`time.strptime`, or if time components are present in the
format string. For a complete list of formatting directives, see
Member

Perhaps clarify “time.strptime” does not refer to the strptime function in the time module. The reader may not see the meth RST code, HTML links, etc.

Return a :class:`time` corresponding to *date_string, parsed according to
*format*. :exc:`ValueError` is raised if the date string and format can't be
parsed by :meth:`time.strptime`, if it returns a value which isn't a time
tuple, or if the date part is nonzero. For a complete list of formatting
Member

“By time.strptime” is redundant, unless you mean to refer to the strptime function in the time module.

What does “the date part is nonzero” mean? I would expect the time class to work without specifying a date, zero or otherwise.

is substituted for the year, and ``1`` for the month and day.
The :meth:`date.strptime` class method creates a :class:`date` object from a
string representing a date and a corresponding format string. :exc:`ValueError`
raised if the format codes for hours, minutes, seconds, and microseconds are used.
Member

ValueError is raised . . .

And microseconds should be or microseconds, unless you must combine all four codes to get the error.

But wouldn’t these details be better placed directly under the date and time classes, not in this common section about format codes in general?

Member Author

I have fixed the grammar issues, but for the details I don't know.

@@ -2023,13 +2046,13 @@ equivalent to ``datetime(*(time.strptime(date_string, format)[0:6]))``, except
when the format includes sub-second components or timezone offset information,
which are supported in ``datetime.strptime`` but are discarded by ``time.strptime``.
Member

Need to clarify time.strptime refers to the time module, not your new strptime method.
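
A short sketch of the equivalence the quoted documentation describes, and of where it breaks down. Here `time.strptime` means the function in the `time` module; the sample string is arbitrary.

```python
import time
from datetime import datetime

date_string = "2004-12-01 13:02:47.197"
fmt = "%Y-%m-%d %H:%M:%S.%f"

# The time module's strptime accepts %f, but struct_time has no
# sub-second field, so the fraction is dropped.
via_time_module = datetime(*(time.strptime(date_string, fmt)[0:6]))

# datetime.strptime keeps the microseconds.
via_datetime = datetime.strptime(date_string, fmt)

print(via_time_module.microsecond, via_datetime.microsecond)
```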

Member Author

👍


.. classmethod:: time.strptime(date_string, format)

Return a :class:`time` corresponding to *date_string, parsed according to
Member

Why `time` here, but `.time` below (with a dot)?

Member Author

👍 and there was another error with *date_string*

datetime
--------

Added :func:`~datetime.date.strptime` and :func:`~datetime.time.strptime`.
Member

I expect you lose the reference to date and time, so all you are saying is you added two functions with the same name repeated.

Member Author

@vadmium yep, here, we just add the class methods datetime.date.strptime and datetime.time.strptime

return NULL;
}

assert(PyTuple_CheckExact(specs) || PyList_CheckExact(specs));
Member

Why would it be a list? This seems like unused code.

Member Author

The first version accepted only tuples. Later, I wanted to accept lists for my tests, but I am not against accepting only tuples.

"""Return a date based on the input string and the format string."""
msg = "'{!s}' {} not valid in date format specification."
from _datetime import _check_invalid_datetime_specs
if _check_invalid_datetime_specs(format, time_specs, msg):
Member

Why not move the msg string into the check function, and just pass the bit that varies ('date' or 'time') as an argument? It would make the code easier to read.

Looks like the check function either raises an exception or returns True. It would be clearer to not use an if statement here.
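
One possible shape for that refactoring, as a sketch: the message lives inside the check, the caller passes only 'date' or 'time', and the function returns nothing. The name `check_format` and the `_FORBIDDEN` table are hypothetical, and the directive sets are illustrative.

```python
import re

# Illustrative directive sets; the names and contents are assumptions.
_FORBIDDEN = {
    "date": frozenset({"%H", "%I", "%M", "%S", "%f", "%p"}),
    "time": frozenset({"%Y", "%y", "%m", "%d", "%j", "%a", "%A", "%b", "%B"}),
}


def check_format(fmt, kind):
    """Raise ValueError if fmt has a directive that is invalid for kind.

    kind is "date" or "time"; the function returns None on success.
    """
    for char in re.findall(r"%(.)", fmt, flags=re.DOTALL):
        directive = "%" + char
        if directive in _FORBIDDEN[kind]:
            raise ValueError(
                f"{directive!r} is not valid in {kind} format specification."
            )


# Call sites need no "if": the call either returns or raises.
check_format("%Y-%m-%d", "date")
check_format("%H:%M", "time")
```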

Member Author

Originally, this PR was a patch submitted by other contributors; I just wanted to convert it to a PR. Now I am trying to fix all the issues.

Member Author

For this point, though, the first step was converting _check_invalid_datetime_specs to C; after that, I will start improving the code with the other recommendations (from @vstinner and @pganssle).

@matrixise matrixise changed the title bpo-1100942: Add datetime.time.strptime and datetime.date.strptime WIP: bpo-1100942: Add datetime.time.strptime and datetime.date.strptime Mar 24, 2019
@csabella
Contributor

@matrixise, @pganssle There's been a lot of work done on this one. Is it something we should try to move forward on? Thanks!

@github-actions

This PR is stale because it has been open for 30 days with no activity.

@github-actions github-actions bot added the stale Stale PR or inactive for long period of time. label Aug 15, 2022
@abalkin abalkin marked this pull request as draft February 8, 2023 14:59
@abalkin abalkin self-assigned this Feb 8, 2023
@abalkin abalkin linked an issue Feb 8, 2023 that may be closed by this pull request
@github-actions github-actions bot removed the stale Stale PR or inactive for long period of time. label May 1, 2023
@zitterbewegung
Contributor

@matrixise, @pganssle, @csabella There has been lots of work on this, so where are we at? The PR is a bit over two years old, but these new features would be great!

@vstinner
Member

where are we at on this?

No activity since 2019. Someone has to step in and restart the work (e.g. create a new PR and update it).

@nineteendo
Contributor

I'm going to try to resolve the merge conflicts and make the requested changes.

@nineteendo
Contributor

Looks like I'll have to start over.

@vstinner
Member

I'm closing the inactive PR.

@vstinner vstinner closed this Jun 19, 2024
Development

Successfully merging this pull request may close these issues.

Add datetime.time.strptime and datetime.date.strptime