Separate out strptime.pyx from tslib #17342
Please don't make changes and move things in the same commit; it completely hides the changes when we're reviewing. Thanks!
What is the rationale for moving this in the first place?
Codecov Report
@@            Coverage Diff             @@
##           master   #17342      +/-   ##
==========================================
- Coverage   91.03%   90.99%    -0.05%
==========================================
  Files         162      162
  Lines       49567    49567
==========================================
- Hits        45125    45103       -22
- Misses       4442     4464       +22
Continue to review full report at Codecov.
Codecov Report
@@            Coverage Diff             @@
##           master   #17342      +/-   ##
==========================================
- Coverage   91.26%   91.24%    -0.02%
==========================================
  Files         163      163
  Lines       49776    49807       +31
==========================================
+ Hits        45426    45447       +21
- Misses       4350     4360       +10
Continue to review full report at Codecov.
In order of "most specific to this PR" to "most general to this sequence of PRs":
One of the big goals of this sequence of PRs is handling this.
Just pushed a commit to remove "non-standard" Cython decorators and revert a camelCase fix. That is, apart from a couple of flake8 whitespace changes, this should now be a straight cut/paste of the existing functions.
This will need a rebase after #17422.
pandas/_libs/src/datetime.pxd (outdated)
    cdef inline check_dts_bounds(pandas_datetimestruct *dts):
        cdef:
better to make this a bint and just return True/False, then make another function that calls this one and actually raises the error.
The longer-term solution, which I've implemented but not PRed, is to move much of datetime.pxd into a pyx file where OutOfBoundsDatetime can be defined. This also ends up clearing out a lot of the setup.py dependencies (orthogonal to this discussion, but still).
Is returning a bint actually any simpler than re-raising? It isn't obvious whether True means "error state is True" or "everything is OK is True", whereas raising is unambiguous.
That said, I'll do it your way.
Yeah, it's OK; I just don't like to muddle things even more.
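A minimal Python sketch of the split suggested in this thread: a predicate that only reports whether a datetime struct is in range, plus a thin wrapper that raises. DtStruct is a hypothetical whole-second stand-in for pandas_datetimestruct, and OutOfBoundsDatetime is a local stand-in class rather than the pandas import. Naming the predicate dts_in_bounds is one way to settle the question raised in the reply about whether True means "error" or "OK".

```python
from typing import NamedTuple


class OutOfBoundsDatetime(ValueError):
    """Hypothetical stand-in for pandas' OutOfBoundsDatetime exception."""


class DtStruct(NamedTuple):
    """Hypothetical whole-second stand-in for pandas_datetimestruct."""
    year: int
    month: int
    day: int
    hour: int = 0
    minute: int = 0
    second: int = 0


# Whole-second bounds of datetime64[ns], whose full range is
# 1677-09-21 00:12:43.145224192 through 2262-04-11 23:47:16.854775807.
_DTS_MIN = DtStruct(1677, 9, 21, 0, 12, 44)
_DTS_MAX = DtStruct(2262, 4, 11, 23, 47, 16)


def dts_in_bounds(dts: DtStruct) -> bool:
    """Pure predicate: True means 'representable'; never raises."""
    # NamedTuples compare field by field, like plain tuples.
    return _DTS_MIN <= dts <= _DTS_MAX


def check_dts_bounds(dts: DtStruct) -> None:
    """Raising wrapper layered on top of the predicate."""
    if not dts_in_bounds(dts):
        raise OutOfBoundsDatetime(
            "Out of bounds nanosecond timestamp: "
            f"{dts.year}-{dts.month:02d}-{dts.day:02d} "
            f"{dts.hour:02d}:{dts.minute:02d}:{dts.second:02d}"
        )
```

Callers that only need a yes/no answer (for example, a coerce path) can use the predicate and avoid exception handling, while the wrapper keeps the raising behavior in one place.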
pandas/_libs/tslib.pyx (outdated)
        error = True

    if error:
        # Retaining this is a kludge because I haven't figured out how
see above
setup.py (outdated)
@@ -464,6 +464,8 @@ def pxd(name):
                  'pandas/_libs/src/period_helper.h',
                  'pandas/_libs/src/datetime.pxd']

np_dtime_strs_srcs = ['pandas/_libs/src/datetime/np_datetime.c',
These sources are also used elsewhere; just list them directly for now (like period.pyx and others do). This variable doesn't add anything.
OK. This will lead to flake8 warnings that I'm happy to ignore.
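A sketch of what "list them directly" could look like. The ext_data mapping and its 'pyxfile', 'depends' and 'sources' keys are assumed to follow the shape setup.py used at the time, and the np_datetime_strings.c / .h file names are assumptions beyond what the hunk above shows.

```python
# Hypothetical sketch: each extension lists its C sources inline instead of
# going through a shared np_dtime_strs_srcs variable.
tseries_depends = ['pandas/_libs/src/datetime/np_datetime.h',
                   'pandas/_libs/src/datetime/np_datetime_strings.h',
                   'pandas/_libs/src/period_helper.h',
                   'pandas/_libs/src/datetime.pxd']

ext_data = {
    '_libs.period': {
        'pyxfile': '_libs/period',
        'depends': tseries_depends,
        'sources': ['pandas/_libs/src/datetime/np_datetime.c',
                    'pandas/_libs/src/datetime/np_datetime_strings.c',
                    'pandas/_libs/src/period_helper.c']},
    '_libs.tslibs.strptime': {
        'pyxfile': '_libs/tslibs/strptime',
        'depends': tseries_depends,
        # Repeated on purpose; flake8 line-length warnings are accepted.
        'sources': ['pandas/_libs/src/datetime/np_datetime.c',
                    'pandas/_libs/src/datetime/np_datetime_strings.c']},
}
```

The duplication is the trade-off being accepted in the reply: a few repeated paths in exchange for each entry being readable on its own.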
pandas/_libs/tslib.pyx (outdated)
@@ -5356,317 +5074,4 @@ def shift_months(int64_t[:] dtindex, int months, object day=None):

#----------------------------------------------------------------------
you can take this comment out :<
pandas/_libs/tslib.pyx (outdated)
# def _strptime_time(data_string, format="%a %b %d %H:%M:%S %Y"):
#     return _strptime(data_string, format)[0]
from tslibs.strptime import array_strptime
I don't think you need this at all; it is called in exactly one place in the code, so simply import and call it there. Then strptime is decoupled.
Sounds great. I've been importing things back into tslib with # noqa to keep the namespace unchanged. Is that unnecessary more generally?
We might not be fully flaking this type of thing (or maybe we are and have to use noqa). I would like to remove non-essential imports, though. These are all private namespaces.
Hello @jbrockmendel! Thanks for updating the PR.
Comment last updated on September 25, 2017 at 00:13 UTC
pandas/_libs/tslibs/strptime.pyx (outdated)
FUNCTIONS:
    _getlang -- Figure out what language is being used for the locale
    strptime -- Calculates the time struct represented by the passed-in string
I know you copy-pasted this, but it doesn't really suit the module docstring, as it only covers part of the file.
So I would either move this inline to where those functions/classes are defined (as it was in tslib), or update it to reflect the actual file contents.
If I understand correctly, array_strptime is the only 'public' function (for pandas, i.e. used outside this file)? If so, I would focus on that in the module docstring.
There are a handful of style issues I'd like to circle back to (e.g. mixed_camel_Case); I'll add this to the list. For the time being I'm really eager to wrap this up.
This is not really a style issue, but a matter of how you copy-pasted things. If you want to clean up later, for now I would just put this in the middle of the file, close to the relevant code (as it was in tslib).
Not sure I understand. Are you OK with this being addressed in a follow-up PR?
I would rather you fix the docs now. Just move these docs down to where they are used, and put array_strptime up front.
@jbrockmendel I see you integrated them into the class/functions docstrings, which is good, but I would still include a comment where LocaleTime starts (the place where previously the comment was) to make it clear that this part is some kind of vendored code (it just comes from the standard library: https://github.com/python/cpython/blob/master/Lib/_strptime.py)
There is some other code related to parsing strings in tslib as well. Does it make sense to move those as well?
Ah, I see there is already #17363 :-)
pandas/_libs/tslibs/strptime.pyx (outdated)
cdef set _nat_strings = set(['NaT', 'nat', 'NAT', 'nan', 'NaN', 'NAN'])
Can we move both of these to util.pxd, to avoid repeating code?
_nat_strings and what else?
checknull_with_nat; it seems to be repeated in lots of places.
Needs a rebase.
Generally OK with this, apart from a couple of doc comments. If you can move checknull_with_nat to a .pxd (to avoid repeating code all over the place), great. If not, we can do it later, but please open an issue in that case.
Ultimately both
pandas/_libs/tslibs/strptime.pyx (outdated)
cdef inline bint _checknull_with_nat(object val):
    """ utility to check if a value is a nat or not """
    return (val is None or
move this to the top and add a TODO to consolidate in future
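A Python approximation of the helper under discussion, placed at the top with the requested TODO. NaT here is a hypothetical sentinel standing in for pandas' NaT, is_nat_like is an illustrative helper combining the _nat_strings check with checknull_with_nat, and testing NaN with math.isnan on Python floats is an assumption about how the truncated Cython original does it.

```python
import math

# TODO: consolidate these helpers into one shared, dependency-free module
# instead of redefining them in each file.


class _NaTType:
    """Hypothetical stand-in for pandas' NaT singleton."""

    def __repr__(self):
        return "NaT"


NaT = _NaTType()

# Strings treated as "not a time", mirroring _nat_strings in strptime.pyx.
_nat_strings = {'NaT', 'nat', 'NAT', 'nan', 'NaN', 'NAN'}


def checknull_with_nat(val):
    """Utility to check if a value is null-like: None, a float NaN, or NaT."""
    return (val is None
            or (isinstance(val, float) and math.isnan(val))
            or val is NaT)


def is_nat_like(val):
    """Hypothetical helper: a NaT string, or a null-like value."""
    if isinstance(val, str):
        return val in _nat_strings
    return checknull_with_nat(val)
```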
pandas/_libs/tslibs/strptime.pyx (outdated)
        return 1 + days_to_week + day_of_week


# def _strptime_time(data_string, format="%a %b %d %H:%M:%S %Y"):
remove commented-out code
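The context line return 1 + days_to_week + day_of_week is the last step of the helper that turns a %U/%W week number plus a weekday into a day of the year. Assuming the vendored copy follows CPython's _calc_julian_from_U_or_W in Lib/_strptime.py, a standalone Python sketch of that computation looks like this (the function name without the leading underscore is illustrative).

```python
from datetime import date


def calc_julian_from_U_or_W(year, week_of_year, day_of_week, week_starts_Mon):
    """Day of the year (1-based) for a %U/%W week number and a weekday.

    day_of_week uses Monday=0 ... Sunday=6. week_starts_Mon is True for %W
    (weeks start on Monday) and False for %U (weeks start on Sunday).
    """
    first_weekday = date(year, 1, 1).weekday()
    # For %U, shift the view so that Sunday is the first day of the week.
    if not week_starts_Mon:
        first_weekday = (first_weekday + 1) % 7
        day_of_week = (day_of_week + 1) % 7
    # Week 0 holds the days before the first full week of the year starts.
    week_0_length = (7 - first_weekday) % 7
    if week_of_year == 0:
        return 1 + day_of_week - first_weekday
    days_to_week = week_0_length + (7 * (week_of_year - 1))
    return 1 + days_to_week + day_of_week
```

For example, Monday of %W week 38 in 2017 falls on 2017-09-18, day 261 of the year.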
if is_coerce:
    iresult[i] = NPY_NAT
    continue
else:
Maybe it's worth having a tiny extension, util.pyx, that houses things like this (and checknull, for example): something we can import anywhere because it has no dependencies itself.
Actually, I bet inference.pyx can be this extension.
"inference.pyx can be this extension"?
inference.pyx depends on a bunch of stuff, including tslib. The solution I've used locally is to put this into a tslibs.npy_dtime module, which has no dependencies.
ok, the name needs work though :>
"ok, the name needs work though :>"
OK, as long as it isn't another thing named datetime.
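The diff context in this thread (the is_coerce branch writing NPY_NAT) is the errors="coerce" path of the parsing loop. A hedged Python sketch of that control flow: toy_array_strptime is a hypothetical name, the stdlib datetime.strptime stands in for the vendored parser, and results are plain ints holding nanoseconds since the epoch. NPY_NAT is NumPy's NaT sentinel, the minimum int64.

```python
from datetime import datetime, timedelta

# NumPy represents NaT in datetime64/int64 data as the minimum int64.
NPY_NAT = -2**63
_EPOCH = datetime(1970, 1, 1)


def toy_array_strptime(values, fmt, errors="raise"):
    """Parse strings into int64-style nanoseconds since the epoch (sketch)."""
    is_coerce = errors == "coerce"
    iresult = [0] * len(values)
    for i, val in enumerate(values):
        try:
            dt = datetime.strptime(val, fmt)
        except ValueError:
            if is_coerce:
                # Unparseable input becomes NaT instead of an error.
                iresult[i] = NPY_NAT
                continue
            else:
                raise
        # Whole microseconds since the epoch, scaled to nanoseconds.
        iresult[i] = (dt - _EPOCH) // timedelta(microseconds=1) * 1000
    return iresult
```

With errors="coerce", toy_array_strptime(['1970-01-01', 'garbage'], '%Y-%m-%d', errors='coerce') returns [0, NPY_NAT].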
thanks!
This is the 2nd of an N-part series of PRs to split tslib into independent modules.

At the moment there is a big chunk of code at the bottom of tslib that looks like it was pasted in from somewhere else. The header for that section of the file reads "# Don't even ask". So I won't.

The new tslibs.strptime is only used in one place: array_strptime is called in tools.datetimes. Other than that, nothing needs to be exposed, and nothing else in tslib relies on it.

The one function from tslib that strptime does need is _check_dts_bounds. This PR (mostly) moves that up to datetime.pxd, which both tslib and strptime already import anyway.

This is mostly a copy/paste of the existing functions and classes. I cleaned up a couple of places where variables used camelCase.
git diff upstream/master -u -- "*.py" | flake8 --diff