-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
PERF: Move Period class and related functions to Cython module #9440
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PERF: Move Period class and related functions to Cython module #9440
Conversation
try not to change the meat of any tests (moving around is ok) ideally we could merge this and then you can build enhancements on top in another pull request ping when passing |
I didn't move any tests (and won't without discussion, and probably a separate PR unless writing new tests). Very good. Can we get #9415 merged first or will you merge it as part of this? |
cd2d2ce
to
d85f32c
Compare
@jreback Fixed. Some commits are different so please start fresh. |
for i in range(n): | ||
val = arr[i] | ||
if val != iNaT: | ||
val = func(val, relation, &finfo) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just combine these into a single loop
In python this matters but not in cython where these are fast comparisons (you can do a mini profile to see if it matters in any event) - it's just more clear in the code
Can we hold off on that? I did not want to change any logic except to remove dependencies. |
yep no prob on changing code - pls do a perf run (vbenches) keep in mind it's not clear on how much of this code is well tested so if u think something is not being hit addrl tests to validate are welcome |
That is, not until the next PR. |
One comment, not on the actual content of this PR, but on the file structure: you created a new module I know for a lot of other cython/python modules this is the same (but I think we do this already to much). Possible options: make it |
@jorisvandenbossche Right now, tslib is a monstrosity. I tried using If you put Honestly, I put off making that choice until comments. I was confused about what to do since there are other files to move or rename too. |
@jorisvandenbossche I should also mentioned that names for namespaces of pandas Cython modules are defined in I think |
pls run versus master to isolate these changes |
I've been refactoring Period more today with success. Here's what I've got so far: 4a6c533c. I removed the tests for period_asfreq because it does not follow the behavior of Period.asfreq. Is that ok? Let me know if I should push it to this PR or wait. |
@@ -458,6 +455,11 @@ def pxd(name): | |||
'sources': ['pandas/src/datetime/np_datetime.c', | |||
'pandas/src/datetime/np_datetime_strings.c', | |||
'pandas/src/period.c']}, | |||
period=dict(pyxfile='period', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use {
here (just for consistency)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do I need to make this change here or #9415 to wrap it up?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can close #9415 and just do it here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I don't use this until this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess I liked it better this way, and ext_data['parser'] is also like that. Do you want me to change that one too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just wanted them all to be consistent, using {...}
for dict is what I recall
ffdac9b
to
ba535d7
Compare
@jreback Gist comments don't give notifications. What are some good setting for vbench num calls and repeats? I'm running it on a desktop Intel i7, 16GB RAM, and mirrored RAID and not running anything intensive during vbench (like other tests). Maybe I should do some performance testing between commits. At the very least the first in the stream. |
ok merged #9415 I just wanted you to be familiar with the perf testing.... try you can use |
Just letting you know about lack of gist notifications. I thought they were a feature of GitHub too. I read your gist comment after other discussion. No worries. I'm going to try booting with no GUI to running vbench. I think it's getting in the way. I'm going to do some vbench tests on particular commits early and late in the stream to determine where my problem is. vbench documentation and searching about vbench hasn't shown me much. Thanks for the extra input. And baring with me! |
vbench docs are here: https://github.com/pydata/pandas/wiki/Performance-Testing additions always welcome to this |
Sorry, I meant that they shown me much about guidelines for number of calls and repeats parameters. I'll gladly add some guidelines about that if we can start an issue and get some consensus (after I examine that myself 😃). |
I can't get conclusive results on my main workstation. I always have one or more inconsistent outliers. Going to try some other machines and get back on this soon. |
Running the test on server grade hardware seems to reduce the variance. Also, my issue may have partly been due to using pyenv which compiles python by default with CFLAGS='-O3'. |
ok, pls rebase and squash to a smaller number of commits. pls let me know and we can get this in. |
Ok I'll do that in a little bit. I can get the commit count down to three. I have more work refactoring Period ready but will submit on another PR. |
that's all fine ping when revised and green and add a release note referencing this PR number (you can put in performance section) |
ba535d7
to
190845b
Compare
@jreback can you review the name changes in the last commit? I used |
this all looks good. (e.g. see #5294) for how all things should be named (e.g. like you are doing is good, these imports should be |
pls rebase on top of this: #9500 (I am going to merge soon). As the setup.py is failing on windows after a clean (the path comparison was wrong). With this (you will change |
Two successive vbench runs before next rebase: https://gist.github.com/blbradley/cfb6e46f439dd1b2efd4 Only thing to show there is no likely performance regressions. |
e017b33
to
8a2cb63
Compare
Moved many import statement to local scope in period module, similar to tslib. This fixes circular dependency problems.
8a2cb63
to
42b5416
Compare
Good to go. |
ordinal = self.ordinal | ||
else: | ||
ordinal = self.ordinal + (nanos // offset_nanos) | ||
return Period(ordinal=ordinal, freq=self.freq) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to put on your todo list, this code in _add_delta
looks a bit suspicous with all of this nested if-then (its the same as the original), but take a look in the next PR
@blbradley looks good; except that |
I was trying to not subclass from |
@blbradley I know, and there are inconsitencies there as well. Ok, on the list then is a MixIn in Cython that you can sub-class from (like |
Just make up a list of all of these issues, post it and then you can close via PRs. merging this. |
PERF: Move Period class and related functions to Cython module
@jreback thanks a bunch for your time! more coming soon. if any other issues arise from this, I'll try to help field them. |
I propose that Period should be a more basic type, like Timestamp.
I wanted to start with implementing frequencies with multipliers (see issue #7811), but the code for Period was all over the place. So, I moved all code directly related to Period to a Cython module. Changed a bit of the logic to make calls from Cython instead of Python to get rid of some Python dependencies. However, the 'meat' of Period is still there. I will be refactoring Period to use Cython over the next week or two.
I've only done a bit of performance testing, but the results from moving Period are good so far without any real Cython refactoring.
Note: First commit in this stream is from #9415 because I needed that bug fixed to make this work.
Comments most welcome!