Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PERF: Implement get_freq_code in cython frequencies #17422

Merged
merged 6 commits into from
Sep 8, 2017

Conversation

jbrockmendel
Copy link
Member

There's a whole bunch of tseries.frequencies and tseries.offsets that should be moved into cython to improve Period (and PeriodIndex, and to a lesser extend Timestamp) performance. This PR just starts with the function get_freq_code`.

To keep the PR small, there are a bunch of things it doesnt do that can further improve things in follow-ups:

  • Nothing is cdef'd or declared in a pxd file
  • _libs.period still gets get_freq_code from tseries.frequencies instead of directly from tslibs.frequencies
  • A handful of now-duplicate constants in tseries.frequencies are not removed.
In [6]: per = pd.Period.now('min')
In [7]: %timeit per.month

Before:

The slowest run took 15.03 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 1.4 µs per loop

After:

The slowest run took 25.64 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 744 ns per loop
  • closes #xxxx
  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

@codecov
Copy link

codecov bot commented Sep 2, 2017

Codecov Report

Merging #17422 into master will decrease coverage by 0.02%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #17422      +/-   ##
==========================================
- Coverage   91.15%   91.13%   -0.03%     
==========================================
  Files         163      163              
  Lines       49581    49552      -29     
==========================================
- Hits        45198    45161      -37     
- Misses       4383     4391       +8
Flag Coverage Δ
#multiple 88.92% <100%> (-0.01%) ⬇️
#single 40.24% <100%> (-0.08%) ⬇️
Impacted Files Coverage Δ
pandas/tseries/frequencies.py 96.31% <100%> (-0.2%) ⬇️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.72% <0%> (-0.1%) ⬇️
pandas/core/indexes/datetimes.py 95.43% <0%> (+0.09%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1981b67...f48c5c3. Read the comment docs.

@codecov
Copy link

codecov bot commented Sep 2, 2017

Codecov Report

Merging #17422 into master will decrease coverage by 0.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #17422      +/-   ##
==========================================
- Coverage   91.15%   91.14%   -0.02%     
==========================================
  Files         163      163              
  Lines       49581    49561      -20     
==========================================
- Hits        45198    45171      -27     
- Misses       4383     4390       +7
Flag Coverage Δ
#multiple 88.92% <100%> (-0.01%) ⬇️
#single 40.23% <100%> (-0.09%) ⬇️
Impacted Files Coverage Δ
pandas/tseries/frequencies.py 96.31% <100%> (-0.2%) ⬇️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/io/parquet.py 65.38% <0%> (-0.37%) ⬇️
pandas/core/frame.py 97.72% <0%> (-0.1%) ⬇️
pandas/core/generic.py 91.98% <0%> (-0.01%) ⬇️
pandas/core/indexes/base.py 96.29% <0%> (ø) ⬆️
pandas/core/indexes/multi.py 96.9% <0%> (ø) ⬆️
pandas/io/parsers.py 95.46% <0%> (ø) ⬆️
pandas/plotting/_core.py 82.69% <0%> (+0.01%) ⬆️
pandas/core/indexes/datetimes.py 95.52% <0%> (+0.19%) ⬆️
... and 2 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1981b67...77e2e75. Read the comment docs.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if u are claiming. perf boost then pls add asv

from util cimport is_integer_object


@cython.ccall
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does this decorator do?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cython.ccall
def foo(x):

is equivalent to cpdef foo(x):

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then use cpdef foo(x); don't try to change the existing style in a single PR.

@jbrockmendel
Copy link
Member Author

asv is throwing build-time errors. I've filed an issue there. For now the timeit results will have to do.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i still want actual asv code

@jbrockmendel
Copy link
Member Author

@jorisvandenbossche Any chance you want to play "good cop" here? I could use some help with asv, see error discussion here.

@gfyoung gfyoung added Clean Internals Related to non-user accessible pandas implementation labels Sep 3, 2017
@jreback
Copy link
Contributor

jreback commented Sep 3, 2017

@jbrockmendel you need to add asv benchmarks to he code you are pushing

you are touching highly perf sensistive code and it is very easy to break things
think of these like tests

@jbrockmendel
Copy link
Member Author

Just pushed a commit that adds asv checks for Period property lookups. I can't get asv to run it, would appreciate it if someone else would give it a shot.

@sinhrks sinhrks added the Frequency DateOffsets label Sep 4, 2017
from util cimport is_integer_object


@cython.ccall
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then use cpdef foo(x); don't try to change the existing style in a single PR.

@jbrockmendel
Copy link
Member Author

then use cpdef foo(x); don't try to change the existing style in a single PR.

Note that this code was moved from non-cython, so is not changing existing style. The idea was that where possible, retaining pure-python compatibility would be nice. But not a big deal, so will change.

@jbrockmendel
Copy link
Member Author

Looks like Travis error is an unrelated HTTP timeout.


import numpy as np
cimport numpy as np
np.import_array() # Without this, `is_integer_object` causes segfaults
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove the comment, this is standard to do.

@jreback jreback added this to the 0.21.0 milestone Sep 7, 2017
@jreback
Copy link
Contributor

jreback commented Sep 7, 2017

ok this looks good. just remove that comment and push again to get everything passing. ping on green.

@jreback jreback changed the title Implement get_freq_code in cython frequencies PERF: Implement get_freq_code in cython frequencies Sep 7, 2017
@jreback jreback added the Performance Memory or execution speed performance label Sep 7, 2017
@jbrockmendel
Copy link
Member Author

ping.

@jreback
Copy link
Contributor

jreback commented Sep 7, 2017

https://travis-ci.org/pandas-dev/pandas/jobs/272723119 is failing. FYI just because they are in the allowed failing section doesn't mean they actually should fail. this is the build test, which installs like a user, w/o a development build.

@@ -492,6 +493,8 @@ def pxd(name):
'sources': ['pandas/_libs/src/datetime/np_datetime.c',
'pandas/_libs/src/datetime/np_datetime_strings.c',
'pandas/_libs/src/period_helper.c']},
'_libs.tslibs.frequencies': {'pyxfile': '_libs/tslibs/frequencies',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this also needs to be listed in the setup function, IOW all sub-dirs have to be listed, put it after pandas._libs

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see comments

@jbrockmendel
Copy link
Member Author

Looks like the allowed-fail mentioned earlier is now OK, but the 3.5 OSX is now failing, says it just stalled and received no output for 10 minutes. The only change since last night's all-green is the entry in the setup.py packages.

@jreback jreback merged commit 7e4e8ac into pandas-dev:master Sep 8, 2017
jbrockmendel added a commit to jbrockmendel/pandas that referenced this pull request Sep 8, 2017
Implement .pxd file for tslibs.frequencies

cimport get_freq_code in period.pyx

Replace isinstance checks in period.pyx with C-equivalents form util

Remove unused imports from period.pyx

Remove code in tseries.frequencies that can now be imported from tslibs.frequencies
jreback pushed a commit that referenced this pull request Sep 8, 2017
jbrockmendel added a commit to jbrockmendel/pandas that referenced this pull request Sep 10, 2017
@jbrockmendel jbrockmendel deleted the cy_freqs branch October 30, 2017 16:25
alanbato pushed a commit to alanbato/pandas that referenced this pull request Nov 10, 2017
alanbato pushed a commit to alanbato/pandas that referenced this pull request Nov 10, 2017
No-Stream pushed a commit to No-Stream/pandas that referenced this pull request Nov 28, 2017
No-Stream pushed a commit to No-Stream/pandas that referenced this pull request Nov 28, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Clean Frequency DateOffsets Internals Related to non-user accessible pandas implementation Performance Memory or execution speed performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants