PERF: fixed performance degration becasue doesn't slice values array in to_native_types of DateTimeBlock #25765

Merged
merged 2 commits into from
Apr 19, 2019

Conversation

hksonngan
Contributor

@hksonngan hksonngan commented Mar 18, 2019

        values = self.values
        i8values = self.values.view('i8')

        if slicer is not None:
            i8values = i8values[..., slicer]

        from pandas.io.formats.format import _get_format_datetime64_from_values
        fmt = _get_format_datetime64_from_values(values, date_format)
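
The snippet slices `i8values` but hands the unsliced `values` to the format helper. A toy model of why that matters, assuming the method is called once per chunk with a different `slicer` and that format detection looks at every element it receives (both are assumptions; `all_midnight` and `write_in_chunks` are hypothetical names, not pandas functions):

```python
DAY_NS = 24 * 60 * 60 * 10**9  # nanoseconds per day


class ScanCounter:
    """Counts how many elements the stand-in format check looks at."""

    def __init__(self):
        self.inspected = 0

    def all_midnight(self, values):
        # Hypothetical stand-in for a format-detection pass over the values.
        self.inspected += len(values)
        return all(v % DAY_NS == 0 for v in values)


def write_in_chunks(values, chunksize, slice_before_check):
    """Simulate a chunked writer and return the total elements inspected."""
    counter = ScanCounter()
    for start in range(0, len(values), chunksize):
        slicer = slice(start, start + chunksize)
        chunk = values[slicer]
        # The only difference between the two modes is what the check sees.
        counter.all_midnight(chunk if slice_before_check else values)
    return counter.inspected


values = [i * DAY_NS for i in range(10_000)]
unsliced = write_in_chunks(values, chunksize=1_000, slice_before_check=False)
sliced = write_in_chunks(values, chunksize=1_000, slice_before_check=True)
print(unsliced, sliced)  # 100000 10000
```

In this model the unsliced variant inspects the whole array once per chunk, so the total work grows with the number of chunks times the array length, while the sliced variant inspects each element once.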

@hksonngan hksonngan changed the title fixed regression slicing values fixed regression slicing values to_native_types of DateTimeBlock Mar 18, 2019
@codecov

codecov bot commented Mar 18, 2019

Codecov Report

Merging #25765 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #25765      +/-   ##
==========================================
+ Coverage   91.25%   91.25%   +<.01%     
==========================================
  Files         172      172              
  Lines       52977    52978       +1     
==========================================
+ Hits        48342    48344       +2     
+ Misses       4635     4634       -1
Flag Coverage Δ
#multiple 89.82% <100%> (ø) ⬆️
#single 41.74% <0%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/core/internals/blocks.py 94.08% <100%> (ø) ⬆️
pandas/util/testing.py 89.08% <0%> (+0.09%) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 707c720...54502c0.

@codecov

codecov bot commented Mar 18, 2019

Codecov Report

Merging #25765 into master will decrease coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #25765      +/-   ##
==========================================
- Coverage   91.99%   91.98%   -0.01%     
==========================================
  Files         175      175              
  Lines       52382    52383       +1     
==========================================
- Hits        48188    48184       -4     
- Misses       4194     4199       +5
Flag Coverage Δ
#multiple 90.54% <100%> (ø) ⬆️
#single 40.74% <0%> (-0.14%) ⬇️
Impacted Files Coverage Δ
pandas/core/internals/blocks.py 94.08% <100%> (ø) ⬆️
pandas/io/gbq.py 75% <0%> (-12.5%) ⬇️
pandas/core/frame.py 96.9% <0%> (-0.12%) ⬇️
pandas/util/testing.py 90.61% <0%> (-0.11%) ⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update c18c8be...d135af2.

Contributor

@jreback jreback left a comment

tests pls

@jreback jreback added the Indexing Related to indexing on series/frames, not to indexes themselves label Mar 18, 2019
@jreback
Contributor

jreback commented Mar 18, 2019

or if this is a perf fix, then show the updated asv's (and/or add one)
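
For readers unfamiliar with asv, a benchmark of the kind requested here is a plain class with a `setup` method and one or more `time_*` methods, run once per entry in `params`. The sketch below is illustrative only: the class name, column layout, and sizes are assumptions, not the benchmark added in this PR.

```python
import pandas as pd


class ToCSVDatetimeBigSketch:
    # asv calls setup() and each time_* method once per value in `params`.
    params = [1000, 10000, 100000]
    param_names = ["obs"]

    def setup(self, obs):
        # Hypothetical layout: three datetime columns of length `obs`.
        stamps = pd.date_range("2000-01-01", periods=obs, freq="min")
        self.df = pd.DataFrame({"dt1": stamps, "dt2": stamps, "dt3": stamps})

    def time_frame(self, obs):
        # With no path argument, to_csv returns the CSV text instead of
        # writing a file, so only serialization is exercised.
        self.df.to_csv()
```

Keeping the frame construction in `setup` means only the `to_csv` call falls inside the measured method.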

@hksonngan hksonngan changed the title fixed regression slicing values to_native_types of DateTimeBlock fixed performance degration becasue doesn not slice values array in to_native_types of DateTimeBlock Mar 19, 2019
@hksonngan hksonngan changed the title fixed performance degration becasue doesn not slice values array in to_native_types of DateTimeBlock fixed performance degration becasue doesn't slice values array in to_native_types of DateTimeBlock Mar 19, 2019
@hksonngan hksonngan changed the title fixed performance degration becasue doesn't slice values array in to_native_types of DateTimeBlock PERF: fixed performance degration becasue doesn't slice values array in to_native_types of DateTimeBlock Mar 19, 2019
@pep8speaks

pep8speaks commented Mar 21, 2019

Hello @hksonngan! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-04-19 00:10:51 UTC

@hksonngan
Contributor Author

hksonngan commented Mar 21, 2019

@jreback I added an asv benchmark and ran it on my 2012 MacBook Air.

· Running 2 total benchmarks (2 commits * 1 environments * 1 benchmarks)
[  0.00%] · For pandas commit db6993cd <master^2> (round 1/2):
[  0.00%] ·· Building for conda-py3.6-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt....
[  0.00%] ·· Benchmarking conda-py3.6-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
[ 25.00%] ··· Running (io.csv.ToCSVDatetimeBig.time_frame--).
[ 25.00%] · For pandas commit 4472dc9d <25708-fix-reregression-slicing> (round 1/2):
[ 25.00%] ·· Building for conda-py3.6-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt...
[ 25.00%] ·· Benchmarking conda-py3.6-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
[ 50.00%] ··· Running (io.csv.ToCSVDatetimeBig.time_frame--).
[ 50.00%] · For pandas commit 4472dc9d <25708-fix-reregression-slicing> (round 2/2):
[ 50.00%] ·· Benchmarking conda-py3.6-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
[ 75.00%] ··· io.csv.ToCSVDatetimeBig.time_frame                                    912±50ms
[ 75.00%] · For pandas commit db6993cd <master^2> (round 2/2):
[ 75.00%] ·· Building for conda-py3.6-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt...
[ 75.00%] ·· Benchmarking conda-py3.6-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
[100.00%] ··· io.csv.ToCSVDatetimeBig.time_frame                                   1.28±0.2s

@hksonngan
Contributor Author

@jreback ping

@jreback
Contributor

jreback commented Mar 22, 2019

@hksonngan I think you ran asv dev; you need to run asv continuous here, which should show the ratio of the change.

@jreback
Contributor

jreback commented Mar 22, 2019

also pls add a whatsnew note in performance improvements.

@hksonngan
Contributor Author

hksonngan commented Mar 22, 2019

@jreback I think that for small inputs, under 10,000 obs, the gain is not noticeable.
With more than 1,000,000 obs the change is large. This is a test with 1 million obs; I don't know why asv doesn't report the ratio.

asv continuous -s -f 1.1 upstream/master 25708-fix-reregression-slicing -b ^io.csv.ToCSVDatetimeBig
· Creating environments
· Discovering benchmarks.
·· Uninstalling from conda-py3.6-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt..
·· Installing 0df6b571 <25708-fix-reregression-slicing> into conda-py3.6-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt....
· Running 2 total benchmarks (2 commits * 1 environments * 1 benchmarks)
[  0.00%] · For pandas commit db6993cd <master^2> (round 1/2):
[  0.00%] ·· Building for conda-py3.6-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt......
[  0.00%] ·· Benchmarking conda-py3.6-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
[ 25.00%] ··· Running (io.csv.ToCSVDatetimeBig.time_frame--).
[ 25.00%] · For pandas commit 0df6b571 <25708-fix-reregression-slicing> (round 1/2):
[ 25.00%] ·· Building for conda-py3.6-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt.....
[ 25.00%] ·· Benchmarking conda-py3.6-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
[ 50.00%] ··· Running (io.csv.ToCSVDatetimeBig.time_frame--).
[ 50.00%] · For pandas commit 0df6b571 <25708-fix-reregression-slicing> (round 2/2):
[ 50.00%] ·· Benchmarking conda-py3.6-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
[ 75.00%] ··· io.csv.ToCSVDatetimeBig.time_frame                       16.6±0.2s
[ 75.00%] · For pandas commit db6993cd <master^2> (round 2/2):
[ 75.00%] ·· Building for conda-py3.6-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt......
[ 75.00%] ·· Benchmarking conda-py3.6-Cython-matplotlib-numexpr-numpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
[100.00%] ··· io.csv.ToCSVDatetimeBig.time_frame                         22.4±1s

BENCHMARKS NOT SIGNIFICANTLY CHANGED.

Contributor

@jreback jreback left a comment

also i am not clear you are actually fixing anything. pls run your example with timeit on master and with the PR.

Contributor

@jreback jreback left a comment

pls show a timeit before/after this PR
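
A before/after comparison of this kind could be produced with a small script run once on master and once on the PR branch. A minimal sketch, assuming a frame with one datetime column; the helper name `time_to_csv` and the sizes are made up for illustration:

```python
import timeit

import pandas as pd


def time_to_csv(obs, repeat=3):
    """Best-of-`repeat` wall time, in seconds, for one DataFrame.to_csv() call."""
    stamps = pd.date_range("2000-01-01", periods=obs, freq="min")
    df = pd.DataFrame({"dt": stamps, "value": range(obs)})
    # number=1: each measurement is a single full serialization to a string.
    return min(timeit.repeat(df.to_csv, repeat=repeat, number=1))


if __name__ == "__main__":
    for obs in (1_000, 100_000):
        print(f"{obs:>9} rows: {time_to_csv(obs):.4f} s")
```

Taking the minimum over a few repeats reduces noise from other processes; the numbers are only comparable when both runs use the same machine and the same sizes.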

@jreback
Contributor

jreback commented Apr 5, 2019

can you merge master and update

@hksonngan
Contributor Author

@jreback Here are the cProfile results for upstream/master and for the fix branch.

  • Upstream/Master
    CProfile_master
  • Fix in branch
    CProfile_Patch

Contributor

@jreback jreback left a comment

pls also show the asv's results

@jreback jreback added this to the 0.25.0 milestone Apr 15, 2019
@jreback
Contributor

jreback commented Apr 15, 2019

lgtm. ping on green.

@hksonngan
Contributor Author

@jreback ping.

@hksonngan
Contributor Author

'''
E:\dev\pandas\asv_bench>asv continuous -f 1.1 upstream/master HEAD -b ^io.csv.ToCSVDatetimeBig
· Creating environments
· Discovering benchmarks
·· Uninstalling from virtualenv-py3.6-Cython-matplotlib-numexpr-numpy-openpyxl-pytest-scipy-sqlalchemy-tables-xlrd-xlsxwriter-xlwt
·· Building 04afb4dd <25708-fix-reregression-slicing> for virtualenv-py3.6-Cython-matplotlib-numexpr-numpy-openpyxl-pytest-scipy-sqlalchemy-tables-xlrd-xlsxwriter-xlwt
·· Installing 04afb4dd <25708-fix-reregression-slicing> into virtualenv-py3.6-Cython-matplotlib-numexpr-numpy-openpyxl-pytest-scipy-sqlalchemy-tables-xlrd-xlsxwriter-xlwt
· Running 2 total benchmarks (2 commits * 1 environments * 1 benchmarks)
[ 0.00%] · For pandas commit 7b3bf2d (round 1/2):
[ 0.00%] ·· Building for virtualenv-py3.6-Cython-matplotlib-numexpr-numpy-openpyxl-pytest-scipy-sqlalchemy-tables-xlrd-xlsxwriter-xlwt
[ 0.00%] ·· Benchmarking virtualenv-py3.6-Cython-matplotlib-numexpr-numpy-openpyxl-pytest-scipy-sqlalchemy-tables-xlrd-xlsxwriter-xlwt
[ 25.00%] ··· Running (io.csv.ToCSVDatetimeBig.time_frame--).
[ 25.00%] · For pandas commit 04afb4dd <25708-fix-reregression-slicing> (round 1/2):
[ 25.00%] ·· Building for virtualenv-py3.6-Cython-matplotlib-numexpr-numpy-openpyxl-pytest-scipy-sqlalchemy-tables-xlrd-xlsxwriter-xlwt
[ 25.00%] ·· Benchmarking virtualenv-py3.6-Cython-matplotlib-numexpr-numpy-openpyxl-pytest-scipy-sqlalchemy-tables-xlrd-xlsxwriter-xlwt
[ 50.00%] ··· Running (io.csv.ToCSVDatetimeBig.time_frame--).
[ 50.00%] · For pandas commit 04afb4dd <25708-fix-reregression-slicing> (round 2/2):
[ 50.00%] ·· Benchmarking virtualenv-py3.6-Cython-matplotlib-numexpr-numpy-openpyxl-pytest-scipy-sqlalchemy-tables-xlrd-xlsxwriter-xlwt
[ 75.00%] ··· io.csv.ToCSVDatetimeBig.time_frame ok
[ 75.00%] ··· ========== ============
                 obs
              ---------- ------------
                 1000      7.81±3ms
               1000000    4.73±0.01s
               10000000   49.1±0.4s
              ========== ============

[ 75.00%] · For pandas commit 7b3bf2d (round 2/2):
[ 75.00%] ·· Building for virtualenv-py3.6-Cython-matplotlib-numexpr-numpy-openpyxl-pytest-scipy-sqlalchemy-tables-xlrd-xlsxwriter-xlwt
[ 75.00%] ·· Benchmarking virtualenv-py3.6-Cython-matplotlib-numexpr-numpy-openpyxl-pytest-scipy-sqlalchemy-tables-xlrd-xlsxwriter-xlwt
[100.00%] ··· io.csv.ToCSVDatetimeBig.time_frame ok
[100.00%] ··· ========== =============
                 obs
              ---------- -------------
                 1000     0±8000000ns
               1000000    5.68±0.06s
               10000000   2.16±0.01m
              ========== =============

       before           after         ratio
     [7b3bf2dc]       [04afb4dd]
     <master-upstream>       <25708-fix-reregression-slicing>
-      2.16±0.01m        49.1±0.4s     0.38  io.csv.ToCSVDatetimeBig.time_frame(10000000)

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
'''
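
The 0.38 in the ratio column can be checked by hand from the two timings for 10000000 obs, reading the `m` suffix as minutes: 49.1 s / (2.16 × 60 s) ≈ 0.38. A small sketch of that arithmetic (`central_seconds` is a hypothetical helper, not part of asv):

```python
import re

# Seconds per unit, for the suffixes that appear in the output above.
UNIT_SECONDS = {"ns": 1e-9, "ms": 1e-3, "s": 1.0, "m": 60.0}


def central_seconds(text):
    """Convert the central value of a string like '2.16±0.01m' to seconds."""
    match = re.fullmatch(r"([\d.]+)±[\d.]+(ns|ms|s|m)", text)
    if match is None:
        raise ValueError(f"unrecognized timing: {text!r}")
    return float(match.group(1)) * UNIT_SECONDS[match.group(2)]


before = central_seconds("2.16±0.01m")  # upstream/master
after = central_seconds("49.1±0.4s")    # PR branch
print(f"{after / before:.2f}")  # 0.38
```

A ratio below 1 means the PR branch is faster; here the run on the branch takes a little over a third of the time measured on master.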

@jreback jreback merged commit f53bb06 into pandas-dev:master Apr 19, 2019
@jreback
Contributor

jreback commented Apr 19, 2019

thanks @hksonngan

@hksonngan hksonngan deleted the 25708-fix-reregression-slicing branch April 19, 2019 14:11
@TomAugspurger
Contributor

@hksonngan
Contributor Author

@TomAugspurger this is real; it is caused by setting params = [1000, 10000, 100000].

@TomAugspurger
Contributor

I don’t think so. The regression is for a different benchmark, no?

yhaque1213 pushed a commit to yhaque1213/pandas that referenced this pull request Apr 22, 2019
ryanreh99 added a commit to ryanreh99/pandas that referenced this pull request Apr 22, 2019
Labels
Indexing Related to indexing on series/frames, not to indexes themselves
Development

Successfully merging this pull request may close these issues.

Performance degradation when dumping large dataframe with np.datetime to csv
5 participants