On path with a known exact float, extract the double with a fast macro. #21072

Merged
merged 1 commit into from
Jun 23, 2020

rhettinger

We're already testing for an exact float, so take advantage of that information and extract the double with the fast macro.

Baseline timings:

$ ./python.exe -m timeit -r 11 -s 'from math import floor' -s 'x=3.14' 'floor(x)'
10000000 loops, best of 11: 38.5 nsec per loop
$ ./python.exe -m timeit -r 11 -s 'from math import floor' -s 'x=0.0' 'floor(x)'
10000000 loops, best of 11: 38.3 nsec per loop
$ ./python.exe -m timeit -r 11 -s 'from math import floor' -s 'x=-3.14E32' 'floor(x)'
5000000 loops, best of 11: 69.3 nsec per loop
$ ./python.exe -m timeit -r 11 -s 'from math import floor' -s 'x=-323452345.14' 'floor(x)'
5000000 loops, best of 11: 53.4 nsec per loop

Timings with the patch:

$ ./python.exe -m timeit -r 11 -s 'from math import floor' -s 'x=3.14' 'floor(x)'
10000000 loops, best of 11: 36.5 nsec per loop
$ ./python.exe -m timeit -r 11 -s 'from math import floor' -s 'x=0.0' 'floor(x)'
10000000 loops, best of 11: 36.5 nsec per loop
$ ./python.exe -m timeit -r 11 -s 'from math import floor' -s 'x=-3.14E32' 'floor(x)'
5000000 loops, best of 11: 64.4 nsec per loop
$ ./python.exe -m timeit -r 11 -s 'from math import floor' -s 'x=-323452345.14' 'floor(x)'
5000000 loops, best of 11: 47 nsec per loop

While the timings all show improvements, I don't understand why the timings for floor() also depend on the magnitude of the inputs.
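
The same comparison can also be scripted with the `timeit` module instead of the command line. This is only a sketch of how one might reproduce the measurements: the helper name `best_nsec` and its default loop counts are assumptions, and absolute numbers will differ by machine and build.

```python
import timeit
from math import floor

def best_nsec(x, repeat=5, number=200_000):
    """Best-of-`repeat` time for one floor(x) call, in nanoseconds.

    Hypothetical helper; it mirrors `python -m timeit -r N`, which also
    reports the minimum over the repeated runs.
    """
    timer = timeit.Timer("floor(x)", globals={"floor": floor, "x": x})
    return min(timer.repeat(repeat=repeat, number=number)) / number * 1e9

# Same inputs as the command-line runs above.
for x in (3.14, 0.0, -3.14e32, -323452345.14):
    print(f"floor({x!r}): {best_nsec(x):.1f} nsec per loop")
```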


@tim-one tim-one left a comment

I don't know why, but Python 3 changed math.floor() to return an int instead of a float, so the larger the absolute value, the more time it takes to create an ever-larger int object. It's therefore not surprising that the time depends on the magnitude of the argument. I suppose PyLong_FromDouble() could be micro-optimized to exploit the fact that, past a certain magnitude, the trailing bits of the potentially giant int must all be 0.

>>> math.floor(3.14e32)
314000000000000005680822245916672
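
To make the magnitude dependence concrete, here is a minimal Python sketch that reports how many bits the int returned by math.floor() needs and how many of its low-order bits are zero. The helper name `floor_profile` is hypothetical and not part of CPython. It relies only on the fact that an IEEE 754 double carries a 53-bit significand, so the nonzero bits of any floored float fit in a 53-bit window.

```python
import math

def floor_profile(x):
    """Return (bit_length, trailing_zero_bits) of math.floor(x).

    Hypothetical helper, for illustration only.
    """
    n = abs(math.floor(x))
    if n == 0:
        return 0, 0
    # n & -n isolates the lowest set bit; its index is the number of
    # trailing zero bits.
    trailing_zeros = (n & -n).bit_length() - 1
    return n.bit_length(), trailing_zeros

for x in (3.14, 323452345.14, 3.14e32):
    bits, zeros = floor_profile(x)
    print(f"{x!r}: {bits} bits, {zeros} trailing zero bits")
```

For any input, `bits - zeros` is at most 53, which is the property the suggested micro-optimization would exploit.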

@rhettinger rhettinger merged commit 930f451 into python:master Jun 23, 2020
@miss-islington

Thanks @rhettinger for the PR 🌮🎉.. I'm working now to backport this PR to: 3.9.
🐍🍒⛏🤖

@miss-islington

Sorry @rhettinger, I had trouble checking out the 3.9 backport branch.
Please backport using cherry_picker on command line.
cherry_picker 930f4518aea7f3f0f914ce93c3fb92831a7e1d2a 3.9

@rhettinger rhettinger added needs backport to 3.9 only security fixes and removed needs backport to 3.9 only security fixes labels Jun 23, 2020
@miss-islington

Thanks @rhettinger for the PR 🌮🎉.. I'm working now to backport this PR to: 3.9.
🐍🍒⛏🤖

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Jun 23, 2020
…o. (pythonGH-21072)

(cherry picked from commit 930f451)

Co-authored-by: Raymond Hettinger <rhettinger@users.noreply.github.com>
@bedevere-bot

GH-21102 is a backport of this pull request to the 3.9 branch.

@bedevere-bot bedevere-bot removed the needs backport to 3.9 only security fixes label Jun 23, 2020
fasih pushed a commit to fasih/cpython that referenced this pull request Jun 29, 2020
arun-mani-j pushed a commit to arun-mani-j/cpython that referenced this pull request Jul 21, 2020
@Mariatta Mariatta added the needs backport to 3.9 only security fixes label Sep 4, 2020
@miss-islington

Thanks @rhettinger for the PR 🌮🎉.. I'm working now to backport this PR to: 3.9.
🐍🍒⛏🤖

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Sep 4, 2020
…o. (pythonGH-21072)

(cherry picked from commit 930f451)

Co-authored-by: Raymond Hettinger <rhettinger@users.noreply.github.com>
@bedevere-bot

GH-22108 is a backport of this pull request to the 3.9 branch.

@bedevere-bot bedevere-bot removed the needs backport to 3.9 only security fixes label Sep 4, 2020
miss-islington added a commit that referenced this pull request Sep 4, 2020
…o. (GH-21072)

(cherry picked from commit 930f451)

Co-authored-by: Raymond Hettinger <rhettinger@users.noreply.github.com>
hauntsaninja added a commit to hauntsaninja/cpython that referenced this pull request Sep 2, 2023
This matches a similar optimisation done for math.floor in
python#21072

Before:
```
λ ./python.exe -m timeit -r 11 -s 'from math import ceil' -s 'x=3.14' 'ceil(x)'
20000000 loops, best of 11: 13.3 nsec per loop
λ ./python.exe -m timeit -r 11 -s 'from math import ceil' -s 'x=0.0' 'ceil(x)'
20000000 loops, best of 11: 13.3 nsec per loop
λ ./python.exe -m timeit -r 11 -s 'from math import ceil' -s 'x=-3.14E32' 'ceil(x)'
10000000 loops, best of 11: 35.3 nsec per loop
λ ./python.exe -m timeit -r 11 -s 'from math import ceil' -s 'x=-323452345.14' 'ceil(x)'
10000000 loops, best of 11: 21.8 nsec per loop
```

After:
```
λ ./python.exe -m timeit -r 11 -s 'from math import ceil' -s 'x=3.14' 'ceil(x)'
20000000 loops, best of 11: 11.8 nsec per loop
λ ./python.exe -m timeit -r 11 -s 'from math import ceil' -s 'x=0.0' 'ceil(x)'
20000000 loops, best of 11: 11.7 nsec per loop
λ ./python.exe -m timeit -r 11 -s 'from math import ceil' -s 'x=-3.14E32' 'ceil(x)'
10000000 loops, best of 11: 32.7 nsec per loop
λ ./python.exe -m timeit -r 11 -s 'from math import ceil' -s 'x=-323452345.14' 'ceil(x)'
10000000 loops, best of 11: 20.1 nsec per loop
```
hauntsaninja added a commit to hauntsaninja/cpython that referenced this pull request Sep 3, 2023
hauntsaninja added a commit that referenced this pull request Oct 6, 2023
This matches a similar optimisation done for math.floor in
#21072
Glyphack pushed a commit to Glyphack/cpython that referenced this pull request Sep 2, 2024
)

This matches a similar optimisation done for math.floor in
python#21072
Labels: performance, skip issue, skip news