bpo-29882: Add an efficient popcount method for integers #771

niklasf · 2017-03-22T17:32:01Z

https://bugs.python.org/issue29882

mention-bot · 2017-03-22T17:32:06Z

@niklasf, thanks for your PR! By analyzing the history of the files in this pull request, we identified @tim-one, @birkenfeld and @ncoghlan to be potential reviewers.

ilevkivskyi · 2017-03-22T23:18:47Z

Some compilers (gcc) provide hardware optimized __builtin_popcount function(s). Could these speed-up the implementation?

niklasf · 2017-03-22T23:26:23Z

@ilevkivskyi: Yes, using hardware popcnt makes this ever so slightly faster.

$ ./python -m timeit "bin(123456789).count('1')"  # equivalent
1000000 loops, best of 5: 286 nsec per loop
$ ./python -m timeit "(123456789).bit_count()"  # gnuc builtin
5000000 loops, best of 5: 44.8 nsec per loop
$ ./python -m timeit "(123456789).bit_count()"  # swar32
5000000 loops, best of 5: 46.3 nsec per loop

Previously I have submitted #594, but so far there is no consensus on wether or how to use these intrinsics. So I think it's best to consider this seperately.

Edit: I should also mention that there is a faster software implementation, using complete lookup tables for 16bit integers. This one is commonly used in chess engines if hardware popcnt is not available, but seems like a rather large overhead for a general purpose interpreter.

methane

Code looks OK.
But it seems no consensus about adding this method or not.

Objects/longobject.c

Doc/library/stdtypes.rst

Objects/longobject.c

niklasf · 2019-06-02T09:38:36Z

Thanks for reviewing. All done, in case we do decide to add this.

mdickinson · 2020-05-25T14:46:35Z

@niklasf I'm interested in getting this updated and merged, for Python 3.10. May I merge master into this branch and make the appropriate fixes (changing the "whats new" entry from 3.8 to 3.10)? Or would you prefer to do that yourself?

Doc/library/stdtypes.rst

niklasf · 2020-05-25T18:08:37Z

@mdickinson Of course, feel free to modify as needed.

Doc/library/stdtypes.rst

Objects/longobject.c

mdickinson

LGTM. (Admittedly that's not entirely an independent judgement at this point, but I haven't changed any of the core code or tests.)

Doc/library/stdtypes.rst

Objects/longobject.c

vstinner

LGTM.

LGTM: the implementation is correct. I'm not strongly excited by having this method since I never needed it, but it seems like some other people need it. So well, it looks reasonable to add it. Also, in general, I completely trust @mdickinson :-)

vstinner · 2020-05-27T13:03:25Z

Objects/longobject.c

+
+    /* Each digit has up to PyLong_SHIFT ones, so the accumulated bit count
+       from the first PY_SSIZE_T_MAX/PyLong_SHIFT digits can't overflow a
+       Py_ssize_t. */


Nice comment, thanks! Before the comment, I was surprised by the two loops.

vstinner · 2020-05-27T13:04:18Z

Objects/longobject.c

+    /* Each digit has up to PyLong_SHIFT ones, so the accumulated bit count
+       from the first PY_SSIZE_T_MAX/PyLong_SHIFT digits can't overflow a
+       Py_ssize_t. */
+    Py_ssize_t ndigits_fast = Py_MIN(ndigits, PY_SSIZE_T_MAX/PyLong_SHIFT);


I proposed to work use size_t for bit_count, but then I realzied that i is a Py_ssize_t, so it's better to leave the code as it is: work on Py_ssize_t for i and bit_count.

mdickinson

The current implementation of popcount_digit isn't quite as portable as it should be.

The fix should be trivial. I'll look into it.

mdickinson · 2020-05-29T11:34:13Z

Objects/longobject.c

+    u -= (u >> 1) & 0x55555555;
+    u = (u & 0x33333333) + ((u >> 2) & 0x33333333);
+    u = (u + (u >> 4)) & 0x0f0f0f0f;
+    return (u * 0x01010101) >> 24;


I just realised that this line isn't quite portable. The code is relying on the multiplication being a multiplication of 32-bit integers, so that the result is implicitly reduced modulo 2^32.

However, if we're on a platform with a 64-bit int then 0x01010101 will have type int (C99 6.4.4.1p5), u will be promoted to int (following the "usual arithmetic conversions" in C99 6.3.1.8p1), the multiplication will be a multiplication of 64-bit integers and we'll end up with the wrong result.

Fix implemented, in the shape of an extra (uint32_t) cast on the multiplication result along with some U suffixes on the integer constants.

Note that the U suffix on 0x01010101U is actually necessary to avoid undefined behaviour: consider a (hypothetical, but permissible under the C standard) machine with a 48-bit int. If we do u * 0x01010101 on that machine, then first the integer promotions are applied to both arguments, and they both end up being of type int. Then we do an int-by-int multiplication. With the right (wrong?) value of d, this could overflow, giving undefined behaviour.

But with the U suffix, we end up after the integer promotions multiplying an int by an unsigned int, and then the usual arithmetic conversions kick in and the multiplication is performed as type unsigned int, which has well-defined wraparound behaviour for the result.

The other U suffixes aren't strictly necessary; they're just there to satisfy my OCD and make the code easier to think about.

Python/hamt.c has basically a copy of this function, it's just written a little bit differently, but its return type is uint32_t. It exists since Python 3.8 and we didn't get any bug report :-)

I added pycore_byteswap.h. Maybe we could have a pycore_bitutils.h for such function and declare it as static inline, to reuse it here and in hamt.c. Or it can be done later ;-)

@pitrou also asked if "pycore_byteswap.h" filename was not too specific, maybe we could pick a more generic name to put more functions there. Like "bits and bytes utilities" :-)

Note: _testinternalcapi has tests for the byte swap functions ;-) We could have tests on this popcount function as well to detect if a compiler "miscompiles" this function.

Oh, I'm volunteer to move this function as static inline function and add the new test ;-) As I wrote, it can be done later.

It exists since Python 3.8 and we didn't get any bug report :-)

That's probably because it was never run on a machine with a 64-bit C int. Which is not surprising, because it's very hard to find such a machine these days. But who knows what could happen in the future. :-)

The version of the code in hamt.c has the same issues: it'll return the wrong result with a 64-bit C int, and it can technically invoke undefined behaviour. The return type of uint32_t on that function doesn't help, because you already have the wrong number / undefined behaviour before you hit the return.

Realistically, the undefined behaviour is unlikely to ever be an issue in practice. But the wrong result with a 64-bit int isn't entirely implausible.

I'll merge this, then we can look at combining the implementations or fixing the hamt.c code as necessary.

bedevere-bot · 2020-05-29T11:34:55Z

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

mdickinson

Re-approving after fixing my own portability objection.

mdickinson · 2020-05-29T16:27:56Z

Merging. @niklasf: Thank you!

bedevere-bot · 2020-05-29T16:28:05Z

@mdickinson: Please replace # with GH- in the commit message next time. Thanks!

vstinner · 2020-05-29T16:28:56Z

OMG, I was merging the PR (I was editing the commit message) while @mdickinson merged it :-D Thanks @mdickinson ;-)

niklasf · 2020-05-29T16:29:35Z

Thanks for getting this over the finish line!

vstinner · 2020-05-29T16:32:15Z

PR created in 2017! It has an identifier lower than 1000 rather than the latest PR got the identifier 20517! Thanks @niklasf!

vstinner · 2020-05-29T17:23:18Z

I wrote PR #20518 which adds a _Py_popcount32() static inline function based on popcount_digit().

* 'master' of github.com:python/cpython: (497 commits) bpo-40061: Fix a possible refleak in _asynciomodule.c (pythonGH-19748) bpo-40798: Generate a different message for already removed elements (pythonGH-20483) closes bpo-29017: Update the bindings for Qt information with PySide2 (pythonGH-20149) bpo-39885: Make IDLE context menu cut and copy work again (pythonGH-18951) bpo-29882: Add an efficient popcount method for integers (python#771) Further de-linting of zoneinfo module (python#20499) bpo-40780: Fix failure of _Py_dg_dtoa to remove trailing zeros (pythonGH-20435) Indicate that abs() method accept argument that implement __abs__(), just like call() method in the docs (pythonGH-20509) bpo-39040: Fix parsing of email mime headers with whitespace between encoded-words. (pythongh-17620) bpo-40784: Fix sqlite3 deterministic test (pythonGH-20448) bpo-30064: Properly skip unstable loop.sock_connect() racing test (pythonGH-20494) Note the output ordering of combinatoric functions (pythonGH-19732) bpo-40474: Updated coverage.yml to better report coverage stats (python#19851) bpo-40806: Clarify that itertools.product immediately consumes its inpt (pythonGH-20492) bpo-1294959: Try to clarify the meaning of platlibdir (pythonGH-20332) bpo-37878: PyThreadState_DeleteCurrent() was not removed (pythonGH-20489) bpo-40777: Initialize PyDateTime_IsoCalendarDateType.tp_base at run-time (pythonGH-20493) bpo-40755: Add missing multiset operations to Counter() (pythonGH-20339) bpo-25920: Remove socket.getaddrinfo() lock on macOS (pythonGH-20177) bpo-40275: Fix test.support.threading_helper (pythonGH-20488) ...

HannesGitH · 2023-01-26T15:55:56Z

code

import random
import timeit

lens = range(1, 10000)
maxs = [2**i for i in lens]

# array with random numbers in [0, maxs[i]-1]
randos = [max + random.randrange(max) for max in maxs]

# time bin().count('1') vs bit_count() for each random number
bin_times = [timeit.timeit(lambda: bin(rando).count('1'), number=100) for rando in randos]
bit_count_times = [timeit.timeit(lambda: rando.bit_count(), number=100) for rando in randos]

# create plot
from matplotlib import pyplot as plt

plt.loglog(lens, bin_times, label='bin().count(\'1\')')
plt.loglog(lens, bit_count_times, label='bit_count()')
plt.legend()
plt.show()

mdickinson · 2023-01-26T16:59:58Z

@HannesGitH Why the comment? Is there some action that needs to be taken?

HannesGitH · 2023-01-26T18:34:16Z

@HannesGitH Why the comment? Is there some action that needs to be taken?

nope, all fine, sorry

just wanted to add the speedup graph for anyone who stumbles upon this and is curious

the-knights-who-say-ni added the CLA signed label Mar 22, 2017

Mariatta added needs rebase and removed needs rebase labels Oct 9, 2017

brettcannon added the awaiting review label Feb 21, 2018

methane reviewed Jun 4, 2018

View reviewed changes

Objects/longobject.c Show resolved Hide resolved

mdickinson reviewed Jun 1, 2019

View reviewed changes

Doc/library/stdtypes.rst Outdated Show resolved Hide resolved

mdickinson reviewed Jun 1, 2019

View reviewed changes

Doc/library/stdtypes.rst Outdated Show resolved Hide resolved

mdickinson reviewed Jun 1, 2019

View reviewed changes

Objects/longobject.c Outdated Show resolved Hide resolved

niklasf force-pushed the bpo-29882-popcount branch from 1c6e9fa to b6e46da Compare June 2, 2019 09:21

bpo-29882: Add an efficient popcount method for integers

74e3ea6

niklasf force-pushed the bpo-29882-popcount branch from b6e46da to 74e3ea6 Compare June 2, 2019 09:33

hexchain mentioned this pull request May 21, 2020

ENH: Expose bit_count ufunc equivalent to the scalar methods numpy/numpy#16325

Closed

mdickinson reviewed May 25, 2020

View reviewed changes

Doc/library/stdtypes.rst Outdated Show resolved Hide resolved

mdickinson reviewed May 25, 2020

View reviewed changes

Doc/library/stdtypes.rst Outdated Show resolved Hide resolved

mdickinson added 3 commits May 26, 2020 10:00

Merge remote-tracking branch 'upstream/master' into bpo-29882-popcount

16cc8e5

Update 'sign bit' and versionadded in docs

19da190

Add entry to whatsnew document

070ac45

mdickinson reviewed May 26, 2020

View reviewed changes

Doc/library/stdtypes.rst Show resolved Hide resolved

mdickinson reviewed May 26, 2020

View reviewed changes

Objects/longobject.c Outdated Show resolved Hide resolved

mdickinson approved these changes May 26, 2020

View reviewed changes

bedevere-bot added awaiting merge and removed awaiting review labels May 26, 2020

vstinner reviewed May 26, 2020

View reviewed changes

Doc: use positive example, mention population count

b87f162

vstinner approved these changes May 27, 2020

View reviewed changes

Merge remote-tracking branch 'upstream/master' into bpo-29882-popcount

46d8bc9

mdickinson suggested changes May 29, 2020

View reviewed changes

bedevere-bot removed the awaiting merge label May 29, 2020

bedevere-bot added the awaiting changes label May 29, 2020

Ensure popcount_digit is portable with 64-bit ints

3e8422b

mdickinson approved these changes May 29, 2020

View reviewed changes

bedevere-bot added awaiting merge and removed awaiting changes labels May 29, 2020

mdickinson merged commit 8bd216d into python:master May 29, 2020

bedevere-bot removed the awaiting merge label May 29, 2020

vstinner mentioned this pull request May 29, 2020

bpo-29882: Add _Py_popcount32() function #20518

Merged

This was referenced Jan 22, 2022

bpo-29882: _Py_popcount32() doesn't need 64x64 multiply #30774

Merged

bpo-29882: Fix portability bug introduced in GH-30774 #30794

Merged

vihan29 mentioned this pull request Oct 1, 2022

Gaussian Elimination brc7/caramel#4

Merged

niklasf mannequin mentioned this pull request Nov 13, 2022

Add an efficient popcount method for integers #74068

Closed

Uh oh!

bpo-29882: Add an efficient popcount method for integers #771

bpo-29882: Add an efficient popcount method for integers #771

Uh oh!

Conversation

niklasf commented Mar 22, 2017 • edited by bedevere-bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mention-bot commented Mar 22, 2017

Uh oh!

ilevkivskyi commented Mar 22, 2017

Uh oh!

niklasf commented Mar 22, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

methane left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

niklasf commented Jun 2, 2019

Uh oh!

mdickinson commented May 25, 2020

Uh oh!

Uh oh!

Uh oh!

niklasf commented May 25, 2020

Uh oh!

Uh oh!

Uh oh!

mdickinson left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

vstinner left a comment

Choose a reason for hiding this comment

Uh oh!

vstinner May 27, 2020

Choose a reason for hiding this comment

Uh oh!

vstinner May 27, 2020

Choose a reason for hiding this comment

Uh oh!

mdickinson left a comment

Choose a reason for hiding this comment

Uh oh!

mdickinson May 29, 2020

Choose a reason for hiding this comment

Uh oh!

mdickinson May 29, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

vstinner May 29, 2020

Choose a reason for hiding this comment

Uh oh!

vstinner May 29, 2020

Choose a reason for hiding this comment

Uh oh!

mdickinson May 29, 2020

Choose a reason for hiding this comment

Uh oh!

mdickinson May 29, 2020

Choose a reason for hiding this comment

Uh oh!

bedevere-bot commented May 29, 2020

Uh oh!

mdickinson left a comment

Choose a reason for hiding this comment

Uh oh!

mdickinson commented May 29, 2020

Uh oh!

bedevere-bot commented May 29, 2020

niklasf commented Mar 22, 2017 •

edited by bedevere-bot

Loading

niklasf commented Mar 22, 2017 •

edited

Loading

mdickinson May 29, 2020 •

edited

Loading

HannesGitH commented Jan 26, 2023 •

edited

Loading

HannesGitH commented Jan 26, 2023 •

edited

Loading