math/bits: an integer bit twiddling library #18616
Comments
@brtzsnr perhaps you should submit this document to the proposal repo as outlined in the proposal process steps? Since it's already markdown following the template, it should be easy to copy-paste into a CL creating a file design/18616-bit-twiddling.md (or whatever). |
@cespare from https://github.com/golang/proposal "if the author wants to write a design doc, then they can write one". It started as a design doc, if there is strong feeling that I should submit this, I'm totally fine. |
I'd be ok with this, it's common enough functionality used in many algorithmic libraries and math/bits seems like an appropriate place. (For one, math/big also implements nlz (== clz).) There's probably some bike shedding about the names. I for one would prefer the functions to say what they return rather than what they do; which in turn may lead to shorter names. For instance:
and so forth. |
The proposal seems pretty clear and minimal - a design doc seems overkill. I think a CL would be more appropriate at this point. (That is a CL with API and basic implementation - for purpose of discussion in place of a design doc. We still need to decide if this proposal should be accepted or not.) |
@brtzsnr has already written the design document: it's in the issue description and it follows the template. I assumed that there was some value in having these documents all in one location. |
The last arch listed in the hardware support table is "BSWAP" -- typo? |
Thanks for writing this up. The doc string for ctz and clz should specify the result when passed 0. I also prefer (e.g.) TrailingZeros32 to CountTrailingZeros32. I'd also be happy with Ctz32. It is concise, familiar to most, and easily googleable for the rest. |
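For reference, the math/bits package that eventually shipped defines the zero case as returning the operand width. A minimal sketch, assuming a Go 1.9 or later toolchain; the helper name zeroCase is made up for illustration:

```go
package main

import (
	"fmt"
	"math/bits"
)

// zeroCase (hypothetical helper) returns the trailing- and leading-zero
// counts of a 32-bit value as reported by math/bits.
func zeroCase(x uint32) (tz, lz int) {
	return bits.TrailingZeros32(x), bits.LeadingZeros32(x)
}

func main() {
	tz, lz := zeroCase(0)
	fmt.Println(tz, lz) // for 0, both counts equal the operand width (32)

	tz, lz = zeroCase(8) // binary 1000
	fmt.Println(tz, lz)
}
```

Returning the width for 0 keeps the functions total, so callers do not need a separate zero check before calling them.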
Thanks for the proposal. |
How about we provide a package for all bit twiddling primitives defined by
the Hacker's Delight?
When designing the package, we don't need to consider whether the function
can be intrinsicified or not. The optimization can happen later. That is,
don't let low level implementation control the upper level package
interface. We want a good package interface, even if some of them can not
be mapped to single instruction.
|
@minux, fortunately, every bit twiddling function I've needed so far is exactly the ones that are in this proposal. |
Following Hacker's Delight has the advantage that we don't need to waste time arguing about the names. |
I'd like to add the following:
ReverseBits (for uint32 and uint64)
RotateLeft/Right (can be expanded inline with two shifts, but the compiler
can't always do the transformation due to shift range issues)
Maybe also a two-result form of add and subtract? E.g.
func AddUint32(x, y, carryin uint32) (carryout, sum uint32) // carryin must be 0 or 1.
And similarly for uint64.
And SqrtInt.
Many of my suggestions can't be implemented in a single instruction, but
that is the point: we want a good bit twiddling package, not a mere
intrinsics package. Don't let the hardware limit high level package
interface.
|
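The two-result add suggested above could look roughly like the following. This is only a sketch under the signature from the comment: AddUint32 is not an existing standard library function, and the body is one possible implementation that widens to 64 bits.

```go
package main

import "fmt"

// AddUint32 follows the signature suggested in the comment above. It is a
// hypothetical helper, not a standard library function.
// carryin must be 0 or 1.
func AddUint32(x, y, carryin uint32) (carryout, sum uint32) {
	// Widen to 64 bits so the carry lands in bit 32.
	wide := uint64(x) + uint64(y) + uint64(carryin)
	return uint32(wide >> 32), uint32(wide)
}

func main() {
	carry, sum := AddUint32(0xFFFFFFFF, 1, 0)
	fmt.Println(carry, sum) // the sum wraps to 0 and the carry is set
}
```

Later Go releases (1.12 onward) added bits.Add32 and bits.Add64 along these lines, with the results ordered as (sum, carryOut).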
Relatedly, functions to check whether an add/multiply will overflow. |
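A rough sketch of what such overflow checks might look like for unsigned 64-bit values. The names addOverflows and mulOverflows are invented for illustration, and the multiply check assumes bits.Mul64, which only exists in Go 1.12 and later:

```go
package main

import (
	"fmt"
	"math/bits"
)

// addOverflows (hypothetical) reports whether x+y wraps around in uint64
// arithmetic: a wrapped sum is always smaller than either operand.
func addOverflows(x, y uint64) bool {
	return x+y < x
}

// mulOverflows (hypothetical) reports whether x*y does not fit in 64 bits,
// by checking the high word of the full 128-bit product.
func mulOverflows(x, y uint64) bool {
	hi, _ := bits.Mul64(x, y)
	return hi != 0
}

func main() {
	fmt.Println(addOverflows(^uint64(0), 1))  // wraps
	fmt.Println(mulOverflows(1<<32, 1<<32))   // 2^64 does not fit
	fmt.Println(mulOverflows(1<<31, 1<<32))   // 2^63 fits
}
```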
A related data point: there are various issues that have been filed for faster cgo. In one such example (proposal #16051), the fact that fast implementation of bsr/ctz/etc. might happen was mentioned as hopefully chipping away at the set of use cases where people writing go are tempted to use cgo.
Many people (myself included) are attracted to go because of the performance, so things like this current proposal for bit twiddling would help. (And yes, cgo is also now faster in 1.8, which is nice as well). |
@minux Can you elaborate what is the problem? |
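@brtzsnr : I think what Minux is referring to is that when you write (x << k) | (x >> (64-k)), you know you're using 0 <= k < 64, but the compiler can't read your mind, and it is not obviously derivable from the code. If we had the function
func leftRot(x uint64, k uint) uint64 {
k &= 63
return (x << k) | (x >> (64-k))
}
Then we can make sure (via the &63) that the compiler knows that the range of k is bounded. |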
On Fri, Jan 13, 2017 at 10:37 PM, Keith Randall ***@***.***> wrote:
@brtzsnr <https://github.com/brtzsnr> : I think what Minux is referring
to is that when you write (x << k) | (x >> (64-k)), you know you're using 0
<= k < 64, but the compiler can't read your mind, and it is not obviously
derivable from the code. If we had the function
func leftRot(x uint64, k uint) uint64 {
k &= 63
return (x << k) | (x >> (64-k))
}
Then we can make sure (via the &63) that the compiler knows that the range
of k is bounded.
So if the compiler can't prove the input is bounded, then we need an extra
AND. That's better than not generating the rotate assembly at all.
Right. If we define RotateLeft and RotateRight functions, we can formally
define the function rotate left/right k bits (no matter what k is). This is
similar to how our shift operations are defined. And this definition also
maps to actual rotate instruction nicely (unlike shifts, where our more
intuitive definition requires a compare on certain architectures).
|
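To illustrate the "defined for any k" point, here is a small sketch comparing a hand-written masked rotate with bits.RotateLeft64 from the package as it eventually shipped. The helper name rotl64 is made up; the comparison assumes a Go 1.9 or later toolchain.

```go
package main

import (
	"fmt"
	"math/bits"
)

// rotl64 is a hypothetical hand-written rotate. Masking k with 63 makes the
// shift range explicit, so the result is defined for every k. When k is 0,
// x>>(64-k) is a shift by the full width, which Go defines as 0.
func rotl64(x uint64, k uint) uint64 {
	k &= 63
	return x<<k | x>>(64-k)
}

func main() {
	x := uint64(0x0123456789ABCDEF)
	for _, k := range []uint{0, 1, 63, 64, 70} {
		// Each comparison should hold, including counts >= 64.
		fmt.Println(k, rotl64(x, k) == bits.RotateLeft64(x, int(k)))
	}
}
```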
How about byte and bit shuffling (and unshuffling) functions that are used by the blosc compression library? The slides (the shuffling starts from the slide 17). These functions can be SSE2/AVX2 accelerated. |
On Fri, Jan 13, 2017 at 11:24 PM, opennota ***@***.***> wrote:
How about byte and bit shuffling functions that are used by the blosc
<https://github.com/Blosc/c-blosc> compression library? The slides
<http://www.slideshare.net/PyData/blosc-py-data-2014> (the shuffling
starts from the slide 17). These functions can be SSE2/AVX2 accelerated.
SIMD is a bigger problem and it is out of the scope of this package. It's
#17373.
|
The current proposed functions have a Go native implementation much larger and disproportionally more expensive than optimum. On the other hand rotate is easy to write inline in a way that the compiler can recognize. @minux and also everyone else: Do you know where rotate left/right is used with a non-constant number of rotated bits? crypto/sha256 uses rotate for example, but with constant number of bits. |
It is easy for those who are familiar with the compiler's internals. Putting it in a math/bits package makes it easy for everyone.
Here's an example from #9337: https://play.golang.org/p/rmDG7MR5F9 In each invocation it is a constant number of rotated bits each time, but the function itself is not currently inlined, so it compiles without any rotate instructions. A math/bits library function would definitely help here. |
On Sat, Jan 14, 2017 at 5:05 AM, Alexandru Moșoi ***@***.***> wrote:
The current proposed functions have a Go native implementation much larger
and disproportionally more expensive than optimum. On the other hand rotate
is easy to write inline in a way that the compiler can recognize.
As I stressed many times in this issue, this is not the correct way to
design a Go package. It ties too much to the underlying hardware. What we
want is a good bit twiddling package that is generally useful. Whether
functions can be expanded into a single instruction is irrelevant as long
as the API interface is well-known and generally useful.
@minux <https://github.com/minux> and also everyone else: Do you know
where rotate left/right is used with a non-constant number of rotated bits?
crypto/sha256 uses rotate for example, but with constant number of bits.
Even if in the actual problem the number of bits of rotate is constant, the
compiler might not be able to see that. E.g. when the shift count is stored
in an array, or hidden in a loop counter, or even supplied by the caller of a
function that is not inlined.
One simple example of using variable number of rotate is an interesting
implementation of popcount:
// https://play.golang.org/p/ctNRXsBt0z
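```go
// RotateRight rotates x right by k bits (k is taken modulo 32).
func RotateRight(x, k uint32) uint32 {
	k &= 31
	return x>>k | x<<(32-k)
}

// Popcount sums all 32 rotations of x. Each set bit visits every position
// once, so the sum is popcount(x) * (2^32 - 1), i.e. -popcount(x) mod 2^32.
func Popcount(x uint32) int {
	var v uint32
	for i := uint32(0); i < 32; i++ {
		v += RotateRight(x, i)
	}
	return int(-int32(v))
}
```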
|
@josharian The example looks like a bad inliner decision if rot is not inlined. Did you try to write the function as
@minux: I agree with you. I'm not trying to tie the API to a particular instruction set; the hardware support is a nice bonus. My main focus is to find usages in real code (not toy code) to understand the context, what the best signature is, and how important it is to provide everyone's favorite function. The compatibility promise will bite us later if we don't do this properly now. For example: What should be the signature of add with carry return? |
Yes. It is part of a bug complaining about inlining. :)
It's not my code--and that's part of the point. And anyway, it is reasonable code. |
It would be nice to guarantee (in the package documentation?) that the rotate and byte-swap functions are constant-time operations so that they can be safely used in crypto algorithms. Possibly something to think about for other functions too. |
On Thu, Jan 19, 2017 at 11:50 AM, Michael Munday ***@***.***> wrote:
It would be nice to guarantee (in the package documentation?) that the
rotate and byte-swap functions are constant-time operations so that they
can be safely used in crypto algorithms. Possibly something to think about
for other functions too.
The trivial implementation of byte swap is constant time, but if the underlying
architecture doesn't provide variable shift instructions, it will be hard
to guarantee constant time rotate implementation. Perhaps Go will never run
on those architectures though.
That said, there is also non-negligible chance that the underlying
microarchitecture uses a multi-cycle shifter, and we can't guarantee
constant time rotate on those implementations.
If strict constant time is required, perhaps the only way is to write assembly
(and even in that case, it makes the strong assumption that all the
instructions used are themselves constant time, which implicitly depends on the
microarchitecture.)
While I understand the need for such guarantees, it's actually beyond
our control.
|
I'm inclined to agree with @minux. If you want constant-time crypto primitives, they should live in crypto/subtle. crypto/subtle can easily redirect to math/bits on platforms where those implementations have been verified. They can do something else if a slower but constant-time implementation is required. |
@cznic bits are written right-to-left though |
I'm also in favor of just As a counter-argument to @aclements concern about the direction: Providing a Also, the use of |
Bits are just binary digits. Digits in number in any base are written left to right - even in most, if not all, right-to-left writing systems. 123 is one-hundred-and-twenty-three, not three-hundred-and-twenty-one. That the power of the multiplicand for the digit decreases to the right is a different thing. Once again: I don't care about the direction, it's just that the intuitive one is a matter of personal imagination. |
I like Rotate. Least significant bit is intuitively 0 enough in my view.
|
Please keep both RotateLeft and RotateRight instead of doing something half of developers will misremember. It seems fine to handle negative numbers though. |
99% of developers will never program a rotate instruction, so the need for unambiguous direction is weak at best. A single call is enough. The problem that reawakened this discussion is that having both requires arguing about whether negative values are OK, and if not, what to do about them. By having only one, that whole argument falls away. It's cleaner design. |
I somewhat sympathize with the argument about clean design but it seems weird that you have to remove "Right" from "RotateRight", while keeping the same implementation, in order to achieve it. In practical terms the only way it seems to answer questions is by forcing people who see it to read the documentation, by way of the questions the name raises. What I'm saying is that Rotate raises one question for all people, answering it indirectly through documentation. On the other hand Rotate would probably prevent people from writing |
@golang/proposal-review discussed this and ended up at having just one function, but naming it |
For details see the discussion on the issue below.
RotateLeft functions can now be inlined because they don't panic anymore for negative rotation counts.
name            old time/op  new time/op  delta
RotateLeft-8    6.72ns ± 2%  1.86ns ± 0%  -72.33%  (p=0.016 n=5+4)
RotateLeft8-8   4.41ns ± 2%  1.67ns ± 1%  -62.15%  (p=0.008 n=5+5)
RotateLeft16-8  4.46ns ± 6%  1.65ns ± 0%  -63.06%  (p=0.008 n=5+5)
RotateLeft32-8  4.50ns ± 5%  1.67ns ± 1%  -62.86%  (p=0.008 n=5+5)
RotateLeft64-8  4.54ns ± 1%  1.85ns ± 1%  -59.32%  (p=0.008 n=5+5)
https://perf.golang.org/search?q=upload:20170411.4
(Measured on 2.3 GHz Intel Core i7 running macOS 10.12.3.)
For #18616.
Change-Id: I0828d80d54ec24f8d44954a57b3d6aeedb69c686
Reviewed-on: https://go-review.googlesource.com/40394
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
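With negative counts accepted, a right rotation can be written through the single RotateLeft entry point by negating the count. A minimal sketch, assuming math/bits as shipped in Go 1.9; the wrapper name rotateRight8 is hypothetical:

```go
package main

import (
	"fmt"
	"math/bits"
)

// rotateRight8 (hypothetical wrapper) expresses a right rotation by k bits
// as a left rotation by -k bits.
func rotateRight8(x uint8, k int) uint8 {
	return bits.RotateLeft8(x, -k)
}

func main() {
	x := uint8(0x13) // binary 00010011
	fmt.Printf("%08b\n", bits.RotateLeft8(x, 2))
	fmt.Printf("%08b\n", rotateRight8(x, 2))
}
```

Because the count is reduced modulo the width, no range check (and hence no panic path) is needed, which is what lets the functions be inlined.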
Popcount instructions on amd64 are not guaranteed to be present, so we must guard their call. Rewrite rules can't generate control flow at the moment, so the intrinsifier needs to generate that code.
name            old time/op  new time/op  delta
OnesCount-8     2.47ns ± 5%  1.04ns ± 2%  -57.70%  (p=0.000 n=10+10)
OnesCount16-8   1.05ns ± 1%  0.78ns ± 0%  -25.56%  (p=0.000 n=9+8)
OnesCount32-8   1.63ns ± 5%  1.04ns ± 2%  -35.96%  (p=0.000 n=10+10)
OnesCount64-8   2.45ns ± 0%  1.04ns ± 1%  -57.55%  (p=0.000 n=6+10)
Update golang#18616
Change-Id: I4aff2cc9aa93787898d7b22055fe272a7cf95673
Reviewed-on: https://go-review.googlesource.com/38320
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
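When the hardware instruction is absent, a guarded call has to fall back to a software population count. The following is a sketch of one well-known portable fallback (the SWAR reduction), not the code from this CL; the name onesCount64 is made up, and the comparison against bits.OnesCount64 assumes Go 1.9 or later.

```go
package main

import (
	"fmt"
	"math/bits"
)

// onesCount64 is a hypothetical portable population count using the
// classic SWAR reduction: count bits in pairs, then nibbles, then bytes,
// and finally sum the byte counts with a multiply.
func onesCount64(x uint64) int {
	x -= (x >> 1) & 0x5555555555555555
	x = (x & 0x3333333333333333) + ((x >> 2) & 0x3333333333333333)
	x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0F
	return int((x * 0x0101010101010101) >> 56)
}

func main() {
	for _, x := range []uint64{0, 1, 0xF0F0, ^uint64(0)} {
		// The portable version should agree with the library function.
		fmt.Println(onesCount64(x), bits.OnesCount64(x))
	}
}
```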
CL https://golang.org/cl/40394 mentions this issue. |
CL https://golang.org/cl/41630 mentions this issue. |
The original proposal plus a few extra functions have been designed and implemented at this point. We may add to this library over time, but it seems reasonably "complete" for now. Most notably, we have not decided upon or implemented functionality to:
Personally I'm not convinced those belong in a "bits" package (maybe the tests do). Functions to implement multi-precision add/sub/mul would allow a pure Go implementation of some of the math/big kernels, but I don't believe the granularity is right: What we want there is optimized kernels working on vectors, and maximum performance for those kernels. I don't believe we can achieve that with Go code depending on add/sub/mul "intrinsics" alone. Thus, for now I would like to close this issue as "done" unless there are major objections. Please speak up over the next week or so if you are against closing this. |
I'm in favor of adding functions along those lines. I strongly believe that they belong in their own package, if for no other reason than to give it a name that better reflects their collective functionality. 👍 on closing this issue and ❤️ for the work done so far. |
Closing since there were no objections. |
This adds math/bits intrinsics for OnesCount, Len, TrailingZeros on ppc64x.
benchmark                    old ns/op  new ns/op  delta
BenchmarkLeadingZeros-16     4.26       1.71       -59.86%
BenchmarkLeadingZeros16-16   3.04       1.83       -39.80%
BenchmarkLeadingZeros32-16   3.31       1.82       -45.02%
BenchmarkLeadingZeros64-16   3.69       1.71       -53.66%
BenchmarkTrailingZeros-16    2.55       1.62       -36.47%
BenchmarkTrailingZeros32-16  2.55       1.77       -30.59%
BenchmarkTrailingZeros64-16  2.78       1.62       -41.73%
BenchmarkOnesCount-16        3.19       0.93       -70.85%
BenchmarkOnesCount32-16      2.55       1.18       -53.73%
BenchmarkOnesCount64-16      3.22       0.93       -71.12%
Update #18616
I also made a change to bits_test.go because when debugging some failures the output was not quite providing the right argument information.
Change-Id: Ia58d31d1777cf4582a4505f85b11a1202ca07d3e
Reviewed-on: https://go-review.googlesource.com/41630
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com>
Reviewed-by: Keith Randall <khr@golang.org>
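Len and LeadingZeros are two views of the same quantity, which is why an intrinsic for one tends to speed up the other. A small sketch of that relationship, assuming math/bits from Go 1.9 or later; the helper name lenViaLeadingZeros is invented for illustration:

```go
package main

import (
	"fmt"
	"math/bits"
)

// lenViaLeadingZeros (hypothetical) computes the minimum number of bits
// needed to represent x as the width minus the leading-zero count.
func lenViaLeadingZeros(x uint64) int {
	return 64 - bits.LeadingZeros64(x)
}

func main() {
	for _, x := range []uint64{0, 1, 255, 1 << 63} {
		// The two columns should match for every input.
		fmt.Println(lenViaLeadingZeros(x), bits.Len64(x))
	}
}
```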
This is a comment for future decisions regarding API, I understand this particular one is set.
But instead, cleverness ensues. What "a specialist function" means is such a simple thing, it was likely dismissed too quickly. Given a code sample, one likely already understands the rotate to occur and the direction even before encountering the line of code. Such code is usually already preceded by illustrative ASCII documentation as it is. What's mentally turbulent is not that Go could have simply chosen RTL as a standard way of interpreting bits from an API perspective, but rather that I first pulled up the changes of 1.9 and found a RotateLeft with no counterpart and the doc giving an example of a negative stride. This is a mind-numbing committee-like decision that is very unfortunate to be landing in 1.9. I only plead to stick to context of usage for the future. All of this should have been self-evident with questions like, "why are we not providing a counterpart to RotateLeft, why are we panicking on negative strides or debating int vs uint for a stride"; ultimately, because I think what "a specialist function" means was simply dismissed too easily for not being clever. Let us please avoid cleverness in our justification of APIs. It shows in this 1.9 update. |
Change https://golang.org/cl/90835 mentions this issue: |
This adds math/bits intrinsics for OnesCount on arm64.
name         old time/op  new time/op  delta
OnesCount    3.81ns ± 0%  1.60ns ± 0%  -57.96%  (p=0.000 n=7+8)
OnesCount8   1.60ns ± 0%  1.60ns ± 0%     ~     (all equal)
OnesCount16  2.41ns ± 0%  1.60ns ± 0%  -33.61%  (p=0.000 n=8+8)
OnesCount32  4.17ns ± 0%  1.60ns ± 0%  -61.58%  (p=0.000 n=8+8)
OnesCount64  3.80ns ± 0%  1.60ns ± 0%  -57.84%  (p=0.000 n=8+8)
Update #18616
Conflicts: src/cmd/compile/internal/gc/asm_test.go
Change-Id: I63ac2f63acafdb1f60656ab8a56be0b326eec5cb
Reviewed-on: https://go-review.googlesource.com/90835
Run-TryBot: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Previous discussions at #17373 and #10757.
Abstract
This proposal introduces a set of APIs for integer bit twiddling.
Background
For this proposal we are interested in the following functions:
These functions were picked by surveying:
We limited ourselves to these four functions because other twiddling
tricks are very simple to implement using the proposed library,
or already available Go constructs.
We found implementations for a subset of the selected twiddling functions
in many packages including runtime, compiler and tools:
Many other packages implement a subset of these functions:
Similarly hardware providers have recognized the importance
of such functions and included machine level support.
Without hardware support these operations are very expensive.
All bit twiddling functions, except popcnt, are already implemented by runtime/internal/sys and receive special support from the compiler in order "to help get the very best performance". However, the compiler support is limited to the runtime package, and other Go users have to reimplement the slower variants of these functions.
Proposal
We introduce a new std library, math/bits, with the following external API, to provide compiler / hardware optimized implementations of clz, ctz, popcnt and bswap functions.
Rationale
Alternatives to this proposal are:
Compatibility
This proposal does not change or break any existing stdlib API, and it conforms to the compatibility guidelines.
Implementation
SwapBytes, TrailingZeros and LeadingZeros are already implemented. The only missing function is Ones which can be implemented similarly to the other functions. If this proposal is accepted it can be implemented in time for Go1.9.
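As a usage sketch, the four operations from this proposal can be exercised together as below. Note the assumption: this uses the names under which the package eventually shipped (OnesCount and ReverseBytes, rather than the Ones and SwapBytes names used in this text), and the summary type and summarize helper are invented for illustration.

```go
package main

import (
	"fmt"
	"math/bits"
)

// summary (hypothetical type) bundles the four proposed operations for one
// 32-bit value.
type summary struct {
	clz, ctz, popcnt int
	bswap            uint32
}

// summarize (hypothetical helper) applies clz, ctz, popcnt and bswap to x
// using the math/bits functions.
func summarize(x uint32) summary {
	return summary{
		clz:    bits.LeadingZeros32(x),
		ctz:    bits.TrailingZeros32(x),
		popcnt: bits.OnesCount32(x),
		bswap:  bits.ReverseBytes32(x),
	}
}

func main() {
	// 0x00F00000 has bits 20 through 23 set.
	fmt.Printf("%+v\n", summarize(0x00F00000))
}
```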
Open issues (if applicable)
Names are hard, bike shed is in the comments.
Please suggest additional functions to be included in the comments. Ideally, please include where such function is used in stdlib (e.g. math/big), tools or popular packages.
So far the following functions have been proposed and are under consideration:
History
14.Jan: Clarified the output of TrailingZeros and LeadingZeros when the argument is 0.
14.Jan: Renamed methods: CountTrailingZeros -> TrailingZeros, CountLeadingZeros -> LeadingZeros, CountOnes -> Ones.
13.Jan: Fixed architecture name.
11.Jan: Initial proposal opened to public.