faster rand!(::MersenneTwister, ::Array{Bool}) #33721
Seems like you would have to specifically reinterpret a pointer to the
julia> bar(b) = @inbounds reinterpret(UInt8, b[1])
bar (generic function with 1 method)
julia> unsafe_load(reinterpret(Ptr{UInt8}, pointer(b)))
0xff
julia> bar(b)
0x01
At any
julia> @code_llvm bar(b)
; @ REPL[110]:1 within `bar'
; Function Attrs: uwtable
define i8 @julia_bar_19418(%jl_value_t addrspace(10)* nonnull align 16 dereferenceable(40)) #0 {
top:
; ┌ @ array.jl:758 within `getindex'
%1 = addrspacecast %jl_value_t addrspace(10)* %0 to %jl_value_t addrspace(11)*
%2 = bitcast %jl_value_t addrspace(11)* %1 to i8 addrspace(13)* addrspace(11)*
%3 = load i8 addrspace(13)*, i8 addrspace(13)* addrspace(11)* %2, align 8
%4 = load i8, i8 addrspace(13)* %3, align 1
%5 = and i8 %4, 1
; └
ret i8 %5
} |
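The distinction drawn in the session above, between the raw byte stored in memory and the value a `Bool` load produces, can be sketched as follows. This is only an illustration: `raw_byte_demo` is a hypothetical name, and the example masks the byte back to a valid representation before indexing the array, so it does not rely on how any particular Julia version loads a non-canonical `Bool`.

```julia
# Hypothetical demo function; not part of the PR.
function raw_byte_demo()
    b = [false]
    # Keep `b` rooted while we work with a raw pointer into its storage.
    raw = GC.@preserve b begin
        p = Ptr{UInt8}(pointer(b))   # view the Bool storage as raw bytes
        unsafe_store!(p, 0xff)       # deliberately write a non-canonical byte
        r = unsafe_load(p)           # the raw byte is observable through the pointer
        unsafe_store!(p, r & 0x01)   # mask back to a valid Bool representation
        r
    end
    return raw, b[1]
end

raw, value = raw_byte_demo()
```

Here `raw` is `0xff` (what the pointer sees before masking) and `value` is `true` (what indexing sees once the byte is `0x01`).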
I'm not sure I feel comfortable with the fact that this implementation would be observable. It's easy to expect that at most 1 bit will be set per byte. On Slack, @jakobnissen was giving the example of loading 8 |
Ok, so I updated with a specific algorithm for |
@rfourquet Looks good to me, except I think you forgot to protect the array from reallocation/garbage collection with |
ceb2cd2 to 13a7c0a
Writing garbage is technically supposed to be OK: the array may not have been zero-initialized, and we need to handle that case gracefully. But it's usually best to stick with valid bit representations (especially when it’s faster!). |
Ah right, I didn't think about uninitialized arrays. What makes the algorithm faster now is that it's specialized; masking to avoid writing garbage still costs maybe 5% in performance, which seems like a fine compromise to me. |
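A minimal sketch of the masking idea mentioned here, assuming one random byte is drawn per element and only the low bit is kept so that every stored byte is `0x00` or `0x01`. The helper name `rand_bool_masked!` is hypothetical, and this is not the PR's implementation: it allocates a temporary buffer and makes no attempt at the speedups discussed in this thread.

```julia
using Random

# Hypothetical helper for illustration only; not the algorithm from the PR.
function rand_bool_masked!(rng::AbstractRNG, b::Array{Bool})
    bytes = rand(rng, UInt8, size(b))   # one random byte per element
    raw = reinterpret(UInt8, b)         # byte view of b's storage, same shape as b
    raw .= bytes .& 0x01                # keep only the low bit of each byte
    return b
end

b = rand_bool_masked!(MersenneTwister(0), Vector{Bool}(undef, 1000))
```

Because the mask is applied before anything is written into `b`, the array never holds a byte other than `0x00` or `0x01`, even when it started out uninitialized.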
For example, We could of course make that more robust, but user code might make the same assumptions. |
Bump |
13a7c0a to b3513f5
I added some test code for this change in another PR, as said test code revealed a bug in |
This uses the same optimizations as for other bits types, and gives performance equivalent to that for `UInt8` (at least a 7x to 9x speedup in the few cases tested).
b3513f5 to 751d0b7
@nanosoldier |
Your package evaluation job has completed - possible new issues were detected. A full report can be found here. cc @maleadt |
The three PkgEval failures seem unrelated. |
This uses the same optimizations as for other bits types, and gives performance equivalent to that for `UInt8` (at least a 7x to 9x speedup in the few cases tested).
This seems to work, but I'm not sure whether it's valid to write arbitrary bit patterns at the memory pointed to by `pointer(b::Array{Bool})`.