push!/pop! always does a ccall #24909
I think the broader solution here would be to reimplement
That would definitely be very cool. The most interesting thing would be using the 32 bytes remaining in Vector to store very small vectors inline (so that the data and the jl_array_t are in the same cache line). The second most interesting thing would be to make non-resizeable arrays reliably be one cache line before their data, for the net effect that type-inferred array access can skip one level of indirection (this placement is already relatively reliable, but afaik neither used for prefetching nor for removal of indirection).
@quinnj Would we really need something like:
Or would it be too hard to make that performant?
There's quite a bit more involved here; the Julia GC has no awareness of your
Yes, the simple answer would be "because malloc is so very slow".
Upon reconsideration, push! really is bad in practice, too. Consider:
And then
Introducing, the PushVector:

mutable struct PushVector{T, A<:AbstractVector{T}} <: AbstractVector{T}
    v::A
    l::Int
end
PushVector{T}() where {T} = PushVector(Vector{T}(undef, 4), 0)
Base.length(v::PushVector) = v.l
Base.size(v::PushVector) = (v.l,)
@inline function Base.getindex(v::PushVector, i)
    @boundscheck checkbounds(v, i)
    @inbounds v.v[i]
end
function Base.push!(v::PushVector, i)
    v.l += 1
    if v.l > length(v.v)
        resize!(v.v, v.l * 2)
    end
    v.v[v.l] = i
    return v
end
finish!(v::PushVector) = resize!(v.v, v.l)
using BenchmarkTools
function pushit(v)
    for i in 1:10^4
        push!(v, i)
    end
end

@btime begin
    p = PushVector{Int64}()
    pushit(p)
    finish!(p)
end
# 24.601 μs (13 allocations: 192.58 KiB)
@btime begin
    p = Vector{Int64}()
    pushit(p)
end
# 71.176 μs (14 allocations: 256.70 KiB)

Mostly a joke
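For reference, a minimal usage sketch of the PushVector above (assuming the definitions from the previous comment; the values are made up). finish! trims the backing vector to the number of pushed elements and returns it:

pv = PushVector{Float64}()
for x in (1.0, 2.0, 3.0)
    push!(pv, x)
end
v = finish!(pv)   # the underlying Vector{Float64}, resized to length 3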
This is a user question, but here I immediately reach the people who can answer it: in LinearMaps.jl I use the following. Is this the way to go:

if Sys.WORD_SIZE == 64
    _flags(A::Array) = unsafe_load(convert(Ptr{UInt16}, pointer_from_objref(A)), 9)
elseif Sys.WORD_SIZE == 32
    _flags(A::Array) = unsafe_load(convert(Ptr{UInt16}, pointer_from_objref(A)), 5)
else
    error("Unknown word size")
end

_isshared(A::Array) = !(_flags(A) & 0x4000 == 0x0000)

I don't have a 32-bit machine, so I don't know if the position is correct there.
Mostly yes. But I think the proper way would be ... But then, there is the compile-time option ...
I don't know, sorry. But is the exception route so bad in your case?
No, don't do that. None of the code above is well defined.
The fun of hacking around :-). Any other suggestions for accessing the ...
FWIW, I packaged the workaround of @KristofferC (with his kind permission) as https://github.com/tpapp/PushVectors.jl because I found I was copy-pasting it to various pieces of code. I intend to obsolete/retire it when this issue is fixed, but in the meantime it may be useful.
Probably closable when #51319 lands
See <https://hackmd.io/@vtjnash/rkzazi7an> for the Julep describing this change. Also makes memory allocation ccalls safer, for catching Serialization and deepcopy bugs in packages. Fixes #24909
Co-authored-by: Jameson Nash <vtjnash@gmail.com>
When looking over my code_native output, I recently saw that every push! costs me a call into the runtime library. This appears... excessive, at least for the standard case where the push can be accommodated without copying.
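As an illustration (not the original session), one way to see that call on a pre-1.11 Julia is to wrap push! in a small helper and inspect the native code; f here is just a hypothetical example:

using InteractiveUtils

# Look for a call into the runtime library (e.g. jl_array_grow_end) in the output.
f(v, x) = (push!(v, x); v)
code_native(f, Tuple{Vector{Int}, Int})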
So, let's compare push! and the fastpush!
First, let's look at what a vector actually is:
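The definition itself is not reproduced above, so as a hedged sketch: on a 64-bit build of pre-1.11 Julia, the array header starts with the data pointer, then the length, a 16-bit flags word, the element size, an offset, nrows, and maxsize. The byte offsets below are assumptions about an internal, version-dependent layout, read from Julia purely for illustration (array_header is a made-up name):

# Peek at the (internal!) header fields of a 1-d Array. Offsets are
# assumptions for 64-bit, pre-1.11 Julia, not a stable API.
function array_header(v::Vector)
    p = convert(Ptr{UInt64}, pointer_from_objref(v))
    (
        data    = unsafe_load(convert(Ptr{Ptr{Cvoid}}, p), 1), # byte offset 0
        length  = unsafe_load(p, 2),                           # byte offset 8
        flags   = unsafe_load(convert(Ptr{UInt16}, p), 9),     # byte offset 16
        elsize  = unsafe_load(convert(Ptr{UInt16}, p), 10),    # byte offset 18
        offset  = unsafe_load(convert(Ptr{UInt32}, p), 6),     # byte offset 20
        nrows   = unsafe_load(p, 4),                           # byte offset 24
        maxsize = unsafe_load(p, 5),                           # byte offset 32
    )
end

array_header(collect(1:10))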
This gives us the fastpush:
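The original fastpush! is not shown here, so the following is only a rough sketch of the idea for a couple of bitstype element types, assuming the same pre-1.11 64-bit layout as in the sketch above (the offsets, and fastpush! itself, are illustrative, and this is exactly the kind of undefined behaviour later comments warn against):

@inline function fastpush!(v::Vector{T}, x::T) where {T<:Union{Int64,Float64}}
    p = convert(Ptr{UInt64}, pointer_from_objref(v))
    len     = unsafe_load(p, 2)                        # length,  byte offset 8
    offset  = unsafe_load(convert(Ptr{UInt32}, p), 6)  # offset,  byte offset 20
    maxsize = unsafe_load(p, 5)                        # maxsize, byte offset 32
    if len + offset < maxsize
        # Fast path: spare capacity, so bump length/nrows and store in place.
        unsafe_store!(p, len + 1, 2)                   # length
        unsafe_store!(p, len + 1, 4)                   # nrows, byte offset 24
        @inbounds v[len + 1] = x
    else
        push!(v, x)                                    # slow path: let the runtime grow the buffer
    end
    return v
end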
And let's run a race:
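The benchmark itself is not reproduced, so here is a sketch of that kind of race, assuming BenchmarkTools and the fastpush! sketch above (pushloop!/fastpushloop! are just illustrative names):

using BenchmarkTools

function pushloop!(v, n)
    for i in 1:n
        push!(v, i)
    end
    return v
end

function fastpushloop!(v, n)
    for i in 1:n
        fastpush!(v, i)
    end
    return v
end

@btime pushloop!(Int[], 10^4)
@btime fastpushloop!(Int[], 10^4)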
yielding:
Hence, a 30% speedup is possible if we manage to get the fastpath inlined.
It would mayhaps be nice if it were somehow possible to make LLVM aware of Julia's runtime library; for example, compile it to LLVM IR (using clang or dragonegg?) and call into a nice library of LLVM code that includes optimization-relevant metadata, possibly inlining the functions that are currently reached via ccall.
Not sure whether this performance problem is worth trying to fix (and how much more expensive it is if we lose important registers on the call). It is easy to avoid, though, by implementing the checks by hand (for tight loops that need to push a lot). In a perfect world, the optimizer would reduce to only a single check for unrolled pushing loops. In my applications, I do enough other things between push!/pop! that this ends up being benign.
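A sketch of what "implementing the checks by hand" can look like in such a loop; it is the same idea as the PushVector workaround in the comments above (collect_manual is an illustrative name):

function collect_manual(n)
    buf = Vector{Int}(undef, 16)
    len = 0
    for i in 1:n
        if len == length(buf)
            resize!(buf, 2 * length(buf))   # one runtime call per doubling instead of per element
        end
        len += 1
        @inbounds buf[len] = i
    end
    return resize!(buf, len)                # trim to the logical length
end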
PS: This is related to #24901, which removes the bounds check from push!/pop!.
I also posted this on discourse.julialang.org before. The fastpush! is deliberately not a pull request, because it is dirty and architecture-dependent.