Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

write(io, a::AbstractArray) slow for reinterpret arrays #39781

Open
IanButterworth opened this issue Feb 22, 2021 · 10 comments
Open

write(io, a::AbstractArray) slow for reinterpret arrays #39781

IanButterworth opened this issue Feb 22, 2021 · 10 comments
Labels
arrays [a, r, r, a, y, s] io Involving the I/O subsystem: libuv, read, write, etc. performance Must go faster

Comments

@IanButterworth
Copy link
Member

IanButterworth commented Feb 22, 2021

Sending a Base.ReinterpretArray{UInt8, 2, Float64, Matrix{Float64}, false} to write(io, a::AbstractArray) is a lot slower (26x here) than doing a collect and sending the collected form Matrix{UInt8}

julia> tmp, _ = mktemp()
("/var/folders/_6/1yf6sj0950vcg4t91m9ltb5w0000gn/T/jl_7SIaSU", IOStream(<fd 21>))

julia> a = rand(Float64, 1000, 1000);

julia> ar = reinterpret(UInt8, a);

julia> using BenchmarkTools

julia> @btime open(tmp, "w") do io
           write(io, ar)
       end
  390.773 ms (15 allocations: 1.00 KiB)
8000000

julia> @btime open(tmp, "w") do io
           write(io, collect(ar))
       end
  14.677 ms (17 allocations: 7.63 MiB)

julia> typeof(ar)
Base.ReinterpretArray{UInt8, 2, Float64, Matrix{Float64}, false}

julia> typeof(collect(ar))
Matrix{UInt8} (alias for Array{UInt8, 2})

Adding a method to base that collects first would be the easy way, but that might be cheating and break promises for memory usage?

@kimikage
Copy link
Contributor

The output result depends on whether the write for the (original/reinterpreted) element type is overloaded. I think it is a bit tricky to make that decision in Base.
However, on the user or package side, I think we can make some assumptions about whether or not it is overloaded (pirated). If there is no overload method, then due to the nature of ReinterpretArray, we can use ad-hoc parent instead of collect.

@simeonschaub
Copy link
Member

I think this signature could probably just be widened to accept ReinterpretArray and ReshapedArray as well:

function write(s::IO, a::Array)
.

@kimikage
Copy link
Contributor

kimikage commented Feb 22, 2021

Although there are problems with the current situation...

struct UInt32BE
    v::UInt32
end

Base.write(io::IO, x::UInt32BE) = write(io, hton(x.v))
function Base.write(io::IO, a::Array{UInt32BE}) # this is type piracy, though
    nb = 0
    for x in a
        nb += write(io, x)
    end
    nb
end

function hexstring(x)
    buf = IOBuffer()
    write(buf, x)
    bytes2hex(take!(buf))
end
julia> hexstring(0x01234567) # little-endian
"67452301"

julia> hexstring(UInt32BE(0x01234567))
"01234567"

julia> hexstring(UInt32BE.([0x01234567, 0x89abcdef]))
"0123456789abcdef"

julia> hexstring(@view UInt32BE.([0x01234567, 0x89abcdef])[1:2]) # not overloaded
"67452301efcdab89"

julia> ra = reinterpret(UInt32BE, [0x01234567, 0x89abcdef])
2-element reinterpret(UInt32BE, ::Vector{UInt32}):
 UInt32BE(0x01234567)
 UInt32BE(0x89abcdef)

julia> hexstring(ra)
"0123456789abcdef"

julia> function Base.write(s::IO, a::Base.ReinterpretArray)
           write(s, parent(a)) # This loses the information of the reinterpreted eltype
       end

julia> hexstring(ra) # Ah...
"67452301efcdab89"

@JeffBezanson JeffBezanson added arrays [a, r, r, a, y, s] io Involving the I/O subsystem: libuv, read, write, etc. performance Must go faster labels Feb 22, 2021
@JeffBezanson
Copy link
Member

I don't think we can worry about that. There's an informal contract that the raw bits (in native byte order) of a bits type need to be sufficient to describe and recreate it. (And that we don't run on any big-endian systems :) )

@kimikage
Copy link
Contributor

Oh, that might not have been a good example. I don't mean that endianness is the problem, but that even with a bitstype, there might be cases where the raw bits and the contents you want to write to the I/O object are different.

In any case, I am not the one with the specific problem with custom write. My point is that we should be aware of fallback changes that are not well documented. 😄

@JeffBezanson
Copy link
Member

That's why I put endianness in parentheses. I'm not sure it's reasonable to expect such overloads to be observed in all cases. Reading/writing blocks of data at once is a crucial optimization, and many people will write code assuming this property of bits types. For example it would be nice to have #14678, and I'd like to remove the behavior of serializing Ptr as NULL (in fact we already have a "bug" that this is not observed inside Arrays).

@SimonDanisch
Copy link
Contributor

Would be nice to have write(io, reinterpretarray) fast, especially, since reinterpreting is especially important for serialization.
As a user, without Julia support, I now either need to pirate Base.write (bad idea), or roll my own ReinterpretArray (bad idea?).
Since ReinterpretArrays are guaranteed to have a parent with isbits element types, should we just pr:

function write(io::IO, ri::ReinterpretArrays)
   write(io, ri.parent)
end

Maybe with a fallback for array types that don't have write overloaded?

@tlnagy
Copy link
Contributor

tlnagy commented Dec 10, 2022

function write(io::IO, ri::ReinterpretArrays)
   write(io, ri.parent)
end

Unfortunately I don't think this will be sufficient due to the overly narrow definition of Base.write(io, x):

julia/base/io.jl

Lines 688 to 690 in 5a6c808

function write(s::IO, x::Union{Int16,UInt16,Int32,UInt32,Int64,UInt64,Int128,UInt128,Float16,Float32,Float64})
return write(s, Ref(x))
end

julia> isbitstype(RGB{Float64})
true

julia> write(io, RGB{Float64}(1.0))
ERROR: MethodError: no method matching write(::IOStream, ::RGB{Float64})
Closest candidates are:
  write(::IO, ::Any) at io.jl:672
  write(::IO, ::Any, ::Any...) at io.jl:673
  ...
Stacktrace:
 [1] write(io::IOStream, x::RGB{Float64})
   @ Base ./io.jl:672
 [2] top-level scope
   @ REPL[69]:1

So I'm forced to reinterpret to a UInt8 to get it to write to disk.

@SimonDanisch
Copy link
Contributor

I think it should usually dispatch to: https://github.com/JuliaLang/julia/blob/master/base/io.jl#L706
Which should write all the bytes in one go, which is what we want here for every reinterpret array.

@SimonDanisch
Copy link
Contributor

Try: write(IOBuffer(), [RGB{Float64}(1.0)])

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrays [a, r, r, a, y, s] io Involving the I/O subsystem: libuv, read, write, etc. performance Must go faster
Projects
None yet
Development

No branches or pull requests

6 participants