Very slow reading of long strings in v0.14 #742
Comments
I can confirm that on Linux with Julia master, I've had to kill the process after a bare |
I think this boils down to a (probably bad) use of huge tuples:

julia> struct FixedArray{T,L}
           data::NTuple{L,T}
       end
julia> Array{FixedArray{Float64,30000}}(undef, 1)
^C^C^C^C^C^CWARNING: Force throwing a SIGINT

Edit: If you leave the following running long enough, it'll eventually abort. (An error message with a stack trace scrolls by for a long time, so I'm not sure what the actual error was, but I'd guess stack overflow?)

julia> FixedArray{Float64, 30000}(ntuple(zero, 30000))
@kleinhenz It looks like the old Lines 1341 to 1364 in f4faf71
It looks like that kind of behavior probably needs to be restored. Would just a special-case within the generic |
Yeah, the problem is definitely the giant tuples. We can probably just add a special case to read, like we have for opaque datatypes. This would fix the current case, where you are just reading a fixed-length string dataset. You basically need the tuple approach to support fixed strings/arrays in compound datatypes, though, which is where this originally came from. It would be nice to have something like NTuple that didn't destroy the compiler, but as far as I know there isn't really another solution.
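To make the special-case idea a bit more concrete, here is a minimal sketch in plain Julia of what the post-read step could look like if the raw bytes of a fixed-length string dataset were read into a flat `Vector{UInt8}` instead of an `NTuple`-backed struct. It does not call HDF5.jl at all: `fixed_strings` is a hypothetical helper name, and the null-padding convention is an assumption (space-padded strings would need different trimming). How the bytes actually get into the buffer is left out.

```julia
# Hypothetical helper, not part of HDF5.jl: split a flat byte buffer holding
# fixed-length strings of `len` bytes each into a Vector{String}.
# Assumes null-padded (or null-terminated) strings; space padding is not handled.
function fixed_strings(buf::AbstractVector{UInt8}, len::Integer)
    len > 0 || throw(ArgumentError("len must be positive"))
    length(buf) % len == 0 ||
        throw(ArgumentError("buffer length must be a multiple of len"))
    n = length(buf) ÷ len
    out = Vector{String}(undef, n)
    for i in 1:n
        chunk = view(buf, (i - 1) * len + 1 : i * len)
        stop = findfirst(iszero, chunk)            # first null byte, if any
        nbytes = stop === nothing ? len : stop - 1
        out[i] = String(chunk[1:nbytes])           # copies only the bytes in use
    end
    return out
end

# Usage: two 300k-byte slots, null-padded, with short payloads.
buf = zeros(UInt8, 2 * 300_000)
buf[1:5] .= codeunits("hello")
buf[300_001:300_003] .= codeunits("abc")
strs = fixed_strings(buf, 300_000)
```

The string length only ever appears as a runtime value here, never as a type parameter, so the compiler has no 300,000-element tuple type to reason about. As noted above, this would not cover fixed strings inside compound datatypes.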
Really what we want is a mechanism to be able to sometimes allocate a |
Allocating the working buffer can be successfully done by reinterpreting a

julia> struct FS{S}
           data::NTuple{S, UInt8}
       end
julia> a = reinterpret(reshape, FS{30000}, Array{UInt8}(undef, 30000));
julia> eltype(a)
FS{30000}
julia> length(a)
1

There are still major issues, though: trying to print

In general, the large tuple appears to be a problem for inference/type dispatch, so you have to try really hard to hide the type. I made a bit of progress in getting the string to read and normalize by rewriting
See JuliaLang/julia#35619. I'm not sure if there is any prospect of this just getting fixed on the Julia side.
I have some HDF5 files which contain very long strings (>300k characters). With HDF5 v0.13.6, I can read these no problem:

Updating to v0.14.1, however, even reading this same dataset just once takes so long that I haven't been able to measure it. For shorter strings (10k characters), I see a ~3-fold slowdown (700 μs vs 220 μs), but nothing so egregious.

All of this is on Windows 10, Julia 1.5.1.
Any ideas?