map on a diskarray is very very slow, compared to a regular array #199
Comments
It should be iterating out of order? You may be confusing bugs for intention. But an alternative is to use CachedDiskArray first, I guess.
Iterating out of order is fine, but a
There might be some type instability as well; I will have to profile with Cthulhu. But AccessCountDiskArray is showing the correct number of accesses via
Here's another example with
julia> da = AccessCountDiskArray(data, chunksize=(10,10))
200×100 AccessCountDiskArray{Int64, 2, Matrix{Int64}, DiskArrays.ChunkRead{DiskArrays.NoStepRange}}
Chunked: (
[10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10]
[10, 10, 10, 10, 10, 10, 10, 10, 10, 10]
)
julia> @be collect($data)
Benchmark: 4638 samples with 1 evaluation
min 2.542 μs (3 allocs: 156.328 KiB)
median 11.875 μs (3 allocs: 156.328 KiB)
mean 17.469 μs (3 allocs: 156.328 KiB, 1.25% gc time)
max 1.782 ms (3 allocs: 156.328 KiB, 99.11% gc time)
julia> @be collect($da)
Benchmark: 2720 samples with 1 evaluation
min 11.291 μs (15 allocs: 312.984 KiB)
median 17.188 μs (15 allocs: 312.984 KiB)
mean 30.395 μs (15.00 allocs: 313.003 KiB, 2.16% gc time)
max 1.020 ms (17 allocs: 345.016 KiB, 96.76% gc time)
julia> da.getindex_log
2871-element Vector{Any}:
(1:200, 1:100)
(1:200, 1:100)
(1:200, 1:100)
(1:200, 1:100)
(1:200, 1:100)
(1:200, 1:100)
(1:200, 1:100)
(1:200, 1:100)
(1:200, 1:100)
(1:200, 1:100)
(1:200, 1:100)
(1:200, 1:100)
(1:200, 1:100)
(1:200, 1:100)
(1:200, 1:100)
(1:200, 1:100)
(1:200, 1:100)
(1:200, 1:100)
(1:200, 1:100)
(1:200, 1:100)
(1:200, 1:100)
(1:200, 1:100)
(1:200, 1:100)
(1:200, 1:100)
(1:200, 1:100)
(1:200, 1:100)
(1:200, 1:100)
⋮
(1:200, 1:100)
(1:200, 1:100)
(1:200, 1:100)
(1:200, 1:100)
(1:200, 1:100)
(1:200, 1:100)
(1:200, 1:100)
(1:200, 1:100)
(1:200, 1:100)
(1:200, 1:100)
(1:200, 1:100)
(1:200, 1:100)
(1:200, 1:100)
(1:200, 1:100)
(1:200, 1:100)
(1:200, 1:100)
(1:200, 1:100)
(1:200, 1:100)
(1:200, 1:100)
(1:200, 1:100)
(1:200, 1:100)
(1:200, 1:100)
(1:200, 1:100)
(1:200, 1:100)
(1:200, 1:100)
(1:200, 1:100)
(1:200, 1:100)
@asinghvi17 is this fixed now too?
Unfortunately not, it's still slow.
But for the same reason? Or because we need #200? There are two problems: single-value indexing (we can check with the access count), and how slow
It looks like DiskGenerator is not looping over chunks at all, but is instead performing random access. Should we make it loop over chunks? Perhaps by making it stateful and letting it keep the current chunk "in memory"? Not sure what the best solution is here, but there must be something better than a two-orders-of-magnitude slowdown.
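To make the difference between random access and chunk-wise looping concrete, here is a minimal sketch in plain Julia. It does not use DiskArrays at all: `CountingArray`, `readblock`, `chunk_ranges`, `map_elementwise` and `map_chunkwise` are hypothetical names invented for this illustration, and the function passed in is assumed to return the same element type as the input. It only counts how many "disk reads" each strategy would trigger on a 200×100 array with 10×10 chunks.

```julia
# Toy stand-in for a chunked on-disk array (NOT the DiskArrays API).
# Every call to `readblock` counts as one disk read.
struct CountingArray{T}
    data::Matrix{T}
    chunksize::Tuple{Int,Int}
    reads::Base.RefValue{Int}
end
CountingArray(data, chunksize) = CountingArray(data, chunksize, Ref(0))

# One "disk read" per call, no matter how many elements are requested.
function readblock(a::CountingArray, rows, cols)
    a.reads[] += 1
    return a.data[rows, cols]
end

# Split the index space into chunk-aligned (rows, cols) range pairs.
function chunk_ranges(a::CountingArray)
    nr, nc = size(a.data)
    cr, cc = a.chunksize
    return [(i:min(i + cr - 1, nr), j:min(j + cc - 1, nc))
            for i in 1:cr:nr, j in 1:cc:nc]
end

# Random-access strategy: one read per element.
function map_elementwise(f, a::CountingArray)
    out = similar(a.data)  # assumes f preserves the element type
    for j in axes(a.data, 2), i in axes(a.data, 1)
        out[i, j] = f(readblock(a, i:i, j:j)[1])
    end
    return out
end

# Chunk-wise strategy: one read per chunk, then apply f in memory.
function map_chunkwise(f, a::CountingArray)
    out = similar(a.data)  # assumes f preserves the element type
    for (rows, cols) in chunk_ranges(a)
        block = readblock(a, rows, cols)
        out[rows, cols] = map(f, block)
    end
    return out
end

data = reshape(collect(1:200 * 100), 200, 100)
a = CountingArray(data, (10, 10))

r1 = map_elementwise(x -> x + 1, a)
n1 = a.reads[]          # 20000 reads, one per element

a.reads[] = 0
r2 = map_chunkwise(x -> x + 1, a)
n2 = a.reads[]          # 200 reads, one per chunk

println((n1, n2))
```

Both strategies produce the same result, but the chunk-wise loop needs 100 times fewer reads in this toy setup. Whether a stateful generator that caches the current chunk is the right way to get this behaviour in DiskArrays itself is still an open question.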