Reading data along chunked dimension does not scale linearly with amount of data #116

ali-ramadhan · 2020-02-19T16:03:34Z

Super cool work on integrating DiskArrays.jl with NetCDF.jl! Looking forward to ditching xarray in favor of a pure Julia solution.

@visr helped me get up and running but we noticed that grabbing 2x as much data seems to take ~4x longer whereas I expected it to scale linearly. I am unfortunately interested in grabbing data along the dimension with chunk size 1...

julia> using NetCDF

julia> ds = NetCDF.open("/home/alir/cnhlab004/bsose_i122/bsose_i122_2013to2017_1day_Theta.nc", "THETA")
Disk Array with size 2160 x 588 x 52 x 1826

julia> NetCDF.getchunksize(ds)
(2160, 588, 19, 1)

julia> @time ds[100, 200, :, 300]
  0.012066 seconds (48 allocations: 2.500 KiB)

julia> @time ds[100, 200, :, 320:330]
  0.010111 seconds (55 allocations: 4.750 KiB)

julia> @time ds[100, 200, :, 300:400]
  5.256234 seconds (56 allocations: 23.016 KiB)

julia> @time ds[100, 200, :, 600:800]
 19.074392 seconds (56 allocations: 43.328 KiB)
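
For context on the timings above, here is a minimal sketch that counts how many chunks each request overlaps, using only the chunk size reported by `NetCDF.getchunksize(ds)`. It is plain arithmetic and does not touch the file. The helper names `chunks_touched` and `total_chunks` are made up for this illustration and are not part of NetCDF.jl or DiskArrays.jl.

```julia
# Count how many chunks an indexing request overlaps. Pure arithmetic, no file access.
# `chunks_touched` and `total_chunks` are hypothetical helpers for illustration only.

"Number of chunks of length `c` overlapped by the 1-based index range `r`."
chunks_touched(r::AbstractUnitRange, c::Integer) =
    isempty(r) ? 0 : cld(last(r), c) - cld(first(r), c) + 1

"Total number of chunks overlapped by a hyper-rectangular request."
total_chunks(ranges, chunksize) = prod(chunks_touched.(ranges, chunksize))

chunksize = (2160, 588, 19, 1)   # as reported by NetCDF.getchunksize(ds)

# The four requests from the session, with `:` on dimension 3 written as 1:52.
requests = [
    (100:100, 200:200, 1:52, 300:300),
    (100:100, 200:200, 1:52, 320:330),
    (100:100, 200:200, 1:52, 300:400),
    (100:100, 200:200, 1:52, 600:800),
]

for r in requests
    println(r[4], " => ", total_chunks(r, chunksize), " chunks")
end
# 300:300 => 3 chunks
# 320:330 => 33 chunks
# 300:400 => 303 chunks
# 600:800 => 603 chunks
```

Each chunk holds 2160 x 588 x 19 = 24,131,520 values, so every 52-element column overlaps three full chunks per time step. The chunk count grows linearly with the number of time steps (303 vs. 603), so the count alone would not account for a roughly 4x difference in read time.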

visr · 2020-02-20T12:30:28Z

It's great to have an example of such a large NetCDF. At this moment I cannot tell if this time is spent in the NetCDF C library or in the Julia wrapper code. Though I think running the slower calls under a profiler should be able to give that information.
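
One way to run a slow call under a profiler is Julia's `Profile` standard library. The sketch below uses a stand-in workload, `slow_read`, which is hypothetical and only there so the snippet runs without the data file. In a real session its body would be replaced by one of the slow reads, for example `ds[100, 200, :, 300:400]`. Passing `C = true` asks `Profile.print` to include C frames, which may help separate time spent in the C library from time spent in Julia code.

```julia
using Profile

# Hypothetical stand-in for the slow read. In a session where `ds` is open,
# replace the body with e.g. `ds[100, 200, :, 300:400]`.
slow_read() = sum(sqrt(i) for i in 1:10^8)

slow_read()        # run once first so that compilation is not what gets sampled
Profile.clear()    # discard any samples left over from earlier profiling
@profile slow_read()

# Flat listing sorted by sample count; C = true also shows frames from C code.
Profile.print(format = :flat, C = true, sortedby = :count, mincount = 10)
```

Note that with a real file, the warm-up call would also populate any caches, so the profiled call may behave differently from a first read in a fresh session.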

meggart · 2020-02-20T15:29:27Z

I agree with @visr that it is hard to say where the time is spent. Please note also that the NetCDF C library does some internal caching, so I guess your 3rd call was profiting from the previous reads. I have found it very difficult to debug these kinds of problems. Ideally you would restart your Julia session after every data access to make sure NetCDF did not cache anything, but then you include precompilation in your timings...

bjarthur · 2024-02-21T13:59:54Z

I cannot reproduce this with my dataset, which is of similar size but has only three dimensions. @ali-ramadhan, is this still a problem for you?
