Bug writing to raw N5 (no compression) #10
Comments
I tried the same test on my linux workstation and in all cases the store was written in parallel. Here are my results:
N5 + raw (note that wall time is much less than total CPU time, indicating parallelism)
%%time
darray.to_zarr(zarr.N5Store('test/test1.n5'), compressor = None)
N5 + GZip (more total time because of compression)
%%time
from numcodecs import GZip
darray.to_zarr(zarr.N5Store('test/test2.n5'), compressor=GZip(2))
Default Zarr + raw
%%time
darray.to_zarr(zarr.DirectoryStore('test/test3.zarr'), compressor = None)
Default zarr + GZip
%%time
darray.to_zarr(zarr.DirectoryStore('test/test4.zarr'), compressor = GZip(level=2))
zarr-N5 is pretty consistently slower than vanilla zarr in my tests (and I have no idea why), but all of these implementations are getting parallelized on my machine. I can try the same tests later on a Windows machine and see if I observe any discrepancies.
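The wall-time versus CPU-time comparison used above can also be made outside a notebook. The following is only a minimal sketch using the standard library: `cpu_to_wall_ratio` and `busy` are hypothetical helper names introduced for illustration, and `busy` is a stand-in workload rather than an actual Zarr write. It assumes the work runs in threads of the current process (as with Dask's threaded scheduler), since `time.process_time` only counts CPU time of the calling process.

```python
import time


def cpu_to_wall_ratio(fn, *args, **kwargs):
    """Run fn once and return (wall_seconds, cpu_seconds, cpu/wall ratio).

    Hypothetical helper: a ratio well above 1 suggests that several
    threads of this process were busy at the same time.
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()  # user + system CPU time of this process
    fn(*args, **kwargs)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    ratio = cpu / wall if wall > 0 else float("nan")
    return wall, cpu, ratio


def busy(n):
    """Stand-in, single-threaded workload (not a real store write)."""
    total = 0
    for i in range(n):
        total += i * i
    return total


wall, cpu, ratio = cpu_to_wall_ratio(busy, 200_000)
print(f"wall={wall:.4f}s cpu={cpu:.4f}s ratio={ratio:.2f}")
```

In the setting of this thread, one would presumably pass something like `lambda: darray.to_zarr(store, compressor=None)` in place of `busy`. A ratio near 1 would be consistent with a single-threaded write, and a ratio of several would be consistent with a parallel one. This would not capture work done in separate worker processes.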
The |
Sorry, I missed that there were two locations. I did a similar analysis to @d-v-b in adamkglaser/io_benchmarks#1 and also saw multi-threading. I have not yet done the profiling to figure out where the zarr/N5 difference lies.
Thanks @d-v-b. It would be great to hear whether you also get parallel writing on Windows. Unfortunately, all of our lab PCs run Windows, and they will keep running Windows to control our systems, on which I am hoping to move towards writing data as N5.
I transferred this issue from |
Problem description
Using Zarr + Dask, when saving to N5, multi-threading only works when a compressor is used. When saving as raw with compressor = None, the operation runs single-threaded. When using Zarr + Dask and saving to the raw zarr format with compressor = None, the operation runs multi-threaded. Is there a potential bug when saving to raw N5 that disables multi-threading?
Thanks!
Adam
Python code
import zarr
import numpy as np
import dask.array as da
from numcodecs import GZip  # needed for the compressed case below

# random uint16 volume, wrapped as a Dask array with (16, 256, 256) chunks
data = np.random.randint(0, 2000, size = [512,2048,2048]).astype('uint16')
darray = da.from_array(data, chunks = (16,256,256))

# no compression to n5 - no multi-threading bug (?)
compressor = None
store = zarr.N5Store('test1.n5')
darray.to_zarr(store, compressor = compressor)

# with compression to n5 - multi-threading works
compressor = GZip(level = 2)
store = zarr.N5Store('test2.n5')
darray.to_zarr(store, compressor = compressor)

# no compression to zarr - multi-threading works
compressor = None
store = zarr.DirectoryStore('test3')
darray.to_zarr(store, compressor = compressor)
Version and installation information
Zarr 2.6.1
Dask 2020.12.0
Python 3.9.1
Windows 10
Zarr installed via Conda