Bug writing to raw N5 (no compression) #10

Open
adamkglaser opened this issue Jan 24, 2021 · 5 comments
Labels
help wanted Extra attention is needed

Comments

@adamkglaser

adamkglaser commented Jan 24, 2021

Problem description

Using Zarr + Dask, when saving to N5, multi-threading only works when a compressor is used. When saving as raw with compressor = None, the operation runs single threaded. When using Zarr + Dask and saving to raw zarr format with compressor = None, the operation runs multi-threaded. Is there a potential bug when saving to raw N5 that disables multi-threading?

Thanks!
Adam

Python code

import zarr
import numpy as np
import dask.array as da
from numcodecs import GZip  # needed for the compressed case below

data = np.random.randint(0, 2000, size = [512,2048,2048]).astype('uint16')
darray = da.from_array(data, chunks = (16,256,256))

# no compression to n5 - no multi-threading bug (?)

compressor = None
store = zarr.N5Store('test1.n5')
darray.to_zarr(store, compressor = compressor)

# with compression to n5 - multi-threading works

compressor = GZip(level = 2)
store = zarr.N5Store('test2.n5')
darray.to_zarr(store, compressor = compressor)

# no compression to zarr - multi-threading works

compressor = None
store = zarr.DirectoryStore('test3')
darray.to_zarr(store, compressor = compressor)

Version and installation information

Zarr 2.6.1
Dask 2020.12.0
Python 3.9.1
Windows 10
Zarr installed via Conda

@d-v-b
Collaborator

d-v-b commented Feb 3, 2021

I ran the same test on my Linux workstation, and in all cases the store was written in parallel. Here are my results:

N5 + raw (note that wall time is much less than total CPU time, indicating parallelism)

%%time
darray.to_zarr(zarr.N5Store('test/test1.n5'), compressor = None)
CPU times: user 31.7 s, sys: 16.2 s, total: 48 s
Wall time: 14.3 s

N5 + GZip (more total time because of compression)

%%time
from numcodecs import GZip
darray.to_zarr(zarr.N5Store('test/test2.n5'), compressor=GZip(2))
CPU times: user 4min 39s, sys: 10.2 s, total: 4min 49s
Wall time: 11.3 s

Default Zarr + raw

%%time
darray.to_zarr(zarr.DirectoryStore('test/test3.zarr'), compressor = None)
CPU times: user 23.1 s, sys: 9.27 s, total: 32.3 s
Wall time: 9.08 s

Default zarr + GZip

%%time
darray.to_zarr(zarr.DirectoryStore('test/test4.zarr'), compressor = GZip(level=2))
CPU times: user 4min 8s, sys: 6.52 s, total: 4min 15s
Wall time: 7.94 s

zarr-N5 is pretty consistently slower in my tests than vanilla zarr (and I have no idea why), but all of these implementations are getting parallelized on my machine. I can try the same tests later on a Windows machine and see if I observe any discrepancies.
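
For anyone who wants to repeat the wall-time versus CPU-time comparison outside IPython (for example from a plain script on Windows), here is a minimal sketch using only the standard library. The helper name `wall_and_cpu` is made up for illustration; it is not part of zarr or dask. If CPU time is well above wall time, more than one core was busy; if the two are roughly equal, the work was effectively single-threaded.

```python
import time


def wall_and_cpu(fn, *args, **kwargs):
    """Run fn once and return (wall_seconds, cpu_seconds).

    cpu_seconds is user + system CPU time summed over all threads
    of the current process, as reported by time.process_time().
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    fn(*args, **kwargs)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return wall, cpu


# Hypothetical usage with the array from the original post
# (assumes zarr and dask are installed and `darray` is defined):
#
# wall, cpu = wall_and_cpu(darray.to_zarr, zarr.N5Store('test1.n5'), compressor=None)
# print(f"wall {wall:.1f} s, cpu {cpu:.1f} s, cpu/wall {cpu / wall:.2f}")
```

Note that this measures the whole process, so anything else running in the same interpreter during the call is counted too.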

@jakirkham
Member

The N5Store does some mapping between N5 and Zarr under the hood, and there may be some cost incurred by this. One would probably need to go through and profile the methods more carefully with a mix of cProfile and line_profiler to determine where the slowdowns are.
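
As a starting point for that kind of profiling, here is a rough sketch built on the standard-library `cProfile` and `pstats` modules. The helper `profile_call` is a hypothetical name, not something zarr provides. One caveat: `cProfile` only records the thread that enabled it, so a dask write would likely need to run on the synchronous scheduler for the store methods to show up.

```python
import cProfile
import io
import pstats


def profile_call(fn, *args, top=10, **kwargs):
    """Profile a single call to fn and return the report as text.

    The report lists the `top` entries sorted by cumulative time.
    """
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        fn(*args, **kwargs)
    finally:
        profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return buffer.getvalue()


# Hypothetical usage (assumes zarr and dask are installed and `darray`
# is the array from the original post). The synchronous scheduler keeps
# all work in the profiled thread:
#
# import dask
# with dask.config.set(scheduler="synchronous"):
#     print(profile_call(darray.to_zarr, zarr.N5Store('test1.n5'), compressor=None))
```

Running the same call against `zarr.DirectoryStore` and comparing the two reports should show which N5Store methods account for the extra time, if any.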

@joshmoore
Member

Sorry, I missed that there were two locations. I did a similar analysis to @d-v-b in adamkglaser/io_benchmarks#1 and also saw multi-threading.

I have not yet done the profiling to figure out where the zarr/n5 difference lies.

@adamkglaser
Author

Thanks @d-v-b. It would be great to hear whether you also get parallel writing on Windows. Unfortunately, all of our lab PCs run Windows, and they will keep running Windows to control our systems, which I am hoping to move towards writing data as N5.

joshmoore added the help wanted label on Dec 2, 2022
@d-v-b
Collaborator

d-v-b commented Oct 18, 2024

I transferred this issue from zarr-python to n5py.

d-v-b transferred this issue from zarr-developers/zarr-python on Oct 18, 2024

4 participants