Improve performances in fetching data #28
Add an asynchronous open_mfdataset to the httpstore
Here are some benchmarking results for data fetching time, using the erddap and a large box. The following code was run for a box depth of 0-50m over 2018 alone, and then for depths of 0-50m, 0-100m, 0-200m and 0-300m over 2018/2019.

```python
import time
import numpy as np
from argopy import DataFetcher as ArgoDataFetcher

par_bench = []  # one record per run
for run in range(5):
    start_time = time.time()
    # The longitude bounds are shifted by a random offset (< 1 degree) on each run
    large_box = [-70+np.random.random_sample(1)[0], -30+np.random.random_sample(1)[0], 20, 40, 0, 200, '2018-01-01', '2020-01-01']
    fetcher = ArgoDataFetcher(mode='expert', parallel=True, chunks='auto', box_maxsize=[10, 10, 50]).region(large_box)
    ds = fetcher.to_xarray()
    par_bench.append({'ETIM': time.time() - start_time, 'NPTS': np.max(ds['N_POINTS'].values),
                      'MBYT': ds.nbytes/1e6, 'CHUNKS': fetcher.fetcher.chunks, 'NREQ': len(fetcher.fetcher.urls)})
    print("Run #%i" % run, par_bench[-1])
```

PS: the request fails for 0-400m and 0-800m boxes.
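To compare runs across box sizes, the per-run records can be reduced to a few aggregate numbers. The following is only a minimal sketch: `summarize` is a hypothetical helper, not part of argopy, and it assumes each record carries the `ETIM` (seconds), `MBYT` (megabytes) and `NREQ` keys used in the benchmark loop above.

```python
from statistics import mean


def summarize(bench):
    """Aggregate benchmark records (hypothetical helper, not an argopy function).

    Each record is assumed to be a dict with the keys 'ETIM' (elapsed seconds),
    'MBYT' (dataset size in MB) and 'NREQ' (number of requests).
    """
    if not bench:
        raise ValueError("no benchmark records")
    total_time = sum(r["ETIM"] for r in bench)
    total_mb = sum(r["MBYT"] for r in bench)
    return {
        "runs": len(bench),
        "mean_etim_s": mean(r["ETIM"] for r in bench),  # average wall time per run
        "mean_nreq": mean(r["NREQ"] for r in bench),    # average number of requests
        "mb_per_s": total_mb / total_time,              # overall throughput
    }
```

Calling `summarize(par_bench)` after the loop would give one summary per box configuration, which makes the 0-50m to 0-300m cases easier to put side by side.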
Allow switching between thread and process pools, add support for a dask client, and add a progress bar
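As an illustration of what switching between thread and process pools can look like, here is a minimal sketch built on the standard library's `concurrent.futures`. It is not the code from this PR: `fetch_all` and its `method`, `max_workers` and `progress` parameters are hypothetical names, the progress display is a plain counter rather than a real progress bar, and dask client support is left out.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor, as_completed


def fetch_all(func, items, method="thread", max_workers=4, progress=False):
    """Apply func to each item concurrently; return results in input order.

    Hypothetical helper for illustration only.
    """
    items = list(items)
    if method == "thread":
        pool_cls = ThreadPoolExecutor
    elif method == "process":
        pool_cls = ProcessPoolExecutor  # func and items must be picklable
    else:
        raise ValueError("method must be 'thread' or 'process'")

    results = [None] * len(items)
    with pool_cls(max_workers=max_workers) as pool:
        # Map each future back to the position of its input item
        futures = {pool.submit(func, item): i for i, item in enumerate(items)}
        for done, future in enumerate(as_completed(futures), start=1):
            results[futures[future]] = future.result()
            if progress:
                print(f"\r{done}/{len(items)} requests done", end="", flush=True)
    if progress:
        print()
    return results
```

Threads usually suit I/O-bound work such as HTTP requests, while processes can help when decoding the responses is CPU-bound. With `method="process"`, the call should be made from under an `if __name__ == "__main__":` guard on platforms that spawn new interpreters.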
## modules:
Rename fsspec_wrappers.py to filesystems.py

## Unit tests:
This makes the distinction clearer between testing the facade and testing each individual fetcher.
Protocol options not passed properly when using the caching system!
I'll continue testing, but code is ok to me, and based on docs everything works fine.
ok, thanks @quai20 for the review
ok to me !
Add an asynchronous open_mfdataset to the httpstore
and much more ...
Close #27 #16 #51