move to Dask array + first working version of the multiformer #37
put up to date with main
Would it be possible to use xarray to open zarr files? It works much better with timestamps, and it's compatible with dask.
Could you provide a clear example of where the benefit would be? Dask is significantly faster than the native Python zarr library. What is the performance of xarray?
I didn't mean to use xarray instead of dask. I meant using it here:
In the case of xarray, it's just:
In that case, datetime type casting becomes unnecessary, e.g. Additionally, xarray decodes timestamps better: with FESOM data, where timestamps are at daily resolution, I had to convert them from int manually, while xarray figured it out on its own.
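For context, the manual step described above amounts to decoding CF-style "days since <epoch>" integers into datetimes, which xarray does automatically from the variable's `units` attribute when it opens a dataset. A minimal numpy sketch, assuming the units string has exactly that form with a date-only epoch; the helper name `decode_days_since` and the example epoch are hypothetical, not taken from the FESOM files:

```python
import numpy as np


def decode_days_since(values, units: str) -> np.ndarray:
    """Decode integer offsets with CF-style units such as "days since 2000-01-01".

    Hypothetical helper: only day units with a date-only epoch are handled;
    xarray's CF decoding covers many more cases (other units, calendars,
    fill values).
    """
    unit, sep, epoch = units.partition(" since ")
    if not sep or unit.strip() != "days":
        raise ValueError(f"unsupported units: {units!r}")
    base = np.datetime64(epoch.strip(), "D")
    # Integer day counts become timedelta64[D] and are added to the epoch.
    return base + np.asarray(values, dtype=np.int64).astype("timedelta64[D]")


times = decode_days_since([0, 1, 31], "days since 2000-01-01")
```

This is the kind of hand-written conversion that becomes unnecessary when xarray performs the decoding on open.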
But by using xarray we would lose the performance advantage of dask, I assume. I would want to see that it's performance-neutral before considering using xarray instead of dask.
xarray has built-in support for dask:
@clessig There's no performance penalty for using xarray. Here is a fairly extensive comparison of zarr and xarray performance: pydata/xarray#9111 (comment)
The question is with respect to dask (which might parallelize things, etc.), not the native zarr library. Could you run with dask and xarray and num_worker_loaders=0 and see how many samples/sec we get? Thanks!
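A sketch of how the requested samples/sec figure could be measured for any loader that yields one sample per iteration. It is framework-agnostic; the function name `samples_per_second` and the warm-up count are illustrative assumptions, and setting `num_worker_loaders=0` is left to the project's own configuration:

```python
import time
from itertools import islice
from typing import Iterable


def samples_per_second(loader: Iterable, n_samples: int = 1000, warmup: int = 10) -> float:
    """Return the throughput (samples/sec) of an iterable yielding one sample per step.

    Hypothetical helper: `loader` can be any iterable, e.g. a data loader built
    with num_worker_loaders=0 so that all reading happens in this process.
    """
    it = iter(loader)
    # Consume warm-up samples first so one-off costs (opening stores, filling
    # caches) do not distort the measurement.
    for _ in islice(it, warmup):
        pass
    start = time.perf_counter()
    count = sum(1 for _ in islice(it, n_samples))
    elapsed = time.perf_counter() - start
    if count == 0:
        raise ValueError("loader produced no samples after warm-up")
    # Guard against a zero interval on extremely fast, tiny runs.
    return count / elapsed if elapsed > 0 else float("inf")


rate = samples_per_second(range(100_000), n_samples=10_000)
```

Running it once per backend (plain dask vs. xarray on top of dask) with the same `n_samples` gives directly comparable numbers; with zero loader workers, the figure reflects only the reading path in the main process.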
I'm on vacation, so I can do it next week.
Where do we stand here? Ready to merge? @kacpnowak: did you check the performance of xarray?
…_init Fixed/improved handling of GPU detection.
So far it's too difficult to make an xarray-only version. At the developer meeting with @mlangguth89, we talked about doing it during the hackathon in November, but I think it would be a separate issue/branch.
Only some minor things; it would be great if they could be fixed before merging.
Adding the following features:
- Dask arrays in `multifield_dala_sampler`
- a faster way to retrieve the normalizers when lats-lons are in increasing order
- fix for the embed/embeds mismatch when loading checkpoints
- running the multiformer configuration (multiple changes in `train_multi.py`), also from single-field checkpoints
- a bit of cleaning
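The description does not say how the faster normalizer lookup works. One common way to exploit coordinates sorted in increasing order is a binary search (`np.searchsorted`) for the window bounds, which replaces an O(n) boolean mask with an O(log n) slice. A minimal sketch under that assumption; `window_slice`, the 1-degree grid, and the per-point normalizer array are hypothetical:

```python
import numpy as np


def window_slice(coords: np.ndarray, lo: float, hi: float) -> slice:
    """Return the slice of an ascending 1-D coordinate array covering [lo, hi].

    Hypothetical helper. Requires `coords` sorted in increasing order; each bound
    is found by binary search in O(log n) instead of scanning all n values.
    """
    start = int(np.searchsorted(coords, lo, side="left"))
    stop = int(np.searchsorted(coords, hi, side="right"))
    return slice(start, stop)


# Hypothetical 1-degree global grid with ascending coordinates.
lats = np.linspace(-90.0, 90.0, 181)
lons = np.linspace(0.0, 359.0, 360)
# Stand-in for per-grid-point normalizer statistics (e.g. a mean field).
mean = np.arange(lats.size * lons.size, dtype=float).reshape(lats.size, lons.size)

lat_sl = window_slice(lats, 30.0, 60.0)
lon_sl = window_slice(lons, 10.0, 20.0)
window_mean = mean[lat_sl, lon_sl]  # basic slicing returns a view, no mask needed
```

The trade-off is that the fast path is only valid when the coordinates are monotonically increasing, which matches the condition stated in the feature list; unsorted grids would still need the mask-based lookup.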