Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

move to Dask array + first working version of the multiformer #37

Merged
merged 15 commits into from
Oct 7, 2024

Conversation

iluise
Copy link
Collaborator

@iluise iluise commented Sep 5, 2024

adding the following features:

  1. Dask arrays in multifield_dala_sampler

  2. faster way to retrieve the normalizers when lats-lons are in increasing order

  3. fix embed/embeds mismatch when loading checkpoints

  4. running multiformer configuration (multiples changes in train_multi.py), also from single field checkpoints

  5. a bit of cleaning

@iluise iluise requested a review from clessig September 5, 2024 11:00
@iluise iluise added enhancement New feature or request core model anything related to trainer, attention etc.. labels Sep 5, 2024
@iluise iluise self-assigned this Sep 5, 2024
@kacpnowak
Copy link
Collaborator

Would it be possible to use xarray to open zarr files? It works much better with timestamps and it's compatible with dask

@clessig
Copy link
Owner

clessig commented Sep 5, 2024

Would it be possible to use xarray to open zarr files? It works much better with timestamps and it's compatible with dask

Could you reply a clear example where the benefit would be?

Dask is significantly faster than the native python zarr library. What is the performance of xarray?

@kacpnowak
Copy link
Collaborator

I didn't mean to use xarray instead of dask. I mean using it: here

self.ds = zarr.open( file_path)

In case of xarray it's just:

self.ds = xr.open_zarr( file_path)

In such case for datetime type casting becomes uncessary, eg. self.year_base = self.ds['time'][0].astype(datetime) as one can simply use self.ds.index['time'][0]

Additionally it better decodes timestamps. In case of FESOM data when timestamps are in daily resolution, I had to manually do conversion from int. Xarray figured it out on it's own

@clessig
Copy link
Owner

clessig commented Sep 9, 2024

for datetime type casting becomes uncessary, eg. self.year_base = self.ds['time'][0].astype(datetime) as one can simply use self.ds.index['time'][0]

Additionally it better decodes timestamps. In case of FESOM data when timestamps are in daily resolution, I had to manually do conversion from int. Xarray figured it out on it's own

But by using xarray we would lose the performance advantage of dask, I assume. I would want to see that it's performance neutral before considering to use xarray instead of dask.

@mlangguth89
Copy link
Collaborator

xarray has a bulit-in support for dask:
https://docs.xarray.dev/en/latest/user-guide/dask.html

@kacpnowak
Copy link
Collaborator

@clessig There's no performance penalty for using Xarray. here is quite extensive comparison between zarr and xarray performance: pydata/xarray#9111 (comment)

@clessig
Copy link
Owner

clessig commented Sep 9, 2024

The question is wrt to dask (that might parallelize things etc) and not the native zarr library. Could you run with dask and xarray and num_worker_loaders=0 and see how many sampler/sec we get. Thanks!

@kacpnowak
Copy link
Collaborator

I'm on vacation so I can do it next week

@clessig
Copy link
Owner

clessig commented Oct 7, 2024

Where do we stand here? Ready to merge?

@kacpnowak : did you check the performance of xarray?

…_init

Fixed/improved handling of GPU detection.
@kacpnowak
Copy link
Collaborator

So far it's too difficult to make xarray only version. On developer meeting with @mlangguth89 we've talked about doing during hackaton in November, but I think it would be issue/branch

Copy link
Owner

@clessig clessig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor things only--would be great if they could be fixed before merging.

slurm_atmorep.sh Outdated Show resolved Hide resolved
slurm_atmorep_evaluate.sh Outdated Show resolved Hide resolved
atmorep/core/train.py Outdated Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core model anything related to trainer, attention etc.. enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants