Variable naming when combining different level types in one dataset #87

Open
mpartio opened this issue Oct 14, 2024 · 10 comments

@mpartio
Contributor

mpartio commented Oct 14, 2024

Describe the bug

When combining surface-level and pressure-level data in a single dataset, variables are named param_level. This is silly for surface-level parameters that already carry the level information in their name (for example 2t).

To Reproduce

dates:
  start: 2024-10-12T14:00:00Z
  end: 2024-10-12T14:00:00Z
  frequency: 1h
input:
  join:
  - grib:
      path: example.grib2
      param: [2t]
      levtype: sfc
  - grib:
      path: example.grib2
      param: [z]
      levtype: pl
      levelist: [1000]

$ anemoi-datasets create example.yaml example.zarr
$ anemoi-datasets inspect example.zarr
📦 Path          : example.zarr
🔢 Format version: 0.30.0

📅 Start      : 2024-10-12 14:00
📅 End        : 2024-10-12 14:00
⏰ Frequency  : 1h
🚫 Missing    : 0
🌎 Resolution : 2p5km
🌎 Field shape: [1069, 949]

📐 Shape      : 1 × 2 × 1 × 1,014,481 (7.7 MiB)
💽 Size       : 13.4 MiB (13.4 MiB)
📁 Files      : 75

   Index │ Variable │     Min │     Max │    Mean │   Stdev
   ──────┼──────────┼─────────┼─────────┼─────────┼────────
       0 │ 2t_2     │ 262.871 │ 289.308 │ 280.725 │ 3.92529
       1 │ z_1000   │ -2214.4 │ 1830.41 │ 485.131 │ 977.196
   ──────┴──────────┴─────────┴─────────┴─────────┴────────
🔋 Dataset ready, last update 3 hours ago.
📊 Statistics ready.


URL to sample input data

Attached to this issue ticket.

Expected behavior

Surface-level parameters are named without the level value, e.g. "2t".
Vertical levels (pressure, hybrid, whatever) are named with the level value, e.g. "z_1000".
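
The expected rule can be sketched in a few lines of Python. This is only an illustration: the function name `variable_name` and the `SINGLE_LEVEL_LEVTYPES` set are hypothetical and are not taken from anemoi-datasets; the `sfc` and `pl` values come from the configuration above.

```python
from typing import Optional

# Assumption: level types for which the level value should NOT be appended.
# The contents of this set are illustrative, not taken from anemoi-datasets.
SINGLE_LEVEL_LEVTYPES = {"sfc"}


def variable_name(param: str, levtype: str, level: Optional[int]) -> str:
    """Return `param` for single-level fields and `param_level` otherwise."""
    if levtype in SINGLE_LEVEL_LEVTYPES or level is None:
        return param
    return f"{param}_{level}"


print(variable_name("2t", "sfc", 2))    # 2t
print(variable_name("z", "pl", 1000))   # z_1000
```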

Additional context

$ anemoi-datasets --version
0.5.7

I tried changing "variable_naming" in "build":

build:
    variable_naming: param

But this changed the naming for all sources -- I need to change the naming for just a single source.

@mpartio
Contributor Author

mpartio commented Oct 14, 2024

example.grib2.gz

Example data, compressed with gzip because GitHub does not allow uploading GRIB files.

@mchantry
Member

Hello
Thanks for raising this issue. There is work in progress to better differentiate between level types.
We will report back soon.

@mpartio
Copy link
Contributor Author

mpartio commented Oct 20, 2024

Thanks!

I also found a small bug in the code that uses xarray for reading data (like netcdf source); do you prefer a pull request or some other means to deliver this information?

@mchantry
Member

mchantry commented Nov 1, 2024

A PR would be great, thanks.

@mchantry
Member

mchantry commented Nov 1, 2024

@mpartio You could look at https://anemoi-datasets.readthedocs.io/en/latest/building/filters/rename.html to rename 2t_2 to 2t. Would this meet your needs?

@mpartio
Contributor Author

mpartio commented Nov 4, 2024

Thanks for the suggestion! I tried rename but could not get it to work this way. My assumption is that anemoi uses the naming scheme {param}_{level} and that rename only affects the {param} part.

For example this configuration:

  pipe:
    - join:
      - grib:
          path: example.grib2
          param: [2t]
          levtype: sfc
      - grib:
          path: example.grib2
          param: [z]
          levtype: pl
          levelist: [1000]
    - rename:
        param:
          2t_2: 2t

And inspecting the resulting zarr:

   Index │ Variable │     Min │     Max │    Mean │   Stdev
   ──────┼──────────┼─────────┼─────────┼─────────┼────────
       0 │ 2t_2     │ 262.871 │ 289.308 │ 280.725 │ 3.92529

If I change the rename to:

    - rename:
        param:
          2t: tempe

I get:

   Index │ Variable │     Min │     Max │    Mean │   Stdev
   ──────┼──────────┼─────────┼─────────┼─────────┼────────
       0 │ tempe_2  │ 262.871 │ 289.308 │ 280.725 │ 3.92529
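
Both results are consistent with the rename being applied to the bare param value before the final {param}_{level} name is built. A small Python sketch of that assumption follows; the function names `apply_rename` and `build_name` are hypothetical and do not reflect the actual anemoi-datasets implementation.

```python
def apply_rename(param: str, mapping: dict) -> str:
    """Rename the bare param; mapping keys are compared against param only."""
    return mapping.get(param, param)


def build_name(param: str, level: int, mapping: dict) -> str:
    """Build the variable name as {param}_{level} after renaming param."""
    return f"{apply_rename(param, mapping)}_{level}"


# The key "2t_2" never matches the bare param "2t", so nothing changes.
print(build_name("2t", 2, {"2t_2": "2t"}))    # 2t_2
# The key "2t" matches, but the level suffix is still appended afterwards.
print(build_name("2t", 2, {"2t": "tempe"}))   # tempe_2
```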

@frazane

frazane commented Nov 5, 2024

I am having the same issue. In my case it occurs for GRIB fields that have the key typeOfLevel=heightAboveGround: the specified level in meters is then appended to the param name.

@mpartio
Contributor Author

mpartio commented Nov 19, 2024

I found out the reason why my test case fails to combine surface and pressure levels, even though the instructions at https://anemoi-datasets.readthedocs.io/en/latest/building/introduction.html show it working.

The reason is that my data comes from a Harmonie model, which defines nearly all parameters using typeOfLevel=heightAboveGround. For example, my two meter temperature is typeOfLevel=heightAboveGround,level=2.

IFS, on the other hand, defines nearly everything with typeOfLevel=surface (two meter temperature is typeOfLevel=surface,level=0). If I change my Harmonie data to level surface (grib_set -s typeOfLevel=surface ...), anemoi-datasets create produces a zarr file where the surface data variables do not have level values appended.

So, there must be some extra code in anemoi especially for the case typeOfLevel=surface. Changing the values with grib_set is ok as a workaround, but it's a bit silly and will mess up shortNames (for example 2d -> dpt) -- and then I have to fix the names with rename.

@frazane

frazane commented Nov 19, 2024

> So, there must be some extra code in anemoi especially for the case typeOfLevel=surface. Changing the values with grib_set is ok as a workaround, but it's a bit silly and will mess up shortNames (for example 2d -> dpt) -- and then I have to fix the names with rename.

This is good to know, thanks @mpartio! We define parameter levels the same way in the COSMO model and, afaik, in the new ICON model. So yeah, it's nice to have a workaround, but hopefully this will be solved in the codebase. I suppose there's a bit of a wait due to large incoming changes from earthkit-data.

@frazane

frazane commented Jan 28, 2025

@mpartio in case you're interested, currently our solution is to use this lambda filter

https://github.com/ecmwf/anemoi-transform/blob/1411a9c5e2a8644090fa76383d8d288267488517/src/anemoi/transform/filters/lambda_filters.py#L21

and specify a function that takes the field as input and overrides some parameters accordingly, such as:

def remove_sfc_levelist(field: ArrayField) -> ArrayField:
    return field.clone(typeOfLevel="surface", levelist=None)
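
To illustrate the effect of such a function, here is a self-contained Python sketch. It uses a hypothetical `FakeField` dataclass in place of the real `ArrayField`; only the `clone(...)` call is taken from the snippet above. The `variable_name` helper and the guard on `heightAboveGround` are assumptions added for illustration, not anemoi behaviour.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class FakeField:
    """Hypothetical stand-in for ArrayField; not the real class."""
    param: str
    typeOfLevel: str
    levelist: Optional[int]

    def clone(self, **overrides) -> "FakeField":
        # Return a copy with the given metadata keys overridden.
        return replace(self, **overrides)


def remove_sfc_levelist(field: FakeField) -> FakeField:
    # Assumption: only touch heightAboveGround fields, leave the rest alone.
    if field.typeOfLevel != "heightAboveGround":
        return field
    return field.clone(typeOfLevel="surface", levelist=None)


def variable_name(field: FakeField) -> str:
    # Assumed naming rule: append the level only when one is set.
    if field.levelist is None:
        return field.param
    return f"{field.param}_{field.levelist}"


t2m = FakeField("2t", "heightAboveGround", 2)
z = FakeField("z", "isobaricInhPa", 1000)
print(variable_name(remove_sfc_levelist(t2m)))  # 2t
print(variable_name(remove_sfc_levelist(z)))    # z_1000
```

Without the guard, a function like this would also strip the level from pressure-level fields if it were applied to them, so it presumably has to be attached only to the source that provides the near-surface parameters.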
