Unable to Read Remote GRIB Data #185

Open
timothyas opened this issue Jan 22, 2025 · 0 comments
timothyas commented Jan 22, 2025

Describe the bug
It seems that earthkit-data can read remote GRIB data, and these lines in anemoi-datasets suggest that the same should be possible with anemoi-datasets. However, this doesn't work for me: I get an error saying that execute() is missing the required positional argument "path", when it seems like we should actually be passing "url" here.

Let me know if this is intended to work with anemoi-datasets. If it is, and someone has an idea for a quick fix, that would be great! Otherwise, I would be happy to contribute code to add this feature, since it would be very useful for us at NOAA.

Full Traceback and output up to the point of error
(anemoi-datasets) [psl_linux remote-grib-test]$ anemoi-datasets create recipe.yaml test.zarr --overwrite
2025-01-22 12:36:56 INFO 🎬 Task init((),{}) starting
2025-01-22 12:36:56 INFO Setting flatten_grid=True in config
2025-01-22 12:36:56 INFO Setting ensemble_dimension=2 in config
2025-01-22 12:36:56 INFO Setting flatten_grid=True in config
2025-01-22 12:36:56 INFO Setting ensemble_dimension=2 in config
2025-01-22 12:36:56 INFO {'start': '2018-08-01T12', 'end': '2018-08-01T12', 'frequency': '6h', 'group_by': 'monthly'}
2025-01-22 12:36:56 INFO Groups(dates=1,<anemoi.datasets.dates.StartEndDates object at 0x7fbcfc0ea690>)
2025-01-22 12:36:56 INFO FunctionAction: url=https://get.ecmwf.int/repository/test-data/earthkit-data/test-data/t_pl.grib param=['t']
2025-01-22 12:36:56 INFO Groups: Groups(dates=1,<anemoi.datasets.dates.StartEndDates object at 0x7fbcfc0ea690>)
2025-01-22 12:36:56 INFO Minimal input for 'init' step (using only the first date) : GroupOfDates(dates=['2018-08-01T12:00:00'])
2025-01-22 12:36:56 INFO grib(GroupOfDates(dates=['2018-08-01T12:00:00']))
2025-01-22 12:36:56 INFO Config loaded ok:
2025-01-22 12:36:56 INFO Found 1 datetimes.
2025-01-22 12:36:56 INFO Dates: Found 1 datetimes, in 1 groups:
2025-01-22 12:36:56 INFO Missing dates: 0
2025-01-22 12:36:56 ERROR Error in execute
Traceback (most recent call last):
  File "/home/tsmith/work/aneml/anemoi-datasets/src/anemoi/datasets/create/input/function.py", line 97, in datasource
    self.action.function(
TypeError: execute() missing 1 required positional argument: 'path'
Traceback (most recent call last):
  File "/home/tsmith/miniconda3/envs/anemoi-datasets/bin/anemoi-datasets", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/tsmith/work/aneml/anemoi-datasets/src/anemoi/datasets/__main__.py", line 23, in main
    cli_main(__version__, __doc__, COMMANDS)
  File "/home/tsmith/miniconda3/envs/anemoi-datasets/lib/python3.11/site-packages/anemoi/utils/cli.py", line 153, in cli_main
    cmd.run(args)
  File "/home/tsmith/work/aneml/anemoi-datasets/src/anemoi/datasets/commands/create.py", line 72, in run
    self.serial_create(args)
  File "/home/tsmith/work/aneml/anemoi-datasets/src/anemoi/datasets/commands/create.py", line 82, in serial_create
    task("init", options)
  File "/home/tsmith/work/aneml/anemoi-datasets/src/anemoi/datasets/commands/create.py", line 36, in task
    result = c.run()
             ^^^^^^^
  File "/home/tsmith/work/aneml/anemoi-datasets/src/anemoi/datasets/create/__init__.py", line 395, in run
    return self._run()
           ^^^^^^^^^^^
  File "/home/tsmith/work/aneml/anemoi-datasets/src/anemoi/datasets/create/__init__.py", line 417, in _run
    variables = self.minimal_input.variables
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tsmith/work/aneml/anemoi-datasets/src/anemoi/datasets/create/input/result.py", line 538, in variables
    self.build_coords()
  File "/home/tsmith/work/aneml/anemoi-datasets/src/anemoi/datasets/create/input/result.py", line 483, in build_coords
    cube = self.get_cube()
           ^^^^^^^^^^^^^^^
  File "/home/tsmith/work/aneml/anemoi-datasets/src/anemoi/datasets/create/input/result.py", line 277, in get_cube
    ds = self.datasource
         ^^^^^^^^^^^^^^^
  File "/home/tsmith/miniconda3/envs/anemoi-datasets/lib/python3.11/functools.py", line 1001, in __get__
    val = self.func(instance)
          ^^^^^^^^^^^^^^^^^^^
  File "/home/tsmith/work/aneml/anemoi-datasets/src/anemoi/datasets/create/input/misc.py", line 56, in wrapper
    result = method(self, *args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tsmith/work/aneml/anemoi-datasets/src/anemoi/datasets/create/input/template.py", line 21, in wrapper
    result = method(self, *args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tsmith/work/aneml/anemoi-datasets/src/anemoi/datasets/create/input/trace.py", line 57, in wrapper
    result = method(self, *args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tsmith/work/aneml/anemoi-datasets/src/anemoi/datasets/create/input/function.py", line 97, in datasource
    self.action.function(
TypeError: execute() missing 1 required positional argument: 'path'
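
To illustrate what I think is happening, here is a minimal sketch that reproduces the same TypeError in isolation. The execute function below is a hypothetical stand-in that I wrote for illustration, not the actual anemoi-datasets source; I am assuming that the keys of the recipe's grib block are forwarded as keyword arguments, so that "url" is silently absorbed while the required "path" is never supplied.

```python
def execute(context, dates, path, *args, **kwargs):
    """Hypothetical stand-in for a source function that requires 'path'."""
    return f"would read {path}"


# Keys from the recipe's `grib:` block, assumed to be forwarded as kwargs.
recipe_kwargs = {
    "url": "https://get.ecmwf.int/repository/test-data/earthkit-data/test-data/t_pl.grib",
    "param": ["t"],
}


def call_source(kwargs):
    """Call the stand-in and return either its result or the error text."""
    try:
        return execute(None, ["2018-08-01T12:00:00"], **kwargs)
    except TypeError as exc:
        return f"TypeError: {exc}"


# 'url' ends up in **kwargs, so 'path' is reported as missing.
error_message = call_source(recipe_kwargs)
print(error_message)

# With a 'path' key instead, the same call goes through.
ok_message = call_source({"path": "t_pl.grib", "param": ["t"]})
print(ok_message)
```

If this reading is right, the fix would be either to accept "url" alongside "path" in the grib source or to route "url" recipes to a URL-aware reader.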

Version number

  • anemoi-datasets 0.5.15 (9fea6f7)
  • anemoi-transform 0.1.1 (1411a9c5e2a8644090fa76383d8d288267488517)
  • anemoi-utils 0.4.11 (5315571c76c2ab7b937d9218a25371d0d3604c4c)
  • earthkit-data 0.10.8
  • earthkit-geo 0.2.0
  • earthkit-meteo 0.1.1

To Reproduce

Using this recipe.yaml

dates:
  start: 2018-08-01T12
  end: 2018-08-01T12
  frequency: 6h

input:
  grib:
    url: https://get.ecmwf.int/repository/test-data/earthkit-data/test-data/t_pl.grib
    param: [t]

anemoi-datasets create recipe.yaml test.zarr

Note that I would ultimately prefer to read NOAA GEFS data in GRIB format from AWS, using a recipe something like this:

dates:
  start: 2020-02-17T00:00:00
  end: 2020-02-17T12:00:00
  frequency: 6h

input:
  grib:
    url: s3://noaa-gefs-pds/gefs.{date:strftime(%Y)}{date:strftime(%m)}{date:strftime(%d)}/{date:strftime(%H)}/pgrb2a/gec00.t{date:strftime(%H)}z.pgrb2af06
    param: [u10, v10]

But since the first recipe is similar to what is shown in the earthkit-data documentation, it seems like a good place to start.
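
For reference, this is roughly how I would expect the date placeholders in the GEFS url to expand for the dates in that recipe. The expand and date_range helpers are hypothetical and written only for this illustration; they are not the substitution code used by anemoi-datasets, which may behave differently.

```python
import re
from datetime import datetime, timedelta

TEMPLATE = (
    "s3://noaa-gefs-pds/gefs.{date:strftime(%Y)}{date:strftime(%m)}"
    "{date:strftime(%d)}/{date:strftime(%H)}/pgrb2a/"
    "gec00.t{date:strftime(%H)}z.pgrb2af06"
)

# Matches placeholders of the form {date:strftime(<format>)}.
PLACEHOLDER = re.compile(r"\{date:strftime\(([^)]*)\)\}")


def expand(template, date):
    """Replace each placeholder with date.strftime(<format>)."""
    return PLACEHOLDER.sub(lambda m: date.strftime(m.group(1)), template)


def date_range(start, end, step):
    """Yield dates from start to end (inclusive) in increments of step."""
    current = start
    while current <= end:
        yield current
        current += step


# start, end, and frequency taken from the GEFS recipe above.
dates = list(
    date_range(datetime(2020, 2, 17, 0), datetime(2020, 2, 17, 12), timedelta(hours=6))
)
urls = [expand(TEMPLATE, d) for d in dates]
for url in urls:
    print(url)
```

With the recipe's 6h frequency this gives three URLs, one each for the 00, 06, and 12 UTC cycles on 2020-02-17.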

URL to sample input data
First example: https://get.ecmwf.int/repository/test-data/earthkit-data/test-data/t_pl.grib
Second example: https://noaa-gefs-pds.s3.amazonaws.com/index.html#gefs.20170101/00/

Expected behavior
The dataset is created successfully from the remote GRIB file.

Screenshots

Just showing that I'm able to access the first example dataset with earthkit-data, so it seems to be an issue with how anemoi-datasets is hooked up.
Image
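
In text form, a check along these lines works with earthkit-data alone. This is a sketch and not necessarily the exact code in the screenshot; it assumes earthkit-data is installed and the machine has network access, and read_remote_grib is just a hypothetical wrapper name.

```python
SAMPLE_URL = (
    "https://get.ecmwf.int/repository/test-data/earthkit-data/test-data/t_pl.grib"
)


def read_remote_grib(url=SAMPLE_URL, param="t"):
    """Download a GRIB file via earthkit-data's "url" source and select one parameter."""
    # Imported lazily so that defining this function does not require earthkit-data.
    import earthkit.data as ekd

    fields = ekd.from_source("url", url)
    return fields.sel(param=param)
```

Calling read_remote_grib().ls() should then list the temperature fields from the first example file.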

Additional context
