Describe the bug
It seems like earthkit-data can read remote grib data, and these lines in anemoi-datasets seem to indicate that the same should be possible with anemoi-datasets. However, this doesn't work for me: dataset creation fails with a TypeError saying that execute() is missing the required positional argument "path", when it seems like "url" should be accepted here instead.
Let me know if this is intended to work with anemoi-datasets. If it is and someone has an idea for a quick fix then that would be great! Otherwise, I would be happy to contribute code to incorporate this feature since it would be very useful for us at NOAA.
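To illustrate the failure mode described above, here is a minimal, self-contained sketch. It does not use anemoi-datasets at all: `execute` and `execute_path_or_url` are hypothetical stand-ins, and their signatures are assumptions made only to show how passing `url` to a function that requires `path` produces the same TypeError, and how accepting either keyword might avoid it.

```python
# Hypothetical stand-in for a source function that only accepts a local path.
# This is NOT the real anemoi-datasets signature, just an illustration.
def execute(context, dates, path, **kwargs):
    return f"reading {path}"


# Keyword arguments as a recipe specifying `url` and `param` would supply them.
recipe_kwargs = {
    "url": "https://get.ecmwf.int/repository/test-data/earthkit-data/test-data/t_pl.grib",
    "param": ["t"],
}


def call_and_report(func, **kwargs):
    """Call func with dummy context/dates and return its result or the TypeError text."""
    try:
        return func(None, [], **kwargs)
    except TypeError as exc:
        return f"TypeError: {exc}"


# `url` is swallowed by **kwargs, so the required `path` is never supplied.
error_message = call_and_report(execute, **recipe_kwargs)


# Hypothetical variant that accepts exactly one of `path` or `url`.
def execute_path_or_url(context, dates, path=None, url=None, **kwargs):
    if (path is None) == (url is None):
        raise ValueError("specify exactly one of 'path' or 'url'")
    location = path if path is not None else url
    return f"reading {location}"


fixed_result = call_and_report(execute_path_or_url, **recipe_kwargs)

print(error_message)
print(fixed_result)
```

Whether anemoi-datasets should accept `url` directly, or map it onto the existing `path` handling, is a design question for the maintainers; the sketch only shows why the current call fails.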
Full Traceback and output up to the point of error
(anemoi-datasets) [psl_linux remote-grib-test]$ anemoi-datasets create recipe.yaml test.zarr --overwrite
2025-01-22 12:36:56 INFO 🎬 Task init((),{}) starting
2025-01-22 12:36:56 INFO Setting flatten_grid=True in config
2025-01-22 12:36:56 INFO Setting ensemble_dimension=2 in config
2025-01-22 12:36:56 INFO Setting flatten_grid=True in config
2025-01-22 12:36:56 INFO Setting ensemble_dimension=2 in config
2025-01-22 12:36:56 INFO {'start': '2018-08-01T12', 'end': '2018-08-01T12', 'frequency': '6h', 'group_by': 'monthly'}
2025-01-22 12:36:56 INFO Groups(dates=1,<anemoi.datasets.dates.StartEndDates object at 0x7fbcfc0ea690>)
2025-01-22 12:36:56 INFO FunctionAction: url=https://get.ecmwf.int/repository/test-data/earthkit-data/test-data/t_pl.grib param=['t']
2025-01-22 12:36:56 INFO Groups: Groups(dates=1,<anemoi.datasets.dates.StartEndDates object at 0x7fbcfc0ea690>)
2025-01-22 12:36:56 INFO Minimal input for 'init' step (using only the first date) : GroupOfDates(dates=['2018-08-01T12:00:00'])
2025-01-22 12:36:56 INFO grib(GroupOfDates(dates=['2018-08-01T12:00:00']))
2025-01-22 12:36:56 INFO Config loaded ok:
2025-01-22 12:36:56 INFO Found 1 datetimes.
2025-01-22 12:36:56 INFO Dates: Found 1 datetimes, in 1 groups:
2025-01-22 12:36:56 INFO Missing dates: 0
2025-01-22 12:36:56 ERROR Error in execute
Traceback (most recent call last):
  File "/home/tsmith/work/aneml/anemoi-datasets/src/anemoi/datasets/create/input/function.py", line 97, in datasource
    self.action.function(
TypeError: execute() missing 1 required positional argument: 'path'

Traceback (most recent call last):
  File "/home/tsmith/miniconda3/envs/anemoi-datasets/bin/anemoi-datasets", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/tsmith/work/aneml/anemoi-datasets/src/anemoi/datasets/__main__.py", line 23, in main
    cli_main(__version__, __doc__, COMMANDS)
  File "/home/tsmith/miniconda3/envs/anemoi-datasets/lib/python3.11/site-packages/anemoi/utils/cli.py", line 153, in cli_main
    cmd.run(args)
  File "/home/tsmith/work/aneml/anemoi-datasets/src/anemoi/datasets/commands/create.py", line 72, in run
    self.serial_create(args)
  File "/home/tsmith/work/aneml/anemoi-datasets/src/anemoi/datasets/commands/create.py", line 82, in serial_create
    task("init", options)
  File "/home/tsmith/work/aneml/anemoi-datasets/src/anemoi/datasets/commands/create.py", line 36, in task
    result = c.run()
             ^^^^^^^
  File "/home/tsmith/work/aneml/anemoi-datasets/src/anemoi/datasets/create/__init__.py", line 395, in run
    return self._run()
           ^^^^^^^^^^^
  File "/home/tsmith/work/aneml/anemoi-datasets/src/anemoi/datasets/create/__init__.py", line 417, in _run
    variables = self.minimal_input.variables
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tsmith/work/aneml/anemoi-datasets/src/anemoi/datasets/create/input/result.py", line 538, in variables
    self.build_coords()
  File "/home/tsmith/work/aneml/anemoi-datasets/src/anemoi/datasets/create/input/result.py", line 483, in build_coords
    cube = self.get_cube()
           ^^^^^^^^^^^^^^^
  File "/home/tsmith/work/aneml/anemoi-datasets/src/anemoi/datasets/create/input/result.py", line 277, in get_cube
    ds = self.datasource
         ^^^^^^^^^^^^^^^
  File "/home/tsmith/miniconda3/envs/anemoi-datasets/lib/python3.11/functools.py", line 1001, in __get__
    val = self.func(instance)
          ^^^^^^^^^^^^^^^^^^^
  File "/home/tsmith/work/aneml/anemoi-datasets/src/anemoi/datasets/create/input/misc.py", line 56, in wrapper
    result = method(self, *args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tsmith/work/aneml/anemoi-datasets/src/anemoi/datasets/create/input/template.py", line 21, in wrapper
    result = method(self, *args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tsmith/work/aneml/anemoi-datasets/src/anemoi/datasets/create/input/trace.py", line 57, in wrapper
    result = method(self, *args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tsmith/work/aneml/anemoi-datasets/src/anemoi/datasets/create/input/function.py", line 97, in datasource
    self.action.function(
TypeError: execute() missing 1 required positional argument: 'path'
Version number
To Reproduce
Using this recipe.yaml
anemoi-datasets create recipe.yaml test.zarr
Note that I would prefer to read NOAA GEFS data in grib format on AWS, which would use something like this yaml
But given that the first yaml is similar to what is shown in the earthkit-data documentation, it seems like a good place to start.
URL to sample input data
First example:
https://get.ecmwf.int/repository/test-data/earthkit-data/test-data/t_pl.grib
Second example:
https://noaa-gefs-pds.s3.amazonaws.com/index.html#gefs.20170101/00/
Expected behavior
Dataset creation
Screenshots
Just showing that I'm able to access the first example dataset with earthkit-data, so it seems to be an issue with how anemoi-datasets is hooked up.
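The screenshot itself is not reproduced here. A check along these lines should show the same thing, assuming earthkit-data is installed and the sample URL is reachable. It relies on earthkit-data's `from_source("url", ...)`; the helper names `open_remote_grib` and `summarize` are hypothetical and introduced only for this sketch.

```python
# URL of the first example file from this issue.
SAMPLE_URL = "https://get.ecmwf.int/repository/test-data/earthkit-data/test-data/t_pl.grib"


def open_remote_grib(url=SAMPLE_URL):
    """Open a GRIB file over HTTP(S) with earthkit-data and return the resulting fields."""
    # Imported lazily so this module can be loaded without earthkit-data installed.
    import earthkit.data as ekd

    return ekd.from_source("url", url)


def summarize(url=SAMPLE_URL):
    """Print and return how many GRIB messages were read from the URL."""
    fields = open_remote_grib(url)
    count = len(fields)
    print(f"{count} GRIB messages read from {url}")
    return count
```

Calling `summarize()` needs network access. If it succeeds while `anemoi-datasets create` fails on the same URL, that points at the anemoi-datasets side rather than at earthkit-data.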
Additional context