Access to GCP Storage via its HTTPS URL #61

Open
CSyl opened this issue Sep 30, 2024 · 6 comments
Labels
bug Something isn't working

Comments

@CSyl

CSyl commented Sep 30, 2024

What happened?

When trying to source a zarr from GCP storage by executing anemoi-datasets create config.yaml test.zarr, the following error occurs:

json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
2024-09-30 13:15:18 ERROR 
💣 Expecting value: line 1 column 1 (char 0)
2024-09-30 13:15:18 ERROR 💣 Exiting

I am able to obtain data from S3 storage (via S3 or HTTPS URLs) and from the ECMWF object store (via HTTPS). However, trying to access an object in GCP Storage (e.g. https://console.cloud.google.com/storage/browser/gcp-public-data-arco-era5/ar/1959-2022-1h-360x181_equiangular_with_poles_conservative.zarr) leads to a JSON error. Would it be possible to add a feature that accommodates sourcing a zarr object from a GCP storage location?

What are the steps to reproduce the bug?

  1. Created a configuration file (saved as test_gcp_zarr_httpsurl.yaml) as such:
dates:
  start: 2021-12-31T09:00:00
  end: 2021-12-31T22:00:00
  frequency: 1h

input:
  xarray-zarr:
    url: "https://console.cloud.google.com/storage/browser/gcp-public-data-arco-era5/ar/1959-2022-1h-360x181_equiangular_with_poles_conservative.zarr"
    param: [2m_temperature,
    10m_u_component_of_wind,
    geopotential,
    10m_v_component_of_wind,
    surface_pressure]
  2. Executed: anemoi-datasets create test_gcp_zarr_httpsurl.yaml test_gcp.zarr and obtained the following error:
2024-09-30 13:14:59 INFO Task init((),{}) starting
2024-09-30 13:15:00 INFO Setting flatten_grid=True in config
2024-09-30 13:15:00 INFO Setting ensemble_dimension=2 in config
2024-09-30 13:15:00 INFO Setting flatten_grid=True in config
2024-09-30 13:15:00 INFO Setting ensemble_dimension=2 in config
2024-09-30 13:15:00 INFO {'start': datetime.datetime(2021, 12, 31, 9, 0), 'end': datetime.datetime(2021, 12, 31, 22, 0), 'frequency': '1h', 'group_by': 'monthly'}
2024-09-30 13:15:00 INFO Groups(dates=1)
2024-09-30 13:15:00 INFO FunctionAction: url=https://console.cloud.google.com/storage/browser/gcp-public-data-arco-era5/ar/1959-2022-1h-360x181_equiangular_with_poles_conservative.zarr param=['2m_temperature', '10m_u_component_of_wind', 'geopotential', '10m_v_component_of_wind', 'surface_pressure'] 
2024-09-30 13:15:06 INFO Minimal input for 'init' step (using only the first date) :
2024-09-30 13:15:06 INFO xarray-zarr(['2021-12-31T09:00:00'])
2024-09-30 13:15:06 INFO Config loaded ok:
2024-09-30 13:15:06 INFO Found 14 datetimes.
2024-09-30 13:15:06 INFO Dates: Found 14 datetimes, in 1 groups: 
2024-09-30 13:15:06 INFO Missing dates: 0
2024-09-30 13:15:18 ERROR Error in execute
Traceback (most recent call last):
  File ".../miniconda3/envs/anemoi_test/lib/python3.10/site-packages/anemoi/datasets/create/input.py", line 590, in datasource
    return _tidy(self.action.function(FunctionContext(self), self.dates, *args, **kwargs))
  File ".../miniconda3/envs/anemoi_test/lib/python3.10/site-packages/anemoi/datasets/create/functions/sources/xarray_zarr.py", line 15, in execute
    return load_many("🇿", context, dates, url, *args, **kwargs)
  File ".../miniconda3/envs/anemoi_test/lib/python3.10/site-packages/anemoi/datasets/create/functions/sources/xarray/__init__.py", line 77, in load_many
    result.append(load_one(emoji, context, dates, path, **kwargs))
  File ".../miniconda3/envs/anemoi_test/lib/python3.10/site-packages/anemoi/datasets/create/functions/sources/xarray/__init__.py", line 47, in load_one
    data = xr.open_zarr(name_to_zarr_store(dataset), **options)
  File ".../miniconda3/envs/anemoi_test/lib/python3.10/site-packages/xarray/backends/zarr.py", line 1103, in open_zarr
    ds = open_dataset(
  File ".../miniconda3/envs/anemoi_test/lib/python3.10/site-packages/xarray/backends/api.py", line 611, in open_dataset
    backend_ds = backend.open_dataset(
  File ".../miniconda3/envs/anemoi_test/lib/python3.10/site-packages/xarray/backends/zarr.py", line 1173, in open_dataset
    store = ZarrStore.open_group(
  File ".../miniconda3/envs/anemoi_test/lib/python3.10/site-packages/xarray/backends/zarr.py", line 483, in open_group
    zarr_group, consolidate_on_close, close_store_on_close = _get_open_params(
  File ".../miniconda3/envs/anemoi_test/lib/python3.10/site-packages/xarray/backends/zarr.py", line 1335, in _get_open_params
    zarr_group = zarr.open_consolidated(store, **open_kwargs)
  File ".../miniconda3/envs/anemoi_test/lib/python3.10/site-packages/zarr/convenience.py", line 1360, in open_consolidated
    meta_store = ConsolidatedStoreClass(store, metadata_key=metadata_key)
  File ".../miniconda3/envs/anemoi_test/lib/python3.10/site-packages/zarr/storage.py", line 3046, in __init__
    meta = json_loads(self.store[metadata_key])
  File ".../miniconda3/envs/anemoi_test/lib/python3.10/site-packages/zarr/util.py", line 76, in json_loads
    return json.loads(ensure_text(s, "utf-8"))
  File ".../miniconda3/envs/anemoi_test/lib/python3.10/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File ".../miniconda3/envs/anemoi_test/lib/python3.10/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File ".../miniconda3/envs/anemoi_test/lib/python3.10/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Traceback (most recent call last):
  File ".../miniconda3/envs/anemoi_test/lib/python3.10/site-packages/anemoi/utils/cli.py", line 135, in cli_main
    cmd.run(args)
  File ".../miniconda3/envs/anemoi_test/lib/python3.10/site-packages/anemoi/datasets/commands/create.py", line 64, in run
    self.serial_create(args)
  File "..../miniconda3/envs/anemoi_test/lib/python3.10/site-packages/anemoi/datasets/commands/create.py", line 74, in serial_create
    task("init", options)
  File ".../miniconda3/envs/anemoi_test/lib/python3.10/site-packages/anemoi/datasets/commands/create.py", line 29, in task
    result = c.run()
  File ".../miniconda3/envs/anemoi_test/lib/python3.10/site-packages/anemoi/datasets/create/__init__.py", line 355, in run
    return self._run()
  File ".../miniconda3/envs/anemoi_test/lib/python3.10/site-packages/anemoi/datasets/create/__init__.py", line 375, in _run
    variables = self.minimal_input.variables
  File ".../miniconda3/envs/anemoi_test/lib/python3.10/site-packages/anemoi/datasets/create/input.py", line 484, in variables
    self.build_coords()
  File ".../miniconda3/envs/anemoi_test/lib/python3.10/site-packages/anemoi/datasets/create/input.py", line 435, in build_coords
    from_data = self.get_cube().user_coords
  File ".../miniconda3/envs/anemoi_test/lib/python3.10/site-packages/anemoi/datasets/create/input.py", line 230, in get_cube
    ds = self.datasource
  File ".../miniconda3/envs/anemoi_test/lib/python3.10/functools.py", line 981, in __get__
    val = self.func(instance)
  File ".../miniconda3/envs/anemoi_test/lib/python3.10/site-packages/anemoi/datasets/create/input.py", line 90, in wrapper
    result = method(self, *args, **kwargs)
  File ".../miniconda3/envs/anemoi_test/lib/python3.10/site-packages/anemoi/datasets/create/template.py", line 26, in wrapper
    result = method(self, *args, **kwargs)
  File ".../miniconda3/envs/anemoi_test/lib/python3.10/site-packages/anemoi/datasets/create/trace.py", line 56, in wrapper
    result = method(self, *args, **kwargs)
  File ".../miniconda3/envs/anemoi_test/lib/python3.10/site-packages/anemoi/datasets/create/input.py", line 590, in datasource
    return _tidy(self.action.function(FunctionContext(self), self.dates, *args, **kwargs))
  File ".../miniconda3/envs/anemoi_test/lib/python3.10/site-packages/anemoi/datasets/create/functions/sources/xarray_zarr.py", line 15, in execute
    return load_many("🇿", context, dates, url, *args, **kwargs)
  File ".../miniconda3/envs/anemoi_test/lib/python3.10/site-packages/anemoi/datasets/create/functions/sources/xarray/__init__.py", line 77, in load_many
    result.append(load_one(emoji, context, dates, path, **kwargs))
  File ".../miniconda3/envs/anemoi_test/lib/python3.10/site-packages/anemoi/datasets/create/functions/sources/xarray/__init__.py", line 47, in load_one
    data = xr.open_zarr(name_to_zarr_store(dataset), **options)
  File ".../miniconda3/envs/anemoi_test/lib/python3.10/site-packages/xarray/backends/zarr.py", line 1103, in open_zarr
    ds = open_dataset(
  File ".../miniconda3/envs/anemoi_test/lib/python3.10/site-packages/xarray/backends/api.py", line 611, in open_dataset
    backend_ds = backend.open_dataset(
  File ".../miniconda3/envs/anemoi_test/lib/python3.10/site-packages/xarray/backends/zarr.py", line 1173, in open_dataset
    store = ZarrStore.open_group(
  File ".../miniconda3/envs/anemoi_test/lib/python3.10/site-packages/xarray/backends/zarr.py", line 483, in open_group
    zarr_group, consolidate_on_close, close_store_on_close = _get_open_params(
  File ".../miniconda3/envs/anemoi_test/lib/python3.10/site-packages/xarray/backends/zarr.py", line 1335, in _get_open_params
    zarr_group = zarr.open_consolidated(store, **open_kwargs)
  File ".../miniconda3/envs/anemoi_test/lib/python3.10/site-packages/zarr/convenience.py", line 1360, in open_consolidated
    meta_store = ConsolidatedStoreClass(store, metadata_key=metadata_key)
  File ".../miniconda3/envs/anemoi_test/lib/python3.10/site-packages/zarr/storage.py", line 3046, in __init__
    meta = json_loads(self.store[metadata_key])
  File ".../miniconda3/envs/anemoi_test/lib/python3.10/site-packages/zarr/util.py", line 76, in json_loads
    return json.loads(ensure_text(s, "utf-8"))
  File ".../miniconda3/envs/anemoi_test/lib/python3.10/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File ".../miniconda3/envs/anemoi_test/lib/python3.10/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File ".../miniconda3/envs/anemoi_test/lib/python3.10/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
**json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)**
2024-09-30 13:15:18 ERROR 
💣 Expecting value: line 1 column 1 (char 0)
2024-09-30 13:15:18 ERROR 💣 Exiting

Version

0.5.0

Platform (OS and architecture)

Linux

Relevant log output

No response

Accompanying data

No response

Organisation

No response

CSyl added the bug (Something isn't working) label Sep 30, 2024
@b8raoult
Collaborator

b8raoult commented Oct 3, 2024

This is not a problem with anemoi-datasets. This will also fail:

import xarray as xr

xr.open_zarr('https://console.cloud.google.com/storage/browser/gcp-public-data-arco-era5/ar/1959-2022-1h-360x181_equiangular_with_poles_conservative.zarr')

The URL is not correct. The correct URL is gs://gcp-public-data-arco-era5/ar/1959-2022-1h-360x181_equiangular_with_poles_conservative.zarr
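
To illustrate the difference between the two URLs: the console.cloud.google.com address is the web page for browsing the bucket, so a Zarr reader that requests metadata from it most likely receives something other than JSON (for example HTML), which would be consistent with the JSONDecodeError at char 0. Below is a minimal sketch of converting a browser URL into the gs:// form. The helper console_url_to_gs is hypothetical, written only for this example (it is not part of anemoi-datasets or xarray), and it assumes the URL follows the /storage/browser/<bucket>/<path> pattern seen above.

```python
from urllib.parse import unquote, urlparse

# Hypothetical helper for illustration; not part of anemoi-datasets or xarray.
CONSOLE_HOST = "console.cloud.google.com"
BROWSER_PREFIX = "/storage/browser/"


def console_url_to_gs(url: str) -> str:
    """Turn a Cloud Console storage-browser URL into a gs:// URL."""
    parsed = urlparse(url)
    if parsed.netloc != CONSOLE_HOST or not parsed.path.startswith(BROWSER_PREFIX):
        raise ValueError(f"Not a Cloud Console storage browser URL: {url}")
    # Everything after the prefix is "<bucket>/<object path>".
    bucket_and_path = unquote(parsed.path[len(BROWSER_PREFIX):]).strip("/")
    if not bucket_and_path:
        raise ValueError(f"No bucket found in URL: {url}")
    return f"gs://{bucket_and_path}"


console_url = (
    "https://console.cloud.google.com/storage/browser/"
    "gcp-public-data-arco-era5/ar/"
    "1959-2022-1h-360x181_equiangular_with_poles_conservative.zarr"
)
gs_url = console_url_to_gs(console_url)
print(gs_url)
```

The resulting gs:// string is the form that can be passed to xr.open_zarr or used as the url in the YAML file; reading gs:// URLs through xarray requires the gcsfs package to be installed.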

@CSyl
Author

CSyl commented Oct 4, 2024

Hi @b8raoult, thank you for your response, much appreciated! Yes, I agree that xr.open_zarr() can open the zarr with the URL you mentioned above (gs://gcp-public-data-arco-era5/ar/1959-2022-1h-360x181_equiangular_with_poles_conservative.zarr). However, when running the following Python script, which calls functions from the anemoi-datasets develop source code, I get the following error:

  1. Python script executed:
from anemoi.datasets.data import add_dataset_path, open_dataset
add_dataset_path("gs://gcp-public-data-arco-era5/ar/")
ds = open_dataset("1959-2022-1h-360x181_equiangular_with_poles_conservative")

Error message:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
File ~/miniconda3/envs/ai_pipeline/lib/python3.10/site-packages/zarr/hierarchy.py:520, in Group.__getattr__(self, item)
    519 try:
--> 520     return self.__getitem__(item)
    521 except KeyError:

File ~/miniconda3/envs/ai_pipeline/lib/python3.10/site-packages/zarr/hierarchy.py:500, in Group.__getitem__(self, item)
    499 else:
--> 500     raise KeyError(item)

KeyError: 'data'

During handling of the above exception, another exception occurred:

AttributeError                            Traceback (most recent call last)
Cell In[1], line 6
      3 add_dataset_path("gs://gcp-public-data-arco-era5/ar/")
      5 # Opening entire dataset w/out filter.
----> 6 ds = open_dataset("1959-2022-1h-360x181_equiangular_with_poles_conservative")
      7 ds

File .../src/anemoi/datasets/data/__init__.py:29, in open_dataset(*args, **kwargs)
     28 def open_dataset(*args, **kwargs):
---> 29     ds = _open_dataset(*args, **kwargs)
     30     ds = ds.mutate()
     31     ds.arguments = {"args": args, "kwargs": kwargs}

File .../src/anemoi/datasets/data/misc.py:267, in _open_dataset(*args, **kwargs)
    265 sets = []
    266 for a in args:
--> 267     sets.append(_open(a))
    269 if "xy" in kwargs:
    270     from .xy import xy_factory

File .../src/anemoi/datasets/data/misc.py:180, in _open(a)
    177     return Zarr(a).mutate()
    179 if isinstance(a, str):
--> 180     return Zarr(zarr_lookup(a)).mutate()
    182 if isinstance(a, PurePath):
    183     return _open(str(a)).mutate()

File .../src/anemoi/datasets/data/stores.py:167, in Zarr.__init__(self, path)
    164     self.z = open_zarr(self.path)
    166 # This seems to speed up the reading of the data a lot
--> 167 self.data = self.z.data
    168 self.missing = set()

File ~/miniconda3/envs/ai_pipeline/lib/python3.10/site-packages/zarr/hierarchy.py:522, in Group.__getattr__(self, item)
    520     return self.__getitem__(item)
    521 except KeyError:
--> 522     raise AttributeError

AttributeError: 

The above error does not occur if I open an object from an S3 bucket or the ECMWF object store (e.g. https://object-store.os-api.cci1.ecmwf.int/ml-examples). For example, when sourcing from the ECMWF object store:

from anemoi.datasets.data import add_dataset_path, open_dataset
add_dataset_path("https://object-store.os-api.cci1.ecmwf.int/ml-examples/")
ds = open_dataset("an-oper-2023-2023-2p5-6h-v1")

As a result, the zarr located in ECMWF's object store loads without the error that I got when trying to source from GCP storage.
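
A note on the traceback above: the failing line is self.data = self.z.data in stores.py, which suggests that open_dataset expects a Zarr that already contains a top-level entry named data, i.e. one produced by anemoi-datasets create. The ARCO ERA5 store appears to be a generic xarray-style Zarr with one array per variable, so it would have no such entry. A minimal sketch of that check follows, using plain dicts as stand-ins for opened Zarr groups; the helper name has_anemoi_layout and the stand-in contents are assumptions made for illustration.

```python
from collections.abc import Container


def has_anemoi_layout(group: Container) -> bool:
    """Return True if the group has the top-level 'data' entry
    that the traceback shows being accessed (self.z.data)."""
    return "data" in group


# Stand-ins for opened Zarr groups (contents are illustrative only).
# A zarr Group should work here too, since it supports the `in` operator.
arco_like = {"2m_temperature": None, "geopotential": None}
anemoi_like = {"data": None}

print(has_anemoi_layout(arco_like))    # no 'data' entry: use it as an input source
print(has_anemoi_layout(anemoi_like))  # 'data' entry present: open_dataset applies
```

Under that reading, a store without a data entry is something to feed into anemoi-datasets create as an xarray-zarr input, rather than something to open with open_dataset.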

@b8raoult
Collaborator

b8raoult commented Oct 4, 2024

Yes, this is the zarr you put in the YAML file.

dates:
  start: 2021-12-31T09:00:00
  end: 2021-12-31T22:00:00
  frequency: 1h

input:
  xarray-zarr:
    url: "gs://gcp-public-data-arco-era5/ar/1959-2022-1h-360x181_equiangular_with_poles_conservative.zarr"
    param: [2m_temperature,
    10m_u_component_of_wind,
    geopotential,
    10m_v_component_of_wind,
    surface_pressure]

@CSyl
Author

CSyl commented Oct 4, 2024

Hi @b8raoult,
I have tried setting up the YAML file (gcp-gsurl-sample-zarr.yaml) as such:

dates:
  start: 2021-12-31T09:00:00
  end: 2021-12-31T10:00:00
  frequency: 1h

input:
  xarray-zarr:
    url: "gs://gcp-public-data-arco-era5/ar/1959-2022-1h-360x181_equiangular_with_poles_conservative.zarr"
    param: [2m_temperature,
    10m_u_component_of_wind,
    geopotential,
    10m_v_component_of_wind,
    surface_pressure]

In this case, I then ran the latest release of anemoi-datasets (v0.5.6) via:

anemoi-datasets create gcp-gsurl-sample-zarr.yaml test.zarr

and the following error also occurs:

2024-10-04 10:02:01 INFO 🎬 Task init((),{}) starting
2024-10-04 10:02:02 INFO Setting flatten_grid=True in config
2024-10-04 10:02:02 INFO Setting ensemble_dimension=2 in config
2024-10-04 10:02:02 INFO Setting flatten_grid=True in config
2024-10-04 10:02:02 INFO Setting ensemble_dimension=2 in config
2024-10-04 10:02:02 INFO {'start': datetime.datetime(2021, 12, 31, 9, 0), 'end': datetime.datetime(2021, 12, 31, 10, 0), 'frequency': '1h', 'group_by': 'monthly'}
2024-10-04 10:02:02 INFO Groups(dates=1,<anemoi.datasets.dates.StartEndDates object at 0x7fea1befa6b0>)
2024-10-04 10:02:02 INFO FunctionAction: url=gs://gcp-public-data-arco-era5/ar/1959-2022-1h-360x181_equiangular_with_poles_conservative.zarr param=['2m_temperature', '10m_u_component_of_wind', 'geopotential', '10m_v_component_of_wind', 'surface_pressure'] 
2024-10-04 10:02:04 INFO Groups: Groups(dates=1,<anemoi.datasets.dates.StartEndDates object at 0x7fea1befa6b0>)
2024-10-04 10:02:07 INFO Minimal input for 'init' step (using only the first date) : GroupOfDates(dates=['2021-12-31T09:00:00'])
2024-10-04 10:02:07 INFO xarray-zarr(GroupOfDates(dates=['2021-12-31T09:00:00']))
2024-10-04 10:02:07 INFO Config loaded ok:
2024-10-04 10:02:07 INFO Found 2 datetimes.
2024-10-04 10:02:07 INFO Dates: Found 2 datetimes, in 1 groups: 
2024-10-04 10:02:07 INFO Missing dates: 0
2024-10-04 10:02:17 WARNING Compute Engine Metadata server unavailable on attempt 1 of 3. Reason: timed out
2024-10-04 10:02:21 WARNING Compute Engine Metadata server unavailable on attempt 2 of 3. Reason: timed out
2024-10-04 10:02:26 WARNING Compute Engine Metadata server unavailable on attempt 3 of 3. Reason: timed out
2024-10-04 10:02:26 WARNING Authentication failed using Compute Engine authentication due to unavailable metadata server.
2024-10-04 10:02:26 WARNING Compute Engine Metadata server unavailable on attempt 1 of 5. Reason: HTTPConnectionPool(host='metadata.google.internal', port=80): Max retries exceeded with url: /computeMetadata/v1/instance/service-accounts/default/?recursive=true (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x7fea18e92a40>: Failed to resolve 'metadata.google.internal' ([Errno -2] Name or service not known)"))
2024-10-04 10:02:27 WARNING Compute Engine Metadata server unavailable on attempt 2 of 5. Reason: HTTPConnectionPool(host='metadata.google.internal', port=80): Max retries exceeded with url: /computeMetadata/v1/instance/service-accounts/default/?recursive=true (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x7fea18e93850>: Failed to resolve 'metadata.google.internal' ([Errno -2] Name or service not known)"))
2024-10-04 10:02:29 WARNING Compute Engine Metadata server unavailable on attempt 3 of 5. Reason: HTTPConnectionPool(host='metadata.google.internal', port=80): Max retries exceeded with url: /computeMetadata/v1/instance/service-accounts/default/?recursive=true (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x7fea18e93c40>: Failed to resolve 'metadata.google.internal' ([Errno -2] Name or service not known)"))
2024-10-04 10:02:33 WARNING Compute Engine Metadata server unavailable on attempt 4 of 5. Reason:

@b8raoult
Collaborator

b8raoult commented Oct 4, 2024

That's OK. I got that warning message as well. It will eventually finish.

@CSyl
Author

CSyl commented Oct 7, 2024

Hi @b8raoult,
Is there an intermediate step required to get around connecting to the metadata server? At the moment, when the latest release of anemoi-datasets (v0.5.6) is run with the aforementioned configuration file via:

anemoi-datasets create gcp-gsurl-sample-zarr.yaml test.zarr

The framework hangs at the series of "WARNING Compute Engine Metadata server unavailable on attempt" messages and has not progressed after 1 hour of waiting. What gets generated is a zarr, test.zarr, with an empty _build folder.
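
One thing that may be worth trying, offered as a sketch rather than a confirmed fix: the warnings come from the Google authentication libraries probing for Compute Engine credentials. Since gcp-public-data-arco-era5 is a public bucket, gcsfs can be asked to skip credential discovery by passing token="anon". The snippet below does this with xarray directly; whether and how the xarray-zarr source in anemoi-datasets forwards such options from the YAML file is not established in this thread. The names anonymous_gcs_options and open_public_zarr are hypothetical helpers for this example.

```python
def anonymous_gcs_options() -> dict:
    """Keyword arguments asking gcsfs for anonymous (credential-free) access."""
    return {"storage_options": {"token": "anon"}}


def open_public_zarr(url: str):
    """Open a public gs:// Zarr without credential discovery.

    Requires xarray and gcsfs; xarray is imported lazily so that the
    options helper above can be used without it.
    """
    import xarray as xr

    return xr.open_zarr(url, **anonymous_gcs_options())


# Usage (needs network access, xarray and gcsfs):
# ds = open_public_zarr(
#     "gs://gcp-public-data-arco-era5/ar/"
#     "1959-2022-1h-360x181_equiangular_with_poles_conservative.zarr"
# )
print(anonymous_gcs_options())
```

If this opens quickly with xarray alone, that would point to the credential lookup, rather than the data access itself, as the source of the delay.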
