Skip to content

Commit

Permalink
feat: add ERA5 download and aggregation (#61)
Browse files Browse the repository at this point in the history
* feat: add era5 download modules

* feat: add era5 sync function

* fix: update package dependencies

* fix: add json file as package data

* fix: do not move tmp file before closing

* fix: infinite loop in sync function

* fix: better log message

* fix: better log message

* fix: infinite loop in sync function

* fix: infinite loop in sync function

* feat(era5): download all CDS products for a given period

* fix(era5): do not download daily data if monthly file is present

* fix(era5): compatibility with cads_api_client>=1.4.5

* fix(era5): make datetime unaware of timezones for comparability

* feat(era5): add era5 aggregate module

* docs(era5): add era5 README

* docs(era5): update era5 README
  • Loading branch information
yannforget authored Oct 15, 2024
1 parent 9466f33 commit 48cb5ce
Show file tree
Hide file tree
Showing 7 changed files with 1,194 additions and 1 deletion.
180 changes: 180 additions & 0 deletions openhexa/toolbox/era5/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,180 @@
# OpenHEXA Toolbox ERA5

The package contains ETL classes and functions to acquire and process ERA5-Land data. ERA5-Land
provides hourly information of surface variables from 1950 to 5 days before the current date, with
a ~9 km spatial resolution. See [ERA5-Land: data
documentation](https://confluence.ecmwf.int/display/CKB/ERA5-Land%3A+data+documentation) for more
information.

Available variables include:
* 2 metre temperature
* Wind components
* Leaf area index
* Volumetric soil water layer
* Total precipitation

See [ERA5-Land data
documentation](https://confluence.ecmwf.int/display/CKB/ERA5-Land%3A+data+documentation#ERA5Land:datadocumentation-parameterlistingParameterlistings)
for a full list of available parameters.

In addition to download clients for the Copernicus [Climate Data Store](https://cds.climate.copernicus.eu/datasets/reanalysis-era5-land?tab=overview) and [Google Public Datasets](https://cloud.google.com/storage/docs/public-datasets/era5), the package includes an `aggregate` module to aggregate ERA5 measurements in space (geographic boundaries) and time (hourly to daily).

## Usage

The package contains 3 modules:
* `openhexa.toolbox.era5.cds`: download ERA5-land products from the Copernicus [Climate Data Store](https://cds.climate.copernicus.eu/datasets/reanalysis-era5-land?tab=overview)
* `openhexa.toolbox.era5.google`: download ERA5 products from Google Cloud [Public Datasets](https://cloud.google.com/storage/docs/public-datasets/era5)
* `openhexa.toolbox.era5.aggregate`: aggregate ERA5 data in space and time

### Download from CDS

To download products from the Climate Data Store, you will need to create an account and generate an API key in ECMWF (see [CDS](https://cds.climate.copernicus.eu/)).

```python
from openhexa.toolbox.era5.cds import Client

cds = Client(key="<cds_api_key>")

request = cds.build_request(
variable="2m_temperature",
year=2024,
month=4
)

cds.download(
request=request,
dst_file="data/product.grib"
)
```

The module also contains helper functions to use bounds from a geoparquet file as an area of
interest. Source bounds are buffered and rounded by default to make sure the required data is
downloaded.

```python
bounds = bounds_from_file(fp=Path("data/districts.parquet"), buffer=0.5)

request = cds.build_request(
variable="total_precipitation",
year=2023,
month=10,
days=[1, 2, 3, 4, 5],
area=bounds
)

cds.download(
request=request,
dst_file="data/product.grib"
)
```

To download multiple products for a given period, use `Client.download_between()`:

```python
cds.download_between(
variable="2m_temperature",
start=datetime(2020, 1, 1),
end=datetime(2021, 6, 1),
dst_dir="data/raw/2m_temperature",
area=bounds
)
```

Checking latest available date in the ERA5-Land dataset:

```python
cds = Client("<api_key>")

cds.latest
```
```
>>> datetime(2024, 10, 8)
```

NB: End dates in product requests will be automatically replaced by latest available date if they are greater.

### Download from Google Cloud

```python
from openhexa.toolbox.era5.google import Client

google = Client()

google.download(
variable="2m_temperature",
date=datetime(2024, 6, 15),
dst_file="data/product.nc"
)
```

Or to download all products for a given period:

```python
# if products are already presents in dst_dir, they will be skipped
google.sync(
variable="2m_temperature",
start_date=datetime(2022, 1, 1),
end_date=datetime(2022, 6, 1),
dst_dir="data"
)
```

### Aggregation

```python
from pathlib import Path

import geopandas as gpd
from openhexa.toolbox.era5.aggregate import build_masks, merge, aggregate, get_transform

boundaries = gpd.read_parquet("districts.parquet")
data_dir = Path("data/era5/total_precipitation")

ds = merge(data_dir)

ncols = len(ds.longitude)
nrows = len(ds.latitude)
transform = get_transform(ds)
masks = build_masks(boundaries, nrows, ncols, transform)

df = aggregate(
ds=ds,
var="tp",
masks=masks,
boundaries_id=[uid for uid in boundaries["district_id"]]
)

print(df)
```
```
shape: (18_410, 5)
┌─────────────┬────────────┬───────────┬──────────┬───────────┐
│ boundary_id ┆ date ┆ mean ┆ min ┆ max │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ date ┆ f64 ┆ f64 ┆ f64 │
╞═════════════╪════════════╪═══════════╪══════════╪═══════════╡
│ mPenE8ZIBFC ┆ 2024-01-01 ┆ 0.000462 ┆ 0.0 ┆ 0.00086 │
│ TPgpGxUBU9y ┆ 2024-01-01 ┆ 0.000462 ┆ 0.0 ┆ 0.00086 │
│ AhST5ZpuCDJ ┆ 2024-01-01 ┆ 0.000462 ┆ 0.0 ┆ 0.00086 │
│ Lp2BjBVT63s ┆ 2024-01-01 ┆ 0.000462 ┆ 0.0 ┆ 0.00086 │
│ EdfRX9b9vEb ┆ 2024-01-01 ┆ 0.000462 ┆ 0.0 ┆ 0.00086 │
│ yhs1ecKsLOc ┆ 2024-01-01 ┆ 0.000462 ┆ 0.0 ┆ 0.00086 │
│ iHSJypSwlo5 ┆ 2024-01-01 ┆ 0.000462 ┆ 0.0 ┆ 0.00086 │
│ CTtB0TPRvWc ┆ 2024-01-01 ┆ 0.000462 ┆ 0.0 ┆ 0.00086 │
│ eVFAuZOzogt ┆ 2024-01-01 ┆ 0.000462 ┆ 0.0 ┆ 0.00086 │
│ WVEJjdJ2S15 ┆ 2024-01-01 ┆ 0.000462 ┆ 0.0 ┆ 0.00086 │
│ rbYGKFgupK9 ┆ 2024-01-01 ┆ 0.000462 ┆ 0.0 ┆ 0.00086 │
│ Nml6rVDElLh ┆ 2024-01-01 ┆ 0.000462 ┆ 0.0 ┆ 0.00086 │
│ E0hd8TD1M0q ┆ 2024-01-01 ┆ 0.000462 ┆ 0.0 ┆ 0.00086 │
│ PCg4pLGmKSM ┆ 2024-01-01 ┆ 0.000462 ┆ 0.0 ┆ 0.00086 │
│ C6EBhE8OnfW ┆ 2024-01-01 ┆ 0.000462 ┆ 0.0 ┆ 0.00086 │
│ … ┆ … ┆ … ┆ … ┆ … │
│ CkpfOFkMyrd ┆ 2024-10-07 ┆ 1.883121 ┆ 0.001785 ┆ 2.700447 │
│ tMXsltjzzmR ┆ 2024-10-07 ┆ 3.579136 ┆ 0.105436 ┆ 4.702504 │
│ F0ytkh0RExg ┆ 2024-10-07 ┆ 8.415455 ┆ 0.838535 ┆ 17.08884 │
...
│ TTSmaRnHa82 ┆ 2024-10-07 ┆ 1.724243 ┆ 0.007809 ┆ 5.692989 │
│ jbmw2gdrrTV ┆ 2024-10-07 ┆ 1.176629 ┆ 0.110173 ┆ 1.582995 │
│ eKYyXbBdvmB ┆ 2024-10-07 ┆ 0.599976 ┆ 0.037771 ┆ 1.189411 │
└─────────────┴────────────┴───────────┴──────────┴───────────┘
```
15 changes: 15 additions & 0 deletions openhexa/toolbox/era5/__init__.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,15 @@
import logging

import cdsapi

logging.basicConfig(level=logging.DEBUG, format="%(name)s %(asctime)s %(levelname)s %(message)s")
log = logging.getLogger(__name__)

BASE_URL = "https://cds-beta.climate.copernicus.eu/api"


class ERA5:
def __init__(self, key: str):
self.client = cdsapi.Client(
key=key,
)
Loading

0 comments on commit 48cb5ce

Please sign in to comment.