Intake-esm

Motivation

Computer simulations of the Earth’s climate and weather generate huge amounts of data. These data are often persisted on HPC systems or in the cloud across multiple data assets in a variety of formats (netCDF, Zarr, etc.). Finding, investigating, and loading these data assets into compute-ready data containers costs time and effort. Before loading and analyzing a specific data set, the user needs to know which data sets are available and which attributes describe each one.

Finding, investigating, and loading these assets into data array containers such as xarray can be a daunting task because of the large number of files a user may be interested in. Intake-esm aims to address these issues by providing the functionality needed for searching, discovering, and accessing/loading data.

Overview

intake-esm is a data cataloging utility built on top of intake, pandas, and xarray, and it's pretty awesome!

Opening an ESM collection definition file: An ESM (Earth System Model) collection file is a JSON file that conforms to the ESM Collection Specification. When provided a link/path to an ESM collection file, intake-esm establishes a link to a database (CSV file) that contains the locations of the data assets and their associated metadata (i.e., which experiment, model, etc. they come from). The collection JSON file can be stored on a local filesystem or hosted on a remote server.
```
In [1]: import intake

In [2]: col_url = "https://gist.githubusercontent.com/andersy005/7f416e57acd8319b20fc2b88d129d2b8/raw/987b4b336d1a8a4f9abec95c23eed3bd7c63c80e/pangeo-gcp-subset.json"

In [3]: col = intake.open_esm_datastore(col_url)

In [4]: col
Out[4]: <pangeo-cmip6 catalog with 4287 dataset(s) from 282905 asset(s)>
```
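
To make the relationship between the two files concrete, here is a minimal sketch that uses only the Python standard library and does not call intake-esm. The field names (`esmcat_version`, `catalog_file`, `attributes`, `assets`) are assumed from the ESM Collection Specification and should be checked against it; the file names, bucket paths, and rows are made up for illustration.

```python
import csv
import json
import tempfile
from pathlib import Path

# Hypothetical rows: one line per data asset, with its metadata and location.
ROWS = [
    {"source_id": "MODEL-A", "experiment_id": "historical",
     "variable_id": "o2", "path": "gs://example-bucket/a/historical/o2"},
    {"source_id": "MODEL-A", "experiment_id": "ssp585",
     "variable_id": "o2", "path": "gs://example-bucket/a/ssp585/o2"},
]


def write_collection(directory: Path) -> Path:
    """Write a tiny CSV catalog plus the JSON file that describes it."""
    csv_path = directory / "catalog.csv"
    with csv_path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(ROWS[0]))
        writer.writeheader()
        writer.writerows(ROWS)

    # Assumed layout of a collection file; verify against the specification.
    collection = {
        "esmcat_version": "0.1.0",
        "id": "example-collection",
        "description": "Illustrative collection, not a real dataset",
        "catalog_file": str(csv_path),
        "attributes": [
            {"column_name": "source_id"},
            {"column_name": "experiment_id"},
            {"column_name": "variable_id"},
        ],
        "assets": {"column_name": "path", "format": "zarr"},
    }
    json_path = directory / "collection.json"
    json_path.write_text(json.dumps(collection, indent=2))
    return json_path


def read_collection(json_path: Path):
    """Load the JSON description, then follow catalog_file to the CSV rows."""
    collection = json.loads(json_path.read_text())
    with open(collection["catalog_file"], newline="") as f:
        rows = list(csv.DictReader(f))
    return collection, rows


with tempfile.TemporaryDirectory() as tmp:
    collection, rows = read_collection(write_collection(Path(tmp)))
    asset_column = collection["assets"]["column_name"]
    print(collection["id"], len(rows), [row[asset_column] for row in rows])
```

The JSON file carries the description of the catalog, while the CSV holds one row per asset; the `assets` entry names the column where each asset's location is found.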

Search and Discovery: intake-esm provides functionality to execute queries against the catalog:

```
In [5]: col_subset = col.search(
   ...:     experiment_id=["historical", "ssp585"],
   ...:     table_id="Oyr",
   ...:     variable_id="o2",
   ...:     grid_label="gn",
   ...: )

In [6]: col_subset
Out[6]: <pangeo-cmip6 catalog with 18 dataset(s) from 138 asset(s)>
```
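
Conceptually, a query like this filters the rows of the catalog database. The sketch below is an illustration in plain Python, not intake-esm's implementation: the `search` function and the sample rows are hypothetical, and it assumes that a list of values means "any of these" while separate fields must all match.

```python
def search(rows, **query):
    """Keep rows whose columns match every query field.

    A query value may be a single string or a list of accepted strings.
    """
    def matches(row):
        for column, wanted in query.items():
            accepted = [wanted] if isinstance(wanted, str) else list(wanted)
            if row.get(column) not in accepted:
                return False
        return True

    return [row for row in rows if matches(row)]


# Hypothetical catalog rows, standing in for lines of the CSV database.
rows = [
    {"experiment_id": "historical", "table_id": "Oyr", "variable_id": "o2", "grid_label": "gn"},
    {"experiment_id": "ssp585", "table_id": "Oyr", "variable_id": "o2", "grid_label": "gr"},
    {"experiment_id": "piControl", "table_id": "Oyr", "variable_id": "o2", "grid_label": "gn"},
    {"experiment_id": "ssp585", "table_id": "Amon", "variable_id": "tas", "grid_label": "gn"},
]

subset = search(
    rows,
    experiment_id=["historical", "ssp585"],
    table_id="Oyr",
    variable_id="o2",
    grid_label="gn",
)
print(subset)
```

Only the first sample row satisfies every field, so `subset` contains a single entry.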

Access: when the user is satisfied with the results of their query, they can ask intake-esm to load data assets (netCDF/HDF files and/or Zarr stores) into xarray datasets:

```
In [7]: dset_dict = col_subset.to_dataset_dict(zarr_kwargs={"consolidated": True})

--> The keys in the returned dictionary of datasets are constructed as follows:
        'activity_id.institution_id.source_id.experiment_id.table_id.grid_label'
|███████████████████████████████████████████████████████████████| 100.00% [18/18 00:10<00:00]
```
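
The key pattern printed above suggests how assets end up grouped into datasets: rows that share the same values for those six columns fall under the same key. The following sketch shows that grouping with plain Python. It is an assumption-laden illustration rather than intake-esm's code; `group_assets`, the `path` column, and all row values are hypothetical.

```python
from collections import defaultdict

# Columns taken from the key pattern shown in the output above.
KEY_COLUMNS = [
    "activity_id", "institution_id", "source_id",
    "experiment_id", "table_id", "grid_label",
]


def group_assets(rows, key_columns=KEY_COLUMNS, sep="."):
    """Map each dotted key to the list of asset paths that share it."""
    groups = defaultdict(list)
    for row in rows:
        key = sep.join(row[column] for column in key_columns)
        groups[key].append(row["path"])
    return dict(groups)


# Hypothetical rows: the first two differ only in their asset path.
base = {
    "activity_id": "CMIP", "institution_id": "INST-A", "source_id": "MODEL-A",
    "experiment_id": "historical", "table_id": "Oyr", "grid_label": "gn",
}
rows = [
    {**base, "path": "store/part-1"},
    {**base, "path": "store/part-2"},
    {**base, "experiment_id": "ssp585", "path": "store/part-3"},
]

groups = group_assets(rows)
for key, paths in groups.items():
    print(key, paths)
```

Three assets collapse into two keys here, which mirrors how the catalog summary reports fewer datasets than assets.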

See documentation for more information.

Installation

Intake-esm can be installed from PyPI with pip:

```
python -m pip install intake-esm
```

It is also available from conda-forge for conda installations:

```
conda install -c conda-forge intake-esm
```
