diff --git a/docs/source/storeconv.md b/docs/source/storeconv.md
index 44ca810c7..5ac1f8e31 100644
--- a/docs/source/storeconv.md
+++ b/docs/source/storeconv.md
@@ -33,34 +33,27 @@ There are no further restrictions for data source and data store identifiers.
 A data accessor identifier MUST correspond to the following scheme:
 
-`<type_specifier>:<format>:<storage>[:<version>]`
-
-`<type_specifier>` specifies a data type.
-Its intention and format is described in the sub-section below.
-In case the type specifier has flags, the flags MUST be given within square brackets, in alphabetic order, separated by
-single commas and without spaces (e.g., `"dataset[cube,multilevel]"`).
-The `<format>` describes the data format that may be accessed, e.g., `zarr` or `netcdf`.
-The `<storage>` describes the kind of storage or data provision the accessor can access.
-Example values are `memory`, `s3` or `sentinelhub`.
-The `<version>` finally is an optional notifier about a data accessor's version.
-The version SHOULD follow the [Semantic Versioning](https://semver.org).
+`<data_type>:<format>:<storage>[:<version>]`
+
+`<data_type>` identifies the in-memory data type used to represent the data,
+e.g., `dataset` (an `xarray.Dataset`) or `geodataframe`
+(a `geopandas.GeoDataFrame`).
+`<format>` identifies the data format that may be accessed,
+e.g., `zarr`, `netcdf`, or `geojson`.
+`<storage>` identifies the kind of storage or data provision the
+accessor can access. Example values are `file` (the local file system),
+`s3` (AWS S3-compatible object storage), `sentinelhub`
+(the Sentinel Hub API), or `cciodp` (the ESA CCI Open Data Portal API).
+Finally, the optional `<version>` identifies
+a data accessor's version. The version MUST follow
+[Semantic Versioning](https://semver.org).
 
 Examples of valid data accessor identifiers are:
 
-`dataset[cube]:netcdf:posix`.
-`geodataframe:shapefile:cciodp:0.4.1`
-
-### Type Specifiers
-
-Type Specifiers are used to specify a data type.
-They consist of a name and an arbitrary number of optional flags, given in square brackets.
-These flags are used to define characteristics of a type, e.g., the type specifier "`dataset[cube]`" denotes a dataset -which also meets the requirements of a cube. -A dataset specified by `dataset[cube, multilevel]` is a cube and has multiple levels. -The order of flags is irrelevant, i.e., `dataset[cube, multilevel]` is the same as `dataset[multilevel, cube]`. -A type specifier with a flag is compatible to a type specifier that does not have the same flag set but is otherwise -similar, e.g., `dataset[cube]` is compatible with `dataset`. -The value `*` indicates that any type is supported. +* `dataset:netcdf:file` +* `dataset:zarr:sentinelhub` +* `geodataframe:geojson:file` +* `geodataframe:shapefile:cciodp:0.4.1` ## Open Parameters diff --git a/examples/notebooks/datastores/1_getting_started.ipynb b/examples/notebooks/datastores/1_getting_started.ipynb index d9195fdf9..a8eaf2ee0 100644 --- a/examples/notebooks/datastores/1_getting_started.ipynb +++ b/examples/notebooks/datastores/1_getting_started.ipynb @@ -29,7 +29,7 @@ "New data store implementations can be added to xcube, usually through xcube plugins. 
\n", "Since xcube 0.5, three data stores based on different data APIs are added through xcube plugins available in the [xcube GitHub organisation](https://github.com/dcs4cop):\n", "* `sentinelhub` - Datasets from Sentinel Hub by [plugin xcube_sh](https://github.com/dcs4cop/xcube-sh) with a dedicated [Notebook](./2_sentinel_hub.ipynb)\n", - "* `cciodp` - Datasets from ESA Climate Change Initiative (CCI) by [plugin xcube_cci](https://github.com/dcs4cop/xcube-cci) with a dedicated [Notebook](./3_esa_climate_change_initiative.ipynb)\n", + "* `cciodp` and `ccizarr` - Datasets from ESA Climate Change Initiative (CCI) by [plugin xcube_cci](https://github.com/dcs4cop/xcube-cci) with a dedicated [Notebook](./3_esa_climate_change_initiative.ipynb)\n", "* `cds` - Datasets from the C3S Climate Data Store by [plugin xcube_cds](https://github.com/dcs4cop/xcube-cds) with a dedicated [Notebook](./4_c3s_climate_data_store.ipynb)\n", "\n", "If you are interested in the development of new data stores for xcube, you may want to follow the [xcube data store conventions](https://github.com/dcs4cop/xcube/blob/master/docs/source/storeconv.md).\n", @@ -70,6 +70,31 @@ { "data": { "application/json": { + "cciodp": { + "data_store_notices": [ + { + "content": "The ESA CCI Open Data Portal (ODP) utilises an \"[ontology](http://vocab-test.ceda.ac.uk/ontology/cci/cci-content/index.html)\" whose terms might slightly differ from the ones used in this software.\nFor example, a *Dataset* in the CCI terminology may refer to all data products generated by a certain CCI project using a specific configuration of algorithms and auxiliary data.\nIn this software, a *Data Source* refers to a subset (a file set) of a given ODP dataset whose data share a common spatio-temporal grid and/or share other common properties, e.g. 
the instrument used for the original measurements.\nIn addition, the term *Dataset* is used to represent in-memory instances of gridded data sources or subsets of them.", + "icon": "info-sign", + "id": "terminologyClarification", + "intent": "primary", + "title": "Terminology Clarification" + }, + { + "content": "This data store currently provides **only a subset of all datasets** provided by the ESA CCI Open Data Portal (ODP), namely gridded datasets originally stored in NetCDF format.\nIn upcoming versions, the store will also allow for browsing and accessing the remaining ODP datasets. This includes gridded data in TIFF format and also vector data using ESRI Shapefile format.\nFor the time being users can download the missing vector data from the [ODP FTP server](http://cci.esa.int/data#ftp) `ftp://anon-ftp.ceda.ac.uk/neodc/esacci/`\n* CCI Glaciers in FTP directory `glaciers`\n* CCI Ice Sheets in FTP directories `ice_sheets_antarctica` and `ice_sheets_greenland`", + "icon": "warning-sign", + "id": "dataCompleteness", + "intent": "warning", + "title": "Data Completeness" + } + ], + "description": "ESA CCI Open Data Portal" + }, + "ccizarr": { + "description": "xarray.Dataset in Zarr format from ESA CCI Object Storage" + }, + "cds": { + "description": "Climate Data Store API" + }, "file": { "description": "Data store that uses a local filesystem" }, @@ -223,7 +248,7 @@ "type": "object" }, "text/plain": [ - "" + "" ] }, "execution_count": 3, @@ -241,11 +266,22 @@ "source": [ "The field `properties` lists the available data store parameters. For the filesystem-based data stores `file`, `s3`, and `memory` the most important parameter is `root` which specifies the data store's root path into the filesystem. \n", "\n", - "For the filesystem-based data stores there is also a special parameter `fs_params` that can contain further filesystem-specific parameters. 
Here are some examples how to parameterize an `s3` data store:\n",
+ "For the filesystem-based data stores there is also a special parameter `fs_params` that can contain further filesystem-specific parameters. \n",
+ "\n",
+ "Here are some examples of how to parameterize an `s3` data store that can access AWS S3-compatible object storage:\n",
 "\n",
- "* Public AWS S3 bucket: `new_data_store('s3', root=\"<bucket-name>\", fs_params=dict(anon=True))`\n",
- "* Private AWS S3 bucket: `new_data_store('s3', root=\"<bucket-name>\", fs_params=dict(anon=False, key=\"<access-key-id>\", secret=\"<secret-access-key>\"))`\n",
- "* Public AWS S3 compatible object storage: `new_data_store('s3', root=\"<bucket-name>\", fs_params=dict(anon=True, client_kwargs=dict(endpoint_url=\"<endpoint-url>\")))`\n",
+ "Public bucket on AWS S3: \n",
+ "```python\n",
+ "store = new_data_store('s3', root=\"<bucket-name>\", fs_params=dict(anon=True))\n",
+ "```\n",
+ "Private bucket on AWS S3: \n",
+ "```python\n",
+ "store = new_data_store('s3', root=\"<bucket-name>\", fs_params=dict(anon=False, key=\"<access-key-id>\", secret=\"<secret-access-key>\"))\n",
+ "```\n",
+ "Public object storage other than AWS S3: \n",
+ "```python\n",
+ "store = new_data_store('s3', root=\"<bucket-name>\", fs_params=dict(anon=True, client_kwargs=dict(endpoint_url=\"<endpoint-url>\")))\n",
+ "```\n",
 "\n",
 "Which parameters are accepted by `new_data_store()` when using the `file` data store?"
] @@ -304,7 +340,7 @@ "type": "object" }, "text/plain": [ - "" + "" ] }, "execution_count": 4, @@ -331,7 +367,7 @@ { "data": { "text/plain": [ - ".FsDataStoreClass at 0x2b099e9ab80>" + ".FsDataStoreClass at 0x2364884fc70>" ] }, "execution_count": 5, @@ -495,6 +531,7 @@ } }, "data_id": "cube-1-250-250.zarr", + "data_type": "dataset", "data_vars": { "c2rcc_flags": { "attrs": { @@ -709,11 +746,10 @@ "time_range": [ "2017-01-16", "2017-01-30" - ], - "type_specifier": "dataset[cube]" + ] }, "text/plain": [ - "" + "" ] }, "execution_count": 7, @@ -856,6 +892,7 @@ } }, "data_id": "cube-1-250-250.zarr", + "data_type": "dataset", "data_vars": { "c2rcc_flags": { "attrs": { @@ -1070,11 +1107,10 @@ "time_range": [ "2017-01-16", "2017-01-30" - ], - "type_specifier": "dataset[cube]" + ] }, "text/plain": [ - "" + "" ] }, "metadata": {}, @@ -1192,6 +1228,7 @@ } }, "data_id": "cube-5-100-200.zarr", + "data_type": "dataset", "data_vars": { "c2rcc_flags": { "attrs": { @@ -1406,11 +1443,10 @@ "time_range": [ "2017-01-16", "2017-01-30" - ], - "type_specifier": "dataset[cube]" + ] }, "text/plain": [ - "" + "" ] }, "metadata": {}, @@ -1524,6 +1560,7 @@ } }, "data_id": "cube.nc", + "data_type": "dataset", "data_vars": { "c2rcc_flags": { "attrs": { @@ -1713,11 +1750,10 @@ "time_range": [ "2017-01-16", "2017-01-30" - ], - "type_specifier": "dataset[cube]" + ] }, "text/plain": [ - "" + "" ] }, "metadata": {}, @@ -2122,7 +2158,7 @@ " kd489 (time, lat, lon) float64 dask.array<chunksize=(1, 250, 250), meta=np.ndarray>\n", " quality_flags (time, lat, lon) float64 dask.array<chunksize=(1, 250, 250), meta=np.ndarray>\n", "Attributes:\n", - " Conventions: CF-1.7
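For reference, the `<data_type>:<format>:<storage>[:<version>]` accessor identifier scheme documented in storeconv.md above can be sketched in a few lines of plain Python. The helper name `parse_accessor_id` is hypothetical and not part of xcube's API; this is only an illustration of the naming convention, not an implementation used by xcube:

```python
def parse_accessor_id(accessor_id):
    """Split a data accessor identifier of the form
    <data_type>:<format>:<storage>[:<version>] into its parts.

    Hypothetical helper for illustration only; not part of xcube's API.
    """
    parts = accessor_id.split(":")
    if len(parts) not in (3, 4):
        raise ValueError(f"invalid data accessor identifier: {accessor_id!r}")
    data_type, format_id, storage_id = parts[:3]
    # The trailing version segment is optional.
    version = parts[3] if len(parts) == 4 else None
    return data_type, format_id, storage_id, version


# The identifiers below are taken from the examples in storeconv.md.
print(parse_accessor_id("dataset:zarr:sentinelhub"))
print(parse_accessor_id("geodataframe:shapefile:cciodp:0.4.1"))
```

For `geodataframe:shapefile:cciodp:0.4.1` this yields the data type `geodataframe`, format `shapefile`, storage `cciodp`, and version `0.4.1`.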