Merge branch 'develop' into feature/grib-output-from-climetlab
sandorkertesz authored Oct 3, 2023
2 parents caf4091 + f578185 commit ce0479d
Showing 28 changed files with 648 additions and 152 deletions.
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -15,7 +15,7 @@ repos:
hooks:
- id: isort
- repo: https://github.com/psf/black
rev: 23.3.0
rev: 23.9.1
hooks:
- id: black
- repo: https://github.com/keewis/blackdoc
8 changes: 3 additions & 5 deletions README.md
@@ -5,11 +5,9 @@
[![PyPI version fury.io](https://badge.fury.io/py/earthkit-data.svg)](https://pypi.python.org/pypi/earthkit-data/)
[![PyPI pyversions](https://img.shields.io/pypi/pyversions/earthkit-data.svg)](https://pypi.python.org/pypi/earthkit-data/)

> :warning: **DISCLAIMER**
>
> This project is **BETA** and will be **Experimental** for the foreseeable future.
> Interfaces and functionality are likely to change, and the project itself may be scrapped.
> **DO NOT** use this software in any project/software that is operational.
**DISCLAIMER**

> This project is in the **BETA** stage of development. Please be aware that interfaces and functionality may change as the project develops. If this software is to be used in operational systems you are **strongly advised to use a released tag in your system configuration**, and you should be willing to accept incoming changes and bug fixes that require adaptations on your part. ECMWF **does use** this software in operations and abides by the same caveats.
A format-agnostic interface for geospatial data with a focus on meteorology and
climate science.
2 changes: 2 additions & 0 deletions docs/examples.rst
@@ -15,6 +15,8 @@ Here is a list of example notebooks to illustrate how to use earthkit-data.
examples/cds.ipynb
examples/ecmwf_open_data.ipynb
examples/fdb.ipynb
examples/mars.ipynb
examples/polytope.ipynb
examples/wekeo.ipynb


188 changes: 188 additions & 0 deletions docs/examples/polytope.ipynb
@@ -0,0 +1,188 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "efdd065e-e9fc-494d-9f73-9cb3d525dc4a",
"metadata": {},
"source": [
"## Retrieving data with polytope"
]
},
{
"cell_type": "markdown",
"id": "75079b34-78ba-4536-8498-4dc3d0c3e646",
"metadata": {},
"source": [
"The “polytope” data source provides access to the [Polytope web services](https://polytope-client.readthedocs.io/en/latest/)."
]
},
{
"cell_type": "markdown",
"id": "621c0aa7-db93-441a-ab55-743fa6fbcd51",
"metadata": {},
"source": [
"The following example retrieves data from the ECMWF MARS archive using polytope. The dataset was prepared for the OGC GeoDataCubes working group, see details [here](https://github.com/ecmwf/ogc-gdc-usecase/tree/main)."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "7910ac60-a503-4392-a719-0b780625346f",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"2023-09-29 18:03:52 - INFO - Sending request...\n",
"{'request': 'class: rd\\n'\n",
" \"date: '20200915'\\n\"\n",
" 'domain: g\\n'\n",
" 'expver: hsvs\\n'\n",
" \"levellist: '1'\\n\"\n",
" 'levtype: pl\\n'\n",
" \"param: '129.128'\\n\"\n",
" 'step: 0/12\\n'\n",
" 'stream: oper\\n'\n",
" 'time: 00:00:00\\n'\n",
" 'type: fc\\n',\n",
" 'verb': 'retrieve'}\n",
"2023-09-29 18:03:53 - INFO - Request accepted. Please poll http://polytope.ecmwf.int/api/v1/requests/5af79420-5e06-477d-8167-a54e0de84fe1 for status\n",
"2023-09-29 18:03:53 - INFO - Checking request status (5af79420-5e06-477d-8167-a54e0de84fe1)...\n",
"2023-09-29 18:03:54 - INFO - The current status of the request is 'processing'\n",
"2023-09-29 18:03:58 - INFO - The current status of the request is 'processed'\n",
" \r"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>centre</th>\n",
" <th>shortName</th>\n",
" <th>typeOfLevel</th>\n",
" <th>level</th>\n",
" <th>dataDate</th>\n",
" <th>dataTime</th>\n",
" <th>stepRange</th>\n",
" <th>dataType</th>\n",
" <th>number</th>\n",
" <th>gridType</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>ecmf</td>\n",
" <td>z</td>\n",
" <td>isobaricInhPa</td>\n",
" <td>1</td>\n",
" <td>20200915</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>fc</td>\n",
" <td>0</td>\n",
" <td>sh</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>ecmf</td>\n",
" <td>z</td>\n",
" <td>isobaricInhPa</td>\n",
" <td>1</td>\n",
" <td>20200915</td>\n",
" <td>0</td>\n",
" <td>12</td>\n",
" <td>fc</td>\n",
" <td>0</td>\n",
" <td>sh</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" centre shortName typeOfLevel level dataDate dataTime stepRange \\\n",
"0 ecmf z isobaricInhPa 1 20200915 0 0 \n",
"1 ecmf z isobaricInhPa 1 20200915 0 12 \n",
"\n",
" dataType number gridType \n",
"0 fc 0 sh \n",
"1 fc 0 sh "
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import earthkit.data\n",
"\n",
"request = {\n",
" 'stream': 'oper',\n",
" 'levtype': 'pl',\n",
" 'levellist': '1',\n",
" 'param': '129.128',\n",
" 'step': '0/12',\n",
" 'time': '00:00:00',\n",
" 'date': '20200915',\n",
" 'type': 'fc',\n",
" 'class': 'rd',\n",
" 'expver': 'hsvs',\n",
" 'domain': 'g'\n",
"}\n",
"\n",
"\n",
"ds = earthkit.data.from_source(\"polytope\", \"ecmwf-mars\", request)\n",
"ds.ls()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "32cb0e10-7545-4758-b2f0-99984f01d71f",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "dev",
"language": "python",
"name": "dev"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.11"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
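The request in the notebook above passes ``'step': '0/12'``, and the listing returned by ``ds.ls()`` contains two fields, with steps 0 and 12. This reflects the MARS convention of separating list items with ``/``. The snippet below is a minimal sketch of that convention only. The helper ``split_mars_values`` is a hypothetical name introduced here, not part of earthkit-data or the polytope client, and it deliberately ignores range forms such as ``to``/``by``.

```python
def split_mars_values(value):
    """Split a MARS-style "a/b/c" string into its individual items.

    Hypothetical helper for illustration; it does not expand ranges.
    """
    return [item.strip() for item in str(value).split("/")]


# Same keys and values as the notebook request (abbreviated).
request = {
    "param": "129.128",
    "step": "0/12",
    "date": "20200915",
}

steps = split_mars_values(request["step"])
params = split_mars_values(request["param"])
print(steps, params)
```

A value without ``/`` comes back as a one-item list, so the same helper can be applied to every key of the request.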
111 changes: 78 additions & 33 deletions docs/guide/caching.rst
@@ -11,8 +11,8 @@ Caching
Purpose
-------

eartkit-data caches most of the remote data access on a local cache. Running again
``earthkit.data.from_source`` will use the cached data instead of
earthkit-data caches most of the remote data access on a local cache. Running again
:func:`from_source` will use the cached data instead of
downloading it again. When the cache is full, cached data is deleted according to its cache policy
(i.e. oldest data is deleted first).
earthkit-data cache configuration is managed through the :doc:`settings`.
@@ -26,49 +26,94 @@ earthkit-data cache configuration is managed through the :doc:`settings`.
through using mirrors.

.. _cache_location:
.. _cache_policies:

Cache location
--------------
Cache policies and locations
------------------------------

The cache location is defined by the ``cache‑directory`` setting. Its default
value depends on your system:
The main setting that controls the cache is ``cache-policy``, which can take the following values:

- ``/tmp/earthkit-data-$USER`` for Linux,
- ``C:\\Users\\$USER\\AppData\\Local\\Temp\\earthkit-data-$USER`` for Windows
- ``/tmp/.../earthkit-data-$USER`` for MacOS
- "user" (default)
- "temporary"
- "off"

The cache location can be read and modified with Python (see the details below).

The cache location can be read and modified either with shell command or within python.
.. tip::

.. note::
See the :ref:`/examples/cache.ipynb` notebook for examples.

It is recommended to restart your Jupyter kernels after changing
the cache location.
.. note::

It is recommended to restart your Jupyter kernels after changing
the cache location.

User cache policy
+++++++++++++++++++

When the ``cache-policy`` is "user" the cache is created in the directory defined by the ``user-cache-directory`` setting. The user cache directory is not cleaned up on exit, so it is normally still there the next time you start earthkit-data. Multiple earthkit-data sessions run by the same user share the same cache.

The default value of the cache directory depends on your system:

- ``/tmp/earthkit-data-$USER`` for Linux,
- ``C:\\Users\\$USER\\AppData\\Local\\Temp\\earthkit-data-$USER`` for Windows
- ``/tmp/.../earthkit-data-$USER`` for MacOS


The following code shows how to change the ``user-cache-directory`` setting:

.. code:: python
>>> from earthkit.data import settings
>>> settings.get("user-cache-directory") # Find the current cache directory
/tmp/earthkit-data-$USER
>>> # Change the value of the setting
>>> settings.set("user-cache-directory", "/big-disk/earthkit-data-cache")
# Python kernel restarted
From Python:
>>> from earthkit.data import settings
>>> settings.get("user-cache-directory") # Cache directory has been modified
/big-disk/earthkit-data-cache
.. code:: python
More generally, the earthkit-data settings can be read, modified, reset
to their default values from Python,
see the :doc:`Settings documentation <settings>`.

>>> import earthkit.data
>>> earthkit.data.settings.get(
... "cache-directory"
... ) # Find the current cache directory
/tmp/earthkit-data-$USER
>>> # Change the value of the setting
>>> earthkit.data.settings.set("cache-directory", "/big-disk/earthkit-data-cache")

# Python kernel restarted
Temporary cache policy
++++++++++++++++++++++++

>>> import earthkit.data
>>> earthkit.data.settings.get(
... "cache-directory"
... ) # Cache directory has been modified
/big-disk/earthkit-data-cache
When the ``cache-policy`` is "temporary" the cache is located in a temporary directory created by ``tempfile.TemporaryDirectory``. This directory is unique to each earthkit-data session. When the directory object goes out of scope (at the latest on exit) the cache is cleaned up. Because the directory path is temporary, it cannot be queried via the :doc:`settings`; use :meth:`cache_directory` on the ``cache`` object instead.

.. code-block:: python
>>> from earthkit.data import cache, settings
>>> settings.set("cache-policy", "temporary")
>>> cache.cache_directory()
'/var/folders/ng/g0zkhc2s42xbslpsywwp_26m0000gn/T/tmp_5bf5kq8'
We can specify the parent directory for the temporary cache with the ``temporary-cache-directory-root`` setting. By default it is set to None (no parent directory specified).

.. code-block:: python
>>> from earthkit.data import cache, settings
>>> s = {
... "cache-policy": "temporary",
... "temporary-cache-directory-root": "~/my_demo_cache",
... }
>>> settings.set(s)
>>> cache.cache_directory()
'~/my_demo_cache/tmp0iiuvsz5'
Off cache policy
++++++++++++++++++++++++

It is also possible to turn caching off completely by setting the ``cache-policy`` to “off”.

.. warning::

More generally, the earthkit-data settings can be read, modified, reset
to their default values from python,
see the :doc:`Settings documentation <settings>`.
At the moment, when the cache is disabled none of the sources downloading data (e.g. :ref:`data-sources-mars`) will work. On top of that the :ref:`data-sources-file` source will not be able to handle archive input (e.g. tar, zip).

Cache limits
------------
@@ -100,9 +145,9 @@ Maximum-cache-disk-usage
and ``maximum-cache-disk-usage`` to ``None``.


Caching settings default values
Caching settings parameters
-------------------------------

.. module-output:: generate_settings_rst .*-cache-.* cache-.*
.. module-output:: generate_settings_rst .*-cache-.* cache-.* .*-cache

Other earthkit-data settings can be found :ref:`here <settings_table>`.
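
The temporary cache policy documented in ``docs/guide/caching.rst`` above is described as relying on ``tempfile.TemporaryDirectory``, optionally under a parent directory. The following standard-library sketch shows only that underlying behaviour: a uniquely named directory created under a chosen parent and removed on cleanup. It does not call earthkit-data, and ``root`` is an assumed stand-in for the ``temporary-cache-directory-root`` setting.

```python
import os
import tempfile

# Assumed stand-in for the "temporary-cache-directory-root" setting.
root = tempfile.mkdtemp()

# A uniquely named directory is created inside the chosen parent.
tmp = tempfile.TemporaryDirectory(dir=root)
cache_dir = tmp.name
exists_before = os.path.isdir(cache_dir)
same_parent = os.path.samefile(os.path.dirname(cache_dir), root)

# Explicit cleanup (also triggered when the object is garbage collected
# or the interpreter exits) removes the directory and its contents.
tmp.cleanup()
exists_after = os.path.isdir(cache_dir)

# Remove the demo parent directory as well.
os.rmdir(root)

print(exists_before, same_parent, exists_after)
```

Because the directory name is generated at creation time, its path is only known from the directory object itself, which is consistent with the documentation pointing to ``cache.cache_directory()`` rather than the settings.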
8 changes: 0 additions & 8 deletions docs/guide/include/settings-2-set.py

This file was deleted.

File renamed without changes.
File renamed without changes.
18 changes: 18 additions & 0 deletions docs/guide/include/settings-set.py
@@ -0,0 +1,18 @@
import earthkit.data

# Change the location of the user defined cache:
earthkit.data.settings.set("user-cache-directory", "/big-disk/earthkit-data-cache")

# Change number of download threads
earthkit.data.settings.set("number-of-download-threads", 7)

# Multiple values can be set together. The argument list
# can be a dictionary:
earthkit.data.settings.set(
{"number-of-download-threads": 7, "url-download-timeout": "1m"}
)

# Alternatively, we can use keyword arguments. However, because
# the “-” character is not allowed in variable names in Python we have
# to replace “-” with “_” in all the keyword arguments:
earthkit.data.settings.set(number_of_download_threads=8, url_download_timeout="2m")
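
The comments in ``settings-set.py`` note that keyword arguments must use ``_`` where the setting name has ``-``. One way such a mapping could work is sketched below. ``normalise_setting_names`` is a hypothetical helper written for this illustration and is not earthkit-data's actual implementation.

```python
def normalise_setting_names(*args, **kwargs):
    """Merge positional dicts and keyword arguments into one dict.

    Keyword names are converted from "a_b" to "a-b" so that both call
    styles end up with the same hyphenated setting names.
    Hypothetical helper, for illustration only.
    """
    merged = {}
    for mapping in args:
        merged.update(mapping)
    for name, value in kwargs.items():
        merged[name.replace("_", "-")] = value
    return merged


from_dict = normalise_setting_names(
    {"number-of-download-threads": 7, "url-download-timeout": "1m"}
)
from_kwargs = normalise_setting_names(
    number_of_download_threads=8, url_download_timeout="2m"
)
print(from_dict)
print(from_kwargs)
```

Both calls yield dictionaries keyed by the hyphenated names, so the dictionary form and the keyword form refer to the same settings.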