diff --git a/_toc.yml b/_toc.yml index e15e32b2..713fd663 100644 --- a/_toc.yml +++ b/_toc.yml @@ -37,6 +37,7 @@ parts: - file: intermediate/xarray_and_dask - file: intermediate/xarray_ecosystem - file: intermediate/hvplot + - file: intermediate/cmip6-cloud - file: data_cleaning/ice_velocity.ipynb - caption: Advanced diff --git a/intermediate/cmip6-cloud.ipynb b/intermediate/cmip6-cloud.ipynb new file mode 100644 index 00000000..0dda2a0f --- /dev/null +++ b/intermediate/cmip6-cloud.ipynb @@ -0,0 +1,332 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "0ebd8a4d-6937-4ad6-9c93-fa944fb389c1", + "metadata": {}, + "source": [ + "# Accessing remote data stored on the cloud\n", + "\n", + "In this tutorial, we'll cover the following:\n", + "- Finding a cloud hosted Zarr archive of CMIP6 dataset(s)\n", + "- Remote data access to a single CMIP6 dataset (sea surface height)\n", + "- Calculate future predicted sea level change in 2100 compared to 2015" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b7533f0e-5dd1-423a-9a04-8ed755d180a2", + "metadata": {}, + "outputs": [], + "source": [ + "import gcsfs\n", + "import pandas as pd\n", + "import xarray as xr" + ] + }, + { + "cell_type": "markdown", + "id": "95002377-b0a6-479f-928d-53b044b390df", + "metadata": {}, + "source": [ + "## Finding cloud native data\n", + "\n", + "Cloud-native data means data that is structured for efficient querying across the network.\n", + "Typically, this means having metadata that describes the entire file in the header of the\n", + "file, or having a a separate pointer file (so that there is no need to download everything first).\n", + "\n", + "Quite commonly, you'll see cloud-native datasets stored on these\n", + "three object storage providers, though there are many other ones too.\n", + "\n", + "- [Amazon Simple Storage Service (S3)](https://aws.amazon.com/s3)\n", + "- [Azure Blob Storage](https://azure.microsoft.com/en-us/services/storage/blobs)\n", + "- [Google Cloud Storage](https://cloud.google.com/storage)" + ] + }, + { + "cell_type": "markdown", + "id": "bc520e32-204f-4f92-bdec-4f678160d6de", + "metadata": {}, + "source": [ + "### Getting cloud hosted CMIP6 data\n", + "\n", + "The [Coupled Model Intercomparison Project Phase 6 (CMIP6)](https://en.wikipedia.org/wiki/CMIP6#CMIP_Phase_6)\n", + "dataset is a rich archive of modelling experiments carried out to predict the climate change impacts.\n", + "The datasets are stored using the [Zarr](https://zarr.dev) format, and we'll go over how to access it.\n", + "\n", + "Sources:\n", + "- https://esgf-node.llnl.gov/search/cmip6/\n", + "- CMIP6 data hosted on Google Cloud - https://console.cloud.google.com/marketplace/details/noaa-public/cmip6\n", + "- Pangeo/ESGF Cloud Data Access tutorial - https://pangeo-data.github.io/pangeo-cmip6-cloud/accessing_data.html" + ] + }, + { + "cell_type": "markdown", + "id": "8d12400d-ab5e-420e-b9f5-b61e083dc9ce", + "metadata": {}, + "source": [ + "First, let's open a CSV containing the list of CMIP6 datasets available" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d1d9f94c-dbe3-4151-8ee7-fa182724810b", + "metadata": {}, + "outputs": [], + "source": [ + "df = pd.read_csv(\"https://cmip6.storage.googleapis.com/pangeo-cmip6.csv\")\n", + "print(f\"Number of rows: {len(df)}\")\n", + "df.head()" + ] + }, + { + "cell_type": "markdown", + "id": "eb263332-dc60-4bd1-9ef3-cf9612cf09a1", + "metadata": {}, + "source": [ + "Over 5 million rows! Let's filter it down to the variable and experiment\n", + "we're interested in, e.g. sea surface height.\n", + "\n", + "For the `variable_id`, you can look it up given some keyword at\n", + "https://docs.google.com/spreadsheets/d/1UUtoz6Ofyjlpx5LdqhKcwHFz2SGoTQV2_yekHyMfL9Y\n", + "\n", + "For the `experiment_id`, download the spreadsheet from\n", + "https://github.com/ES-DOC/esdoc-docs/blob/master/cmip6/experiments/spreadsheet/experiments.xlsx,\n", + "go to the 'experiment' tab, and find the one you're interested in.\n", + "\n", + "Another good place to find the right model runs is https://esgf-node.llnl.gov/search/cmip6\n", + "(once you get your head around the acronyms and short names)." + ] + }, + { + "cell_type": "markdown", + "id": "9b435c14-fd56-481c-b5f4-781794a1cc1a", + "metadata": {}, + "source": [ + "Below, we'll filter to CMIP6 experiments matching:\n", + "- Sea Surface Height Above Geoid [m] (variable_id: `zos`)\n", + "- Shared Socioeconomic Pathway 5 (experiment_id: `ssp585`)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2fe50e53-b02f-4a84-bc4a-e1934fe32661", + "metadata": {}, + "outputs": [], + "source": [ + "df_zos = df.query(\"variable_id == 'zos' & experiment_id == 'ssp585'\")\n", + "df_zos" + ] + }, + { + "cell_type": "markdown", + "id": "9ddfad3e-d4de-4c0a-be6f-53f1f7928f51", + "metadata": {}, + "source": [ + "There's 272 modelled scenarios for SSP5.\n", + "Let's just get the URL to the first one in the list for now." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5515186d-8571-439a-b5a8-b8b56aab77f6", + "metadata": {}, + "outputs": [], + "source": [ + "print(df_zos.zstore.iloc[0])" + ] + }, + { + "cell_type": "markdown", + "id": "b68bcfbb-24c9-420d-b297-44c678b7f8ce", + "metadata": {}, + "source": [ + "## Reading from the remote Zarr storage" + ] + }, + { + "cell_type": "markdown", + "id": "b3f5660d-bd46-44f6-8f6d-a62947b6f2c4", + "metadata": {}, + "source": [ + "In many cases, you'll need to first connect to the cloud provider.\n", + "The CMIP6 dataset allows anonymous access, but for some cases,\n", + "you may need to authenticate." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a4e6d5e3-35a0-4c31-a1b8-96258cf50974", + "metadata": {}, + "outputs": [], + "source": [ + "fs = gcsfs.GCSFileSystem(token=\"anon\")" + ] + }, + { + "cell_type": "markdown", + "id": "b959f829-e434-4a84-82d2-2f2b24dc84d2", + "metadata": {}, + "source": [ + "Next, we'll need a mapping to the Google Storage object.\n", + "This can be done using `fs.get_mapper`.\n", + "\n", + "A more generic way (for other cloud providers) is to use\n", + "[`fsspec.get_mapper`](https://filesystem-spec.readthedocs.io/en/latest/api.html#fsspec.get_mapper) instead." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e1527d1f-503e-4b0b-8433-794067ed46cc", + "metadata": {}, + "outputs": [], + "source": [ + "store = fs.get_mapper(\n", + " \"gs://cmip6/CMIP6/ScenarioMIP/NOAA-GFDL/GFDL-ESM4/ssp585/r1i1p1f1/Omon/zos/gn/v20180701/\"\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "b694baac-9259-4de8-8eae-ac3cb653d894", + "metadata": {}, + "source": [ + "With that, we can open the Zarr store like so." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "74b6d289-a852-4216-a3b6-4483d5ff2854", + "metadata": {}, + "outputs": [], + "source": [ + "ds = xr.open_zarr(store=store, consolidated=True)\n", + "ds" + ] + }, + { + "cell_type": "markdown", + "id": "d81a5958-517b-4215-8c02-b1083b4b4fe2", + "metadata": {}, + "source": [ + "### Selecting time slices\n", + "\n", + "Let's say we want to calculate sea level change between\n", + "2015 and 2100. We can access just the specific time points\n", + "needed using [`xr.Dataset.sel`](https://docs.xarray.dev/en/stable/generated/xarray.Dataset.sel.html)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1101b455-ba65-4cab-a3b6-bf2601958400", + "metadata": {}, + "outputs": [], + "source": [ + "zos_2015jan = ds.zos.sel(time=\"2015-01-16\").squeeze()\n", + "zos_2100dec = ds.zos.sel(time=\"2100-12-16\").squeeze()" + ] + }, + { + "cell_type": "markdown", + "id": "fb8d90a2-9883-41da-b26c-7b5547a15270", + "metadata": {}, + "source": [ + "Sea level change would just be 2100 minus 2015." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1f5fa1ee-260c-4ec4-898a-230826f9f2c8", + "metadata": {}, + "outputs": [], + "source": [ + "sealevelchange = zos_2100dec - zos_2015jan" + ] + }, + { + "cell_type": "markdown", + "id": "0e087f3b-0315-40db-ae03-a3393b49c30e", + "metadata": {}, + "source": [ + "Note that up to this point, we have not actually downloaded any\n", + "(big) data yet from the cloud. This is all working based on\n", + "metadata only.\n", + "\n", + "To bring the data from the cloud to your local computer, call `.compute`.\n", + "This will take a while depending on your connection speed." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "38c2152e-67e7-449e-8f1a-2d64f63dedda", + "metadata": {}, + "outputs": [], + "source": [ + "sealevelchange = sealevelchange.compute()" + ] + }, + { + "cell_type": "markdown", + "id": "5226729f-07db-4fe6-a980-9a1f630c8277", + "metadata": {}, + "source": [ + "We can do a quick plot to show how Sea Level is predicted to change\n", + "between 2015-2100 (from one modelled experiment)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8c42ed9f-fc61-4762-9765-3dd553d5c2ad", + "metadata": {}, + "outputs": [], + "source": [ + "sealevelchange.plot.imshow()" + ] + }, + { + "cell_type": "markdown", + "id": "b4361786-c889-4ae7-a704-dcbda50513da", + "metadata": {}, + "source": [ + "Notice the blue parts between -40 and -60 South where sea level has dropped?\n", + "That's to do with the Antarctic ice sheet losing mass and resulting in a lower\n", + "gravitational pull, resulting in a relative decrease in sea level. Over most\n", + "of the Northern Hemisphere though, sea level rise has increased between 2015 and 2100." + ] + }, + { + "cell_type": "markdown", + "id": "a87aa0a3-c82e-4da0-a5d0-31e42039feae", + "metadata": {}, + "source": [ + "That's all! Hopefully this will get you started on accessing more cloud-native datasets!" + ] + } + ], + "metadata": { + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +}