diff --git a/doc/source/contributing/quest-available-datasets.rst b/doc/source/contributing/quest-available-datasets.rst
index 91a6283a0..86901f7ed 100644
--- a/doc/source/contributing/quest-available-datasets.rst
+++ b/doc/source/contributing/quest-available-datasets.rst
@@ -9,10 +9,13 @@ On this page, we outline the datasets that are supported by the QUEST module. Cl
 List of Datasets
 ----------------
-* `Argo <https://argo.ucsd.edu/>`_
- * The Argo mission involves a series of floats that are designed to capture vertical ocean profiles of temperature, salinity, and pressure down to ~2000 m. Some floats are in support of BGC-Argo, which also includes data relevant for biogeochemical applications: oxygen, nitrate, chlorophyll, backscatter, and solar irradiance.
- * (Link Kelsey's paper here)
- * (Link to example workbook here)
+`Argo <https://argo.ucsd.edu/>`_
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+The Argo mission involves a series of floats that are designed to capture vertical ocean profiles of temperature, salinity, and pressure down to ~2000 m. Some floats are in support of BGC-Argo, which also includes data relevant for biogeochemical applications: oxygen, nitrate, chlorophyll, backscatter, and solar irradiance.
+
+A paper outlining the Argo extension to QUEST is currently in preparation; a citable preprint will be available in the near future.
+
+:ref:`Argo Workflow Example`
Adding a Dataset to QUEST
@@ -20,6 +23,7 @@
Want to add a new dataset to QUEST? No problem! QUEST includes a template script (``dataset.py``) that may be used to create your own querying module for a dataset of interest.
-Guidelines on how to construct your dataset module may be found here: (link to be added)
+Once you have developed a script with the template, you may request that the module be added to QUEST via GitHub.
+Please see the How to Contribute page :ref:`dev_guide_label` for instructions on how to contribute to icepyx.
-Once you have developed a script with the template, you may request for the module to be added to QUEST via Github. Please see the How to Contribute page :ref:`dev_guide_label` for instructions on how to contribute to icepyx.
\ No newline at end of file
+Detailed guidelines on how to construct your dataset module are currently a work in progress.
diff --git a/doc/source/example_notebooks/QUEST_argo_data_access.ipynb b/doc/source/example_notebooks/QUEST_argo_data_access.ipynb
new file mode 100644
index 000000000..1bdb5fd0c
--- /dev/null
+++ b/doc/source/example_notebooks/QUEST_argo_data_access.ipynb
@@ -0,0 +1,626 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "16806722-f5bb-4063-bd4b-60c8b0d24d2a",
+ "metadata": {
+ "user_expressions": []
+ },
+ "source": [
+ "# QUEST Example: Finding Argo and ICESat-2 data\n",
+ "\n",
+ "In this notebook, we are going to find Argo and ICESat-2 data over a region of the Pacific Ocean. Normally, we would require multiple data portals or Python packages to accomplish this. However, thanks to the [QUEST (Query, Unify, Explore SpatioTemporal) module](https://icepyx.readthedocs.io/en/latest/contributing/quest-available-datasets.html), we can use icepyx to find both!"
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ed25d839-4114-41db-9166-8c027368686c", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# Basic packages\n", + "import geopandas as gpd\n", + "import matplotlib.pyplot as plt\n", + "import numpy as np\n", + "from pprint import pprint\n", + "\n", + "# icepyx and QUEST\n", + "import icepyx as ipx" + ] + }, + { + "cell_type": "markdown", + "id": "5c35f5df-b4fb-4a36-8d6f-d20f1552767a", + "metadata": { + "user_expressions": [] + }, + "source": [ + "## Define the Quest Object\n", + "\n", + "QUEST builds off of the general querying process originally designed for ICESat-2, but makes it applicable to other datasets.\n", + "\n", + "Just like the ICESat-2 Query object, we begin by defining our Quest object. We provide the following bounding parameters:\n", + "* `spatial_extent`: Data is constrained to the given box over the Pacific Ocean.\n", + "* `date_range`: Only grab data from April 18-19, 2022 (to keep download sizes small for this example)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c5d0546d-f0b8-475d-9fd4-62ace696e316", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# Spatial bounds, given as SW/NE corners\n", + "spatial_extent = [-154, 30, -143, 37]\n", + "\n", + "# Start and end dates, in YYYY-MM-DD format\n", + "date_range = ['2022-04-18', '2022-04-19']\n", + "\n", + "# Initialize the QUEST object\n", + "reg_a = ipx.Quest(spatial_extent=spatial_extent, date_range=date_range)\n", + "\n", + "print(reg_a)" + ] + }, + { + "cell_type": "markdown", + "id": "8732bf56-1d44-4182-83f7-4303a87d231a", + "metadata": { + "user_expressions": [] + }, + "source": [ + "Notice that we have defined our spatial and temporal domains, but we do not have any datasets in our QUEST object. The next section leads us through that process." + ] + }, + { + "cell_type": "markdown", + "id": "1598bbca-3dcb-4b63-aeb1-81c27d92a1a2", + "metadata": { + "user_expressions": [] + }, + "source": [ + "## Getting the data\n", + "\n", + "Let's first query the ICESat-2 data. If we want to extract information about the water column, the ATL03 product is likely the desired choice.\n", + "* `short_name`: ATL03" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "309a7b26-cfc3-46fc-a683-43e154412074", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# ICESat-2 product\n", + "short_name = 'ATL03'\n", + "\n", + "# Add ICESat-2 to QUEST datasets\n", + "reg_a.add_icesat2(product=short_name)\n", + "print(reg_a)" + ] + }, + { + "cell_type": "markdown", + "id": "ad4bbcfe-3199-4a28-8739-c930d1572538", + "metadata": { + "user_expressions": [] + }, + "source": [ + "Let's see the available files over this region." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a2b4e56f-ceff-45e7-b52c-e7725dc6c812", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "pprint(reg_a.datasets['icesat2'].avail_granules(ids=True))" + ] + }, + { + "cell_type": "markdown", + "id": "7a081854-dae4-4e99-a550-02c02a71b6de", + "metadata": { + "user_expressions": [] + }, + "source": [ + "Note the ICESat-2 functions shown here are the same as those used for direct icepyx queries. The user is referred to other [example workbooks](https://icepyx.readthedocs.io/en/latest/example_notebooks/IS2_data_access.html) for detailed explanations about icepyx functionality.\n", + "\n", + "Accessing ICESat-2 data requires Earthdata login credentials. 
When running the `download_all()` function below, an authentication check will occur when attempting to download the ICESat-2 files."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "8264515a-00f1-4f57-b927-668a71294079",
+ "metadata": {
+ "user_expressions": []
+ },
+ "source": [
+ "Now let's grab Argo data using the same constraints. This is as simple as using the below function."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "c857fdcc-e271-4960-86a9-02f693cc13fe",
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "# Add argo to the desired QUEST datasets\n",
+ "reg_a.add_argo()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "7bade19e-5939-410a-ad54-363636289082",
+ "metadata": {
+ "user_expressions": []
+ },
+ "source": [
+ "When accessing Argo data, the variables of interest will be organized as vertical profiles as a function of pressure. By default, only temperature is queried, so the user should supply a list of desired parameters using the code below. The user may also limit the pressure range of the returned data by passing `presRange=\"0,200\"`.\n",
+ "\n",
+ "*Note: Our example shows only physical Argo float parameters, but the process is identical for including BGC float parameters.*"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "6739c3aa-1a88-4d8e-9fd8-479528c20e97",
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "# Customized variable query to retrieve salinity instead of temperature\n",
+ "reg_a.add_argo(params=['salinity'])"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "2d06436c-2271-4229-8196-9f5180975ab1",
+ "metadata": {
+ "user_expressions": []
+ },
+ "source": [
+ "Additionally, a user may view or update the list of requested Argo and Argo-BGC parameters at any time through `reg_a.datasets['argo'].params`. If a user submits an invalid parameter (\"temp\" instead of \"temperature\", for example), an `AssertionError` will be raised. `reg_a.datasets['argo'].presRange` behaves analogously for limiting the pressure range of Argo data."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "e34756b8",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# update the list of argo parameters\n",
+ "reg_a.datasets['argo'].params = ['temperature','salinity']\n",
+ "\n",
+ "# show the current list\n",
+ "reg_a.datasets['argo'].params"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "453900c1-cd62-40c9-820c-0615f63f17f5",
+ "metadata": {
+ "user_expressions": []
+ },
+ "source": [
+ "As with ICESat-2 data, the user can interact directly with the Argo data object (`reg_a.datasets['argo']`) to search or download data outside of the `Quest.search_all()` and `Quest.download_all()` functionality shown below.\n",
+ "\n",
+ "The approach to directly search or download Argo data is to use `reg_a.datasets['argo'].search_data()` and `reg_a.datasets['argo'].download()`. In both cases, the existing parameters and pressure ranges are used unless the user passes new `params` and/or `presRange` kwargs, respectively, which will directly update those values (stored attributes)."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "3f55be4e-d261-49c1-ac14-e19d8e0ff828",
+ "metadata": {
+ "user_expressions": []
+ },
+ "source": [
+ "With our current setup, let's see what Argo parameters we will get."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "435a1243",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# see what argo parameters will be searched for or downloaded\n",
+ "reg_a.datasets['argo'].params"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "c15675df",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "reg_a.datasets['argo'].search_data()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "70d36566-0d3c-4781-a199-09bb11dad975",
+ "metadata": {
+ "user_expressions": []
+ },
+ "source": [
+ "Now we can access the data for both Argo and ICESat-2! The below function will do this for us.\n",
+ "\n",
+ "**Important**: The Argo data will be compiled into a Pandas DataFrame, which must be manually saved by the user as demonstrated below. The ICESat-2 data is saved as processed HDF5 files to the directory provided."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "a818c5d7-d69a-4aad-90a2-bc670a54c3a7",
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "path = './quest/downloaded-data/'\n",
+ "\n",
+ "# Access Argo and ICESat-2 data simultaneously\n",
+ "reg_a.download_all(path=path)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "ad29285e-d161-46ea-8a57-95891fa2b237",
+ "metadata": {
+ "tags": [],
+ "user_expressions": []
+ },
+ "source": [
+ "We now have one available Argo profile, containing `temperature` and `pressure`, in a Pandas DataFrame. BGC Argo is also available through QUEST, so we could add more variables to this list.\n",
+ "\n",
+ "If the user wishes to add more profiles, parameters, and/or pressure ranges to a pre-existing DataFrame, then they should use `reg_a.datasets['argo'].download(keep_existing=True)` to retain previously downloaded data and have the new data added."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "6970f0ad-9364-4732-a5e6-f93cf3fc31a3",
+ "metadata": {
+ "user_expressions": []
+ },
+ "source": [
+ "The `reg_a.download_all()` function also provided a file containing ICESat-2 ATL03 data. Recall that because these data files are very large, we focus on only one file for this example.\n",
+ "\n",
+ "The below workflow uses the icepyx Read module to quickly load ICESat-2 data into an Xarray Dataset. To read in multiple files, see the [icepyx Read tutorial](https://icepyx.readthedocs.io/en/latest/example_notebooks/IS2_data_read-in.html) for how to change your input source."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "88f4b1b0-8c58-414c-b6a8-ce1662979943",
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "filename = 'processed_ATL03_20220419002753_04111506_006_02.h5'\n",
+ "\n",
+ "reader = ipx.Read(data_source=path+filename)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "665d79a7-7360-4846-99c2-222b34df2a92",
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "# decide which portions of the file to read in\n",
+ "reader.vars.append(beam_list=['gt2l'], \n",
+ " var_list=['h_ph', \"lat_ph\", \"lon_ph\", 'signal_conf_ph'])"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "e7158814-50f0-4940-980c-9bb800360982",
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "ds = reader.load()\n",
+ "ds"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "1040438c-d806-4964-b4f0-1247da9f3f1f",
+ "metadata": {
+ "user_expressions": []
+ },
+ "source": [
+ "To make the data more easily plottable, let's convert the data into a Pandas DataFrame. 
Note that this method is memory-intensive for ATL03 data, so users are encouraged to use small spatial domains to prevent the notebook from crashing. Here, since we only have data from one granule and ground track, we have sped up the conversion to a dataframe by first removing extra data dimensions we don't need for our plots. Several of the other steps completed below using Pandas have analogous operations in Xarray that would further reduce memory requirements and computation times."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "50d23a8e",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "is2_pd = (ds.squeeze()\n",
+ " .reset_coords()\n",
+ " .drop_vars([\"source_file\",\"data_start_utc\",\"data_end_utc\",\"gran_idx\"])\n",
+ " .to_dataframe()\n",
+ " )"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "01bb5a12",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "is2_pd"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "fc67e039-338c-4348-acaf-96f605cf0030",
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "# Create a new dataframe with only \"ocean\" photons, as indicated by the \"ds_surf_type\" flag\n",
+ "is2_pd = is2_pd.reset_index(level=[0,1])\n",
+ "is2_pd_ocean = is2_pd[is2_pd.ds_surf_type==1].drop(columns=\"photon_idx\")\n",
+ "is2_pd_ocean"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "976ed530-1dc9-412f-9d2d-e51abd28c564",
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "# Set Argo data as its own DataFrame\n",
+ "argo_df = reg_a.datasets['argo'].argodata"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "f9a3b8cf-f3b9-4522-841b-bf760672e37f",
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "# Convert both DataFrames into GeoDataFrames\n",
+ "is2_gdf = gpd.GeoDataFrame(is2_pd_ocean, \n",
+ " geometry=gpd.points_from_xy(is2_pd_ocean['lon_ph'], is2_pd_ocean['lat_ph']),\n",
+ " crs='EPSG:4326'\n",
+ ")\n",
+ "argo_gdf = gpd.GeoDataFrame(argo_df, \n",
+ " geometry=gpd.points_from_xy(argo_df.lon, argo_df.lat),\n",
+ " crs='EPSG:4326'\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "86cb8463-dc14-4c1d-853e-faf7bf4300a5",
+ "metadata": {
+ "user_expressions": []
+ },
+ "source": [
+ "To view the relative locations of ICESat-2 and Argo, the below cell uses the `explore()` function from GeoPandas. The time variables cause errors in the function, so we will drop those variables first. \n",
+ "\n",
+ "Note that for large datasets like ICESat-2, loading the map might take a while."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "7178fecc-6ca1-42a1-98d4-08f57c050daa",
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "# Drop time variables that would cause errors in explore() function\n",
+ "is2_gdf = is2_gdf.drop(['delta_time','atlas_sdp_gps_epoch'], axis=1)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "5ff40f7b-3a0f-4e32-8187-322a5b7cb44d",
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "# Plot ICESat-2 track (medium/high confidence photons only) on a map\n",
+ "m = is2_gdf[is2_gdf['signal_conf_ph']>=3].explore(column='rgt', tiles='Esri.WorldImagery',\n",
+ " name='ICESat-2')\n",
+ "\n",
+ "# Add Argo float locations to map\n",
+ "argo_gdf.explore(m=m, name='Argo', marker_kwds={\"radius\": 6}, color='red')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "8b7063ec-a2f8-4509-a7ce-5b0482b48682",
+ "metadata": {
+ "user_expressions": []
+ },
+ "source": [
+ "While we're at it, let's plot temperature and pressure profiles for each of the Argo floats in the area."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "da2748b7-b174-4abb-a44a-bd73d1d36eba",
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "# Plot vertical profile of temperature vs. pressure for all of the floats\n",
+ "fig, ax = plt.subplots(figsize=(12, 6))\n",
+ "for pid in np.unique(argo_df['profile_id']):\n",
+ " argo_df[argo_df['profile_id']==pid].plot(ax=ax, x='temperature', y='pressure', label=pid)\n",
+ "plt.gca().invert_yaxis()\n",
+ "plt.xlabel('Temperature [$\degree$C]')\n",
+ "plt.ylabel('Pressure [dbar]')\n",
+ "plt.ylim([750, -10])\n",
+ "plt.tight_layout()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "08481fbb-2298-432b-bd50-df2e1ca45cf5",
+ "metadata": {
+ "user_expressions": []
+ },
+ "source": [
+ "Lastly, let's look at some near-coincident ICESat-2 and Argo data in a multi-panel plot."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1269de3c-c15d-4120-8284-3b072069d5ee", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# Only consider ICESat-2 signal photons\n", + "is2_pd_signal = is2_pd_ocean[is2_pd_ocean['signal_conf_ph']>=0]\n", + "\n", + "## Multi-panel plot showing ICESat-2 and Argo data\n", + "\n", + "# Calculate Extent\n", + "lons = [-154, -143, -143, -154, -154]\n", + "lats = [30, 30, 37, 37, 30]\n", + "lon_margin = (max(lons) - min(lons)) * 0.1\n", + "lat_margin = (max(lats) - min(lats)) * 0.1\n", + "\n", + "# Create Plot\n", + "fig,([ax1,ax2],[ax3,ax4]) = plt.subplots(2, 2, figsize=(12, 6))\n", + "\n", + "# Plot Relative Global View\n", + "world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))\n", + "world.plot(ax=ax1, color='0.8', edgecolor='black')\n", + "argo_df.plot.scatter(ax=ax1, x='lon', y='lat', s=25.0, c='green', zorder=3, alpha=0.3)\n", + "is2_pd_signal.plot.scatter(ax=ax1, x='lon_ph', y='lat_ph', s=10.0, zorder=2, alpha=0.3)\n", + "ax1.plot(lons, lats, linewidth=1.5, color='orange', zorder=2)\n", + "ax1.set_xlim(-160,-100)\n", + "ax1.set_ylim(20,50)\n", + "ax1.set_aspect('equal', adjustable='box')\n", + "ax1.set_xlabel('Longitude', fontsize=18)\n", + "ax1.set_ylabel('Latitude', fontsize=18)\n", + "\n", + "# Plot Zoomed View of Ground Tracks\n", + "argo_df.plot.scatter(ax=ax2, x='lon', y='lat', s=50.0, c='green', zorder=3, alpha=0.3)\n", + "is2_pd_signal.plot.scatter(ax=ax2, x='lon_ph', y='lat_ph', s=10.0, zorder=2, alpha=0.3)\n", + "ax2.plot(lons, lats, linewidth=1.5, color='orange', zorder=1)\n", + "ax2.set_xlim(min(lons) - lon_margin, max(lons) + lon_margin)\n", + "ax2.set_ylim(min(lats) - lat_margin, max(lats) + lat_margin)\n", + "ax2.set_aspect('equal', adjustable='box')\n", + "ax2.set_xlabel('Longitude', fontsize=18)\n", + "ax2.set_ylabel('Latitude', fontsize=18)\n", + "\n", + "# Plot ICESat-2 along-track vertical profile. A dotted line notes the location of a nearby Argo float\n", + "is2 = ax3.scatter(is2_pd_signal['lat_ph'], is2_pd_signal['h_ph']+13.1, s=0.1)\n", + "ax3.axvline(34.43885, linestyle='--', linewidth=3, color='black')\n", + "ax3.set_xlim([34.3, 34.5])\n", + "ax3.set_ylim([-20, 5])\n", + "ax3.set_xlabel('Latitude', fontsize=18)\n", + "ax3.set_ylabel('Approx. 
IS-2 Depth [m]', fontsize=16)\n", + "ax3.set_yticklabels(['15', '10', '5', '0', '-5'])\n", + "\n", + "# Plot vertical ocean profile of the nearby Argo float\n", + "argo_df.plot(ax=ax4, x='temperature', y='pressure', linewidth=3)\n", + "# ax4.set_yscale('log')\n", + "ax4.invert_yaxis()\n", + "ax4.get_legend().remove()\n", + "ax4.set_xlabel('Temperature [$\\degree$C]', fontsize=18)\n", + "ax4.set_ylabel('Argo Pressure', fontsize=16)\n", + "\n", + "plt.tight_layout()\n", + "\n", + "# Save figure\n", + "#plt.savefig('/icepyx/quest/figures/is2_argo_figure.png', dpi=500)" + ] + }, + { + "cell_type": "markdown", + "id": "37720c79", + "metadata": {}, + "source": [ + "Recall that the Argo data must be saved manually.\n", + "The dataframe associated with the Quest object can be saved using `reg_a.save_all(path)`" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "9b6548e2-0662-4c8b-a251-55ca63aff99b", + "metadata": {}, + "outputs": [], + "source": [ + "reg_a.save_all(path)" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.10" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/doc/source/index.rst b/doc/source/index.rst index 586c8810f..612af6adc 100644 --- a/doc/source/index.rst +++ b/doc/source/index.rst @@ -128,6 +128,7 @@ ICESat-2 datasets to enable scientific discovery. example_notebooks/IS2_data_visualization example_notebooks/IS2_data_read-in example_notebooks/IS2_cloud_data_access + example_notebooks/QUEST_argo_data_access .. toctree:: :maxdepth: 2 @@ -145,9 +146,9 @@ ICESat-2 datasets to enable scientific discovery. contributing/contributors_link contributing/contribution_guidelines contributing/how_to_contribute + contributing/attribution_link contributing/icepyx_internals contributing/quest-available-datasets - contributing/attribution_link contributing/development_plan contributing/release_guide contributing/code_of_conduct_link diff --git a/icepyx/quest/__init__.py b/icepyx/quest/__init__.py new file mode 100644 index 000000000..e69de29bb diff --git a/icepyx/quest/dataset_scripts/__init__.py b/icepyx/quest/dataset_scripts/__init__.py index c7b28ee49..7834127ff 100644 --- a/icepyx/quest/dataset_scripts/__init__.py +++ b/icepyx/quest/dataset_scripts/__init__.py @@ -1 +1 @@ -from .dataset import * \ No newline at end of file +from .dataset import * diff --git a/icepyx/quest/dataset_scripts/argo.py b/icepyx/quest/dataset_scripts/argo.py new file mode 100644 index 000000000..8c614d301 --- /dev/null +++ b/icepyx/quest/dataset_scripts/argo.py @@ -0,0 +1,515 @@ +import os.path + +import numpy as np +import pandas as pd +import requests + +from icepyx.core.spatial import geodataframe +from icepyx.quest.dataset_scripts.dataset import DataSet + + +class Argo(DataSet): + """ + Initialises an Argo Dataset object via a Quest object. + Used to query physical and BGC Argo profiles. + + Parameters + --------- + aoi : + area of interest supplied via the spatial parameter of the QUEST object + toi : + time period of interest supplied via the temporal parameter of the QUEST object + params : list of str, default ["temperature"] + A list of strings, where each string is a requested parameter. 
+ Only metadata for profiles with the requested parameters are returned.
+ To search for all parameters, use `params=["all"]`;
+ be careful using all for floats with BGC data, as this may result in a large download.
+ presRange : str, default None
+ The pressure range (which correlates with depth) to search for data within.
+ Input as a "shallow-limit,deep-limit" string.
+
+ See Also
+ --------
+ DataSet
+ """
+
+ # Note: it looks like ArgoVis now accepts polygons, not just bounding boxes
+ def __init__(self, aoi, toi, params=["temperature"], presRange=None):
+ self._params = self._validate_parameters(params)
+ self._presRange = presRange
+ self._spatial = aoi
+ self._temporal = toi
+ # todo: verify that this will only work with a bounding box (I think our code can accept arbitrary polygons)
+ assert self._spatial._ext_type == "bounding_box"
+ self.argodata = None
+ self._apikey = "92259861231b55d32a9c0e4e3a93f4834fc0b6fa"
+
+ def __str__(self):
+ if self.presRange is None:
+ prange = "All"
+ else:
+ prange = str(self.presRange)
+
+ if self.argodata is None:
+ df = "No data yet"
+ else:
+ df = "\n" + str(self.argodata.head())
+ s = (
+ "---Argo---\n"
+ "Parameters: {0}\n"
+ "Pressure range: {1}\n"
+ "Dataframe head: {2}".format(self.params, prange, df)
+ )
+
+ return s
+
+ # ----------------------------------------------------------------------
+ # Properties
+
+ @property
+ def params(self) -> list:
+ """
+ User's list of Argo parameters to search (query) and download.
+
+ The user may modify this list directly.
+ """
+
+ return self._params
+
+ @params.setter
+ def params(self, value):
+ """
+ Validate the input list of parameters.
+ """
+
+ self._params = list(set(self._validate_parameters(value)))
+
+ @property
+ def presRange(self) -> str:
+ """
+ User's pressure range to search (query) and download.
+
+ The user may modify this string directly.
+ """
+
+ return self._presRange
+
+ @presRange.setter
+ def presRange(self, value):
+ """
+ Update the presRange based on the user input
+ """
+
+ self._presRange = value
+
+ # ----------------------------------------------------------------------
+ # Formatting API Inputs
+
+ def _fmt_coordinates(self) -> str:
+ """
+ Convert spatial extent into the string format needed by the argovis API,
+ i.e. a list of polygon coords as "[[lon1,lat1],[lon2,lat2],...]"
+ """
+
+ gdf = geodataframe(self._spatial._ext_type, self._spatial._spatial_ext)
+ coordinates_array = np.asarray(gdf.geometry[0].exterior.coords)
+ x = ""
+ for i in coordinates_array:
+ coord = "[{0},{1}]".format(i[0], i[1])
+ if x == "":
+ x = coord
+ else:
+ x += "," + coord
+
+ x = "[" + x + "]"
+ return x
+
+ # ----------------------------------------------------------------------
+ # Validation
+
+ def _valid_params(self) -> list:
+ """
+ A list of valid Argo measurement parameters (including BGC).
+
+ To get a list of valid parameters, comment out the validation line in `search_data` herein,
+ submit a search with an invalid parameter, and get the list from the response.
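+
+ For example, to check a candidate parameter against this list (a sketch using the
+ object created via Quest): `"salinity" in reg_a.datasets["argo"]._valid_params()`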
+ """ + + valid_params = [ + # all argo + "pressure", + "pressure_argoqc", + "salinity", + "salinity_argoqc", + "salinity_sfile", + "salinity_sfile_argoqc", + "temperature", + "temperature_argoqc", + "temperature_sfile", + "temperature_sfile_argoqc", + # BGC params + "bbp470", + "bbp470_argoqc", + "bbp532", + "bbp532_argoqc", + "bbp700", + "bbp700_argoqc", + "bbp700_2", + "bbp700_2_argoqc", + "bisulfide", + "bisulfide_argoqc", + "cdom", + "cdom_argoqc", + "chla", + "chla_argoqc", + "cndc", + "cndc_argoqc", + "cndx", + "cndx_argoqc", + "cp660", + "cp660_argoqc", + "down_irradiance380", + "down_irradiance380_argoqc", + "down_irradiance412", + "down_irradiance412_argoqc", + "down_irradiance442", + "down_irradiance442_argoqc", + "down_irradiance443", + "down_irradiance443_argoqc", + "down_irradiance490", + "down_irradiance490_argoqc", + "down_irradiance555", + "down_irradiance555_argoqc", + "down_irradiance670", + "down_irradiance670_argoqc", + "downwelling_par", + "downwelling_par_argoqc", + "doxy", + "doxy_argoqc", + "doxy2", + "doxy2_argoqc", + "doxy3", + "doxy3_argoqc", + "molar_doxy", + "molar_doxy_argoqc", + "nitrate", + "nitrate_argoqc", + "ph_in_situ_total", + "ph_in_situ_total_argoqc", + "turbidity", + "turbidity_argoqc", + "up_radiance412", + "up_radiance412_argoqc", + "up_radiance443", + "up_radiance443_argoqc", + "up_radiance490", + "up_radiance490_argoqc", + "up_radiance555", + "up_radiance555_argoqc", + # all params + "all", + ] + return valid_params + + def _validate_parameters(self, params) -> list: + """ + Checks that the list of user requested parameters are valid. + + Returns + ------- + The list of valid parameters + """ + + if "all" in params: + params = ["all"] + else: + valid_params = self._valid_params() + # checks that params are valid + for i in params: + assert ( + i in valid_params + ), "Parameter '{0}' is not valid. Valid parameters are {1}".format( + i, valid_params + ) + + return list(set(params)) + + # ---------------------------------------------------------------------- + # Querying and Getting Data + + def search_data(self, params=None, presRange=None, printURL=False) -> str: + """ + Query for available argo profiles given the spatio temporal criteria + and other params specific to the dataset. + Searches will automatically use the parameter and pressure range inputs + supplied when the `quest.argo` object was created unless replacement arguments + are added here. + + Parameters + --------- + params : list of str, default None + A list of strings, where each string is a requested parameter. + This kwarg is used to replace the existing list in `self.params`. + Do not submit this kwarg if you would like to use the existing `self.params` list. + Only metadata for profiles with the requested parameters are returned. + To search for all parameters, use `params=["all"]`; + be careful using all for floats with BGC data, as this may be result in a large download. + presRange : str, default None + The pressure range (which correllates with depth) to search for data within. + This kwarg is used to replace the existing pressure range in `self.presRange`. + Do not submit this kwarg if you would like to use the existing `self.presRange` values. + Input as a "shallow-limit,deep-limit" string. + printURL : boolean, default False + Print the URL of the data request. Useful for debugging and when no data is returned. 
+
+ Returns
+ ------
+ str : message on the success status of the search
+ """
+
+ # if search is called with replaced parameters or presRange
+ if params is not None:
+ self.params = params
+
+ if presRange is not None:
+ self.presRange = presRange
+
+ # builds URL to be submitted
+ baseURL = "https://argovis-api.colorado.edu/argo"
+ payload = {
+ "startDate": self._temporal._start.strftime("%Y-%m-%dT%H:%M:%S.%fZ"),
+ "endDate": self._temporal._end.strftime("%Y-%m-%dT%H:%M:%S.%fZ"),
+ "polygon": [self._fmt_coordinates()],
+ "data": self.params,
+ }
+
+ if self.presRange is not None:
+ payload["presRange"] = self.presRange
+
+ # submit request
+ resp = requests.get(
+ baseURL, headers={"x-argokey": self._apikey}, params=payload
+ )
+
+ if printURL:
+ print(resp.url)
+
+ selectionProfiles = resp.json()
+
+ # Consider any status other than 2xx an error
+ if resp.status_code // 100 != 2:
+ # check for the existence of profiles from query
+ if selectionProfiles == []:
+ msg = (
+ "Warning: Query returned no profiles\n"
+ "Please try different search parameters"
+ )
+ print(msg)
+ return msg
+
+ else:
+ msg = "Error: Unexpected response {}".format(resp)
+ print(msg)
+ return msg
+
+ # record the profile ids for the profiles that contain the requested parameters
+ prof_ids = []
+ for i in selectionProfiles:
+ prof_ids.append(i["_id"])
+ # should we be doing a set/duplicates check here??
+ self.prof_ids = prof_ids
+
+ msg = "{0} valid profiles have been identified".format(len(prof_ids))
+ print(msg)
+ return msg
+
+ def _download_profile(
+ self,
+ profile_number,
+ printURL=False,
+ ) -> dict:
+ """
+ Download available argo data for a particular profile_ID.
+
+ Parameters
+ ---------
+ profile_number: str
+ String containing the argo profile ID of the data being downloaded.
+ printURL: boolean, default False
+ Print the URL of the data request. Useful for debugging and when no data is returned.
+
+ Returns
+ ------
+ dict : json formatted dictionary of the profile data
+ """
+
+ # builds URL to be submitted
+ baseURL = "https://argovis-api.colorado.edu/argo"
+ payload = {
+ "id": profile_number,
+ "data": self.params,
+ }
+
+ if self.presRange:
+ payload["presRange"] = self.presRange
+
+ # submit request
+ resp = requests.get(
+ baseURL, headers={"x-argokey": self._apikey}, params=payload
+ )
+
+ if printURL:
+ print(resp.url)
+
+ # Consider any status other than 2xx an error
+ if resp.status_code // 100 != 2:
+ return "Error: Unexpected response {}".format(resp)
+ profile = resp.json()
+ return profile
+
+ def _parse_into_df(self, profile_data) -> pd.DataFrame:
+ """
+ Parses downloaded data from a single profile into a dataframe.
+ The per-profile dataframes are later merged, with each other and with any
+ existing data stored in the `argodata` property, by `download`.
+
+ Parameters
+ ----------
+ profile_data: dict
+ The downloaded profile data.
+ The data is contained in the requests response and converted into a json formatted dictionary
+ by `_download_profile` before being passed into this function.
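+
+ A minimal sketch of the expected structure (keys inferred from the parsing below):
+ `{"_id": ..., "geolocation": {"coordinates": [lon, lat]}, "timestamp": ...,
+ "data": [[...], ...], "data_info": [[column names], ...]}`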
+
+ Returns
+ -------
+ pd.DataFrame : DataFrame of profile data
+ """
+
+ profileDf = pd.DataFrame(
+ np.transpose(profile_data["data"]), columns=profile_data["data_info"][0]
+ )
+
+ # this block tries to catch changes to the ArgoVis API that will break the dataframe creation
+ try:
+ profileDf["profile_id"] = profile_data["_id"]
+ # there's also a geolocation field that provides the geospatial info as shapely points
+ profileDf["lat"] = profile_data["geolocation"]["coordinates"][1]
+ profileDf["lon"] = profile_data["geolocation"]["coordinates"][0]
+ profileDf["date"] = profile_data["timestamp"]
+ except KeyError as err:
+ msg = "We cannot automatically parse your profile into a dataframe due to {0}".format(
+ err
+ )
+ print(msg)
+ return msg
+
+ profileDf.replace("None", np.nan, inplace=True, regex=True)
+
+ return profileDf
+
+ def download(self, params=None, presRange=None, keep_existing=True) -> pd.DataFrame:
+ """
+ Downloads the requested data for a list of profile IDs (stored under .prof_ids) and returns it in a DataFrame.
+
+ Data is also stored in self.argodata.
+ Note that if new inputs (`params` or `presRange`) are supplied and `keep_existing=True`,
+ the existing data will not be limited to the new input parameters.
+
+ Parameters
+ ----------
+ params : list of str, default None
+ A list of strings, where each string is a requested parameter.
+ This kwarg is used to replace the existing list in `self.params`.
+ Do not submit this kwarg if you would like to use the existing `self.params` list.
+ Only metadata for profiles with the requested parameters are returned.
+ To search for all parameters, use `params=["all"]`.
+ For a list of available parameters, see: `reg._valid_params`
+ presRange : str, default None
+ The pressure range (which correlates with depth) to search for data within.
+ This kwarg is used to replace the existing pressure range in `self.presRange`.
+ Do not submit this kwarg if you would like to use the existing `self.presRange` values.
+ Input as a "shallow-limit,deep-limit" string.
+ keep_existing : boolean, default True
+ If True (default), data from new downloads is added to any previously downloaded data;
+ if False, any existing downloaded data is cleared before the new download.
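+ For example, appending a new parameter to an existing dataframe (a sketch mirroring
+ the test suite): `reg_a.datasets["argo"].download(params=["doxy"], keep_existing=True)`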
+
+ Returns
+ -------
+ pd.DataFrame : DataFrame of requested data
+ """
+
+ # TODO: do some basic testing of this block and how the dataframe merging actually behaves
+ if not keep_existing:
+ print(
+ "Your previously stored data in reg.argodata",
+ "will be deleted before new data is downloaded.",
+ )
+ self.argodata = None
+ elif keep_existing and hasattr(self, "argodata"):
+ print(
+ "The data requested by running this line of code\n",
+ "will be added to previously downloaded data.",
+ )
+
+ # if download is called with replaced parameters or presRange
+ if params is not None:
+ self.params = params
+
+ if presRange is not None:
+ self.presRange = presRange
+
+ # Add qc data for each of the parameters requested
+ if self.params == ["all"]:
+ pass
+ else:
+ for p in self.params:
+ if p.endswith("_argoqc") or (p + "_argoqc" in self.params):
+ pass
+ else:
+ self.params.append(p + "_argoqc")
+
+ # intentionally resubmit search to reset prof_ids, in case the user requested different parameters
+ self.search_data()
+
+ # create a dataframe for each profile and merge it with the rest of the profiles from this set of parameters being downloaded
+ merged_df = pd.DataFrame(columns=["profile_id"])
+ for i in self.prof_ids:
+ print("processing profile", i)
+ try:
+ profile_data = self._download_profile(i)
+ profile_df = self._parse_into_df(profile_data[0])
+ merged_df = pd.concat([merged_df, profile_df], sort=False)
+ except Exception:
+ print("\tError processing profile {0}. Skipping.".format(i))
+
+ # now that we have a df from this round of downloads, we can add it to any existing dataframe
+ # note that if a given column has previously been added, update needs to be used to replace nans (merge will not replace the nan values)
+ if self.argodata is not None:
+ self.argodata = self.argodata.merge(merged_df, how="outer")
+ else:
+ self.argodata = merged_df
+
+ self.argodata.reset_index(inplace=True, drop=True)
+
+ return self.argodata
+
+ def save(self, filepath):
+ """
+ Saves the argo dataframe to a csv at the specified location
+
+ Parameters
+ ----------
+ filepath : str
+ String containing complete filepath and name of file
+ Any extension will be removed and replaced with csv.
+ Also appends '_argo.csv' to filename
+ e.g. /path/to/file/my_data(_argo.csv)
+ """
+
+ # create the directory if it doesn't exist
+ path, file = os.path.split(filepath)
+ if path and not os.path.exists(path):
+ os.makedirs(path)
+
+ # remove any file extension
+ base, ext = os.path.splitext(filepath)
+
+ self.argodata.to_csv(base + "_argo.csv")
diff --git a/icepyx/quest/dataset_scripts/dataset.py b/icepyx/quest/dataset_scripts/dataset.py
index e76081e08..193fab22e 100644
--- a/icepyx/quest/dataset_scripts/dataset.py
+++ b/icepyx/quest/dataset_scripts/dataset.py
@@ -11,9 +11,7 @@ class DataSet:
 All sub-classes must support the following methods for use via the QUEST class.
 """
- def __init__(
- self, spatial_extent=None, date_range=None, start_time=None, end_time=None
- ):
+ def __init__(self, spatial_extent, date_range, start_time=None, end_time=None):
 """
 Complete any dataset specific initializations (i.e. beyond space and time) required here.
 For instance, ICESat-2 requires a product, and Argo requires parameters.
@@ -70,6 +68,12 @@ def download(self):
 """
 raise NotImplementedError
+ def save(self, filepath):
+ """
+ Save the downloaded data to a directory on your local machine.
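+
+ Subclasses provide the concrete behavior (e.g. `Argo.save` above writes the
+ dataframe to a csv); a usage sketch: `quest.datasets["argo"].save("./downloads/myfile")`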
+ """ + raise NotImplementedError + # ---------------------------------------------------------------------- # Working with Data diff --git a/icepyx/quest/quest.py b/icepyx/quest/quest.py index fe3039a39..966b19dca 100644 --- a/icepyx/quest/quest.py +++ b/icepyx/quest/quest.py @@ -2,10 +2,9 @@ from icepyx.core.query import GenQuery, Query -# from icepyx.quest.dataset_scripts.argo import Argo +from icepyx.quest.dataset_scripts.argo import Argo -# todo: implement the subclass inheritance class Quest(GenQuery): """ QUEST - Query Unify Explore SpatioTemporal - object to query, obtain, and perform basic @@ -15,7 +14,6 @@ class Quest(GenQuery): See the doc page for GenQuery for details on temporal and spatial input parameters. - Parameters ---------- proj : proj4 string @@ -55,8 +53,8 @@ class Quest(GenQuery): def __init__( self, - spatial_extent=None, - date_range=None, + spatial_extent, + date_range, start_time=None, end_time=None, proj="default", @@ -64,6 +62,7 @@ def __init__( """ Tells QUEST to initialize data given the user input spatiotemporal data. """ + super().__init__(spatial_extent, date_range, start_time, end_time) self.datasets = {} @@ -86,7 +85,7 @@ def __str__(self): def add_icesat2( self, - product=None, + product, start_time=None, end_time=None, version=None, @@ -100,7 +99,6 @@ def add_icesat2( Parameters ---------- - For details on inputs, see the Query documentation. Returns @@ -128,10 +126,32 @@ def add_icesat2( self.datasets["icesat2"] = query - # def add_argo(self, params=["temperature"], presRange=None): + def add_argo(self, params=["temperature"], presRange=None) -> None: + """ + Adds Argo (including Argo-BGC) to QUEST structure. + + Parameters + ---------- + For details on inputs, see the Argo dataset script documentation. + + Returns + ------- + None + + See Also + -------- + quest.dataset_scripts.argo + icepyx.query.GenQuery + + Examples + -------- + # example with profiles available + >>> reg_a = Quest([-154, 30,-143, 37], ['2022-04-12', '2022-04-26']) + >>> reg_a.add_argo(params=["temperature", "salinity"]) + """ - # argo = Argo(self._spatial, self._temporal, params, presRange) - # self.datasets["argo"] = argo + argo = Argo(self._spatial, self._temporal, params, presRange) + self.datasets["argo"] = argo # ---------------------------------------------------------------------- # Methods (on all datasets) @@ -144,11 +164,11 @@ def search_all(self, **kwargs): Parameters ---------- **kwargs : default None - Optional passing of keyword arguments to supply additional search constraints per datasets. - Each key must match the dataset name (e.g. "icesat2", "argo") as in quest.datasets.keys(), - and the value is a dictionary of acceptable keyword arguments - and values allowable for the `search_data()` function for that dataset. - For instance: `icesat2 = {"IDs":True}, argo = {"presRange":"10,500"}`. + Optional passing of keyword arguments to supply additional search constraints per datasets. + Each key must match the dataset name (e.g. "icesat2", "argo") as in quest.datasets.keys(), + and the value is a dictionary of acceptable keyword arguments + and values allowable for the `search_data()` function for that dataset. + For instance: `icesat2 = {"IDs":True}, argo = {"presRange":"10,500"}`. 
""" print("\nSearching all datasets...") @@ -168,6 +188,7 @@ def search_all(self, **kwargs): v.search_data(kwargs[k]) except KeyError: v.search_data() + except: dataset_name = type(v).__name__ print("Error querying data from {0}".format(dataset_name)) @@ -180,18 +201,19 @@ def download_all(self, path="", **kwargs): Parameters ---------- **kwargs : default None - Optional passing of keyword arguments to supply additional search constraints per datasets. - Each key must match the dataset name (e.g. "icesat2", "argo") as in quest.datasets.keys(), - and the value is a dictionary of acceptable keyword arguments - and values allowable for the `search_data()` function for that dataset. - For instance: `icesat2 = {"verbose":True}, argo = {"keep_existing":True}`. + Optional passing of keyword arguments to supply additional search constraints per datasets. + Each key must match the dataset name (e.g. "icesat2", "argo") as in quest.datasets.keys(), + and the value is a dictionary of acceptable keyword arguments + and values allowable for the `search_data()` function for that dataset. + For instance: `icesat2 = {"verbose":True}, argo = {"keep_existing":True}`. """ + print("\nDownloading all datasets...") for k, v in self.datasets.items(): print() - try: + try: if isinstance(v, Query): print("---ICESat-2---") try: @@ -208,4 +230,22 @@ def download_all(self, path="", **kwargs): print(msg) except: dataset_name = type(v).__name__ - print("Error downloading data from {0}".format(dataset_name)) + print("Error downloading data from {0}".format(dataset_name)) + + def save_all(self, path): + """ + Saves all datasets according to their respective `.save()` functionality. + + Parameters + ---------- + path : str + Path at which to save the dataset files. + + """ + + for k, v in self.datasets.items(): + if isinstance(v, Query): + print("ICESat-2 granules are saved during download") + else: + print("Saving " + k) + v.save(path) diff --git a/icepyx/tests/test_quest.py b/icepyx/tests/test_quest.py index f50b1bea2..0ba7325a6 100644 --- a/icepyx/tests/test_quest.py +++ b/icepyx/tests/test_quest.py @@ -15,6 +15,7 @@ def quest_instance(scope="module", autouse=True): ########## PER-DATASET ADDITION TESTS ########## + # Paramaterize these add_dataset tests once more datasets are added def test_add_is2(quest_instance): # Add ATL06 as a test to QUEST @@ -32,44 +33,39 @@ def test_add_is2(quest_instance): assert quest_instance.datasets[exp_key].product == prod -# def test_add_argo(quest_instance): -# params = ["down_irradiance412", "temperature"] -# quest_instance.add_argo(params=params) -# exp_key = "argo" -# exp_type = ipx.quest.dataset_scripts.argo.Argo - -# obs = quest_instance.datasets - -# assert type(obs) == dict -# assert exp_key in obs.keys() -# assert type(obs[exp_key]) == exp_type -# assert quest_instance.datasets[exp_key].params == params - -# def test_add_multiple_datasets(): -# bounding_box = [-150, 30, -120, 60] -# date_range = ["2022-06-07", "2022-06-14"] -# my_quest = Quest(spatial_extent=bounding_box, date_range=date_range) -# -# # print(my_quest.spatial) -# # print(my_quest.temporal) -# -# # my_quest.add_argo(params=["down_irradiance412", "temperature"]) -# # print(my_quest.datasets["argo"].params) -# -# my_quest.add_icesat2(product="ATL06") -# # print(my_quest.datasets["icesat2"].product) -# -# print(my_quest) -# -# # my_quest.search_all() -# # -# # # this one still needs work for IS2 because of auth... 
-# # my_quest.download_all() +def test_add_argo(quest_instance): + params = ["down_irradiance412", "temperature"] + quest_instance.add_argo(params=params) + exp_key = "argo" + exp_type = ipx.quest.dataset_scripts.argo.Argo + + obs = quest_instance.datasets + + assert type(obs) == dict + assert exp_key in obs.keys() + assert type(obs[exp_key]) == exp_type + assert set(quest_instance.datasets[exp_key].params) == set(params) + + +def test_add_multiple_datasets(quest_instance): + quest_instance.add_argo(params=["down_irradiance412", "temperature"]) + # print(quest_instance.datasets["argo"].params) + + quest_instance.add_icesat2(product="ATL06") + # print(quest_instance.datasets["icesat2"].product) + + exp_keys = ["argo", "icesat2"] + assert set(exp_keys) == set(quest_instance.datasets.keys()) + ########## ALL DATASET METHODS TESTS ########## + # each of the query functions should be tested in their respective modules def test_search_all(quest_instance): + quest_instance.add_argo(params=["down_irradiance412", "temperature"]) + quest_instance.add_icesat2(product="ATL06") + # Search and test all datasets quest_instance.search_all() @@ -78,8 +74,8 @@ def test_search_all(quest_instance): "kwargs", [ {"icesat2": {"IDs": True}}, - # {"argo":{"presRange":"10,500"}}, - # {"icesat2":{"IDs":True}, "argo":{"presRange":"10,500"}} + {"argo": {"presRange": "10,500"}}, + {"icesat2": {"IDs": True}, "argo": {"presRange": "10,500"}}, ], ) def test_search_all_kwargs(quest_instance, kwargs): @@ -88,15 +84,19 @@ def test_search_all_kwargs(quest_instance, kwargs): # TESTS NOT IMPLEMENTED # def test_download_all(): -# # this will require auth in some cases... -# pass +# quest_instance.add_argo(params=["down_irradiance412", "temperature"]) +# quest_instance.add_icesat2(product="ATL06") + +# # this will require auth in some cases... +# quest_instance.download_all() + # @pytest.mark.parametrize( # "kwargs", # [ # {"icesat2": {"verbose":True}}, -# # {"argo":{"keep_existing":True}, -# # {"icesat2":{"verbose":True}, "argo":{"keep_existing":True} +# {"argo":{"keep_existing":True}, +# {"icesat2":{"verbose":True}, "argo":{"keep_existing":True} # ], # ) # def test_download_all_kwargs(quest_instance, kwargs): diff --git a/icepyx/tests/test_quest_argo.py b/icepyx/tests/test_quest_argo.py new file mode 100644 index 000000000..a6940fe7b --- /dev/null +++ b/icepyx/tests/test_quest_argo.py @@ -0,0 +1,247 @@ +import os + +import pytest +import re + +from icepyx.quest.quest import Quest + + +# create an Argo instance via quest (Argo is a submodule) +@pytest.fixture(scope="function") +def argo_quest_instance(): + def _argo_quest_instance(bounding_box, date_range): # aka "factories as fixtures" + my_quest = Quest(spatial_extent=bounding_box, date_range=date_range) + my_quest.add_argo() + my_argo = my_quest.datasets["argo"] + + return my_argo + + return _argo_quest_instance + + +# --------------------------------------------------- +# Test Formatting and Validation + + +def test_fmt_coordinates(argo_quest_instance): + reg_a = argo_quest_instance([-154, 30, -143, 37], ["2022-04-12", "2022-04-26"]) + obs = reg_a._fmt_coordinates() + + exp = "[[-143.0,30.0],[-143.0,37.0],[-154.0,37.0],[-154.0,30.0],[-143.0,30.0]]" + + assert obs == exp + + +def test_validate_parameters(argo_quest_instance): + reg_a = argo_quest_instance([-154, 30, -143, 37], ["2022-04-12", "2022-04-26"]) + + invalid_params = ["temp", "temperature_files"] + + ermsg = re.escape( + "Parameter '{0}' is not valid. 
Valid parameters are {1}".format( + "temp", reg_a._valid_params() + ) + ) + + with pytest.raises(AssertionError, match=ermsg): + reg_a._validate_parameters(invalid_params) + + +# --------------------------------------------------- +# Test Setters + + +def test_param_setter(argo_quest_instance): + reg_a = argo_quest_instance([-154, 30, -143, 37], ["2022-04-12", "2022-04-26"]) + + exp = ["temperature"] + assert reg_a.params == exp + + reg_a.params = ["temperature", "salinity"] + + exp = list(set(["temperature", "salinity"])) + assert reg_a.params == exp + + +def test_param_setter_invalid_inputs(argo_quest_instance): + reg_a = argo_quest_instance([-154, 30, -143, 37], ["2022-04-12", "2022-04-26"]) + + exp = ["temperature"] + assert reg_a.params == exp + + ermsg = re.escape( + "Parameter '{0}' is not valid. Valid parameters are {1}".format( + "temp", reg_a._valid_params() + ) + ) + + with pytest.raises(AssertionError, match=ermsg): + reg_a.params = ["temp", "salinity"] + + +def test_presRange_setter(argo_quest_instance): + reg_a = argo_quest_instance([-154, 30, -143, 37], ["2022-04-12", "2022-04-26"]) + + exp = None + assert reg_a.presRange == exp + + reg_a.presRange = "0.5,150" + + exp = "0.5,150" + assert reg_a.presRange == exp + + +def test_presRange_setter_invalid_inputs(argo_quest_instance): + reg_a = argo_quest_instance([-154, 30, -143, 37], ["2022-04-12", "2022-04-26"]) + + exp = None + assert reg_a.presRange == exp + + reg_a.presRange = ( + "0.5, sam" # it looks like the API will take a string with a space + ) + + # this setter doesn't currently have a validation check, so would need to search + obs_msg = reg_a.search_data() + + exp_msg = "Error: Unexpected response " + + assert obs_msg == exp_msg + + +# --------------------------------------------------- +# Test search_data + + +def test_search_data_available_profiles(argo_quest_instance): + reg_a = argo_quest_instance([-154, 30, -143, 37], ["2022-04-12", "2022-04-26"]) + obs_msg = reg_a.search_data() + + exp_msg = "19 valid profiles have been identified" + + assert obs_msg == exp_msg + + +def test_search_data_no_available_profiles(argo_quest_instance): + reg_a = argo_quest_instance([-55, 68, -48, 71], ["2019-02-20", "2019-02-28"]) + obs = reg_a.search_data() + + exp = ( + "Warning: Query returned no profiles\n" "Please try different search parameters" + ) + + assert obs == exp + + +# --------------------------------------------------- +# Test download and df + + +def test_download_parse_into_df(argo_quest_instance): + reg_a = argo_quest_instance([-154, 30, -143, 37], ["2022-04-12", "2022-04-13"]) + reg_a.download() # note: pressure is returned by default + + obs_cols = reg_a.argodata.columns + + exp_cols = [ + "temperature", + "temperature_argoqc", + "pressure", + "profile_id", + "lat", + "lon", + "date", + ] + + assert set(exp_cols) == set(obs_cols) + + assert len(reg_a.argodata) == 2948 + + +# approach for additional testing of df functions: create json files with profiles and store them in test suite +# then use those for the comparison (e.g. 
number of rows in df and json match)
+
+
+def test_save_df_to_csv(argo_quest_instance):
+ reg_a = argo_quest_instance([-154, 30, -143, 37], ["2022-04-12", "2022-04-13"])
+ reg_a.download() # note: pressure is returned by default
+
+ path = os.path.join(os.getcwd(), "test_file")
+ reg_a.save(path)
+
+ assert os.path.exists(path + "_argo.csv")
+ os.remove(path + "_argo.csv")
+
+
+def test_merge_df(argo_quest_instance):
+ reg_a = argo_quest_instance([-150, 30, -120, 60], ["2022-06-07", "2022-06-14"])
+ param_list = ["salinity", "temperature", "down_irradiance412"]
+
+ df = reg_a.download(params=param_list)
+
+ assert "down_irradiance412" in df.columns
+ assert "down_irradiance412_argoqc" in df.columns
+
+ df = reg_a.download(["doxy"], keep_existing=True)
+ assert "doxy" in df.columns
+ assert "doxy_argoqc" in df.columns
+ assert "down_irradiance412" in df.columns
+ assert "down_irradiance412_argoqc" in df.columns
+
+
+# ---------------------------------------------------
+# Test kwargs to replace params and presRange in search and download
+
+
+def test_replace_param_search(argo_quest_instance):
+ reg_a = argo_quest_instance([-154, 30, -143, 37], ["2022-04-12", "2022-04-26"])
+
+ obs = reg_a.search_data(params=["doxy"])
+
+ exp = (
+ "Warning: Query returned no profiles\n" "Please try different search parameters"
+ )
+
+ assert obs == exp
+
+
+def test_replace_param_download(argo_quest_instance):
+ reg_a = argo_quest_instance([-154, 30, -143, 37], ["2022-04-12", "2022-04-13"])
+ reg_a.download(params=["salinity"]) # note: pressure is returned by default
+
+ obs_cols = reg_a.argodata.columns
+
+ exp_cols = [
+ "salinity",
+ "salinity_argoqc",
+ "pressure",
+ "profile_id",
+ "lat",
+ "lon",
+ "date",
+ ]
+
+ assert set(exp_cols) == set(obs_cols)
+
+ assert len(reg_a.argodata) == 1942
+
+
+def test_replace_presRange_search(argo_quest_instance):
+ reg_a = argo_quest_instance([-154, 30, -143, 37], ["2022-04-12", "2022-04-26"])
+ obs_msg = reg_a.search_data(presRange="100,600")
+
+ exp_msg = "19 valid profiles have been identified"
+
+ assert obs_msg == exp_msg
+
+
+def test_replace_presRange_download(argo_quest_instance):
+ reg_a = argo_quest_instance([-154, 30, -143, 37], ["2022-04-12", "2022-04-13"])
+ df = reg_a.download(params=["salinity"], presRange="0.2,180")
+
+ assert df["pressure"].min() >= 0.2
+ assert df["pressure"].max() <= 180
+ assert "salinity" in df.columns
+
+
+# second presRange test, where the df does have a higher max pressure because only the new data was presRange-limited?
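+
+
+# a possible sketch of that second test, kept commented out like the unimplemented
+# tests above, since its expected behavior has not been verified against the live ArgoVis API:
+# def test_replace_presRange_download_keep_existing(argo_quest_instance):
+#     reg_a = argo_quest_instance([-154, 30, -143, 37], ["2022-04-12", "2022-04-13"])
+#     df1 = reg_a.download(params=["salinity"], presRange="0.2,180")
+#     df2 = reg_a.download(presRange="0.2,500", keep_existing=True)
+#     # only the newly downloaded rows may exceed the original 180 dbar limit
+#     assert df2["pressure"].max() <= 500
+#     assert df2["pressure"].max() >= df1["pressure"].max()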