From 429bec20af803e1272bf271f097ff14c27461e78 Mon Sep 17 00:00:00 2001
From: pkdash
Date: Fri, 2 Jun 2023 17:52:04 -0400
Subject: [PATCH] [#44] adding example notebook for aggregation data object
 operations

---
 .../Aggregation_Data_Object_Operations.ipynb  | 929 ++++++++++++++++++
 docs/examples/Aggregation_Operations.ipynb    |   2 +-
 mkdocs.yml                                    |   1 +
 3 files changed, 931 insertions(+), 1 deletion(-)
 create mode 100644 docs/examples/Aggregation_Data_Object_Operations.ipynb

diff --git a/docs/examples/Aggregation_Data_Object_Operations.ipynb b/docs/examples/Aggregation_Data_Object_Operations.ipynb
new file mode 100644
index 0000000..a1f2111
--- /dev/null
+++ b/docs/examples/Aggregation_Data_Object_Operations.ipynb
@@ -0,0 +1,929 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "source": [
+    "# hsclient HydroShare Python Client Resource Aggregation Data Object Operation Examples\n",
+    "\n",
+    "\n",
+    "---\n",
+    "\n",
+    "\n",
+    "The following code snippets show how to use the hsclient HydroShare Python Client to load certain aggregation data types into relevant data processing objects, both to view data properties and to modify the data. The aggregation data object feature is available for the following HydroShare content type aggregations:\n",
+    " * Time series\n",
+    " * Geographic feature\n",
+    " * Geographic raster\n",
+    " * Multidimensional NetCDF"
+   ],
+   "metadata": {
+    "collapsed": false
+   }
+  },
+  {
+   "cell_type": "markdown",
+   "source": [
+    "## Install the hsclient Python Client\n",
+    "\n",
+    "The hsclient Python Client for HydroShare may not be installed by default in your Python environment, so it has to be installed before you can work with it. Use the following command to install hsclient via the Python Package Index (PyPI). The `[all]` option also installs the optional dependencies used by the aggregation data object operations shown in this notebook."
+   ],
+   "metadata": {
+    "collapsed": false
+   }
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "outputs": [],
+   "source": [
+    "!pip install hsclient[all]"
+   ],
+   "metadata": {
+    "collapsed": false
+   }
+  },
+  {
+   "cell_type": "markdown",
+   "source": [
+    "## Authenticating with HydroShare\n",
+    "\n",
+    "Before you start interacting with resources in HydroShare, you will need to authenticate."
+   ],
+   "metadata": {
+    "collapsed": false
+   }
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "outputs": [],
+   "source": [
+    "import os\n",
+    "from hsclient import HydroShare\n",
+    "\n",
+    "hs = HydroShare()\n",
+    "hs.sign_in()"
+   ],
+   "metadata": {
+    "collapsed": false
+   }
+  },
+  {
+   "cell_type": "markdown",
+   "source": [
+    "## Loading Resource Aggregation Data to Relevant Python Data Analysis Modules\n",
+    "\n",
+    "The Python data analysis class used for each of the supported aggregation types is shown below:\n",
+    "\n",
+    "* Time series : pandas.DataFrame\n",
+    "* Geographic feature : fiona.Collection\n",
+    "* Geographic raster : rasterio.DatasetReader\n",
+    "* Multidimensional NetCDF : xarray.Dataset\n",
+    "\n",
+    "In the following code examples, we assume that you have a resource in HydroShare that contains the above four aggregation types, all located at the root of the resource. The resource id used in these examples is \"a0e0c2e2e5e84e1e9b6b2b2b2b2b2b2b\". You will need to change this resource id to the id of your own resource in HydroShare.\n"
+   ],
+   "metadata": {
+    "collapsed": false
+   }
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "outputs": [],
+   "source": [
+    "# first we need to get the resource object from HydroShare using the id of the resource\n",
+    "resource_id = 'a0e0c2e2e5e84e1e9b6b2b2b2b2b2b2b'\n",
+    "resource = hs.resource(resource_id)\n",
+    "# show the resource identifier\n",
+    "print(f\"Resource ID:{resource.resource_id}\")"
+   ],
+   "metadata": {
+    "collapsed": false
+   }
+  },
+  {
+   "cell_type": "markdown",
+   "source": [
+    "### Loading Time Series Data to pandas.DataFrame\n",
+    "Here we assume the time series aggregation contains an SQLite file named \"sample.sqlite\"."
+   ],
+   "metadata": {
+    "collapsed": false
+   }
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "outputs": [],
+   "source": [
+    "# retrieve the time series aggregation\n",
+    "# Note: the double underscore in 'file__path' is intentional - it is the\n",
+    "# hsclient filter syntax for finding an aggregation by file path\n",
+    "file_path = \"sample.sqlite\"\n",
+    "ts_aggr = resource.aggregation(file__path=file_path)"
+   ],
+   "metadata": {
+    "collapsed": false
+   }
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "outputs": [],
+   "source": [
+    "# show the aggregation type\n",
+    "print(f\"Aggregation Type:{ts_aggr.metadata.type}\")"
+   ],
+   "metadata": {
+    "collapsed": false
+   }
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "outputs": [],
+   "source": [
+    "# display the time series results metadata to see all the available series\n",
+    "# later we will use one of the series ids to retrieve the time series data\n",
+    "print(ts_aggr.metadata.time_series_results)"
+   ],
+   "metadata": {
+    "collapsed": false
+   }
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "outputs": [],
+   "source": [
+    "# download the time series aggregation - these directory paths must exist for hsclient to download and unzip the aggregation zip file\n",
+    "# Note: These directory paths need to be changed based on where you want to download the aggregation\n",
+    "\n",
+    "download_to = r\"D:\\Temp\\TimeSeries_Testing\"\n",
+    "unzip_to = rf\"{download_to}\\aggr_unzipped\"\n",
+    "aggr_path = resource.aggregation_download(aggregation=ts_aggr, save_path=download_to, unzip_to=unzip_to)\n",
+    "print(f\"Downloaded aggregation to:{aggr_path}\")"
+   ],
+   "metadata": {
+    "collapsed": false
+   }
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "outputs": [],
+   "source": [
+    "# load a given time series of the aggregation as a pandas.DataFrame from the downloaded location (aggr_path)\n",
+    "# Note: Here we assume the series id used below is one of the ids we found when we printed the\n",
+    "# time series results in the earlier step\n",
+    "series_id = '51e31687-1ebc-11e6-aa6c-f45c8999816f'\n",
+    "pd_dataframe = ts_aggr.as_data_object(series_id=series_id, agg_path=aggr_path)\n",
+    "print(f\"Type of data processing object:{type(pd_dataframe)}\")"
+   ],
+   "metadata": {
+    "collapsed": false
+   }
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "outputs": [],
+   "source": [
+    "# now we can use the pandas.DataFrame to do some data analysis\n",
+    "\n",
+    "# show time series column headings\n",
+    "print(pd_dataframe.columns)"
+   ],
+   "metadata": {
+    "collapsed": false
+   }
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "outputs": [],
+   "source": [
+    "# show a summary of the time series data\n",
+    "pd_dataframe.info()"
+   ],
+   "metadata": {
+    "collapsed": false
+   }
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "outputs": [],
+   "source": [
+    "# show number of data points in time series\n",
+    "print(pd_dataframe.size)"
+   ],
+   "metadata": {
+    "collapsed": false
+   }
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "outputs": [],
+   "source": [
+    "# show first 5 records in time series\n",
+    "print(pd_dataframe.head(5))"
+   ],
+   "metadata": {
+    "collapsed": false
+   }
+  },
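+  {
+   "cell_type": "markdown",
+   "source": [
+    "At this point the series is a regular pandas.DataFrame, so any of the usual pandas analysis tools can be applied to it. Below is a minimal sketch that summarizes the observed values. The column name \"DataValue\" is an assumption based on a typical time series export - replace it with one of the column names printed above."
+   ],
+   "metadata": {
+    "collapsed": false
+   }
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "outputs": [],
+   "source": [
+    "# a quick statistical summary of the observed values\n",
+    "# Note: 'DataValue' is an assumed column name - use one of the names\n",
+    "# shown by pd_dataframe.columns above\n",
+    "if 'DataValue' in pd_dataframe.columns:\n",
+    "    print(pd_dataframe['DataValue'].describe())"
+   ],
+   "metadata": {
+    "collapsed": false
+   }
+  },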
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "outputs": [],
+   "source": [
+    "# editing time series aggregation data using the pandas.DataFrame\n",
+    "print(f\"DataFrame size before edit:{pd_dataframe.size}\")\n",
+    "rows, columns = pd_dataframe.shape\n",
+    "print(f\"Number of rows:{rows}\")\n",
+    "print(f\"Number of columns:{columns}\")"
+   ],
+   "metadata": {
+    "collapsed": false
+   }
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "outputs": [],
+   "source": [
+    "# delete 10 rows from the dataframe. This will result in deleting 10 records from the 'TimeSeriesResultValues' table when we save the dataframe.\n",
+    "pd_dataframe.drop(pd_dataframe.index[0:10], axis=0, inplace=True)\n",
+    "rows, columns = pd_dataframe.shape\n",
+    "print(f\"Number of rows in dataframe after delete:{rows}\")\n",
+    "print(f\"Number of columns in dataframe after delete:{columns}\")\n",
+    "print(f\"DataFrame size after delete:{pd_dataframe.size}\")\n",
+    "expected_row_count = rows"
+   ],
+   "metadata": {
+    "collapsed": false
+   }
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "outputs": [],
+   "source": [
+    "# save the updated dataframe object to the time series aggregation in HydroShare\n",
+    "# Note: this will update the data for the existing time series aggregation in HydroShare\n",
+    "ts_aggr = ts_aggr.save_data_object(resource=resource, agg_path=aggr_path, as_new_aggr=False)\n",
+    "print(\"Updated the time series aggregation ...\")"
+   ],
+   "metadata": {
+    "collapsed": false
+   }
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "outputs": [],
+   "source": [
+    "# we can also create a new time series aggregation in HydroShare using the updated dataframe object\n",
+    "# we will first create a new folder in which the new aggregation will be created\n",
+    "aggr_folder = \"ts_folder\"\n",
+    "resource.folder_create(folder=aggr_folder)\n",
+    "ts_aggr = ts_aggr.save_data_object(resource=resource, agg_path=aggr_path, as_new_aggr=True,\n",
+    "                                   destination_path=aggr_folder)\n",
+    "print(\"Created a new time series aggregation ...\")"
+   ],
+   "metadata": {
+    "collapsed": false
+   }
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "outputs": [],
+   "source": [
+    "# retrieve the updated time series aggregation to verify the data was updated\n",
+    "# reload the new time series as a pandas.DataFrame\n",
+    "# we need to first download this new aggregation\n",
+    "\n",
+    "aggr_path = resource.aggregation_download(aggregation=ts_aggr, save_path=download_to, unzip_to=unzip_to)\n",
+    "print(f\"Downloaded aggregation to:{aggr_path}\")\n",
+    "pd_dataframe = ts_aggr.as_data_object(series_id=series_id, agg_path=aggr_path)\n",
+    "rows, columns = pd_dataframe.shape\n",
+    "print(f\"Number of rows in the updated time series:{rows}\")\n",
+    "print(f\"Number of columns in the updated time series:{columns}\")\n",
+    "assert rows == expected_row_count"
+   ],
+   "metadata": {
+    "collapsed": false
+   }
+  },
+  {
+   "cell_type": "markdown",
+   "source": [
+    "### Loading Geographic Feature Data to fiona.Collection\n",
+    "Here we assume the geographic feature aggregation contains a shapefile named \"sample.shp\"."
+   ],
+   "metadata": {
+    "collapsed": false
+   }
+  },
"execution_count": null, + "outputs": [], + "source": [ + "# retrieve the geographic feature aggregation\n", + "file_path = \"sample.shp\"\n", + "gf_aggr = resource.aggregation(file__path=file_path)" + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "# show the aggregation type\n", + "print(f\"Aggregation Type:{gf_aggr.metadata.type}\")" + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "# download the geographic feature aggregation - these directory paths must exist for hsclient to download and unzip the aggregation zip file\n", + "# Note: These directory paths need to be changed based on where you want to download the aggregation\n", + "download_to = r\"D:\\Temp\\GeoFeature_Testing\"\n", + "unzip_to = rf\"{download_to}\\aggr_unzipped\"\n", + "aggr_path = resource.aggregation_download(aggregation=gf_aggr, save_path=download_to, unzip_to=unzip_to)\n", + "print(f\"Downloaded aggregation to:{aggr_path}\")" + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "# load the downloaded geo-feature aggregation as a fiona Collection object\n", + "fiona_coll = gf_aggr.as_data_object(agg_path=aggr_path)\n", + "print(f\"Type of data processing object:{type(fiona_coll)}\")" + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "# now we can use the fiona.Collection object to do some data analysis\n", + "\n", + "# show driver used to open the vector file\n", + "print(fiona_coll.driver)" + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "# show feature collection coordinate reference system\n", + "print(fiona_coll.crs)" + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "# show feature collection spatial coverage\n", + "print(fiona_coll.bounds)" + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "# show number of features/bands\n", + "print(len(list(fiona_coll)))" + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "# show feature field information\n", + "print(fiona_coll.schema)" + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "# show data for a single feature in feature collection\n", + "from fiona.model import to_dict\n", + "\n", + "feature = fiona_coll[1]\n", + "to_dict(feature)" + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "# editing geographic feature aggregation data using the fiona.Collection object\n", + "import fiona\n", + "\n", + "# location of the new output shp file\n", + "# Note: The output shapefile directory path must exist.\n", + "output_shp_file_dir_path = os.path.join(download_to, \"updated_aggr\")\n", + "\n", + "# name the output shape file same as the original shape file\n", + "orig_shp_file_name = os.path.basename(gf_aggr.main_file_path)\n", + "output_shp_file_path = os.path.join(output_shp_file_dir_path, 
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "outputs": [],
+   "source": [
+    "# editing geographic feature aggregation data using the fiona.Collection object\n",
+    "import fiona\n",
+    "\n",
+    "# location of the new output shp file\n",
+    "# create the output directory if it does not already exist\n",
+    "output_shp_file_dir_path = os.path.join(download_to, \"updated_aggr\")\n",
+    "os.makedirs(output_shp_file_dir_path, exist_ok=True)\n",
+    "\n",
+    "# name the output shapefile the same as the original shapefile\n",
+    "orig_shp_file_name = os.path.basename(gf_aggr.main_file_path)\n",
+    "output_shp_file_path = os.path.join(output_shp_file_dir_path, orig_shp_file_name)\n",
+    "\n",
+    "# here we will remove one of the features (the one where the state name is Alaska) and then write the updated data to a new shp file\n",
+    "# Note: You will have to use a different criterion for selecting features depending on your feature dataset\n",
+    "with fiona.open(output_shp_file_path, 'w', schema=fiona_coll.schema, driver=fiona_coll.driver,\n",
+    "                crs=fiona_coll.crs) as out_shp_file:\n",
+    "    for feature in fiona_coll:\n",
+    "        ft_dict = to_dict(feature)\n",
+    "        if ft_dict['properties']['STATE_NAME'] != \"Alaska\":\n",
+    "            out_shp_file.write(feature)\n",
+    "        else:\n",
+    "            print(\">> Skipping feature for Alaska\")\n",
+    "\n",
+    "print(\"Done updating the shp file ...\")"
+   ],
+   "metadata": {
+    "collapsed": false
+   }
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "outputs": [],
+   "source": [
+    "# we can now update the geographic feature aggregation in HydroShare using the updated shp file\n",
+    "gf_aggr = gf_aggr.save_data_object(resource=resource, agg_path=output_shp_file_dir_path, as_new_aggr=False)\n",
+    "print(\"Aggregation updated ...\")"
+   ],
+   "metadata": {
+    "collapsed": false
+   }
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "outputs": [],
+   "source": [
+    "# we can also create a new geographic feature aggregation in HydroShare using the updated shp file\n",
+    "\n",
+    "# we will first create a new folder in which the new aggregation will be created in HydroShare\n",
+    "aggr_folder = \"gf_folder\"\n",
+    "resource.folder_create(folder=aggr_folder)\n",
+    "gf_aggr = gf_aggr.save_data_object(resource=resource, agg_path=output_shp_file_dir_path, as_new_aggr=True,\n",
+    "                                   destination_path=aggr_folder)\n",
+    "print(\"New aggregation created ...\")"
+   ],
+   "metadata": {
+    "collapsed": false
+   }
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "outputs": [],
+   "source": [
+    "# retrieve the updated geographic feature aggregation to verify the data was updated\n",
+    "# we need to first download this updated/new aggregation\n",
+    "aggr_path = resource.aggregation_download(aggregation=gf_aggr, save_path=download_to, unzip_to=unzip_to)\n",
+    "fiona_coll = gf_aggr.as_data_object(agg_path=aggr_path)\n",
+    "# check the number of features in the updated aggregation\n",
+    "print(len(list(fiona_coll)))"
+   ],
+   "metadata": {
+    "collapsed": false
+   }
+  },
+  {
+   "cell_type": "markdown",
+   "source": [
+    "### Loading Multidimensional Data to xarray.Dataset\n",
+    "Here we assume the multidimensional aggregation contains a NetCDF file named \"sample.nc\".\n"
+   ],
+   "metadata": {
+    "collapsed": false
+   }
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "outputs": [],
+   "source": [
+    "# retrieve the multidimensional aggregation\n",
+    "file_path = \"sample.nc\"\n",
+    "md_aggr = resource.aggregation(file__path=file_path)\n",
+    "print(f\"Aggregation Type:{md_aggr.metadata.type}\")"
+   ],
+   "metadata": {
+    "collapsed": false
+   }
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "outputs": [],
+   "source": [
+    "# download the multidimensional aggregation - these directory paths must exist for hsclient to download and unzip the aggregation zip file\n",
+    "# Note: These directory paths need to be changed based on where you want to download the aggregation\n",
+    "download_to = r\"D:\\Temp\\MultiDim_Testing\"\n",
+    "unzip_to = rf\"{download_to}\\aggr_unzipped\"\n",
+    "aggr_path = resource.aggregation_download(aggregation=md_aggr, save_path=download_to, unzip_to=unzip_to)\n",
+    "print(f\"Downloaded aggregation to:{aggr_path}\")"
+   ],
+   "metadata": {
+    "collapsed": false
+   }
+  },
"metadata": { + "collapsed": false + } + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "# load the downloaded multidimensional aggregation as a xarray.Dataset object\n", + "xarray_ds = md_aggr.as_data_object(agg_path=aggr_path)\n", + "print(f\"Type of data processing object:{type(xarray_ds)}\")" + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "# now we can use the xarray.Dataset object to do some data analysis\n", + "\n", + "# show netcdf global attributes\n", + "print(xarray_ds.attrs)" + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "# show netcdf dimensions\n", + "print(xarray_ds.dims)" + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "# show coordinate variables of the netcdf dataset\n", + "print(xarray_ds.coords)" + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "# editing multidimensional aggregation data using the xarray.Dataset object\n", + "\n", + "# here we will only change the title attribute of the dataset\n", + "aggr_title = \"This is a modified title for this aggregation modified using hsclient\"\n", + "xarray_ds.attrs[\"title\"] = aggr_title" + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "# we can update the multidimensional aggregation in HydroShare using the updated xarray.Dataset object\n", + "md_aggr = md_aggr.save_data_object(resource=resource, agg_path=aggr_path, as_new_aggr=False)\n", + "print(\"Aggregation updated ...\")" + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "# we can also create a new multidimensional aggregation in HydroShare using the updated xarray.Dataset object\n", + "\n", + "# we will first create a new folder in which the new aggregation will be created\n", + "aggr_folder = \"md_folder\"\n", + "resource.folder_create(folder=aggr_folder)\n", + "md_aggr = md_aggr.save_data_object(resource=resource, agg_path=aggr_path, as_new_aggr=True,\n", + " destination_path=aggr_folder)\n", + "print(\"New aggregation created ...\")" + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "# retrieve the updated multidimensional aggregation to verify the data was updated\n", + "\n", + "# need to first download this updated/new aggregation\n", + "aggr_path = resource.aggregation_download(aggregation=md_aggr, save_path=download_to, unzip_to=unzip_to)\n", + "xarray_ds = md_aggr.as_data_object(agg_path=aggr_path)\n", + "# check the title attribute of the updated aggregation\n", + "assert xarray_ds.attrs[\"title\"] == aggr_title" + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "markdown", + "source": [ + "### Loading Geo Raster Data to rasterio.DatasetReader\n", + "Here we are assuming the georaster aggregation contains a geotiff file with name \"sample.tif\"" + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "# retrieve the georaster aggregation\n", + "file_path = \"sample.tif\"\n", + "gr_aggr = 
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "outputs": [],
+   "source": [
+    "# editing multidimensional aggregation data using the xarray.Dataset object\n",
+    "\n",
+    "# here we will only change the title attribute of the dataset\n",
+    "aggr_title = \"This is a modified title for this aggregation modified using hsclient\"\n",
+    "xarray_ds.attrs[\"title\"] = aggr_title"
+   ],
+   "metadata": {
+    "collapsed": false
+   }
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "outputs": [],
+   "source": [
+    "# we can update the multidimensional aggregation in HydroShare using the updated xarray.Dataset object\n",
+    "md_aggr = md_aggr.save_data_object(resource=resource, agg_path=aggr_path, as_new_aggr=False)\n",
+    "print(\"Aggregation updated ...\")"
+   ],
+   "metadata": {
+    "collapsed": false
+   }
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "outputs": [],
+   "source": [
+    "# we can also create a new multidimensional aggregation in HydroShare using the updated xarray.Dataset object\n",
+    "\n",
+    "# we will first create a new folder in which the new aggregation will be created\n",
+    "aggr_folder = \"md_folder\"\n",
+    "resource.folder_create(folder=aggr_folder)\n",
+    "md_aggr = md_aggr.save_data_object(resource=resource, agg_path=aggr_path, as_new_aggr=True,\n",
+    "                                   destination_path=aggr_folder)\n",
+    "print(\"New aggregation created ...\")"
+   ],
+   "metadata": {
+    "collapsed": false
+   }
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "outputs": [],
+   "source": [
+    "# retrieve the updated multidimensional aggregation to verify the data was updated\n",
+    "\n",
+    "# we need to first download this updated/new aggregation\n",
+    "aggr_path = resource.aggregation_download(aggregation=md_aggr, save_path=download_to, unzip_to=unzip_to)\n",
+    "xarray_ds = md_aggr.as_data_object(agg_path=aggr_path)\n",
+    "# check the title attribute of the updated aggregation\n",
+    "assert xarray_ds.attrs[\"title\"] == aggr_title"
+   ],
+   "metadata": {
+    "collapsed": false
+   }
+  },
+  {
+   "cell_type": "markdown",
+   "source": [
+    "### Loading Geographic Raster Data to rasterio.DatasetReader\n",
+    "Here we assume the georaster aggregation contains a GeoTIFF file named \"sample.tif\"."
+   ],
+   "metadata": {
+    "collapsed": false
+   }
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "outputs": [],
+   "source": [
+    "# retrieve the georaster aggregation\n",
+    "file_path = \"sample.tif\"\n",
+    "gr_aggr = resource.aggregation(file__path=file_path)\n",
+    "print(f\"Aggregation Type:{gr_aggr.metadata.type}\")"
+   ],
+   "metadata": {
+    "collapsed": false
+   }
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "outputs": [],
+   "source": [
+    "# download the georaster aggregation - these directory paths must exist for hsclient to download and unzip the aggregation zip file\n",
+    "# Note: These directory paths need to be changed based on where you want to download the aggregation\n",
+    "download_to = r\"D:\\Temp\\GeoRaster_Testing\"\n",
+    "unzip_to = rf\"{download_to}\\aggr_unzipped\"\n",
+    "aggr_path = resource.aggregation_download(aggregation=gr_aggr, save_path=download_to, unzip_to=unzip_to)\n",
+    "print(f\"Downloaded aggregation to:{aggr_path}\")"
+   ],
+   "metadata": {
+    "collapsed": false
+   }
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "outputs": [],
+   "source": [
+    "# load the downloaded georaster aggregation as a rasterio.DatasetReader object\n",
+    "rasterio_ds = gr_aggr.as_data_object(agg_path=aggr_path)\n",
+    "print(f\"Type of data processing object:{type(rasterio_ds)}\")"
+   ],
+   "metadata": {
+    "collapsed": false
+   }
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "outputs": [],
+   "source": [
+    "# now we can use the rasterio.DatasetReader object to do some data analysis\n",
+    "\n",
+    "# show raster band count\n",
+    "print(rasterio_ds.count)"
+   ],
+   "metadata": {
+    "collapsed": false
+   }
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "outputs": [],
+   "source": [
+    "# show raster width and height\n",
+    "print(rasterio_ds.width, rasterio_ds.height)"
+   ],
+   "metadata": {
+    "collapsed": false
+   }
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "outputs": [],
+   "source": [
+    "# show raster coordinate reference system\n",
+    "print(rasterio_ds.crs)"
+   ],
+   "metadata": {
+    "collapsed": false
+   }
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "outputs": [],
+   "source": [
+    "# show raster bounds\n",
+    "print(rasterio_ds.bounds)"
+   ],
+   "metadata": {
+    "collapsed": false
+   }
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "outputs": [],
+   "source": [
+    "# show raster data\n",
+    "data = rasterio_ds.read()\n",
+    "print(data)"
+   ],
+   "metadata": {
+    "collapsed": false
+   }
+  },
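+  {
+   "cell_type": "markdown",
+   "source": [
+    "Since rasterio returns the band data as numpy arrays, simple band statistics are easy to compute. Below is a minimal sketch for the first band; note that any nodata values present in the band are included in these statistics."
+   ],
+   "metadata": {
+    "collapsed": false
+   }
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "outputs": [],
+   "source": [
+    "# compute simple statistics for the first raster band\n",
+    "# Note: nodata values, if present, are included in these statistics\n",
+    "band1 = rasterio_ds.read(1)\n",
+    "print(f\"min:{band1.min()} max:{band1.max()} mean:{band1.mean():.2f}\")"
+   ],
+   "metadata": {
+    "collapsed": false
+   }
+  },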
"profile['height'] = new_height\n", + "\n", + "with rasterio.open(output_raster_file_path, \"w\", **profile) as dst:\n", + " dst.write(subset_band, 1)\n", + "\n", + "print(f\"Saved subset raster to:{output_raster_file_path}\")" + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "# we can update the georaster aggregation in HydroShare using the updated rasterio.DatasetReader object\n", + "gr_aggr = gr_aggr.save_data_object(resource=resource, agg_path=output_raster_dir_path, as_new_aggr=False)\n", + "print(\"Aggregation updated ...\")" + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "# we can also create a new georaster aggregation in HydroShare using the updated rasterio.DatasetReader object\n", + "\n", + "# we will first create a new folder in which the new aggregation will be created\n", + "aggr_folder = \"gr_folder\"\n", + "resource.folder_create(folder=aggr_folder)\n", + "gr_aggr = gr_aggr.save_data_object(resource=resource, agg_path=output_raster_dir_path, as_new_aggr=True,\n", + " destination_path=aggr_folder)\n", + "print(\"New aggregation created ...\")" + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "# retrieve the updated georaster aggregation to verify the data was updated\n", + "\n", + "# need to first download this updated/new aggregation\n", + "aggr_path = resource.aggregation_download(aggregation=gr_aggr, save_path=download_to, unzip_to=unzip_to)\n", + "rasterio_ds = gr_aggr.as_data_object(agg_path=aggr_path)\n", + "# check the raster dimensions of the updated aggregation\n", + "print(\"raster dimensions after editing:\")\n", + "print(f\"raster width :{rasterio_ds.width}\")\n", + "print(f\"raster height:{rasterio_ds.height}\")" + ], + "metadata": { + "collapsed": false + } + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 2 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython2", + "version": "2.7.6" + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} diff --git a/docs/examples/Aggregation_Operations.ipynb b/docs/examples/Aggregation_Operations.ipynb index a08a08e..d4629f9 100644 --- a/docs/examples/Aggregation_Operations.ipynb +++ b/docs/examples/Aggregation_Operations.ipynb @@ -72,7 +72,7 @@ "\n", "A \"resource\" is a container for your content in HydroShare. Think of it as a \"working directory\" into which you are going to organize the code and/or data you are using and want to share. The following code can be used to create a new, empty resource within which you can create content and metadata.\n", "\n", - "This code creates a new resource in HydroShare. It also creates an in-memory object representation of that resource in your local environmment that you can then manipulate with further code." + "This code creates a new resource in HydroShare. It also creates an in-memory object representation of that resource in your local environment that you can then manipulate with further code." 
    ]
   },
   {
diff --git a/mkdocs.yml b/mkdocs.yml
index be7df82..6d9cacd 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -25,6 +25,7 @@ nav:
       Aggregation Operations: examples/Aggregation_Operations.ipynb
       File Operations: examples/File_Operations.ipynb
       Metadata Operations: examples/Metadata_Operations.ipynb
+      Aggregation Data Object Operations: examples/Aggregation_Data_Object_Operations.ipynb
   - Developer Documentation:
     - Models:
         Single File: metadata/SingleFile.md