CSHS-CWRA · richardarsenault · Dec 12, 2023 · Oct 24, 2023 · Oct 24, 2023 · Oct 24, 2023
diff --git a/docs/notebooks/Sensitivity_analysis.ipynb b/docs/notebooks/Sensitivity_analysis.ipynb
@@ -0,0 +1,383 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Performing a sensitivity analysis\n",
+ "\n",
+ "In this notebook, we perform a sensitivity analysis on GR4JCN to determine the importance of each parameter using the Sobol' sensitivity analysis method. The example shown herein is done using very few parameter samples, and as such, results will be poor and should not be interpreted as-is. However, it is possible to use this code locally using RavenPy to run a much larger sampling on a local computer. "
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Prepare data for GR4JCN\n",
+ "\n",
+ "We will use GR4JCN for this analysis. Since the sensitivity analysis acts on a model response to different inputs, we must find a metric that can be used to measure the impacts of parameters on the model response. In this case, we will use the Nash-Sutcliffe and Absolute Error metrics as responses. It could be any scalar value: mean flow, peak flow, lowest flow, flow volume, etc. But for this exercice we suppose that we want to know the impact of a parameter set on an objective function value. We therefore use a dataset that contains observed streamflow to compute the evaluation metrics.\n",
+ "\n",
+ "Let's now import the required packages, get the correct data and setup the model HRU for physiographic information."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "# Import required packages:\n",
+ "import datetime as dt\n",
+ "import tempfile\n",
+ "from pathlib import Path\n",
+ "\n",
+ "import numpy as np\n",
+ "from SALib.analyze import sobol as sobol_analyzer\n",
+ "from SALib.sample import sobol as sobol_sampler\n",
+ "from tqdm.notebook import tqdm\n",
+ "\n",
+ "from ravenpy import OutputReader\n",
+ "from ravenpy.config import commands as rc\n",
+ "from ravenpy.config.emulators import GR4JCN\n",
+ "from ravenpy.ravenpy import run\n",
+ "from ravenpy.utilities.testdata import get_file\n",
+ "\n",
+ "# We get the netCDF from a server. You can replace the `get_file` function by a string containing the path to your own netCDF.\n",
+ "nc_file = get_file(\n",
+ " \"raven-gr4j-cemaneige/Salmon-River-Near-Prince-George_meteo_daily.nc\"\n",
+ ")\n",
+ "\n",
+ "# Here, we need to give the name of your different dataset in order to match with Raven models.\n",
+ "alt_names = {\n",
+ " \"RAINFALL\": \"rain\",\n",
+ " \"SNOWFALL\": \"snow\",\n",
+ " \"TEMP_MIN\": \"tmin\",\n",
+ " \"TEMP_MAX\": \"tmax\",\n",
+ " \"PET\": \"pet\",\n",
+ " \"HYDROGRAPH\": \"qobs\",\n",
+ "}\n",
+ "\n",
+ "# The HRU of your watershed\n",
+ "hru = dict(area=4250.6, elevation=843.0, latitude=54.4848, longitude=-123.3659)\n",
+ "\n",
+ "# The evaluation metrics. Multiple options are possible, as can be found in Tutorial Notebook 06. Here we use Nash-Sutcliffe and Absolute Error.\n",
+ "eval_metrics = (\"NASH_SUTCLIFFE\", \"ABSERR\")\n",
+ "\n",
+ "# Data keywords for meteorological data stations\n",
+ "data_kwds = {\n",
+ " \"ALL\": {\n",
+ " \"elevation\": hru[\n",
+ " \"elevation\"\n",
+ " ], # extract the values directly from the \"hru\" we previously built\n",
+ " \"latitude\": hru[\"latitude\"],\n",
+ " \"longitude\": hru[\"longitude\"],\n",
+ " }\n",
+ "}"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Sensitivity analysis step 1: Define the Sobol problem to analyze\n",
+ "\n",
+ "Sobol sensitivity analysis requires three distinct steps:\n",
+ "\n",
+ "1. Sample parameter sets from the possible parameter space;\n",
+ "2. Run the model and gather the model response for each of these parameter spaces;\n",
+ "3. Analyze the change in model responses as a function of changes in parameter sets.\n",
+ "\n",
+ "Therefore, the first step is to sample the parameter space."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "The number of parameter sets to evaluate is 56\n"
+ ]
+ }
+ ],
+ "source": [
+ "# Define the model inputs:\n",
+ "problem = {\n",
+ " \"num_vars\": 6, # Number of variables\n",
+ " \"names\": [\n",
+ " \"x1\",\n",
+ " \"x2\",\n",
+ " \"x3\",\n",
+ " \"x4\",\n",
+ " \"CN1\",\n",
+ " \"CN2\",\n",
+ " ], # Names of these variables, to make it easier to follow. Can be any string defined by the user\n",
+ " \"bounds\": [\n",
+ " [\n",
+ " 0.01,\n",
+ " 2.5,\n",
+ " ], # We must provide lower and upper bounds for each parameter to sample. Must be adjusted for each model.\n",
+ " [-15.0, 10.0],\n",
+ " [10.0, 700.0],\n",
+ " [0.0, 7.0],\n",
+ " [1.0, 30.0],\n",
+ " [0.0, 1.0],\n",
+ " ],\n",
+ "}\n",
+ "\n",
+ "# Generate samples. The number of parameter sets to generate will be N * (2D + 2), where N is defined below and D is the number of\n",
+ "# model inputs (6 for GR4JCN).\n",
+ "N = 4\n",
+ "param_values = sobol_sampler.sample(problem, N)\n",
+ "\n",
+ "# Display the size of the param_values matrix. We will run the model with each set of parameters, i.e. one per row.\n",
+ "print(\"The number of parameter sets to evaluate is \" + str(param_values.shape[0]))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Sensitivity analysis step 2: Run the model for each parameter set\n",
+ "\n",
+ "In this stage, we have our sampled parameter sets according to the Sobol / Saltelli sampling methods. We now need to run the GR4JCN model for each of these parameter sets and compute the objective function (model response) that we want. Here we ask the model to pre-compute two objective functions (NSE and MAE), so we will be able to perform the sensitivity analysis on both metrics while only running the model once for each parameter set.\n",
+ "\n",
+ "We use a simple loop to run the model here, but advanced users could parallelize this as it is an \"embarassingly parallelizable\" problem. "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "/opt/conda/envs/birdy/lib/python3.9/site-packages/xclim/indices/fire/_cffwis.py:207: NumbaDeprecationWarning: \u001b[1mThe 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit for details.\u001b[0m\n",
+ " def _day_length(lat: int | float, mth: int): # pragma: no cover\n",
+ "/opt/conda/envs/birdy/lib/python3.9/site-packages/xclim/indices/fire/_cffwis.py:227: NumbaDeprecationWarning: \u001b[1mThe 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit for details.\u001b[0m\n",
+ " def _day_length_factor(lat: float, mth: int): # pragma: no cover\n"
+ ]
+ },
+ {
+ "data": {
+ "application/vnd.jupyter.widget-view+json": {
+ "model_id": "7b7834b5265540868eba214b2d50ffa9",
+ "version_major": 2,
+ "version_minor": 0
+ },
+ "text/plain": [
+ " 0%| | 0/56 [00:00<?, ?it/s]"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "# Set the working directory at one place so all files are overwritten at the same place (Avoids creating hundreds or thousands\n",
+ "# of folders with each run's data)\n",
+ "workdir = Path(tempfile.mkdtemp())\n",
+ "\n",
+ "# Pre-define the results matrix based on the number of parameters we will test (and thus how many runs we will need to do). We will test SA with\n",
+ "# two objective functions (NSE and AbsErr). Let's pre-define both vectors now.\n",
+ "Y_NSE = np.zeros([param_values.shape[0]])\n",
+ "Y_ABS = np.zeros([param_values.shape[0]])\n",
+ "\n",
+ "# Define a run name for files\n",
+ "run_name = \"SA_Sobol\"\n",
+ "\n",
+ "config = dict(\n",
+ " ObservationData=[rc.ObservationData.from_nc(nc_file, alt_names=\"qobs\")],\n",
+ " Gauge=[rc.Gauge.from_nc(nc_file, alt_names=alt_names, data_kwds=data_kwds)],\n",
+ " HRUs=[hru],\n",
+ " StartDate=dt.datetime(1990, 1, 1),\n",
+ " EndDate=dt.datetime(1999, 12, 31),\n",
+ " RunName=run_name,\n",
+ " EvaluationMetrics=eval_metrics, # We add this code to tell Raven which objective function we want to pass.\n",
+ " SuppressOutput=True, # This suppresses the writing of files to disk, returning only basic information such as the evaluation metrics values.\n",
+ ")\n",
+ "\n",
+ "# Now we have a loop that runs the model iteratively, once per parameter set:\n",
+ "for i, X in enumerate(tqdm(param_values)):\n",
+ " # We need to create the desired model with its parameters the same way as in the Notebook 04_Emulating_hydrological_models.\n",
+ " m = GR4JCN(\n",
+ " params=X.tolist(), # Here is where we pass the paramter sets to the model, from the loop enumerator X.\n",
+ " **config,\n",
+ " )\n",
+ "\n",
+ " # Write the files to disk, and overwrite existing files in the folder (we already got the values we needed from previous runs)\n",
+ " m.write_rv(workdir=workdir, overwrite=True)\n",
+ "\n",
+ " # Run the model and get the path to the outputs folder that can be used in the output reader.\n",
+ " outputs_path = run(modelname=run_name, configdir=workdir)\n",
+ "\n",
+ " # Get the outputs using the Output Reader object.\n",
+ " outputs = OutputReader(run_name=run_name, path=outputs_path)\n",
+ "\n",
+ " # Gather the results for both of the desired objective functions. We will see how the choice of objective function impacts sensitivity.\n",
+ " Y_NSE[i] = outputs.diagnostics[\"DIAG_NASH_SUTCLIFFE\"][0]\n",
+ " Y_ABS[i] = outputs.diagnostics[\"DIAG_ABSERR\"][0]"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Sensitivity analysis step 3: Analyze results and obtain parameter sensitivity indices\n",
+ "\n",
+ "At this point, we have a model response for each of the parameter sets. We can analyze the results using the code below. We will display only the total and 1st order sensitivities."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ " ST ST_conf\n",
+ "x1 3.732186e-03 0.450553\n",
+ "x2 1.419784e-02 4.051277\n",
+ "x3 3.449052e+03 3587.049191\n",
+ "x4 3.917818e-05 0.007440\n",
+ "CN1 5.406374e-07 0.000126\n",
+ "CN2 1.732335e-02 0.229530\n",
+ " S1 S1_conf\n",
+ "x1 -0.076254 42.680675\n",
+ "x2 -0.652888 164.484810\n",
+ "x3 -129.678487 145.365224\n",
+ "x4 0.009685 5.521421\n",
+ "CN1 0.003480 1.339628\n",
+ "CN2 -0.412624 27.920944\n",
+ " S2 S2_conf\n",
+ "(x1, x2) 3.469755 73.222639\n",
+ "(x1, x3) -86.468689 135.059651\n",
+ "(x1, x4) 3.492759 73.545136\n",
+ "(x1, CN1) 3.497973 73.545500\n",
+ "(x1, CN2) 3.302566 73.325432\n",
+ "(x2, x3) 6537.048181 6678.607761\n",
+ "(x2, x4) -250.967244 394.428925\n",
+ "(x2, CN1) -251.357166 394.889992\n",
+ "(x2, CN2) -236.799947 380.062434\n",
+ "(x3, x4) 133.288708 279.626485\n",
+ "(x3, CN1) 133.296202 279.330894\n",
+ "(x3, CN2) 133.057135 279.197331\n",
+ "(x4, CN1) 1.857413 9.201576\n",
+ "(x4, CN2) 1.749050 9.117204\n",
+ "(CN1, CN2) -0.039138 1.573522\n",
+ "[-7.62544112e-02 -6.52887811e-01 -1.29678487e+02 9.68483864e-03\n",
+ " 3.47999025e-03 -4.12623836e-01]\n"
+ ]
+ }
+ ],
+ "source": [
+ "# Perform analysis for the NSE objective function first\n",
+ "Si = sobol_analyzer.analyze(problem, Y_NSE, print_to_console=True)\n",
+ "\n",
+ "# Print the first-order sensitivity indices\n",
+ "print(Si[\"S1\"])"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ " ST ST_conf\n",
+ "x1 0.222646 0.417622\n",
+ "x2 3.081639 19.932806\n",
+ "x3 2231.847449 5874.558577\n",
+ "x4 0.001056 0.004228\n",
+ "CN1 0.000249 0.001094\n",
+ "CN2 0.432236 1.050870\n",
+ " S1 S1_conf\n",
+ "x1 0.025494 7.512936\n",
+ "x2 -3.027893 35.841620\n",
+ "x3 -104.841372 277.249162\n",
+ "x4 0.011650 0.534197\n",
+ "CN1 0.055864 0.371602\n",
+ "CN2 -1.856200 4.035749\n",
+ " S2 S2_conf\n",
+ "(x1, x2) 1.030658 10.054179\n",
+ "(x1, x3) -22.366541 62.476140\n",
+ "(x1, x4) 0.554580 10.996033\n",
+ "(x1, CN1) 0.561424 10.993885\n",
+ "(x1, CN2) 0.219127 11.001991\n",
+ "(x2, x3) 4251.579060 11177.265987\n",
+ "(x2, x4) -121.904276 337.455743\n",
+ "(x2, CN1) -123.326827 341.100735\n",
+ "(x2, CN2) -64.599526 189.050386\n",
+ "(x3, x4) 101.427855 283.426013\n",
+ "(x3, CN1) 101.466243 283.409563\n",
+ "(x3, CN2) 101.081141 282.424822\n",
+ "(x4, CN1) 0.104207 1.101813\n",
+ "(x4, CN2) 0.025941 1.002548\n",
+ "(CN1, CN2) -0.120574 0.664040\n",
+ "[ 2.54942682e-02 -3.02789279e+00 -1.04841372e+02 1.16498952e-02\n",
+ " 5.58640803e-02 -1.85619957e+00]\n"
+ ]
+ }
+ ],
+ "source": [
+ "# Now perform the sensitivity analysis for the Absolute Error objective function\n",
+ "Si = sobol_analyzer.analyze(problem, Y_ABS, print_to_console=True)\n",
+ "\n",
+ "# Print the first-order sensitivity indices\n",
+ "print(Si[\"S1\"])"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Result analysis\n",
+ "\n",
+ "We can see that parameters x2 and x3 are more sensitive than the other with total (ST) and 1st order (S1) sensitivities higher than the other parameters. This is true for both objective functions, but could also be different for other metrics, so it is important to keep this in mind when using a sensitivity analysis to determine parameter importance! A common example is a parameter related to snowmelt. This parameter will have no impact if there is no snow during the period used in the model, but would become critical if there were to be snow in following years.\n",
+ "\n",
+ "Note that the tables above present the sobol sensitivity in the left column and the confidence interval (95%) in the right column. Values are strange because we are using way too few parameter sets to adequately sample the parameter space here, but increasing the value of \"N\" to 1024 or 2048 would allow for a much better estimation for a 6-parameter model."
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3 (ipykernel)",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.9.16"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
diff --git a/docs/notebooks/index.rst b/docs/notebooks/index.rst
@@ -41,6 +41,7 @@ Advanced workflows
  Managing_Jupyter_Environments
  Perform_Regionalization
  Running_HMETS_with_CANOPEX_dataset
+ Sensitivity_analysis
  time_series_analysis
 
 

diff --git a/environment-rtd.yml b/environment-rtd.yml
@@ -20,6 +20,7 @@ dependencies:
  - notebook
  - pandoc
  - pydantic >=1.10.8,<2.0
+ - salib
  - seaborn
  - sphinx
  - sphinx-autoapi