-
Notifications
You must be signed in to change notification settings - Fork 2
/
Copy pathREADME.Rmd
311 lines (241 loc) · 15.7 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
---
output: github_document
bibliography: [paper/paper.bib]
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%",
dev = "jpeg",
warning = FALSE,
message = FALSE
)
library(dplyr)
library(ggplot2)
library(patchwork)
```
<!-- badges: start -->
[](https://doi.org/10.21105/joss.06145)
[](https://github.com/mikejohnson51/AHGestimation/actions/workflows/R-CMD-check.yaml)
[](https://choosealicense.com/licenses/mit/)
[](https://www.repostatus.org/#active)
[](https://codecov.io/github/mikejohnson51/AHGestimation)
[](#)
[](https://github.com/mikejohnson51/AHGestimation/actions/workflows/pkgdown.yaml)
<!-- badges: end -->
# AHGestimation <a href="https://github.com/mikejohnson51/AHGestimation"><img src="man/figures/logo.png" align="right" width="20%"/></a>
## Installation
```{r, eval = FALSE}
# install.packages("remotes")
remotes::install_github("mikejohnson51/AHGestimation")
```
# Introduction
The behavior of a river system at a location can be represented as the relationship between discharge (Q), mean depth (Y), mean velocity (V), and mean top width (TW). If you have these measurements over a long period of time, a relationship between how much water (Q) is in the channel and the corresponding Y, V, and TW can be established. The idea of 'at-a-station hydraulic geometry' (AHG) suggests three power laws can adequately describe these relations [@leopold1953hydraulic]:
$$
TW = a \cdot Q^b
$$
$$
Y = c \cdot Q^f
$$
$$
V = k \cdot Q^m
$$
Each **single relationships** describes a component of the channel behavior (see section [Fitting single relationships]). For example, the Q-Y relation is similar to the **traditional rating curve** that relates stage to discharge where stage is given as a height of water (H) above a datum (Ho).
$$
Q = c \cdot (H - H_o)^f
$$
Traditionally, each relationship has been fit independent of one another using ordinary least square regression (OLS) on the log transformed time series of the respective components.
Some research has moved beyond pure OLS fitting with a (typically) singular focus on the rating curve relation. For example the Texas Water Resources Institute suggests a non linear least squares regression (NLS) ([here](https://txwri.github.io/r-manual/stage-discharge.html)), and Bayesian hierarchical modeling have been shown in other cases to be beneficial ([@hrafnkelsson2022generalization], R implementation [here](https://CRAN.R-project.org/package=bdrc))
When trying to fit a **hydraulic system** - or - all three equations - there are additional considerations . Notably continuity dictates that no water is gained or lost such that:
$$
Q = TW·Y·V
$$
and:
$$
(b + f + m) = (a·c·k) = 1
$$
**Critically, neither OLS or NLS solvers can ensure this characteristic of the solution.** (see [Fitting hydraulic systems] and this [article](https://mikejohnson51.github.io/AHGestimation/articles/traditonal-ahg.html)). Instead, this can be achieved by either
(1) Preprocessing data based on thresholds (see @enzminger_thomas_l_2023_7868764 and @afshari2017statistical) (see AHGestimation functions in [here](https://mikejohnson51.github.io/AHGestimation/articles/data-filtering.html)), or
(2) as we suggest in this package, using a evolutionary solver - along with OLS and NLS fits - to optimally solve relationships in accordance with each other (see detailed description [here](https://mikejohnson51.github.io/AHGestimation/articles/improved-ahg.html)).
The aim of this package is to provide increased AHG fitting flexibility, that ensures mass conservation in hydraulic systems, and optimal curve fitting for single relations.
# AHGestimation
Using USGS field measurements at the gage (National Water Information System or NWIS site) located on [Nashua River at East Pepperell, MA](https://waterdata.usgs.gov/nwis/measurements/?site_no=01096500&agency_cd=USGS) we can illustrate 4 capabilities this package offers:
1. Fitting single relations with OLS and NLS ([Fitting single relationships])
2. Estimating hydraulic system with a mass conserving ensemble fitting method ([Fitting hydraulic systems])
3. Data preprocessing (removing outliers based on defined criteria) ([Data Filtering])
4. Deriving cross section shapes and traits from the AHG fits ([Hydraulic Shape Estimation]).
The example data is included in the package, and its development can be seen [here](https://github.com/mikejohnson51/AHGestimation/blob/master/data-raw/nwis.R).
```{r}
library(AHGestimation)
glimpse(nwis)
```
```{r, echo = F}
ggplot(data = nwis ) +
geom_point(aes(x = Q_cms, y = Y_m)) +
theme_light() +
labs(title = 'Q-Y Relationship', x = "Q (cms)", y = "Y (m)") +
ggplot(data = nwis ) +
geom_point(aes(x = Q_cms, y = TW_m)) +
theme_light() +
labs(title = 'Q-TW Relationship', x = "Q (cms)", y = "TW (m)") +
ggplot(data = nwis ) +
geom_point(aes(x = Q_cms, y = V_ms)) +
theme_light() +
labs(title = 'Q-V Relationship', x = "Q (cms)", y = "V (m/s)")
```
## Fitting single relationships
The `AHGestimation::ahg_estimate(...)` function can be used to fit single relationships using both NLS and OLS solutions. Below we fit the Q-Y relation by passing a `data.frame` with the column names `Q` and `Y`. More columns could be included in the input, but only those with `Q`, `Y`, `V`, and/or `TW` are used.
```{r}
(sf = nwis %>%
#input column names must be Q,Y,V and/or TW
select(date, Q = Q_cms, Y = Y_m) %>%
ahg_estimate())
```
The returned object rank sorts the solutions based on the nrmse of the simulated Y values compared to the input data. For convenience, the pBias is also reported. Below, we can see the variation in the NLS (blue) and OLS (red) solutions.
```{r, echo = FALSE}
ggplot() +
geom_point(data = nwis, aes(x = Q_cms, y = Y_m), col = "black") +
geom_line(data = nwis, aes(x = Q_cms, y = sf$coef[1] * Q_cms ^ (sf$exp[1]), col = "NLS")) +
geom_line(data = nwis, aes(x = Q_cms, y = sf$coef[2] * Q_cms ^ (sf$exp[2]), col = "OLS")) +
theme_light() +
theme(legend.position = "bottom") +
labs(title = 'Q-Y Relationship', color = "Fit", x = "Q (cms)", y = "Y (m)") +
scale_color_manual(values = c( "NLS" = "blue", "OLS" = "red"))
```
When 2 relationships (e.g. Q-Y and Q-V) are passed to `ahg_estimate(...)` the default behavior is to return a single solution based on the minimum nrmse. Since only 2 of the 3 hydraulic traits are passed, mass conservation cannot be checked here.
```{r}
(nwis %>%
#input column names must be Q,Y,V and/or TW
select(Q = Q_cms, Y = Y_m, V = V_ms) %>%
ahg_estimate())
```
If you would like all fits (OLS and NLS), the `full_fitting` parameter can be set to `TRUE`.
```{r}
(nwis %>%
#input column names must be Q,Y,V and/or TW
select(Q = Q_cms, Y = Y_m, V = V_ms) %>%
ahg_estimate(full_fitting = TRUE))
```
## Fitting hydraulic systems
When we have data for all three hydraulic relationships we can ensure the solutions found meet continuity/conserve mass.
In this mode the OLS and NLS models are fit first, and if continuity is not met in best solution (e.g. lowest nmse), then an Evolutionary Approach (nsga2; @mco) is implemented (see this [article](https://mikejohnson51.github.io/AHGestimation/articles/optimize-nsga2.html) for more details).
Doing so produces three unique fits for each relationship (27 total combinations). These are crossed to identify the best performing solution that meets continuity at a prescribed allowance. The allowance specifies the amount that each continuity expression can deviate from 1. More on this can be found at the vignette [here](https://mikejohnson51.github.io/AHGestimation/articles/improved-ahg.html)
As before, the results are rank ordered by minimum nrmse and viability (viable = does the solution meet continuity within the prescribed allowance?)
```{r}
(x = nwis %>%
select(Q = Q_cms, Y = Y_m, V = V_ms, TW = TW_m) %>%
ahg_estimate(allowance = .05))
```
Overall an combination of the NLS and nsga2 provides an error minimizing, viable solution:
```{r, echo = F}
ggplot() +
geom_point(data = nwis, aes(x = Q_cms, y = Y_m), col = "black") +
geom_line(data = nwis, aes(x = Q_cms, y = x$Y_coef[1] * Q_cms ^ (x$Y_exp[1]), col = "bestValid")) +
theme_light() +
labs(title = 'Q-Y Relationship', color = "Fit", x = "Q (cms)", y = "Y (m)") +
ggplot(data = nwis ) +
geom_point(data = nwis, aes(x = Q_cms, y = TW_m), col = "black") +
geom_line(data = nwis, aes(x = Q_cms, y = x$TW_coef[1] * Q_cms ^ (x$TW_exp[1]), col = "bestValid")) +
theme_light() +
labs(title = 'Q-TW Relationship', color = "Fit", x = "Q (cms)", y = "TW (m)") +
ggplot(data = nwis ) +
geom_point(data = nwis, aes(x = Q_cms, y = V_ms), col = "black") +
geom_line(data = nwis, aes(x = Q_cms, y = x$V_coef[1] * Q_cms ^ (x$V_exp[1]), col = "bestValid")) +
theme_light() +
labs(title = 'Q-V Relationship', color = "Fit", x = "Q (cms)", y = "V (m/s)") +
plot_layout(guides = "collect") &
theme(legend.position = 'bottom')
```
## Data Filtering
Due to the volatility of river systems, hydraulic data is often very noisy. While the `ahg_estimate` tool is intended to reduce this noise and produce a mass-conserving hydraulic fit, it is also possible to filter the data prior to fitting. The range of data filtering options provided are documented in the [data-filtering vignette](https://mikejohnson51.github.io/AHGestimation/articles/data-filtering.html) and an example is provided below:
```{r}
filtered_data = nwis %>%
select(date, Q = Q_cms, Y = Y_m, V = V_ms, TW = TW_m) %>%
# Keep the most recent 10 year
date_filter(year = 10, keep_max = TRUE) %>%
# Keep data within 3 Median absolute deviations (log residuals)
mad_filter() %>%
# Keep data that respects the Q = vA criteria w/in allowance
qva_filter()
(ahg_fit = ahg_estimate(filtered_data))
```
Ultimately we recommend selecting fits that conserve mass (`viable = TRUE`) and has the lowest error (any of tot_nrmse, V_nrmse, TW_nrmse, or Y_nrmse) depending on the use case.
When the data is effectively filtered we see NLS can provide an error minimizing, valid solution for the system that is quite different then the full data fit. Further, the nsga2 algorithm did not need to be invoked:
```{r, echo = F}
ggplot() +
geom_point(data = nwis, aes(x = Q_cms, y = Y_m, col = "Full")) +
geom_line(data = nwis, aes(x = Q_cms, y = x$Y_coef[1] * Q_cms ^ (x$Y_exp[1]), col = "Full")) +
geom_point(data = filtered_data, aes(x = Q, y = Y, col = "Filter")) +
geom_line(data = filtered_data, aes(x = Q, y = ahg_fit$Y_coef[1] * Q ^ (ahg_fit$Y_exp[1]), col = "Filter")) +
theme_light() +
labs(title = 'Q-Y Relationship') +
scale_color_manual(values = c("Filter" = "red", "Full" = "black")) +
ggplot(data = nwis ) +
geom_point(data = nwis, aes(x = Q_cms, y = TW_m, col = "Full")) +
geom_line(data = nwis, aes(x = Q_cms, y = x$TW_coef[1] * Q_cms ^ (x$TW_exp[1]), col = "Full")) +
geom_point(data = filtered_data, aes(x = Q, y = TW, col = "Filter")) +
geom_line(data = filtered_data, aes(x = Q, y = ahg_fit$TW_coef[1] * Q ^ (ahg_fit$TW_exp[1]), col = "Filter")) +
theme_light() +
labs(title = 'Q-TW Relationship') +
scale_color_manual(values = c("Filter" = "red", "Full" = "black")) +
ggplot(data = nwis) +
geom_point(data = nwis, aes(x = Q_cms, y = V_ms, col = "Full")) +
geom_line(data = nwis, aes(x = Q_cms, y = x$V_coef[1] * Q_cms ^ (x$V_exp[1]), col = "Full")) +
geom_point(data = filtered_data, aes(x = Q, y = V, col = "Filter")) +
geom_line(data = filtered_data, aes(x = Q, y = ahg_fit$V_coef[1] * Q ^ (ahg_fit$V_exp[1]), col = "Filter")) +
theme_light() +
labs(title = 'Q-V Relationship') +
scale_color_manual(values = c("Filter" = "red", "Full" = "black")) +
plot_layout(guides = "collect") &
theme(legend.position = 'bottom')
```
## Hydraulic Shape Estimation
Lastly, a range of functions have been added to extend the AHG parameters into cross section hydraulics and geometry. These come primarily from [@dingman2018field] and are described in detail [here](https://mikejohnson51.github.io/AHGestimation/articles/hydraulics.html)
```{r}
# Compute hydraulic parameters
(hydraulic_params = compute_hydraulic_params(ahg_fit))
# Estimate roughness
# Slope is taken from the NHD reach associated with our gage
compute_n(filtered_data, S = 0.01463675)
```
Of particular note, the `r` value describes the theoretical shape of the channel ranging from a triangle (r = 1) to a rectangle (r = ∞). When paired with a TW and Depth (here assuming max of the record) a generalized cross section can be derived. The returned data.frame provide a point index (from left bank looking upstream) the associated relative x and absolute Y position, and the cross sectional area for that Y.
```{r}
cs = cross_section(r = hydraulic_params$r,
TW = max(filtered_data$TW),
Ymax = max(filtered_data$Y))
glimpse(cs)
```
```{r, echo = FALSE}
ggplot(data = cs) +
geom_point(aes(x = x, y = Y)) +
geom_line(aes(x = x, y = Y)) +
theme_light() +
geom_hline(yintercept = max(cs$Y), color = "darkgreen") +
geom_hline(yintercept = 2, color = "blue", linewidth = 2) +
geom_vline(xintercept = mean(cs$x), color = "darkgreen") +
labs(x = "Relative Width", y = "Depth", title = "Shape") +
ggplot(data = cs) +
geom_point(aes(x = A, y = Y)) +
geom_line(aes(x = A, y = Y)) +
geom_hline(yintercept = 2, color = "blue", linewidth = 2) +
theme_light() +
labs(Y = "Depth", x = "Cross Sectional Area", title = "Area to Depth")
```
# History
The development of this package began as a graduate school project between friends at UC Santa Barbara and UMass Amherst following the 2017 NOAA OWP Summer Institute and clear evidence channel shape may be a limiting factor in National Water Model Performance. It has since evolved to provide an open source utility for robust large scale data synthesis and evaluation. Funding from the National Science Foundation (Grants 1937099, 2033607) provided time to draft @preprint and apply an early version of this software to the [Continental Flood Inundation Mapping (CFIM) synthetic rating curve dataset [@cfim]. Funding from the National Oceanic and Atmospheric Administration's Office of Water Prediction supported the addition of data filtering and hydraulic estimation, improved documentation, and code improvement We are grateful to all involved.
# Contributing
First, thanks for considering a contribution! We hope to make this package a community created resource!
- Please attempt to describe what you want to do prior to contributing by submitting an issue.
- Please follow the typical github fork - pull-request workflow.
- Contributions should be tested with `testthat` by running `devtools::test()`.
- Code style should attempt to follow the [tidyverse style guide](https://style.tidyverse.org/).
- Make sure you use `roxygen` and run `devtools::check()` before contributing.
Other notes:
- Consider running `goodpractice::gp()` on the package before contributing.
- Consider running `devtools::spell_check()` and `devtools::document()` if you wrote documentation.
- Consider running `devtools::build_readme()` if you made any changes.
- This package uses pkgdown. Running `pkgdown::build_site()` will refresh it.
# References