update internal data #201

Merged: 4 commits, Dec 29, 2023
2 changes: 1 addition & 1 deletion NEWS.md
@@ -1,4 +1,4 @@
# tidyhydat 0.6.0.9000
# tidyhydat 0.6.1
- Add `...` to print methods so you can pass arguments all the way down.
- Add workaround for vroom#519 bug that prevents `realtime_*` functions from working

21 changes: 21 additions & 0 deletions cran-comments.md
@@ -1,3 +1,24 @@
tidyhydat 0.6.1
=========================

There were zero WARNINGS and zero ERRORS.

## NEWS
- Add `...` to print methods so you can pass arguments all the way down.
- Add workaround for vroom#519 bug that prevents `realtime_*` functions from working

## Test environments
* win-builder (via `devtools::check_win_devel()` and `devtools::check_win_release()`)
* local macOS, R 4.3.1 (via R CMD check --as-cran)
* ubuntu-20.04, r: 'release' (github actions)
* ubuntu-20.04, r: 'devel' (github actions)
* macOS, r: 'release' (github actions)
* windows, r: 'release' (github actions)
* Fedora Linux, R-devel, clang, gfortran - r-hub
* Debian Linux, R-release, GCC (debian-gcc-release) - r-hub
* Windows Server 2008 R2 SP1, R-devel, 32/64 bit - r-hub


tidyhydat 0.6.0
=========================

1,648 changes: 820 additions & 828 deletions data-raw/HYDAT_internal_data/allstations.csv

Large diffs are not rendered by default.

Binary file modified data/allstations.rda
Binary file not shown.
Binary file modified data/hy_data_symbols.rda
Binary file not shown.
Binary file modified data/hy_data_types.rda
Binary file not shown.
86 changes: 43 additions & 43 deletions vignettes/tidyhydat_an_introduction.Rmd
@@ -1,7 +1,7 @@
---
title: "tidyhydat: An Introduction"
author: "Sam Albers"
date: "2023-01-29"
date: "2023-12-28"
output:
html_vignette:
keep_md: true
@@ -42,7 +42,7 @@ hy_daily_flows(station_number = "08LA001")
```

```
## Queried from version of HYDAT released on 2022-10-24
## Queried from version of HYDAT released on 2023-11-20
## Observations: 31,351
## Measurement flags: 6,166
## Parameter(s): Flow
@@ -63,7 +63,7 @@ hy_daily_flows(station_number = "08LA001")
## 8 08LA001 1914-01-08 Flow 140 <NA>
## 9 08LA001 1914-01-09 Flow 140 <NA>
## 10 08LA001 1914-01-10 Flow 140 <NA>
## # … with 31,341 more rows
## # ℹ 31,341 more rows
```

Another method is to use `hy_stations()` to generate a vector of station numbers, which is then passed to the `station_number` argument. For example, we could subset only the active stations within Prince Edward Island (province code: PE) and then create a vector for `hy_daily_flows()`:
@@ -79,24 +79,24 @@ PEI_stns
```

```
## [1] "01CA003" "01CB002" "01CB004" "01CB018" "01CC002" "01CC005" "01CC010" "01CC011"
## [9] "01CD005"
## [1] "01CA003" "01CB002" "01CB004" "01CB018" "01CC002"
## [6] "01CC005" "01CC010" "01CC011" "01CD005"
```

```r
hy_daily_flows(station_number = PEI_stns)
```

```
## Queried from version of HYDAT released on 2022-10-24
## Observations: 115,337
## Measurement flags: 20,614
## Queried from version of HYDAT released on 2023-11-20
## Observations: 117,530
## Measurement flags: 20,867
## Parameter(s): Flow
## Date range: 1961-08-01 to 2020-12-31
## Date range: 1961-08-01 to 2021-12-31
## Station(s) returned: 9
## Stations requested but not returned:
## All stations returned.
## # A tibble: 115,337 × 5
## # A tibble: 117,530 × 5
## STATION_NUMBER Date Parameter Value Symbol
## <chr> <date> <chr> <dbl> <chr>
## 1 01CA003 1961-08-01 Flow NA <NA>
@@ -109,7 +109,7 @@ hy_daily_flows(station_number = PEI_stns)
## 8 01CB002 1961-08-04 Flow NA <NA>
## 9 01CA003 1961-08-05 Flow NA <NA>
## 10 01CB002 1961-08-05 Flow NA <NA>
## # … with 115,327 more rows
## # ℹ 117,520 more rows
```

We can also merge station selection and data extraction into a single pipe. For example, if for some reason we wanted all the stations in Canada that have "Canada" in their name, we could combine the selection and the data extraction steps:
@@ -121,15 +121,15 @@ search_stn_name("canada") %>%
```

```
## Queried from version of HYDAT released on 2022-10-24
## Observations: 86,147
## Measurement flags: 26,222
## Queried from version of HYDAT released on 2023-11-20
## Observations: 87,669
## Measurement flags: 26,754
## Parameter(s): Flow
## Date range: 1918-08-01 to 2022-06-30
## Date range: 1918-08-01 to 2023-05-31
## Station(s) returned: 7
## Stations requested but not returned:
## All stations returned.
## # A tibble: 86,147 × 5
## # A tibble: 87,669 × 5
## STATION_NUMBER Date Parameter Value Symbol
## <chr> <date> <chr> <dbl> <chr>
## 1 01AK001 1918-08-01 Flow NA <NA>
@@ -142,7 +142,7 @@ search_stn_name("canada") %>%
## 8 01AK001 1918-08-08 Flow 1.78 <NA>
## 9 01AK001 1918-08-09 Flow 1.5 <NA>
## 10 01AK001 1918-08-10 Flow 1.78 <NA>
## # … with 86,137 more rows
## # ℹ 87,659 more rows
```

We saw above that if we were only interested in a subset of dates we could use the `start_date` and `end_date` arguments. A date must be supplied to both of these arguments in the form YYYY-MM-DD. If you were interested in all daily flow data from station number "08LA001" for 1981, you would specify all days in 1981:
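
As a minimal sketch of the date format described above, the snippet below builds the two YYYY-MM-DD strings for a calendar year and checks that they parse as dates. `year_bounds()` is a hypothetical helper introduced only for illustration and is not part of tidyhydat; the final call is left as a comment because it assumes a local copy of HYDAT is available.

```r
# Hypothetical helper (not part of tidyhydat): first and last day of a
# calendar year, formatted as YYYY-MM-DD strings.
year_bounds <- function(year) {
  c(
    start_date = sprintf("%d-01-01", as.integer(year)),
    end_date = sprintf("%d-12-31", as.integer(year))
  )
}

bounds <- year_bounds(1981)

# Sanity check: both strings parse as dates and span every day of the year.
all_days <- seq(
  as.Date(bounds[["start_date"]]),
  as.Date(bounds[["end_date"]]),
  by = "day"
)
length(all_days)

# With a local HYDAT database, the bounds could then be passed along:
# tidyhydat::hy_daily_flows(
#   station_number = "08LA001",
#   start_date = bounds[["start_date"]],
#   end_date = bounds[["end_date"]]
# )
```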
@@ -196,18 +196,18 @@ search_stn_name("liard")

```
## # A tibble: 9 × 5
## STATION_NUMBER STATION_NAME PROV_TERR_STATE_LOC LATITUDE LONGIT…¹
## <chr> <chr> <chr> <dbl> <dbl>
## 1 10AA001 LIARD RIVER AT UPPER CROSSING YT 60.1 -129.
## 2 10AA006 LIARD RIVER BELOW SCURVY CREEK YT 60.8 -131.
## 3 10BE001 LIARD RIVER AT LOWER CROSSING BC 59.4 -126.
## 4 10ED001 LIARD RIVER AT FORT LIARD NT 60.2 -123.
## 5 10ED002 LIARD RIVER NEAR THE MOUTH NT 61.7 -121.
## 6 10BE005 LIARD RIVER ABOVE BEAVER RIVER BC 59.7 -124.
## 7 10BE006 LIARD RIVER ABOVE KECHIKA RIVER BC 59.7 -127.
## 8 10ED008 LIARD RIVER AT LINDBERG LANDING NT 61.1 -123.
## 9 10GC004 MACKENZIE RIVER ABOVE LIARD RIVER NT 61.9 -121.
## # … with abbreviated variable name ¹​LONGITUDE
## STATION_NUMBER STATION_NAME PROV_TERR_STATE_LOC LATITUDE
## <chr> <chr> <chr> <dbl>
## 1 10AA001 LIARD RIVER A… YT 60.1
## 2 10AA006 LIARD RIVER B… YT 60.8
## 3 10BE001 LIARD RIVER A… BC 59.4
## 4 10ED001 LIARD RIVER A… NT 60.2
## 5 10ED002 LIARD RIVER N… NT 61.7
## 6 10BE005 LIARD RIVER A… BC 59.7
## 7 10BE006 LIARD RIVER A… BC 59.7
## 8 10ED008 LIARD RIVER A… NT 61.1
## 9 10GC004 MACKENZIE RIV… NT 61.9
## # ℹ 1 more variable: LONGITUDE <dbl>
```
Similarly, `search_stn_number()` can be useful if you are interested in all stations from the *08MF* sub-sub-drainage:

@@ -217,20 +217,20 @@ search_stn_number("08MF")

```
## # A tibble: 54 × 5
## STATION_NUMBER STATION_NAME PROV_TERR_STA…¹ LATIT…² LONGI…³
## <chr> <chr> <chr> <dbl> <dbl>
## 1 08MF005 FRASER RIVER AT HOPE BC 49.4 -121.
## 2 08MF035 FRASER RIVER NEAR AGASSIZ BC 49.2 -122.
## 3 08MF038 FRASER RIVER AT CANNOR BC 49.1 -122.
## 4 08MF040 FRASER RIVER ABOVE TEXAS CREEK BC 50.6 -122.
## 5 08MF062 COQUIHALLA RIVER BELOW NEEDLE CREEK BC 49.5 -121.
## 6 08MF065 NAHATLATCH RIVER BELOW TACHEWANA CREEK BC 50.0 -122.
## 7 08MF068 COQUIHALLA RIVER ABOVE ALEXANDER CREEK BC 49.4 -121.
## 8 08MF072 FRASER RIVER AT LAIDLAW BC 49.3 -122.
## 9 08MF073 FRASER RIVER AT HARRISON MILLS BC 49.2 -122.
## 10 08MF074 FRASER RIVER ABOVE HERRLING ISLAND BC 49.3 -122.
## # … with 44 more rows, and abbreviated variable names ¹​PROV_TERR_STATE_LOC, ²​LATITUDE,
## # ³​LONGITUDE
## STATION_NUMBER STATION_NAME PROV_TERR_STATE_LOC LATITUDE
## <chr> <chr> <chr> <dbl>
## 1 08MF005 FRASER RIVER… BC 49.4
## 2 08MF040 FRASER RIVER… BC 50.6
## 3 08MF062 COQUIHALLA R… BC 49.5
## 4 08MF065 NAHATLATCH R… BC 50.0
## 5 08MF068 COQUIHALLA R… BC 49.4
## 6 08MF001 ANDERSON RIV… BC 49.8
## 7 08MF002 BOULDER CREE… BC 49.3
## 8 08MF003 COQUIHALLA R… BC 49.4
## 9 08MF004 FRASER RIVER… BC 50.2
## 10 08MF006 WAHLEACH CRE… BC 49.3
## # ℹ 44 more rows
## # ℹ 1 more variable: LONGITUDE <dbl>
```

## Using joins
46 changes: 24 additions & 22 deletions vignettes/tidyhydat_example_analysis.Rmd
@@ -1,7 +1,7 @@
---
title: "Two examples of using tidyhydat"
author: "Sam Albers"
date: "2023-01-29"
date: "2023-12-28"
output:
html_vignette:
keep_md: true
@@ -60,25 +60,26 @@ hy_stn_data_range()
```

```
## Queried from version of HYDAT released on 2022-10-24
## Observations: 12,079
## Station(s) returned: 7,937
## Queried from version of HYDAT released on 2023-11-20
## Observations: 12,125
## Station(s) returned: 7,963
## Stations requested but not returned:
## All stations returned.
## # A tibble: 12,079 × 6
## STATION_NUMBER DATA_TYPE SED_DATA_TYPE Year_from Year_to RECORD_LENGTH
## <chr> <chr> <chr> <int> <int> <int>
## 1 01AA002 Q <NA> 1967 1977 11
## 2 01AD001 Q <NA> 1918 1997 80
## 3 01AD002 Q <NA> 1926 2020 95
## 4 01AD003 H <NA> 2011 2020 10
## 5 01AD003 Q <NA> 1951 2020 70
## 6 01AD004 H <NA> 1980 2019 35
## 7 01AD004 Q <NA> 1968 1979 12
## 8 01AD005 H <NA> 1966 1974 9
## 9 01AD008 H <NA> 1972 1974 3
## 10 01AD009 H <NA> 1973 1982 10
## # … with 12,069 more rows
## # A tibble: 12,125 × 6
## STATION_NUMBER DATA_TYPE SED_DATA_TYPE Year_from Year_to
## <chr> <chr> <chr> <int> <int>
## 1 01AA002 Q <NA> 1967 1977
## 2 01AD001 Q <NA> 1918 1997
## 3 01AD002 Q <NA> 1926 2021
## 4 01AD003 H <NA> 2011 2022
## 5 01AD003 Q <NA> 1951 2022
## 6 01AD004 H <NA> 1980 2021
## 7 01AD004 Q <NA> 1968 1979
## 8 01AD005 H <NA> 1966 1974
## 9 01AD008 H <NA> 1972 1974
## 10 01AD009 H <NA> 1973 1982
## # ℹ 12,115 more rows
## # ℹ 1 more variable: RECORD_LENGTH <int>
```
Our objective here is to filter this data for the station that has the longest record of flow (`DATA_TYPE == "Q"`). You'll also notice the symbol `%>%`, which in R is called a [pipe](https://magrittr.tidyverse.org/reference/pipe.html). In code, read it as the word *then*. So for the data range data we want to grab the data, *then* filter it by flow ("Q") in `DATA_TYPE`, and *then* by the maximum value of `RECORD_LENGTH`:
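
The same "grab, *then* filter, *then* filter again" pattern can be tried without a HYDAT download. The sketch below is illustrative only: it assumes **dplyr** is installed and uses a small hand-built data frame whose rows are copied from the first few rows of the older `hy_stn_data_range()` output shown above (HYDAT released 2022-10-24). `data_range`, `longest`, and `longest_station` are names chosen for this example.

```r
library(dplyr)

# Stand-in for hy_stn_data_range(): a few rows copied from the printed output
# above (HYDAT version released 2022-10-24), not the full table.
data_range <- data.frame(
  STATION_NUMBER = c("01AA002", "01AD001", "01AD002", "01AD003", "01AD003", "01AD004"),
  DATA_TYPE = c("Q", "Q", "Q", "H", "Q", "H"),
  RECORD_LENGTH = c(11L, 80L, 95L, 10L, 70L, 35L),
  stringsAsFactors = FALSE
)

# Take the data, *then* keep flow records, *then* keep the longest record.
longest <- data_range %>%
  filter(DATA_TYPE == "Q") %>%
  filter(RECORD_LENGTH == max(RECORD_LENGTH))

# *then* pull out the station number as a plain character vector.
longest_station <- longest %>%
  pull(STATION_NUMBER)

longest_station
```

Filtering on `DATA_TYPE` before taking the maximum matters: it ensures the maximum is computed over flow records only, not over water level records.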

@@ -88,15 +89,16 @@ hy_stn_data_range() %>%
```

```
## Queried from version of HYDAT released on 2022-10-24
## Queried from version of HYDAT released on 2023-11-20
## Observations: 1
## Station(s) returned: 1
## Stations requested but not returned:
## All stations returned.
## # A tibble: 1 × 6
## STATION_NUMBER DATA_TYPE SED_DATA_TYPE Year_from Year_to RECORD_LENGTH
## <chr> <chr> <chr> <int> <int> <int>
## 1 02HA003 Q <NA> 1860 2021 162
## STATION_NUMBER DATA_TYPE SED_DATA_TYPE Year_from Year_to
## <chr> <chr> <chr> <int> <int>
## 1 02HA003 Q <NA> 1860 2021
## # ℹ 1 more variable: RECORD_LENGTH <int>
```
*then* pull the `STATION_NUMBER` that has the longest record:

36 changes: 21 additions & 15 deletions vignettes/tidyhydat_hydat_db.Rmd
@@ -1,7 +1,7 @@
---
title: "Stepping into the HYDAT Database"
author: "Dewey Dunnington"
date: "2023-01-29"
date: "2023-12-28"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Stepping into the HYDAT Database}
@@ -38,17 +38,23 @@ To list the tables, use `src_tbls()` from the **dplyr** package.

```r
src_tbls(src)
#> [1] "AGENCY_LIST" "ANNUAL_INSTANT_PEAKS" "ANNUAL_STATISTICS"
#> [4] "CONCENTRATION_SYMBOLS" "DATA_SYMBOLS" "DATA_TYPES"
#> [7] "DATUM_LIST" "DLY_FLOWS" "DLY_LEVELS"
#> [10] "MEASUREMENT_CODES" "OPERATION_CODES" "PEAK_CODES"
#> [13] "PRECISION_CODES" "REGIONAL_OFFICE_LIST" "SAMPLE_REMARK_CODES"
#> [16] "SED_DATA_TYPES" "SED_DLY_LOADS" "SED_DLY_SUSCON"
#> [19] "SED_SAMPLES" "SED_SAMPLES_PSD" "SED_VERTICAL_LOCATION"
#> [22] "SED_VERTICAL_SYMBOLS" "STATIONS" "STN_DATA_COLLECTION"
#> [25] "STN_DATA_RANGE" "STN_DATUM_CONVERSION" "STN_DATUM_UNRELATED"
#> [28] "STN_OPERATION_SCHEDULE" "STN_REGULATION" "STN_REMARKS"
#> [31] "STN_REMARK_CODES" "STN_STATUS_CODES" "VERSION"
#> [1] "AGENCY_LIST" "ANNUAL_INSTANT_PEAKS"
#> [3] "ANNUAL_STATISTICS" "CONCENTRATION_SYMBOLS"
#> [5] "DATA_SYMBOLS" "DATA_TYPES"
#> [7] "DATUM_LIST" "DLY_FLOWS"
#> [9] "DLY_LEVELS" "MEASUREMENT_CODES"
#> [11] "OPERATION_CODES" "PEAK_CODES"
#> [13] "PRECISION_CODES" "REGIONAL_OFFICE_LIST"
#> [15] "SAMPLE_REMARK_CODES" "SED_DATA_TYPES"
#> [17] "SED_DLY_LOADS" "SED_DLY_SUSCON"
#> [19] "SED_SAMPLES" "SED_SAMPLES_PSD"
#> [21] "SED_VERTICAL_LOCATION" "SED_VERTICAL_SYMBOLS"
#> [23] "STATIONS" "STN_DATA_COLLECTION"
#> [25] "STN_DATA_RANGE" "STN_DATUM_CONVERSION"
#> [27] "STN_DATUM_UNRELATED" "STN_OPERATION_SCHEDULE"
#> [29] "STN_REGULATION" "STN_REMARKS"
#> [31] "STN_REMARK_CODES" "STN_STATUS_CODES"
#> [33] "VERSION"
```

To inspect any particular table, use the `tbl()` function with the `src` and the table name.
@@ -57,7 +63,7 @@ To inspect any particular table, use the `tbl()` function with the `src` and the
```r
tbl(src, "STN_OPERATION_SCHEDULE")
#> # Source: table<STN_OPERATION_SCHEDULE> [?? x 5]
#> # Database: sqlite 3.39.4 [/Users/samalbers/_dev/gh_repos/tidyhydat/inst/test_db/tinyhydat.sqlite3]
#> # Database: sqlite 3.41.2 [/Users/samalbers/_dev/gh_repos/tidyhydat/inst/test_db/tinyhydat.sqlite3]
#> STATION_NUMBER DATA_TYPE YEAR MONTH_FROM MONTH_TO
#> <chr> <chr> <int> <chr> <chr>
#> 1 05AA008 H 2012 JAN DEC
@@ -70,7 +76,7 @@ tbl(src, "STN_OPERATION_SCHEDULE")
#> 8 05AA008 H 2019 JAN DEC
#> 9 05AA008 H 2020 JAN DEC
#> 10 05AA008 Q 1910 <NA> <NA>
#> # … with more rows
#> # ℹ more rows
```

Working with SQL tables in dplyr is much like working with regular data frames, except no data is actually read from the database until necessary. Because some of these tables are large (particularly those containing the actual data), you will want to `filter()` the tables before you `collect()` them (the `collect()` operation loads them into memory as a `data.frame`).
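
To make the lazy-evaluation point concrete, here is a small sketch that does not touch HYDAT at all. It assumes the **DBI**, **RSQLite**, **dplyr**, and **dbplyr** packages are installed, and it uses an in-memory SQLite database holding three rows copied from the `STN_OPERATION_SCHEDULE` output above. The object names (`con`, `schedule`, `lazy_levels`, `level_rows`) are illustrative.

```r
library(dplyr)

# In-memory SQLite database standing in for HYDAT (illustrative only).
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")

# Three rows copied from the STN_OPERATION_SCHEDULE output shown above.
schedule <- data.frame(
  STATION_NUMBER = c("05AA008", "05AA008", "05AA008"),
  DATA_TYPE = c("H", "H", "Q"),
  YEAR = c(2019L, 2020L, 1910L),
  MONTH_FROM = c("JAN", "JAN", NA),
  MONTH_TO = c("DEC", "DEC", NA),
  stringsAsFactors = FALSE
)
DBI::dbWriteTable(con, "STN_OPERATION_SCHEDULE", schedule)

# Building the query reads no rows; it only describes the SQL to run.
lazy_levels <- tbl(con, "STN_OPERATION_SCHEDULE") %>%
  filter(DATA_TYPE == "H")
show_query(lazy_levels)

# collect() runs the query and loads only the filtered rows into memory.
level_rows <- collect(lazy_levels)

# Close the connection once finished.
DBI::dbDisconnect(con)
```

Because the filter is translated to SQL, only the matching rows cross from the database into R, which is why filtering before `collect()` pays off on the large data tables.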
@@ -93,7 +99,7 @@ tbl(src, "STN_OPERATION_SCHEDULE") %>%
#> 8 05AA008 H 2019 JAN DEC
#> 9 05AA008 H 2020 JAN DEC
#> 10 05AA008 Q 1910 <NA> <NA>
#> # … with 93 more rows
#> # ℹ 93 more rows
```

When you are finished with the database (i.e., the end of the script), it is good practice to close the connection (you may get a loud red warning if you don't!).
Binary file modified vignettes/vignette-fig-pcrtile_plt-1.png
Binary file modified vignettes/vignette-fig-unnamed-chunk-8-1.png