Fix read duckdb #78

Merged: 7 commits, Aug 10, 2023
84 changes: 21 additions & 63 deletions .github/workflows/R-CMD-check.yaml
@@ -1,12 +1,10 @@
# For help debugging build failures open an issue on the RStudio community with the 'github-actions' tag.
# https://community.rstudio.com/new-topic?category=Package%20development&tags=github-actions
# Workflow derived from https://github.com/r-lib/actions/tree/v2/examples
# Need help debugging build failures? Start at https://github.com/r-lib/actions#where-to-find-help
on:
push:
branches:
- '**'
branches: [main, master]
pull_request:
branches:
- master
branches: [main, master]

name: R-CMD-check

@@ -20,72 +18,32 @@ jobs:
fail-fast: false
matrix:
config:
- {os: macos-latest, r: 'release'}
- {os: windows-latest, r: 'release'}
- {os: macOS-latest, r: 'release'}
- {os: ubuntu-18.04, r: 'release', rspm: "https://packagemanager.rstudio.com/cran/__linux__/bionic/latest"}
- {os: ubuntu-18.04, r: 'devel', rspm: "https://packagemanager.rstudio.com/cran/__linux__/bionic/latest"}
- {os: ubuntu-latest, r: 'devel', http-user-agent: 'release'}
- {os: ubuntu-latest, r: 'release'}
- {os: ubuntu-latest, r: 'oldrel-1'}

env:
R_REMOTES_NO_ERRORS_FROM_WARNINGS: true
RSPM: ${{ matrix.config.rspm }}
cache-version: v2
GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
R_KEEP_PKG_SOURCE: yes

steps:
- uses: actions/checkout@v2
- uses: actions/checkout@v3

- uses: r-lib/actions/setup-r@master
- uses: r-lib/actions/setup-pandoc@v2

- uses: r-lib/actions/setup-r@v2
with:
r-version: ${{ matrix.config.r }}
http-user-agent: ${{ matrix.config.http-user-agent }}
use-public-rspm: true

- uses: r-lib/actions/setup-pandoc@master

- name: Query dependencies
run: |
install.packages('remotes')
saveRDS(remotes::dev_package_deps(dependencies = TRUE), ".github/depends.Rds", version = 2)
writeLines(sprintf("R-%i.%i", getRversion()$major, getRversion()$minor), ".github/R-version")
shell: Rscript {0}

- name: Cache R packages
if: runner.os != 'Windows'
uses: actions/cache@v2
- uses: r-lib/actions/setup-r-dependencies@v2
with:
path: ${{ env.R_LIBS_USER }}
key: ${{ env.cache-version }}-${{ runner.os }}-${{ hashFiles('.github/R-version') }}-1-${{ hashFiles('.github/depends.Rds') }}
restore-keys: ${{ env.cache-version }}-${{ runner.os }}-${{ hashFiles('.github/R-version') }}-1-

- name: Install system dependencies
if: runner.os == 'Linux'
run: |
while read -r cmd
do
eval sudo $cmd
done < <(Rscript -e 'writeLines(remotes::system_requirements("ubuntu", "20.04"))')
extra-packages: any::rcmdcheck
needs: check

- name: Install dependencies
run: |
remotes::install_deps(dependencies = TRUE)
# latest dev version of duckdb
# install.packages("https://github.com/cwida/duckdb/releases/download/v0.2.5/duckdb_r_src.tar.gz", repos = NULL)
remotes::install_cran("rcmdcheck")
shell: Rscript {0}

- name: Check
env:
_R_CHECK_CRAN_INCOMING_REMOTE_: false
run: rcmdcheck::rcmdcheck(args = c("--no-manual", "--as-cran"), error_on = "warning", check_dir = "check")
shell: Rscript {0}

- name: Upload check results
if: failure()
uses: actions/upload-artifact@main
- uses: r-lib/actions/check-r-package@v2
with:
name: ${{ runner.os }}-r${{ matrix.config.r }}-results
path: check

- name: Test coverage
if: matrix.config.os == 'macOS-latest' && matrix.config.r == 'release'
run: |
install.packages("covr")
covr::codecov(token = "${{secrets.CODECOV_TOKEN}}")
shell: Rscript {0}
upload-snapshots: true
50 changes: 50 additions & 0 deletions .github/workflows/test-coverage.yaml
@@ -0,0 +1,50 @@
# Workflow derived from https://github.com/r-lib/actions/tree/v2/examples
# Need help debugging build failures? Start at https://github.com/r-lib/actions#where-to-find-help
on:
push:
branches: [main, master]
pull_request:
branches: [main, master]

name: test-coverage

jobs:
test-coverage:
runs-on: ubuntu-latest
env:
GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}

steps:
- uses: actions/checkout@v3

- uses: r-lib/actions/setup-r@v2
with:
use-public-rspm: true

- uses: r-lib/actions/setup-r-dependencies@v2
with:
extra-packages: any::covr
needs: coverage

- name: Test coverage
run: |
covr::codecov(
quiet = FALSE,
clean = FALSE,
install_path = file.path(Sys.getenv("RUNNER_TEMP"), "package")
)
shell: Rscript {0}

- name: Show testthat output
if: always()
run: |
## --------------------------------------------------------------------
find ${{ runner.temp }}/package -name 'testthat.Rout*' -exec cat '{}' \; || true
shell: bash

- name: Upload test results
if: failure()
uses: actions/upload-artifact@v3
with:
name: coverage-test-failures
path: ${{ runner.temp }}/package
2 changes: 1 addition & 1 deletion DESCRIPTION
@@ -27,7 +27,7 @@ Imports:
rappdirs (>= 0.3.1),
readr (>= 2.1.0),
rlang,
duckdb (>= 0.3.1),
duckdb (>= 0.8.1),
storr (>= 1.2.5),
stringr (>= 1.4.0),
tibble (>= 3.1.5),
2 changes: 1 addition & 1 deletion NEWS.md
@@ -1,7 +1,7 @@
# rems (development version)

* `ask` parameter now skips all verification (caching and updates) when set to `FALSE` (#67, #68)
* `download_historic_data()` now has more flixible options to update the historic database (principally to enable better updating behaviour in [shinyrems](https://github.com/bcgov/shinyrems)). (#69, thanks @aylapear)
* `download_historic_data()` now has more flexible options to update the historic database (principally to enable better updating behaviour in [shinyrems](https://github.com/bcgov/shinyrems)). (#69, thanks @aylapear)
* removed capability to register/convert some units due to a bug in the C library underlying the {units} package (https://github.com/r-quantities/units/issues/301). (#70)

# rems 0.8.0
27 changes: 21 additions & 6 deletions R/duckdb_create.R
@@ -41,10 +41,30 @@ create_rems_duckdb <- function(csv_file, db_path, cache_date) {
paste(historic_col_names, historic_col_sql_types, collapse = ', '),
')'))

# For some reason this file fails when read using multithreading,
# so disable parallel reading by default.
parallel <- getOption("rems.duckdb_read_parallel", default = FALSE)

DBI::dbExecute(con,
glue("COPY {tbl_name} from '{csv_file}' ( HEADER, TIMESTAMPFORMAT '{ems_timestamp_format()}' )")
glue("COPY {tbl_name} from '{csv_file}'
( HEADER, TIMESTAMPFORMAT '{ems_timestamp_format()}',
PARALLEL {as.character(parallel)} )")
)

if (getOption("rems.duckdb_build_indexes", default = FALSE)) {
# With duckdb v0.8.1 a min-max index is automatically created for columns
# of all general-purpose data types. ART indexes must be able to fit in
# memory, and since this won't on most machines they are not built by default.
# https://duckdb.org/docs/archive/0.8.1/sql/indexes
add_indexes(con)
}

set_cache_date("historic", cache_date)

invisible(TRUE)
}

add_indexes <- function(con) {
message("Adding database indexes")
cat_if_interactive("|=")
add_sql_index(con, colname = "EMS_ID")
@@ -70,10 +90,5 @@ create_rems_duckdb <- function(csv_file, db_path, cache_date) {
add_sql_index(con, colname = "PARAMETER")
cat_if_interactive("=")
add_sql_index(con, colname = "PARAMETER_CODE")

cat_if_interactive("| 100%\n")

set_cache_date("historic", cache_date)

invisible(TRUE)
}
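
Review note: the new `rems.duckdb_read_parallel` option is read with `getOption()` and pasted straight into the `COPY` statement. The sketch below shows that behaviour in isolation, using only base R. `build_copy_sql()` is a hypothetical helper that is not part of rems, and the table name, file path, and timestamp format are placeholders, since the real values come from rems internals not shown in this diff.

```r
# Hypothetical helper (not part of rems): builds the COPY statement the same
# way the diff does, but with base R's sprintf() instead of glue().
build_copy_sql <- function(tbl_name, csv_file, timestamp_format) {
  # Same lookup as in the diff: falls back to FALSE when the option is unset.
  parallel <- getOption("rems.duckdb_read_parallel", default = FALSE)
  sprintf(
    "COPY %s from '%s' ( HEADER, TIMESTAMPFORMAT '%s', PARALLEL %s )",
    tbl_name, csv_file, timestamp_format, as.character(parallel)
  )
}

# Placeholder inputs, assumed for illustration only.
old <- options(rems.duckdb_read_parallel = NULL)  # make sure the option is unset
sql_default <- build_copy_sql("historic", "ems_historic.csv", "%Y%m%d%H%M%S")

options(rems.duckdb_read_parallel = TRUE)         # opt back in to parallel reads
sql_parallel <- build_copy_sql("historic", "ems_historic.csv", "%Y%m%d%H%M%S")

options(old)  # restore whatever was set before
```

With the option unset the statement carries `PARALLEL FALSE`; setting it to `TRUE` before building the database switches the clause to `PARALLEL TRUE`. The `rems.duckdb_build_indexes` option in the same diff is looked up the same way and also defaults to `FALSE`.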
4 changes: 4 additions & 0 deletions R/rems-package.R
@@ -1,3 +1,7 @@
#' @keywords internal
#' @importFrom rlang .data
"_PACKAGE"

ignore_unused_imports <- function() {
dbplyr::tbl_lazy
}
1 change: 1 addition & 0 deletions README.Rmd
@@ -23,6 +23,7 @@ knitr::opts_chunk$set(
[![CRAN status](https://www.r-pkg.org/badges/version/rems)](https://cran.r-project.org/package=rems)
[![R build status](https://github.com/bcgov/rems/workflows/R-CMD-check/badge.svg)](https://github.com/bcgov/rems/actions)
[![img](https://img.shields.io/badge/Lifecycle-Maturing-007EC6)](https://github.com/bcgov/repomountie/blob/master/doc/lifecycle-badges.md)
[![R-CMD-check](https://github.com/bcgov/rems/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/bcgov/rems/actions/workflows/R-CMD-check.yaml)
<!-- badges: end -->

## Overview
68 changes: 34 additions & 34 deletions README.md
@@ -13,6 +13,7 @@ status](https://www.r-pkg.org/badges/version/rems)](https://cran.r-project.org/p
[![R build
status](https://github.com/bcgov/rems/workflows/R-CMD-check/badge.svg)](https://github.com/bcgov/rems/actions)
[![img](https://img.shields.io/badge/Lifecycle-Maturing-007EC6)](https://github.com/bcgov/repomountie/blob/master/doc/lifecycle-badges.md)
[![R-CMD-check](https://github.com/bcgov/rems/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/bcgov/rems/actions/workflows/R-CMD-check.yaml)
<!-- badges: end -->

## Overview
@@ -46,8 +47,8 @@ source the packages which require compilation”*, choose **“No”**.

**NOTE:** If you are using Windows, you must be running the 64-bit
version of R, as the 32-bit version cannot handle the size of the EMS
data. In RStudio, click on Tools -> Global Options and ensure the 64 bit
version is chosen in the *R version* box.
data. In RStudio, click on Tools -\> Global Options and ensure the 64
bit version is chosen in the *R version* box.

You can use the `get_ems_data()` function to get last two years of data
(you can also specify `which = "4yr"` to get the last four years of
@@ -61,18 +62,18 @@ two_year <- get_ems_data(which = "2yr", ask = FALSE)
#> Caching data on disk...
#> Loading data...
nrow(two_year)
#> [1] 1092314
#> [1] 2214758
head(two_year)
#> # A tibble: 6 × 24
#> EMS_ID REQUISITION_ID MONITORING_LOCATION LATITUDE LONGITUDE LOCATION_TYPE
#> <chr> <chr> <chr> <dbl> <dbl> <chr>
#> 1 0121580 VA21A0072 ENGLISHMAN RIVER AT P… 49.3 -124. RIVER,STREAM…
#> 2 0121580 VA21A0072 ENGLISHMAN RIVER AT P… 49.3 -124. RIVER,STREAM…
#> 3 0121580 VA21A0072 ENGLISHMAN RIVER AT P… 49.3 -124. RIVER,STREAM…
#> 4 0121580 VA21A0072 ENGLISHMAN RIVER AT P… 49.3 -124. RIVER,STREAM…
#> 5 0121580 VA21A0072 ENGLISHMAN RIVER AT P… 49.3 -124. RIVER,STREAM…
#> 6 0121580 VA21A0072 ENGLISHMAN RIVER AT P… 49.3 -124. RIVER,STREAM…
#> # … with 18 more variables: COLLECTION_START <dttm>, LOCATION_PURPOSE <chr>,
#> 1 0120802 6983960101 COWICHAN RIVER AT HIG… 48.8 -124. RIVER,STREAM…
#> 2 0120802 6983960101 COWICHAN RIVER AT HIG… 48.8 -124. RIVER,STREAM…
#> 3 0120802 6983960101 COWICHAN RIVER AT HIG… 48.8 -124. RIVER,STREAM…
#> 4 0120802 6983960101 COWICHAN RIVER AT HIG… 48.8 -124. RIVER,STREAM…
#> 5 0120802 6983960101 COWICHAN RIVER AT HIG… 48.8 -124. RIVER,STREAM…
#> 6 0120802 6983960101 COWICHAN RIVER AT HIG… 48.8 -124. RIVER,STREAM…
#> # 18 more variables: COLLECTION_START <dttm>, LOCATION_PURPOSE <chr>,
#> # PERMIT <chr>, SAMPLE_CLASS <chr>, SAMPLE_STATE <chr>,
#> # SAMPLE_DESCRIPTOR <chr>, PARAMETER_CODE <chr>, PARAMETER <chr>,
#> # ANALYTICAL_METHOD_CODE <chr>, ANALYTICAL_METHOD <chr>, RESULT_LETTER <chr>,
@@ -179,13 +180,13 @@ head(all_data)
#> # A tibble: 6 × 24
#> EMS_ID REQUISITION_ID MONITORING_LOCATION LATITUDE LONGITUDE LOCATION_TYPE
#> <chr> <chr> <chr> <dbl> <dbl> <chr>
#> 1 0126400 <NA> QUINSAM RIVER AT THE … 50.0 -125. RIVER,STREAM…
#> 2 0126400 08189513 QUINSAM RIVER AT THE … 50.0 -125. RIVER,STREAM…
#> 3 0121580 08312097 ENGLISHMAN RIVER AT P… 49.3 -124. RIVER,STREAM…
#> 4 0121580 08214745 ENGLISHMAN RIVER AT P… 49.3 -124. RIVER,STREAM…
#> 5 0126400 08193504 QUINSAM RIVER AT THE … 50.0 -125. RIVER,STREAM…
#> 6 0126400 08334138 QUINSAM RIVER AT THE … 50.0 -125. RIVER,STREAM…
#> # … with 18 more variables: COLLECTION_START <dttm>, LOCATION_PURPOSE <chr>,
#> 1 0126400 08103000 QUINSAM RIVER AT THE … 50.0 -125. RIVER,STREAM…
#> 2 0126400 <NA> QUINSAM RIVER AT THE … 50.0 -125. RIVER,STREAM…
#> 3 0121580 08204191 ENGLISHMAN RIVER AT P… 49.3 -124. RIVER,STREAM…
#> 4 0126400 50126715 QUINSAM RIVER AT THE … 50.0 -125. RIVER,STREAM…
#> 5 0126400 08192637 QUINSAM RIVER AT THE … 50.0 -125. RIVER,STREAM…
#> 6 0121580 08193908 ENGLISHMAN RIVER AT P… 49.3 -124. RIVER,STREAM…
#> # 18 more variables: COLLECTION_START <dttm>, LOCATION_PURPOSE <chr>,
#> # PERMIT <chr>, SAMPLE_CLASS <chr>, SAMPLE_STATE <chr>,
#> # SAMPLE_DESCRIPTOR <chr>, PARAMETER_CODE <chr>, PARAMETER <chr>,
#> # ANALYTICAL_METHOD_CODE <chr>, ANALYTICAL_METHOD <chr>, RESULT_LETTER <chr>,
@@ -207,14 +208,14 @@ filter(all_data, UNIT != MDL_UNIT) %>%
select(RESULT, UNIT, METHOD_DETECTION_LIMIT, MDL_UNIT) %>%
head()
#> # A tibble: 6 × 4
#> RESULT UNIT METHOD_DETECTION_LIMIT MDL_UNIT
#> <dbl> <chr> <dbl> <chr>
#> 1 0.00207 mg/L 0.05 ug/L
#> 2 0.00053 mg/L 0.05 ug/L
#> 3 0.105 mg/L 0.2 ug/L
#> 4 0.00078 mg/L 0.02 ug/L
#> 5 0.0002 mg/L 0.2 ug/L
#> 6 0.0138 mg/L 0.5 ug/L
#> RESULT UNIT METHOD_DETECTION_LIMIT MDL_UNIT
#> <dbl> <chr> <dbl> <chr>
#> 1 0.000001 mg/L 0.001 ug/L
#> 2 0.0302 mg/L 0.2 ug/L
#> 3 0.0005 mg/L 0.2 ug/L
#> 4 0.00057 mg/L 0.02 ug/L
#> 5 0.000002 mg/L 0.001 ug/L
#> 6 0.000001 mg/L 0.001 ug/L

all_data <- standardize_mdl_units(all_data)
#> Successfully converted units in 2172 rows.
@@ -224,7 +225,6 @@ filter(all_data, UNIT != MDL_UNIT) %>%
select(RESULT, UNIT, METHOD_DETECTION_LIMIT, MDL_UNIT) %>%
head()
#> # A tibble: 4 × 4
#> # Groups: MDL_UNIT, UNIT [1]
#> RESULT UNIT METHOD_DETECTION_LIMIT MDL_UNIT
#> <dbl> <chr> <dbl> <chr>
#> 1 0.00065 mg/L NA ug/L
@@ -293,13 +293,13 @@ head(filtered_2yr_lt_lakes_ems)
#> # A tibble: 6 × 24
#> EMS_ID REQUISITION_ID MONITORING_LOCATION LATITUDE LONGITUDE LOCATION_TYPE
#> <chr> <chr> <chr> <dbl> <dbl> <chr>
#> 1 0200052 50253424 WINDERMERE L. OFF TIM… 50.5 -116. LAKE OR POND
#> 2 0200052 50253424 WINDERMERE L. OFF TIM… 50.5 -116. LAKE OR POND
#> 3 0200052 50253424 WINDERMERE L. OFF TIM… 50.5 -116. LAKE OR POND
#> 4 0200052 50253424 WINDERMERE L. OFF TIM… 50.5 -116. LAKE OR POND
#> 5 0200052 50253424 WINDERMERE L. OFF TIM… 50.5 -116. LAKE OR POND
#> 6 0200052 50253424 WINDERMERE L. OFF TIM… 50.5 -116. LAKE OR POND
#> # … with 18 more variables: COLLECTION_START <dttm>, LOCATION_PURPOSE <chr>,
#> 1 0200052 50257596 WINDERMERE L. OFF TIM… 50.5 -116. LAKE OR POND
#> 2 0200052 50257596 WINDERMERE L. OFF TIM… 50.5 -116. LAKE OR POND
#> 3 0200052 50257596 WINDERMERE L. OFF TIM… 50.5 -116. LAKE OR POND
#> 4 0200052 50257596 WINDERMERE L. OFF TIM… 50.5 -116. LAKE OR POND
#> 5 0200052 50257596 WINDERMERE L. OFF TIM… 50.5 -116. LAKE OR POND
#> 6 0200052 50257596 WINDERMERE L. OFF TIM… 50.5 -116. LAKE OR POND
#> # 18 more variables: COLLECTION_START <dttm>, LOCATION_PURPOSE <chr>,
#> # PERMIT <chr>, SAMPLE_CLASS <chr>, SAMPLE_STATE <chr>,
#> # SAMPLE_DESCRIPTOR <chr>, PARAMETER_CODE <chr>, PARAMETER <chr>,
#> # ANALYTICAL_METHOD_CODE <chr>, ANALYTICAL_METHOD <chr>, RESULT_LETTER <chr>,
@@ -312,7 +312,7 @@ filtered_2yr_lt_lakes_req <- filter_ems_data(two_year, req_id = lt_lake_req(),
"Turbidity"))
head(filtered_2yr_lt_lakes_req)
#> # A tibble: 0 × 24
#> # … with 24 variables: EMS_ID <chr>, REQUISITION_ID <chr>,
#> # 24 variables: EMS_ID <chr>, REQUISITION_ID <chr>,
#> # MONITORING_LOCATION <chr>, LATITUDE <dbl>, LONGITUDE <dbl>,
#> # LOCATION_TYPE <chr>, COLLECTION_START <dttm>, LOCATION_PURPOSE <chr>,
#> # PERMIT <chr>, SAMPLE_CLASS <chr>, SAMPLE_STATE <chr>,
Binary file modified fig/README-unnamed-chunk-11-1.png
1 change: 0 additions & 1 deletion rems.Rproj
@@ -3,7 +3,6 @@ Version: 1.0
RestoreWorkspace: Default
SaveWorkspace: Default
AlwaysSaveHistory: Default
QuitChildProcessesOnExit: Default

EnableCodeIndexing: Yes
UseSpacesForTab: Yes