
Commit

Merge pull request #78 from bcgov/fix-read-duckdb
Fix read duckdb
KarHarker authored Aug 10, 2023
2 parents af9c259 + f5b2df1 commit 85cbbca
Showing 11 changed files with 224 additions and 160 deletions.
84 changes: 21 additions & 63 deletions .github/workflows/R-CMD-check.yaml
@@ -1,12 +1,10 @@
# For help debugging build failures open an issue on the RStudio community with the 'github-actions' tag.
# https://community.rstudio.com/new-topic?category=Package%20development&tags=github-actions
# Workflow derived from https://github.com/r-lib/actions/tree/v2/examples
# Need help debugging build failures? Start at https://github.com/r-lib/actions#where-to-find-help
on:
push:
branches:
- '**'
branches: [main, master]
pull_request:
branches:
- master
branches: [main, master]

name: R-CMD-check

@@ -20,72 +18,32 @@ jobs:
fail-fast: false
matrix:
config:
- {os: macos-latest, r: 'release'}
- {os: windows-latest, r: 'release'}
- {os: macOS-latest, r: 'release'}
- {os: ubuntu-18.04, r: 'release', rspm: "https://packagemanager.rstudio.com/cran/__linux__/bionic/latest"}
- {os: ubuntu-18.04, r: 'devel', rspm: "https://packagemanager.rstudio.com/cran/__linux__/bionic/latest"}
- {os: ubuntu-latest, r: 'devel', http-user-agent: 'release'}
- {os: ubuntu-latest, r: 'release'}
- {os: ubuntu-latest, r: 'oldrel-1'}

env:
R_REMOTES_NO_ERRORS_FROM_WARNINGS: true
RSPM: ${{ matrix.config.rspm }}
cache-version: v2
GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
R_KEEP_PKG_SOURCE: yes

steps:
- uses: actions/checkout@v2
- uses: actions/checkout@v3

- uses: r-lib/actions/setup-r@master
- uses: r-lib/actions/setup-pandoc@v2

- uses: r-lib/actions/setup-r@v2
with:
r-version: ${{ matrix.config.r }}
http-user-agent: ${{ matrix.config.http-user-agent }}
use-public-rspm: true

- uses: r-lib/actions/setup-pandoc@master

- name: Query dependencies
run: |
install.packages('remotes')
saveRDS(remotes::dev_package_deps(dependencies = TRUE), ".github/depends.Rds", version = 2)
writeLines(sprintf("R-%i.%i", getRversion()$major, getRversion()$minor), ".github/R-version")
shell: Rscript {0}

- name: Cache R packages
if: runner.os != 'Windows'
uses: actions/cache@v2
- uses: r-lib/actions/setup-r-dependencies@v2
with:
path: ${{ env.R_LIBS_USER }}
key: ${{ env.cache-version }}-${{ runner.os }}-${{ hashFiles('.github/R-version') }}-1-${{ hashFiles('.github/depends.Rds') }}
restore-keys: ${{ env.cache-version }}-${{ runner.os }}-${{ hashFiles('.github/R-version') }}-1-

- name: Install system dependencies
if: runner.os == 'Linux'
run: |
while read -r cmd
do
eval sudo $cmd
done < <(Rscript -e 'writeLines(remotes::system_requirements("ubuntu", "20.04"))')
extra-packages: any::rcmdcheck
needs: check

- name: Install dependencies
run: |
remotes::install_deps(dependencies = TRUE)
# latest dev version of duckdb
# install.packages("https://github.com/cwida/duckdb/releases/download/v0.2.5/duckdb_r_src.tar.gz", repos = NULL)
remotes::install_cran("rcmdcheck")
shell: Rscript {0}

- name: Check
env:
_R_CHECK_CRAN_INCOMING_REMOTE_: false
run: rcmdcheck::rcmdcheck(args = c("--no-manual", "--as-cran"), error_on = "warning", check_dir = "check")
shell: Rscript {0}

- name: Upload check results
if: failure()
uses: actions/upload-artifact@main
- uses: r-lib/actions/check-r-package@v2
with:
name: ${{ runner.os }}-r${{ matrix.config.r }}-results
path: check

- name: Test coverage
if: matrix.config.os == 'macOS-latest' && matrix.config.r == 'release'
run: |
install.packages("covr")
covr::codecov(token = "${{secrets.CODECOV_TOKEN}}")
shell: Rscript {0}
upload-snapshots: true
50 changes: 50 additions & 0 deletions .github/workflows/test-coverage.yaml
@@ -0,0 +1,50 @@
# Workflow derived from https://github.com/r-lib/actions/tree/v2/examples
# Need help debugging build failures? Start at https://github.com/r-lib/actions#where-to-find-help
on:
push:
branches: [main, master]
pull_request:
branches: [main, master]

name: test-coverage

jobs:
test-coverage:
runs-on: ubuntu-latest
env:
GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}

steps:
- uses: actions/checkout@v3

- uses: r-lib/actions/setup-r@v2
with:
use-public-rspm: true

- uses: r-lib/actions/setup-r-dependencies@v2
with:
extra-packages: any::covr
needs: coverage

- name: Test coverage
run: |
covr::codecov(
quiet = FALSE,
clean = FALSE,
install_path = file.path(Sys.getenv("RUNNER_TEMP"), "package")
)
shell: Rscript {0}

- name: Show testthat output
if: always()
run: |
## --------------------------------------------------------------------
find ${{ runner.temp }}/package -name 'testthat.Rout*' -exec cat '{}' \; || true
shell: bash

- name: Upload test results
if: failure()
uses: actions/upload-artifact@v3
with:
name: coverage-test-failures
path: ${{ runner.temp }}/package
2 changes: 1 addition & 1 deletion DESCRIPTION
@@ -27,7 +27,7 @@ Imports:
rappdirs (>= 0.3.1),
readr (>= 2.1.0),
rlang,
duckdb (>= 0.3.1),
duckdb (>= 0.8.1),
storr (>= 1.2.5),
stringr (>= 1.4.0),
tibble (>= 3.1.5),
2 changes: 1 addition & 1 deletion NEWS.md
@@ -1,7 +1,7 @@
# rems (development version)

* `ask` parameter now skips all verification (caching and updates) when set to `FALSE` (#67, #68)
* `download_historic_data()` now has more flixible options to update the historic database (principally to enable better updating behaviour in [shinyrems](https://github.com/bcgov/shinyrems)). (#69, thanks @aylapear)
* `download_historic_data()` now has more flexible options to update the historic database (principally to enable better updating behaviour in [shinyrems](https://github.com/bcgov/shinyrems)). (#69, thanks @aylapear)
* removed capability to register/convert some units due to a bug in the C library underlying the {units} package (https://github.com/r-quantities/units/issues/301). (#70)

# rems 0.8.0
27 changes: 21 additions & 6 deletions R/duckdb_create.R
@@ -41,10 +41,30 @@ create_rems_duckdb <- function(csv_file, db_path, cache_date) {
paste(historic_col_names, historic_col_sql_types, collapse = ', '),
')'))

# For some reason this file fails when read using multithreading,
# so disable parallel reading by default.
parallel <- getOption("rems.duckdb_read_parallel", default = FALSE)

DBI::dbExecute(con,
glue("COPY {tbl_name} from '{csv_file}' ( HEADER, TIMESTAMPFORMAT '{ems_timestamp_format()}' )")
glue("COPY {tbl_name} from '{csv_file}'
( HEADER, TIMESTAMPFORMAT '{ems_timestamp_format()}',
PARALLEL {as.character(parallel)} )")
)

if (getOption("rems.duckdb_build_indexes", default = FALSE)) {
# With duckdb v0.8.1 a min-max index is automatically created for columns
# of all general-purpose data types. ART indexes must be able to fit in
# memory, and since this won't on most machines they are not built by default.
# https://duckdb.org/docs/archive/0.8.1/sql/indexes
add_indexes(con)
}

set_cache_date("historic", cache_date)

invisible(TRUE)
}

add_indexes <- function(con) {
message("Adding database indexes")
cat_if_interactive("|=")
add_sql_index(con, colname = "EMS_ID")
@@ -70,10 +90,5 @@ create_rems_duckdb <- function(csv_file, db_path, cache_date) {
add_sql_index(con, colname = "PARAMETER")
cat_if_interactive("=")
add_sql_index(con, colname = "PARAMETER_CODE")

cat_if_interactive("| 100%\n")

set_cache_date("historic", cache_date)

invisible(TRUE)
}
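
Reviewer note: the change above reads the `rems.duckdb_read_parallel` option and splices it into the `COPY` statement. The sketch below mimics that lookup in base R so the default and opt-in behaviour can be seen side by side. It is only an illustration: `build_copy_sql()` is a hypothetical helper, `sprintf()` stands in for the `glue()` call in the package, and the table name, file name, and timestamp format are placeholder values, not the ones rems uses.

```r
# Hypothetical helper that mirrors how the option feeds into the COPY statement.
# sprintf() is used here instead of glue() so the sketch has no dependencies.
build_copy_sql <- function(tbl_name, csv_file, ts_format) {
  # Same lookup as in create_rems_duckdb(): parallel reading stays off
  # unless the user has set the option.
  parallel <- getOption("rems.duckdb_read_parallel", default = FALSE)
  sprintf(
    "COPY %s FROM '%s' ( HEADER, TIMESTAMPFORMAT '%s', PARALLEL %s )",
    tbl_name, csv_file, ts_format, as.character(parallel)
  )
}

# Placeholder inputs (assumptions, not the values used by rems).
tbl <- "historic"
csv <- "ems_historic.csv"
fmt <- "%Y%m%d%H%M%S"

# Clear the option to see the default, remembering any previous value.
old <- options(rems.duckdb_read_parallel = NULL)
sql_default <- build_copy_sql(tbl, csv, fmt)

# Opt in to parallel reading.
options(rems.duckdb_read_parallel = TRUE)
sql_parallel <- build_copy_sql(tbl, csv, fmt)

# Restore the previous option value.
options(old)

cat(sql_default, sql_parallel, sep = "\n")
```

In a real session a user would presumably set `options(rems.duckdb_read_parallel = TRUE)` or `options(rems.duckdb_build_indexes = TRUE)` before rebuilding the historic database. Both default to `FALSE` in this commit.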
4 changes: 4 additions & 0 deletions R/rems-package.R
@@ -1,3 +1,7 @@
#' @keywords internal
#' @importFrom rlang .data
"_PACKAGE"

ignore_unused_imports <- function() {
dbplyr::tbl_lazy
}
1 change: 1 addition & 0 deletions README.Rmd
@@ -23,6 +23,7 @@ knitr::opts_chunk$set(
[![CRAN status](https://www.r-pkg.org/badges/version/rems)](https://cran.r-project.org/package=rems)
[![R build status](https://github.com/bcgov/rems/workflows/R-CMD-check/badge.svg)](https://github.com/bcgov/rems/actions)
[![img](https://img.shields.io/badge/Lifecycle-Maturing-007EC6)](https://github.com/bcgov/repomountie/blob/master/doc/lifecycle-badges.md)
[![R-CMD-check](https://github.com/bcgov/rems/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/bcgov/rems/actions/workflows/R-CMD-check.yaml)
<!-- badges: end -->

## Overview
68 changes: 34 additions & 34 deletions README.md
@@ -13,6 +13,7 @@ status](https://www.r-pkg.org/badges/version/rems)](https://cran.r-project.org/p
[![R build
status](https://github.com/bcgov/rems/workflows/R-CMD-check/badge.svg)](https://github.com/bcgov/rems/actions)
[![img](https://img.shields.io/badge/Lifecycle-Maturing-007EC6)](https://github.com/bcgov/repomountie/blob/master/doc/lifecycle-badges.md)
[![R-CMD-check](https://github.com/bcgov/rems/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/bcgov/rems/actions/workflows/R-CMD-check.yaml)
<!-- badges: end -->

## Overview
@@ -46,8 +47,8 @@ source the packages which require compilation”*, choose **“No”**.

**NOTE:** If you are using Windows, you must be running the 64-bit
version of R, as the 32-bit version cannot handle the size of the EMS
data. In RStudio, click on Tools -> Global Options and ensure the 64 bit
version is chosen in the *R version* box.
data. In RStudio, click on Tools -\> Global Options and ensure the 64
bit version is chosen in the *R version* box.

You can use the `get_ems_data()` function to get last two years of data
(you can also specify `which = "4yr"` to get the last four years of
@@ -61,18 +62,18 @@ two_year <- get_ems_data(which = "2yr", ask = FALSE)
#> Caching data on disk...
#> Loading data...
nrow(two_year)
#> [1] 1092314
#> [1] 2214758
head(two_year)
#> # A tibble: 6 × 24
#> EMS_ID REQUISITION_ID MONITORING_LOCATION LATITUDE LONGITUDE LOCATION_TYPE
#> <chr> <chr> <chr> <dbl> <dbl> <chr>
#> 1 0121580 VA21A0072 ENGLISHMAN RIVER AT P… 49.3 -124. RIVER,STREAM…
#> 2 0121580 VA21A0072 ENGLISHMAN RIVER AT P… 49.3 -124. RIVER,STREAM…
#> 3 0121580 VA21A0072 ENGLISHMAN RIVER AT P… 49.3 -124. RIVER,STREAM…
#> 4 0121580 VA21A0072 ENGLISHMAN RIVER AT P… 49.3 -124. RIVER,STREAM…
#> 5 0121580 VA21A0072 ENGLISHMAN RIVER AT P… 49.3 -124. RIVER,STREAM…
#> 6 0121580 VA21A0072 ENGLISHMAN RIVER AT P… 49.3 -124. RIVER,STREAM…
#> # … with 18 more variables: COLLECTION_START <dttm>, LOCATION_PURPOSE <chr>,
#> 1 0120802 6983960101 COWICHAN RIVER AT HIG… 48.8 -124. RIVER,STREAM…
#> 2 0120802 6983960101 COWICHAN RIVER AT HIG… 48.8 -124. RIVER,STREAM…
#> 3 0120802 6983960101 COWICHAN RIVER AT HIG… 48.8 -124. RIVER,STREAM…
#> 4 0120802 6983960101 COWICHAN RIVER AT HIG… 48.8 -124. RIVER,STREAM…
#> 5 0120802 6983960101 COWICHAN RIVER AT HIG… 48.8 -124. RIVER,STREAM…
#> 6 0120802 6983960101 COWICHAN RIVER AT HIG… 48.8 -124. RIVER,STREAM…
#> # 18 more variables: COLLECTION_START <dttm>, LOCATION_PURPOSE <chr>,
#> # PERMIT <chr>, SAMPLE_CLASS <chr>, SAMPLE_STATE <chr>,
#> # SAMPLE_DESCRIPTOR <chr>, PARAMETER_CODE <chr>, PARAMETER <chr>,
#> # ANALYTICAL_METHOD_CODE <chr>, ANALYTICAL_METHOD <chr>, RESULT_LETTER <chr>,
@@ -179,13 +180,13 @@ head(all_data)
#> # A tibble: 6 × 24
#> EMS_ID REQUISITION_ID MONITORING_LOCATION LATITUDE LONGITUDE LOCATION_TYPE
#> <chr> <chr> <chr> <dbl> <dbl> <chr>
#> 1 0126400 <NA> QUINSAM RIVER AT THE … 50.0 -125. RIVER,STREAM…
#> 2 0126400 08189513 QUINSAM RIVER AT THE … 50.0 -125. RIVER,STREAM…
#> 3 0121580 08312097 ENGLISHMAN RIVER AT P… 49.3 -124. RIVER,STREAM…
#> 4 0121580 08214745 ENGLISHMAN RIVER AT P… 49.3 -124. RIVER,STREAM…
#> 5 0126400 08193504 QUINSAM RIVER AT THE … 50.0 -125. RIVER,STREAM…
#> 6 0126400 08334138 QUINSAM RIVER AT THE … 50.0 -125. RIVER,STREAM…
#> # … with 18 more variables: COLLECTION_START <dttm>, LOCATION_PURPOSE <chr>,
#> 1 0126400 08103000 QUINSAM RIVER AT THE … 50.0 -125. RIVER,STREAM…
#> 2 0126400 <NA> QUINSAM RIVER AT THE … 50.0 -125. RIVER,STREAM…
#> 3 0121580 08204191 ENGLISHMAN RIVER AT P… 49.3 -124. RIVER,STREAM…
#> 4 0126400 50126715 QUINSAM RIVER AT THE … 50.0 -125. RIVER,STREAM…
#> 5 0126400 08192637 QUINSAM RIVER AT THE … 50.0 -125. RIVER,STREAM…
#> 6 0121580 08193908 ENGLISHMAN RIVER AT P… 49.3 -124. RIVER,STREAM…
#> # 18 more variables: COLLECTION_START <dttm>, LOCATION_PURPOSE <chr>,
#> # PERMIT <chr>, SAMPLE_CLASS <chr>, SAMPLE_STATE <chr>,
#> # SAMPLE_DESCRIPTOR <chr>, PARAMETER_CODE <chr>, PARAMETER <chr>,
#> # ANALYTICAL_METHOD_CODE <chr>, ANALYTICAL_METHOD <chr>, RESULT_LETTER <chr>,
@@ -207,14 +208,14 @@ filter(all_data, UNIT != MDL_UNIT) %>%
select(RESULT, UNIT, METHOD_DETECTION_LIMIT, MDL_UNIT) %>%
head()
#> # A tibble: 6 × 4
#> RESULT UNIT METHOD_DETECTION_LIMIT MDL_UNIT
#> <dbl> <chr> <dbl> <chr>
#> 1 0.00207 mg/L 0.05 ug/L
#> 2 0.00053 mg/L 0.05 ug/L
#> 3 0.105 mg/L 0.2 ug/L
#> 4 0.00078 mg/L 0.02 ug/L
#> 5 0.0002 mg/L 0.2 ug/L
#> 6 0.0138 mg/L 0.5 ug/L
#> RESULT UNIT METHOD_DETECTION_LIMIT MDL_UNIT
#> <dbl> <chr> <dbl> <chr>
#> 1 0.000001 mg/L 0.001 ug/L
#> 2 0.0302 mg/L 0.2 ug/L
#> 3 0.0005 mg/L 0.2 ug/L
#> 4 0.00057 mg/L 0.02 ug/L
#> 5 0.000002 mg/L 0.001 ug/L
#> 6 0.000001 mg/L 0.001 ug/L

all_data <- standardize_mdl_units(all_data)
#> Successfully converted units in 2172 rows.
@@ -224,7 +225,6 @@ filter(all_data, UNIT != MDL_UNIT) %>%
select(RESULT, UNIT, METHOD_DETECTION_LIMIT, MDL_UNIT) %>%
head()
#> # A tibble: 4 × 4
#> # Groups: MDL_UNIT, UNIT [1]
#> RESULT UNIT METHOD_DETECTION_LIMIT MDL_UNIT
#> <dbl> <chr> <dbl> <chr>
#> 1 0.00065 mg/L NA ug/L
@@ -293,13 +293,13 @@ head(filtered_2yr_lt_lakes_ems)
#> # A tibble: 6 × 24
#> EMS_ID REQUISITION_ID MONITORING_LOCATION LATITUDE LONGITUDE LOCATION_TYPE
#> <chr> <chr> <chr> <dbl> <dbl> <chr>
#> 1 0200052 50253424 WINDERMERE L. OFF TIM… 50.5 -116. LAKE OR POND
#> 2 0200052 50253424 WINDERMERE L. OFF TIM… 50.5 -116. LAKE OR POND
#> 3 0200052 50253424 WINDERMERE L. OFF TIM… 50.5 -116. LAKE OR POND
#> 4 0200052 50253424 WINDERMERE L. OFF TIM… 50.5 -116. LAKE OR POND
#> 5 0200052 50253424 WINDERMERE L. OFF TIM… 50.5 -116. LAKE OR POND
#> 6 0200052 50253424 WINDERMERE L. OFF TIM… 50.5 -116. LAKE OR POND
#> # … with 18 more variables: COLLECTION_START <dttm>, LOCATION_PURPOSE <chr>,
#> 1 0200052 50257596 WINDERMERE L. OFF TIM… 50.5 -116. LAKE OR POND
#> 2 0200052 50257596 WINDERMERE L. OFF TIM… 50.5 -116. LAKE OR POND
#> 3 0200052 50257596 WINDERMERE L. OFF TIM… 50.5 -116. LAKE OR POND
#> 4 0200052 50257596 WINDERMERE L. OFF TIM… 50.5 -116. LAKE OR POND
#> 5 0200052 50257596 WINDERMERE L. OFF TIM… 50.5 -116. LAKE OR POND
#> 6 0200052 50257596 WINDERMERE L. OFF TIM… 50.5 -116. LAKE OR POND
#> # 18 more variables: COLLECTION_START <dttm>, LOCATION_PURPOSE <chr>,
#> # PERMIT <chr>, SAMPLE_CLASS <chr>, SAMPLE_STATE <chr>,
#> # SAMPLE_DESCRIPTOR <chr>, PARAMETER_CODE <chr>, PARAMETER <chr>,
#> # ANALYTICAL_METHOD_CODE <chr>, ANALYTICAL_METHOD <chr>, RESULT_LETTER <chr>,
@@ -312,7 +312,7 @@ filtered_2yr_lt_lakes_req <- filter_ems_data(two_year, req_id = lt_lake_req(),
"Turbidity"))
head(filtered_2yr_lt_lakes_req)
#> # A tibble: 0 × 24
#> # … with 24 variables: EMS_ID <chr>, REQUISITION_ID <chr>,
#> # 24 variables: EMS_ID <chr>, REQUISITION_ID <chr>,
#> # MONITORING_LOCATION <chr>, LATITUDE <dbl>, LONGITUDE <dbl>,
#> # LOCATION_TYPE <chr>, COLLECTION_START <dttm>, LOCATION_PURPOSE <chr>,
#> # PERMIT <chr>, SAMPLE_CLASS <chr>, SAMPLE_STATE <chr>,
Binary file modified fig/README-unnamed-chunk-11-1.png
1 change: 0 additions & 1 deletion rems.Rproj
@@ -3,7 +3,6 @@ Version: 1.0
RestoreWorkspace: Default
SaveWorkspace: Default
AlwaysSaveHistory: Default
QuitChildProcessesOnExit: Default

EnableCodeIndexing: Yes
UseSpacesForTab: Yes