add explicit sql content (fix #77 and fix #102)
avallecam committed Oct 1, 2024
1 parent 6fadf3b commit 44ff780
Showing 1 changed file with 94 additions and 16 deletions.
110 changes: 94 additions & 16 deletions episodes/read-cases.Rmd
@@ -148,54 +148,132 @@ rio::import(here::here("data", "Marburg.zip"))
The [DBI](https://dbi.r-dbi.org/) package serves as a versatile interface for interacting with database management
systems (DBMS) across different back-ends or servers. It offers a uniform method for accessing and retrieving data from various database systems.

::::::::::::: discuss

### When to read directly from a database?

We can use database interface packages to optimize memory usage. If we process the database with "queries" (e.g., select, filter, summarise) before extraction, we can reduce the memory load in our RStudio session. Conversely, conducting all data manipulation outside the database management system can occupy more memory than desired and may cause the session to run out of memory.

:::::::::::::

The following code chunks demonstrate in five steps how to create a temporary SQLite database in memory, store the `ebola_confirmed` data frame as a table in it, and subsequently read it:

### 1. Connect with a database

First, we establish a connection to an SQLite database created in memory using `DBI::dbConnect()`.

```{r,warning=FALSE,message=FALSE}
library(DBI)
library(RSQLite)
# Create a temporary SQLite database in memory
db_connection <- DBI::dbConnect(
  drv = RSQLite::SQLite(),
  dbname = ":memory:"
)
```
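
As a quick sanity check, the sketch below opens a separate in-memory connection and inspects it. The name `con_check` is hypothetical and used only here; `DBI::dbIsValid()` and `DBI::dbListTables()` are standard DBI generics.

```r
library(DBI)
library(RSQLite)

# Hypothetical connection name, used only for this check
con_check <- DBI::dbConnect(
  drv = RSQLite::SQLite(),
  dbname = ":memory:"
)

# TRUE while the connection is open
is_open <- DBI::dbIsValid(con_check)

# A new in-memory database has no tables yet
tables_at_start <- DBI::dbListTables(con_check)

DBI::dbDisconnect(con_check)
```

Checking the list of tables right after connecting is a cheap way to confirm that you reached the database you expected.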

::::::::::::::::: callout

A real-life connection would look like this:

```r
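# in real-life (illustrative only: the host, user, and password helper
# below are placeholders, and a server-based DBMS would need its own
# driver, e.g. RPostgres::Postgres(), instead of RSQLite::SQLite())
db_connection <- DBI::dbConnect(
  RSQLite::SQLite(),
  host = "database.epiversetrace.com",
  user = "juanito",
  password = epiversetrace::askForPassword("Database password")
)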
```

:::::::::::::::::

### 2. Write a local data frame as a table in a database

Then, we can write the `ebola_confirmed` data frame to a table named `cases` within the database using the `DBI::dbWriteTable()` function.

```{r,warning=FALSE,message=FALSE}
# Store the 'ebola_confirmed' dataframe as a table named 'cases'
# in the SQLite database
DBI::dbWriteTable(
  conn = db_connection,
  name = "cases",
  value = ebola_confirmed
)
```
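
To confirm that a write succeeded, one option is to ask the database what it now contains. This is a minimal sketch that assumes a toy data frame (`toy_cases`, with made-up values) in place of `ebola_confirmed`, and its own connection `con`.

```r
library(DBI)
library(RSQLite)

con <- DBI::dbConnect(RSQLite::SQLite(), dbname = ":memory:")

# Toy stand-in for `ebola_confirmed` (values are made up)
toy_cases <- data.frame(
  date = c("2014-05-18", "2014-05-20", "2014-05-21"),
  confirm = c(1L, 2L, 4L)
)

DBI::dbWriteTable(conn = con, name = "cases", value = toy_cases)

# Check that the table exists and inspect its column names
table_exists <- DBI::dbExistsTable(con, "cases")
table_fields <- DBI::dbListFields(con, "cases")

# Writing to an existing table fails by default;
# overwrite = TRUE replaces the table instead
DBI::dbWriteTable(conn = con, name = "cases", value = toy_cases, overwrite = TRUE)
n_rows <- DBI::dbGetQuery(con, "SELECT COUNT(*) AS n FROM cases")$n

DBI::dbDisconnect(con)
```

Using `overwrite = TRUE` (or `append = TRUE`) makes the intent explicit when a chunk is re-run against a database that already holds the table.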

A database can hold more than one table. Each table usually describes a specific `entity` (e.g., patients, care units, jobs), and tables are related to each other through a common ID or `primary key`.

### 3. Read data from a table in a database

<!-- Subsequently, we read the data from the `cases` table using `DBI::dbReadTable()`. -->

<!-- ```{r,warning=FALSE,message=FALSE} -->
<!-- # Read data from the 'cases' table -->
<!-- extracted_data <- DBI::dbReadTable( -->
<!-- conn = db_connection, -->
<!-- name = "cases" -->
<!-- ) -->
<!-- ``` -->

Subsequently, we read the data from the `cases` table using `dplyr::tbl()`.

```{r}
# Read one table from the database
mytable_db <- dplyr::tbl(src = db_connection, "cases")
```

If we apply `{dplyr}` verbs to this SQLite database table, these verbs will be translated to SQL queries.

```{r}
# Show the SQL queries translated
mytable_db %>%
  dplyr::filter(confirm > 50) %>%
  dplyr::arrange(desc(confirm)) %>%
  dplyr::show_query()
```
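
To see what the translation amounts to, the sketch below runs the same filter and sort twice: once with `{dplyr}` verbs and once with hand-written SQL sent through `DBI::dbGetQuery()`. The SQL string is an approximation written for this example, not the exact text that `dplyr::show_query()` prints, and `toy_cases` is a made-up stand-in for `ebola_confirmed`.

```r
library(DBI)
library(RSQLite)
library(dplyr)

con <- DBI::dbConnect(RSQLite::SQLite(), dbname = ":memory:")

# Toy stand-in for `ebola_confirmed` (values are made up)
toy_cases <- data.frame(
  date = c("2014-05-18", "2014-05-20", "2014-05-21", "2014-05-22"),
  confirm = c(10L, 60L, 75L, 40L)
)
DBI::dbWriteTable(con, "cases", toy_cases)

# Route 1: dplyr verbs, translated to SQL by dbplyr
via_dplyr <- dplyr::tbl(con, "cases") %>%
  dplyr::filter(confirm > 50) %>%
  dplyr::arrange(desc(confirm)) %>%
  dplyr::collect()

# Route 2: hand-written SQL expressing the same filter and sort
via_sql <- DBI::dbGetQuery(
  con,
  "SELECT * FROM cases WHERE confirm > 50 ORDER BY confirm DESC"
)

DBI::dbDisconnect(con)
```

Both routes return the same two rows, which is a useful check when you are unsure how a verb is translated.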

### 4. Extract data from the database

Use `dplyr::collect()` to force computation of a database query and extract the output to your local computer.

```{r}
# Pull all data down to a local tibble
extracted_data <- mytable_db %>%
  dplyr::filter(confirm > 50) %>%
  dplyr::arrange(desc(confirm)) %>%
  dplyr::collect()
```

The `extracted_data` object holds the extracted data, ideally after applying queries that reduce its size.

```{r,warning=FALSE,message=FALSE}
# View the extracted_data
extracted_data %>%
  dplyr::as_tibble() # for a simple data frame output
```
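
How much a query can shrink the extracted data is easiest to see with an aggregation. The sketch below assumes a made-up table of 1000 rows (`toy_cases`, with hypothetical `region` and `confirm` columns) and summarises it inside the database, so only one row per region reaches the R session.

```r
library(DBI)
library(RSQLite)
library(dplyr)

con <- DBI::dbConnect(RSQLite::SQLite(), dbname = ":memory:")

# Made-up data: 1000 rows split evenly across two regions
toy_cases <- data.frame(
  id = 1:1000,
  region = rep(c("north", "south"), each = 500),
  confirm = rep(1:10, times = 100)
)
DBI::dbWriteTable(con, "toy_cases", toy_cases)

# The grouping and sum run in the database; collect() pulls only the result
region_totals <- dplyr::tbl(con, "toy_cases") %>%
  dplyr::group_by(region) %>%
  dplyr::summarise(total = sum(confirm, na.rm = TRUE)) %>%
  dplyr::collect()

rows_in_database <- nrow(toy_cases)
rows_extracted <- nrow(region_totals)

DBI::dbDisconnect(con)
```

Here `rows_extracted` is 2 while the table holds 1000 rows, so the local object stays small regardless of how large the source table grows.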

:::::::::::::::::::::: callout

### Run SQL queries in R using dbplyr

Read this [tutorial episode on SQL databases and R](https://datacarpentry.org/R-ecology-lesson/05-r-and-databases.html#complex-database-queries) to practice how to make relational database SQL queries using multiple `{dplyr}` verbs like `dplyr::left_join()` among tables before pulling down data to your local session with `dplyr::collect()`!

You can also review the `{dbplyr}` package documentation.

::::::::::::::::::::::
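
A minimal sketch of such a relational query follows. It assumes two made-up tables, `patients` and `care_units`, related by a hypothetical `unit_id` key; the join runs in the database and only the joined result is collected.

```r
library(DBI)
library(RSQLite)
library(dplyr)

con <- DBI::dbConnect(RSQLite::SQLite(), dbname = ":memory:")

# Two made-up entities related by the key `unit_id`
patients <- data.frame(
  patient_id = c(1L, 2L, 3L),
  unit_id = c(10L, 20L, 10L)
)
care_units <- data.frame(
  unit_id = c(10L, 20L),
  unit_name = c("Unit A", "Unit B")
)
DBI::dbWriteTable(con, "patients", patients)
DBI::dbWriteTable(con, "care_units", care_units)

# Lazy references to the tables; no data is pulled yet
patients_db <- dplyr::tbl(con, "patients")
care_units_db <- dplyr::tbl(con, "care_units")

# The join is translated to SQL and executed by the database
joined <- patients_db %>%
  dplyr::left_join(care_units_db, by = "unit_id") %>%
  dplyr::collect()

DBI::dbDisconnect(con)
```

Naming the key explicitly in `by` avoids relying on the automatic matching of shared column names.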


### 5. Close the database connection

Finally, we can close the database connection with `DBI::dbDisconnect()`.

```{r,warning=FALSE,message=FALSE}
# Close the database connection
DBI::dbDisconnect(conn = db_connection)
```
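
When the connection lives inside a function, one common pattern is to register the disconnect with `on.exit()` so it runs even if a later step fails. The sketch below is an illustration under that assumption; `count_rows()` is a hypothetical helper, not part of any package.

```r
library(DBI)
library(RSQLite)

# Hypothetical helper: writes a data frame to a temporary database
# and returns its row count, always closing the connection on exit
count_rows <- function(data, table_name = "cases") {
  con <- DBI::dbConnect(RSQLite::SQLite(), dbname = ":memory:")
  on.exit(DBI::dbDisconnect(con), add = TRUE)
  DBI::dbWriteTable(con, table_name, data)
  nrow(DBI::dbReadTable(con, table_name))
}

n_cases <- count_rows(data.frame(confirm = c(1L, 2L, 4L)))

# A connection reports itself as invalid once it is closed
con <- DBI::dbConnect(RSQLite::SQLite(), dbname = ":memory:")
DBI::dbDisconnect(con)
still_open <- DBI::dbIsValid(con)
```

`DBI::dbIsValid()` returning `FALSE` is a simple way to verify that no connection was left open at the end of a script.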

## Reading from HIS APIs

Health-related data are also increasingly stored in specialized health information system (HIS) APIs like **Fingertips**, **GoData**, **REDCap**, and **DHIS2**. In such cases, one can use the [readepi](https://epiverse-trace.github.io/readepi/) package, which enables reading data from HIS APIs.