fix: writing into user directory (#204)
* remove writing into user directory, #203

* uploaded missing vignettes

* updated the version

---------

Co-authored-by: Fersoil <Fersoil>
Fersoil authored Nov 11, 2024
1 parent ce07d2d commit 02aaa80
Showing 5 changed files with 371 additions and 1 deletion.
2 changes: 1 addition & 1 deletion DESCRIPTION
@@ -3,7 +3,7 @@ Type: Package
Title: Reading, Quality Control and Preprocessing of MBA (Multiplex Bead Assay) Data
Description: Speeds up the process of loading raw data from MBA (Multiplex Bead Assay) examinations, performs quality control checks, and automatically normalises the data, preparing it for more advanced, downstream tasks. The main objective of the package is to create a simple environment for a user, who does not necessarily have experience with R language. The package is developed within the project of the same name - 'PvSTATEM', which is an international project aiming for malaria elimination.
BugReports: https://github.com/mini-pw/PvSTATEM/issues
-Version: 0.1.1
+Version: 0.1.2
License: BSD_3_clause + file LICENSE
Encoding: UTF-8
Authors@R: c(
1 change: 1 addition & 0 deletions R/generate_report.R
@@ -89,6 +89,7 @@ generate_plate_report <-
),
output_file = filename,
output_dir = output_dir,
knit_root_dir = output_dir,
quiet = TRUE
)

87 changes: 87 additions & 0 deletions vignettes/our_datasets.Rmd
@@ -0,0 +1,87 @@
---
title: "Our datasets"
author: "Tymoteusz Kwieciński"
date: "`r Sys.Date()`"
vignette: >
%\VignetteIndexEntry{Our datasets}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(
collapse = FALSE,
comment = "#>",
warning = FALSE,
message = FALSE
)
```

# Introduction

The main purpose of our package is to read raw MBA data, perform quality control, and normalise it. Unfortunately, different devices and labs produce different data formats, so we gathered a few datasets on which the package can be tested. This document describes the datasets and their sources.

The datasets that we make available to the public are stored in the `extdata` folder of the package. The private datasets, together with a larger number of public ones, are stored in the `OneDrive` folder, which is accessible to the package developers.

## How to access the files

The simplest way to access the files is to download them from our GitHub repository.

Another way is to locate the files with the `system.file` function, which returns the path to a file installed with the package. This path can then be used to read the data:

```{r}
dataset_name <- "CovidOISExPONTENT.csv"
dataset_filepath <- system.file("extdata", dataset_name, package = "PvSTATEM", mustWork = TRUE)
```
The variable `dataset_filepath` now contains the path to the specified dataset on your computer. With the filepath known, we can call the `read_luminex_data` function to read the data:

```{r}
library(PvSTATEM)
plate <- read_luminex_data(dataset_filepath)
plate
```
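
A side note on `system.file`: without `mustWork = TRUE`, it returns an empty string for a file that does not exist, which can surface later as a confusing read error. The sketch below illustrates this with the base `stats` package, so it runs without `PvSTATEM` installed; the file name `no_such_file.csv` is made up for the illustration.

```r
# Illustration only: uses the base 'stats' package instead of PvSTATEM.
found <- system.file("DESCRIPTION", package = "stats")

# Hypothetical file name that does not exist: the result is "" (no error).
missing <- system.file("no_such_file.csv", package = "stats")

# With mustWork = TRUE, the same lookup raises an error instead.
strict <- tryCatch(
  system.file("no_such_file.csv", package = "stats", mustWork = TRUE),
  error = function(e) "error raised"
)

nzchar(found)   # TRUE when the file was found
nzchar(missing) # FALSE for the missing file
strict
```

This is why the examples in this vignette pass `mustWork = TRUE`: a wrong dataset name fails immediately at the lookup.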

## Description of the datasets

Our datasets are divided into three main categories:

- **artificial** - the ones created by us to test the package functionalities
- **public** - the publicly available datasets produced in the scope of the PvSTATEM project or by the laboratories participating in the project.
- **external** - the ones gathered from the public domain, external sources, independent from the PvSTATEM project


### Artificial datasets

To perform simple unit tests and validate the most basic reading functionalities of the package, we created a few artificial datasets. The datasets are stored in the package's `extdata` folder. The datasets are:

- `random.csv` - a simple dataset with random values used to test the basic functionalities of the package
- `random2.csv` - another simple dataset with random values used to test the basic functionalities of the package. This file has a corresponding, artificial layout - `random_layout.csv`
- `random_broken_colB.csv` - this dataset has a broken column, which should be detected by the package and reported as a warning

### Public datasets

The datasets from this category are the most important for package development since the package's primary purpose is to simplify the preprocessing of the data in the scope of the PvSTATEM project.

Most of them are stored in the `OneDrive` folder. The `extdata` folder contains two files from the Covid OISE examination:

- `CovidOISExPONTENT.csv`, which is the `IG4DC2~1.csv` plate from the examination `IgG_CovidOise4_30plex`. It comes with a corresponding layout file, `CovidOISExPONTENT_layout.xlsx`
- `CovidOISExPONTENT_CO.csv`, which is the `IGG_CO~1.csv` plate from the examination `IgG_CovidOise2_30plex`, together with a corresponding layout file

Most of the examples and vignettes in the package are based on these datasets.

### External datasets

We gathered a few datasets from the public domain to check how the package handles data from different sources. These datasets are stored in the `OneDrive` folder and in the `external` subfolder of the `extdata` directory. The datasets are:

- `Chul_IgG3_1.csv` - GitHub repo RTSS_Kisumu_Schisto [source](https://github.com/IDEELResearch/RTSS_Kisumu_Schisto/tree/main/data/raw/luminex)

- `Chul_TotalIgG_2.csv` - GitHub repo RTSS_Kisumu_Schisto [source](https://github.com/IDEELResearch/RTSS_Kisumu_Schisto/tree/main/data/raw/luminex)

- `pone.0187901.s001.csv` - data shipped with drLumi package [source](https://doi.org/10.1371/journal.pone.0187901)

- `New_Batch_6_20160309_174224.csv` - dataset included in the paper *A single-nucleotide-polymorphism-based genotyping assay for simultaneous detection of different carbendazim-resistant genotypes in the Fusarium graminearum species complex*, H. Zhang et al.

- `New_Batch_14_20140513_082522.csv` - dataset included in the paper *A single-nucleotide-polymorphism-based genotyping assay for simultaneous detection of different carbendazim-resistant genotypes in the Fusarium graminearum species complex*, H. Zhang et al.
146 changes: 146 additions & 0 deletions vignettes/our_plots.Rmd
@@ -0,0 +1,146 @@
---
title: "Quick introduction to plots created by our package"
author: "Mateusz Nizwantowski"
date: "`r Sys.Date()`"
vignette: >
%\VignetteIndexEntry{Quick introduction to plots created by our package}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
%\VignetteDepends{ggplot2}
%\VignetteDepends{nplr}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(
collapse = FALSE,
comment = "#>",
warning = FALSE,
message = FALSE,
dpi = 50,
out.width = "70%"
)
```

# Introduction

The `PvSTATEM` package provides a variety of plots that can be used to visualize the Luminex data. In this vignette, we will show how to use them.
To present the package's functionalities, we use a sample dataset from the Covid OISE study, which is pre-loaded into the package.
Firstly, let us load the dataset as the `plate` object.

```{r}
library(PvSTATEM)
plate_filepath <- system.file("extdata", "CovidOISExPONTENT.csv", package = "PvSTATEM", mustWork = TRUE) # get the filepath of the csv dataset
layout_filepath <- system.file("extdata", "CovidOISExPONTENT_layout.xlsx", package = "PvSTATEM", mustWork = TRUE)
plate <- read_luminex_data(plate_filepath, layout_filepath) # read the data
plate
```


# Plate layout

We will omit some validation functionality in this vignette and focus on the plots.
After successfully loading the plate, we should validate it by looking at some basic information using the summary function.
However, we can obtain similar information more visually using the `plot_layout` function.
It helps to quickly assess whether the layout of the plate has been read correctly from the Luminex file or the layout file.
The function takes the `plate` object as the argument.

```{r}
plot_layout(plate)
```

The plot above shows the layout of the plate. The wells are coloured according to the type of sample.
If the user is familiar with the colour scheme of this package, there is an option to turn off the legend.
This can be done by setting the `show_legend` parameter to `FALSE`.

If the plot window is resized, it is recommended that the function be rerun to adjust the scaling of the plot.
Sometimes, the whole layout may be shifted when a legend is plotted.
To solve this issue, one has to stretch the window toward the layout shift, and everything will be adjusted automatically.

# Counts for a given analyte
The `plot_counts` function allows us to visualize the counts of the analyte in the plate.
This plot is useful for quickly spotting wells with a count that is too low to interpret results with high confidence.
The function takes the `plate` object and the analyte name as the arguments.
The function will return an error message if there is a typo in the analyte name.

```{r}
plot_counts(plate, "Spike_B16172")
```

The plot above shows the counts of the analyte "Spike_B16172" in the plate. The wells are coloured according to the count of the analyte.
Too-low values are marked in red, values on the edge of the threshold are marked in yellow, and the rest are marked in green.
The legend can be shown by setting the `show_legend` parameter to `TRUE`.
The colours can also be shown without the counts by setting the `plot_counts` parameter to `FALSE`, which gives a cleaner plot.

```{r}
plot_counts(plate, "FluA", plot_counts = FALSE)
```
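
The colour rule described above can be sketched in base R. This is not the package's implementation: the helper `classify_counts`, the threshold values 50 and 70, and the choice to place a count equal to a threshold in the higher band are all assumptions made for illustration.

```r
# Hypothetical helper: maps bead counts to the three colour bands.
# The thresholds and boundary handling are assumptions, not package defaults.
classify_counts <- function(counts, lower = 50, higher = 70) {
  cut(
    counts,
    breaks = c(-Inf, lower, higher, Inf),
    labels = c("red", "yellow", "green"),
    right = FALSE # intervals are [lower, higher), so a boundary value goes up
  )
}

# Made-up counts for four wells.
well_colours <- classify_counts(c(12, 55, 70, 140))
as.character(well_colours)
```

Wells that land in the red band are the ones whose results should be interpreted with caution.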

# Distribution of MFI values

The `plot_mfi_for_analyte` function allows us to visualize the distribution of the MFI values of the test samples for a given analyte, and how they compare to the standard curve samples on the same plate.
This plot helps to assess whether the standard curve samples cover the whole MFI range of the test samples.
The function takes the `plate` object and the analyte name as the arguments.
The function will return an error message if there is a typo in the analyte name.

```{r}
plot_mfi_for_analyte(plate, "Spike_B16172")
```

This plot shows the distribution of the MFI values for test samples for the analyte "Spike_B16172". The test samples are coloured in blue, and the standard curve samples are coloured in red.
The default plot type is violin, but there is an option to change it to the boxplot by setting the `plot_type` parameter to `boxplot`.

```{r}
plot_mfi_for_analyte(plate, "FluA", plot_type = "boxplot")
```

Additionally, we can modify the scale of the y-axis by setting the `scale_y` parameter to the desired transformation from the ggplot2 package. For the `boxplot` plot type, we can include the outliers by setting the `plot_outliers` parameter to `TRUE`.

```{r}
plot_mfi_for_analyte(plate, "FluA", plot_type = "boxplot", scale_y = "identity", plot_outliers = TRUE)
```
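
The question this plot answers visually, namely whether the standard curve samples cover the MFI range of the test samples, can also be checked numerically. The sketch below uses made-up MFI vectors rather than values extracted from the `plate` object, so all numbers and variable names are assumptions for illustration.

```r
# Made-up MFI values; in practice these would come from the plate.
standard_mfi <- c(35, 120, 480, 1900, 7600, 21000)
test_mfi <- c(20, 150, 900, 5000, 25000)

# A test sample is covered if it lies within the range of the standard curve.
covered <- test_mfi >= min(standard_mfi) & test_mfi <= max(standard_mfi)

coverage_fraction <- mean(covered) # share of test samples inside the range
uncovered_mfi <- test_mfi[!covered] # samples that would require extrapolation

coverage_fraction
uncovered_mfi
```

Test samples outside the range lie beyond the lowest or highest standard curve point, which is exactly what the violin or box plot makes visible.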

# Standard curve plots



Finally, we arrive at the most crucial visualization in our package - the standard curve-related plots.
Those plots help assess the quality of the fit, which will be crucial to us in the next step of package development.
They come in two flavors: `plot_standard_curve_analyte` and `plot_standard_curve_analyte_with_model`.
The first does not incorporate the model, while the second does.

## Standard curve plot without model


This plot should be used to assess the quality of the assay.
If anything goes wrong during the plate preparation, it should be visible easily in this plot.

```{r}
plot_standard_curve_analyte(plate, "Spike_B16172")
```

Above, we see the default plot for the analyte "Spike_B16172". We can modify this plot by setting the parameters of the function.
For example, we can change the direction of the x-axis by setting `decreasing_rau_order` parameter to `FALSE`.
Another parameter worth mentioning is `log_scale`; its default value is `c("all")`, which means that both the x and y axes are on the log scale. There is also an option to turn off some parts of the plot by setting the parameters `plot_line`, `plot_blank_mean` and `plot_rau_bounds` to `FALSE`.
The first disables drawing the line between standard curve points, the second turns off plotting the mean of blank samples, and the last disables plotting the RAU value bounds.
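
The options named above can be combined in one call. The following is a sketch that uses only the parameter names mentioned in this section, with illustrative values; it is guarded so that it does nothing when `PvSTATEM` is not installed.

```r
# Options taken from the text above; the chosen values are illustrative.
curve_options <- list(
  decreasing_rau_order = FALSE, # change the direction of the x-axis
  plot_line = FALSE,            # no line between standard curve points
  plot_blank_mean = FALSE,      # hide the mean of the blank samples
  plot_rau_bounds = FALSE       # hide the RAU value bounds
)

# Only run the plotting call when the package is available.
if (requireNamespace("PvSTATEM", quietly = TRUE)) {
  plate_filepath <- system.file("extdata", "CovidOISExPONTENT.csv",
    package = "PvSTATEM", mustWork = TRUE
  )
  layout_filepath <- system.file("extdata", "CovidOISExPONTENT_layout.xlsx",
    package = "PvSTATEM", mustWork = TRUE
  )
  plate <- PvSTATEM::read_luminex_data(plate_filepath, layout_filepath)

  # Plate and analyte name are passed positionally, as in the chunk above.
  curve_plot <- do.call(
    PvSTATEM::plot_standard_curve_analyte,
    c(list(plate, "Spike_B16172"), curve_options)
  )
}
```

Keeping the options in a list makes it easy to reuse the same settings for several analytes.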

## Standard curve plot with model

This visualization is similar to the previous one but also incorporates the model.
Thus, it carries more information at the cost of being more complex and crowded.

```{r}
model <- create_standard_curve_model_analyte(plate, analyte_name = "Spike_B16172")
plot_standard_curve_analyte_with_model(plate, model)
```

Here, we do not have to specify the analyte name, as the model already carries this information.
The model is created by the `create_standard_curve_model_analyte` function, which takes the `plate` object and the analyte name as the arguments, but this is not the focus of this vignette.
The arguments of the plotting function are very similar to those of the previous one, except that there is no `plot_line` argument and there are two new arguments: `plot_asymptote` and `plot_test_predictions`.
The first controls whether the asymptotes are drawn, and the second controls whether the predictions for the test samples are plotted.
By default, both are set to `TRUE`.
136 changes: 136 additions & 0 deletions vignettes/reports.Rmd
@@ -0,0 +1,136 @@
---
title: "Quick overview of reports generated by PvSTATEM package"
author: "Mateusz Nizwantowski"
date: "`r Sys.Date()`"
vignette: >
%\VignetteIndexEntry{Quick overview of reports generated by PvSTATEM package}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
%\VignetteDepends{ggplot2}
%\VignetteDepends{nplr}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(
collapse = FALSE,
comment = "#>",
warning = FALSE,
message = FALSE,
dpi = 50,
out.width = "70%"
)
```

# Introduction

The `PvSTATEM` package provides, for now, one report that can be generated using the `generate_plate_report` function.
It is an HTML report that contains a summary of the plate. It was optimized for size so it could be sent via email.
In the future, we plan to add a report on Levey-Jennings plots to the package. For now, we will focus on the plate summary report.
To present this functionality, we use a sample dataset from the Covid OISE study, which is pre-loaded into the package.
Firstly, let us load the dataset as the `plate` object.

```{r}
library(PvSTATEM)
plate_filepath <- system.file("extdata", "CovidOISExPONTENT.csv", package = "PvSTATEM", mustWork = TRUE)
layout_filepath <- system.file("extdata", "CovidOISExPONTENT_layout.xlsx", package = "PvSTATEM", mustWork = TRUE)
plate <- read_luminex_data(plate_filepath, layout_filepath)
plate
```

# Generating the report

To generate the report, we use the `generate_plate_report` function. Its only required parameter is the `plate` object.
Generating the report takes a few seconds, and up to a minute for a large plate, so please be patient.

```{r, eval=FALSE}
generate_plate_report(plate)
```

The default report has four main sections:

- The most important information about the plate

<img src="img/report_summary.png" alt="Image description" width="600px" style="border: 1px solid black;" />

- The layout of the plate

<img src="img/report_layout.jpg" alt="Image description" width="600px" style="border: 1px solid black;" />

- The preview of standard curves

<img src="img/report_thumbnails.png" alt="Image description" width="600px" style="border: 1px solid black;" />

- Detailed information about analytes: this section has tabs for each analyte, where the user can select the analyte of interest.

<img src="img/report_analyte_details.png" alt="Image description" width="600px" style="border: 1px solid black;" />

# Additional parameters

The user can customize the report by setting additional parameters.
The `generate_plate_report` function has the following optional parameters:

- `additional_notes` - a string with additional notes that will be added to the report

```{r, eval=FALSE}
notes <- "
This is an example of additional notes that can be added to the report.
The notes support markdown syntax, for example:
**Author: Jane Doe** - bold
*Date: 2024-10-28* - italic
~~Completed~~ - strikethrough
H~2~O - subscripts
X^2^ - superscripts
[text that will be displayed](https://www.google.com) - link to resource
Ordered list:
1. First item
2. Second item
3. Third item
Unordered list:
- First item
- Second item
- Third item
> This is a blockquote
#### This is a heading
##### This is a subheading
###### This is a subsubheading
Even though headings #, ##, ### are supported, it is recommended not to use them, as the report has its own structure that is built around ### headings.
"
generate_plate_report(plate, additional_notes = notes)
```

Such notes look like this in the report:

<img src="img/additional_notes.jpg" alt="Image description" width="600px" style="border: 1px solid black;" />


- `counts_lower_threshold` - the lower threshold for the counts plot, which works the same way as in the `plot_counts` function;
it sets the boundary between the red and yellow colours
- `counts_higher_threshold` - the higher threshold for the counts plot, which works the same way as in the `plot_counts` function;
it sets the boundary between the yellow and green colours
- `filename` - The name of the output HTML report file. If it is not provided or is `NULL`,
the output filename is based on the plate name: `{plate_name}_report.html`.
By default, the `plate_name` is the filename of the input file that contains the plate data.
For more details, please refer to the documentation of the `Plate` object. If the passed filename does not contain the `.html` extension, the extension is added automatically.
The filename can also be a path to a file, e.g. `path/to/file.html`. In this case, the `output_dir` and `filename` are joined together.
However, if the passed filepath is an absolute path and the `output_dir` parameter is also provided, the `output_dir` parameter is ignored.
If a file already exists under the specified filepath, the function overwrites it.
- `output_dir` - The directory where the output HTML file should be saved. Any missing directories in the provided path (including parent directories) are created.
If it is `NULL`, the current working directory is used. The default is 'reports'.

```{r, eval=FALSE}
generate_plate_report(plate,
additional_notes = notes,
counts_lower_threshold = 10,
counts_higher_threshold = 100,
filename = "example_report.html",
output_dir = "reports_from_new_plates"
)
```
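
The rules for `filename` and `output_dir` described above can be summarised in a small base R sketch. The helper `resolve_report_path` is hypothetical and is not the package's code; in particular, its absolute-path check is a simplification, and it does not create directories or write any file.

```r
# Hypothetical helper mirroring the documented rules; not part of PvSTATEM.
resolve_report_path <- function(plate_name, filename = NULL, output_dir = "reports") {
  # Rule 1: a missing filename falls back to the plate name.
  if (is.null(filename)) {
    filename <- paste0(plate_name, "_report.html")
  }
  # Rule 2: the .html extension is appended when absent.
  if (!grepl("\\.html$", filename)) {
    filename <- paste0(filename, ".html")
  }
  # Rule 3: an absolute filename wins over output_dir (simplified check).
  if (grepl("^(/|~|[A-Za-z]:)", filename)) {
    return(filename)
  }
  # Rule 4: a NULL output_dir means the current working directory.
  if (is.null(output_dir)) {
    output_dir <- getwd()
  }
  file.path(output_dir, filename)
}

default_path <- resolve_report_path("plate1")
joined_path <- resolve_report_path("plate1", "summary", "reports_from_new_plates")
absolute_path <- resolve_report_path("plate1", "/tmp/summary.html", "ignored_dir")
```

Under these assumptions, `default_path` is `reports/plate1_report.html`, `joined_path` is `reports_from_new_plates/summary.html`, and `absolute_path` keeps `/tmp/summary.html` unchanged.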
