DREAMS is an analysis pipeline that applies a neural network to low-frequency variant calling and circulating tumor DNA detection from next-generation DNA sequencing data.
You can install the development version of dreams from GitHub with:
# install.packages("devtools")
devtools::install_github("JakobSkouPedersenLab/dreams")
If you run into problems with the TensorFlow integration in R, install Keras into the Python environment you intend to use:
keras::install_keras(envname = "<ENVIRONMENT_NAME>")
After installation, set the environment at the start of each R session:
reticulate::use_condaenv("<ENVIRONMENT_NAME>", required = TRUE)
This section gives an overview of the basic functions in the dreams package.
The first steps in using DREAMS are preparing the training data and training the model. Both variant calling and cancer detection depend on the trained model.
library(dreams)
# For training, DREAMS requires one or more BAM files and a reference genome.
training_data = get_training_data(
  bam_paths = "/path/bam_file",
  reference_path = "/path/hg38.fa",
  ...)
# Training the DREAMS Model using a Neural Network
# Basic settings for Keras are required.
model = train_dreams_model(
  training_data,
  layers = c(128, 64, 32),
  model_features = c("read_index", "strand", "trinucleotide_ctx", "first_in_pair",
                     "umi_count", "seq_length", "fragment_size", "n_other_errors",
                     "local_GC"),
  lr = 0.01,
  batch_size = 32000,
  epochs = 750,
  model_file_path = NULL,
  ...)
The DREAMS model supports a variety of features categorized into numeric, categorical, and embedded types:
Numeric features: read_index, fragment_size, local_GC, umi_count, umi_errors, local_complexity_1, local_complexity_2, n_other_errors, prior_error, seq_length
Categorical features: ref, strand, first_in_pair, ctx_minus1, ctx_plus1, chr, genomic_pos
Embedded feature: trinucleotide_ctx
Make sure the training data contains every feature you select in model_features, and adjust parameters such as layers, lr, batch_size, and epochs as needed.
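A feature selection can be checked before a long training run. The sketch below is only an illustration: check_model_features is a hypothetical helper, not part of dreams, and it assumes you can supply the names of the features present in your training data as a character vector. The supported feature names are copied from the lists above.

```r
# Feature names copied from the lists in this README.
numeric_features <- c("read_index", "fragment_size", "local_GC", "umi_count",
                      "umi_errors", "local_complexity_1", "local_complexity_2",
                      "n_other_errors", "prior_error", "seq_length")
categorical_features <- c("ref", "strand", "first_in_pair", "ctx_minus1",
                          "ctx_plus1", "chr", "genomic_pos")
embedded_features <- "trinucleotide_ctx"

# Hypothetical helper (NOT part of dreams): stops with an informative error
# if a selected feature is unsupported or absent from the available columns.
check_model_features <- function(model_features, data_columns) {
  supported <- c(numeric_features, categorical_features, embedded_features)

  unsupported <- setdiff(model_features, supported)
  if (length(unsupported) > 0) {
    stop("Unsupported features: ", paste(unsupported, collapse = ", "))
  }

  missing_cols <- setdiff(model_features, data_columns)
  if (length(missing_cols) > 0) {
    stop("Features missing from the data: ", paste(missing_cols, collapse = ", "))
  }

  invisible(TRUE)
}

# Example: both selected features are supported and present.
check_model_features(
  model_features = c("read_index", "strand"),
  data_columns = c("read_index", "strand", "ref")
)
```

Where the column names come from depends on the structure of the object returned by get_training_data, which is not described here.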
The statistical methods dreams_vc and dreams_cc can be used for variant calling and cancer detection, respectively.
# Call variants using DREAMS-vc
variant_calls = dreams_vc(
  mutations_df = mutations_df,
  bam_file_path = "/path/test_bam_file",
  reference_path = "/path/hg38.fa",
  model = model,
  ...)
# Call cancer using DREAMS-cc
cancer_calls = dreams_cc(
  mutations_df = mutations_df,
  bam_file_path = "/path/test_bam_file",
  reference_path = "/path/hg38.fa",
  model = model,
  ...)
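Both calls above take a mutations_df of candidate mutations, which this README does not define. The sketch below shows one way such a table might be built. The column names (chr, genomic_pos, ref, alt) are an assumption based on the feature names listed earlier, and the positions and alleles are placeholders, not real variants. Check ?dreams_vc for the layout the package actually expects.

```r
# ASSUMPTION: column names are guessed from the feature names in this README;
# consult ?dreams_vc for the required columns before using this.
# The rows are placeholder values, not real variants.
mutations_df <- data.frame(
  chr = c("chr1", "chr2"),
  genomic_pos = c(1000L, 2000L),
  ref = c("A", "C"),
  alt = c("G", "T"),
  stringsAsFactors = FALSE
)

# Basic sanity checks for single-nucleotide candidates.
bases <- c("A", "C", "G", "T")
stopifnot(
  all(mutations_df$ref %in% bases),
  all(mutations_df$alt %in% bases),
  all(mutations_df$ref != mutations_df$alt)
)
```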
You can save your trained models for later use and load them as needed.
To save a trained model, use the save_model_hdf5 function from the keras package and specify the file path where the model should be written.
Alternatively, when training with train_dreams_model, you can pass a file path through the model_file_path argument, and the model is saved automatically when training completes.
By default model_file_path = NULL, and the model won’t be saved automatically. You can then save it manually with save_model_hdf5.
To load a previously saved model, use the load_model_hdf5 function.
library(keras)
# Save the model
save_model_hdf5(model, filepath = "path/to/your_model.h5")
# Load the model
loaded_model <- load_model_hdf5(filepath = "path/to/your_model.h5")
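Because training can take a long time, it may be convenient to reuse a saved model when one exists and train only otherwise. The sketch below is one possible pattern, not something dreams provides. load_or_train is a hypothetical helper, and the training, loading, and saving steps are passed in as functions so that the helper itself depends only on base R.

```r
# Hypothetical helper (NOT part of dreams or keras): load the model from
# `path` if the file exists; otherwise train, save, and return it.
load_or_train <- function(path, train_fn, load_fn, save_fn) {
  if (file.exists(path)) {
    return(load_fn(path))
  }
  model <- train_fn()
  save_fn(model, path)
  model
}

# Possible use with the functions shown in this README (commented out because
# it needs a working Keras setup and the elided training arguments):
# model <- load_or_train(
#   path = "path/to/your_model.h5",
#   train_fn = function() train_dreams_model(training_data, ...),
#   load_fn = function(p) keras::load_model_hdf5(filepath = p),
#   save_fn = function(m, p) keras::save_model_hdf5(m, filepath = p)
# )
```

Passing model_file_path to train_dreams_model achieves the saving step directly. The helper only adds the check for an existing file.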
For technical details on how DREAMS works, please see our article.