Analysis pipeline for extracting, filtering, classifying, and quantifying DNA circuit output on a nanopore sensing platform. Bulk raw data was collected from Oxford Nanopore Technologies' MinION using R9.4.1 flow cells and a custom MinKNOW run script.
Adapted from https://github.com/uwmisl/NanoporeTERs, which uses this pipeline for peptide detection.
This software runs on Linux. The classification algorithms also use a GPU (CUDA 10.0).
This repository consists primarily of IPython notebooks that were developed and tested on a Jupyter server with Python 2.7. The following dependencies should be installed:
- dask (1.2.2)
- future (0.17.1)
- h5py (2.9.0)
- joblib (0.14.0)
- matplotlib (2.2.4)
- numpy (1.16.2)
- pandas (0.24.2)
- scikit-learn (0.20.4)
- scipy (1.2.2)
- pytorch (1.2.0) for CUDA 10.0
- yaml (0.1.7)
Installing these dependencies should take only a few minutes, except for pytorch, which can take several hours depending on download speed.
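Because the notebooks were tested against pinned versions, it can help to confirm the environment before running anything. The following is a minimal sketch, not part of the repository: the helper name check_versions is hypothetical, and it assumes that scikit-learn imports as sklearn and pytorch as torch. The yaml entry is left out because its listed version may refer to the underlying library rather than to a Python module's version string.

```python
from __future__ import print_function

import importlib

# Pinned versions from the dependency list, keyed by import name.
# Assumption: scikit-learn imports as "sklearn" and pytorch as "torch".
REQUIRED = {
    "dask": "1.2.2",
    "future": "0.17.1",
    "h5py": "2.9.0",
    "joblib": "0.14.0",
    "matplotlib": "2.2.4",
    "numpy": "1.16.2",
    "pandas": "0.24.2",
    "sklearn": "0.20.4",
    "scipy": "1.2.2",
    "torch": "1.2.0",
}


def check_versions(required, importer=importlib.import_module):
    """Return {name: (installed_or_None, expected, matches)} per module.

    A module that cannot be imported, or that has no __version__
    attribute, is reported with installed=None and matches=False.
    """
    report = {}
    for name, expected in sorted(required.items()):
        try:
            module = importer(name)
            installed = getattr(module, "__version__", None)
        except ImportError:
            installed = None
        # Prefix match so that build suffixes such as "+cu100" still pass.
        matches = installed is not None and installed.startswith(expected)
        report[name] = (installed, expected, matches)
    return report
```

Calling check_versions(REQUIRED) in a notebook cell and printing the entries whose last field is False gives a quick list of packages to reinstall.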
The input for this analysis pipeline is the bulk raw fast5 file generated by MinKNOW after an experimental run. Details of the experimental run, including the times at which each analyte is introduced, should be recorded in a Google spreadsheet. An example of this spreadsheet can be found here.
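To illustrate why the introduction times matter, the sketch below turns a list of logged times into one time window per analyte, which is the kind of information needed to split a bulk run by analyte. It is an illustration only: the field names analyte and start_sec, the analyte labels, and the function analyte_windows are hypothetical and do not reflect the actual spreadsheet layout or the repository's code.

```python
def analyte_windows(rows, run_end_sec):
    """Convert logged introduction times into per-analyte time windows.

    rows: iterable of dicts with hypothetical keys "analyte" and
          "start_sec" (seconds from the start of the run).
    run_end_sec: time at which the run ended, in seconds.

    Assumption: each analyte remains in the flow cell until the next
    one is introduced, and the last one remains until the run ends.
    Returns a list of (analyte, start_sec, end_sec) tuples in time order.
    """
    ordered = sorted((float(r["start_sec"]), r["analyte"]) for r in rows)
    windows = []
    for i, (start, analyte) in enumerate(ordered):
        if i + 1 < len(ordered):
            end = ordered[i + 1][0]
        else:
            end = float(run_end_sec)
        if end <= start:
            raise ValueError("empty window for analyte %r" % analyte)
        windows.append((analyte, start, end))
    return windows


# Example with made-up values; rows like these could come from a CSV
# export of the spreadsheet read with csv.DictReader.
example_rows = [
    {"analyte": "analyte_B", "start_sec": "1500"},
    {"analyte": "buffer", "start_sec": "0"},
    {"analyte": "analyte_A", "start_sec": "600"},
]
example_windows = analyte_windows(example_rows, run_end_sec=2400)
```

Sorting by time first means the rows do not have to be entered in order, and the ValueError catches two analytes logged at the same time or an end time that precedes the last introduction.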
Open nanopore_experiments/prep_experiment_notebook.ipynb. Change date in Cell 2 to match the appropriate experiment. Change f5_base_dir to the directory of the raw fast5 file. Change output_dir to the desired directory for output capture data. Run the entire notebook. This will create a new experiment notebook in nanopore_experiments under the name experiment_DATE_FLOWCELL.ipynb, as well as a config file in nanopore_experiments/configs under the name segment_DATE_FLOWCELL.yml.
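The naming scheme for the generated files can be sketched as follows. This is not the prep notebook's code: the function generated_paths is hypothetical, and the date and flow cell ID in the example are read off the demo notebook name experiment_20210118_FAP26604.ipynb, which is an assumption about how that name is composed.

```python
import posixpath


def generated_paths(date, flowcell, base_dir="nanopore_experiments"):
    """Return (notebook_path, config_path) following the README's naming.

    date and flowcell are inserted verbatim into
    experiment_DATE_FLOWCELL.ipynb and segment_DATE_FLOWCELL.yml.
    """
    notebook = posixpath.join(
        base_dir, "experiment_{}_{}.ipynb".format(date, flowcell)
    )
    config = posixpath.join(
        base_dir, "configs", "segment_{}_{}.yml".format(date, flowcell)
    )
    return notebook, config


notebook_path, config_path = generated_paths("20210118", "FAP26604")
```

A helper like this is mainly useful for checking that both files exist after the prep notebook has run.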
Open the newly generated experiment notebook. The expected behavior and available parameters for each major step in the data processing pipeline are described in the notebook and in the Methods section of the accompanying manuscript. Run all cells in the notebook in sequential order.
The output from this pipeline should include:
- Split fast5 files for each analyte, saved to the same directory as the bulk raw fast5
- Example nanopore traces for each analyte, saved to nanopore_experiments/plots
- Map of good channels for each analyte, saved to nanopore_experiments/plots
- Capture metadata for each analyte, saved to user-defined output_dir
- Raw capture data for each analyte, saved to user-defined output_dir
- Filtered and classified capture metadata for each analyte, saved to user-defined output_dir
- Quantification of each analyte, saved to concentration
An example raw fast5 file is provided here (file size ~6 GB), corresponding to the experiment logged on the example spreadsheet.
The fully executed experiment notebook for this demo is provided at nanopore_experiments/experiment_20210118_FAP26604.ipynb. The expected runtime for this demo (from raw fast5 file to quantification results) is ~10 minutes. Expected results for both time-until-capture-based and frequency-based quantification are provided at concentration.
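For readers unfamiliar with the two quantification modes, the sketch below shows one plausible way to compute the underlying statistics from capture times in a single channel. It is not the pipeline's implementation. The function capture_statistics and both definitions are assumptions made for illustration: frequency is taken as captures per minute of observation, and time until capture as the time the pore spends free before each capture. The Methods section of the accompanying manuscript gives the actual definitions, and converting either statistic to a concentration requires a calibration that is not shown here.

```python
from __future__ import division


def capture_statistics(captures, window_start, window_end):
    """Summarize captures observed in one channel.

    captures: list of (start_sec, end_sec) tuples inside the window.
    window_start, window_end: observation window in seconds.

    Returns (captures_per_minute, mean_time_until_capture_sec).
    The mean is None when there are no captures.
    """
    duration_min = (window_end - window_start) / 60.0
    if duration_min <= 0:
        raise ValueError("observation window must have positive length")

    ordered = sorted(captures)
    frequency = len(ordered) / duration_min

    # Assumed definition: time from when the pore became free (window
    # start or end of the previous capture) to the next capture start.
    waits = []
    pore_free_since = window_start
    for start, end in ordered:
        waits.append(start - pore_free_since)
        pore_free_since = end

    mean_wait = sum(waits) / len(waits) if waits else None
    return frequency, mean_wait


# Made-up example: three captures in a two-minute window.
example_frequency, example_mean_wait = capture_statistics(
    [(10, 20), (50, 70), (100, 110)], window_start=0, window_end=120
)
```

In this example the waits are 10, 30, and 30 seconds, and the frequency is 1.5 captures per minute. Under these assumed definitions, a higher analyte concentration would be expected to raise the frequency and shorten the wait.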