PIPEseq

A modular pipeline for VAMPseq

Pre-Configuration:

Python Dependencies (will be installed during Setup)

  • Python 3.11
  • CutAdapt 5

via pip:

    • Snakemake 8+
    • CountESS 0.0.83

Setup

  • Clone PIPEseq to your machine
  • Create a Conda environment from the YAML file: conda env create -f PIPEseq/conda_env.yaml (the default environment name is pipeseq_env)
  • Activate the Conda environment: conda activate pipeseq_env
  • Copy your bcl2fastq samplesheet and barcode-variant map into the user_input folder
  • Create an ini file from the CountESS GUI if necessary
  • Fill out at least base_sample_sheet, countess_sample_ini, and samplesheet_params in the user_variables.yaml file (see the User Variables section below)
  • Run the pipeline with bash run.sh
  • The pipeline produces a run directory named with either the chosen run name or a timestamp (e.g. TSC2L1 or 202409041630)
  • The run directory will contain:
    • a bcl2fastq_output folder containing unpaired FASTQs
    • a pear_output folder containing paired FASTQs
    • a countess_inis folder containing the ini file generated from the template, with the correct input and output file names
    • the samplesheet reformatted for bcl2fastq
    • three CSV files from CountESS
    • an empty text file called demux.txt, used by Snakemake
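
The run-directory naming described above can be sketched in Python. This is only an illustration: the function name resolve_run_name is hypothetical, and the timestamp format %Y%m%d%H%M is an assumption inferred from the example 202409041630 (year, month, day, hour, minute). The actual PIPEseq implementation may differ.

```python
from datetime import datetime
from typing import Optional


def resolve_run_name(run_name: Optional[str], now: Optional[datetime] = None) -> str:
    """Return the chosen run name, or a timestamp if none was given.

    Hypothetical helper: the format string is inferred from the README
    example 202409041630 and is not taken from the PIPEseq source.
    """
    if run_name:  # a non-empty, user-chosen name takes priority
        return run_name
    now = now or datetime.now()
    return now.strftime("%Y%m%d%H%M")


print(resolve_run_name("TSC2L1"))                            # TSC2L1
print(resolve_run_name(None, datetime(2024, 9, 4, 16, 30)))  # 202409041630
```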

The Pipeline

  • Rules:
    • clean_and_filter_samplesheet - modifies samplesheet to fit expected format for bcl2fastq
    • demux_and_pair - demultiplexes and pairs reads: runs bcl2fastq to convert CBCL files to unpaired .fastq.gz files, then PEAR to merge them into paired .fastq files
      • The bcl2fastq command on line 148 may need to be modified to include additional arguments (e.g. --barcode-mismatches 0 --minimum-trimmed-read-length 0 --mask-short-adapter-reads 0)
    • prep_fastqs_for_countess - trims FASTQ files (if necessary) and runs FastQC on all FASTQ files
    • run_countess_vampseq - creates counts and scores from the assembled FASTQ files
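
To illustrate how the optional bcl2fastq arguments mentioned above might be appended to a base command, here is a minimal Python sketch. The helper build_bcl2fastq_command and all paths are hypothetical placeholders, not PIPEseq code. The sketch only assembles and prints the command; it does not run bcl2fastq.

```python
import shlex
from typing import List, Optional


def build_bcl2fastq_command(
    runfolder_dir: str,
    output_dir: str,
    sample_sheet: str,
    extra_args: Optional[List[str]] = None,
) -> List[str]:
    """Assemble a bcl2fastq argument list (hypothetical helper, not PIPEseq code)."""
    cmd = [
        "bcl2fastq",
        "--runfolder-dir", runfolder_dir,
        "--output-dir", output_dir,
        "--sample-sheet", sample_sheet,
    ]
    # Optional flags are appended unchanged after the required ones.
    cmd.extend(extra_args or [])
    return cmd


# The extra arguments named in the README; all paths are placeholders.
extra = [
    "--barcode-mismatches", "0",
    "--minimum-trimmed-read-length", "0",
    "--mask-short-adapter-reads", "0",
]
cmd = build_bcl2fastq_command(
    "path/to/run_folder", "bcl2fastq_output", "samplesheet.csv", extra
)
print(shlex.join(cmd))
```

Building the command as a list, rather than a single string, keeps each argument separate and avoids shell-quoting problems if it is later passed to subprocess.run.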

User Variables

The user_variables.yaml file can be modified with the following parameters (examples are in example_yamls/):

  • run_name - Name of the directory where all run output will be stored; if omitted, the timestamp will be used
  • base_sample_sheet - Name of the samplesheet CSV file
  • countess_sample_ini - Name of the ini file for CountESS (technically not needed if only running step 1)
  • barcode_variant_map - PacBio variant map
  • input_data_path - Path to either the sequence directory or a directory of FASTQ files
  • run_type - Run only demux and pair, only counting and scoring, or the full pipeline
  • sample_filter - Only run a subset of data from the samplesheet, if desired
  • count_cutoff - Cutoff used for scoring
  • samplesheet_params - The names of samplesheet columns, if different from Illumina defaults
    • i7_index_id_column_name
    • i7_index_sequence_column_name - index
    • i5_index_id_column_name
    • i5_index_sequence_column_name - index2
  • cutadapt_trim_boolean - Whether or not trimming is necessary
  • cutadapt_params (OPTIONAL, including all below) - For use if trimming with CutAdapt
    • error
    • f_amp_min_overlap
    • r_amp_min_overlap
    • f_amp_primer_column_name - upstream flanking sequence
    • r_amp_primer_column_name - downstream flanking sequence
    • target_length_column_name

Parameters that must be filled in by the user include at least base_sample_sheet, countess_sample_ini, and samplesheet_params.
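
Since the Setup section asks for at least base_sample_sheet, countess_sample_ini, and samplesheet_params to be filled in, a quick pre-flight check can catch an incomplete configuration before the pipeline starts. The sketch below is hypothetical and not part of PIPEseq: the helper name missing_user_variables and the example values are assumptions. To stay free of dependencies it uses a plain dictionary in place of the parsed user_variables.yaml.

```python
from typing import Any, Dict, List

# Keys the Setup section says must be filled in at minimum.
REQUIRED_KEYS = ("base_sample_sheet", "countess_sample_ini", "samplesheet_params")


def missing_user_variables(config: Dict[str, Any]) -> List[str]:
    """Return required keys that are absent or empty (hypothetical helper)."""
    return [key for key in REQUIRED_KEYS if not config.get(key)]


# Placeholder values for illustration only; in practice this dictionary
# would be parsed from user_variables.yaml.
example_config = {
    "run_name": "TSC2L1",
    "base_sample_sheet": "samplesheet.csv",
    "countess_sample_ini": "",  # left blank to show how a gap is reported
    "samplesheet_params": {
        "i7_index_sequence_column_name": "index",
        "i5_index_sequence_column_name": "index2",
    },
}

print(missing_user_variables(example_config))  # ['countess_sample_ini']
```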

Expected Folder Structure

  • PIPEseq/
    • example_yamls/
    • **<run_name>/
    • user_input/
      • *<samplesheet_file>.csv
      • *<countess_file>.ini
    • conda_env.yaml
    • README.md
    • run.sh
    • Snakefile
    • &user_variables.yaml

*=user file (or folder) **=autogenerated &=modify parameters