processing, analysis, and visualization of data from Repair-seq screens

fkang-pu/repair-seq

Installation

Using pip

pip install git+https://github.com/jeffhussmann/knock-knock.git
pip install git+https://github.com/jeffhussmann/repair-seq.git

Analyzing publicly-available data

This tutorial will walk through processing of data from Hussmann, ..., Adamson, Cell (2021).

Setting up metadata and reference genomes

First, create a project directory BASE_DIR that will hold all input data, reference sequences, and analysis output. Then run:

$ repair-seq SRA initial_setup BASE_DIR

This will move annotations of the screen vector and CRISPRi sgRNA libraries into BASE_DIR, download reference genomes, and build corresponding alignment indices.

Downloading sequencing data from SRA

To download raw sequencing data for an individual screen, run:

$ repair-seq SRA download BASE_DIR SCREEN_NAME

Experimental details for each screen are listed in Table S5. Valid options for SCREEN_NAME are values from the Screen_Name column in this table.
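
When several screens are needed, the download command above can be generated once per screen. The following is a minimal sketch, not part of repair-seq itself: the helper names build_download_command and download_screens are hypothetical, and the screen names passed in are assumed to have been copied by hand from the Screen_Name column of Table S5. It only wraps the `repair-seq SRA download BASE_DIR SCREEN_NAME` invocation shown above.

```python
import subprocess
from pathlib import Path


def build_download_command(base_dir, screen_name):
    """Return the argv list for `repair-seq SRA download BASE_DIR SCREEN_NAME`."""
    return ["repair-seq", "SRA", "download", str(Path(base_dir)), screen_name]


def download_screens(base_dir, screen_names, dry_run=True):
    """Build (and optionally run) one download command per screen.

    With dry_run=True the commands are only printed, which makes it easy to
    check them before starting any large downloads.
    """
    commands = [build_download_command(base_dir, name) for name in screen_names]
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            # check=True stops at the first failing download.
            subprocess.run(command, check=True)
    return commands
```

Calling download_screens("BASE_DIR", names) with the default dry_run=True prints the commands without executing them; pass dry_run=False to run them in sequence.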

Note that four sequencing reads per spot were used in these screens: an 8 nt sample index read, a 12 nt UMI read, a 45 nt read that identifies the CRISPRi sgRNA, and a 258 nt read that identifies the repair outcome sequence:

[Figure: sequencing read layout]

For compatibility with SRA, the data on SRA consists of only R1 and R2 files. This data has been demultiplexed into individual screens based on the sample index read, and the UMI sequence and quality scores for each spot have been appended onto the query names of the R1 and R2 reads, separated by underscores:

$ zcat K562_SpCas9_target-1_none_AX227_1_R1.fastq.gz | head -n 4
@01:01101:001362:001000_GTCAGTACAAGT_FFFFFFFFFFF:
NATCCCTTGGAGAACCACCTTGTTGGTTTCTCCGGCAGCAGAAAG
+
#FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF

$ zcat K562_SpCas9_target-1_none_AX227_1_R2.fastq.gz | head -n 4
@01:01101:001362:001000_GTCAGTACAAGT_FFFFFFFFFFF:
NGCCGCTGCACGTAGCATGCAACAAAGGAACCTTTAATAGAAATTGGACAGCAAGAAAGCGAGCTTAGTGATACTTGTGGGCCAGGGCAT...
+
#FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF...
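
The query-name layout shown above can be unpacked with a few lines of Python. This is an illustrative sketch rather than code from repair-seq: the function names split_query_name and read_umis are hypothetical, and the parser assumes that the spot identifier itself contains no underscores, as in the example records.

```python
import gzip
from itertools import islice


def split_query_name(header):
    """Split '@<spot>_<UMI sequence>_<UMI qualities>' into its three fields."""
    name = header.strip()
    if name.startswith("@"):
        name = name[1:]
    # Assumption: the spot identifier has no underscores, so the first two
    # underscores separate the fields; anything after them is quality data.
    spot, umi_seq, umi_qual = name.split("_", 2)
    if len(umi_seq) != len(umi_qual):
        raise ValueError(f"UMI sequence/quality length mismatch in {header!r}")
    return spot, umi_seq, umi_qual


def read_umis(fastq_gz_path, limit=None):
    """Yield (spot, UMI sequence, UMI qualities) for records in a gzipped FASTQ."""
    with gzip.open(fastq_gz_path, "rt") as handle:
        # Each FASTQ record spans four lines; the first is the header.
        headers = islice(handle, 0, None, 4)
        for header in islice(headers, limit):
            yield split_query_name(header)
```

Because the same query name is attached to the R1 and R2 reads of a spot, either file can be used to recover the UMI for that spot.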

Processing sequencing data

To process data for an individual screen after downloading it, run:

$ repair-seq SRA process BASE_DIR SCREEN_NAME

This will:

  • demultiplex reads based on the CRISPRi sgRNA identities in R1 reads,
  • collapse and error-correct resulting groups of R2 sequences based on their UMI sequences,
  • align and categorize these error-corrected sequences,
  • count the CRISPRi-sgRNA-specific frequencies of each identified repair outcome.
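
To make the grouping, collapsing, and counting steps above concrete, here is a deliberately simplified toy sketch. It is not repair-seq's implementation: the function names are hypothetical, reads are assumed to arrive as (sgRNA, UMI, R2 sequence) tuples with the sgRNA already identified, error correction is reduced to a per-position majority vote over equal-length sequences, and the alignment and categorization step is omitted entirely (raw consensus sequences stand in for repair outcomes).

```python
from collections import Counter, defaultdict


def consensus(sequences):
    """Per-position majority vote over sequences assumed to be of equal length."""
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*sequences)
    )


def collapse_by_umi(reads):
    """Group reads by sgRNA and UMI, collapse each group, and count outcomes.

    reads: iterable of (sgRNA, UMI, R2 sequence) tuples.
    Returns {sgRNA: Counter mapping consensus sequence -> number of UMIs}.
    """
    # Steps 1 and 2: bucket R2 sequences by sgRNA, then by UMI within each sgRNA.
    groups = defaultdict(list)
    for guide, umi, sequence in reads:
        groups[(guide, umi)].append(sequence)

    # Step 4 (step 3, alignment and categorization, is skipped in this toy):
    # each UMI group contributes one count to its consensus sequence.
    counts = defaultdict(Counter)
    for (guide, _umi), sequences in groups.items():
        counts[guide][consensus(sequences)] += 1
    return counts
```

In this toy version each UMI contributes a single count regardless of how many reads share it, which is the point of collapsing on UMIs before counting sgRNA-specific outcome frequencies.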
