processing, analysis, and visualization of data from Repair-seq screens

Installation

Using pip

pip install git+https://github.com/jeffhussmann/knock-knock.git
pip install git+https://github.com/jeffhussmann/repair-seq.git

Analyzing publicly-available data

This tutorial walks through processing data from Hussmann, ..., Adamson, Cell (2021).

Setting up metadata and reference genomes

First, create a project directory BASE_DIR that will hold all input data, reference sequences, and analysis output. Then run:

$ repair-seq SRA initial_setup BASE_DIR

This will move annotations of the screen vector and CRISPRi sgRNA libraries into BASE_DIR, download reference genomes, and build corresponding alignment indices.
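The setup step assumes the repair-seq command from the pip install above is available. The following is a minimal sketch, not part of repair-seq: setup_project is a hypothetical helper name. It checks that the command is on PATH, creates BASE_DIR if needed, and then runs only the documented setup command.

```bash
# Hypothetical wrapper (not part of repair-seq): confirm that the repair-seq
# command installed by pip is on PATH, create BASE_DIR if needed, then run
# the documented setup step.
setup_project() {
    local base_dir="$1"
    if ! command -v repair-seq >/dev/null 2>&1; then
        echo "repair-seq not found on PATH; install it with pip first" >&2
        return 1
    fi
    mkdir -p "$base_dir"
    repair-seq SRA initial_setup "$base_dir"
}

# Usage (the directory is a placeholder):
# setup_project "$HOME/repair-seq-project"
```

Checking for the command first means a missing installation fails with a clear message before any directories are created.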

Downloading sequencing data from SRA

To download raw sequencing data for an individual screen, run:

$ repair-seq SRA download BASE_DIR SCREEN_NAME

Experimental details for each screen are listed in Table S5. Valid values for SCREEN_NAME are the entries in the Screen_Name column of that table.
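To fetch several screens in one pass, the documented download command can be wrapped in a loop. This is a minimal sketch. download_screens is a hypothetical helper, not part of repair-seq. The screen names in the usage line are placeholders to be replaced with values from the Screen_Name column of Table S5.

```bash
# Hypothetical helper (not part of repair-seq): download several screens in
# turn with the documented command, stopping at the first failure.
download_screens() {
    local base_dir="$1"
    shift
    local screen
    for screen in "$@"; do
        echo "Downloading ${screen}" >&2
        repair-seq SRA download "$base_dir" "$screen" || return 1
    done
}

# Usage (placeholders; substitute Screen_Name values from Table S5):
# download_screens "$BASE_DIR" SCREEN_NAME_1 SCREEN_NAME_2
```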

Each spot in these screens was sequenced with four reads: an 8 nt sample index read, a 12 nt UMI read, a 45 nt read that identifies the CRISPRi sgRNA, and a 258 nt read that captures the repair outcome sequence.

For compatibility with SRA, the uploaded data consist of only R1 and R2 files: R1 is the read that identifies the CRISPRi sgRNA, and R2 is the read that captures the repair outcome. These files have been demultiplexed into individual screens based on the sample index read, and each spot's UMI sequence and UMI quality scores have been appended to the query names of its R1 and R2 reads, separated by underscores:

$ zcat K562_SpCas9_target-1_none_AX227_1_R1.fastq.gz | head -n 4
@01:01101:001362:001000_GTCAGTACAAGT_FFFFFFFFFFF:
NATCCCTTGGAGAACCACCTTGTTGGTTTCTCCGGCAGCAGAAAG
+
#FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF

$ zcat K562_SpCas9_target-1_none_AX227_1_R2.fastq.gz | head -n 4
@01:01101:001362:001000_GTCAGTACAAGT_FFFFFFFFFFF:
NGCCGCTGCACGTAGCATGCAACAAAGGAACCTTTAATAGAAATTGGACAGCAAGAAAGCGAGCTTAGTGATACTTGTGGGCCAGGGCAT...
+
#FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF...
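Because the UMI and its quality string are stored in the query name, downloaded files can be inspected with standard tools. The helpers below are hypothetical sketches, not part of repair-seq. They assume the NAME_UMI_QUALITY layout shown above with no underscores inside the read name or quality string, which holds for the example records. They use gzip -dc rather than zcat because zcat on macOS expects .Z files.

```bash
# Print read name, UMI sequence, and UMI quality string, one record per line.
umi_table() {
    gzip -dc "$1" | awk -F'_' 'NR % 4 == 1 {
        sub(/^@/, "", $1)    # drop the leading @ of the FASTQ header
        print $1 "\t" $2 "\t" $3
    }'
}

# Print how many records have different query names in R1 and R2
# (0 means the two files list the same spots in the same order).
count_name_mismatches() {
    paste <(gzip -dc "$1" | awk 'NR % 4 == 1') <(gzip -dc "$2" | awk 'NR % 4 == 1') |
        awk -F'\t' '$1 != $2 { n++ } END { print n + 0 }'
}

# Tally read lengths (sequence lines are every 4th line, starting at line 2),
# for comparison with the read lengths listed above.
read_length_counts() {
    gzip -dc "$1" | awk 'NR % 4 == 2 { counts[length($0)]++ }
        END { for (len in counts) print len, counts[len] }' | sort -n
}
```

For example, the first line printed by umi_table K562_SpCas9_target-1_none_AX227_1_R1.fastq.gz should contain the name 01:01101:001362:001000, the UMI GTCAGTACAAGT, and the quality string FFFFFFFFFFF: from the record shown above. count_name_mismatches relies on bash process substitution, so run it in bash rather than sh.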

Processing sequencing data

To process data for an individual screen after downloading it, run:

$ repair-seq SRA process BASE_DIR SCREEN_NAME

This will:

demultiplex reads based on the CRISPRi sgRNA identities in R1 reads,
collapse and error-correct the resulting groups of R2 sequences based on their UMI sequences,
align and categorize these error-corrected sequences, and
count the CRISPRi-sgRNA-specific frequencies of each identified repair outcome.
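Before running the full pipeline, a crude preview of the first two grouping steps can be taken with standard tools. This sketch is an illustration only. preview_groups is a hypothetical helper, and it groups by exact R1 sequence as a stand-in for sgRNA identity. It does not reproduce repair-seq's sgRNA assignment or its UMI error correction, so reads with sequencing errors land in separate groups.

```bash
# Hypothetical helper (not part of repair-seq): count reads per exact R1
# sequence and per (R1 sequence, UMI) pair, showing the largest groups.
# Exact matching only; no sgRNA lookup and no UMI error correction.
preview_groups() {
    local r1="$1" top="${2:-10}"
    echo "Most common R1 sequences (reads per sequence):"
    gzip -dc "$r1" | awk 'NR % 4 == 2' | sort | uniq -c | sort -k1,1nr | head -n "$top"
    echo "Largest (R1 sequence, UMI) groups (reads per group):"
    # The UMI is the second underscore-separated field of the header line.
    gzip -dc "$r1" | awk -F'_' 'NR % 4 == 1 { umi = $2 } NR % 4 == 2 { print $0 "\t" umi }' |
        sort | uniq -c | sort -k1,1nr | head -n "$top"
}

# Usage:
# preview_groups K562_SpCas9_target-1_none_AX227_1_R1.fastq.gz 5
```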
