Sox2AllelicAnalysis

Analysis files for Sox2 gene

This repository contains scripts used to create and run modified WASP (https://github.com/bmvdgeijn/WASP) pipeline to analyze PLAC-Seq data. Instead of regular bowtie2 mapping, we use mapping and preprocessing as implemented in MAPS (https://github.com/ijuric/MAPS).

Step1:

WASP checks if reads have 'is proper' cigar tag. This tag is set to 0 if two read ends map too far (useful for ChIP-seq data, but not useful for PLAC-seq data where reads can map to very distant areas). After cloning WASP from github, in mapping/rmdup_pe.py comment out this:

if not read.is_proper_pair:
    read_stats.discard_improper_pair += 1
continue

and save it as mapping/rmdup_pe_long_range.py

In mapping/filter_remapped_reads.py comment out this:

if not read.is_proper_pair:
    bad_reads.add(orig_name)
continue

and save it as mapping/filter_remapped_reads_long_range.py

Step 2:

Create .h5 files by running create_h5.py [DATASET] where [DATASET] is the name of your dataset. This is equivalent of doing step 1 in WASP pipeline, but also splits data by chromosome because otherwise analysis doesn't work.

Step 3:

Run MAPS. How to set up MAPS run files is explained here: https://github.com/ijuric/MAPS

Step 4:

Run make_WASP_runfiles.py script. This will create two .sh files. First .sh file (part1) will create some subdirectories in the folder where you run it and it will then perform steps 3 of WASP pipeline using appropriate .bam files from MAPS feather_output subdirectory in your MAPS folder

Step 5:

Run MAPS again using .fq.gz files that need to be remapped (this is equivalent of step 4, but using bwa mem and MAPS feather preprocessing instead of bowtie2). The easiest way to do this is to make a copy of your MAPS runfile (used in step 3) and modify the following lines:

maps=0
dataset_name="[DATASET_NAME].remap"
fastq_format=".fq.gz"
fastq_dir="[DATASET_WASP_ANALYSIS_DIR]/find_intersecting_snps_ALL/"

Where [DATASET_NAME] is the name of your dataset and [DATASET_WASP_ANALYSIS_DIR] is directory where you are performing your WASP mapping (The directory with .h5 files in Step 2). This directory already has find_intersecting_snps_ALL subdirectory in it (it was created by part1 script)

Step 6:

Run part2 script. This one performs steps 5,6 and 7 of WASP pipeline. It will create rmdup_feather_full/[DATASET_NAME].FINAL.bam file. This is .bam file that contains mapped reads.

Step 7:

This part extracts and labels allelic reads from /bam file in previous step. Run make_pair.py like this:

python make_pair.py -b rmdup_feather_full/[DATASET_NAME].FINAL.bam -v F123.vcf -o [DATASET_NAME].FINAL.feather_full.txt.gz

The result is a table with all pair-end reads. Those that are allelic will be labeled 0 or 1, depending if they're from CAST of 129S genomes

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
LICENSE		LICENSE
README.md		README.md
create_h5.py		create_h5.py
make_WASP_runfiles.py		make_WASP_runfiles.py
mm10.autosomal.chrom.sizes		mm10.autosomal.chrom.sizes
test.txt		test.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Sox2AllelicAnalysis

About

Releases

Packages

Languages

License

ijuric/Sox2AllelicAnalysis

Folders and files

Latest commit

History

Repository files navigation

Sox2AllelicAnalysis

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages