This is the Hi-C mapping pipeline for the Ren Lab. The scripts are still under development, so please let us know about any bugs and how we could improve them!

Dependencies:
- BWA (v0.7.0 at minimum, preferably newer; provides the 'bwa mem' module)
- samtools
- juicebox.jar (to convert from BAM to Juicer format)
- straw (to extract contact matrices from Juicebox .hic files)
- GNU coreutils sort (v8.22 or newer; supports the --parallel option)
- snakemake (workflow management)
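Before installing, it can help to confirm that the command-line dependencies above are reachable. The following is a minimal sketch, not part of the pipeline. It assumes the tools are invoked as `bwa`, `samtools`, `sort` and `snakemake`, and it parses the first line of `sort --version` to check the v8.22 requirement. juicebox.jar and straw are not plain executables, so they are not checked. The names `check_dependencies` and `parse_sort_version` are hypothetical.

```python
import re
import shutil
import subprocess

# Hypothetical list of executables taken from the dependency list above;
# juicebox.jar and straw are skipped because they are not plain commands.
REQUIRED_TOOLS = ["bwa", "samtools", "sort", "snakemake"]
MIN_SORT_VERSION = (8, 22)


def parse_sort_version(version_text):
    """Return (major, minor) from `sort --version` output, or None if not GNU sort."""
    match = re.search(r"\(GNU coreutils\)\s+(\d+)\.(\d+)", version_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def check_dependencies():
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = [f"{tool} not found on PATH"
                for tool in REQUIRED_TOOLS if shutil.which(tool) is None]
    if shutil.which("sort") is not None:
        out = subprocess.run(["sort", "--version"],
                             capture_output=True, text=True).stdout
        version = parse_sort_version(out)
        if version is None or version < MIN_SORT_VERSION:
            first_line = out.splitlines()[0] if out else "unknown"
            problems.append(f"GNU sort >= 8.22 required, found: {first_line}")
    return problems


if __name__ == "__main__":
    for problem in check_dependencies():
        print(problem)
```

The sketch compares versions as integer tuples, so 8.9 correctly counts as older than 8.22. A plain string comparison would get that wrong.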
See INSTALL for installation instructions.
Usage:
- Remember to install the pipeline before running anything (see INSTALL).
- Create your project folder.
- Inside your project folder, create a folder named fastq and put all your .fastq files in it.
- Copy the snakemake.config.yaml file into your project folder and edit it to suit your needs.
- Run:
    (path-to-this-directory)/bin/run_hic_vanilla.sh -c snakemake.config.yaml -e your.email@ucsd.edu -s server
- If you only need to generate part of the output (for instance, the valid read pairs), run instead:
    (path-to-this-directory)/bin/run_hic_options.sh -c snakemake.config.yaml -s server -o [options]
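The folder layout described in the steps above can be checked before launching a run. This is a hedged sketch: it only checks for a `fastq` subfolder containing at least one file ending in `.fastq` and for a `snakemake.config.yaml` file in the project folder. It does not know what the run scripts themselves validate, and the function name `check_project_layout` is hypothetical.

```python
from pathlib import Path


def check_project_layout(project_dir, config_name="snakemake.config.yaml"):
    """Return a list of layout problems for a project folder; empty means OK.

    Assumes (from the usage steps) that reads live in <project>/fastq/*.fastq
    and that the edited config file sits at the top of the project folder.
    """
    project = Path(project_dir)
    problems = []

    fastq_dir = project / "fastq"
    if not fastq_dir.is_dir():
        problems.append(f"missing folder: {fastq_dir}")
    elif not any(p.name.endswith(".fastq") for p in fastq_dir.iterdir()):
        problems.append(f"no .fastq files in {fastq_dir}")

    config = project / config_name
    if not config.is_file():
        problems.append(f"missing config file: {config}")

    return problems
```

For example, running `print(check_project_layout("."))` from inside the project folder should print an empty list before you call run_hic_vanilla.sh.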
Pipeline steps:
- Map the reads from the fastq files separately for each read end
- Combine the two ends of each read pair; sort and merge
- Remove PCR duplicates
- Convert pairs.txt to Juicer format
- Convert the Juicer file to contact matrices
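Sorting before duplicate removal means that duplicate pairs end up next to each other. They can then be dropped in a single streaming pass. The sketch below illustrates that idea under an assumption: two pairs count as PCR duplicates when both ends map to the same chromosome and position. The pipeline's actual duplicate criterion is not stated here. The tuple layout and the name `remove_duplicates` are hypothetical.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical minimal read-pair representation: (read_name, chr1, pos1, chr2, pos2)
Pair = Tuple[str, str, int, str, int]


def remove_duplicates(sorted_pairs: Iterable[Pair]) -> Iterator[Pair]:
    """Yield the first pair of every run of pairs with identical coordinates.

    Assumes the input is already sorted by (chr1, pos1, chr2, pos2), so that
    duplicates are adjacent and only the previous key needs to be remembered.
    """
    previous_key = None
    for pair in sorted_pairs:
        key = pair[1:]  # coordinates only; the read name is ignored
        if key != previous_key:
            yield pair
            previous_key = key
```

Because only the previous key is kept in memory, this works on arbitrarily large sorted files. The memory-heavy part is the sort, which is why a parallel GNU sort is listed as a dependency.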
- Calling TADs from the directionality index (DI) requires a MATLAB license.
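The MATLAB step itself is not reproduced here. For reference, the directionality index as commonly defined (Dixon et al. 2012) compares the contacts of each bin with its upstream neighbours (A) and downstream neighbours (B) within a fixed window:

DI = ((B - A) / |B - A|) * ((A - E)^2 / E + (B - E)^2 / E), where E = (A + B) / 2.

The sketch below computes only this per-bin DI from a square, symmetric contact matrix. It does not segment the DI track into TADs. The window size in bins and the function name `directionality_index` are assumptions for illustration.

```python
def directionality_index(matrix, window):
    """Compute the DI for every bin of a square, symmetric contact matrix.

    matrix: list of lists of contact counts (n bins x n bins)
    window: number of bins upstream/downstream to sum (hypothetical parameter)
    """
    n = len(matrix)
    di = []
    for i in range(n):
        # A: contacts with bins upstream of i; B: contacts with bins downstream of i
        a = sum(matrix[i][j] for j in range(max(0, i - window), i))
        b = sum(matrix[i][j] for j in range(i + 1, min(n, i + window + 1)))
        e = (a + b) / 2  # expected count under no directional bias
        if a == b:
            # No bias (this also covers a bin with no contacts at all)
            di.append(0.0)
            continue
        sign = 1.0 if b > a else -1.0
        di.append(sign * ((a - e) ** 2 / e + (b - e) ** 2 / e))
    return di
```

Positive values mean a bin contacts mostly downstream bins, and negative values mean it contacts mostly upstream bins. TAD boundaries are then typically placed where the DI track switches from negative to positive.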
Uses the method from Nagano et al. 2017 and Olivares-Chauvet et al. 2016.
Implements the method from Rao et al. 2009.
Columns:
=======
1) Read name
2) R1 mapping flag
3) R1 chr
4) R1 position
5) R1 fragment
6) R2 mapping flag
7) R2 chr
8) R2 position
9) R2 fragment
10) R1 mapping quality
11) R2 mapping quality
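The sketch below shows how the 11 columns listed above map onto a typed record. It also shows how such a record could be written as a line in Juicer's "short" text format (str1 chr1 pos1 frag1 str2 chr2 pos2 frag2, with strand 0 meaning forward). It relies on four assumptions not stated in this README: columns are whitespace-separated, the mapping flag is a SAM FLAG (bit 0x10 = reverse strand), fragments are integer IDs, and the Juicer strand is written as 0 or 16. The class and function names are hypothetical. This is not the pipeline's own conversion code.

```python
from typing import NamedTuple

SAM_REVERSE_FLAG = 0x10  # SAM FLAG bit: read mapped to the reverse strand


class PairRecord(NamedTuple):
    """One line of the 11-column pairs file (hypothetical field names)."""
    read_name: str
    r1_flag: int
    r1_chr: str
    r1_pos: int
    r1_frag: int
    r2_flag: int
    r2_chr: str
    r2_pos: int
    r2_frag: int
    r1_mapq: int
    r2_mapq: int


def parse_pair_line(line: str) -> PairRecord:
    """Parse one whitespace-separated line into a PairRecord."""
    f = line.split()
    if len(f) != 11:
        raise ValueError(f"expected 11 columns, got {len(f)}: {line!r}")
    return PairRecord(f[0], int(f[1]), f[2], int(f[3]), int(f[4]),
                      int(f[5]), f[6], int(f[7]), int(f[8]),
                      int(f[9]), int(f[10]))


def to_juicer_short(rec: PairRecord) -> str:
    """Format a record as Juicer short format: str1 chr1 pos1 frag1 str2 chr2 pos2 frag2."""
    # Juicer treats 0 as forward and any other value as reverse; 16 is used here.
    s1 = 16 if rec.r1_flag & SAM_REVERSE_FLAG else 0
    s2 = 16 if rec.r2_flag & SAM_REVERSE_FLAG else 0
    return (f"{s1} {rec.r1_chr} {rec.r1_pos} {rec.r1_frag} "
            f"{s2} {rec.r2_chr} {rec.r2_pos} {rec.r2_frag}")
```

Note that the mapping-quality columns are dropped in the short format. If you need to filter on MAPQ, do so on the parsed record before converting it.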
- Shawn Yanxiao Zhang
- Initial pieces of code from Bin Li