nchernia edited this page Feb 10, 2017 · 21 revisions
# What is Juicer? #
Juicer is a one-click pipeline for processing terabase-scale Hi-C datasets. Using Juicer, you can
- Go from raw fastq files to Hi-C maps binned at many resolutions
- Automatically annotate loops and contact domains with the Juicer tools
- Run the pipeline in the cloud, on LSF, Univa, or SLURM, or on a single CPU
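"Binned at many resolutions" means assigning both genomic positions of every contact to fixed-size bins and counting contacts per bin pair, once for each bin size. The Python below is a minimal sketch of that idea only, not Juicer's implementation. The contact tuples and the list of resolutions are made-up illustrative values.

```python
from collections import Counter


def bin_contacts(contacts, resolution):
    """Count contacts per (chr1, bin1, chr2, bin2) cell at one bin size (bp)."""
    counts = Counter()
    for chr1, pos1, chr2, pos2 in contacts:
        b1, b2 = pos1 // resolution, pos2 // resolution
        # Store each pair in a canonical order so (A, B) and (B, A) share a cell.
        if (chr1, b1) <= (chr2, b2):
            key = (chr1, b1, chr2, b2)
        else:
            key = (chr2, b2, chr1, b1)
        counts[key] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical contacts: (chr1, pos1, chr2, pos2).
    contacts = [
        ("chr1", 1_200, "chr1", 48_000),
        ("chr1", 48_500, "chr1", 1_900),
        ("chr1", 3_000, "chr2", 10_000),
    ]
    # The same contacts binned at coarse and fine resolutions.
    for res in (1_000_000, 5_000, 1_000):
        print(res, dict(bin_contacts(contacts, res)))
```

At 1 Mb the two intrachromosomal contacts fall into the same cell. At 1 kb they still do, because they are mirror images of each other. The same input produces one matrix per resolution.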
Juicer creates hic files from raw (unaligned) reads derived from a Hi-C experiment.
Juicer Tools Pre creates hic files from aligned Hi-C reads (i.e., lists of Hi-C contacts).
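A "list of Hi-C contacts" is a text file with one read pair per line. The sketch below assumes the eight-column "short" layout `<str1> <chr1> <pos1> <frag1> <str2> <chr2> <pos2> <frag2>`, where a strand of 0 means forward and any other value means reverse. That layout is our reading of the Juicer Tools Pre documentation, so confirm it there before preparing real input. The parser and the `Contact` class are hypothetical helpers for illustration, not part of Juicer Tools.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contact:
    # Assumed short-format fields; strand 0 = forward, anything else = reverse.
    str1: int
    chr1: str
    pos1: int
    frag1: int
    str2: int
    chr2: str
    pos2: int
    frag2: int


def parse_short_line(line: str) -> Contact:
    """Parse one whitespace-separated line in the assumed 8-column layout."""
    fields = line.split()
    if len(fields) != 8:
        raise ValueError(f"expected 8 fields, got {len(fields)}: {line!r}")
    s1, c1, p1, f1, s2, c2, p2, f2 = fields
    return Contact(int(s1), c1, int(p1), int(f1), int(s2), c2, int(p2), int(f2))


if __name__ == "__main__":
    contact = parse_short_line("0 chr1 10000 5 16 chr1 250000 120")
    print(contact.chr1, contact.pos1, contact.chr2, contact.pos2)
```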
- Choose your cluster system or single CPU. Juicer is currently available in the cloud on AWS, on LSF, Univa, or SLURM, or on a single CPU.
- Be sure you know how to load the required software on your system; module names may differ slightly between cluster systems.
- Log in to your cluster.
- Install the appropriate Juicer scripts for your system in a directory; we will assume this directory is `/home/user/juicedir`.
- Under `/home/user/juicedir`, there should be a folder `references` that contains the reference fasta file for your genome and the BWA index files. You can soft-link if necessary, or otherwise download the fasta files from UCSC and run `bwa index` on the fasta file.
- Under `/home/user/juicedir`, you should also create a folder `restriction_sites`. This should contain your restriction site file. You can create this file using the `generate_site_positions.py` Python script, or download already created ones from the Juicer AWS mirror.
- [Optional, only for deep maps] Create the `bedfile` folder.
- Create a top directory for your experiment (e.g. `mkdir -p /custom/filepath/MyHIC`).
- Download the [test data]. Create a `fastq` directory under the top directory (e.g. `cd /custom/filepath/MyHIC; mkdir fastq`).
- Soft-link or copy your fastq files (zipped or unzipped) to that directory.
- Type `screen` so that the session survives if your connection drops, then launch Juicer with `/local/path/scripts/juicer.sh [options]`, where `/local/path` is the folder containing the `scripts` folder included with this distribution. The most important options are `-g <genomeID>` and `-s <restriction_site>`. The files will be split if necessary and Juicer will launch.
- Check on the status of your jobs with the appropriate command for your cluster: `bjobs` for LSF and AWS, `squeue` for SLURM, `qstat` for Univa. The single CPU script will run until it finishes or exits.
- If there are no jobs left, type `cat debug/finalcheck*`; you should see a "Pipeline successfully completed" message.
- Results are available in the `aligned` directory. The Hi-C maps are in `inter.hic` (for MAPQ > 0) and `inter_30.hic` (for MAPQ >= 30). The Hi-C maps can be loaded in Juicebox and explored. They can also be used for automatic feature annotation and to extract matrices at specific resolutions.
- These results also include automatic feature annotation. The output files include a genome-wide annotation of loops (identified using the HiCCUPS algorithm) and, whenever possible, the CTCF motifs that anchor them. They also include a genome-wide annotation of contact domains (identified using the Arrowhead algorithm). The formats of these files are described in the Juicebox tutorial online; both files can be loaded into Juicebox as 2D annotations.
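The `restriction_sites` step in the list above relies on a file of cut-site positions. The sketch below shows the general idea of generating such a file, but it is **not** the `generate_site_positions.py` script. It rests on an unverified assumption about the file layout: one line per chromosome, containing the chromosome name, the 1-based start position of every occurrence of the site, and finally the chromosome length. Compare its output against a file from the Juicer AWS mirror before relying on it. It also does not handle degenerate sites such as `GANTC`.

```python
from __future__ import annotations

import re


def read_fasta(path):
    """Yield (name, sequence) pairs from a plain-text FASTA file."""
    name, chunks = None, []
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line.startswith(">"):
                if name is not None:
                    yield name, "".join(chunks)
                name, chunks = line[1:].split()[0], []
            elif line:
                chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def site_positions(seq: str, site: str) -> list[int]:
    """1-based start positions of every (possibly overlapping) match of site."""
    # A zero-width lookahead lets matches overlap; upper-casing handles soft-masking.
    pattern = re.compile(f"(?={re.escape(site.upper())})")
    return [m.start() + 1 for m in pattern.finditer(seq.upper())]


def site_file_line(chrom: str, seq: str, site: str) -> str:
    """One line in the ASSUMED layout: name, positions..., chromosome length."""
    positions = site_positions(seq, site)
    return " ".join([chrom, *map(str, positions), str(len(seq))])


if __name__ == "__main__":
    # Toy sequence with two GATC (DpnII) sites.
    print(site_file_line("chrTest", "aaGATCccGATCt", "GATC"))
```

For a real genome you would loop over `read_fasta("genome.fa")` (a hypothetical path) and write one `site_file_line` per chromosome.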
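The two maps in the `aligned` directory differ only in the mapping-quality cutoff: `inter.hic` uses MAPQ > 0 and `inter_30.hic` uses MAPQ >= 30. The short Python sketch below illustrates that split. It assumes that a contact is kept only when **both** of its reads meet the cutoff, and that MAPQ values are integers, so "> 0" is the same as ">= 1". The contact tuples are hypothetical.

```python
def passes_mapq(mapq1: int, mapq2: int, threshold: int) -> bool:
    # Assumption: both reads of the pair must reach the threshold.
    return min(mapq1, mapq2) >= threshold


def split_by_mapq(contacts):
    """Return (contacts kept for inter.hic, contacts kept for inter_30.hic)."""
    inter = [c for c in contacts if passes_mapq(c[0], c[1], 1)]      # MAPQ > 0
    inter_30 = [c for c in contacts if passes_mapq(c[0], c[1], 30)]  # MAPQ >= 30
    return inter, inter_30


if __name__ == "__main__":
    # Hypothetical (mapq1, mapq2) pairs, one per contact.
    pairs = [(0, 60), (15, 42), (30, 60), (60, 60)]
    inter, inter_30 = split_by_mapq(pairs)
    print(len(inter), len(inter_30))
```

Every contact in the stricter map is also in the looser one, which is why `inter_30.hic` is the smaller, higher-confidence map.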
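The `cat debug/finalcheck*` check from the list above can also be scripted, for example in a wrapper that polls for completion. The Python below is a minimal sketch. It assumes it is pointed at the experiment's top directory (such as the example `/custom/filepath/MyHIC`) and that the success message contains the exact string "Pipeline successfully completed". `pipeline_completed` is a hypothetical helper, not part of Juicer.

```python
import glob
import os
import sys


def pipeline_completed(top_dir: str) -> bool:
    """True if any debug/finalcheck* file under top_dir reports success."""
    for path in glob.glob(os.path.join(top_dir, "debug", "finalcheck*")):
        with open(path, errors="replace") as fh:
            if "Pipeline successfully completed" in fh.read():
                return True
    return False


if __name__ == "__main__":
    # Usage: python check_juicer.py /custom/filepath/MyHIC
    top = sys.argv[1] if len(sys.argv) > 1 else "."
    print("completed" if pipeline_completed(top) else "not completed (yet)")
```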