An input JSON file specifies all input parameters and metadata needed to run the pipeline. Items 1) through 5) are mandatory. Item 6) is optional: if it is not defined, the pipeline uses default values.
- Mandatory
  - Input FASTQ file pairs.
  - Reference genome bwa index.
  - Reference genome chromosome sizes.
  - Restriction site locations in the reference genome sequence.
  - Name of the restriction enzyme.
- Optional
  - Pipeline parameters.
  - Boolean flag no_call_loops to skip loop calling using HiCCUPS. To set this flag, add the following parameter to the input JSON:
"hic.no_call_loops": true
We provide three template JSON files: two for processing a single library with one or two sequencing runs, and one for processing multiple libraries.
- template for a single sequencing run from a single library
- template for two sequencing runs from a single library
- template for two libraries, each having a single sequencing run
Let us take a closer look at the following template JSON. JSON files do not allow comments. The comments below are included only to explain each parameter, so remove them from any file you actually pass to the pipeline.
{
////////// 1) Input FASTQ files //////////
"hic.fastq": [[[
"test/test_data/merged_read1.fastq.gz",
"test/test_data/merged_read2.fastq.gz"
]]],
////////// 2) Reference genome chromosome sizes //////////
"hic.chrsz": "test/test_data/ce10_selected.chrom.sizes.tsv",
////////// 3) Restriction site locations in the reference genome sequence //////////
"hic.restriction_sites": "test/test_data/ce10_selected_MboI.txt",
////////// 4) Reference genome index //////////
"hic.reference_index": "test/test_data/ce10_selected.tar.gz",
////////// 5) Name of the restriction enzyme //////////
"hic.restriction_enzyme": "MboI"
}
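Because the template above contains comment lines, it cannot be parsed as JSON until they are removed. The following Python sketch is one way to strip whole-line `//` comments and check that the five mandatory keys shown in the template are present. The helper names `load_commented_template` and `missing_mandatory_keys` are hypothetical and not part of the pipeline, and the FASTQ file names in the usage example are placeholders.

```python
import json

# Mandatory keys as they appear in the template above.
MANDATORY_KEYS = (
    "hic.fastq",
    "hic.chrsz",
    "hic.restriction_sites",
    "hic.reference_index",
    "hic.restriction_enzyme",
)


def load_commented_template(text: str) -> dict:
    """Parse a template after dropping lines that consist only of a // comment."""
    kept = [line for line in text.splitlines() if not line.strip().startswith("//")]
    return json.loads("\n".join(kept))


def missing_mandatory_keys(config: dict) -> list:
    """Return the mandatory keys that are absent from the parsed input JSON."""
    return [key for key in MANDATORY_KEYS if key not in config]


# Usage example with a deliberately incomplete template (placeholder file names).
template = """{
////////// 1) Input FASTQ files //////////
"hic.fastq": [[["r1.fastq.gz", "r2.fastq.gz"]]],
////////// 5) Name of the restriction enzyme //////////
"hic.restriction_enzyme": "MboI"
}"""

config = load_commented_template(template)
missing = missing_mandatory_keys(config)
print("Missing keys:", missing)
```

Only lines that start with `//` are dropped, so values that happen to contain `//` (for example URLs) are left untouched.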
To run the HiC pipeline, you need to specify a bwa index file prepared from a reference genome sequence. We recommend using reference files from the ENCODE portal to ensure that analysis results are comparable.
reference file description | assembly | ENCODE portal link |
---|---|---|
bwa index | hg19 | link |
genome fasta | hg19 | link |
chromosome sizes | hg19 | link |
bwa index | GRCh38 | link |
genome fasta | GRCh38 | link |
chromosome sizes | GRCh38 | link |
You will also need a restriction map file appropriate for the restriction enzyme and assembly. MboI and DpnII share the same restriction map because they have the same recognition site.
restriction enzymes | assembly | ENCODE portal link |
---|---|---|
DpnII, MboI | GRCh38 | link |
HindIII | GRCh38 | link |
DpnII, MboI | hg19 | link |
HindIII | hg19 | link |
Alternatively, you can create your own restriction map file using the generate_site_positions.py script from the Juicer pipeline. Make sure that your restriction map has the following format, with one line per chromosome: the chromosome name followed by the restriction site positions.
1 11160 12411 12461 ... 249250621
2 11514 11874 12160 ... 243199373
3 60138 60662 60788 ... 198022430
Other formats can cause problems in the HiCCUPS step of the pipeline.
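As a quick sanity check before running the pipeline, a restriction map can be tested against the layout shown above. The sketch below assumes that each line is a chromosome name followed by whitespace-separated integer positions in strictly ascending order. The ordering requirement is inferred from the example lines and is not a documented rule, and `check_restriction_map_line` is a hypothetical helper, not part of the pipeline or of Juicer.

```python
def check_restriction_map_line(line: str) -> bool:
    """Return True if the line looks like '<chromosome> <pos1> <pos2> ...'.

    Assumption: positions are non-negative integers in strictly ascending order.
    """
    fields = line.split()
    if len(fields) < 2:
        # Need a chromosome name and at least one position.
        return False
    positions = fields[1:]
    if not all(p.isdigit() for p in positions):
        return False
    values = [int(p) for p in positions]
    return all(a < b for a, b in zip(values, values[1:]))


# Usage example: the first line follows the expected layout, the others do not.
good = check_restriction_map_line("1 11160 12411 12461 249250621")
comma_separated = check_restriction_map_line("1 11160,12411,12461")
unsorted = check_restriction_map_line("2 11874 11514 12160")
print(good, comma_separated, unsorted)
```

To check a whole file, the same function can be applied to every non-empty line, reporting the first line number for which it returns `False`.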