Input JSON

An input JSON file contains all input parameters and metadata needed to run the pipeline. The five items listed under Mandatory must be defined. The items listed under Optional can be omitted, in which case the pipeline uses default values.

  • Mandatory
  1. Input FASTQ file pairs.
  2. Reference genome chromosome sizes.
  3. Restriction site locations in the reference genome sequence.
  4. Reference genome bwa index.
  5. Name of the restriction enzyme.
  • Optional
  1. Pipeline parameters.
  2. Boolean flag no_call_loops to skip loop calling with HiCCUPS. To set this flag, add the following parameter to the input JSON: "hic.no_call_loops": true
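If you generate input files from a script rather than editing a template by hand, a minimal sketch in Python could look like this. The key names come from the template on this page; `build_input()` and `missing_keys()` are hypothetical helpers, not part of the pipeline. The optional `hic.no_call_loops` key is only written when requested, so that otherwise the pipeline falls back to its own default.

```python
import json

# Mandatory keys, as used in the template JSON on this page.
MANDATORY_KEYS = (
    "hic.fastq",
    "hic.reference_index",
    "hic.chrsz",
    "hic.restriction_sites",
    "hic.restriction_enzyme",
)


def build_input(fastq, reference_index, chrsz, restriction_sites,
                restriction_enzyme, no_call_loops=False):
    """Return an input dict holding the five mandatory items.

    hic.no_call_loops is added only when True; leaving it out lets the
    pipeline use its default behavior.
    """
    inputs = {
        "hic.fastq": fastq,
        "hic.reference_index": reference_index,
        "hic.chrsz": chrsz,
        "hic.restriction_sites": restriction_sites,
        "hic.restriction_enzyme": restriction_enzyme,
    }
    if no_call_loops:
        inputs["hic.no_call_loops"] = True
    return inputs


def missing_keys(inputs):
    """List the mandatory keys that are absent from an input dict."""
    return [key for key in MANDATORY_KEYS if key not in inputs]


inputs = build_input(
    fastq=[[["test/test_data/merged_read1.fastq.gz",
             "test/test_data/merged_read2.fastq.gz"]]],
    reference_index="test/test_data/ce10_selected.tar.gz",
    chrsz="test/test_data/ce10_selected.chrom.sizes.tsv",
    restriction_sites="test/test_data/ce10_selected_MboI.txt",
    restriction_enzyme="MboI",
    no_call_loops=True,
)
print(json.dumps(inputs, indent=4))  # comment-free, valid JSON
```

Because the output of `json.dumps` contains no comments, it can be saved directly as an input file.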

Templates

We provide three template JSON files, covering a single library with one or more sequencing runs as well as multiple libraries:

  • template for a single sequencing run from a single library
  • template for two sequencing runs from a single library
  • template for two libraries, each having a single sequencing run
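A minimal sketch of how `hic.fastq` might be nested in each of the three cases. It assumes that the three list levels in the template are, from outermost to innermost, libraries, sequencing runs within a library, and the R1/R2 pair of each run. The file names are hypothetical placeholders, and `describe()` is a hypothetical helper, not part of the pipeline.

```python
# Assumed nesting of "hic.fastq": libraries -> sequencing runs -> [R1, R2].
# All file names are hypothetical placeholders.
single_run = [[["lib1_run1_R1.fastq.gz", "lib1_run1_R2.fastq.gz"]]]

two_runs_one_library = [[
    ["lib1_run1_R1.fastq.gz", "lib1_run1_R2.fastq.gz"],
    ["lib1_run2_R1.fastq.gz", "lib1_run2_R2.fastq.gz"],
]]

two_libraries = [
    [["lib1_run1_R1.fastq.gz", "lib1_run1_R2.fastq.gz"]],
    [["lib2_run1_R1.fastq.gz", "lib2_run1_R2.fastq.gz"]],
]


def describe(fastq):
    """Return (number of libraries, runs per library), checking each pair."""
    runs_per_library = []
    for library in fastq:
        for run in library:
            # Under the assumed layout, every run is exactly one read pair.
            if len(run) != 2:
                raise ValueError(f"expected an R1/R2 pair, got {run!r}")
        runs_per_library.append(len(library))
    return len(fastq), runs_per_library


for name, fastq in [("single run", single_run),
                    ("two runs, one library", two_runs_one_library),
                    ("two libraries", two_libraries)]:
    print(name, describe(fastq))
```

Compare the result against the provided templates before relying on this layout.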

Let us take a closer look at the following template JSON. Comments are not allowed in JSON files; we added them here only to explain each parameter, so remove them before using the file.

{
    ////////// 1) Input FASTQ files //////////
    "hic.fastq": [[[
        "test/test_data/merged_read1.fastq.gz",
        "test/test_data/merged_read2.fastq.gz"
    ]]],

    ////////// 2) Reference genome chromosome sizes //////////
    "hic.chrsz": "test/test_data/ce10_selected.chrom.sizes.tsv",

    ////////// 3) Restriction site locations in the reference genome sequence //////////
    "hic.restriction_sites": "test/test_data/ce10_selected_MboI.txt",

    ////////// 4) Reference genome bwa index //////////
    "hic.reference_index": "test/test_data/ce10_selected.tar.gz",

    ////////// 5) Restriction enzyme name //////////
    "hic.restriction_enzyme": "MboI"

}
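Since the annotated template is not valid JSON as written, one way to reuse it is to drop the comment lines before parsing. A minimal sketch in Python, using a shortened copy of the template; `strip_comment_lines()` is a hypothetical helper. It only removes lines that *start* with `//`, so a `//` inside a string value (such as a URL) is left alone, but a comment placed after a value on the same line would not be removed.

```python
import json

# Shortened copy of the annotated template, with comment lines.
TEMPLATE = """
{
    ////////// Input FASTQ files //////////
    "hic.fastq": [[[
        "test/test_data/merged_read1.fastq.gz",
        "test/test_data/merged_read2.fastq.gz"
    ]]],

    ////////// Restriction enzyme //////////
    "hic.restriction_enzyme": "MboI"
}
"""


def strip_comment_lines(text):
    """Drop lines whose first non-blank characters are '//'."""
    kept = [line for line in text.splitlines()
            if not line.lstrip().startswith("//")]
    return "\n".join(kept)


inputs = json.loads(strip_comment_lines(TEMPLATE))
print(inputs["hic.restriction_enzyme"])  # MboI
```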

Reference genome

To run the HiC pipeline, you need to specify a bwa index prepared from the reference genome sequence. We recommend using reference files from the ENCODE portal to ensure that analysis results are comparable.

| reference file description | assembly | ENCODE portal link |
|---|---|---|
| bwa index | hg19 | link |
| genome fasta | hg19 | link |
| chromosome sizes | hg19 | link |
| bwa index | GRCh38 | link |
| genome fasta | GRCh38 | link |
| chromosome sizes | GRCh38 | link |
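Before starting a run, it can help to confirm that a bwa index tarball (such as `test/test_data/ce10_selected.tar.gz` in the template) really contains an index. A minimal sketch in Python, assuming the tarball simply bundles the files that `bwa index` writes for a reference prefix (`.amb`, `.ann`, `.bwt`, `.pac`, `.sa`); `missing_bwa_suffixes()` and `check_bwa_tarball()` are hypothetical helpers, and the pipeline may expect additional details about the archive layout that this check does not cover.

```python
import io
import tarfile

# File suffixes written by `bwa index` for a reference prefix.
BWA_SUFFIXES = (".amb", ".ann", ".bwt", ".pac", ".sa")


def missing_bwa_suffixes(member_names):
    """Return the bwa index suffixes not found among archive member names."""
    return [suffix for suffix in BWA_SUFFIXES
            if not any(name.endswith(suffix) for name in member_names)]


def check_bwa_tarball(path):
    """Open a (possibly compressed) tar archive and report missing files."""
    with tarfile.open(path, mode="r:*") as archive:
        return missing_bwa_suffixes(archive.getnames())


# Demo: an in-memory .tar.gz holding only two of the five index files.
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
    for name in ("ref.fa.bwt", "ref.fa.sa"):
        info = tarfile.TarInfo(name)
        info.size = 0  # empty placeholder files
        archive.addfile(info, io.BytesIO(b""))
buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r:gz") as archive:
    demo_missing = missing_bwa_suffixes(archive.getnames())
print(demo_missing)  # ['.amb', '.ann', '.pac']
```

For a real file, call `check_bwa_tarball("test/test_data/ce10_selected.tar.gz")`; an empty list means all five index files were found.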

You also need a restriction map file that matches both the restriction enzyme and the assembly. MboI and DpnII share the same restriction map because both recognize the same site (GATC).

| restriction enzymes | assembly | ENCODE portal link |
|---|---|---|
| DpnII, MboI | GRCh38 | link |
| HindIII | GRCh38 | link |
| DpnII, MboI | hg19 | link |
| HindIII | hg19 | link |
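To see why MboI and DpnII can share one map, note that a restriction map is essentially the list of positions where the enzyme's recognition sequence occurs. A minimal sketch in Python on a hypothetical toy sequence; `site_positions()` is a hypothetical helper that reports 1-based start positions. This is not the algorithm or coordinate convention of Juicer's generate_site_positions.py, which may record a different position per site.

```python
# Recognition sequences written 5'->3'. MboI and DpnII both recognize GATC;
# HindIII recognizes AAGCTT. Both sequences are palindromic, so scanning one
# strand finds every site.
RECOGNITION_SITES = {"MboI": "GATC", "DpnII": "GATC", "HindIII": "AAGCTT"}


def site_positions(sequence, site):
    """Return 1-based start positions of every occurrence of `site`."""
    sequence = sequence.upper()  # count soft-masked (lowercase) bases too
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start + 1)  # convert 0-based index to 1-based
        start = sequence.find(site, start + 1)
    return positions


# Hypothetical toy sequence; a real map is built per chromosome.
seq = "ttGATCaaAAGCTTccGATC"
for enzyme, site in RECOGNITION_SITES.items():
    print(enzyme, site_positions(seq, site))
# MboI [3, 17]
# DpnII [3, 17]
# HindIII [9]
```

MboI and DpnII produce identical position lists because they scan for the same sequence.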

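Alternatively, you can create your own restriction map file using the generate_site_positions.py script from the Juicer pipeline. Make sure the restriction map has the following format: one line per chromosome, with the chromosome name followed by the restriction site positions in ascending order and the chromosome length as the last value: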

1 11160 12411 12461 ... 249250621
2 11514 11874 12160 ... 243199373
3 60138 60662 60788 ... 198022430

Other formats can cause problems in the HiCCUPS step of the pipeline.
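A quick format check can catch such problems before a run. A minimal sketch in Python, assuming the layout described above (chromosome name, strictly ascending positions, chromosome length last) and a tab-separated chromosome sizes file like the one passed as `hic.chrsz`. `parse_chrom_sizes()` and `check_restriction_map()` are hypothetical helpers. Because the example map uses bare names such as `1` while many chromosome sizes files use `chr1`, the lookup tries both forms; whether the pipeline itself requires matching names is not stated here.

```python
def parse_chrom_sizes(lines):
    """Parse chromosome sizes lines ("name<TAB>length") into a dict."""
    sizes = {}
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            sizes[fields[0]] = int(fields[1])
    return sizes


def check_restriction_map(lines, chrom_sizes):
    """Return a list of problems found in restriction map lines."""
    problems = []
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        chrom = fields[0]
        try:
            values = [int(v) for v in fields[1:]]
        except ValueError:
            problems.append(f"line {number}: non-integer value")
            continue
        if not values:
            problems.append(f"line {number}: no positions for {chrom}")
            continue
        if any(a >= b for a, b in zip(values, values[1:])):
            problems.append(f"line {number}: positions not strictly ascending")
        # Assumption: accept bare names ("1") as well as "chr"-prefixed ones.
        length = chrom_sizes.get(chrom, chrom_sizes.get("chr" + chrom))
        if length is None:
            problems.append(f"line {number}: {chrom} not found in chromosome sizes")
        elif values[-1] != length:
            problems.append(
                f"line {number}: last value {values[-1]} != chromosome length {length}"
            )
    return problems


# Hypothetical toy data, not a real assembly.
sizes = parse_chrom_sizes(["chrA\t100", "chrB\t50"])
good = ["chrA 10 40 100", "chrB 5 50"]
bad = ["chrA 40 10 100", "chrB 5 60"]
print(check_restriction_map(good, sizes))  # []
print(check_restriction_map(bad, sizes))
```

An empty list means every line matched the expected layout.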