Arrowhead

nchernia edited this page Feb 24, 2017 · 7 revisions

#Quick Description#

Arrowhead is an algorithm for finding contact domains.

This is the form most users will need (more detailed usage below):

arrowhead <HiC file> <output_file>

Upon a successful run of Arrowhead, output_file will contain all the contact domains found along the diagonal, in the format described under Domain List Content below.

##Examples##

arrowhead local/folder/HIC006.hic local/folder/contact_domains_list

This command will run Arrowhead on HIC006 at 5 kb or 10 kb resolution (depending on the map's resolution) and save all contact domains to the contact_domains_list file.

arrowhead https://hicfiles.s3.amazonaws.com/hiseq/gm12878/in-situ/combined_30.hic contact_domains_list

This command will run Arrowhead at 5 kb resolution on the GM12878 HiC map (high resolution) and save all contact domains to the contact_domains_list file. Note: these are the settings used to generate the official GM12878 contact domain list.

The default parameters for Arrowhead are described below.

#Detailed Usage#

arrowhead [-c chromosome(s)] [-m matrix size] [-r resolution] 
		[-k normalization (NONE/VC/VC_SQRT/KR)] <HiC file> 
		<output_file> [feature_list] [control_list]

The required arguments are:

  • <HiC file>: Address of the HiC file, which should end with ".hic". This is the file you would load into Juicebox. URLs or local addresses may be used. Running Arrowhead on MAPQ>30 and MAPQ>0 files generally gives comparable results.
  • <output_file>: Final list of all contact domains found by Arrowhead. Can be visualized directly in Juicebox as a 2D annotation.

-- NOTE -- If you want to find scores for a feature and control list, both must be provided:

  • [feature_list]: Feature list of loops/domains for which block scores are to be calculated
  • [control_list]: Control list of loops/domains for which block scores are to be calculated

The optional arguments are:

  • -c <String(s)> Chromosome(s) on which Arrowhead will be run. The number/letter for the chromosome can be used with or without appending the "chr" string. Multiple chromosomes can be specified using commas (e.g. 1,chr2,X,chrY)
  • -m <int> Size of the sliding window along the diagonal in which contact domains will be found. Must be an even number as (m/2) is used as the increment for the sliding window. (Default 2000)
  • -r <int> Resolution at which Arrowhead will be run. Generally, 5 kb (5000) or 10 kb (10000) resolution is used, depending on the depth of sequencing in the HiC file(s).
  • -k <NONE/VC/VC_SQRT/KR> Normalizations (case sensitive) that can be selected. Generally, KR (Knight-Ruiz) balancing should be used when available.
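
When Arrowhead is launched from a script, the flags above can be assembled programmatically. The following is a minimal sketch, not part of Arrowhead itself: `build_arrowhead_command` is a hypothetical helper, and it assumes the executable is reachable as `arrowhead`, as in the usage lines on this page. It only checks the two constraints stated above (an even `-m`, and a case-sensitive `-k` value).

```python
from typing import List, Optional, Sequence

# Normalizations accepted by -k (case sensitive, as listed above).
NORMALIZATIONS = ("NONE", "VC", "VC_SQRT", "KR")


def build_arrowhead_command(
    hic_file: str,
    output_file: str,
    chromosomes: Optional[Sequence[str]] = None,
    matrix_size: Optional[int] = None,
    resolution: Optional[int] = None,
    normalization: Optional[str] = None,
    executable: str = "arrowhead",  # assumption: how Arrowhead is invoked on your system
) -> List[str]:
    """Hypothetical helper: build an argument list from the documented flags."""
    cmd = [executable]
    if chromosomes:
        # Multiple chromosomes are joined with commas, e.g. 1,chr2,X
        cmd += ["-c", ",".join(chromosomes)]
    if matrix_size is not None:
        if matrix_size <= 0 or matrix_size % 2 != 0:
            raise ValueError("-m must be a positive even number")
        cmd += ["-m", str(matrix_size)]
    if resolution is not None:
        cmd += ["-r", str(resolution)]
    if normalization is not None:
        if normalization not in NORMALIZATIONS:
            raise ValueError(f"-k must be one of {NORMALIZATIONS}")
        cmd += ["-k", normalization]
    # Required positional arguments come last.
    cmd += [hic_file, output_file]
    return cmd


cmd = build_arrowhead_command(
    "local/folder/HIC006.hic",
    "local/folder/contact_domains_list",
    chromosomes=["1", "chr2", "X"],
    resolution=5000,
    normalization="KR",
)
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Flags that are left as `None` are omitted, so Arrowhead falls back to its own defaults.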

##Defaults##

Arrowhead uses the following parameters if optional flags are not provided.

Medium resolution maps:

-c (all chromosomes) 
-m 2000 
-r 10000 
-k KR

High resolution maps:

-c (all chromosomes) 
-m 2000 
-r 5000 
-k KR
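
To get a feel for what these defaults mean, the sketch below lists the windows that a sliding window of size m with increment m/2 would visit along the diagonal. It is an illustration only and rests on two assumptions that this page does not state: that `-m` is measured in bins at the chosen resolution, and that the last window is simply truncated at the end of the chromosome. `sliding_windows` is a hypothetical name, and the 25 Mb chromosome length is made up for the example.

```python
from typing import List, Tuple


def sliding_windows(
    chrom_length_bp: int, resolution: int = 10000, matrix_size: int = 2000
) -> List[Tuple[int, int]]:
    """Return (start_bin, end_bin) pairs, end exclusive.

    Assumptions (not confirmed by the documentation): matrix_size is in bins
    at the given resolution, and the final window is clipped to the chromosome.
    """
    if matrix_size <= 0 or matrix_size % 2 != 0:
        raise ValueError("matrix_size must be a positive even number")
    n_bins = -(-chrom_length_bp // resolution)  # ceiling division
    step = matrix_size // 2  # (m/2) is the increment of the sliding window
    return [
        (start, min(start + matrix_size, n_bins))
        for start in range(0, n_bins, step)
    ]


# Hypothetical 25 Mb chromosome with the medium-resolution defaults.
for start, end in sliding_windows(25_000_000, resolution=10000, matrix_size=2000):
    print(f"bins {start}-{end}  ({start * 10000}-{end * 10000} bp)")
```

Under these assumptions, a default window covers 2000 × 10,000 bp = 20 Mb at 10 kb resolution and 2000 × 5,000 bp = 10 Mb at 5 kb resolution, and consecutive windows overlap by half their width.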

###Domain List Content###

The contact domain list created by Arrowhead will start with a header line, followed by a line for every contact domain. By default, the file should contain 12 fields per line in the following format:

chromosome1    x1    x2    chromosome2    y1    y2    color    
		corner_score    Uvar    Lvar    Usign    Lsign

Explanations of each field are as follows:

  • chromosome1, chromosome2 = the chromosome that the domain is located on
  • x1,x2/y1,y2 = the interval spanned by the domain (contact domains manifest as squares on the diagonal of a Hi-C matrix and as such: x1=y1, x2=y2)
  • color = the color that the feature will be rendered as if loaded in Juicebox
  • corner_score = the corner score, a score indicating the likelihood that a pixel is at the corner of a contact domain. Higher values indicate a greater likelihood of being at the corner of a domain
  • Uvar = the variance of the upper triangle
  • Lvar = the variance of the lower triangle
  • Usign = -1*(sum of the sign of the entries in the upper triangle)
  • Lsign = sum of the sign of the entries in the lower triangle

See Section IV.a.3 of the Extended Experimental Procedures of Rao, Huntley et al. Cell 2014 for more details.
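
For downstream analysis, the 12-field layout described above can be read into simple records. This is a minimal sketch under stated assumptions: fields are separated by whitespace, the first non-empty line is the header, and the color is kept as an opaque string. `ContactDomain` and `parse_domain_list` are hypothetical names, and the sample row contains invented values purely for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class ContactDomain:
    chromosome1: str
    x1: int
    x2: int
    chromosome2: str
    y1: int
    y2: int
    color: str  # kept as-is; only used for rendering in Juicebox
    corner_score: float
    uvar: float
    lvar: float
    usign: float
    lsign: float


def parse_domain_list(lines: Iterable[str]) -> List[ContactDomain]:
    """Parse a domain list; assumes whitespace-separated fields and one header line."""
    rows = (line.split() for line in lines if line.strip())
    next(rows, None)  # skip the header line
    domains = []
    for f in rows:
        if len(f) != 12:
            raise ValueError(f"expected 12 fields, got {len(f)}")
        domains.append(
            ContactDomain(
                f[0], int(f[1]), int(f[2]),
                f[3], int(f[4]), int(f[5]),
                f[6], *map(float, f[7:12]),
            )
        )
    return domains


# Invented sample content, for illustration only.
sample = """chromosome1 x1 x2 chromosome2 y1 y2 color corner_score Uvar Lvar Usign Lsign
1 1000000 1500000 1 1000000 1500000 255,255,0 0.9 0.2 0.3 0.4 0.5
"""
domains = parse_domain_list(sample.splitlines())
for d in domains:
    print(d.chromosome1, d.x1, d.x2, "size:", d.x2 - d.x1, "corner score:", d.corner_score)
```

To read a real output file, pass an open file handle instead, e.g. `with open("contact_domains_list") as fh: domains = parse_domain_list(fh)`. Because contact domains are squares on the diagonal, `x1 == y1` and `x2 == y2` should hold for every record.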