ALLHiC: identify allelic contigs
ALLHiC relies on an allelic contig table (Allele.ctg.table) to remove noisy Hi-C signals. There are several ways to generate this table. We provide a BLAST-based method, which requires a chromosome-level assembly of a closely related genome.
- Identification of allelic contigs based on BLAST results
BLAST the CDS of the target genome against the CDS of a closely related reference genome
Note: Please rename the CDS sequences before running BLAST. Each CDS name should be the same as the corresponding gene name in the GFF3 file.
$ makeblastdb -in Sb.cds -dbtype nucl
$ blastn -query rice.cds -db Sb.cds -out rice_vs_Sb.blast.out -evalue 0.001 -outfmt 6 -num_threads 4 -num_alignments 1
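The renaming asked for in the note is done before BLAST. A minimal Python sketch, assuming the CDS headers carry transcript (mRNA) IDs and that the target GFF3 links each mRNA to its gene through standard `ID=`/`Parent=` attributes. The function names and the example records are hypothetical and are not part of ALLHiC.

```python
def gene_map_from_gff3(gff3_lines):
    """Map each mRNA ID to its parent gene ID using GFF3 ID=/Parent= attributes."""
    mapping = {}
    for line in gff3_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "mRNA":
            continue  # only mRNA features carry the transcript -> gene link
        attrs = dict(kv.strip().split("=", 1) for kv in fields[8].split(";") if "=" in kv)
        if "ID" in attrs and "Parent" in attrs:
            mapping[attrs["ID"]] = attrs["Parent"]
    return mapping


def rename_fasta(fasta_lines, mapping):
    """Reduce each header to its first word, then swap in the gene ID when known."""
    renamed = []
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            words = line[1:].split()
            name = words[0] if words else ""
            renamed.append(">" + mapping.get(name, name))
        else:
            renamed.append(line)  # sequence lines pass through unchanged
    return renamed


# Hypothetical example records
gff3 = [
    "ctg1\texample\tgene\t1\t900\t.\t+\t.\tID=g1",
    "ctg1\texample\tmRNA\t1\t900\t.\t+\t.\tID=g1.t1;Parent=g1",
]
cds = [">g1.t1 hypothetical description", "ATGGCTTAA"]
print(rename_fasta(cds, gene_map_from_gff3(gff3)))  # ['>g1', 'ATGGCTTAA']
```

If a gene has several isoforms, this produces duplicate headers; keeping one isoform per gene (for example the longest) would be an extra step not shown here.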
Filter the BLAST hits, keeping only those with identity ≥ 60% and coverage ≥ 80%
$ blastn_parse.pl -i rice_vs_Sb.blast.out -o Erice_vs_Sb.blast.out -q rice.cds -b 1 -c 0.6 -d 0.8
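To show what the identity/coverage cutoffs mean, here is a minimal Python sketch of such a filter. It assumes standard 12-column `-outfmt 6` lines and defines coverage as the aligned query span divided by the query length; blastn_parse.pl may compute coverage differently, and the function names are hypothetical.

```python
def read_fasta_lengths(fasta_lines):
    """Return {sequence name: length} for a FASTA given as lines."""
    lengths, name = {}, None
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def filter_hits(blast_lines, query_lengths, min_identity=0.6, min_coverage=0.8):
    """Keep outfmt-6 hits whose identity AND query coverage pass both cutoffs."""
    kept = []
    for line in blast_lines:
        f = line.rstrip("\n").split("\t")
        if len(f) < 12:
            continue  # not a standard 12-column tabular line
        query, identity = f[0], float(f[2]) / 100.0  # pident is a percentage
        qstart, qend = int(f[6]), int(f[7])
        qlen = query_lengths.get(query)
        if not qlen:
            continue  # unknown or empty query sequence
        coverage = (abs(qend - qstart) + 1) / qlen
        if identity >= min_identity and coverage >= min_coverage:
            kept.append(line.rstrip("\n"))
    return kept


# Hypothetical example: two 100 bp queries and three hits
lengths = read_fasta_lengths([">q1", "ACGT" * 25, ">q2", "ACGT" * 25])
hits = [
    "q1\tr1\t95.0\t90\t4\t0\t1\t90\t10\t99\t1e-40\t150",   # kept
    "q2\tr2\t55.0\t100\t45\t0\t1\t100\t1\t100\t1e-10\t80",  # identity too low
    "q1\tr3\t90.0\t50\t5\t0\t1\t50\t1\t50\t1e-20\t90",      # coverage too low
]
print(filter_hits(hits, lengths))
```

A hit is dropped if it fails either cutoff, which is the same as keeping only hits that pass both.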
Classify alleles based on the filtered BLAST results
$ classify.pl -i Erice_vs_Sb.blast.out -p 2 -r Sbicolor_313_v3.1.gene -g rice.gff3
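One way to picture the classification step: target genes whose best hit is the same reference gene are treated as alleles of one another. The Python sketch below illustrates that idea only; it is not the algorithm inside classify.pl. It assumes outfmt-6 input, picks the best hit per query by bitscore, and takes a hypothetical `reference_order` list of reference gene IDs in chromosomal order.

```python
from collections import defaultdict


def best_hits(blast_lines):
    """Return {query: subject} keeping the highest-bitscore hit per query."""
    best = {}
    for line in blast_lines:
        f = line.rstrip("\n").split("\t")
        query, subject, bitscore = f[0], f[1], float(f[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {q: s for q, (s, _) in best.items()}


def allele_groups(blast_lines, reference_order):
    """Group target genes by shared best reference gene, in reference order."""
    groups = defaultdict(list)
    for query, ref_gene in best_hits(blast_lines).items():
        groups[ref_gene].append(query)
    # .get() does not create entries, so reference genes without hits are skipped
    return [(ref, sorted(groups[ref])) for ref in reference_order if groups.get(ref)]


# Hypothetical filtered hits (12 tab-separated outfmt-6 columns)
hits = [
    "gA1\tR1\t98\t900\t2\t0\t1\t900\t1\t900\t0\t1500",
    "gA2\tR1\t97\t900\t3\t0\t1\t900\t1\t900\t0\t1450",
    "gB1\tR2\t96\t600\t4\t0\t1\t600\t1\t600\t0\t1000",
    "gB1\tR1\t70\t300\t40\t0\t1\t300\t1\t300\t1e-5\t200",  # weaker hit, ignored
]
print(allele_groups(hits, ["R1", "R2", "R3"]))
```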
After running the scripts above, two tables are generated: Allele.gene.table lists the allelic genes in the gene order of the diploid reference genome, and Allele.ctg.table lists the corresponding contig names in the same order.
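Because both tables share the same order, a contig row can be derived from a gene row by looking up the sequence each gene sits on. A minimal Python sketch, assuming that column 1 (seqid) of each `gene` feature in the target GFF3 is the contig name and that a gene row starts with the same chromosome ID and position columns as Allele.ctg.table. The function names and the `NA` placeholder are hypothetical.

```python
def gene_to_contig(gff3_lines):
    """Map each gene ID to the seqid (contig) it is annotated on."""
    mapping = {}
    for line in gff3_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        f = line.rstrip("\n").split("\t")
        if len(f) >= 9 and f[2] == "gene":
            attrs = dict(kv.strip().split("=", 1) for kv in f[8].split(";") if "=" in kv)
            if "ID" in attrs:
                mapping[attrs["ID"]] = f[0]
    return mapping


def ctg_row(gene_row, contig_of):
    """[chrom, pos, gene1, gene2, ...] -> [chrom, pos, contig1, contig2, ...]."""
    chrom, pos, *genes = gene_row
    return [chrom, pos] + [contig_of.get(g, "NA") for g in genes]


# Hypothetical example records
gff3 = [
    "tig001\texample\tgene\t1\t900\t.\t+\t.\tID=gA1",
    "tig007\texample\tgene\t5\t905\t.\t-\t.\tID=gA2;Name=x",
]
print(ctg_row(["Chr01", "12345", "gA1", "gA2"], gene_to_contig(gff3)))
```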
Format of Allele.ctg.table:
The first two columns are the chromosome ID and position in the reference genome.
The 3rd to Nth columns are the allelic contigs we identified. The prune step removes Hi-C linked reads between allelic contigs.
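The idea behind pruning can be sketched in Python: every pair of contigs listed on the same row is allelic, and any Hi-C link between such a pair is discarded. This assumes whitespace-separated rows and represents Hi-C links as hypothetical (contig, contig) tuples; it illustrates the concept and is not how ALLHiC's prune step is implemented.

```python
from itertools import combinations


def allelic_pairs(table_lines):
    """Collect every unordered pair of contigs that share a row."""
    pairs = set()
    for line in table_lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # chrom, pos and at least two contigs are needed for a pair
        contigs = sorted(set(fields[2:]))
        for a, b in combinations(contigs, 2):
            pairs.add((a, b))  # stored sorted so lookups ignore order
    return pairs


def prune_links(links, pairs):
    """Drop links whose two contigs form an allelic pair."""
    return [(a, b) for a, b in links if tuple(sorted((a, b))) not in pairs]


# Hypothetical table rows and Hi-C links
table = [
    "Chr01\t12345\ttig001\ttig007",
    "Chr01\t67890\ttig001\ttig007\ttig009",
    "Chr02\t100\ttig003",  # single contig: no allelic pair
]
links = [("tig007", "tig001"), ("tig001", "tig002"), ("tig009", "tig007")]
print(prune_links(links, allelic_pairs(table)))  # [('tig001', 'tig002')]
```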
We also provide a GMAP-based method to generate an Allele.ctg.table, which does not require annotation of the target genome. Please see the following link for detailed commands: https://github.com/tangerzhang/ALLHiC/issues/16
© 2017 - present, ALLHiC authors