An example of removing indels within simple repeats

Yuichi Shiraishi edited this page Sep 10, 2022 · 1 revision

This page is for nanomonsv version 0.4.0 or later.

One of the most effective filters removes insertions and deletions confined within simple repeat regions. To use it, prepare a bgzip-compressed and tabix-indexed simple repeat BED file (hg38 in this example) as follows:

wget http://hgdownload.soe.ucsc.edu/goldenPath/hg38/database/simpleRepeat.txt.gz
zcat simpleRepeat.txt.gz | cut -f 2-4 | sort -k1,1 -k2,2n -k3,3n > simpleRepeat.bed
bgzip -c simpleRepeat.bed > simpleRepeat.bed.gz
tabix -p bed simpleRepeat.bed.gz
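
To see what the cut and sort step does before running it on the full download, the following minimal sketch applies the same pipeline to a toy table. The rows and the variable names (toy_table, toy_bed) are made up for illustration; the column order (bin, chrom, chromStart, chromEnd, name) is assumed to match the UCSC simpleRepeat table.

```shell
# Toy stand-in for simpleRepeat.txt (hypothetical rows, first five columns only).
# Assumed column order: bin, chrom, chromStart, chromEnd, name.
toy_table=$(printf '585\tchr2\t500\t560\ttrf\n585\tchr1\t300\t340\ttrf\n585\tchr1\t100\t180\ttrf\n')

# Same cut/sort as above: keep chrom, start, end, then sort by
# chromosome name and numerically by start and end position.
toy_bed=$(printf '%s\n' "$toy_table" | cut -f 2-4 | sort -k1,1 -k2,2n -k3,3n)

# Print the resulting three-column BED records.
printf '%s\n' "$toy_bed"
```

The result is a three-column, position-sorted BED, which is the layout that tabix -p bed expects after bgzip compression.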

Then, annotate the nanomonsv result file with the simple repeat track:

python3 misc/add_simple_repeat.py misc/example/v0.4.0/COLO829.nanomonsv.result.txt COLO829.nanomonsv.result.filt.txt simpleRepeat.bed.gz

Now, indels confined within simple repeats are labeled as "Simple_repeat" in the COLO829.nanomonsv.result.filt.txt file. You can create a file that includes only the SVs that passed every filter check as follows:

head -n 1 COLO829.nanomonsv.result.filt.txt > COLO829.nanomonsv.result.filt.pass.txt
tail -n +2 COLO829.nanomonsv.result.filt.txt | grep PASS >> COLO829.nanomonsv.result.filt.pass.txt
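
The head and tail/grep steps above can also be combined into a single pass. The sketch below is one possible way to do that with awk; keep_pass is a hypothetical helper name, and the toy column names and rows are made up for illustration and are not the actual nanomonsv output layout. Like the grep command above, it matches PASS anywhere in the line.

```shell
# Hypothetical helper: print the header line plus every line containing PASS.
keep_pass() {
    awk 'NR == 1 || /PASS/' "$1"
}

# Toy input with made-up column names and rows, for illustration only.
toy_result=$(mktemp)
printf 'Chr\tPos\tFilter\nchr1\t100\tPASS\nchr1\t300\tSimple_repeat\n' > "$toy_result"

# Keep the header and the PASS record; the Simple_repeat record is dropped.
toy_pass=$(keep_pass "$toy_result")
printf '%s\n' "$toy_pass"
rm -f "$toy_result"
```

On the real data this would be called as keep_pass COLO829.nanomonsv.result.filt.txt > COLO829.nanomonsv.result.filt.pass.txt.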