An example on removing indels within simple repeat
Yuichi Shiraishi edited this page Sep 10, 2022 · 1 revision
This page is for nanomonsv version 0.4.0 or later.
One of the most effective filters removes insertions and deletions confined to simple repeat regions. To use it, first prepare a bgzip-compressed, tabix-indexed BED file of simple repeats (hg38 in this example) as follows:
wget http://hgdownload.soe.ucsc.edu/goldenPath/hg38/database/simpleRepeat.txt.gz
zcat simpleRepeat.txt.gz | cut -f 2-4 | sort -k1,1 -k2,2n -k3,3n > simpleRepeat.bed
bgzip -c simpleRepeat.bed > simpleRepeat.bed.gz
tabix -p bed simpleRepeat.bed.gz
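For readers who want to see what the `zcat | cut | sort` step produces, here is a minimal Python sketch of the same transformation. It assumes the UCSC `simpleRepeat.txt` layout, in which the first column is `bin` and columns 2-4 are `chrom`, `chromStart`, and `chromEnd`. The function names are hypothetical and not part of nanomonsv, and the chromosome ordering matches `sort` only under the C locale.

```python
import gzip


def simple_repeat_to_bed_rows(lines):
    """Keep columns 2-4 (chrom, start, end) and sort by chrom, then start, then end."""
    rows = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        rows.append((fields[1], int(fields[2]), int(fields[3])))
    # Tuple sort: chromosome name by code point, coordinates numerically.
    rows.sort()
    return rows


def write_bed(src_gz, dst_bed):
    """Hypothetical helper: read the gzipped UCSC table and write a sorted BED file."""
    with gzip.open(src_gz, "rt") as src, open(dst_bed, "w") as dst:
        for chrom, start, end in simple_repeat_to_bed_rows(src):
            dst.write(f"{chrom}\t{start}\t{end}\n")
```

The resulting file still needs to be compressed with `bgzip` and indexed with `tabix` as shown in the commands above.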
Then run the annotation script:
python3 misc/add_simple_repeat.py misc/example/v0.4.0/COLO829.nanomonsv.result.txt COLO829.nanomonsv.result.filt.txt simpleRepeat.bed.gz
Indels confined within simple repeats are now labeled "Simple_repeat" in the COLO829.nanomonsv.result.filt.txt file.
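To illustrate what "confined within a simple repeat" could mean, the sketch below checks whether an indel's span lies entirely inside a single repeat interval on the same chromosome. This is only an assumed reading of the criterion, not the logic of `misc/add_simple_repeat.py`, which may apply margins or other conditions. The function name and the in-memory interval list are hypothetical, and differences between BED and nanomonsv coordinate conventions are ignored.

```python
def is_confined(chrom, start, end, repeats):
    """Return True if [start, end] on chrom lies inside one repeat interval.

    repeats is an iterable of (chrom, start, end) tuples; this is an
    illustrative containment test, not the nanomonsv implementation.
    """
    return any(
        r_chrom == chrom and r_start <= start and end <= r_end
        for r_chrom, r_start, r_end in repeats
    )


# Example with made-up coordinates:
repeats = [("chr1", 100, 200)]
inside = is_confined("chr1", 120, 180, repeats)    # fully inside the repeat
straddle = is_confined("chr1", 90, 150, repeats)   # starts before the repeat
```

A linear scan like this is fine for a handful of intervals; for a genome-wide repeat table, an indexed lookup such as the tabix index prepared above is the practical choice.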
You can create a file that contains only the SVs that passed every filter check as follows:
head -n 1 COLO829.nanomonsv.result.filt.txt > COLO829.nanomonsv.result.filt.pass.txt
tail -n +2 COLO829.nanomonsv.result.filt.txt | grep PASS >> COLO829.nanomonsv.result.filt.pass.txt
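The same header-plus-PASS selection can be sketched in Python. Like `grep PASS`, this version keeps any data line that contains the substring `PASS` anywhere, so it is an equivalent of the commands above rather than a stricter per-column check. The function name is hypothetical.

```python
def keep_pass_lines(lines, token="PASS"):
    """Keep the header line plus every data line containing token (substring match)."""
    lines = list(lines)
    if not lines:
        return []
    header, body = lines[0], lines[1:]
    return [header] + [line for line in body if token in line]


def filter_file(src_path, dst_path):
    """Hypothetical wrapper: write the header and PASS lines of src_path to dst_path."""
    with open(src_path) as src:
        kept = keep_pass_lines(src)
    with open(dst_path, "w") as dst:
        dst.writelines(kept)
```

For example, `filter_file("COLO829.nanomonsv.result.filt.txt", "COLO829.nanomonsv.result.filt.pass.txt")` would write the same lines as the `head`/`tail`/`grep` commands.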