A workflowr project.
This project requires:

- R
- python3
Download `rmskOutCurrent.txt.gz` from [UCSC](http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/rmskOutCurrent.txt.gz) into the `data/repeat_annotations` directory and check that its MD5 checksum matches that of the version used in the paper:
```bash
cd data/repeat_annotations
wget http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/rmskOutCurrent.txt.gz
rmskMD5=$(md5sum rmskOutCurrent.txt.gz | awk '{print $1}')
if [[ "$rmskMD5" == "fe31769b064022e13a24582fad520c61" ]]; then  # checksum of the version used in the paper
    mv rmskOutCurrent.txt.gz rmskOutCurrent.Dfam_2_0.v4_0_7.txt.gz
else
    echo "rmskOutCurrent.txt.gz has been updated on the UCSC server. You will need to switch the value of rmsk_full in 01.IAP_Annotations.Rmd"
fi
```
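`md5sum` comes from GNU coreutils and is not installed by default on macOS. python3 is already a project requirement, so the same check can be written in Python instead. This is a minimal sketch that assumes it is run from the repository root. `md5_of_file` and `verify_and_rename` are hypothetical helper names and are not part of this repository.

```python
import hashlib
from pathlib import Path

# Checksum of the rmskOutCurrent.txt.gz version used in the paper (from the shell commands above)
EXPECTED_MD5 = "fe31769b064022e13a24582fad520c61"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_and_rename(directory="data/repeat_annotations"):
    """Rename the download to its versioned name if the checksum matches; return True on success."""
    src = Path(directory) / "rmskOutCurrent.txt.gz"
    dst = Path(directory) / "rmskOutCurrent.Dfam_2_0.v4_0_7.txt.gz"
    if md5_of_file(src) == EXPECTED_MD5:
        src.rename(dst)
        return True
    print("rmskOutCurrent.txt.gz has been updated on the UCSC server. "
          "You will need to switch the value of rmsk_full in 01.IAP_Annotations.Rmd")
    return False
```

To use it, download the file as above and call `verify_and_rename()`. Reading in chunks keeps memory use flat regardless of how large the compressed annotation is.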
The list of mm10 CpGs was generated by the following R code:
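```r
library(BSgenome.Mmusculus.UCSC.mm10)
# Find every "CG" dinucleotide in mm10; hits are reported on both strands
mm10.CpG.gr <- vmatchPattern("CG", Mmusculus)
# "CG" is its own reverse complement, so each site is found once per strand; keep the + copy
mm10.CpG.gr <- mm10.CpG.gr[strand(mm10.CpG.gr) == "+"]
dir.create("data/R_objects", showWarnings = FALSE, recursive = TRUE)
saveRDS(mm10.CpG.gr, "data/R_objects/mm10.CpG.gr.RDS")
```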
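The R code keeps only the `+` strand hits. A minimal Python sketch on a made-up toy sequence shows why. It is not part of the pipeline and uses plain strings rather than Biostrings/GRanges objects; `reverse_complement`, `find_pattern` and `cpg_sites` are hypothetical names.

The sketch searches for `CG` on the forward strand and on the reverse complement, then maps the reverse-strand hits back to forward coordinates. Because `CG` is palindromic, both searches return the same 1-based start positions, so keeping both strands would count every CpG twice.

```python
from __future__ import annotations

# Complement table for unambiguous bases (and N), upper and lower case
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_pattern(seq: str, pattern: str) -> list[int]:
    """1-based start positions of every (possibly overlapping) occurrence of pattern in seq."""
    hits = []
    start = seq.find(pattern)
    while start != -1:
        hits.append(start + 1)  # convert 0-based string index to 1-based genomic coordinate
        start = seq.find(pattern, start + 1)
    return hits


def cpg_sites(seq: str) -> dict[str, list[int]]:
    """CpG start positions found on each strand, in forward-strand coordinates."""
    seq = seq.upper()
    n, k = len(seq), 2  # sequence length and pattern length ("CG")
    plus = find_pattern(seq, "CG")
    rc_hits = find_pattern(reverse_complement(seq), "CG")
    # A hit starting at p on the reverse complement covers forward positions n-p-k+2 .. n-p+1
    minus = sorted(n - p - k + 2 for p in rc_hits)
    return {"+": plus, "-": minus}


if __name__ == "__main__":
    print(cpg_sites("ACGTTCGCG"))  # {'+': [2, 6, 8], '-': [2, 6, 8]}
```

Filtering to one strand leaves exactly one record per CpG. Its start is the position of the C on the forward strand.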