GitHub - guangjunyin/SNPsplit: Allele-specific alignment sorting usage in methylation analysis

Allele-specific alignment sorting

Note for using a UCSC/NCBI genome in conjunction with the VCF file from the Mouse Genomes Project

Several users have run into problems when using genomes from UCSC or NCBI in conjunction with the VCF file from the Mouse Genomes Project (MGP, http://www.sanger.ac.uk/science/data/mouse-genomes-project). This happens because the MGP VCF uses Ensembl-style chromosome names (e.g. 1, 2, 3, X, MT), whereas UCSC uses chromosome names such as chr1, chr2, chr3, chrX and chrM.

We have recently added a check to the SNPsplit genome preparation script that exits with an error if a chromosome name discrepancy is detected (FelixKrueger#4). It is, however, possible to convert the VCF file into a UCSC-compatible version by (a) changing the chromosome names in the data lines from e.g. 1 to chr1 and (b) changing the chromosome names in the ID field of the contig lines in the VCF header. It is normally not necessary to rename the mitochondrial chromosome from MT to chrM, because no SNP positions are recorded for MT anyway.
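
Conceptually, such a check compares the chromosome names used in the VCF file with the sequence names found in the reference genome FASTA files. The Python sketch below illustrates the idea under that assumption; it is not SNPsplit's own (Perl) implementation, and all function names in it are hypothetical. Note that it reads the entire VCF, which can be slow for a file the size of the MGP release.

```python
import gzip
from typing import Iterable, List, Set


def open_text(path: str):
    """Open a plain or gzip-compressed text file for reading."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


def vcf_chromosomes(lines: Iterable[str]) -> Set[str]:
    """Chromosome names from the CHROM column of VCF data lines (header lines are skipped)."""
    names = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        names.add(line.split("\t", 1)[0])
    return names


def fasta_chromosomes(lines: Iterable[str]) -> Set[str]:
    """Sequence names from FASTA header lines: the text after '>' up to the first whitespace."""
    names = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names


def missing_chromosomes(vcf_names: Set[str], genome_names: Set[str]) -> List[str]:
    """VCF chromosome names that have no matching sequence in the genome, sorted."""
    return sorted(vcf_names - genome_names)


def check_files(vcf_path: str, fasta_paths: List[str]) -> None:
    """Stop with an error message if any VCF chromosome is absent from the FASTA files."""
    with open_text(vcf_path) as fh:
        vcf_names = vcf_chromosomes(fh)
    genome_names: Set[str] = set()
    for path in fasta_paths:
        with open_text(path) as fh:
            genome_names |= fasta_chromosomes(fh)
    missing = missing_chromosomes(vcf_names, genome_names)
    if missing:
        raise SystemExit(
            "Chromosome names in the VCF file not found in the genome: " + ", ".join(missing)
        )
```

For example, an Ensembl-style VCF (chromosomes 1 and X) checked against UCSC-style FASTA headers (chr1, chrX) would report 1 and X as missing.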

Here is a one-line awk script that performs an Ensembl => UCSC conversion, but you could of course also run an equivalent script in Perl or Python. The script writes the converted VCF to standard output, so redirect the output to a new file.

awk '{if($1 ~ "^#") {gsub("contig=<ID=", "contig=<ID=chr"); gsub("contig=<ID=chrMT", "contig=<ID=chrM"); print} else {gsub("^MT", "M"); print "chr"$0}}' mgp.v5.merged.snps_all.dbSNP142.vcf
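
As an illustration of the Python route, here is a minimal sketch of an equivalent conversion. It mirrors the awk one-liner rule for rule: in header lines, contig IDs gain a chr prefix and chrMT becomes chrM; in data lines, a leading MT becomes M and the whole line gains a chr prefix. The function and script names are hypothetical, and, like the awk example, it assumes an uncompressed VCF.

```python
import sys
from typing import Iterable, Iterator


def ensembl_to_ucsc(lines: Iterable[str]) -> Iterator[str]:
    """Yield VCF lines with Ensembl chromosome names converted to UCSC style."""
    for line in lines:
        if line.startswith("#"):
            # Header line: prefix contig IDs with "chr", then turn chrMT into chrM.
            line = line.replace("contig=<ID=", "contig=<ID=chr")
            line = line.replace("contig=<ID=chrMT", "contig=<ID=chrM")
            yield line
        else:
            # Data line: rename a leading MT to M, then prefix the line with "chr".
            if line.startswith("MT"):
                line = "M" + line[2:]
            yield "chr" + line


if __name__ == "__main__":
    if len(sys.argv) == 2:
        with open(sys.argv[1], "rt") as vcf:
            sys.stdout.writelines(ensembl_to_ucsc(vcf))
    else:
        print("usage: python ensembl_to_ucsc.py input.vcf > output.vcf", file=sys.stderr)
```

As with the awk version, the result goes to standard output, e.g. python ensembl_to_ucsc.py mgp.v5.merged.snps_all.dbSNP142.vcf > converted.vcf (the output file name is only an example). Writing the conversion as a generator keeps memory use constant regardless of the size of the VCF.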

Installation

SNPsplit is written in Perl and is executed from the command line. To install SNPsplit, simply download the latest release of the code from the Releases page and extract the files into a SNPsplit installation folder.

SNPsplit requires the following tools to be installed, ideally available in the PATH:

Samtools

Documentation

The SNPsplit documentation can be found here: SNPsplit User Guide

Links

SNPsplit publication at F1000 Research:
- https://f1000research.com/articles/5-1479/v2
Here is a link to the SNPsplit project site at the Babraham Institute.

Credits

SNPsplit was written by Felix Krueger, part of the Babraham Bioinformatics group.
