Skip to content

A package for BEAST2 implementing phylogenetic analysis for VNTR or Microsatellites

License

LGPL-3.0, GPL-3.0 licenses found

Licenses found

LGPL-3.0
COPYING.LESSER
GPL-3.0
COPYING
Notifications You must be signed in to change notification settings

arjun-1/BEASTvntr

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

73 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Build Status GitHub version License: LGPL v3

BEASTvntr

A package for BEAST2 2.4.3 or higher, which can infer phylogeny for VNTR (=Variable Number of Tandem Repeat) data.

Installation

To install BEASTvntr from the BEAST2 Package Manager in Beauti, go to File > Manage Packages, select BEASTvntr and click Install/Upgrade.

To install BEASTvntr manually, download the latest release and extract the contents into its own folder in ~/.beast/2.4/.

Building from Source

These instructions will get you a copy of the BEASTvntr project up and running for development and testing purposes.

Prerequisities

Make sure Apache Ant is installed.

Installing

Download and build the BEAST2 project

git clone https://github.com/CompEvol/beast2.git
cd beast2
ant

Download and build the package

cd ../
git clone https://github.com/arjun-1/BEASTvntr.git
cd BEASTvntr
ant addon

Install the package

cp -r release/add-on ~/.beast/2.4/BEASTvntr

Example

These instructions will show how to infer phylogeny for VNTR data of a set of taxa provided in a paper by Comas.

Setting up the XML file

First download comas2009_VNTR.csv which contains the repeats in CSV format. Start Beauti, either via its shortcut or by running java -cp build/dist/launcher.jar beast.app.beauti.BeautiLauncher in beast2/. In the Beauti window, click File > Import Alignment and select comas2009_VNTR.csv. In the window that appears, we can either select repeats (single partition), repeats (multiple partitions) or nucleotides to import. Select Repeats (single partition) and click OK.

After selecting repeats, we must specify the minimum and maximum repeat which will bound our state space. For Minimum repeat specify 1 and for Maximum repeat specify 15, and click OK.

We can now specify the parameters of the substitution model. Click on the Site Model tab and set Gamma Category Count to 6 to allow rate variations among the different loci. Then, click on estimate for the Shape parameter to estimate the gamma shape parameter of the distribution of rates among sites.

Now, click on the Priors tab. For the coalescent model, select Coalescent Constant Population

To save the configuration in an XML file, click File > save and save as comas2009_VNTR.xml.

Running BEAST2

Start BEAST2, either via its shortcut or by running java -cp build/dist/launcher.jar beast.app.beastapp.BeastLauncher in beast2/, and select comas2009_VNTR.xml. If everything went right, you should see the MCMC run starting:

Start likelihood: -6033.8021475437245 
Writing file comas2009_VNTR.log
Writing file comas2009_VNTR.trees
         Sample      posterior ESS(posterior)     likelihood          prior
              0     -6024.6925              N     -5830.8518      -193.8406 --
           1000     -3573.9178         2.0        -3260.2564      -313.6613 --
           2000     -3320.7594         3.0        -3007.2831      -313.4763 --
           3000     -3155.1852         4.0        -2849.9808      -305.2044 --

Background

To infer phylogeny, BEASTvntr uses a model described in a paper by Sainudiin. Of this model, several implementations are available.

The standard implementation is called Sainudiin Vanilla and can model mutational bias, mutation rate proportionality, and any multi-step mutations. Sainudiin Computed Frequencies Vanilla is a variant on the above model, where the frequencies of the states in the root node are given by the stationary distribution, which is calculated from the other model parameters.

All these implementations use a modified for expression for the mutational bias beta, which is described in a paper by Wu. In this expression, the bias beta for expansion given a mutation event, depends on the parameters b0, b1. The implementations Sainudiin and Sainudiin Computed Frequencies are an adaptation of the standard Vanilla models, which use a transformation of these parameters, given by:

b0 =  biasMagnitude / sqrt(1 + 1 / (focalPoint - minimum repeat)^2)
b1 = -biasMagnitude / sqrt(1 + (focalPoint - minimum repeat)^2)

This was done so that the equation

beta(b0, b1, focalPoint) = 1 - beta(b0, b1, focalPoint)

is always satisfied, i.e. for that focalPoint the bias for expansion is equal to that of contraction. This means that focalPoint can intuitively be interpreted as the focal point of the mutational bias.

In addition, the proportionality of the mutation rate to the number of repeats, a1, has been transformed into:

oneOnA1 = 1 / a1

For the cases where the mutation rate of the minimum repeat is significantly lower than that of the other repeats, a1 blows up, whilst oneOnA1 remains constrained.

##Known Issues During a MCMC run, it is possible that the likelihood makes a sudden unrealistic increase, and that the trace of estimated parameters becomes a flat line:

alt text

The cause of this issue might be that too many parameters are being estimated in the model. If you encounter such an issue, doing any of the following might resolve it:

  • When using the beagle library, pass -beagle_single as an option to beast, or try other scaling options.
  • Use Sainudiin Frequencies Computed instead of Sainudiin as substitution model. The Sainudiin Frequencies Computed does not estimate the frequencies of the repeats, thus this greatly reduces any over-parametrization.
  • Use more restrictive priors on any of the parameters biasMagnitude, focalPoint, g, oneOnA1 of the model. Bounding oneOnA1 from above seems to be most helpful.
  • Remove any duplicate VNTR sequence in the imported alignment.

See this forum post for more information on a similar problem.

Author

  • Arjun Dhawan - Initial work - arjun-1

License

This project is licensed under version 3 of the GNU Lesser General Public License - see the COPYING.LESSER file for details.

About

A package for BEAST2 implementing phylogenetic analysis for VNTR or Microsatellites

Resources

License

LGPL-3.0, GPL-3.0 licenses found

Licenses found

LGPL-3.0
COPYING.LESSER
GPL-3.0
COPYING

Stars

Watchers

Forks

Packages

No packages published

Languages