Skip to content

Commit

Permalink
CSVImporter can now skip first line.
Browse files Browse the repository at this point in the history
Renamed focalState to focalPoint
  • Loading branch information
arjun-1 committed Jul 20, 2016
1 parent 000dcd2 commit 019c7e4
Show file tree
Hide file tree
Showing 6 changed files with 110 additions and 93 deletions.
18 changes: 9 additions & 9 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -37,7 +37,7 @@ cp release/add-on ~/.beast/2.4/BEASTvntr
## Example
These instructions will show how to infer phylogeny for VNTR data of a set of taxa provided in a [paper](http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0007815) by Comas.
### Setting up the XML file
First download [comas2009_VNTR.csv](examples/csv/comas2009_VNTR.csv) which contains the repeats in CSV format. Start Beauti, either via its shortcut or by running `java -cp build/dist/launcher.jar beast.app.beauti.BeautiLauncher` in `beast2/`. In the Beauti window, click **File > Import Alignment** and select *comas2009_VNTR.csv*. In the window that appears, we can either select repeats or nucleotides to import. Select *Repeats* and click **OK**.
First download [comas2009_VNTR.csv](examples/csv/comas2009_VNTR.csv) which contains the repeats in CSV format. Start Beauti, either via its shortcut or by running `java -cp build/dist/launcher.jar beast.app.beauti.BeautiLauncher` in `beast2/`. In the Beauti window, click **File > Import Alignment** and select *comas2009_VNTR.csv*. In the window that appears, we can either select repeats (homogen), repeats (inhomogen) or nucleotides to import. Select *Repeats (homogen)* and click **OK**.

After selecting repeats, we must specify the minimum and maximum repeat which will bound our state space. For *Minimum repeat* specify **1** and for *Maximum repeat* specify **15**, and click **OK**.

Expand All @@ -50,8 +50,8 @@ To save the configuration in an XML file, click **File > save** and save as **co
Start BEAST2, either via its shortcut or by running `java -cp build/dist/launcher.jar beast.app.beastapp.BeastLauncher` in `beast2/`, and select comas2009_VNTR.xml. If everything went right, you should see the MCMC run starting:
```text
Start likelihood: -6033.8021475437245
Writing file comas_VNTR.log
Writing file comas_VNTR.trees
Writing file comas2009_VNTR.log
Writing file comas2009_VNTR.trees
Sample posterior ESS(posterior) likelihood prior
0 -6024.6925 N -5830.8518 -193.8406 --
1000 -3573.9178 2.0 -3260.2564 -313.6613 --
Expand All @@ -62,22 +62,22 @@ Writing file comas_VNTR.trees
## Background
To infer phylogeny, BEASTvntr uses two implementations of a model explained in a [paper](http://www.genetics.org/content/168/1/383.long) by Sainudiin. The first implementation is called *Sainudiin* and can model a mutational bias, mutation rate proportionality, and any multi-step mutations. The second implementation called *Sainudiin Frequencies Computed*, is the same as *Sainudiin*, except that the frequencies of the states in the root node are given by the stationary distribution, which is calculated from the other model parameters.

These implementations use a modified for expression for the mutational bias beta however, which is described in a [paper](http://www.genetics.org/content/188/1/151.long) by Wu. In this expression, the bias `beta` for expansion given a mutation event, depends on the parameters `b_0, b_1`. However, BEASTvntr uses a transformation of these parameters:
These implementations use a modified for expression for the mutational bias beta however, which is described in a [paper](http://www.genetics.org/content/188/1/151.long) by Wu. In this expression, the bias `beta` for expansion given a mutation event, depends on the parameters `b0, b1`. However, BEASTvntr uses a transformation of these parameters:
```
b_0 = biasMagnitude / sqrt( 1 + 1 / focalPoint^2 )
b_1 = -biasMagnitude / sqrt( 1 + focalPoint^2 )
b0 = biasMagnitude / sqrt(1 + 1 / focalPoint^2)
b1 = -biasMagnitude / sqrt(1 + focalPoint^2)
```
This was done so that the equation
```
beta(b_0, b_1, focalPoint) = 1 - beta(b_0, b_1, focalPoint)
beta(b0, b1, focalPoint) = 1 - beta(b0, b1, focalPoint)
```
is always satisfied, i.e. for that `focalPoint` the bias for expansion is equal to that of contraction. This means that `focalPoint` can intuitively be interpreted as the focal point of the mutational bias.

In addition, the proportionality of the mutation rate to the number of repeats, `a1`, has been transformed into:
```
oneOnA1 = 1 / a1
```
For the cases where the mutation rate of the minimum repeat is far lower than that of the other repeats, `a1` blows up, whilst `oneOnA1` remains constrained.
For the cases where the mutation rate of the minimum repeat is significantly lower than that of the other repeats, `a1` blows up, whilst `oneOnA1` remains constrained.

##Known Issues
During a MCMC run, it is possible that the likelihood makes a sudden unrealistic increase, and that the trace of estimated parameters becomes a flat line:
Expand All @@ -87,7 +87,7 @@ During a MCMC run, it is possible that the likelihood makes a sudden unrealistic
The cause of this issue might be that too many parameters are being estimated in the model. If you encounter such an issue, doing any of the following might resolve it:
1. Pass `-beagle_scaling none` as an option to beast.
2. Use *SainudiinStepWise* instead of *Sainudiin* as substitution model. The *SainudiinStepWise* does not estimate the frequencies of the repeats, thus this greatly reduces any over-parametrization.
3. Use more restrictive priors on any of the parameters `r_b, i_eq, g, a_1` of the model.
3. Use more restrictive priors on any of the parameters `biasMagnitude, focalPoint, g, oneOnA1` of the model. Bounding `oneOnA1` from above seems to be most helpful.
4. Remove any duplicate VNTR sequence in the imported alignment.

See [this](https://groups.google.com/forum/#!topic/beast-users/ScG6PEZTADE) forum post for more information on a similar problem.
Expand Down
119 changes: 68 additions & 51 deletions src/beast/app/fileimporters/CSVImporter.java
Original file line number Diff line number Diff line change
Expand Up @@ -60,12 +60,17 @@ enum ParseOption {
private Object sequenceData;
ParseOption choice;

private void parseFile(File file) throws IOException {
private void parseFile(File file, boolean skipFirstLine) throws IOException {
BufferedReader fin = new BufferedReader(new FileReader(file));
List<String> sequenceList = new ArrayList<String>();

boolean firstLineIsSkipped = false;
while (fin.ready()) {
String line = fin.readLine();
if(skipFirstLine && !firstLineIsSkipped) {
firstLineIsSkipped = true;
continue;
}
if ( line.trim().length() == 0 ) {
continue; // Skip blank lines
}
Expand Down Expand Up @@ -126,62 +131,74 @@ public List<BEASTInterface> loadFile(File file) {
int minRepeat = -1, maxRepeat = -1;
choice = ParseOption.REPEATS_HOMOGEN;

String[] parseOptions = {"Repeats (homogen)", "Repeats (inhomogen)", "Nucleotides"};
String selectedOption = (String) JOptionPane.showInputDialog(
null,
"Choose Parsing Option (default: Repeats (homogen))",
"Choose Parsing Option",
JOptionPane.PLAIN_MESSAGE,
null,
parseOptions,
parseOptions[0]
);
switch (selectedOption) {
case "Repeats (homogen)":
choice = ParseOption.REPEATS_HOMOGEN;
break;
case "Repeats (inhomogen)":
choice = ParseOption.REPEATS_INHOMOGEN;
break;
case "Nucleotides":
choice = ParseOption.NUCLEOTIDES;
break;
default:
throw new IllegalArgumentException("No parsing option specified");
}
switch (choice) {
case REPEATS_HOMOGEN:
case REPEATS_INHOMOGEN:
JTextField minRepeatField = new JTextField(5);
JTextField maxRepeatField = new JTextField(5);

JPanel myPanel = new JPanel();
myPanel.setLayout(new BoxLayout(myPanel, BoxLayout.PAGE_AXIS));

myPanel.add(new JLabel("<html>Choose the Minimum and Maximum repeat<br>(Maximum repeat - Minimum repeat >= 0 must hold)</html>"));
myPanel.add(new JLabel("Minimum repeat ( >= 0 ):"));
myPanel.add(minRepeatField);
myPanel.add(new JLabel("Maximum repeat:"));
myPanel.add(maxRepeatField);

int result = JOptionPane.showConfirmDialog(null, myPanel,
"Please Enter Minimum and Maximum Repeat", JOptionPane.OK_CANCEL_OPTION);
if (result == JOptionPane.OK_OPTION) {
minRepeat = Integer.parseInt(minRepeatField.getText());
maxRepeat = Integer.parseInt(maxRepeatField.getText());
} else {
throw new IllegalArgumentException("Minimum and maximum repeat were not specified");
}
break;
case NUCLEOTIDES:
break;
final String[] parseOptions = {"Repeats (homogen)", "Repeats (inhomogen)", "Nucleotides"};

JPanel myPanel = new JPanel();
//myPanel.setLayout(new BoxLayout(myPanel, BoxLayout.PAGE_AXIS));
myPanel.add(new JLabel("Parsing method"));
DefaultComboBoxModel model = new DefaultComboBoxModel(parseOptions);
JComboBox comboBox = new JComboBox(model);
myPanel.add(comboBox);

JCheckBox checkbox = new JCheckBox("Skip first line");
myPanel.add(checkbox);


int result = JOptionPane.showConfirmDialog(null, myPanel,
"Choose parsing method", JOptionPane.OK_CANCEL_OPTION);
switch (result) {
case JOptionPane.OK_OPTION:
break;
default:
throw new IllegalArgumentException("No parsing method specified");
}

switch ((String) comboBox.getSelectedItem()) {
case "Repeats (homogen)":
choice = ParseOption.REPEATS_HOMOGEN;
break;
case "Repeats (inhomogen)":
choice = ParseOption.REPEATS_INHOMOGEN;
break;
case "Nucleotides":
choice = ParseOption.NUCLEOTIDES;
break;
default:
throw new IllegalArgumentException("No parsing method specified");
}
switch (choice) {
case REPEATS_HOMOGEN:
case REPEATS_INHOMOGEN:
JTextField minRepeatField = new JTextField(5);
JTextField maxRepeatField = new JTextField(5);

myPanel = new JPanel();
myPanel.setLayout(new BoxLayout(myPanel, BoxLayout.PAGE_AXIS));

myPanel.add(new JLabel("<html>Choose the Minimum and Maximum repeat<br>(Maximum repeat - Minimum repeat >= 0 must hold)</html>"));
myPanel.add(new JLabel("Minimum repeat ( >= 0 ):"));
myPanel.add(minRepeatField);
myPanel.add(new JLabel("Maximum repeat:"));
myPanel.add(maxRepeatField);

result = JOptionPane.showConfirmDialog(null, myPanel,
"Please Enter Minimum and Maximum Repeat", JOptionPane.OK_CANCEL_OPTION);
if (result == JOptionPane.OK_OPTION) {
minRepeat = Integer.parseInt(minRepeatField.getText());
maxRepeat = Integer.parseInt(maxRepeatField.getText());
} else {
throw new IllegalArgumentException("Minimum and maximum repeat were not specified");
}
break;
case NUCLEOTIDES:
break;
}

String ID = file.getName();
ID = ID.substring(0, ID.lastIndexOf('.')).replaceAll("\\..*", "");

try {
parseFile(file);
parseFile(file, checkbox.isSelected());
// Initialize the datatype for repeats
switch(choice) {
case REPEATS_HOMOGEN:
Expand Down
12 changes: 6 additions & 6 deletions src/beast/evolution/substitutionmodel/Sainudiin.java
Original file line number Diff line number Diff line change
Expand Up @@ -53,7 +53,7 @@

public class Sainudiin extends SubstitutionModel.Base {
final public Input<RealParameter> biasMagnitudeInput = new Input<>("biasMagnitude", "magnitude of the mutational bias", Validate.REQUIRED);
final public Input<RealParameter> focalStateInput = new Input<>("focalState", "focal state of the mutational bias", Validate.REQUIRED);
final public Input<RealParameter> focalPointInput = new Input<>("focalPoint", "focal state of the mutational bias", Validate.REQUIRED);
final public Input<RealParameter> gInput = new Input<>("g", "parameter of the geometric distribution of step sizes (1 - g = probability of a mutation being single step)", Validate.REQUIRED);
final public Input<RealParameter> oneOnA1Input = new Input<>("oneOnA1", "one over the proportionality of mutation rate to repeat length (i - minimum repeat)", Validate.REQUIRED);

Expand Down Expand Up @@ -140,7 +140,7 @@ public void initAndValidate() {
}

biasMagnitudeInput.get().setBounds(Math.max(0.0, biasMagnitudeInput.get().getLower()), biasMagnitudeInput.get().getUpper());
focalStateInput.get().setBounds(focalStateInput.get().getLower(), focalStateInput.get().getUpper());
focalPointInput.get().setBounds(focalPointInput.get().getLower(), focalPointInput.get().getUpper());
gInput.get().setBounds(Math.max(0.0, gInput.get().getLower()), Math.min(1.0, gInput.get().getUpper()));
oneOnA1Input.get().setBounds(Math.max(0.0, oneOnA1Input.get().getLower()), oneOnA1Input.get().getUpper());

Expand Down Expand Up @@ -244,14 +244,14 @@ public static double[] findStationaryDistribution(double[] Eval, double[] Ievc)
protected void setupRateMatrix() {
// Since the data is already corrected for minRepeat in FiniteIntegerData,
// we always assume minRepeat is 0 in the substitution model. Except for
// parameters focalState, which are not from FiniteIntegerData.
// parameters focalPoint, which are not from FiniteIntegerData.
final double biasMagnitude = biasMagnitudeInput.get().getValue();
final double focalState = focalStateInput.get().getValue() - minRepeat;
final double focalPoint = focalPointInput.get().getValue() - minRepeat;
final double g = gInput.get().getValue();
final double oneOnA1 = oneOnA1Input.get().getValue();

double b0 = biasMagnitude * Math.abs(focalState) / Math.sqrt(focalState * focalState + 1.0);
double b1 = -biasMagnitude / (Math.sqrt(focalState * focalState + 1.0));
double b0 = biasMagnitude * Math.abs(focalPoint) / Math.sqrt(focalPoint * focalPoint + 1.0);
double b1 = -biasMagnitude / (Math.sqrt(focalPoint * focalPoint + 1.0));

rowSum = new double[nrOfStates];
rowSum2 = new double[nrOfStates];
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -58,7 +58,7 @@ public void initAndValidate() {
setStateBoundsFromAlignment();

biasMagnitudeInput.get().setBounds(Math.max(0.0, biasMagnitudeInput.get().getLower()), biasMagnitudeInput.get().getUpper());
focalStateInput.get().setBounds(focalStateInput.get().getLower(), focalStateInput.get().getUpper());
focalPointInput.get().setBounds(focalPointInput.get().getLower(), focalPointInput.get().getUpper());
gInput.get().setBounds(Math.max(0.0, gInput.get().getLower()), Math.min(1.0, gInput.get().getUpper()));
oneOnA1Input.get().setBounds(Math.max(0.0, oneOnA1Input.get().getLower()), oneOnA1Input.get().getUpper());

Expand Down
8 changes: 4 additions & 4 deletions src/test/beast/evolution/substmodel/SainudiinTest.java
Original file line number Diff line number Diff line change
Expand Up @@ -15,7 +15,7 @@ public interface Instance {
Double[] getPi();

Double getBiasMagnitude();
Double getFocalState();
Double getFocalPoint();
Double getG();
Double getOneOnA1();

Expand All @@ -40,7 +40,7 @@ public Double getBiasMagnitude() {
return 0.5;
}
@Override
public Double getFocalState() {
public Double getFocalPoint() {
return 5.5;
}
@Override
Expand Down Expand Up @@ -95,7 +95,7 @@ public void testSainudiin() throws Exception {
sainudiin.setNrOfStates(15);
sainudiin.setMinRepeat(0);
sainudiin.initByName("biasMagnitude", test.getBiasMagnitude().toString(),
"focalState", test.getFocalState().toString(),
"focalPoint", test.getFocalPoint().toString(),
"g", test.getG().toString(),
"oneOnA1", test.getOneOnA1().toString(),
"frequencies", freqs);
Expand All @@ -116,7 +116,7 @@ public void testSainudiin() throws Exception {
sainudiinstepwise.setNrOfStates(15);
sainudiinstepwise.setMinRepeat(0);
sainudiinstepwise.initByName("biasMagnitude", test.getBiasMagnitude().toString(),
"focalState", test.getFocalState().toString(),
"focalPoint", test.getFocalPoint().toString(),
"g", "0.0",
"oneOnA1", test.getOneOnA1().toString(),
"frequencies", freqs);
Expand Down
Loading

0 comments on commit 019c7e4

Please sign in to comment.