CSVImporter can now skip first line.

Renamed focalState to focalPoint
arjun-1 · Jul 20, 2016 · 019c7e4 · 019c7e4
1 parent 000dcd2
commit 019c7e4
Show file tree

Hide file tree

Showing 6 changed files with 110 additions and 93 deletions.
diff --git a/README.md b/README.md
@@ -37,7 +37,7 @@ cp release/add-on ~/.beast/2.4/BEASTvntr
 ## Example
 These instructions will show how to infer phylogeny for VNTR data of a set of taxa provided in a [paper](http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0007815) by Comas.
 ### Setting up the XML file
-First download [comas2009_VNTR.csv](examples/csv/comas2009_VNTR.csv) which contains the repeats in CSV format. Start Beauti, either via its shortcut or by running `java -cp build/dist/launcher.jar beast.app.beauti.BeautiLauncher` in `beast2/`. In the Beauti window, click **File > Import Alignment** and select *comas2009_VNTR.csv*. In the window that appears, we can either select repeats or nucleotides to import. Select *Repeats* and click **OK**.
+First download [comas2009_VNTR.csv](examples/csv/comas2009_VNTR.csv) which contains the repeats in CSV format. Start Beauti, either via its shortcut or by running `java -cp build/dist/launcher.jar beast.app.beauti.BeautiLauncher` in `beast2/`. In the Beauti window, click **File > Import Alignment** and select *comas2009_VNTR.csv*. In the window that appears, we can either select repeats (homogen), repeats (inhomogen) or nucleotides to import. Select *Repeats (homogen)* and click **OK**.
 
 After selecting repeats, we must specify the minimum and maximum repeat which will bound our state space. For *Minimum repeat* specify **1** and for *Maximum repeat* specify **15**, and click **OK**.
 
@@ -50,8 +50,8 @@ To save the configuration in an XML file, click **File > save** and save as **co
 Start BEAST2, either via its shortcut or by running `java -cp build/dist/launcher.jar beast.app.beastapp.BeastLauncher` in `beast2/`, and select comas2009_VNTR.xml. If everything went right, you should see the MCMC run starting:
 ```text
 Start likelihood: -6033.8021475437245 
-Writing file comas_VNTR.log
-Writing file comas_VNTR.trees
+Writing file comas2009_VNTR.log
+Writing file comas2009_VNTR.trees
          Sample      posterior ESS(posterior)     likelihood          prior
               0     -6024.6925              N     -5830.8518      -193.8406 --
            1000     -3573.9178         2.0        -3260.2564      -313.6613 --
@@ -62,22 +62,22 @@ Writing file comas_VNTR.trees
 ## Background
 To infer phylogeny, BEASTvntr uses two implementations of a model explained in a [paper](http://www.genetics.org/content/168/1/383.long) by Sainudiin. The first implementation is called *Sainudiin* and can model a mutational bias, mutation rate proportionality, and any multi-step mutations. The second implementation called *Sainudiin Frequencies Computed*, is the same as *Sainudiin*, except that the frequencies of the states in the root node are given by the stationary distribution, which is calculated from the other model parameters. 
 
-These implementations use a modified for expression for the mutational bias beta however, which is described in a [paper](http://www.genetics.org/content/188/1/151.long) by Wu. In this expression, the bias `beta` for expansion given a mutation event, depends on the parameters `b_0, b_1`. However, BEASTvntr uses a transformation of these parameters:
+These implementations use a modified for expression for the mutational bias beta however, which is described in a [paper](http://www.genetics.org/content/188/1/151.long) by Wu. In this expression, the bias `beta` for expansion given a mutation event, depends on the parameters `b0, b1`. However, BEASTvntr uses a transformation of these parameters:
 ```
-b_0 =  biasMagnitude / sqrt( 1 + 1 / focalPoint^2 )
-b_1 = -biasMagnitude / sqrt( 1 + focalPoint^2 )
+b0 =  biasMagnitude / sqrt(1 + 1 / focalPoint^2)
+b1 = -biasMagnitude / sqrt(1 + focalPoint^2)
 ```
 This was done so that the equation
 ```
-beta(b_0, b_1, focalPoint) = 1 - beta(b_0, b_1, focalPoint)
+beta(b0, b1, focalPoint) = 1 - beta(b0, b1, focalPoint)
 ```
 is always satisfied, i.e. for that `focalPoint` the bias for expansion is equal to that of contraction. This means that `focalPoint` can intuitively be interpreted as the focal point of the mutational bias.
 
 In addition, the proportionality of the mutation rate to the number of repeats, `a1`, has been transformed into:
 ```
 oneOnA1 = 1 / a1
 ```
-For the cases where the mutation rate of the minimum repeat is far lower than that of the other repeats, `a1` blows up, whilst `oneOnA1` remains constrained.
+For the cases where the mutation rate of the minimum repeat is significantly lower than that of the other repeats, `a1` blows up, whilst `oneOnA1` remains constrained.
 
 ##Known Issues
 During a MCMC run, it is possible that the likelihood makes a sudden unrealistic increase, and that the trace of estimated parameters becomes a flat line:
@@ -87,7 +87,7 @@ During a MCMC run, it is possible that the likelihood makes a sudden unrealistic
 The cause of this issue might be that too many parameters are being estimated in the model. If you encounter such an issue, doing any of the following might resolve it:  
 1. Pass `-beagle_scaling none` as an option to beast.  
 2. Use *SainudiinStepWise* instead of *Sainudiin* as substitution model. The *SainudiinStepWise*   does not estimate the frequencies of the repeats, thus this greatly reduces any over-parametrization.  
-3. Use more restrictive priors on any of the parameters `r_b, i_eq, g, a_1` of the model.  
+3. Use more restrictive priors on any of the parameters `biasMagnitude, focalPoint, g, oneOnA1` of the model.  Bounding `oneOnA1` from above seems to be most helpful.
 4. Remove any duplicate VNTR sequence in the imported alignment.
 
 See [this](https://groups.google.com/forum/#!topic/beast-users/ScG6PEZTADE) forum post for more information on a similar problem.

diff --git a/src/beast/app/fileimporters/CSVImporter.java b/src/beast/app/fileimporters/CSVImporter.java
@@ -60,12 +60,17 @@ enum ParseOption {
 	private Object sequenceData;
 	ParseOption choice;
 
-	private void parseFile(File file) throws IOException {
+	private void parseFile(File file, boolean skipFirstLine) throws IOException {
 		BufferedReader fin = new BufferedReader(new FileReader(file));
     List<String> sequenceList = new ArrayList<String>();
 
+    boolean firstLineIsSkipped = false;
     while (fin.ready()) {
     	String line = fin.readLine();
+    	if(skipFirstLine && !firstLineIsSkipped) {
+    		firstLineIsSkipped = true;
+    		continue;
+    	}
     	if ( line.trim().length() == 0 ) {
 				continue;  // Skip blank lines
 			}
@@ -126,62 +131,74 @@ public List<BEASTInterface> loadFile(File file) {
 		int minRepeat = -1, maxRepeat = -1;
 		choice = ParseOption.REPEATS_HOMOGEN;
 
-		String[] parseOptions = {"Repeats (homogen)", "Repeats (inhomogen)", "Nucleotides"};
-		String	selectedOption = (String) JOptionPane.showInputDialog(
-				null,
-				"Choose Parsing Option (default: Repeats (homogen))",
-				"Choose Parsing Option",
-				JOptionPane.PLAIN_MESSAGE,
-				null,
-				parseOptions,
-				parseOptions[0]
-			);
-			switch (selectedOption) {
-				case "Repeats (homogen)":
-					choice = ParseOption.REPEATS_HOMOGEN;
-					break;
-				case "Repeats (inhomogen)":
-					choice = ParseOption.REPEATS_INHOMOGEN;
-					break;
-				case "Nucleotides":
-					choice = ParseOption.NUCLEOTIDES;
-					break;
-				default:
-					throw new IllegalArgumentException("No parsing option specified");
-			}
-			switch (choice) {
-				case REPEATS_HOMOGEN:
-				case REPEATS_INHOMOGEN:
-		      JTextField minRepeatField = new JTextField(5);
-		      JTextField maxRepeatField = new JTextField(5);
-
-		      JPanel myPanel = new JPanel();
-		      myPanel.setLayout(new BoxLayout(myPanel, BoxLayout.PAGE_AXIS));
-
-		      myPanel.add(new JLabel("<html>Choose the Minimum and Maximum repeat<br>(Maximum repeat - Minimum repeat >= 0 must hold)</html>"));
-		      myPanel.add(new JLabel("Minimum repeat ( >= 0 ):"));
-		      myPanel.add(minRepeatField);
-		      myPanel.add(new JLabel("Maximum repeat:"));
-		      myPanel.add(maxRepeatField);
-
-		      int result = JOptionPane.showConfirmDialog(null, myPanel, 
-		               "Please Enter Minimum and Maximum Repeat", JOptionPane.OK_CANCEL_OPTION);
-		      if (result == JOptionPane.OK_OPTION) {
-		      	minRepeat = Integer.parseInt(minRepeatField.getText());
-		        maxRepeat = Integer.parseInt(maxRepeatField.getText());
-					} else {
-						throw new IllegalArgumentException("Minimum and maximum repeat were not specified");
-					}
-					break;
-				case NUCLEOTIDES:
-					break;
+		final String[] parseOptions = {"Repeats (homogen)", "Repeats (inhomogen)", "Nucleotides"};
+
+		JPanel myPanel = new JPanel();
+		//myPanel.setLayout(new BoxLayout(myPanel, BoxLayout.PAGE_AXIS));
+		myPanel.add(new JLabel("Parsing method"));
+		DefaultComboBoxModel model = new DefaultComboBoxModel(parseOptions);
+		JComboBox comboBox = new JComboBox(model);
+		myPanel.add(comboBox);
+
+		JCheckBox checkbox = new JCheckBox("Skip first line");
+		myPanel.add(checkbox);
+
+
+    int result = JOptionPane.showConfirmDialog(null, myPanel, 
+             "Choose parsing method", JOptionPane.OK_CANCEL_OPTION);
+    switch (result) {
+    	case JOptionPane.OK_OPTION:
+    		break;
+    	default:
+				throw new IllegalArgumentException("No parsing method specified");
+		}
+
+		switch ((String) comboBox.getSelectedItem()) {
+			case "Repeats (homogen)":
+				choice = ParseOption.REPEATS_HOMOGEN;
+				break;
+			case "Repeats (inhomogen)":
+				choice = ParseOption.REPEATS_INHOMOGEN;
+				break;
+			case "Nucleotides":
+				choice = ParseOption.NUCLEOTIDES;
+				break;
+			default:
+				throw new IllegalArgumentException("No parsing method specified");
+		}
+		switch (choice) {
+			case REPEATS_HOMOGEN:
+			case REPEATS_INHOMOGEN:
+	      JTextField minRepeatField = new JTextField(5);
+	      JTextField maxRepeatField = new JTextField(5);
+
+	      myPanel = new JPanel();
+	      myPanel.setLayout(new BoxLayout(myPanel, BoxLayout.PAGE_AXIS));
+
+	      myPanel.add(new JLabel("<html>Choose the Minimum and Maximum repeat<br>(Maximum repeat - Minimum repeat >= 0 must hold)</html>"));
+	      myPanel.add(new JLabel("Minimum repeat ( >= 0 ):"));
+	      myPanel.add(minRepeatField);
+	      myPanel.add(new JLabel("Maximum repeat:"));
+	      myPanel.add(maxRepeatField);
+
+	      result = JOptionPane.showConfirmDialog(null, myPanel, 
+	               "Please Enter Minimum and Maximum Repeat", JOptionPane.OK_CANCEL_OPTION);
+	      if (result == JOptionPane.OK_OPTION) {
+	      	minRepeat = Integer.parseInt(minRepeatField.getText());
+	        maxRepeat = Integer.parseInt(maxRepeatField.getText());
+				} else {
+					throw new IllegalArgumentException("Minimum and maximum repeat were not specified");
+				}
+				break;
+			case NUCLEOTIDES:
+				break;
 		}
 
 		String ID = file.getName();
 		ID = ID.substring(0, ID.lastIndexOf('.')).replaceAll("\\..*", "");
 
 		try {
-			parseFile(file);
+			parseFile(file, checkbox.isSelected());
 			// Initialize the datatype for repeats
 			switch(choice) {
 				case REPEATS_HOMOGEN:

diff --git a/src/beast/evolution/substitutionmodel/Sainudiin.java b/src/beast/evolution/substitutionmodel/Sainudiin.java
@@ -53,7 +53,7 @@
 
 public class Sainudiin extends SubstitutionModel.Base {
 	final public Input<RealParameter> biasMagnitudeInput = new Input<>("biasMagnitude", "magnitude of the mutational bias", Validate.REQUIRED);
-	final public Input<RealParameter> focalStateInput = new Input<>("focalState", "focal state of the mutational bias", Validate.REQUIRED);
+	final public Input<RealParameter> focalPointInput = new Input<>("focalPoint", "focal state of the mutational bias", Validate.REQUIRED);
 	final public Input<RealParameter> gInput = new Input<>("g", "parameter of the geometric distribution of step sizes (1 - g = probability of a mutation being single step)", Validate.REQUIRED);
 	final public Input<RealParameter> oneOnA1Input = new Input<>("oneOnA1", "one over the proportionality of mutation rate to repeat length (i - minimum repeat)", Validate.REQUIRED);
 
@@ -140,7 +140,7 @@ public void initAndValidate() {
 		}
 
 		biasMagnitudeInput.get().setBounds(Math.max(0.0, biasMagnitudeInput.get().getLower()), biasMagnitudeInput.get().getUpper());
-		focalStateInput.get().setBounds(focalStateInput.get().getLower(), focalStateInput.get().getUpper());
+		focalPointInput.get().setBounds(focalPointInput.get().getLower(), focalPointInput.get().getUpper());
 		gInput.get().setBounds(Math.max(0.0, gInput.get().getLower()), Math.min(1.0, gInput.get().getUpper()));
 		oneOnA1Input.get().setBounds(Math.max(0.0, oneOnA1Input.get().getLower()), oneOnA1Input.get().getUpper());
 
@@ -244,14 +244,14 @@ public static double[] findStationaryDistribution(double[] Eval, double[] Ievc)
 	protected void setupRateMatrix() {
 		// Since the data is already corrected for minRepeat in FiniteIntegerData,
 		// we always assume minRepeat is 0 in the substitution model. Except for
-		// parameters focalState, which are not from FiniteIntegerData.
+		// parameters focalPoint, which are not from FiniteIntegerData.
 		final double biasMagnitude = biasMagnitudeInput.get().getValue();
-		final double focalState = focalStateInput.get().getValue() - minRepeat;
+		final double focalPoint = focalPointInput.get().getValue() - minRepeat;
 		final double g = gInput.get().getValue();
 		final double oneOnA1 = oneOnA1Input.get().getValue();
 
-		double b0 = biasMagnitude * Math.abs(focalState) / Math.sqrt(focalState * focalState + 1.0);
-		double b1 = -biasMagnitude / (Math.sqrt(focalState * focalState + 1.0));
+		double b0 = biasMagnitude * Math.abs(focalPoint) / Math.sqrt(focalPoint * focalPoint + 1.0);
+		double b1 = -biasMagnitude / (Math.sqrt(focalPoint * focalPoint + 1.0));
 
 		rowSum = new double[nrOfStates];
 		rowSum2 = new double[nrOfStates];

diff --git a/src/beast/evolution/substitutionmodel/SainudiinFrequenciesComputed.java b/src/beast/evolution/substitutionmodel/SainudiinFrequenciesComputed.java
@@ -58,7 +58,7 @@ public void initAndValidate() {
 		setStateBoundsFromAlignment();
 
 		biasMagnitudeInput.get().setBounds(Math.max(0.0, biasMagnitudeInput.get().getLower()), biasMagnitudeInput.get().getUpper());
-		focalStateInput.get().setBounds(focalStateInput.get().getLower(), focalStateInput.get().getUpper());
+		focalPointInput.get().setBounds(focalPointInput.get().getLower(), focalPointInput.get().getUpper());
 		gInput.get().setBounds(Math.max(0.0, gInput.get().getLower()), Math.min(1.0, gInput.get().getUpper()));
 		oneOnA1Input.get().setBounds(Math.max(0.0, oneOnA1Input.get().getLower()), oneOnA1Input.get().getUpper());
 

diff --git a/src/test/beast/evolution/substmodel/SainudiinTest.java b/src/test/beast/evolution/substmodel/SainudiinTest.java
@@ -15,7 +15,7 @@ public interface Instance {
 		Double[] getPi();
 
 		Double getBiasMagnitude();
-		Double getFocalState();
+		Double getFocalPoint();
 		Double getG();
 		Double getOneOnA1();
 
@@ -40,7 +40,7 @@ public Double getBiasMagnitude() {
 			return 0.5;
 		}
 		@Override
-		public Double getFocalState() {
+		public Double getFocalPoint() {
 			return 5.5;
 		}
 		@Override
@@ -95,7 +95,7 @@ public void testSainudiin() throws Exception {
 			sainudiin.setNrOfStates(15);
 			sainudiin.setMinRepeat(0);
 			sainudiin.initByName("biasMagnitude", test.getBiasMagnitude().toString(),
-				"focalState", test.getFocalState().toString(), 
+				"focalPoint", test.getFocalPoint().toString(), 
 				"g", test.getG().toString(), 
 				"oneOnA1", test.getOneOnA1().toString(),
 				"frequencies", freqs);
@@ -116,7 +116,7 @@ public void testSainudiin() throws Exception {
 			sainudiinstepwise.setNrOfStates(15);
 			sainudiinstepwise.setMinRepeat(0);
 			sainudiinstepwise.initByName("biasMagnitude", test.getBiasMagnitude().toString(),
-				"focalState", test.getFocalState().toString(), 
+				"focalPoint", test.getFocalPoint().toString(), 
 				"g", "0.0", 
 				"oneOnA1", test.getOneOnA1().toString(),
 				"frequencies", freqs);