Very fast and accurate species tree inference despite incomplete lineage sorting (ILS). Loosely a successor to ASTRID-2 and a competitor to ASTER.
A single binary called wastrid can be downloaded from our recommended release, which currently contains binaries only for x86_64 Linux and arm64 macOS. You can put it in your PATH or simply in the directory that you wish to run it from.
If you are familiar with ASTRAL/ASTER, you can skip this section because we use almost the same input format.
wastrid accepts the common newline-delimited Newick format for the input gene trees, where each line contains one gene tree in Newick format. We packed some example data [1] containing 200 gene trees on 101 taxa that you can download to try out.
In addition, the gene trees can be annotated with branch support (the example data above comes with bootstrap support from 0 to 100) for better accuracy [2] using wASTRID-s.
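As a quick sanity check on the input format described above, the following is a minimal Python sketch that flags obviously malformed lines before they reach wastrid. It is not part of wastrid: the function name `check_gene_tree_file`, the file name `toy_genes.tre`, and the two toy trees are made up for illustration, and the check is deliberately shallow (a trailing `;` and balanced parentheses only), so it can be fooled by quoted labels that contain parentheses.

```python
from pathlib import Path


def check_gene_tree_file(path):
    """Return (line_number, problem) pairs for a newline-delimited Newick file."""
    problems = []
    for lineno, line in enumerate(Path(path).read_text().splitlines(), start=1):
        tree = line.strip()
        if not tree:
            continue  # this checker skips blank lines
        if not tree.endswith(";"):
            problems.append((lineno, "missing trailing ';'"))
        if tree.count("(") != tree.count(")"):
            problems.append((lineno, "unbalanced parentheses"))
    return problems


# Hypothetical toy input: two gene trees on four taxa, one tree per line,
# with support values (0 to 100) as internal-node labels.
Path("toy_genes.tre").write_text(
    "((A:1,B:1)95:1,(C:1,D:1)80:1);\n"
    "((A:1,C:1)40:1,(B:1,D:1)60:1);\n"
)
print(check_gene_tree_file("toy_genes.tre"))  # prints []
```

An empty list means every non-blank line passed both checks; it does not guarantee that the trees are valid Newick in every other respect.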
Say you have prepared a genes.tre file containing all your gene trees, one per line. There are two scenarios:
- The gene trees are annotated with support values that you trust (e.g., bootstrap support, aBayes support), and you know the lower and upper bounds of these values. In the example data mentioned above, the lower bound is 0 and the upper bound is 100.
- The gene trees have no support values, or none that you trust.
In the first scenario, the command goes something like this:
# `-b 0-100` means that the support values are in the range of [0, 100]
wastrid -i genes.tre -b 0-100 -o output_stree.tre
Here output_stree.tre is the output species tree path, and -b specifies the bounds of the support values (lower bound 0, upper bound 100). As another example, aBayes support has lower bound 0.333 and upper bound 1, so its bounds would be specified as -b 0.333-1 instead.
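If you are unsure which scale your support values use, a rough way to find out is to scan the gene trees for the smallest and largest internal-node labels. The Python sketch below does this with a regular expression. It is an illustration rather than part of wastrid: the name `support_range` and the toy trees are hypothetical, and it assumes support is written as a plain number directly after a closing parenthesis.

```python
import re

# A numeric label directly after ')' is treated as a support value,
# e.g. the "95" in "(A:1,B:1)95:1". Only the label is captured.
SUPPORT_RE = re.compile(r"\)(\d+(?:\.\d+)?(?:[eE][+-]?\d+)?)")


def support_range(newick_lines):
    """Return (min, max) of the support values found, or None if there are none."""
    values = [
        float(label)
        for line in newick_lines
        for label in SUPPORT_RE.findall(line)
    ]
    if not values:
        return None
    return min(values), max(values)


# Hypothetical toy trees with bootstrap-style support.
trees = [
    "((A:1,B:1)95:1,(C:1,D:1)80:1);",
    "((A:1,C:1)40:1,(B:1,D:1)60:1);",
]
print(support_range(trees))  # prints (40.0, 95.0)
```

The observed range only helps you recognize the scale. The value passed to -b should still be the theoretical bounds of the support measure (0-100 here), not the observed minimum and maximum.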
In the second scenario (you have no support values, or you don't trust their accuracy), you just want to run ASTRID-3 (unweighted ASTRID), which is ASTRID-2 but faster. In that case, run:
wastrid -i genes.tre --preset vanilla -o output_stree.tre
where --preset preconfigures flags for you. Other presets include:
- --preset abayes, equivalent to -m support -b 0.333-1
- --preset hundred-bootstrap, equivalent to -m support -b 0-100
After running the appropriate command, the output species tree topology is at output_stree.tre. Note that the branch lengths of the species tree are not biologically meaningful.
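Since only the topology of the output is meaningful, you may want to drop the branch lengths before passing the tree to downstream tools. Below is a minimal Python sketch of one way to do that. The name `topology_only` and the example tree are hypothetical, and the regular expression assumes branch lengths are written as `:` followed by a plain non-negative number; the exact contents of the wastrid output file are not assumed here.

```python
import re

# A branch length is taken to be ':' followed by a non-negative number,
# optionally in scientific notation (e.g. ":0.05" or ":1e-3").
BRANCH_LENGTH_RE = re.compile(r":[0-9.]+(?:[eE][+-]?[0-9]+)?")


def topology_only(newick):
    """Return the Newick string with all branch lengths removed."""
    return BRANCH_LENGTH_RE.sub("", newick)


# Hypothetical example tree, not real wastrid output.
print(topology_only("((A:0.1,B:0.2):0.05,(C:0.3,D:1e-3):0.0);"))
# prints ((A,B),(C,D));
```

To apply it to a real result, read the file first, for example with `Path("output_stree.tre").read_text()` from `pathlib`.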
Emulation of ASTRID: output is printed to STDOUT
wastrid -i gtrees.tre --preset vanilla
Weighted ASTRID using IQ-TREE aBayes support: output is written to the file at the path species.tre ([] denotes optional arguments)
wastrid -i gtrees.tre -b 0.333-1 [-m support] -o species.tre
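When running wastrid over many datasets, it can help to build the command line programmatically. The Python sketch below assembles an argument list using only the flags shown in this README (-i, -o, -b, --preset). The helper name `wastrid_argv` is hypothetical, and the sketch only prints the commands; it does not run wastrid.

```python
import shlex


def wastrid_argv(gene_trees, output=None, preset=None, bounds=None):
    """Build an argv list for wastrid from the flags documented in this README."""
    argv = ["wastrid", "-i", gene_trees]
    if preset is not None:
        argv += ["--preset", preset]
    if bounds is not None:
        lower, upper = bounds
        argv += ["-b", f"{lower}-{upper}"]  # e.g. "0.333-1" or "0-100"
    if output is not None:
        argv += ["-o", output]  # without -o, output goes to STDOUT
    return argv


# The two invocations from the examples above.
print(shlex.join(wastrid_argv("gtrees.tre", preset="vanilla")))
print(shlex.join(wastrid_argv("gtrees.tre", output="species.tre", bounds=(0.333, 1))))
```

With wastrid on your PATH, the returned list can be passed to `subprocess.run(argv, check=True)`. Passing a list rather than a shell string avoids quoting problems with unusual file names.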
See also prebuilt binaries (located in Releases).
internode is developed in Rust, so compiling it from scratch requires a working installation of the Rust toolchain. However, the FastME bindings used internally make compiling a bit harder than simply running cargo build.
The FastME bindings were generated via bindgen, which requires libclang. Once libclang is installed for bindgen, the usual pipeline builds the wastrid binary:
cargo build --release
For maximum speed at the cost of cross-machine compatibility, the usual tricks apply:
RUSTFLAGS="-C target-cpu=native" cargo build --release
- This implementation of ASTRID is faster than the original implementation (of ASTRID-2). That is, wastrid --preset vanilla is, speed-wise, a better ASTRID.
- Missing data imputation is implemented (and turned on automatically) using the original procedure of ASTRID, but it is alpha quality.
- ASTRID-multi (see also DISCO) is not yet implemented.
The code contains translated parts from ASTRID-2 and TreeSwift. Because ASTRID-2 is licensed under GPLv2 and TreeSwift under GPLv3, this project (except for third_party, see below) is licensed under GPLv3.
FastME (residing in third_party) is copied from its original source code. We use it in our repository under its CeCILL-C license. FastME does not fall under the GPLv3 license of this project.
internode is a derivation of ASTRID-2, which is a variant of NJst, developed by Liang Liu and Scott Edwards. See these papers for a start.
Portions of this code were developed with the assistance of AI tools, such as GitHub Copilot. As of 2023, this is a common practice for efficient and effective software development.
Footnotes
[1] From ASTRAL-III S100, generated by Zhang and Mirarab.
[2] Note that usually the more accurate the support is, the more accurate the final result. FastTree default support might even degrade the accuracy a bit.