DISCO

Decomposition Into Single-COpy gene trees (DISCO) is a method for decomposing multi-copy gene-family trees while attempting to preserve orthologs and discard paralogs. The resulting single-copy gene trees can then be passed to methods that estimate species trees from single-copy gene trees, such as ASTRAL or ASTRID, to obtain an accurate estimate of the species tree. Additionally, DISCO can be paired with concatenation analysis using the script ca_disco.py.

NOTE: For species tree estimation, the default settings are recommended; for orthology detection, however, -m 2 is recommended so that small groups are retrieved. It is also highly recommended that you use the most recent version of DISCO, as it deals with some limitations of TreeSwift.

Versions

v1.0 (May 21st, 2021): Initial release
v1.1 (July 19th, 2021): Added ca_disco.py and an adjustable filter (instead of just filtering all trees with fewer than 4 taxa)
v1.2 (June 8th, 2022): Tree rerooting fix (addresses an issue with TreeSwift rooting when there are no edge lengths)
v1.3 (July 8th, 2022): Allows for more flexibility with leaf labels
v1.3.1 (October 31st, 2022): Fixes an issue where an extra newline character at the end of the file causes a crash
v1.4.0 (April 21st, 2024): Added the ability to create a partition file with CA-DISCO

Note: At present, it is recommended to use the latest version if you want to run the DISCO algorithm as described in Willson et al. 2021, as earlier versions can, in rare cases, produce results inconsistent with the DISCO algorithm.

Citation

If you use DISCO, please cite:

@article{willson2022disco,
  title={DISCO: Species tree inference using multicopy gene family tree decomposition},
  author={Willson, James and Roddur, Mrinmoy Saha and Liu, Baqiao and Zaharias, Paul and Warnow, Tandy},
  journal={Systematic Biology},
  volume={71},
  number={3},
  pages={610--629},
  year={2022},
  publisher={Oxford University Press}
}

Algorithm

Given a list of multi-copy gene trees, DISCO does the following for each tree:

Root the tree and tag each internal vertex as either a duplication event or a speciation event in a way that minimizes the total number of duplications and losses. We do this with the ASTRAL-Pro rooting and tagging algorithm (Zhang et al. 2020).
Decompose the gene tree by splitting off the smallest subtree under every vertex tagged as a duplication, working from the leaves to the root, until all duplication events are resolved. The set of single-copy trees produced is returned.
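
The decomposition step can be illustrated with a small, self-contained sketch. This is not the code in disco.py: it assumes the tree is already rooted and binary, represents it as nested tuples with species names as leaves, and replaces the ASTRAL-Pro tagging with a simpler assumption (a vertex counts as a duplication when its two child subtrees share a species). The names leaves and decompose are hypothetical.

```python
def leaves(node):
    """Return the list of leaf labels (species names) under a node."""
    if isinstance(node, str):
        return [node]
    left, right = node
    return leaves(left) + leaves(right)


def decompose(node):
    """Return (kept_subtree, split_off_subtrees) for a rooted binary tree.

    Assumption: a vertex is treated as a duplication when its two child
    subtrees share at least one species. DISCO itself uses the ASTRAL-Pro
    rooting and tagging algorithm instead.
    """
    if isinstance(node, str):
        return node, []
    left, right = node
    is_duplication = bool(set(leaves(left)) & set(leaves(right)))
    # Resolve duplications below this vertex first (leaves to root).
    kept_left, off_left = decompose(left)
    kept_right, off_right = decompose(right)
    split_off = off_left + off_right
    if not is_duplication:
        return (kept_left, kept_right), split_off
    # Duplication: keep the larger child and split off the smaller one.
    if len(leaves(kept_left)) >= len(leaves(kept_right)):
        return kept_left, split_off + [kept_right]
    return kept_right, split_off + [kept_left]


# Species A appears twice, so the vertex above the two A copies is a duplication.
gene_tree = ((("A", "B"), ("A", "C")), "D")
kept, others = decompose(gene_tree)
single_copy_trees = [kept] + others
print(single_copy_trees)  # [(('A', 'B'), 'D'), ('A', 'C')]
```

In this toy input the two children of the duplication vertex are the same size, so the sketch arbitrarily keeps the left one; every tree in the result contains each species at most once.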

Dependencies

Python 3
TreeSwift

TreeSwift can be installed with: pip install treeswift

Usage

disco.py

Input: File containing list of multi-copy trees in newick format

Output: File containing resulting list of single-copy trees after decomposition in newick format

python3 disco.py -i <input_file> -o <output_file> -d <delimiter>

Arguments

Required

-i, --input           Input newick tree file

Optional

-o, --output          Output newick tree file
-d, --delimiter       Delimiter separating species name from rest of leaf label
-s, --single-tree     Output only a single large tree per gene tree
-n, --nth-delimiter   Split on Nth delimiter (only works with -d)
-m, --minimum         Minimum number of taxa required for a tree to be output
-v, --verbose         Enable verbose output
--keep-labels         Keep original leaf labels instead of using species name
--no-decomp           Outputs rooted trees without decomposition
--outgroups           Write outgroups (including ties) to txt file
--remove-in-paralogs  Remove in-paralogs before rooting/scoring

Example

python3 disco.py -i example/gtrees-mult.trees

ca_disco.py

Input: File containing list of multi-copy trees in newick format and set of alignment files corresponding to the gene families.

Output: Concatenated alignment file

python3 ca_disco.py -i <input_trees> -a <aln_list> -o <output> -d <delimiter> -m <number>

disco.py must be present in the same directory as ca_disco.py in order for it to run. Also, unlike disco.py, it is necessary for the input newick trees given to ca_disco.py to have unique leaf labels where the taxon name comes first and is separated from the rest of the name by some delimiter.
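
As an illustration of the leaf-label convention, the sketch below extracts a taxon name from a label and checks that labels are unique. The helper species_from_label, the labels, and the handling of the Nth delimiter (keeping everything before the Nth occurrence) are assumptions made for this example, not code taken from disco.py or ca_disco.py.

```python
def species_from_label(label, delimiter="_", nth=1):
    """Return the part of a leaf label before the nth delimiter.

    Hypothetical helper: it assumes the taxon name comes first in the label.
    """
    parts = label.split(delimiter)
    return delimiter.join(parts[:nth])


# Hypothetical leaf labels: taxon name first, then a gene copy identifier.
labels = ["human_gene1", "human_gene2", "mouse_gene1"]

# ca_disco.py needs unique leaf labels within a tree.
assert len(labels) == len(set(labels))

print([species_from_label(label) for label in labels])
# ['human', 'human', 'mouse']

# A taxon name that itself contains the delimiter needs a larger nth.
print(species_from_label("Homo_sapiens_gene1", nth=2))
# Homo_sapiens
```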

The -a argument should be given the path to an "alignment list" file containing the path to each alignment file you want to concatenate, one path per line. The order of the alignment files is important: for each tree in the input newick tree file, there should be a corresponding alignment on the same line of the alignment list file.
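
Before running ca_disco.py, it can be worth checking that the two files line up. The following is a minimal sketch of such a check, assuming one tree per line and one alignment path per line; the function name pair_trees_with_alignments and the file paths are hypothetical, and the sketch does not cover the extra columns used for partition files.

```python
def pair_trees_with_alignments(tree_text, list_text):
    """Pair the i-th tree with the i-th alignment path, or raise on a mismatch."""
    trees = [line.strip() for line in tree_text.splitlines() if line.strip()]
    paths = [line.strip() for line in list_text.splitlines() if line.strip()]
    if len(trees) != len(paths):
        raise ValueError(
            f"{len(trees)} trees but {len(paths)} alignment paths"
        )
    return list(zip(trees, paths))


# Hypothetical contents of the tree file and the alignment list file.
tree_text = "((a_1,b_1),c_1);\n((a_1,a_2),b_1);\n"
list_text = "aln/0001.phy\naln/0002.phy\n"

for tree, path in pair_trees_with_alignments(tree_text, list_text):
    print(path, "<-", tree)
```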

Arguments

Required

-i, --input           Input newick tree file
-a, --alignment       Text file containing paths to alignment files
-f, --format          Format of alignment file (either "fasta" or "phylip")
-o, --output          Output concatenated alignment file

Optional

-m, --filter          Minimum number of taxa required for a sequence group to be included
-d, --delimiter       Delimiter separating species name from rest of leaf label
-p, --partition       Create partition file

Example

python3 ca_disco.py -i example/g_100.trees -o example.phy -a example/seq_list.csv -f phylip

Partition File

ca_disco.py can also create a partition file; this is done with the optional -p argument. If this argument is specified, the alignment list file must provide the relevant information. For an example, see example/seq_list.csv, which would generate a partition file like this:

GTR+G, 0001=1-200
GTR+G, 0002=201-700
GTR+G, 0004=701-1300
...

Note: Gene 0003 is missing. This is not a mistake: it does not contain enough information to generate large enough DISCO subtrees and thus is not included in the alignment.
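
The coordinates in a partition file of this form are running column ranges over the concatenated alignment. The sketch below reproduces the three lines shown above, assuming each included gene contributes one contiguous block of columns; the function partition_lines is hypothetical, and the block lengths (200, 500, 600) are simply read off the ranges in the example rather than taken from ca_disco.py.

```python
def partition_lines(genes, model="GTR+G"):
    """Build partition lines from (gene_name, alignment_length) pairs.

    Columns are numbered from 1, and each gene's range starts right after
    the previous gene's range ends.
    """
    lines = []
    start = 1
    for name, length in genes:
        end = start + length - 1
        lines.append(f"{model}, {name}={start}-{end}")
        start = end + 1
    return lines


# Gene 0003 is absent because it was filtered out before concatenation.
genes = [("0001", 200), ("0002", 500), ("0004", 600)]
for line in partition_lines(genes):
    print(line)
# GTR+G, 0001=1-200
# GTR+G, 0002=201-700
# GTR+G, 0004=701-1300
```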
