This directory contains a collection of helper tools for various miscellaneous tasks.
- clean_labels.py: For each read of the given sequence file or each leaf of the given phylogenetic tree, remove everything from the label except the contact network individual's name
- Usage:
clean_labels.py [-h] [-i INPUT] [-o OUTPUT]
INPUT
: Input FASTQ/FASTA/Newick file (default: standard input)
OUTPUT
: Output FASTQ/FASTA/Newick file (default: standard output)
- The output format (FASTA/FASTQ/Newick) matches the input format and is detected automatically
- If you wish to convert a FASTQ file to FASTA, refer to fastq2fasta.py in my tools repository
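The label cleaning can be sketched in a few lines of Python. This is a minimal illustration, not the script itself: it assumes (hypothetically) that the individual's name is the first "|"-delimited field of each label, it only handles FASTA headers, and the helper names are illustrative.

```python
def clean_label(label, sep="|"):
    """Return only the contact network individual's name from a label.

    ASSUMPTION (hypothetical label layout): the name is the first
    sep-delimited field, e.g. "N12|3|10.5" -> "N12".
    """
    return label.strip().split(sep, 1)[0]


def clean_fasta(lines, sep="|"):
    """Yield FASTA lines with every header reduced to the individual's name."""
    for line in lines:
        if line.startswith(">"):
            yield ">" + clean_label(line[1:], sep) + "\n"
        else:
            yield line  # sequence lines pass through unchanged


# Example: clean an in-memory FASTA record
record = [">N12|3|10.5\n", "ACGT\n"]
print("".join(clean_fasta(record)), end="")
```

FASTQ would need extra care, since a quality line may itself start with "@".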
- cluster_previous_time.py: Given a clustering from the simulation end time, a FAVITES-format transmission network, and a time, remove individuals who were not infected at the given time and output the resulting clusters.
- Usage:
cluster_previous_time.py [-h] -c CLUSTERING -tn TRANSMISSIONS -t TIME [-o OUTPUT]
CLUSTERING
: Input clustering file (Cluster Picker format)
TRANSMISSIONS
: Input transmission network (FAVITES format)
TIME
: Time threshold; all individuals infected after this time are removed
OUTPUT
: Output file (default: standard output)
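The filtering step amounts to looking up each individual's earliest infection time and dropping anyone infected after TIME. The sketch below assumes a whitespace-separated "u v t" transmission line layout (source, target, time) and a tab-separated clustering file with a "SequenceName\tClusterNumber" header, and that clustering labels are individual names; these layouts and the helper names are assumptions for illustration.

```python
def infection_times(transmission_lines):
    """Map each infected individual to the earliest time it was infected.

    ASSUMPTION: each line is "u v t" (source, target, time), whitespace-separated;
    seed infections use a placeholder source such as "None".
    """
    times = {}
    for line in transmission_lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        _, target, t = parts
        times[target] = min(float(t), times.get(target, float("inf")))
    return times


def read_clusters(lines):
    """Parse a clustering file into {individual: cluster}.

    ASSUMPTION: tab-separated "name<TAB>cluster" rows after a
    "SequenceName<TAB>ClusterNumber" header line.
    """
    clusters = {}
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) == 2 and parts[0] != "SequenceName":
            clusters[parts[0]] = parts[1]
    return clusters


def clusters_at_time(clusters, times, t):
    """Keep only individuals that were already infected at time t."""
    return {name: c for name, c in clusters.items()
            if times.get(name, float("inf")) <= t}


transmissions = ["None A 0", "A B 1.5", "B C 4"]
clustering = ["SequenceName\tClusterNumber\n", "A\t1\n", "B\t1\n", "C\t2\n"]
print(clusters_at_time(read_clusters(clustering), infection_times(transmissions), 2.0))
```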
- cn_adjacency_matrix_to_favites.py: Convert a given contact network from a binary adjacency matrix to the FAVITES format
- Usage:
cn_adjacency_matrix_to_favites.py [-h] -i INPUT [-o OUTPUT] [-d DELIM]
INPUT
: Input contact network file (binary adjacency matrix)
OUTPUT
: Output contact network file (FAVITES format) (default: standard output)
DELIM
: Column delimiter of the input binary adjacency matrix (default: empty string)
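A conversion along these lines could look like the sketch below. It assumes a FAVITES contact network layout of NODE and EDGE lines (with "." for empty attributes and a trailing "d" for directed edges), names individuals by their 0-based row index, and treats every 1 entry as a directed edge; all of these choices, and the function name, are assumptions for illustration.

```python
def adjacency_to_favites(lines, delim=""):
    """Convert a binary adjacency matrix into FAVITES-style NODE/EDGE lines.

    ASSUMPTIONS: row/column i is the individual named str(i); an empty delim
    means every character is one column; each "1" at (i, j) becomes a directed
    edge i -> j with empty attributes ".".
    """
    rows = []
    for line in lines:
        line = line.strip()
        if line:
            rows.append(list(line) if delim == "" else [x.strip() for x in line.split(delim)])
    out = ["NODE\t%d\t." % i for i in range(len(rows))]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value == "1":
                out.append("EDGE\t%d\t%d\t.\td" % (i, j))
    return out


print("\n".join(adjacency_to_favites(["010", "101", "010"])))
```

A symmetric matrix could instead be emitted as one undirected edge per pair; emitting both directions keeps the sketch simple.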
- degree_stats.py: Given a contact or transmission network, compute various statistics of the node degree distribution
- Usage:
degree_stats.py [-h] -i INPUT
INPUT
: Input contact or transmission network file (FAVITES format)
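As an illustration of the kind of summary such a tool might report, the sketch below derives each node's degree from FAVITES-style EDGE lines and computes a few standard statistics. The statistics chosen, the function names, and the assumption that each undirected edge appears on exactly one EDGE line are illustrative rather than taken from degree_stats.py; the sketch also handles only contact networks.

```python
import statistics


def node_degrees(cn_lines):
    """Return {node: degree} from FAVITES-style contact network lines.

    ASSUMPTION: "NODE name attrs" and "EDGE u v attrs d|u" lines; each EDGE
    line adds 1 to the degree of both endpoints (total degree for directed edges).
    """
    degrees = {}
    for line in cn_lines:
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "NODE":
            degrees.setdefault(parts[1], 0)  # isolated nodes keep degree 0
        elif parts[0] == "EDGE":
            for node in parts[1:3]:
                degrees[node] = degrees.get(node, 0) + 1
    return degrees


def degree_summary(degrees):
    """A few example statistics of the degree distribution."""
    values = sorted(degrees.values())
    return {
        "min": values[0],
        "max": values[-1],
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.pstdev(values),
    }


network = ["NODE A .", "NODE B .", "NODE C .", "NODE D .",
           "EDGE A B . u", "EDGE A C . u"]
print(degree_summary(node_degrees(network)))
```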
- FAVITES2GEXF.py: Convert a FAVITES contact network and transmission network to the GEXF format
- Usage:
FAVITES2GEXF.py [-h] -c CONTACT_NETWORK -t TRANSMISSION_NETWORK [-o OUTPUT]
CONTACT_NETWORK
: FAVITES-format contact network
TRANSMISSION_NETWORK
: FAVITES-format transmission network
OUTPUT
: Output file (default: standard output)
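GEXF is an XML graph format (used, for example, by Gephi), so a minimal conversion can be sketched with the standard library's xml.etree.ElementTree. How FAVITES2GEXF.py actually encodes the transmission network is not described here, so marking each contact edge with a boolean "transmission" attribute is an assumption, as are the NODE/EDGE and "u v t" input layouts and the function name.

```python
import xml.etree.ElementTree as ET

GEXF_NS = "http://www.gexf.net/1.2draft"


def favites_to_gexf(cn_lines, tn_lines):
    """Build a GEXF 1.2 string from FAVITES-style contact/transmission networks.

    ASSUMPTIONS: contact lines are "NODE name attrs" / "EDGE u v attrs d|u",
    transmission lines are "u v t", and every contact edge gets a boolean
    "transmission" attribute saying whether an infection crossed it.
    """
    transmitted = set()
    for line in tn_lines:
        parts = line.split()
        if len(parts) == 3:
            transmitted.add((parts[0], parts[1]))

    root = ET.Element("gexf", xmlns=GEXF_NS, version="1.2")
    graph = ET.SubElement(root, "graph", defaultedgetype="directed")
    attributes = ET.SubElement(graph, "attributes", {"class": "edge"})
    ET.SubElement(attributes, "attribute", id="0", title="transmission", type="boolean")
    nodes = ET.SubElement(graph, "nodes")
    edges = ET.SubElement(graph, "edges")

    for line in cn_lines:
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "NODE":
            ET.SubElement(nodes, "node", id=parts[1], label=parts[1])
        elif len(parts) >= 5 and parts[0] == "EDGE":
            u, v, undirected = parts[1], parts[2], parts[4] == "u"
            # an undirected contact edge counts as used in either direction
            hit = (u, v) in transmitted or (undirected and (v, u) in transmitted)
            edge = ET.SubElement(edges, "edge", id=str(len(edges)), source=u, target=v,
                                 type="undirected" if undirected else "directed")
            values = ET.SubElement(edge, "attvalues")
            ET.SubElement(values, "attvalue", {"for": "0", "value": str(hit).lower()})
    return ET.tostring(root, encoding="unicode")


print(favites_to_gexf(["NODE A .", "NODE B .", "EDGE A B . u"], ["None A 0", "A B 1"]))
```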
- label_internal_nodes.py: Label the internal nodes of a simulated tree with the corresponding infected individual
- Usage:
label_internal_nodes.py [-h] -tn TRANSMISSION_NETWORK -tt TREE_TIME [-tm TREE_MUTATION] -ot OUTPUT_TIME [-om OUTPUT_MUTATION]
TRANSMISSION_NETWORK
: FAVITES-format transmission network
TREE_TIME
: Simulated tree in units of time
TREE_MUTATION
: Simulated tree in units of expected number of mutations per site
OUTPUT_TIME
: Output labeled tree in units of time
OUTPUT_MUTATION
: Output labeled tree in units of expected number of mutations per site
- PANGEA_transmissions_to_FAVITES.py: Convert a PANGEA transmission network into the FAVITES edge-list format
- Usage:
PANGEA_transmissions_to_FAVITES.py [-h] -i INPUT [-o OUTPUT]
INPUT
: Input PANGEA transmission network (.csv)
OUTPUT
: Output file (default: standard output)
- patristic_distances.py: Given a phylogenetic tree, compute the pairwise distances between leaves and output the resulting distance matrix as a CSV file
- Usage:
patristic_distances.py [-h] -t TREE [-s SCHEMA] [-o OUTPUT]
TREE
: Input tree
SCHEMA
: Input tree schema (default: Newick)
OUTPUT
: Output file (default: standard output)
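The patristic distance between two leaves is the sum of branch lengths on the path joining them, i.e. each leaf's distance to their most recent common ancestor added together. The dependency-free sketch below shows that computation and the CSV layout; representing the tree as a {child: (parent, branch_length)} dict instead of parsing a tree file is an assumption made to keep it short, and the function names are illustrative.

```python
import csv
import io
from itertools import combinations


def root_distances(node, parent):
    """Return {ancestor: path length from node}, including node itself at 0."""
    dists, total = {node: 0.0}, 0.0
    while node in parent:
        node, length = parent[node]
        total += length
        dists[node] = total
    return dists


def patristic_distances(parent):
    """Pairwise leaf distances for a rooted tree.

    ASSUMPTION: the tree is given as {child: (parent, branch_length)} rather
    than read from a Newick file. With non-negative branch lengths, the
    minimum over shared ancestors is reached at the most recent common ancestor.
    """
    internal = {p for p, _ in parent.values()}
    leaves = sorted(n for n in parent if n not in internal)
    paths = {leaf: root_distances(leaf, parent) for leaf in leaves}
    dist = {}
    for u, v in combinations(leaves, 2):
        shared = paths[u].keys() & paths[v].keys()
        dist[u, v] = dist[v, u] = min(paths[u][a] + paths[v][a] for a in shared)
    return leaves, dist


def to_csv(leaves, dist):
    """Write the distance matrix as CSV with leaf names as row/column headers."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow([""] + leaves)
    for u in leaves:
        writer.writerow([u] + [0.0 if u == v else dist[u, v] for v in leaves])
    return buf.getvalue()


# The Newick tree ((A:1,B:2)X:0.5,C:3)R; as a parent map
tree = {"A": ("X", 1.0), "B": ("X", 2.0), "X": ("R", 0.5), "C": ("R", 3.0)}
print(to_csv(*patristic_distances(tree)), end="")
```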
- scale_tree.py: Given a phylogenetic tree (in the Newick format), scale all of its branch lengths
- Usage:
scale_tree.py [-h] -t TREE [-o OUTPUT] -m MODE [parameters]
TREE
: Input tree
OUTPUT
: Output file (default: standard output)
MODE
: Scaling mode
- Constant:
scale_tree.py -t TREE [-o OUTPUT] -m c CONSTANT
- Exponential:
scale_tree.py -t TREE [-o OUTPUT] -m e SCALE
- Gamma:
scale_tree.py -t TREE [-o OUTPUT] -m g SHAPE SCALE
- Log-Normal:
scale_tree.py -t TREE [-o OUTPUT] -m ln MU SIGMA
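Python's random module provides all four distributions, so the scaling modes can be sketched with a regular expression over the Newick branch lengths. This is a sketch under assumptions: in the random modes each branch is multiplied by its own independent draw (the script may instead draw a single factor), the exponential SCALE is taken to be the mean (hence expovariate(1/SCALE), since expovariate expects a rate), the regex ignores quoted labels containing colons, and the function names are illustrative.

```python
import random
import re

# Matches a branch length such as ":1", ":0.25" or ":1e-3" in a Newick string.
BRANCH_LENGTH = re.compile(r":([0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?)")


def make_scaler(mode, *params, rng=random):
    """Return a zero-argument function giving the multiplier for one branch.

    ASSUMPTION: in the random modes each branch gets an independent draw.
    """
    if mode == "c":
        (constant,) = params
        return lambda: constant
    if mode == "e":
        (scale,) = params
        return lambda: rng.expovariate(1.0 / scale)  # rate = 1 / mean
    if mode == "g":
        shape, scale = params
        return lambda: rng.gammavariate(shape, scale)
    if mode == "ln":
        mu, sigma = params
        return lambda: rng.lognormvariate(mu, sigma)
    raise ValueError("unknown mode: %s" % mode)


def scale_newick(newick, mode, *params, rng=random):
    """Multiply every branch length in a Newick string by a mode-specific factor."""
    factor = make_scaler(mode, *params, rng=rng)
    return BRANCH_LENGTH.sub(lambda m: ":" + repr(float(m.group(1)) * factor()), newick)


print(scale_newick("(A:1,B:2.5):0;", "c", 2.0))
print(scale_newick("(A:1,B:2.5):0;", "g", 2.0, 0.5, rng=random.Random(42)))
```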
- score_clusters.py: Score a given query clustering against a given true reference clustering
- Usage:
score_clusters.py [-h] -q QUERY -r REFERENCE -m METRIC
QUERY
: Query clustering file (Cluster Picker format)
REFERENCE
: Reference clustering file (Cluster Picker format)
METRIC
: Scoring metric, one of the following:
AMI
: Adjusted Mutual Information
ARI
: Adjusted Rand Index
COM
: Completeness Score
FMI
: Fowlkes-Mallows Index
HCV
: Compute Homogeneity, Completeness, and V-Measure simultaneously
HOM
: Homogeneity Score
MI
: Mutual Information
NMI
: Normalized Mutual Information
VM
: V-Measure
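These metric names correspond to clustering scores available in scikit-learn's sklearn.metrics module (for example adjusted_rand_score). To show what one of them computes, here is a dependency-free sketch of the Adjusted Rand Index from the contingency table of two clusterings. Representing each clustering as an {individual: cluster} dict and comparing only individuals present in both are assumptions, not necessarily what score_clusters.py does.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(query, reference):
    """Adjusted Rand Index between two {individual: cluster} clusterings.

    ASSUMPTION: only individuals present in both clusterings are compared.
    """
    common = query.keys() & reference.keys()
    pair_counts = Counter((query[x], reference[x]) for x in common)  # contingency table
    query_sizes = Counter(query[x] for x in common)
    ref_sizes = Counter(reference[x] for x in common)

    index = sum(comb(n, 2) for n in pair_counts.values())
    sum_q = sum(comb(n, 2) for n in query_sizes.values())
    sum_r = sum(comb(n, 2) for n in ref_sizes.values())
    total_pairs = comb(len(common), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two shared individuals: nothing to compare

    expected = sum_q * sum_r / total_pairs
    max_index = (sum_q + sum_r) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both clusterings all singletons
    return (index - expected) / (max_index - expected)


reference = {"a": 1, "b": 1, "c": 2, "d": 2}
print(adjusted_rand_index({"a": 5, "b": 5, "c": 7, "d": 7}, reference))  # 1.0 (relabelled copy)
print(adjusted_rand_index({"a": 1, "b": 1, "c": 1, "d": 1}, reference))  # 0.0 (one big cluster)
```

The score ignores cluster label values, so a relabelled copy of the reference still scores 1.0.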
- tn93_to_clusters.py: Convert tn93 output to the Cluster Picker clustering format
- Usage:
tn93_to_clusters.py [-h] -i INPUT [-t THRESHOLD] [-o OUTPUT]
INPUT
: Input TN93 distances file
THRESHOLD
: Distance threshold t (default: infinity); two individuals u and v are placed in the same cluster if their distance is less than or equal to t
OUTPUT
: Output file (default: standard output)
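Because "same cluster" must be transitive, a natural reading of the threshold rule is single-linkage clustering: individuals joined by a chain of pairs at distance at most t end up together. The sketch below computes those connected components with a union-find structure. It assumes tn93's CSV header "ID1,ID2,Distance", a "SequenceName\tClusterNumber" header for the Cluster Picker output, and that singletons get their own cluster numbers; these choices and the function names are assumptions for illustration.

```python
import csv


def tn93_clusters(lines, threshold=float("inf")):
    """Single-linkage clusters from TN93 CSV output ("ID1,ID2,Distance").

    Individuals u and v share a cluster if a chain of pairs with
    distance <= threshold connects them (connected components via union-find).
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for row in csv.DictReader(lines):
        ru, rv = find(row["ID1"]), find(row["ID2"])  # registers both individuals
        if float(row["Distance"]) <= threshold and ru != rv:
            parent[ru] = rv

    labels, clusters = {}, {}
    for name in sorted(parent):
        clusters[name] = labels.setdefault(find(name), len(labels) + 1)
    return clusters


def to_cluster_picker(clusters):
    """Format {individual: cluster number} as tab-separated text with a header."""
    rows = ["SequenceName\tClusterNumber"]
    rows += ["%s\t%d" % (name, c) for name, c in sorted(clusters.items())]
    return "\n".join(rows) + "\n"


data = ["ID1,ID2,Distance", "a,b,0.01", "b,c,0.02", "d,e,0.5"]
print(to_cluster_picker(tn93_clusters(data, 0.015)), end="")
```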