A set of Python scripts for taxonomic name resolution and retrieval of upper taxonomies.
- Github repository: https://github.com/digital-botanical-gardens-initiative/taxonomical-utils/
- Documentation: https://digital-botanical-gardens-initiative.github.io/taxonomical-utils/
This repository contains a set of Python scripts for resolving taxonomic names and retrieving their upper taxonomies. For now, it uses the Open Tree of Life as its source of taxonomic data. The taxonomical-utils are thin wrappers around the Python opentree package. They include functions for resolving taxonomic names, appending upper taxonomic lineage information, and merging data files.
To install Taxonomical Utils, first clone the repository:
git clone https://github.com/digital-botanical-gardens-initiative/taxonomical-utils.git
cd taxonomical-utils
Install the required dependencies using Poetry:
poetry install
Taxonomical Utils provides several command-line interface (CLI) commands to process taxonomic data. Each command can be run individually or as part of a pipeline.
The resolve command resolves taxonomic names from an input file and generates a resolved taxa file.
Command:
poetry run taxonomical-utils resolve --input-file <input_file> --output-file <resolved_taxa_file> --org-column-header <org_column_header>
- <input_file>: Path to the input CSV/TSV file containing taxonomic names.
- <resolved_taxa_file>: Path to the output file where resolved taxa will be saved.
- <org_column_header>: Column header in the input file that contains the taxonomic names.
Example:
poetry run taxonomical-utils resolve --input-file ./data/in/example.csv --output-file ./data/out/resolved_taxa.csv --org-column-header idTaxon
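To try the resolve command, a small input file can be generated with Python's standard library. This is a minimal sketch, assuming the names live in a column called idTaxon as in the example above. The helper name write_example_input and the extra sample_id column are illustrative and not part of Taxonomical Utils.

```python
import csv
from pathlib import Path


def write_example_input(path, names, column="idTaxon"):
    """Write a CSV with one taxonomic name per row under `column`.

    Hypothetical helper for illustration; `sample_id` is an invented
    extra column showing that other columns can sit next to the names.
    """
    path = Path(path)
    # Create parent directories (e.g. data/in) if they do not exist yet.
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["sample_id", column])
        writer.writeheader()
        for index, name in enumerate(names, start=1):
            writer.writerow({"sample_id": f"S{index:03d}", column: name})
    return path
```

Calling write_example_input("data/in/example.csv", ["Arabidopsis thaliana", "Quercus robur"]) would produce a file that can be passed to --input-file with --org-column-header idTaxon.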
The append-taxonomy command appends upper taxonomic lineage information to the resolved taxa file.
Command:
poetry run taxonomical-utils append-taxonomy --input-file <resolved_taxa_file> --output-file <upper_taxa_lineage_file>
- <resolved_taxa_file>: Path to the resolved taxa file generated by the resolve command.
- <upper_taxa_lineage_file>: Path to the output file where the upper taxa lineage information will be saved.
Example:
poetry run taxonomical-utils append-taxonomy --input-file data/out/resolved_taxa.csv --output-file data/out/upper_taxa_lineage.csv
The merge command merges the original input file with the resolved taxa file and upper taxa lineage file to produce a fully resolved dataset.
Command:
poetry run taxonomical-utils merge --input-file <input_file> --resolved-taxa-file <resolved_taxa_file> --upper-taxa-lineage-file <upper_taxa_lineage_file> --output-file <final_output_file> --org-column-header <org_column_header>
- <input_file>: Path to the original input CSV/TSV file.
- <resolved_taxa_file>: Path to the resolved taxa file generated by the resolve command.
- <upper_taxa_lineage_file>: Path to the upper taxa lineage file generated by the append-taxonomy command.
- <final_output_file>: Path to the final output file where the merged data will be saved.
- <org_column_header>: Column header in the input file that contains the taxonomic names.
Example:
poetry run taxonomical-utils merge --input-file data/example.csv --resolved-taxa-file data/out/resolved_taxa.csv --upper-taxa-lineage-file data/out/upper_taxa_lineage.csv --output-file data/out/final_output.csv --org-column-header idTaxon
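Conceptually, a merge of this kind can be pictured as two left joins: the original rows are joined to the resolved taxa, and the result is joined to the lineage table. The sketch below only illustrates that idea with the standard library. It is not the tool's actual implementation, and the column names resolved_id and family, as well as the value ID-1, are hypothetical placeholders rather than the real output layout.

```python
def left_join(rows, lookup_rows, key):
    """Left-join `lookup_rows` onto `rows` using the shared column `key`."""
    # Index the lookup table by key; the first occurrence of a key wins.
    index = {}
    for row in lookup_rows:
        index.setdefault(row[key], row)
    merged = []
    for row in rows:
        combined = dict(row)
        # Rows without a match are kept unchanged (left-join behaviour).
        extra = index.get(row.get(key), {})
        for column, value in extra.items():
            # Never overwrite a value that came from the original row.
            combined.setdefault(column, value)
        merged.append(combined)
    return merged


# Hypothetical miniature tables; real files will have different columns.
original = [
    {"sample_id": "S001", "idTaxon": "Quercus robur"},
    {"sample_id": "S002", "idTaxon": "Unknown plant"},
]
resolved = [{"idTaxon": "Quercus robur", "resolved_id": "ID-1"}]
lineage = [{"resolved_id": "ID-1", "family": "Fagaceae"}]

with_ids = left_join(original, resolved, "idTaxon")
final = left_join(with_ids, lineage, "resolved_id")
```

In this sketch the unresolved row stays in the output without the extra columns, so no input record is lost.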
To run the entire pipeline, you can execute the commands sequentially:
poetry run taxonomical-utils resolve --input-file data/example.csv --output-file data/out/resolved_taxa.csv --org-column-header idTaxon
poetry run taxonomical-utils append-taxonomy --input-file data/out/resolved_taxa.csv --output-file data/out/upper_taxa_lineage.csv
poetry run taxonomical-utils merge --input-file data/example.csv --resolved-taxa-file data/out/resolved_taxa.csv --upper-taxa-lineage-file data/out/upper_taxa_lineage.csv --output-file data/out/final_output.csv --org-column-header idTaxon
You can also chain the commands with && so that each command runs only if the previous one succeeds:
poetry run taxonomical-utils resolve --input-file data/example.csv --output-file data/out/resolved_taxa.csv --org-column-header idTaxon && \
poetry run taxonomical-utils append-taxonomy --input-file data/out/resolved_taxa.csv --output-file data/out/upper_taxa_lineage.csv && \
poetry run taxonomical-utils merge --input-file data/example.csv --resolved-taxa-file data/out/resolved_taxa.csv --upper-taxa-lineage-file data/out/upper_taxa_lineage.csv --output-file data/out/final_output.csv --org-column-header idTaxon
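The same chain can be driven from Python when the pipeline needs to be embedded in a larger script. This is a minimal sketch under the assumption that poetry and the taxonomical-utils CLI are available on the PATH. The function names build_pipeline and run_pipeline are illustrative, and only the subcommands and flags shown above are used.

```python
import subprocess


def build_pipeline(input_file, out_dir, org_column_header):
    """Return the three CLI invocations as argument lists, in run order."""
    resolved = f"{out_dir}/resolved_taxa.csv"
    lineage = f"{out_dir}/upper_taxa_lineage.csv"
    final = f"{out_dir}/final_output.csv"
    base = ["poetry", "run", "taxonomical-utils"]
    return [
        base + ["resolve",
                "--input-file", input_file,
                "--output-file", resolved,
                "--org-column-header", org_column_header],
        base + ["append-taxonomy",
                "--input-file", resolved,
                "--output-file", lineage],
        base + ["merge",
                "--input-file", input_file,
                "--resolved-taxa-file", resolved,
                "--upper-taxa-lineage-file", lineage,
                "--output-file", final,
                "--org-column-header", org_column_header],
    ]


def run_pipeline(commands):
    """Run each command in order and stop at the first failure."""
    for command in commands:
        # check=True raises CalledProcessError on a non-zero exit status,
        # which mirrors the short-circuit behaviour of `&&` in the shell.
        subprocess.run(command, check=True)
```

A call such as run_pipeline(build_pipeline("data/example.csv", "data/out", "idTaxon")) would then execute the three steps in order.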
To run the tests, use the following command:
make test
This executes the test suite to verify that all functions work correctly.
Contributions are welcome! Please submit a pull request or open an issue to discuss any changes.
Repository initiated with fpgmaas/cookiecutter-poetry.