AltaiR: a C toolkit for alignment-free and temporal analysis of multi-FASTA data.

AltaiR provides alignment-free and temporal analysis of multi-FASTA data through a flexible C toolkit designed for large-scale data, namely extensive collections of genomes or proteomes. The toolkit is well suited to scenarios involving many sequences from epidemic and pandemic events. AltaiR is implemented in C, uses multi-threading to increase computational speed, is flexible enough for multiple applications, and has no external dependencies. The tool accepts any sequence(s) in (multi-)FASTA format.

The AltaiR toolkit has one main menu (command: AltaiR) with six sub-menus, one for each feature it provides:

average: applies a moving-average filter to one column of a CSV file of floats (the column to use is a parameter);
filter: filters FASTA reads by characteristics: alphabet, completeness, length, CG quantity, multiple string patterns and pattern absence;
frequency: computes the alphabet frequencies for each FASTA read (it enables alphabet filtering);
nc: computes the Normalized Compression (NC) for all FASTA reads according to a compression level;
ncd: computes the Normalized Compression Distance (NCD) for all FASTA reads according to a reference;
raw: computes Relative Absent Words (RAWs) with CG quantity estimation for all RAWs.
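To illustrate what the average sub-program does conceptually, here is a minimal Python sketch of a moving-average filter over one column of a CSV file. It is not AltaiR's implementation: the function names, the comma delimiter, and the choice to emit only full windows are assumptions made for this example.

```python
import csv
import io


def column_from_csv(text, column, delimiter=","):
    """Read one column (0-based index) of CSV text as floats."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return [float(row[column]) for row in reader if row]


def moving_average(values, window):
    """Simple moving average; emits one value per full window."""
    if window < 1:
        raise ValueError("window must be >= 1")
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            # Drop the value that just left the window.
            running -= values[i - window]
        if i >= window - 1:
            out.append(running / window)
    return out


if __name__ == "__main__":
    # Hypothetical two-column input: a label and a float score.
    data = "a,1.0\nb,2.0\nc,3.0\nd,4.0\n"
    scores = column_from_csv(data, column=1)
    print(moving_average(scores, window=2))  # [1.5, 2.5, 3.5]
```

A larger window smooths the profile more strongly at the cost of shortening the output.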

INSTALLATION

Conda

First, install Miniconda if you haven't already. Then, to create a new Conda environment named altair and install altair-mf from the conda-forge and Bioconda channels, run the following command (note that it uses mamba):

mamba create -n altair -c conda-forge -c bioconda altair-mf

To simply install altair-mf in an existing environment:

conda install -y -c bioconda altair-mf

Otherwise, CMake is needed for manual installation. You can download CMake directly from http://www.cmake.org/cmake/resources/software.html or use an appropriate package manager. The commands below install CMake and Git, clone the repository, and compile AltaiR:

sudo apt-get install cmake git
git clone https://github.com/cobilab/altair.git
cd altair/src/
cmake .
make

Additional Tools

For certain scripts, the Gto toolkit is required, installable via Conda:

conda install -c cobilab gto --yes

Or manually:

git clone https://github.com/cobilab/gto.git
cd gto/src/
make
export PATH="$HOME/gto/bin:$PATH"

PARAMETERS

To see the possible options type

AltaiR

or

AltaiR -h

To view the options of a specific sub-program, type

AltaiR average -h
AltaiR filter -h
AltaiR frequency -h
AltaiR nc -h
AltaiR ncd -h
AltaiR raw -h

Reproducing Experiments

Assuming AltaiR is compiled under the src/ folder and you are in the pipelines/ folder, copy the binary into the current folder:

cp ../src/AltaiR .

Filtering Sequences

To filter sequences, use the following commands:

python3 Histogram.py
bash Filter.sh 29885 29921
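The two numeric arguments passed to Filter.sh look like lower and upper sequence-length bounds, although the text above does not say so. Under that assumption, a length filter over multi-FASTA records can be sketched in Python as follows. The function names are hypothetical and this is not the script's actual logic.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from multi-FASTA text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(records, min_len, max_len):
    """Keep records whose sequence length lies in [min_len, max_len].

    Assumption: inclusive bounds, mirroring the guessed meaning of the
    two Filter.sh arguments.
    """
    return [(h, s) for h, s in records if min_len <= len(s) <= max_len]


if __name__ == "__main__":
    demo = ">a\nACGT\nAC\n>b\nACG\n>c\nACGTACGTAC\n"
    kept = filter_by_length(parse_fasta(demo), 4, 8)
    print([h for h, _ in kept])  # ['a']
```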

Similarity Profiles (NCD)

To simulate and measure similarity profiles:

bash Simulation.sh
bash Similarity.sh ORIGINAL.fa
bash SimProfile.sh sim-data.csv 2 0 1.2
mv NCDProfilesim-data.csv.pdf NCD_P1.pdf
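For readers unfamiliar with the measure behind these profiles, the sketch below computes the Normalized Compression Distance in its commonly cited form, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is a compressed size. It uses zlib as a stand-in compressor. AltaiR uses its own compression models and may formulate the distance differently, so treat this only as an illustration of the idea; all names here are hypothetical.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Compressed size in bytes, using zlib as a stand-in compressor."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance between two byte strings."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def random_dna(n: int, seed: int) -> bytes:
    """Reproducible random DNA, used only for the demonstration."""
    rng = random.Random(seed)
    return "".join(rng.choices("ACGT", k=n)).encode("ascii")


if __name__ == "__main__":
    reference = random_dna(2000, seed=1)
    # A near copy of the reference with a single substitution.
    mutated = reference[:1000] + b"A" + reference[1001:]
    unrelated = random_dna(2000, seed=2)
    print("reference vs mutated:  ", round(ncd(reference, mutated), 3))
    print("reference vs unrelated:", round(ncd(reference, unrelated), 3))
```

Values near 0 indicate that one sequence is almost fully described by the other, while values near 1 indicate little shared information.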

Phylogenetic Tree Construction

Use the tree.py script to construct a phylogenetic tree from NCD values:

python3 tree.py sim-data.csv -N 50
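The text above does not state which tree-building method tree.py uses. As one possible illustration of turning pairwise distances such as NCD values into a tree, the following sketch applies average-linkage clustering (UPGMA), a swapped-in technique chosen for brevity. The function name, the input layout, and the omission of branch lengths are all assumptions of this example.

```python
from itertools import combinations


def upgma(dist):
    """Cluster labels by average linkage and return a Newick string.

    `dist` maps frozenset({a, b}) to the distance between labels a and b
    and must cover every pair of at least two labels. Branch lengths are
    omitted to keep the sketch short.
    """
    d = dict(dist)
    sizes = {label: 1 for pair in dist for label in pair}
    while len(sizes) > 1:
        # Pick the closest pair among the current clusters.
        a, b = min(combinations(sorted(sizes), 2),
                   key=lambda pair: d[frozenset(pair)])
        merged = f"({a},{b})"
        size_a, size_b = sizes.pop(a), sizes.pop(b)
        # Size-weighted average distance from the merged cluster to the rest.
        for c in sizes:
            d[frozenset((merged, c))] = (
                size_a * d[frozenset((a, c))] + size_b * d[frozenset((b, c))]
            ) / (size_a + size_b)
        sizes[merged] = size_a + size_b
    return next(iter(sizes)) + ";"


if __name__ == "__main__":
    # Hypothetical pairwise distances between four sequences.
    distances = {
        frozenset(("A", "B")): 0.10,
        frozenset(("C", "D")): 0.20,
        frozenset(("A", "C")): 0.80,
        frozenset(("B", "C")): 0.90,
        frozenset(("A", "D")): 0.85,
        frozenset(("B", "D")): 0.95,
    }
    print(upgma(distances))  # ((A,B),(C,D));
```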

Complexity Profiles (NC)

Run the following scripts to generate complexity profiles:

bash ComplexitySars.sh
python3 CompProfileSars.py comp-data.csv sorted_output.fa 0.961 0.9617
mv NCProfilecomp-data.csv.pdf NC.pdf
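As a rough illustration of the quantity plotted in a complexity profile, the sketch below computes a Normalized Compression value as compressed bits divided by the sequence length times log2 of the alphabet size. It uses zlib as a stand-in compressor, whereas AltaiR relies on its own models selected by a compression level, so the numbers will not match AltaiR's output. The function names are hypothetical.

```python
import math
import random
import zlib


def nc(seq: str, alphabet_size: int = 4) -> float:
    """Normalized Compression: compressed bits per maximum-entropy bit."""
    compressed_bits = 8 * len(zlib.compress(seq.encode("ascii"), 9))
    return compressed_bits / (len(seq) * math.log2(alphabet_size))


def random_dna(n: int, seed: int) -> str:
    """Reproducible random DNA, used only for the demonstration."""
    rng = random.Random(seed)
    return "".join(rng.choices("ACGT", k=n))


if __name__ == "__main__":
    repetitive = "ACGT" * 500
    noisy = random_dna(2000, seed=1)
    print("repetitive:", round(nc(repetitive), 3))
    print("random:    ", round(nc(noisy), 3))
```

Highly repetitive sequences compress well and score low, while sequences close to random score near or above 1 with a general-purpose compressor like this one.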

Frequency Profiles

Generate frequency profiles using the following commands:

bash FrequencySars.sh
python3 combine_freq_and_date.py
mv base_frequencies_plot.pdf Freq.pdf
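The per-read quantity behind a frequency profile can be sketched in a few lines of Python. This is an illustrative approximation of what the frequency sub-program reports, not its actual code. The function name and the choice to ignore symbols outside the alphabet are assumptions.

```python
from collections import Counter


def base_frequencies(seq, alphabet="ACGT"):
    """Relative frequency of each alphabet symbol in one sequence.

    Symbols outside `alphabet` (for example N) are ignored, which is an
    assumption of this sketch.
    """
    counts = Counter(c for c in seq.upper() if c in alphabet)
    total = sum(counts.values())
    return {c: (counts[c] / total if total else 0.0) for c in alphabet}


if __name__ == "__main__":
    print(base_frequencies("AACG"))
    # {'A': 0.5, 'C': 0.25, 'G': 0.25, 'T': 0.0}
```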

Relative Singularity (RAWs) Profiles

To calculate RAWs profiles:

bash RawSars.sh
python3 RawSarsProfile.py sorted_output.fa
mv relativeSingularityProfile.pdf RAWProfiles.pdf
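To give an intuition for Relative Absent Words, the sketch below lists the words of a fixed length k that occur in one sequence but never in another, together with the GC fraction of each word. It is a simplification: it does not check minimality of the absent words and is not AltaiR's algorithm. The function names and the fixed k are assumptions.

```python
def kmers(seq, k):
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def relative_absent_words(present_in, absent_from, k):
    """Words of length k found in `present_in` but not in `absent_from`."""
    return sorted(kmers(present_in, k) - kmers(absent_from, k))


def gc_fraction(word):
    """Fraction of G and C symbols in a word."""
    return sum(c in "GC" for c in word) / len(word)


if __name__ == "__main__":
    target = "ACGTAC"
    reference = "ACGAAC"
    for word in relative_absent_words(target, reference, k=3):
        print(word, round(gc_fraction(word), 2))
```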

Citation

If you use AltaiR in your research, please cite: Silva, Jorge M., Armando J. Pinho, and Diogo Pratas. "AltaiR: a C toolkit for alignment-free and temporal analysis of multi-FASTA data." GigaScience 13 (2024): giae086.

Issues

Please report any issues on the AltaiR Issues page.

License

AltaiR is licensed under GPL v3. For more information, visit GPL v3 License.
