MachineLalign is a final thesis project developed in Python that includes a web page built with HTML, CSS and Flask. The study aims to show that deep learning (DL) architectures are an effective approach to multiple sequence alignment (MSA). To support that claim, we also built other machine learning models, such as Gradient Boosting and Random Forest, for comparison.
Publication available at: Deep Learning and its applications in Multiple Sequence Alignment
To compute the sequence alignments, the following tools must be installed and moved to the /MSA_tools folder:
MachineLalign can be run on Linux or macOS.
The project includes the training and testing of the following models:
- Decision Tree
- Random Forest
- Gradient Boosting
- AdaBoost
- K-Nearest Neighbours
- CNN-BiLSTM
- CNN
These models have undergone hyperparameter tuning and cross-validation.
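As an illustration of what hyperparameter tuning with cross-validation involves, here is a minimal pure-Python sketch. It is not the code in train_models.py: the toy data, the function names and the choice of tuning only the number of neighbours of a K-Nearest Neighbours classifier are all assumptions made for the example.

```python
import random
from collections import Counter


def knn_predict(train, query, k):
    """Predict a label by majority vote among the k nearest training points."""
    nearest = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)),
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def cross_val_accuracy(data, k_neighbours, n_folds=5):
    """Mean accuracy of k-NN over n_folds cross-validation folds."""
    folds = [data[i::n_folds] for i in range(n_folds)]
    scores = []
    for i, test_fold in enumerate(folds):
        # Train on every fold except the one held out for testing.
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        hits = sum(knn_predict(train, x, k_neighbours) == y for x, y in test_fold)
        scores.append(hits / len(test_fold))
    return sum(scores) / len(scores)


def grid_search(data, candidates, n_folds=5):
    """Return (best_k, best_score) over the candidate hyperparameter values."""
    results = {k: cross_val_accuracy(data, k, n_folds) for k in candidates}
    best_k = max(results, key=results.get)
    return best_k, results[best_k]


# Toy data: two well-separated 2-D clusters (purely illustrative).
rng = random.Random(0)
toy_data = [((rng.gauss(0, 1), rng.gauss(0, 1)), 0) for _ in range(50)]
toy_data += [((rng.gauss(5, 1), rng.gauss(5, 1)), 1) for _ in range(50)]
rng.shuffle(toy_data)

best_k, best_score = grid_search(toy_data, candidates=[1, 3, 5, 7])
print(best_k, round(best_score, 3))
```

Each candidate value is scored on held-out folds only, so the selected value is not chosen on the same data it was fitted to.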
To download MachineLalign, clone this Git repository using the following URL:
$ git clone https://github.com/Fiorellaps/MachineLAlign.git
Before running the tool, make sure all the required packages are installed, or simply run the following command:
$ pip install -r requirements.txt
First, make sure you have downloaded the input data and saved it in /resources. Then, to run hyperparameter tuning for the models, go to /src and run:
$ python3 train_models.py
The final models can be trained and tested with the following command:
$ python3 test_models.py
The final models will be saved in /src/models.
To compute an alignment, the tools must be installed and the models saved in /src/models (the models can also be downloaded here). Go to /src and type:
$ python3 align_sequences.py 'SEQUENCES_FILE_NAME' 'MODEL_NAME'
The available model names are: 'decisiontree', 'randomforest', 'adaboost', 'gradientboosting', 'knn', 'cnnbilstm', 'cnn'.
Example:
$ python3 align_sequences.py './input_fasta/BAliBASE/BB11001' 'knn'
The output will be saved in /src/output_aligned.
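To compare several models on the same input, the align_sequences.py command can be built programmatically. The following is a minimal sketch, assuming it is run from /src. `MODEL_NAMES`, `build_command` and `align_with_all_models` are hypothetical helper names that are not part of the repository.

```python
import subprocess

# Model names as listed in this README.
MODEL_NAMES = (
    "decisiontree",
    "randomforest",
    "adaboost",
    "gradientboosting",
    "knn",
    "cnnbilstm",
    "cnn",
)


def build_command(sequences_file, model_name):
    """Return the argument list for one align_sequences.py run (hypothetical helper)."""
    if model_name not in MODEL_NAMES:
        raise ValueError(f"unknown model {model_name!r}; expected one of {MODEL_NAMES}")
    return ["python3", "align_sequences.py", sequences_file, model_name]


def align_with_all_models(sequences_file):
    """Run the alignment once per model; assumes the current directory is /src."""
    for name in MODEL_NAMES:
        subprocess.run(build_command(sequences_file, name), check=True)


# Print the commands without executing them.
for name in MODEL_NAMES:
    print(" ".join(build_command("./input_fasta/BAliBASE/BB11001", name)))
```

Calling `align_with_all_models("./input_fasta/BAliBASE/BB11001")` would then run each model in turn and stop at the first failing command.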
The web application can be run by going to /webpage and typing:
$ python3 application.py
Once it is running, go to the 'Align' section, where an interface like this will appear:
Finally, the output will be shown once you enter the sequences and select one or more models:
We used pyMSA to score the alignments.
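To give an idea of what scoring an alignment means, here is a minimal pure-Python sketch of two common MSA quality measures, a sum-of-pairs score and the percentage of non-gap characters. It does not use pyMSA's API and is not the project's scoring code. The match, mismatch and gap values are placeholder assumptions that stand in for a real substitution matrix.

```python
from itertools import combinations

GAP = "-"


def pair_score(a, b, match=1, mismatch=-1, gap=-2):
    """Score one pair of aligned characters (placeholder values, not a real matrix)."""
    if a == GAP and b == GAP:
        return 0
    if a == GAP or b == GAP:
        return gap
    return match if a == b else mismatch


def sum_of_pairs(alignment, **scores):
    """Sum the pair scores over every column and every pair of sequences."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    total = 0
    for column in zip(*alignment):
        total += sum(pair_score(a, b, **scores) for a, b in combinations(column, 2))
    return total


def percentage_of_non_gaps(alignment):
    """Percentage of cells in the alignment that are not gap characters."""
    cells = "".join(alignment)
    return 100.0 * sum(c != GAP for c in cells) / len(cells)


aligned = ["AC-GT", "ACAGT", "A--GT"]
print(sum_of_pairs(aligned))            # 2
print(percentage_of_non_gaps(aligned))  # 80.0
```

Higher values of both measures indicate an alignment with more matching residues and fewer gaps under the assumed scoring values.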
- Student
Fiorella Piriz Sapio: fiorellapiriz@uma.es
- Tutors