Final Degree Project, Degree in Health Engineering, UMA

Fiorellaps/MachineLAlign

MachineLalign

Aligning Multiple Sequences with Machine Learning models

MachineLalign is a final degree thesis project developed in Python. It includes a web page built with HTML, CSS and Flask. The study aims to show that deep learning (DL) architectures are an effective approach to multiple sequence alignment (MSA). To support that claim, we have also built other machine learning models, such as Gradient Boosting and Random Forest.

The publication is available as "Deep Learning and its applications in Multiple Sequence Alignment".

Requirements

To compute the sequence alignments, the following tools must be installed and moved to the /MSA_tools folder:

MachineLalign can be run on Linux or macOS.

Features

It includes the training and testing of the following models:

  • Decision Tree
  • Random Forest
  • Gradient Boosting
  • AdaBoost
  • K-Nearest Neighbours
  • CNN-BiLSTM
  • CNN

All of these models have undergone hyperparameter tuning and cross-validation.
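To illustrate what hyperparameter tuning with cross-validation involves, here is a minimal, dependency-free sketch. It is not the code used in this repository: the toy one-dimensional dataset, the `knn_predict` helper and the candidate values of `k` are all assumptions made for the example.

```python
from collections import Counter


def knn_predict(train, x, k):
    """Predict the label of x by majority vote among its k nearest training points."""
    neighbours = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def cross_val_accuracy(data, k, n_folds):
    """Mean accuracy of k-NN over n_folds folds (fold i holds every n_folds-th sample)."""
    accuracies = []
    for fold in range(n_folds):
        test = data[fold::n_folds]
        train = [p for i, p in enumerate(data) if i % n_folds != fold]
        correct = sum(knn_predict(train, x, k) == y for x, y in test)
        accuracies.append(correct / len(test))
    return sum(accuracies) / n_folds


def tune_k(data, candidates, n_folds=5):
    """Return the candidate k with the best cross-validated accuracy, plus all scores."""
    scores = {k: cross_val_accuracy(data, k, n_folds) for k in candidates}
    best_k = max(scores, key=scores.get)
    return best_k, scores


# Toy, hypothetical data: (feature, label) pairs; the label is 1 when the feature exceeds 10.
toy_data = [(float(x), int(x > 10)) for x in range(21)]
best_k, scores = tune_k(toy_data, candidates=[1, 3, 5])
print(best_k, scores)
```

The same pattern applies to any of the models listed above: each candidate setting is scored on held-out folds, and the setting with the best mean score is kept.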

Downloading

To download MachineLalign, clone this Git repository from the following URL:

$ git clone https://github.com/Fiorellaps/MachineLAlign.git

Before running the tool, make sure that all the required packages are installed, or simply run the following command:

$ pip install -r requirements.txt
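To check which requirements are still missing before running the tool, a small script along these lines may help. It is only a sketch: the helper names `requirement_name` and `missing_packages` are hypothetical, it assumes Python 3.8 or later (for `importlib.metadata`), and it handles only simple `name` or `name==version` lines, not URLs or other advanced requirement syntax.

```python
import re
from importlib import metadata
from pathlib import Path


def requirement_name(line):
    """Extract the distribution name from one requirements.txt line, or None to skip it."""
    line = line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
    if not line or line.startswith("-"):  # skip blank lines and pip options such as "-r x.txt"
        return None
    match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
    return match.group(0) if match else None


def missing_packages(lines):
    """Return the names found in `lines` that are not installed in this environment."""
    missing = []
    for line in lines:
        name = requirement_name(line)
        if name is None:
            continue
        try:
            metadata.version(name)  # raises if the distribution is not installed
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


# Only report when a requirements.txt exists in the current directory.
requirements = Path("requirements.txt")
if requirements.exists():
    print(missing_packages(requirements.read_text().splitlines()))
```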

Tune models

First, make sure you have downloaded the input data and saved it in /resources. Then, to run hyperparameter tuning for the models, go to /src and type:

$ python3 train_models.py

Test Final models

The final models can be trained and tested with the following command:

$ python3 test_models.py

The final models will be saved in /src/models.

Align Sequences

To compute an alignment, the required tools must be installed and the models must be saved in /src/models (the models can also be downloaded here). Go to /src and type:

$ python3 align_sequences.py 'SEQUENCES_FILE_NAME' 'MODEL_NAME'

The available model names are: 'decisiontree', 'randomforest', 'adaboost', 'gradientboosting', 'knn', 'cnnbilstm', 'cnn'.

Example: $ python3 align_sequences.py './input_fasta/BAliBASE/BB11001' 'knn'

The output will be saved in /src/output_aligned.
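To align the same file with several models in one go, the command above can be wrapped in a short script. The sketch below is not part of the repository: `build_align_command` and `align_with_models` are hypothetical helpers, the default `src_dir` assumes the script is started from the repository root, and `sys.executable` is used in place of `python3` so that the current interpreter is reused.

```python
import subprocess
import sys

# Model names as listed in this README.
MODEL_NAMES = (
    "decisiontree", "randomforest", "adaboost",
    "gradientboosting", "knn", "cnnbilstm", "cnn",
)


def build_align_command(sequences_file, model_name):
    """Return the argument list for align_sequences.py, rejecting unknown model names."""
    if model_name not in MODEL_NAMES:
        raise ValueError(
            f"Unknown model {model_name!r}; expected one of: {', '.join(MODEL_NAMES)}"
        )
    return [sys.executable, "align_sequences.py", sequences_file, model_name]


def align_with_models(sequences_file, model_names, src_dir="src"):
    """Run align_sequences.py once per model from inside src_dir (hypothetical wrapper)."""
    for name in model_names:
        # check=True raises CalledProcessError if the script exits with an error.
        subprocess.run(build_align_command(sequences_file, name), cwd=src_dir, check=True)
```

For example, `align_with_models('./input_fasta/BAliBASE/BB11001', ['knn', 'cnn'])` would run the alignment twice, once per model. Validating the model name up front catches a typo before any alignment starts.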

Run the REST API

The web application can be run by going to /webpage and typing:

$ python3 application.py

Once it is running, go to the 'Align' section, where an interface like this will appear:

Alignment interface

Finally, the output will be shown once you enter the sequences and select one or more models:

Result interface

To score the alignments we used pyMSA.
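To give an idea of what scoring an alignment means, the sketch below computes two simple column-based measures in plain Python. It does not call pyMSA and is not its implementation. The function names and the toy alignment are assumptions made for the example.

```python
def check_alignment(alignment):
    """Raise ValueError unless the alignment is non-empty with rows of equal, non-zero length."""
    if not alignment or not alignment[0]:
        raise ValueError("the alignment must contain at least one non-empty sequence")
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")


def percentage_of_non_gaps(alignment, gap="-"):
    """Percentage of all alignment cells that hold a residue rather than a gap."""
    check_alignment(alignment)
    total = sum(len(seq) for seq in alignment)
    gaps = sum(seq.count(gap) for seq in alignment)
    return 100.0 * (total - gaps) / total


def percentage_of_totally_conserved_columns(alignment, gap="-"):
    """Percentage of columns in which every sequence has the same non-gap residue."""
    check_alignment(alignment)
    columns = list(zip(*alignment))  # transpose rows (sequences) into columns
    conserved = sum(1 for col in columns if len(set(col)) == 1 and col[0] != gap)
    return 100.0 * conserved / len(columns)


# Toy, hypothetical alignment of three sequences.
toy_alignment = ["AC-GT", "ACGGT", "A--GT"]
print(percentage_of_non_gaps(toy_alignment))                   # 80.0
print(percentage_of_totally_conserved_columns(toy_alignment))  # 60.0
```

In the toy alignment, 12 of the 15 cells are residues and 3 of the 5 columns are fully conserved, which gives the two values shown.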

Authors

  • Student

Fiorella Piriz Sapio: fiorellapiriz@uma.es

  • Tutors

Antonio J. Nebro: antonio@lcc.uma.es

José Manuel García Nieto: jnieto@lcc.uma.es
