Impact of genetic features of Pseudomonas putida on its bioremediation potential

This repository collects the scripts and pipelines used to identify the genetic features that determine the bioremediation potential of Pseudomonas putida.

Requirements

Prokka
Roary
chewBBACA
local BLASTn

Phylogenetic, functional and pangenomic analysis

The genomic analysis pipeline is described in the Pangenomic__phylogenetic_and_functional_analysis_pipeline.pdf document. It contains Python and Bash code.

Bioremediation potential determination

The bioremediation potential of each strain, understood as the sum of the different bioremediation-related genes present in it, is summarized in the Bioremediation_potential_determination_pipeline.pdf document. This pipeline describes the steps taken to assign a bioremediation potential value to each strain.

Apart from bioremediation-related genes, the pipeline can also be adapted to retrieve other types of genes from different genomes, provided the sequences of those genes are available.
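
The scoring idea can be sketched as follows. This is a minimal illustration, not the code in bioremediation_counts.py. It assumes that BLASTn was run with tabular output (`-outfmt 6`), that the bioremediation genes were the queries and the genome the subject, and that "sum of presence" means one point per distinct gene detected. The identity and length thresholds, the function names, and the strain and gene names are all hypothetical.

```python
from typing import Dict, Iterable, Set

# Hypothetical thresholds, chosen only for illustration.
MIN_IDENTITY = 80.0  # percent identity (column 3 of -outfmt 6)
MIN_LENGTH = 100     # alignment length (column 4 of -outfmt 6)


def genes_present(blast_lines: Iterable[str]) -> Set[str]:
    """Return the query gene IDs with at least one hit passing both thresholds."""
    present = set()
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        gene_id = fields[0]          # qseqid: assumed to be the gene name
        identity = float(fields[2])  # pident
        length = int(fields[3])      # alignment length
        if identity >= MIN_IDENTITY and length >= MIN_LENGTH:
            present.add(gene_id)
    return present


def bioremediation_potential(hits_by_strain: Dict[str, Iterable[str]]) -> Dict[str, int]:
    """Score each strain as the number of distinct genes detected in it."""
    return {strain: len(genes_present(lines)) for strain, lines in hits_by_strain.items()}


# Toy input with made-up strain and gene names (not real results).
example_hits = {
    "strain_A": [
        "geneX\tcontig1\t98.5\t900\t13\t0\t1\t900\t5\t904\t0.0\t1500",
        "geneX\tcontig2\t97.0\t850\t25\t0\t1\t850\t10\t859\t0.0\t1400",  # second copy, counted once
        "geneY\tcontig3\t60.0\t400\t160\t0\t1\t400\t1\t400\t1e-20\t90",  # fails identity threshold
    ],
    "strain_B": [
        "geneX\tcontig1\t99.0\t900\t9\t0\t1\t900\t1\t900\t0.0\t1600",
        "geneY\tcontig4\t91.0\t400\t36\t0\t1\t400\t1\t400\t1e-150\t500",
        "geneZ\tcontig2\t95.0\t50\t2\t0\t1\t50\t1\t50\t1e-10\t80",       # fails length threshold
    ],
}

scores = bioremediation_potential(example_hits)
print(scores)
```

Counting each gene once per strain keeps the score a presence/absence sum; if copy number matters, the set could be replaced by a counter.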

ML approach of bioremediation potential prediction

The prediction of the bioremediation potential of different Pseudomonas putida strains is summarized in the Bioremediation_potential_predicting.ipynb notebook. It contains the steps we followed to explore the bioremediation potential data and the genome metadata:

Data cleaning
Correlation analysis
Unsupervised learning
- Dimensionality reduction (PCA)
- PERMANOVA analysis
Supervised learning
- XGBoost model
- SHAP analysis
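
As an illustration of the correlation analysis step listed above, the sketch below computes a Pearson correlation coefficient between one metadata column and the potential value. The column names and numbers are invented for the example, and the notebook may use a different coefficient or a library routine instead.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of centered cross-products and squares (the 1/n factors cancel out).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical metadata column and potential values (not real data).
genome_size_mb = [5.9, 6.1, 6.2, 6.0, 6.4]
potential = [21, 23, 27, 22, 30]

r = pearson(genome_size_mb, potential)
print(f"Pearson r = {r:.3f}")
```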

The metrics of the model are the following:

MAE: 2.89
MSE: 12.58
MAPE: 0.12
R2: 0.8
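
For reference, the four metrics above have standard definitions, sketched below in plain Python. The example values are invented and do not reproduce the reported scores. MAPE is computed here as a fraction rather than a percentage, which is an assumption about how the reported 0.12 should be read.

```python
from typing import Dict, Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """Compute MAE, MSE, MAPE (as a fraction) and R2 for paired values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    # MAPE divides by the true value, so true values must be non-zero.
    mape = sum(abs(e) / abs(t) for e, t in zip(errors, y_true)) / n

    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot

    return {"MAE": mae, "MSE": mse, "MAPE": mape, "R2": r2}


# Hypothetical true and predicted potential values (not the model's data).
y_true = [20, 25, 30, 35]
y_pred = [22, 24, 33, 34]

metrics = regression_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```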
