Protein Secondary Structure Support Vector Machine Predictor

This predictor takes protein sequences in FASTA files and predicts the secondary structure of each amino acid in 3-state format.

It is trained on the cas2.3line dataset, and it can also predict the structure of sequences from that dataset in 3-state format.

data/testset.dat is an example of the sequences it can predict.

To use this predictor, set the working directory to /StrucPred and run the predictor.py file.

Features:

Uses evolutionary information (psiblast and PSSM).
Uses information from neighboring amino acids in prediction (by building amino acid windows).
Offers a variety of SVM methods to choose from, including linear SVM and RBF SVM.
Uses cross-validation to split the dataset.
Supports other machine learning methods for comparing prediction results, including random forest and a simple decision tree.
Stores predictions and their evaluations in the result folder. (You can find evaluations of some predictions I have tried there.)
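
The amino acid windows mentioned above can be illustrated with a minimal sketch. It is not the code used in this repository: the function names, the one-hot encoding, and the zero padding at the sequence ends are all assumptions made for illustration.

```python
# Hypothetical sketch of window-based feature building (not the repository's code).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(residue):
    """Return a 20-dimensional one-hot vector.

    Padding characters and unknown residues map to an all-zero vector.
    """
    return [1 if residue == aa else 0 for aa in AMINO_ACIDS]


def window_features(sequence, window_size=3):
    """Build one feature vector per residue from a window centered on it."""
    if window_size % 2 == 0:
        raise ValueError("window_size must be odd")
    half = window_size // 2
    # Pad both ends so that every residue has a full window (assumed padding scheme).
    padded = "-" * half + sequence + "-" * half
    features = []
    for i in range(len(sequence)):
        vector = []
        for residue in padded[i:i + window_size]:
            vector.extend(one_hot(residue))
        features.append(vector)
    return features


feats = window_features("MKV", window_size=3)
print(len(feats), len(feats[0]))  # 3 60
```

Each residue yields a vector of length 20 x window_size, which could then be passed to a classifier such as an SVM.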

This predictor is written in Python 3.

Download:

Packages required:

To use this predictor, you need pickle (part of the Python standard library), pandas 0.22.0, and scikit-learn 0.19.1.

Models to choose:

There are several models to choose from in the models folder. You can change the model on line 293 of predictor.py.
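
Since pickle is listed as a requirement, the saved models are presumably pickled objects. The sketch below shows how such a file could be loaded in general. The helper name `load_model`, the file name, and the stand-in object are hypothetical; the actual file names in the models folder and the code on line 293 may differ.

```python
import os
import pickle
import tempfile


def load_model(path):
    """Load a pickled object from disk (hypothetical helper, not from predictor.py)."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Stand-in object so the example runs without a trained model;
# a real model file would hold a fitted scikit-learn estimator.
stand_in = {"name": "stand-in model"}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example_model.pkl")  # hypothetical file name
    with open(path, "wb") as handle:
        pickle.dump(stand_in, handle)
    model = load_model(path)

print(model["name"])  # stand-in model
```

Only load pickle files from sources you trust, since unpickling can execute arbitrary code.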

Evaluation:

The evaluations are stored in the /result folder and include the Q3 score and a coefficient score.

You can also get the cross-validation score and prediction accuracy by removing the triple quotes around the relevant code. This may take a long time.
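
For reference, Q3 is the fraction of residues whose predicted state matches the true state. The sketch below is a minimal illustration, not the evaluation code from this repository; the function name and the H/E/C state labels are assumptions.

```python
def q3_score(true_states, predicted_states):
    """Return the fraction of positions where the predicted state equals the true state."""
    if len(true_states) != len(predicted_states):
        raise ValueError("sequences must have the same length")
    if not true_states:
        raise ValueError("sequences must not be empty")
    correct = sum(t == p for t, p in zip(true_states, predicted_states))
    return correct / len(true_states)


# Assumed labels: H = helix, E = strand, C = coil.
true_structure = "HHHEECCC"
predicted_structure = "HHEEECCH"
print(q3_score(true_structure, predicted_structure))  # 0.75
```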

Coding files:

model_*.py files are the scripts I used to create the different models.
additional_dataset_parser.py is used to parse an additional 50 protein sequences.

PSSM:

In the pssm folder:

Folder 'Sequences': sequences to be run through psiblast

Folder 'psiblast_pssm': raw PSSM results produced by psiblast

Folder 'pssmMatrix': PSSMs in CSV format

formatdb.sh: formats the psiblast database

psiblast.sh: runs psiblast

extractPSSM.py: converts raw PSSM results to pssm.csv

parser_PSSMtoSVM_MultipleFiles.py: parses the PSSMs for later use in the SVM
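
The step from raw PSSM output to CSV can be sketched as follows. This is not the code in extractPSSM.py. It assumes the usual psiblast ASCII PSSM layout, where each data row starts with a position number and a residue letter followed by 20 log-odds scores, and the function names and sample data are made up for illustration.

```python
import csv
import io

# Column order of the 20 score columns in psiblast ASCII PSSM output.
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"


def parse_ascii_pssm(text):
    """Extract (residue, scores) pairs from raw PSSM text."""
    rows = []
    for line in text.splitlines():
        tokens = line.split()
        # Data rows start with a position number; header and footer lines do not.
        if len(tokens) >= 22 and tokens[0].isdigit():
            scores = [int(value) for value in tokens[2:22]]
            rows.append((tokens[1], scores))
    return rows


def pssm_to_csv(rows):
    """Render parsed rows as CSV text with one row per residue."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["residue"] + list(AMINO_ACIDS))
    for residue, scores in rows:
        writer.writerow([residue] + scores)
    return buffer.getvalue()


# Made-up sample in the assumed layout (not real psiblast output).
SAMPLE = """
Last position-specific scoring matrix computed
           A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V
    1 M   -1 -2 -3 -4 -2 -1 -3 -3 -2  1  2 -2  6  0 -3 -2 -1 -2 -1  1
    2 K   -1  2  0 -1 -3  1  1 -2 -1 -3 -3  5 -2 -4 -1  0 -1 -3 -2 -3
"""

rows = parse_ascii_pssm(SAMPLE)
csv_text = pssm_to_csv(rows)
print(csv_text)
```

Real psiblast output has additional columns after the first 20 scores; the sketch ignores them by reading only tokens 2 to 21 of each row.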
