This predictor takes protein sequences in FASTA files and predicts their secondary structure in 3-state format.
It is trained on the cas2.3line dataset.
data/testset.dat is an example of the sequences it can predict.
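As a rough illustration of reading such input, here is a minimal sketch. It assumes a 3-line record layout (a header starting with ">", the amino acid sequence, then the 3-state structure string), which is only a guess based on the dataset name. The function name parse_three_line and the sample record are hypothetical, and the real files may be laid out differently.

```python
from typing import Iterator, List, Tuple


def parse_three_line(lines: List[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (identifier, sequence, structure) from assumed 3-line records."""
    # Drop blank lines so a trailing newline does not shift the records.
    rows = [ln.strip() for ln in lines if ln.strip()]
    if len(rows) % 3 != 0:
        raise ValueError("expected a multiple of three non-empty lines")
    for i in range(0, len(rows), 3):
        header, seq, struct = rows[i:i + 3]
        if not header.startswith(">"):
            raise ValueError(f"record {i // 3} does not start with '>'")
        if len(seq) != len(struct):
            raise ValueError(f"record {i // 3}: sequence/structure length mismatch")
        yield header[1:], seq, struct


# Hypothetical record, made up for illustration only.
example = [">d1abc", "MKV", "CHE", ""]
records = list(parse_three_line(example))
```

For a file on disk, the same function can be fed with the result of open(path).readlines().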
To use this predictor, set the working directory to /StrucPred and run the file.
Uses evolutionary information (psiblast and PSSM).
Uses information from neighboring amino acids in the prediction (building amino acid windows).
A variety of SVM methods to choose from, including linear SVM and RBF-kernel SVM (rbfsvm).
Cross-validation methods are used to split the dataset.
Other machine learning methods, including random forest and a simple decision tree, can be used to compare prediction results.
Predictions and their evaluations are stored in the result folder. (You can find evaluations of some predictions I have tried.)
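The amino acid windows mentioned above can be sketched as follows. This is a simplified illustration, not the predictor's actual code. It uses one-hot encoding instead of PSSM rows, pads the sequence ends with zeros, and the names build_windows and one_hot are hypothetical. With PSSM features, each residue would contribute its 20 PSSM scores instead of a one-hot vector.

```python
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(residue: str) -> List[int]:
    # Padding and unknown residues map to an all-zero vector.
    return [1 if residue == aa else 0 for aa in AMINO_ACIDS]


def build_windows(sequence: str, window_size: int = 3) -> List[List[int]]:
    """Return one feature row per residue, built from its neighbors."""
    if window_size % 2 == 0:
        raise ValueError("window_size must be odd so the window has a center")
    half = window_size // 2
    # Pad both ends so the first and last residues get full-width windows.
    padded = "-" * half + sequence + "-" * half
    features = []
    for i in range(len(sequence)):
        row: List[int] = []
        for residue in padded[i:i + window_size]:
            row.extend(one_hot(residue))
        features.append(row)
    return features


windows = build_windows("MKV", window_size=3)
```

Each row has window_size * 20 values and can be passed to a classifier as the feature vector for the residue at the center of the window.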
This predictor is written in Python 3.
Packages required:
To use this predictor, you need to install pandas 0.22.0 and scikit-learn 0.19.1. pickle is also used, but it is part of the Python standard library and needs no installation.
There are several models to choose from in the model folder. You can change the model on line 293.
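Assuming the files in the model folder are pickled scikit-learn estimators (a guess based on the pickle requirement), switching models amounts to loading a different file. The sketch below shows the general pattern with a stand-in object. The file name example_model.pkl and the function load_model are hypothetical.

```python
import pickle
import tempfile
from pathlib import Path


def load_model(path):
    """Load a pickled object; only use this on files you trust."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Stand-in for a trained model so the sketch runs on its own.
with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "example_model.pkl"
    with open(model_path, "wb") as handle:
        pickle.dump({"kernel": "linear"}, handle)
    model = load_model(model_path)
```

Pickled scikit-learn models should generally be loaded with the same scikit-learn version that created them, which is one reason to install the pinned version.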
The evaluations are stored in the /result folder and include the Q3 score and a coefficient score.
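Q3 is the fraction of residues whose predicted state matches the true state across the three states. A minimal sketch of that calculation follows. The function name q3_score and the example strings are made up for illustration and are not taken from the predictor's code.

```python
def q3_score(true_states: str, predicted_states: str) -> float:
    """Return the fraction of positions where the predicted state is correct."""
    if len(true_states) != len(predicted_states):
        raise ValueError("both state strings must have the same length")
    if not true_states:
        raise ValueError("cannot score an empty sequence")
    correct = sum(t == p for t, p in zip(true_states, predicted_states))
    return correct / len(true_states)


# Seven of the eight positions agree in this made-up example.
score = q3_score("HHHEECCC", "HHHECCCC")
```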
You can also get the cross-validation score and prediction accuracy by removing the triple quotes around that code. This may take a long time.
model_*.py files are the scripts I used to create the different models.
- is used to parse 50 additional protein sequences.
In pssm folder:
Folder 'Sequences': sequences to be psiblasted
Folder 'psiblast_pssm': raw pssm result given by psiblast
Folder 'pssmMatrix': PSSM in CSV format
: formats the psiblast database and carries out psiblast
: converts the raw PSSM result to pssm.csv
: parses the PSSM for later use in the SVM
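The conversion from a raw PSSM to CSV can be sketched as below. It assumes the raw file is the ASCII PSSM that psiblast writes, in which each data row starts with the position, then the residue letter, then 20 log-odds scores. The function names parse_ascii_pssm and to_csv and the shortened sample text are hypothetical, and the repository's own scripts may work differently.

```python
import csv
import io
from typing import List, Tuple


def parse_ascii_pssm(text: str) -> List[Tuple[str, List[int]]]:
    """Return (residue, 20 scores) for every data row in an ASCII PSSM."""
    rows = []
    for line in text.splitlines():
        tokens = line.split()
        # Data rows start with a numeric position; header and footer lines do not.
        if len(tokens) >= 22 and tokens[0].isdigit():
            rows.append((tokens[1], [int(v) for v in tokens[2:22]]))
    return rows


def to_csv(rows: List[Tuple[str, List[int]]]) -> str:
    """Write one CSV line per residue: the letter followed by its 20 scores."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for residue, scores in rows:
        writer.writerow([residue] + scores)
    return buffer.getvalue()


# Shortened, made-up sample that mimics the layout described above.
sample = (
    "\n"
    "Last position-specific scoring matrix computed\n"
    "           " + "  ".join("ARNDCQEGHILKMFPSTWYV") + "\n"
    "    1 M   " + " ".join(["-1"] * 20) + "\n"
    "    2 K   " + " ".join(["2"] * 20) + "\n"
)
parsed = parse_ascii_pssm(sample)
csv_text = to_csv(parsed)
```

Only the first 20 numeric columns are kept here. The remaining columns in a real file (weighted percentages and per-position information) are ignored in this sketch.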