Protein structure prediction is important to the design and development of proteins, and could improve the simulation of protein-protein interactions. Secondary structure is one level in a hierarchy of protein structure: primary (the order of amino acids), secondary (local folds such as alpha helices and beta pleated sheets), tertiary (the 3D conformation), and quaternary (the interaction of multiple polypeptides).
This project uses a 1-D convolutional neural network (CNN) to predict the secondary structure of a protein from its primary structure (the order of amino acids). The model is trained on a public dataset of primary sequences paired with their corresponding secondary structure sequences.
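To make the idea concrete, here is a minimal pure-Python sketch of how a 1-D convolution can map a primary sequence to one secondary structure label per residue. It is not the notebook's implementation: the 20-letter alphabet, the three-state label set (`H`, `E`, `C`), the helper names (`one_hot`, `conv1d_same`, `predict_labels`), and the random, untrained weights are all assumptions made for illustration.

```python
import random

# Illustrative assumptions, not taken from the project's notebook:
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues
SS_LABELS = "HEC"  # assumed 3-state labels: helix, strand, coil


def one_hot(sequence):
    """Encode a primary sequence as a list of 20-dimensional one-hot vectors."""
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    encoded = []
    for residue in sequence:
        vector = [0.0] * len(AMINO_ACIDS)
        if residue in index:  # unknown residues stay all-zero
            vector[index[residue]] = 1.0
        encoded.append(vector)
    return encoded


def conv1d_same(inputs, kernels, biases):
    """1-D convolution along the sequence with zero padding ("same" length).

    inputs:  list of L vectors, each with C_in values
    kernels: weights shaped [C_out][K][C_in], with K odd
    returns: list of L vectors, each with C_out values
    """
    length = len(inputs)
    width = len(kernels[0])
    half = width // 2
    outputs = []
    for pos in range(length):
        row = []
        for kernel, bias in zip(kernels, biases):
            total = bias
            for k in range(width):
                src = pos + k - half
                if 0 <= src < length:  # positions outside the sequence count as zero
                    total += sum(w * x for w, x in zip(kernel[k], inputs[src]))
            row.append(total)
        outputs.append(row)
    return outputs


def predict_labels(sequence, kernels, biases):
    """Pick the highest-scoring label at each residue position."""
    scores = conv1d_same(one_hot(sequence), kernels, biases)
    return "".join(
        SS_LABELS[max(range(len(row)), key=row.__getitem__)] for row in scores
    )


if __name__ == "__main__":
    rng = random.Random(0)
    # Random, untrained weights: one filter per label, window of 5 residues.
    kernels = [
        [[rng.uniform(-1, 1) for _ in AMINO_ACIDS] for _ in range(5)]
        for _ in SS_LABELS
    ]
    biases = [0.0] * len(SS_LABELS)
    print(predict_labels("MKTAYIAKQR", kernels, biases))
```

With untrained weights the printed labels are meaningless; the point is that each output position only sees a small window of neighboring residues, and the output has the same length as the input. A trained model would learn the kernel weights from the dataset and would typically stack several such layers.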
The data file is called "ModifiedSSData1", and the machine learning program is called "Protein Structure Prediction - CNN". The .ipynb file explains in more detail how the program works. Please feel free to try it out and build upon it!