From 578e55e4454e5a84615f20e84027aac857622122 Mon Sep 17 00:00:00 2001
From: j3xugit
Date: Sun, 8 Jan 2017 10:49:45 -0600
Subject: [PATCH 01/14] Update 04_study.md

---
 sections/04_study.md | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/sections/04_study.md b/sections/04_study.md
index cf9f938c..555179c3 100644
--- a/sections/04_study.md
+++ b/sections/04_study.md
@@ -52,7 +52,14 @@ particularly notable in this area?*
 
 ### Protein secondary and tertiary structure
 
-*Jinbo Xu is writing this*
+Proteins play fundamental roles in all biological processes including the maintenance of cellular integrity, metabolism, transcription/translation, and cell-cell communication. Complete description of protein structures and functions is a fundamental step towards understanding biological life and also highly relevant in the development of therapeutics and drugs. Tons of protein sequences have been generated, but fewer than 100,000 of them have experimentally-solved structures. As a result, computational structure prediction is essential for a majority number of protein sequences. However, predicting protein 3D structures from sequence alone is very challenging, especially when similar templates are not available. In the past decades, various computational methods have been developed to predict protein structure from different aspects, including prediction of secondary structure, torsion angles, solvent accessibility, inter-residue contact map, disorder regions and side-chain packing.
+
+Machine learning is extensively applied to predict protein structures and some success has been achieved. For example, secondary structure can be predicted with about 80% of Q3 accuracy by a shallow machine learning model. Starting from 2012, deep learning has been gradually introduced to protein structure prediction. The adopted deep learning models include deep belief network, LSTM(long short-term memory), deep convolutional neural networks (DCNN) and deep convolutional neural fields (i.e., combination of conditional random fields and DCNN). Here we focus on deep learning methods for two important subproblems: secondary structure prediction and contact map prediction, which represent two different aspects of protein structure prediction. Secondary structure describes the relationship of sequentially-adjacent residues while a contact describes the relationship of two sequentially-distant residues.
+
+Protein secondary structure prediction. There are two types of secondary structure prediction. One is to predict 3-state secondary structure and the other is to predict 8-state secondary structure. There are many more methods developed for the former than the latter, but the latter provides finer-grained information than the former. A predictor is typically evaluated by Q3 and Q8 accuracy, respectively. In XXX, Qi et al. combined deep learning with multi-task learning to simultaneously predict several local structure properties including secondary structures. In XXX, Cheng group predicts secondary structure using deep belief networks. In XXX, Zhou employed an iterative deep learning framework to predict secondary structure and backbone torsion angles. However, none of these deep learning methods achieved significant improvement over PSIPRED (a shallow neural network) in terms of Q3 accuracy. In 2014, Zhou and Troyanskaya employed supervised generative stochastic network (GSN) to predict 8-state secondary structure and demonstrated that they can improve Q8 accuracy over conditional neural fields (CNF), a shallow learning architecture. However, it is unclear if their method can improve Q3 accuracy or not. In 2016 Wang and Xu developed a deep convolutional neural fields (DeepCNF) model and showed that this model can significantly improve secondary structure prediction in terms of both Q3 and Q8 accuracy. DeepCNF is the first that reports Q3 accuracy of 84-85%, much higher than the 80% accuracy maintained by PSIPRED for more than 10 years. DeepCNF is also shown that it can improve prediction of solvent accessibility and disorder regions.
+
+Protein contact map prediction. Contact-assisted protein folding represents a promising new direction for ab initio folding of proteins without good templates in PDB, but it requires accurate contact prediction. Evolutionary coupling analysis (ECA) is an effective contact prediction method for some proteins with a very large number (>1000) of sequence homologs, but ECA fares poorly for proteins without many sequence homologs. Since (soluble) proteins with many sequence homologs are very likely to have a good template in PDB, to make contact-assisted folding really useful for ab initio folding, it is essential to predict accurate contacts for proteins without many sequence homologs. By combining ECA with a few other protein features, shallow neural network-based supervised learning methods such as MetaPSICOV and CoinDCA-NN have shown some advantage over ECA for proteins with a small number of sequence homologs, but their accuracy is still not very good. In recent years, deep learning methods have been explored for contact prediction. For example, Di Lena et al. XXX introduced a deep spatio-temporal neural network (up to 100 layers) that utilizes both spatial and temporal features to predict protein contacts. Eickholt and Cheng applied deep belief networks to contact prediction [XXX]. They trained deep networks first by layer-wise unsupervised learning and then by fine-tuning the entire network using supervised learning. Elofsson group employed some iterative deep learning techniques to contact prediction. However, when blindly tested in CASPs, these methods do not show any advantage over MetaPSICOV. Only until 2016, Xu group proposed a novel deep learning method that can significantly improve contact prediction accuracy over MetaPSICOV especially for proteins without many sequence homologs. Xu’s deep model is formed by one 1D residual neural network and one 2D residual neural network. Blindly tested in the latest CASP competition (i.e., CASP12), Xu’s deep learning method is ranked first in terms of the total F1 score (a widely-used performance metric) on 38 free modeling targets. In this test, the group ranked second also employed a deep learning method. Even MetaPSICOV, which ranked third, employed deeper and wider layers than its old version. Xu group has also demonstrated in another blind test CAMEO (which can be interpreted as a fully-automated CASP) that the predicted contacts can fold quite a few proteins with a novel fold and only 65-300 sequence homologs. Xu group also shows that their method works well for membrane protein contact prediction even if trained mostly by non-membrane proteins.
+
 
 ### Signaling
 
From e2dfda4ab7d2e93868abedc2eb2b6110a6520be6 Mon Sep 17 00:00:00 2001
From: j3xugit
Date: Sun, 8 Jan 2017 12:01:57 -0600
Subject: [PATCH 02/14] Update 04_study.md

---
 sections/04_study.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/sections/04_study.md b/sections/04_study.md
index 555179c3..29ac0065 100644
--- a/sections/04_study.md
+++ b/sections/04_study.md
@@ -54,11 +54,11 @@ particularly notable in this area?*
 
 Proteins play fundamental roles in all biological processes including the maintenance of cellular integrity, metabolism, transcription/translation, and cell-cell communication. Complete description of protein structures and functions is a fundamental step towards understanding biological life and also highly relevant in the development of therapeutics and drugs. Tons of protein sequences have been generated, but fewer than 100,000 of them have experimentally-solved structures. As a result, computational structure prediction is essential for a majority number of protein sequences. However, predicting protein 3D structures from sequence alone is very challenging, especially when similar templates are not available. In the past decades, various computational methods have been developed to predict protein structure from different aspects, including prediction of secondary structure, torsion angles, solvent accessibility, inter-residue contact map, disorder regions and side-chain packing.
 
-Machine learning is extensively applied to predict protein structures and some success has been achieved. For example, secondary structure can be predicted with about 80% of Q3 accuracy by a shallow machine learning model. Starting from 2012, deep learning has been gradually introduced to protein structure prediction. The adopted deep learning models include deep belief network, LSTM(long short-term memory), deep convolutional neural networks (DCNN) and deep convolutional neural fields (i.e., combination of conditional random fields and DCNN). Here we focus on deep learning methods for two important subproblems: secondary structure prediction and contact map prediction, which represent two different aspects of protein structure prediction. Secondary structure describes the relationship of sequentially-adjacent residues while a contact describes the relationship of two sequentially-distant residues.
+Machine learning is extensively applied to predict protein structures and some success has been achieved. For example, secondary structure can be predicted with about 80% of Q3 accuracy by shallow machine learning methods such as PSIPRED [@doi:10.1093/bioinformatics/16.4.404]. Starting from 2012, deep learning has been gradually introduced to protein structure prediction. The adopted deep learning models include deep belief network, LSTM(long short-term memory), deep convolutional neural networks (DCNN) and deep convolutional neural fields[@doi:10.1007/978-3-319-46227-1_1 @doi:10.1038/srep18962]. Here we focus on deep learning methods for two representative subproblems: secondary structure prediction and contact map prediction. Secondary structure refers to local conformation of a sequence segment while a contact map contains information of global conformation.
 
-Protein secondary structure prediction. There are two types of secondary structure prediction. One is to predict 3-state secondary structure and the other is to predict 8-state secondary structure. There are many more methods developed for the former than the latter, but the latter provides finer-grained information than the former. A predictor is typically evaluated by Q3 and Q8 accuracy, respectively. In XXX, Qi et al. combined deep learning with multi-task learning to simultaneously predict several local structure properties including secondary structures. In XXX, Cheng group predicts secondary structure using deep belief networks. In XXX, Zhou employed an iterative deep learning framework to predict secondary structure and backbone torsion angles. However, none of these deep learning methods achieved significant improvement over PSIPRED (a shallow neural network) in terms of Q3 accuracy. In 2014, Zhou and Troyanskaya employed supervised generative stochastic network (GSN) to predict 8-state secondary structure and demonstrated that they can improve Q8 accuracy over conditional neural fields (CNF), a shallow learning architecture. However, it is unclear if their method can improve Q3 accuracy or not. In 2016 Wang and Xu developed a deep convolutional neural fields (DeepCNF) model and showed that this model can significantly improve secondary structure prediction in terms of both Q3 and Q8 accuracy. DeepCNF is the first that reports Q3 accuracy of 84-85%, much higher than the 80% accuracy maintained by PSIPRED for more than 10 years. DeepCNF is also shown that it can improve prediction of solvent accessibility and disorder regions.
+Protein secondary structure prediction. Protein secondary structure can be described by 3 states or 8 states with the latter providing finer-grained information than the former. However, many more methods are developed to predict 3-state secondary structure than 8-state secondary structure. A predictor is typically evaluated by Q3 and Q8 accuracy, respectively. Qi et al. developed a multi-task deep learning method to simultaneously predict several local structure properties including secondary structures [@doi:10.1371/journal.pone.0032235]. Cheng group predicted secondary structure using deep belief networks [@doi: 10.1109/TCBB.2014.2343960]. Zhou developed an iterative deep learning framework to simultaneously predict secondary structure, backbone torsion angles and solvent accessibility [@doi:10.1038/srep11476]. However, none of these deep learning methods achieved significant improvement over PSIPRED, a 2-layer neural network method in terms of Q3 accuracy. In 2014, Zhou and Troyanskaya demonstrated that they can improve Q8 accuracy over conditional neural fields [@doi: 10.1002/pmic.201100196], a shallow architecture, by using a deep supervised and convolutional generative stochastic network[@arXiv:1403.1347], but did not report any results in terms of Q3 accuracy. In 2016 Wang and Xu developed a deep convolutional neural fields (DeepCNF) model and showed that this model can significantly improve secondary structure prediction in terms of both Q3 and Q8 accuracy[@doi:10.1038/srep18962]. DeepCNF is the first that reports Q3 accuracy of 84-85%, much higher than the 80% accuracy maintained by PSIPRED for more than 10 years. DeepCNF is also shown that it can improve prediction of solvent accessibility and disorder regions [@doi:10.1007/978-3-319-46227-1_1].
 
-Protein contact map prediction. Contact-assisted protein folding represents a promising new direction for ab initio folding of proteins without good templates in PDB, but it requires accurate contact prediction. Evolutionary coupling analysis (ECA) is an effective contact prediction method for some proteins with a very large number (>1000) of sequence homologs, but ECA fares poorly for proteins without many sequence homologs. Since (soluble) proteins with many sequence homologs are very likely to have a good template in PDB, to make contact-assisted folding really useful for ab initio folding, it is essential to predict accurate contacts for proteins without many sequence homologs. By combining ECA with a few other protein features, shallow neural network-based supervised learning methods such as MetaPSICOV and CoinDCA-NN have shown some advantage over ECA for proteins with a small number of sequence homologs, but their accuracy is still not very good. In recent years, deep learning methods have been explored for contact prediction. For example, Di Lena et al. XXX introduced a deep spatio-temporal neural network (up to 100 layers) that utilizes both spatial and temporal features to predict protein contacts. Eickholt and Cheng applied deep belief networks to contact prediction [XXX]. They trained deep networks first by layer-wise unsupervised learning and then by fine-tuning the entire network using supervised learning. Elofsson group employed some iterative deep learning techniques to contact prediction. However, when blindly tested in CASPs, these methods do not show any advantage over MetaPSICOV. Only until 2016, Xu group proposed a novel deep learning method that can significantly improve contact prediction accuracy over MetaPSICOV especially for proteins without many sequence homologs. Xu’s deep model is formed by one 1D residual neural network and one 2D residual neural network. Blindly tested in the latest CASP competition (i.e., CASP12), Xu’s deep learning method is ranked first in terms of the total F1 score (a widely-used performance metric) on 38 free modeling targets. In this test, the group ranked second also employed a deep learning method. Even MetaPSICOV, which ranked third, employed deeper and wider layers than its old version. Xu group has also demonstrated in another blind test CAMEO (which can be interpreted as a fully-automated CASP) that the predicted contacts can fold quite a few proteins with a novel fold and only 65-300 sequence homologs. Xu group also shows that their method works well for membrane protein contact prediction even if trained mostly by non-membrane proteins.
+Protein contact map prediction. Contact-assisted protein folding represents a promising new direction for ab initio folding of proteins without good templates in PDB, but it requires accurate contact prediction. Evolutionary coupling analysis (ECA) is an effective contact prediction method for some proteins with a very large number (>1000) of sequence homologs, but ECA fares poorly for proteins without many sequence homologs. Since (soluble) proteins with many sequence homologs are very likely to have a good template in PDB, to make contact-assisted folding really useful for ab initio folding, it is essential to predict accurate contacts for proteins without many sequence homologs. By combining ECA with a few other protein features, shallow neural network-based supervised learning methods such as MetaPSICOV [@doi: 10.1093/bioinformatics/btu791] and CoinDCA-NN[@doi:10.1093/bioinformatics/btv472] have shown some advantage over ECA for proteins with a small number of sequence homologs, but their accuracy is still not very good. In recent years, deep learning methods have been explored for contact prediction. For example, Di Lena et al. introduced a deep spatio-temporal neural network (up to 100 layers) that utilizes both spatial and temporal features to predict protein contacts[@doi:10.1093/bioinformatics/bts475]. Cheng group applied deep belief networks and boosting techniques to protein contact prediction [@10.1093/bioinformatics/bts598]. They trained deep networks by layer-wise unsupervised learning followed by fine-tuning of the entire network. Elofsson group developed an iterative deep learning technique for contact prediction, in which Random Forests are applied to predict contacts at each iteration [@doi:10.1371/journal.pcbi.1003889]. However, when blindly tested in the well-known CASP competitions, these methods do not show any advantage over MetaPSICOV[@doi: 10.1093/bioinformatics/btu791], a 2-layer neural network method. Only until 2016, Xu group proposed a novel deep learning method [@doi:10.1371/journal.pcbi.1005324] that can significantly improve contact prediction accuracy over MetaPSICOV especially for proteins without many sequence homologs. Xu’s deep model is formed by one 1D residual neural network and one 2D residual neural network. Blindly tested in the latest CASP competition (i.e., CASP12), Xu’s deep learning method is ranked first in terms of the total F1 score (a widely-used performance metric) on free-modeling targets as well as the whole set of targets. In this test, the group ranked second also employed a deep learning method. Even MetaPSICOV, which ranked third, employed deeper and wider layers than its old version. Xu group has also demonstrated in another blind test CAMEO (which can be interpreted as a fully-automated CASP) [@url:http://www.cameo3d.org/] that the predicted contacts can fold quite a few proteins with a novel fold and only 65-300 sequence homologs. Xu’s method also works well on membrane protein contact prediction even if trained mostly by non-membrane proteins.
 
 ### Signaling
 
From fdab3e7fcc91982995d3de4fbc706c0d9e5b61b2 Mon Sep 17 00:00:00 2001
From: j3xugit
Date: Sun, 8 Jan 2017 12:10:38 -0600
Subject: [PATCH 03/14] Update 04_study.md

---
 sections/04_study.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/sections/04_study.md b/sections/04_study.md
index 29ac0065..5deda7b6 100644
--- a/sections/04_study.md
+++ b/sections/04_study.md
@@ -54,11 +54,11 @@ particularly notable in this area?*
 
 Proteins play fundamental roles in all biological processes including the maintenance of cellular integrity, metabolism, transcription/translation, and cell-cell communication. Complete description of protein structures and functions is a fundamental step towards understanding biological life and also highly relevant in the development of therapeutics and drugs. Tons of protein sequences have been generated, but fewer than 100,000 of them have experimentally-solved structures. As a result, computational structure prediction is essential for a majority number of protein sequences. However, predicting protein 3D structures from sequence alone is very challenging, especially when similar templates are not available. In the past decades, various computational methods have been developed to predict protein structure from different aspects, including prediction of secondary structure, torsion angles, solvent accessibility, inter-residue contact map, disorder regions and side-chain packing.
 
-Machine learning is extensively applied to predict protein structures and some success has been achieved. For example, secondary structure can be predicted with about 80% of Q3 accuracy by shallow machine learning methods such as PSIPRED [@doi:10.1093/bioinformatics/16.4.404]. Starting from 2012, deep learning has been gradually introduced to protein structure prediction. The adopted deep learning models include deep belief network, LSTM(long short-term memory), deep convolutional neural networks (DCNN) and deep convolutional neural fields[@doi:10.1007/978-3-319-46227-1_1 @doi:10.1038/srep18962]. Here we focus on deep learning methods for two representative subproblems: secondary structure prediction and contact map prediction. Secondary structure refers to local conformation of a sequence segment while a contact map contains information of global conformation.
+Machine learning is extensively applied to predict protein structures and some success has been achieved. For example, secondary structure can be predicted with about 80% of Q3 accuracy by a 2-layer neural network method PSIPRED [@doi:10.1093/bioinformatics/16.4.404]. Starting from 2012, deep learning has been gradually introduced to protein structure prediction. The adopted deep learning models include deep belief network, LSTM(long short-term memory), deep convolutional neural networks (DCNN) and deep convolutional neural fields[@doi:10.1007/978-3-319-46227-1_1 @doi:10.1038/srep18962]. Here we focus on deep learning methods for two representative subproblems: secondary structure prediction and contact map prediction. Secondary structure refers to local conformation of a sequence segment while a contact map contains information of global conformation.
 
-Protein secondary structure prediction. Protein secondary structure can be described by 3 states or 8 states with the latter providing finer-grained information than the former. However, many more methods are developed to predict 3-state secondary structure than 8-state secondary structure. A predictor is typically evaluated by Q3 and Q8 accuracy, respectively. Qi et al. developed a multi-task deep learning method to simultaneously predict several local structure properties including secondary structures [@doi:10.1371/journal.pone.0032235]. Cheng group predicted secondary structure using deep belief networks [@doi: 10.1109/TCBB.2014.2343960]. Zhou developed an iterative deep learning framework to simultaneously predict secondary structure, backbone torsion angles and solvent accessibility [@doi:10.1038/srep11476]. However, none of these deep learning methods achieved significant improvement over PSIPRED, a 2-layer neural network method in terms of Q3 accuracy. In 2014, Zhou and Troyanskaya demonstrated that they can improve Q8 accuracy over conditional neural fields [@doi: 10.1002/pmic.201100196], a shallow architecture, by using a deep supervised and convolutional generative stochastic network[@arXiv:1403.1347], but did not report any results in terms of Q3 accuracy. In 2016 Wang and Xu developed a deep convolutional neural fields (DeepCNF) model and showed that this model can significantly improve secondary structure prediction in terms of both Q3 and Q8 accuracy[@doi:10.1038/srep18962]. DeepCNF is the first that reports Q3 accuracy of 84-85%, much higher than the 80% accuracy maintained by PSIPRED for more than 10 years. DeepCNF is also shown that it can improve prediction of solvent accessibility and disorder regions [@doi:10.1007/978-3-319-46227-1_1].
+Protein secondary structure can be described by 3 states or 8 states with the latter providing finer-grained information than the former. However, many more methods are developed to predict 3-state secondary structure than 8-state secondary structure. A predictor is typically evaluated by Q3 and Q8 accuracy, respectively. Qi et al. developed a multi-task deep learning method to simultaneously predict several local structure properties including secondary structures [@doi:10.1371/journal.pone.0032235]. Cheng group predicted secondary structure using deep belief networks [@doi: 10.1109/TCBB.2014.2343960]. Zhou developed an iterative deep learning framework to simultaneously predict secondary structure, backbone torsion angles and solvent accessibility [@doi:10.1038/srep11476]. However, none of these deep learning methods achieved significant improvement over PSIPRED in terms of Q3 accuracy. In 2014, Zhou and Troyanskaya demonstrated that they could improve Q8 accuracy over a shallow learning architecture conditional neural fields [@doi: 10.1002/pmic.201100196] by using a deep supervised and convolutional generative stochastic network[@arXiv:1403.1347], but did not report any results in terms of Q3 accuracy. In 2016 Wang and Xu developed a deep convolutional neural fields (DeepCNF) model and showed that this model can significantly improve secondary structure prediction in terms of both Q3 and Q8 accuracy[@doi:10.1038/srep18962]. DeepCNF is the first that reports Q3 accuracy of 84-85%, much higher than the 80% accuracy maintained by PSIPRED for more than 10 years. It is also reported that DeepCNF can improve prediction of solvent accessibility and disorder regions [@doi:10.1007/978-3-319-46227-1_1].
 
-Protein contact map prediction. Contact-assisted protein folding represents a promising new direction for ab initio folding of proteins without good templates in PDB, but it requires accurate contact prediction. Evolutionary coupling analysis (ECA) is an effective contact prediction method for some proteins with a very large number (>1000) of sequence homologs, but ECA fares poorly for proteins without many sequence homologs. Since (soluble) proteins with many sequence homologs are very likely to have a good template in PDB, to make contact-assisted folding really useful for ab initio folding, it is essential to predict accurate contacts for proteins without many sequence homologs. By combining ECA with a few other protein features, shallow neural network-based supervised learning methods such as MetaPSICOV [@doi: 10.1093/bioinformatics/btu791] and CoinDCA-NN[@doi:10.1093/bioinformatics/btv472] have shown some advantage over ECA for proteins with a small number of sequence homologs, but their accuracy is still not very good. In recent years, deep learning methods have been explored for contact prediction. For example, Di Lena et al. introduced a deep spatio-temporal neural network (up to 100 layers) that utilizes both spatial and temporal features to predict protein contacts[@doi:10.1093/bioinformatics/bts475]. Cheng group applied deep belief networks and boosting techniques to protein contact prediction [@10.1093/bioinformatics/bts598]. They trained deep networks by layer-wise unsupervised learning followed by fine-tuning of the entire network. Elofsson group developed an iterative deep learning technique for contact prediction, in which Random Forests are applied to predict contacts at each iteration [@doi:10.1371/journal.pcbi.1003889]. However, when blindly tested in the well-known CASP competitions, these methods do not show any advantage over MetaPSICOV[@doi: 10.1093/bioinformatics/btu791], a 2-layer neural network method. Only until 2016, Xu group proposed a novel deep learning method [@doi:10.1371/journal.pcbi.1005324] that can significantly improve contact prediction accuracy over MetaPSICOV especially for proteins without many sequence homologs. Xu’s deep model is formed by one 1D residual neural network and one 2D residual neural network. Blindly tested in the latest CASP competition (i.e., CASP12), Xu’s deep learning method is ranked first in terms of the total F1 score (a widely-used performance metric) on free-modeling targets as well as the whole set of targets. In this test, the group ranked second also employed a deep learning method. Even MetaPSICOV, which ranked third, employed deeper and wider layers than its old version. Xu group has also demonstrated in another blind test CAMEO (which can be interpreted as a fully-automated CASP) [@url:http://www.cameo3d.org/] that the predicted contacts can fold quite a few proteins with a novel fold and only 65-300 sequence homologs. Xu’s method also works well on membrane protein contact prediction even if trained mostly by non-membrane proteins.
+Contact-assisted protein folding represents a promising new direction for ab initio folding of proteins without good templates in PDB, but it requires accurate contact prediction. Evolutionary coupling analysis (ECA) is an effective contact prediction method for some proteins with a very large number (>1000) of sequence homologs, but ECA fares poorly for proteins without many sequence homologs. Since (soluble) proteins with many sequence homologs are very likely to have a good template in PDB, to make contact-assisted folding practically useful for ab initio folding, it is essential to predict accurate contacts for proteins without many sequence homologs. By combining ECA with a few other protein features, shallow neural network-based methods such as MetaPSICOV [@doi: 10.1093/bioinformatics/btu791] and CoinDCA-NN[@doi:10.1093/bioinformatics/btv472] have shown some advantage over ECA for proteins with a small number of sequence homologs, but their accuracy is still not very good. In recent years, deep learning methods have been explored for contact prediction. For example, Di Lena et al. introduced a deep spatio-temporal neural network (up to 100 layers) that utilizes both spatial and temporal features to predict protein contacts[@doi:10.1093/bioinformatics/bts475]. Cheng group combined deep belief networks and boosting techniques to predict protein contacts [@10.1093/bioinformatics/bts598] and trained deep networks by layer-wise unsupervised learning followed by fine-tuning of the entire network. Elofsson group developed an iterative deep learning technique for contact prediction, in which Random Forests are applied to predict contacts at each iteration [@doi:10.1371/journal.pcbi.1003889]. However, blindly tested in the well-known CASP competitions, these methods did not show any advantage over MetaPSICOV[@doi: 10.1093/bioinformatics/btu791], a 2-layer neural network method. Only until 2016, Xu group proposed a novel deep learning method [@doi:10.1371/journal.pcbi.1005324] that can significantly improve contact prediction accuracy over MetaPSICOV especially for proteins without many sequence homologs. Xu’s deep model is formed by one 1D residual neural network and one 2D residual neural network. Blindly tested in the latest CASP competition (i.e., CASP12), Xu’s deep learning method is ranked first in terms of the total F1 score (a widely-used performance metric) on free-modeling targets as well as the whole set of targets. In this test, the group ranked second also employed a deep learning method. Even MetaPSICOV, which ranked third, employed deeper and wider layers than its old version. Xu group has also demonstrated in another blind test CAMEO (which can be interpreted as a fully-automated CASP) [@url:http://www.cameo3d.org/] that the predicted contacts can fold quite a few proteins with a novel fold and only 65-300 sequence homologs. Xu’s method also works well on membrane protein contact prediction even if trained mostly by non-membrane proteins.
 
 ### Signaling
 
From 4013de76a585a807b958226f0831fc4bfc1a749d Mon Sep 17 00:00:00 2001
From: j3xugit
Date: Wed, 18 Jan 2017 23:17:29 -0600
Subject: [PATCH 04/14] Update 04_study.md

---
 sections/04_study.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/sections/04_study.md b/sections/04_study.md
index 5deda7b6..a0cd35e0 100644
--- a/sections/04_study.md
+++ b/sections/04_study.md
@@ -58,7 +58,8 @@ Machine learning is extensively applied to predict protein structures and some s
 
 Protein secondary structure can be described by 3 states or 8 states with the latter providing finer-grained information than the former. However, many more methods are developed to predict 3-state secondary structure than 8-state secondary structure. A predictor is typically evaluated by Q3 and Q8 accuracy, respectively. Qi et al. developed a multi-task deep learning method to simultaneously predict several local structure properties including secondary structures [@doi:10.1371/journal.pone.0032235]. Cheng group predicted secondary structure using deep belief networks [@doi: 10.1109/TCBB.2014.2343960]. Zhou developed an iterative deep learning framework to simultaneously predict secondary structure, backbone torsion angles and solvent accessibility [@doi:10.1038/srep11476]. However, none of these deep learning methods achieved significant improvement over PSIPRED in terms of Q3 accuracy. In 2014, Zhou and Troyanskaya demonstrated that they could improve Q8 accuracy over a shallow learning architecture conditional neural fields [@doi: 10.1002/pmic.201100196] by using a deep supervised and convolutional generative stochastic network[@arXiv:1403.1347], but did not report any results in terms of Q3 accuracy. In 2016 Wang and Xu developed a deep convolutional neural fields (DeepCNF) model and showed that this model can significantly improve secondary structure prediction in terms of both Q3 and Q8 accuracy[@doi:10.1038/srep18962]. DeepCNF is the first that reports Q3 accuracy of 84-85%, much higher than the 80% accuracy maintained by PSIPRED for more than 10 years. It is also reported that DeepCNF can improve prediction of solvent accessibility and disorder regions [@doi:10.1007/978-3-319-46227-1_1].
 
-Contact-assisted protein folding represents a promising new direction for ab initio folding of proteins without good templates in PDB, but it requires accurate contact prediction. Evolutionary coupling analysis (ECA) is an effective contact prediction method for some proteins with a very large number (>1000) of sequence homologs, but ECA fares poorly for proteins without many sequence homologs. Since (soluble) proteins with many sequence homologs are very likely to have a good template in PDB, to make contact-assisted folding practically useful for ab initio folding, it is essential to predict accurate contacts for proteins without many sequence homologs. By combining ECA with a few other protein features, shallow neural network-based methods such as MetaPSICOV [@doi: 10.1093/bioinformatics/btu791] and CoinDCA-NN[@doi:10.1093/bioinformatics/btv472] have shown some advantage over ECA for proteins with a small number of sequence homologs, but their accuracy is still not very good. In recent years, deep learning methods have been explored for contact prediction. For example, Di Lena et al. introduced a deep spatio-temporal neural network (up to 100 layers) that utilizes both spatial and temporal features to predict protein contacts[@doi:10.1093/bioinformatics/bts475]. Cheng group combined deep belief networks and boosting techniques to predict protein contacts [@10.1093/bioinformatics/bts598] and trained deep networks by layer-wise unsupervised learning followed by fine-tuning of the entire network. Elofsson group developed an iterative deep learning technique for contact prediction, in which Random Forests are applied to predict contacts at each iteration [@doi:10.1371/journal.pcbi.1003889]. However, blindly tested in the well-known CASP competitions, these methods did not show any advantage over MetaPSICOV[@doi: 10.1093/bioinformatics/btu791], a 2-layer neural network method. Only until 2016, Xu group proposed a novel deep learning method [@doi:10.1371/journal.pcbi.1005324] that can significantly improve contact prediction accuracy over MetaPSICOV especially for proteins without many sequence homologs. Xu’s deep model is formed by one 1D residual neural network and one 2D residual neural network. Blindly tested in the latest CASP competition (i.e., CASP12), Xu’s deep learning method is ranked first in terms of the total F1 score (a widely-used performance metric) on free-modeling targets as well as the whole set of targets. In this test, the group ranked second also employed a deep learning method. Even MetaPSICOV, which ranked third, employed deeper and wider layers than its old version. Xu group has also demonstrated in another blind test CAMEO (which can be interpreted as a fully-automated CASP) [@url:http://www.cameo3d.org/] that the predicted contacts can fold quite a few proteins with a novel fold and only 65-300 sequence homologs. Xu’s method also works well on membrane protein contact prediction even if trained mostly by non-membrane proteins.
+Contact-assisted protein folding represents a promising new direction for ab initio folding of proteins without good templates in PDB, but it requires accurate contact prediction. Evolutionary coupling analysis (ECA) is an effective contact prediction method for some proteins with a very large number (>1000) of sequence homologs, but ECA fares poorly for proteins without many sequence homologs. Since (soluble) proteins with many sequence homologs are very likely to have a good template in PDB, to make contact-assisted folding practically useful for ab initio folding, it is essential to predict accurate contacts for proteins without many sequence homologs.
By combining ECA with a few other protein features, shallow neural network-based methods such as MetaPSICOV [@doi: 10.1093/bioinformatics/btu791] and CoinDCA-NN[@doi:10.1093/bioinformatics/btv472] have shown some advantage over ECA for proteins with a small number of sequence homologs, but their accuracy is still not very good. In recent years, deep learning methods have been explored for contact prediction. For example, Di Lena et al. introduced a deep spatio-temporal neural network (up to 100 layers) that utilizes both spatial and temporal features to predict protein contacts[@doi:10.1093/bioinformatics/bts475]. Cheng group combined deep belief networks and boosting techniques to predict protein contacts [@10.1093/bioinformatics/bts598] and trained deep networks by layer-wise unsupervised learning followed by fine-tuning of the entire network. Elofsson group developed an iterative deep learning technique for contact prediction, in which Random Forests are applied to predict contacts at each iteration [@doi:10.1371/journal.pcbi.1003889]. However, blindly tested in the well-known CASP competitions, these methods did not show any advantage over MetaPSICOV[@doi: 10.1093/bioinformatics/btu791], a 2-layer neural network method. Only until 2016, Xu group proposed a novel deep learning method [@doi:10.1371/journal.pcbi.1005324] that can significantly improve contact prediction accuracy over MetaPSICOV especially for proteins without many sequence homologs. Xu’s deep model is formed by one 1D residual neural network and one 2D residual neural network. Blindly tested in the latest CASP competition (i.e., CASP12), Xu’s deep learning method is ranked first in terms of the total F1 score (a widely-used performance metric) on free-modeling targets as well as the whole set of targets. In this test, the group ranked second also employed a deep learning method. Even MetaPSICOV, which ranked third, employed deeper and wider layers than its old version. 
Xu group has also demonstrated in another blind test CAMEO (which can be interpreted as a fully-automated CASP) [@url:http://www.cameo3d.org/] that the predicted contacts can fold quite a few proteins with a novel fold and only 65-330 sequence homologs. Xu’s method also works well on membrane protein contact prediction even if trained mostly by non-membrane proteins. + ### Signaling From f512c6f3ee14aecffca37ea73823ba78785fd3aa Mon Sep 17 00:00:00 2001 From: j3xugit Date: Wed, 18 Jan 2017 23:50:09 -0600 Subject: [PATCH 05/14] Update 04_study.md Now each line has <80 chars (including space) --- sections/04_study.md | 107 +++++++++++++++++++++++++++++++++++++++---- 1 file changed, 98 insertions(+), 9 deletions(-) diff --git a/sections/04_study.md b/sections/04_study.md index a0cd35e0..98e485fc 100644 --- a/sections/04_study.md +++ b/sections/04_study.md @@ -52,15 +52,104 @@ particularly notable in this area?* ### Protein secondary and tertiary structure -Proteins play fundamental roles in all biological processes including the maintenance of cellular integrity, metabolism, transcription/translation, and cell-cell communication. Complete description of protein structures and functions is a fundamental step towards understanding biological life and also highly relevant in the development of therapeutics and drugs. Tons of protein sequences have been generated, but fewer than 100,000 of them have experimentally-solved structures. As a result, computational structure prediction is essential for a majority number of protein sequences. However, predicting protein 3D structures from sequence alone is very challenging, especially when similar templates are not available. In the past decades, various computational methods have been developed to predict protein structure from different aspects, including prediction of secondary structure, torsion angles, solvent accessibility, inter-residue contact map, disorder regions and side-chain packing. 
- -Machine learning is extensively applied to predict protein structures and some success has been achieved. For example, secondary structure can be predicted with about 80% of Q3 accuracy by a 2-layer neural network method PSIPRED [@doi:10.1093/bioinformatics/16.4.404]. Starting from 2012, deep learning has been gradually introduced to protein structure prediction. The adopted deep learning models include deep belief network, LSTM(long short-term memory), deep convolutional neural networks (DCNN) and deep convolutional neural fields[@doi:10.1007/978-3-319-46227-1_1 @doi:10.1038/srep18962]. Here we focus on deep learning methods for two representative subproblems: secondary structure prediction and contact map prediction. Secondary structure refers to local conformation of a sequence segment while a contact map contains information of global conformation. - -Protein secondary structure can be described by 3 states or 8 states with the latter providing finer-grained information than the former. However, many more methods are developed to predict 3-state secondary structure than 8-state secondary structure. A predictor is typically evaluated by Q3 and Q8 accuracy, respectively. Qi et al. developed a multi-task deep learning method to simultaneously predict several local structure properties including secondary structures [@doi:10.1371/journal.pone.0032235]. Cheng group predicted secondary structure using deep belief networks [@doi: 10.1109/TCBB.2014.2343960]. Zhou developed an iterative deep learning framework to simultaneously predict secondary structure, backbone torsion angles and solvent accessibility [@doi:10.1038/srep11476]. However, none of these deep learning methods achieved significant improvement over PSIPRED in terms of Q3 accuracy. 
In 2014, Zhou and Troyanskaya demonstrated that they could improve Q8 accuracy over a shallow learning architecture conditional neural fields [@doi: 10.1002/pmic.201100196] by using a deep supervised and convolutional generative stochastic network[@arXiv:1403.1347], but did not report any results in terms of Q3 accuracy. In 2016 Wang and Xu developed a deep convolutional neural fields (DeepCNF) model and showed that this model can significantly improve secondary structure prediction in terms of both Q3 and Q8 accuracy[@doi:10.1038/srep18962]. DeepCNF is the first that reports Q3 accuracy of 84-85%, much higher than the 80% accuracy maintained by PSIPRED for more than 10 years. It is also reported that DeepCNF can improve prediction of solvent accessibility and disorder regions [@doi:10.1007/978-3-319-46227-1_1]. - -Contact-assisted protein folding represents a promising new direction for ab initio folding of proteins without good templates in PDB, but it requires accurate contact prediction. Evolutionary coupling analysis (ECA) is an effective contact prediction method for some proteins with a very large number (>1000) of sequence homologs, but ECA fares poorly for proteins without many sequence homologs. Since (soluble) proteins with many sequence homologs are very likely to have a good template in PDB, to make contact-assisted folding practically useful for ab initio folding, it is essential to predict accurate contacts for proteins without many sequence homologs. By combining ECA with a few other protein features, shallow neural network-based methods such as MetaPSICOV [@doi: 10.1093/bioinformatics/btu791] and CoinDCA-NN[@doi:10.1093/bioinformatics/btv472] have shown some advantage over ECA for proteins with a small number of sequence homologs, but their accuracy is still not very good. In recent years, deep learning methods have been explored for contact prediction. For example, Di Lena et al. 
introduced a deep spatio-temporal neural network (up to 100 layers) that utilizes both spatial and temporal features to predict protein contacts[@doi:10.1093/bioinformatics/bts475]. Cheng group combined deep belief networks and boosting techniques to predict protein contacts [@10.1093/bioinformatics/bts598] and trained deep networks by layer-wise unsupervised learning followed by fine-tuning of the entire network. Elofsson group developed an iterative deep learning technique for contact prediction, in which Random Forests are applied to predict contacts at each iteration [@doi:10.1371/journal.pcbi.1003889]. However, blindly tested in the well-known CASP competitions, these methods did not show any advantage over MetaPSICOV[@doi: 10.1093/bioinformatics/btu791], a 2-layer neural network method. Only until 2016, Xu group proposed a novel deep learning method [@doi:10.1371/journal.pcbi.1005324] that can significantly improve contact prediction accuracy over MetaPSICOV especially for proteins without many sequence homologs. Xu’s deep model is formed by one 1D residual neural network and one 2D residual neural network. Blindly tested in the latest CASP competition (i.e., CASP12), Xu’s deep learning method is ranked first in terms of the total F1 score (a widely-used performance metric) on free-modeling targets as well as the whole set of targets. In this test, the group ranked second also employed a deep learning method. Even MetaPSICOV, which ranked third, employed deeper and wider layers than its old version. Xu group has also demonstrated in another blind test CAMEO (which can be interpreted as a fully-automated CASP) [@url:http://www.cameo3d.org/] that the predicted contacts can fold quite a few proteins with a novel fold and only 65-330 sequence homologs. Xu’s method also works well on membrane protein contact prediction even if trained mostly by non-membrane proteins. 
- - +Proteins play fundamental roles in all biological processes including the +maintenance of cellular integrity, metabolism, transcription/translation, and +cell-cell communication. Complete description of protein structures and +functions is a fundamental step towards understanding biological life and +also highly relevant in the development of therapeutics and drugs. Tons of +protein sequences have been generated, but fewer than 100,000 of them +have experimentally-solved structures. As a result, computational structure +prediction is essential for a majority number of protein sequences. However, +predicting protein 3D structures from sequence alone is very challenging, +especially when similar templates are not available. In the past decades, +various computational methods have been developed to predict protein +structure from different aspects, including prediction of secondary structure, +torsion angles, solvent accessibility, inter-residue contact map, disorder +regions and side-chain packing. + +Machine learning is extensively applied to predict protein structures and +some success has been achieved. For example, secondary structure can be +predicted with about 80% of Q3 accuracy by a 2-layer neural network +method PSIPRED [@doi:10.1093/bioinformatics/16.4.404]. Starting from +2012, deep learning has been gradually introduced to protein structure +prediction. The adopted deep learning models include deep belief network, +LSTM(long short-term memory), deep convolutional neural networks (DCNN) +and deep convolutional neural fields[@doi:10.1007/978-3-319-46227-1_1 +@doi:10.1038/srep18962]. Here we focus on deep learning methods for +two representative subproblems: secondary structure prediction and +contact map prediction. Secondary structure refers to local conformation of +a sequence segment while a contact map contains information of global +conformation. 
+ +Protein secondary structure can be described by 3 states or 8 states with +the latter providing finer-grained information than the former. However, +many more methods are developed to predict 3-state secondary structure +than 8-state secondary structure. A predictor is typically evaluated by Q3 +and Q8 accuracy, respectively. Qi et al. developed a multi-task deep learning +method to simultaneously predict several local structure properties +including secondary structures [@doi:10.1371/journal.pone.0032235]. +Cheng group predicted secondary structure using deep belief networks +[@doi: 10.1109/TCBB.2014.2343960]. Zhou developed an iterative deep +learning framework to simultaneously predict secondary structure, +backbone torsion angles and solvent accessibility +[@doi:10.1038/srep11476]. However, none of these deep learning methods +achieved significant improvement over PSIPRED in terms of Q3 accuracy. In +2014, Zhou and Troyanskaya demonstrated that they could improve Q8 +accuracy over a shallow learning architecture conditional neural fields [@doi: +10.1002/pmic.201100196] by using a deep supervised and convolutional +generative stochastic network[@arXiv:1403.1347], but did not report any +results in terms of Q3 accuracy. In 2016 Wang and Xu developed a deep +convolutional neural fields (DeepCNF) model and showed that this model +can significantly improve secondary structure prediction in terms of both Q3 +and Q8 accuracy[@doi:10.1038/srep18962]. DeepCNF is the first that +reports Q3 accuracy of 84-85%, much higher than the 80% accuracy +maintained by PSIPRED for more than 10 years. It is also reported that +DeepCNF can improve prediction of solvent accessibility and disorder +regions [@doi:10.1007/978-3-319-46227-1_1]. + +Contact-assisted protein folding represents a promising new direction for ab +initio folding of proteins without good templates in PDB, but it requires +accurate contact prediction. 
Evolutionary coupling analysis (ECA) is an +effective contact prediction method for some proteins with a very large +number (>1000) of sequence homologs, but ECA fares poorly for proteins +without many sequence homologs. Since (soluble) proteins with many +sequence homologs are very likely to have a good template in PDB, to make +contact-assisted folding practically useful for ab initio folding, it is essential +to predict accurate contacts for proteins without many sequence homologs. +By combining ECA with a few other protein features, shallow neural +network-based methods such as MetaPSICOV [@doi: +10.1093/bioinformatics/btu791] and CoinDCA- +NN[@doi:10.1093/bioinformatics/btv472] have shown some advantage +over ECA for proteins with a small number of sequence homologs, but their +accuracy is still not very good. In recent years, deep learning methods have +been explored for contact prediction. For example, Di Lena et al. introduced +a deep spatio-temporal neural network (up to 100 layers) that utilizes both +spatial and temporal features to predict protein +contacts[@doi:10.1093/bioinformatics/bts475]. Cheng group combined +deep belief networks and boosting techniques to predict protein contacts +[@10.1093/bioinformatics/bts598] and trained deep networks by layer-wise +unsupervised learning followed by fine-tuning of the entire network. +Elofsson group developed an iterative deep learning technique for contact +prediction, in which Random Forests are applied to predict contacts at each +iteration [@doi:10.1371/journal.pcbi.1003889]. However, blindly tested in +the well-known CASP competitions, these methods did not show any +advantage over MetaPSICOV[@doi: 10.1093/bioinformatics/btu791], a 2- +layer neural network method. Only until 2016, Xu group proposed a novel +deep learning method [@doi:10.1371/journal.pcbi.1005324] that can +significantly improve contact prediction accuracy over MetaPSICOV +especially for proteins without many sequence homologs. 
Xu’s deep model +is formed by one 1D residual neural network and one 2D residual neural +network. Blindly tested in the latest CASP competition (i.e., CASP12), Xu’s +deep learning method is ranked first in terms of the total F1 score (a widely- +used performance metric) on free-modeling targets as well as the whole set +of targets. In this test, the group ranked second also employed a deep +learning method. Even MetaPSICOV, which ranked third, employed deeper +and wider layers than its old version. Xu group has also demonstrated in +another blind test CAMEO (which can be interpreted as a fully-automated +CASP) [@url:http://www.cameo3d.org/] that the predicted contacts can +fold quite a few proteins with a novel fold and only 65-330 sequence +homologs. Xu’s method also works well on membrane protein contact +prediction even if trained mostly by non-membrane proteins. ### Signaling From 304195df59277c29cbe7d99f142286c7cb33bf35 Mon Sep 17 00:00:00 2001 From: j3xugit Date: Sat, 28 Jan 2017 10:18:01 -0600 Subject: [PATCH 06/14] Update 04_study.md --- sections/04_study.md | 148 +++++++++++++++++++++++-------------------- 1 file changed, 78 insertions(+), 70 deletions(-) diff --git a/sections/04_study.md b/sections/04_study.md index 98e485fc..5c8f1a85 100644 --- a/sections/04_study.md +++ b/sections/04_study.md @@ -56,20 +56,22 @@ Proteins play fundamental roles in all biological processes including the maintenance of cellular integrity, metabolism, transcription/translation, and cell-cell communication. Complete description of protein structures and functions is a fundamental step towards understanding biological life and -also highly relevant in the development of therapeutics and drugs. Tons of -protein sequences have been generated, but fewer than 100,000 of them +also highly relevant in the development of therapeutics and drugs. UniProt +currently has about 94 million protein sequences.
Even if we remove +redundancy at the 50% sequence identity level, UniProt still has about +20 million protein sequences. However, fewer than 100,000 proteins have experimentally-solved structures. As a result, computational structure prediction is essential for a majority number of protein sequences. However, predicting protein 3D structures from sequence alone is very challenging, -especially when similar templates are not available. In the past decades, -various computational methods have been developed to predict protein -structure from different aspects, including prediction of secondary structure, -torsion angles, solvent accessibility, inter-residue contact map, disorder -regions and side-chain packing. +especially when similar solved structures (called templates) are not available +in the Protein Data Bank (PDB). In the past decades, various computational +methods have been developed to predict protein structure from different aspects, +including prediction of secondary structure, torsion angles, solvent accessibility, +inter-residue contact map, disorder regions and side-chain packing. Machine learning is extensively applied to predict protein structures and some success has been achieved. For example, secondary structure can be -predicted with about 80% of Q3 accuracy by a 2-layer neural network +predicted with about 80% 3-state (i.e., Q3) accuracy by the neural network method PSIPRED [@doi:10.1093/bioinformatics/16.4.404]. Starting from 2012, deep learning has been gradually introduced to protein structure prediction. The adopted deep learning models include deep belief network, @@ -79,77 +81,83 @@ and deep convolutional neural fields[@doi:10.1007/978-3-319-46227-1_1 two representative subproblems: secondary structure prediction and contact map prediction. Secondary structure refers to local conformation of a sequence segment while a contact map contains information of global -conformation.
- -Protein secondary structure can be described by 3 states or 8 states with -the latter providing finer-grained information than the former. However, -many more methods are developed to predict 3-state secondary structure -than 8-state secondary structure. A predictor is typically evaluated by Q3 -and Q8 accuracy, respectively. Qi et al. developed a multi-task deep learning -method to simultaneously predict several local structure properties -including secondary structures [@doi:10.1371/journal.pone.0032235]. +conformation. Secondary structure prediction is a basic problem and almost +an essential module of any protein structure prediction package. It has also +been frequently used to benchmark new machine learning methods. +Contact prediction is much more challenging than secondary structure prediction, +but it has a much larger impact on tertiary structure prediction. +In recent years, contact prediction has made a good progress and +its accuracy has been significantly improved [@doi:10.1371/journal.pcbi.1005324 +@doi:10.1093/bioinformatics/btu791 @doi:10.1073/pnas.0805923106 @doi:10.1371/journal.pone.0028766]. + +Protein secondary structure can exhibit three different states (alpha helix, +beta strand and loop regions) or eight finer-grained states. More methods are +developed to predict 3-state secondary structure than 8-state. A predictor is +typically evaluated by Q3 and 8-state (i.e., Q8) accuracy, respectively. +Qi et al. developed a multi-task deep learning method to simultaneously predict several +local structure properties including secondary structures [@doi:10.1371/journal.pone.0032235]. Cheng group predicted secondary structure using deep belief networks -[@doi: 10.1109/TCBB.2014.2343960]. Zhou developed an iterative deep -learning framework to simultaneously predict secondary structure, -backbone torsion angles and solvent accessibility +[@doi:10.1109/TCBB.2014.2343960]. 
Zhou developed an iterative deep learning framework +to simultaneously predict secondary structure, backbone torsion angles and solvent accessibility [@doi:10.1038/srep11476]. However, none of these deep learning methods achieved significant improvement over PSIPRED in terms of Q3 accuracy. In 2014, Zhou and Troyanskaya demonstrated that they could improve Q8 -accuracy over a shallow learning architecture conditional neural fields [@doi: +accuracy over a shallow learning architecture conditional neural fields [@doi: 10.1002/pmic.201100196] by using a deep supervised and convolutional generative stochastic network[@arXiv:1403.1347], but did not report any results in terms of Q3 accuracy. In 2016 Wang and Xu developed a deep convolutional neural fields (DeepCNF) model and showed that this model can significantly improve secondary structure prediction in terms of both Q3 -and Q8 accuracy[@doi:10.1038/srep18962]. DeepCNF is the first that -reports Q3 accuracy of 84-85%, much higher than the 80% accuracy -maintained by PSIPRED for more than 10 years. It is also reported that -DeepCNF can improve prediction of solvent accessibility and disorder -regions [@doi:10.1007/978-3-319-46227-1_1]. - -Contact-assisted protein folding represents a promising new direction for ab -initio folding of proteins without good templates in PDB, but it requires -accurate contact prediction. Evolutionary coupling analysis (ECA) is an -effective contact prediction method for some proteins with a very large -number (>1000) of sequence homologs, but ECA fares poorly for proteins -without many sequence homologs. Since (soluble) proteins with many -sequence homologs are very likely to have a good template in PDB, to make -contact-assisted folding practically useful for ab initio folding, it is essential -to predict accurate contacts for proteins without many sequence homologs. 
-By combining ECA with a few other protein features, shallow neural -network-based methods such as MetaPSICOV [@doi: -10.1093/bioinformatics/btu791] and CoinDCA- -NN[@doi:10.1093/bioinformatics/btv472] have shown some advantage -over ECA for proteins with a small number of sequence homologs, but their -accuracy is still not very good. In recent years, deep learning methods have -been explored for contact prediction. For example, Di Lena et al. introduced -a deep spatio-temporal neural network (up to 100 layers) that utilizes both -spatial and temporal features to predict protein -contacts[@doi:10.1093/bioinformatics/bts475]. Cheng group combined -deep belief networks and boosting techniques to predict protein contacts -[@10.1093/bioinformatics/bts598] and trained deep networks by layer-wise -unsupervised learning followed by fine-tuning of the entire network. -Elofsson group developed an iterative deep learning technique for contact -prediction, in which Random Forests are applied to predict contacts at each -iteration [@doi:10.1371/journal.pcbi.1003889]. However, blindly tested in -the well-known CASP competitions, these methods did not show any -advantage over MetaPSICOV[@doi: 10.1093/bioinformatics/btu791], a 2- -layer neural network method. Only until 2016, Xu group proposed a novel -deep learning method [@doi:10.1371/journal.pcbi.1005324] that can -significantly improve contact prediction accuracy over MetaPSICOV -especially for proteins without many sequence homologs. Xu’s deep model -is formed by one 1D residual neural network and one 2D residual neural -network. Blindly tested in the latest CASP competition (i.e., CASP12), Xu’s -deep learning method is ranked first in terms of the total F1 score (a widely- -used performance metric) on free-modeling targets as well as the whole set -of targets. In this test, the group ranked second also employed a deep -learning method. 
Even MetaPSICOV, which ranked third, employed deeper -and wider layers than its old version. Xu group has also demonstrated in -another blind test CAMEO (which can be interpreted as a fully-automated -CASP) [@url:http://www.cameo3d.org/] that the predicted contacts can -fold quite a few proteins with a novel fold and only 65-330 sequence -homologs. Xu’s method also works well on membrane protein contact -prediction even if trained mostly by non-membrane proteins. +and Q8 accuracy [@doi:10.1038/srep18962]. DeepCNF is possibly the first method to report +Q3 accuracy of 84-85%, much higher than the 80% accuracy maintained by PSIPRED +for more than 10 years. It is also reported that DeepCNF can improve prediction of +solvent accessibility and disorder regions [@doi:10.1007/978-3-319-46227-1_1]. This +improvement may be mainly due to the introduction of convolutional neural fields to +capture long-range sequential information, which is important for beta strand prediction. +Nevertheless, improving secondary structure prediction from 80% to 84-85% is unlikely to +result in a similar amount of improvement in tertiary structure prediction since secondary +structure mainly reflects coarse-grained local conformation of a protein structure. + +Protein folding restrained by predicted contacts, also called contact-assisted protein +folding, represents a promising new direction for ab initio folding of proteins without +good templates in PDB, but it requires accurate contact prediction. Evolutionary coupling +analysis (ECA) is an effective contact prediction method for some proteins with a very large +number (>1000) of sequence homologs [@doi:10.1371/journal.pone.0028766], but ECA fares poorly +for proteins without many sequence homologs.
Since (soluble) proteins with many sequence +homologs are likely to have a good template in PDB, to make contact-assisted folding practically +useful for ab initio folding, it is essential to predict accurate contacts for proteins +without many sequence homologs. By combining ECA with a few other protein features, shallow neural +network-based methods such as MetaPSICOV [@doi:10.1093/bioinformatics/btu791] and +CoinDCA-NN [@doi:10.1093/bioinformatics/btv472] have shown some advantage over ECA +for proteins with a small number of sequence homologs, but their accuracy remains limited. +In recent years, deep learning methods have been explored for contact prediction. For example, +Di Lena et al. introduced a deep spatio-temporal neural network (up to 100 layers) that utilizes both +spatial and temporal features to predict protein contacts [@doi:10.1093/bioinformatics/bts475]. +Eickholt and Cheng combined deep belief networks and boosting techniques to predict protein contacts +[@doi:10.1093/bioinformatics/bts598] and trained deep networks by layer-wise unsupervised +learning followed by fine-tuning of the entire network. Skwark and Elofsson et al. developed +an iterative deep learning technique for contact prediction by stacking a series of Random Forests +[@doi:10.1371/journal.pcbi.1003889]. However, blindly tested in the well-known CASP competitions, +these methods did not show any advantage over MetaPSICOV [@doi:10.1093/bioinformatics/btu791], a method +using two cascaded neural networks. Very recently, Wang and Xu et al. proposed a novel deep learning method, +RaptorX-Contact [@doi:10.1371/journal.pcbi.1005324], that can significantly improve contact prediction +accuracy over MetaPSICOV, especially for proteins without many sequence homologs. This deep model is +formed by one 1D residual neural network and one 2D residual neural network.
Blindly tested in the
+latest CASP competition (i.e., CASP12 [@url:http://www.predictioncenter.org/casp12/rrc_avrg_results.cgi]),
+RaptorX-Contact is ranked first in terms of the total F1 score (a widely-used performance metric) on
+free-modeling targets as well as the whole set of targets. In the CASP12 test, the group ranked second
+also employed a deep learning method. Even MetaPSICOV, which ranked third in CASP12, employed more
+and wider hidden layers in its neural networks than its old version. Wang and Xu et al. have also
+demonstrated in another blind test CAMEO (which can be interpreted as a fully-automated
+CASP) [@url:http://www.cameo3d.org/] that the predicted contacts can fold quite a few proteins
+with a novel fold and only 65-330 sequence homologs and that their method also works well on membrane
+protein contact prediction even if trained mostly by non-membrane proteins. In fact, most of the top 10
+contact prediction groups in CASP12 employed some kind of deep learning techniques. The RaptorX-Contact
+method performed better mainly due to introduction of residual neural networks and exploiting contact
+occurrence patterns by simultaneous prediction of all the contacts in a single protein.
+It is still possible to further improve contact prediction by studying some new network architectures.
+In addition, the deep learning methods summarized above also apply to the prediction of interfacial contacts
+of a protein complex. However, current methods fail when proteins in question have almost no sequence homologs.
### Signaling

From 3d6a849e1ba75f57fba1ff1cd960d0e2f9da6432 Mon Sep 17 00:00:00 2001
From: j3xugit
Date: Sat, 28 Jan 2017 10:20:44 -0600
Subject: [PATCH 07/14] Update 04_study.md

---
 sections/04_study.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/sections/04_study.md b/sections/04_study.md
index 5c8f1a85..b4e11f1a 100644
--- a/sections/04_study.md
+++ b/sections/04_study.md
@@ -86,8 +86,8 @@ an essential module of any protein structure prediction package. It has also
 been frequently used to benchmark new machine learning methods.
 Contact prediction is much more challenging than secondary structure
 prediction, but it has a much larger impact on tertiary structure prediction.
-In recent years, contact prediction has made a good progress and
-its accuracy has been significantly improved [@doi:10.1371/journal.pcbi.1005324
+In recent years, contact prediction has made good progress and its accuracy
+has been significantly improved [@doi:10.1371/journal.pcbi.1005324
 @doi:10.1093/bioinformatics/btu791 @doi:10.1073/pnas.0805923106 @doi:10.1371/journal.pone.0028766].

 Protein secondary structure can exhibit three different states (alpha helix,

From f2d469ceab074521da0632d3297528d48e71df04 Mon Sep 17 00:00:00 2001
From: j3xugit
Date: Sat, 28 Jan 2017 10:21:50 -0600
Subject: [PATCH 08/14] Update 04_study.md

---
 sections/04_study.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sections/04_study.md b/sections/04_study.md
index b4e11f1a..f93e2660 100644
--- a/sections/04_study.md
+++ b/sections/04_study.md
@@ -93,7 +93,7 @@ has been significantly improved [@doi:10.1371/journal.pcbi.1005324
 Protein secondary structure can exhibit three different states (alpha helix,
 beta strand and loop regions) or eight finer-grained states. More methods are
 developed to predict 3-state secondary structure than 8-state. A predictor is
-typically evaluated by Q3 and 8-state (i.e., Q8) accuracy, respectively.
+typically evaluated by 3-state (i.e., Q3) and 8-state (i.e., Q8) accuracy, respectively.
 Qi et al. developed a multi-task deep learning method to simultaneously predict several
 local structure properties including secondary structures [@doi:10.1371/journal.pone.0032235].
 Cheng group predicted secondary structure using deep belief networks

From 7f400b5db26ddf6e33f9897f966542238454008d Mon Sep 17 00:00:00 2001
From: j3xugit
Date: Sat, 28 Jan 2017 10:30:28 -0600
Subject: [PATCH 09/14] Update 04_study.md

---
 sections/04_study.md | 37 ++++++++++++++++++-------------------
 1 file changed, 18 insertions(+), 19 deletions(-)

diff --git a/sections/04_study.md b/sections/04_study.md
index f93e2660..b5aa5428 100644
--- a/sections/04_study.md
+++ b/sections/04_study.md
@@ -96,25 +96,24 @@ developed to predict 3-state secondary structure than 8-state. A predictor is
 typically evaluated by 3-state (i.e., Q3) and 8-state (i.e., Q8) accuracy, respectively.
 Qi et al. developed a multi-task deep learning method to simultaneously predict several
 local structure properties including secondary structures [@doi:10.1371/journal.pone.0032235].
-Cheng group predicted secondary structure using deep belief networks
-[@doi:10.1109/TCBB.2014.2343960]. Zhou developed an iterative deep learning framework
-to simultaneously predict secondary structure, backbone torsion angles and solvent accessibility
-[@doi:10.1038/srep11476]. However, none of these deep learning methods
-achieved significant improvement over PSIPRED in terms of Q3 accuracy. In
-2014, Zhou and Troyanskaya demonstrated that they could improve Q8
-accuracy over a shallow learning architecture conditional neural fields [@doi:
-10.1002/pmic.201100196] by using a deep supervised and convolutional
-generative stochastic network[@arXiv:1403.1347], but did not report any
-results in terms of Q3 accuracy.
In 2016 Wang and Xu developed a deep
-convolutional neural fields (DeepCNF) model and showed that this model
-can significantly improve secondary structure prediction in terms of both Q3
-and Q8 accuracy[@doi:10.1038/srep18962]. DeepCNF possibly is the first that reports
-Q3 accuracy of 84-85%, much higher than the 80% accuracy maintained by PSIPRED
-for more than 10 years. It is also reported that DeepCNF can improve prediction of
-solvent accessibility and disorder regions [@doi:10.1007/978-3-319-46227-1_1]. This
-improvement may be mainly due to the introduction of convolutional neural fields to
-capture long-range sequential information, which is important for beta strand prediction.
-Nevertheless, improving secondary structure prediction from 80% to 84-85% is unlikely to
+Spencer, Eickholt and Cheng predicted secondary structure using deep belief networks
+[@doi:10.1109/TCBB.2014.2343960]. Heffernan and Zhou et al. developed an iterative
+deep learning framework to simultaneously predict secondary structure, backbone torsion
+angles and solvent accessibility [@doi:10.1038/srep11476]. However, none of these deep
+learning methods achieved significant improvement over PSIPRED [@doi:10.1006/jmbi.1999.3091]
+in terms of Q3 accuracy. In 2014, Zhou and Troyanskaya demonstrated that they could
+improve Q8 accuracy over a shallow learning architecture conditional neural fields [@doi:10.1002/pmic.201100196]
+by using a deep supervised and convolutional generative stochastic network[@arXiv:1403.1347],
+but did not report any results in terms of Q3 accuracy. In 2016 Wang and Xu et al. developed a deep
+convolutional neural fields (DeepCNF) model that can significantly improve secondary
+structure prediction in terms of both Q3 and Q8 accuracy[@doi:10.1038/srep18962].
+DeepCNF possibly is the first that reports Q3 accuracy of 84-85%, much higher than
+the 80% accuracy maintained by PSIPRED for more than 10 years.
+It is also reported that DeepCNF can improve prediction of solvent accessibility
+and disorder regions [@doi:10.1007/978-3-319-46227-1_1]. This improvement may be mainly
+due to the introduction of convolutional neural fields to capture long-range
+sequential information, which is important for beta strand prediction. Nevertheless,
+improving secondary structure prediction from 80% to 84-85% is unlikely to
 result in a similar amount of improvement in tertiary structure prediction since secondary
 structure mainly reflects coarse-grained local conformation of a protein structure.

From 1d583f26b355d7384135246eaeacf6b153d1db78 Mon Sep 17 00:00:00 2001
From: j3xugit
Date: Sat, 28 Jan 2017 12:35:20 -0600
Subject: [PATCH 10/14] Update 04_study.md

---
 sections/04_study.md | 52 +++++++++++++++++++++++---------------------
 1 file changed, 27 insertions(+), 25 deletions(-)

diff --git a/sections/04_study.md b/sections/04_study.md
index b5aa5428..ed4e0cce 100644
--- a/sections/04_study.md
+++ b/sections/04_study.md
@@ -60,12 +60,12 @@ also highly relevant in the development of therapeutics and drugs.
 UnitProt currently has about 94 millions of protein sequences. Even if we
 remove redundancy at 50% sequence identity level, UnitProt still has about
 20 millions of protein sequences. However, fewer than 100,000 proteins
-have experimentally-solved structures. As a result, computational structure
-prediction is essential for a majority number of protein sequences. However,
-predicting protein 3D structures from sequence alone is very challenging,
-especially when similar solved structures (called templates) are not available
-in the Protein Data Bank (PDB). In the past decades, various computational
-methods have been developed to predict protein structure from different aspects,
+have experimentally-solved structures in Protein Data Bank (PDB). As a result,
+computational structure prediction is essential for a majority number of
+protein sequences.
However, predicting protein 3D structures from sequence alone
+is very challenging, especially when similar solved structures (called templates)
+are not available in PDB. In the past decades, various computational methods have
+been developed to predict protein structure from different aspects,
 including prediction of secondary structure, torsion angles, solvent
 accessibility, inter-residue contact map, disorder regions and side-chain packing.

@@ -83,7 +83,7 @@ contact map prediction. Secondary structure refers to local conformation of a
 sequence segment while a contact map contains information of global
 conformation. Secondary structure prediction is a basic problem and almost
 an essential module of any protein structure prediction package. It has also
-been frequently used to benchmark new machine learning methods.
+been used as sequence labeling benchmark in the machine learning community.
 Contact prediction is much more challenging than secondary structure
 prediction, but it has a much larger impact on tertiary structure prediction.
 In recent years, contact prediction has made good progress and its accuracy
@@ -117,16 +117,16 @@ improving secondary structure prediction from 80% to 84-85% is unlikely to
 result in a similar amount of improvement in tertiary structure prediction since secondary
 structure mainly reflects coarse-grained local conformation of a protein structure.

-Protein folding restrained by predicted contacts, also called contact-assisted protein
-folding, represents a promising new direction for ab initio folding of proteins without
-good templates in PDB, but it requires accurate contact prediction. Evolutionary coupling
-analysis (ECA) is an effective contact prediction method for some proteins with a very large
-number (>1000) of sequence homologs [doi:10.1371/journal.pone.0028766], but ECA fares poorly
-for proteins without many sequence homologs.
Since (soluble) proteins with many sequence
-homologs are likely to have a good template in PDB, to make contact-assisted folding practically
-useful for ab initio folding, it is essential to predict accurate contacts for proteins
-without many sequence homologs. By combining ECA with a few other protein features, shallow neural
-network-based methods such as MetaPSICOV [@doi:10.1093/bioinformatics/btu791] and
+Protein contact prediction and contact-assisted folding (i.e., folding proteins using
+predicted contacts as restraints) represents a promising new direction for ab initio folding
+of proteins without good templates in PDB.
+Evolutionary coupling analysis (ECA) is an effective contact prediction method for some
+proteins with a very large number (>1000) of sequence homologs [doi:10.1371/journal.pone.0028766],
+but ECA fares poorly for proteins without many sequence homologs. Since (soluble) proteins with
+many sequence homologs are likely to have a good template in PDB, to make contact-assisted
+folding practically useful for ab initio folding, it is essential to predict accurate contacts
+for proteins without many sequence homologs. By combining ECA with a few other protein features,
+shallow neural network-based methods such as MetaPSICOV [@doi:10.1093/bioinformatics/btu791] and
 CoinDCA-NN [@doi:10.1093/bioinformatics/btv472] have shown some advantage over ECA
 for proteins with a small number of sequence homologs, but their accuracy is still not very good.
 In recent years, deep learning methods have been explored for contact prediction. For example,
@@ -140,23 +140,25 @@ an iterative deep learning technique for contact prediction by stacking a series
 these methods did not show any advantage over MetaPSICOV [@doi:10.1093/bioinformatics/btu791], a method
 using two cascaded neural networks. Very recently, Wang and Xu et al.
proposed a novel deep learning method
 RaptorX-Contact [@doi:10.1371/journal.pcbi.1005324] that can significantly improve contact prediction
-accuracy over MetaPSICOV especially for proteins without many sequence homologs. This deep model is
-formed by one 1D residual neural network and one 2D residual neural network. Blindly tested in the
-latest CASP competition (i.e., CASP12 [@url:http://www.predictioncenter.org/casp12/rrc_avrg_results.cgi]),
+over MetaPSICOV especially for proteins without many sequence homologs. RaptorX-Contact employs a network
+architecture formed by one 1D residual neural network and one 2D residual neural network.
+Blindly tested in the latest CASP competition (i.e., CASP12 [@url:http://www.predictioncenter.org/casp12/rrc_avrg_results.cgi]),
 RaptorX-Contact is ranked first in terms of the total F1 score (a widely-used performance metric) on
 free-modeling targets as well as the whole set of targets. In the CASP12 test, the group ranked second
 also employed a deep learning method. Even MetaPSICOV, which ranked third in CASP12, employed more
-and wider hidden layers in its neural networks than its old version. Wang and Xu et al. have also
+and wider hidden layers than its old version. Wang and Xu et al. have also
 demonstrated in another blind test CAMEO (which can be interpreted as a fully-automated
-CASP) [@url:http://www.cameo3d.org/] that the predicted contacts can fold quite a few proteins
+CASP) [@url:http://www.cameo3d.org/] that their predicted contacts can help fold quite a few proteins
 with a novel fold and only 65-330 sequence homologs and that their method also works well on membrane
 protein contact prediction even if trained mostly by non-membrane proteins. In fact, most of the top 10
 contact prediction groups in CASP12 employed some kind of deep learning techniques.
The RaptorX-Contact
 method performed better mainly due to introduction of residual neural networks and exploiting contact
 occurrence patterns by simultaneous prediction of all the contacts in a single protein.
-It is still possible to further improve contact prediction by studying some new network architectures.
-In addition, the deep learning methods summarized above also apply to the prediction of interfacial contacts
-of a protein complex. However, current methods fail when proteins in question have almost no sequence homologs.
+It is still possible to further improve contact prediction by studying new deep network architectures.
+However, current methods fail when proteins in question have almost no sequence homologs. It is unclear if there
+is an effective way to deal with this type of proteins or not except waiting for more sequence homologs.
+Finally, the deep learning methods summarized above also apply to interfacial contact prediction
+of a protein complex, but may be less effective since on average protein complexes have fewer sequence homologs.

 ### Signaling

From 2d94fbb951bc5f6d5e5c39eaf7630457e8f0c18f Mon Sep 17 00:00:00 2001
From: j3xugit
Date: Sat, 28 Jan 2017 12:43:17 -0600
Subject: [PATCH 11/14] Update 04_study.md

---
 sections/04_study.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sections/04_study.md b/sections/04_study.md
index ed4e0cce..e1a57908 100644
--- a/sections/04_study.md
+++ b/sections/04_study.md
@@ -121,7 +121,7 @@ Protein contact prediction and contact-assisted folding (i.e., folding proteins
 predicted contacts as restraints) represents a promising new direction for ab initio folding
 of proteins without good templates in PDB.
Evolutionary coupling analysis (ECA) is an effective contact prediction method for some
-proteins with a very large number (>1000) of sequence homologs [doi:10.1371/journal.pone.0028766],
+proteins with a very large number (>1000) of sequence homologs [@doi:10.1371/journal.pone.0028766],
 but ECA fares poorly for proteins without many sequence homologs. Since (soluble) proteins with
 many sequence homologs are likely to have a good template in PDB, to make contact-assisted
 folding practically useful for ab initio folding, it is essential to predict accurate contacts

From a2417ad12740d49d26a7e8ea484e421ecbe940a4 Mon Sep 17 00:00:00 2001
From: Anthony Gitter
Date: Fri, 7 Apr 2017 07:00:03 -0500
Subject: [PATCH 12/14] Line wrap to trigger CI build

---
 sections/04_study.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/sections/04_study.md b/sections/04_study.md
index e1a57908..a361c9ee 100644
--- a/sections/04_study.md
+++ b/sections/04_study.md
@@ -88,7 +88,8 @@ Contact prediction is much more challenging than secondary structure
 prediction, but it has a much larger impact on tertiary structure prediction.
 In recent years, contact prediction has made good progress and its accuracy
 has been significantly improved [@doi:10.1371/journal.pcbi.1005324
-@doi:10.1093/bioinformatics/btu791 @doi:10.1073/pnas.0805923106 @doi:10.1371/journal.pone.0028766].
+@doi:10.1093/bioinformatics/btu791 @doi:10.1073/pnas.0805923106
+@doi:10.1371/journal.pone.0028766].

 Protein secondary structure can exhibit three different states (alpha helix,
 beta strand and loop regions) or eight finer-grained states.
More methods are

From 3b344e3e82f2c4fa0661917b53a1f92c352a2c7f Mon Sep 17 00:00:00 2001
From: Anthony Gitter
Date: Fri, 7 Apr 2017 09:08:48 -0500
Subject: [PATCH 13/14] Fix doi tag

---
 sections/04_study.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sections/04_study.md b/sections/04_study.md
index a361c9ee..15ff7c6f 100644
--- a/sections/04_study.md
+++ b/sections/04_study.md
@@ -134,7 +134,7 @@ In recent years, deep learning methods have been explored for contact prediction
 Di Lena et al. introduced a deep spatio-temporal neural network (up to 100 layers) that utilizes both
 spatial and temporal features to predict protein contacts[@doi:10.1093/bioinformatics/bts475].
 Eickholt and Cheng combined deep belief networks and boosting techniques to predict protein contacts
-[@10.1093/bioinformatics/bts598] and trained deep networks by layer-wise unsupervised
+[@doi:10.1093/bioinformatics/bts598] and trained deep networks by layer-wise unsupervised
 learning followed by fine-tuning of the entire network. Skwark and Elofsson et al. developed
 an iterative deep learning technique for contact prediction by stacking a series of Random Forests
 [@doi:10.1371/journal.pcbi.1003889]. However, blindly tested in the well-known CASP competitions,

From 95b9be2f121e2a30c6b27928b2569c26c9ec6e07 Mon Sep 17 00:00:00 2001
From: Anthony Gitter
Date: Fri, 7 Apr 2017 09:14:12 -0500
Subject: [PATCH 14/14] Fix arxiv reference

---
 sections/04_study.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sections/04_study.md b/sections/04_study.md
index 15ff7c6f..d5eed680 100644
--- a/sections/04_study.md
+++ b/sections/04_study.md
@@ -104,7 +104,7 @@ angles and solvent accessibility [@doi:10.1038/srep11476]. However, none of thes
 learning methods achieved significant improvement over PSIPRED [@doi:10.1006/jmbi.1999.3091]
 in terms of Q3 accuracy.
In 2014, Zhou and Troyanskaya demonstrated that they could
 improve Q8 accuracy over a shallow learning architecture conditional neural fields [@doi:10.1002/pmic.201100196]
-by using a deep supervised and convolutional generative stochastic network[@arXiv:1403.1347],
+by using a deep supervised and convolutional generative stochastic network[@arxiv:1403.1347],
 but did not report any results in terms of Q3 accuracy. In 2016 Wang and Xu et al. developed a deep
 convolutional neural fields (DeepCNF) model that can significantly improve secondary
 structure prediction in terms of both Q3 and Q8 accuracy[@doi:10.1038/srep18962].