AI4Bio-Reading-List

This is an AI for Biology reading list maintained by the MBZUAI AI4Bio Group.

Note: For applications of diffusion methods in protein science, see the Diffusion reading list.

1. Protein Level

1.1 Protein Structure Prediction

  • [2022.11.17 Pre] Highly accurate protein structure prediction with AlphaFold. Nature. 2021. Jumper, J., Evans, R., Pritzel, A., Green, T., Figurnov, M., Ronneberger, O., ... & Hassabis, D. [Paper] [Slides]

  • Accurate prediction of protein structures and interactions using a three-track neural network. Science. 2021. Baek, M., DiMaio, F., Anishchenko, I., Dauparas, J., Ovchinnikov, S., Lee, G. R., ... & Baker, D. [Paper]

  • ColabFold: making protein folding accessible to all. Nature Methods. 2022. Mirdita, M., Schütze, K., Moriwaki, Y., Heo, L., Ovchinnikov, S., & Steinegger, M. [Paper]

  • [2022.12.01 Pre] Language models of protein sequences at the scale of evolution enable accurate structure prediction. BioRxiv. 2022. Lin, Z., Akin, H., Rao, R., Hie, B., Zhu, Z., Lu, W., ... & Rives, A. [Paper] [Slides]

  • [2022.12.08 Pre] High-resolution de novo structure prediction from primary sequence. BioRxiv. 2022. Wu, R., Ding, F., Wang, R., Shen, R., Zhang, X., Luo, S., ... & Peng, J. [Paper] [Slides]

  • [2022.12.08 Pre] HelixFold-Single: MSA-free protein structure prediction by using protein language model as an alternative. ArXiv. 2022. Fang, X., Wang, F., Liu, L., He, J., Lin, D., Xiang, Y., ... & Song, L. [Paper] [Slides]

  • [2023.06.29 Pre] Protein structure prediction with in-cell photo-crosslinking mass spectrometry and deep learning. Nature Biotechnology. 2023. [Paper] [Slides]

1.2 Protein Function Prediction

  • Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. PNAS. 2021. Rives, A., Meier, J., Sercu, T., Goyal, S., Lin, Z., Liu, J., ... & Fergus, R. [Paper]

  • Leveraging implicit knowledge in neural networks for functional dissection and engineering of proteins. Nature Machine Intelligence. 2019. Upmeier zu Belzen, J., Bürgel, T., Holderbach, S., Bubeck, F., Adam, L., Gandor, C., ... & Eils, R. [Paper]

  • Protein function prediction is improved by creating synthetic feature samples with generative adversarial networks. Nature Machine Intelligence. 2020. Wan, C., & Jones, D. T. [Paper]

  • Protein function prediction for newly sequenced organisms. Nature Machine Intelligence. 2021. Torres, M., Yang, H., Romero, A. E., & Paccanaro, A. [Paper]

  • [2023.07.13 Pre] Enzyme function prediction using contrastive learning. Science. 2023. Yu, T., Cui, H., Li, J. C., Luo, Y., Jiang, G., & Zhao, H. [Paper] [Slides]

1.3 Protein Design

  • Expanding functional protein sequence spaces using generative adversarial networks. Nature Machine Intelligence. 2021. Repecka, D., Jauniskis, V., Karpus, L., Rembeza, E., Rokaitis, I., Zrimec, J., ... & Zelezniak, A. [Paper]

  • Transformer-based protein generation with regularized latent space optimization. Nature Machine Intelligence. 2022. Castro, E., Godavarthi, A., Rubinfien, J., Givechian, K., Bhaskar, D., & Krishnaswamy, S. [Paper]

  • [2023.01.12 Pre] A high-level programming language for generative protein design. bioRxiv. 2022-12. Hie, B., Candido, S., Lin, Z., Kabeli, O., Rao, R., Smetanin, N., ... & Rives, A. [Paper] [Slides]

  • [2023.03.16 Pre] A universal deep-learning model for zinc finger design enables transcription factor reprogramming. Nature Biotechnology. 2023. Ichikawa, D. M., Abdin, O., Alerasool, N., Kogenaru, M., Mueller, A. L., Wen, H., ... & Noyes, M. B. [Paper] [Slides]

  • [2023.07.20 Pre] Large language models generate functional protein sequences across diverse families. Nature Biotechnology. 2023. Madani, A., Krause, B., Greene, E. R., Subramanian, S., Mohr, B. P., Holton, J. M., ... & Naik, N. [Paper] [Slides]

  • [2023.08.03 Pre] Top-down design of protein architectures with reinforcement learning. Science. 2023. Lutz, I. D., Wang, S., Norn, C., Courbet, A., Borst, A. J., Zhao, Y. T., ... & Baker, D. [Paper] [Slides]

2. Protein Interaction Level

  • Predicting drug–protein interaction using quasi-visual question answering system. Nature Machine Intelligence. 2020. Zheng, S., Li, Y., Chen, S., Xu, J., & Yang, Y. [Paper]

  • A topology-based network tree for the prediction of protein–protein binding affinity changes following mutation. Nature Machine Intelligence. 2020. Wang, M., Cang, Z., & Wei, G. W. [Paper]

  • Computed structures of core eukaryotic protein complexes. Science. 2021. Humphreys, I. R., Pei, J., Baek, M., Krishnakumar, A., Anishchenko, I., Ovchinnikov, S., ... & Baker, D. [Paper]

  • Harnessing protein folding neural networks for peptide–protein docking. Nature Communications. 2022. Tsaban, T., Varga, J. K., Avraham, O., Ben-Aharon, Z., Khramushin, A., & Schueler-Furman, O. [Paper]

  • Protein complex prediction with AlphaFold-Multimer. BioRxiv. 2022. Evans, R., O’Neill, M., Pritzel, A., Antropova, N., Senior, A., Green, T., ... & Hassabis, D. [Paper]

  • Improved prediction of protein-protein interactions using AlphaFold2. Nature Communications. 2022. Bryant, P., Pozzati, G., & Elofsson, A. [Paper]

  • AF2Complex predicts direct physical interactions in multimeric proteins with deep learning. Nature Communications. 2022. Gao, M., Nakajima An, D., Parks, J. M., & Skolnick, J. [Paper]

  • Uni-Fold Symmetry: harnessing symmetry in folding large protein complexes. bioRxiv. 2022. Li, Z., Yang, S., Liu, X., Chen, W., Wen, H., Shen, F., ... & Zhang, L. [Paper]

  • Predicting the structure of large protein complexes using AlphaFold and Monte Carlo tree search. Nature Communications. 2022. Bryant, P., Pozzati, G., Zhu, W., Shenoy, A., Kundrotas, P., & Elofsson, A. [Paper]

  • Improve the Protein Complex Prediction with Protein Language Models. bioRxiv. 2022. Chen, B., Xie, Z., Xu, J., Qiu, J., Ye, Z., & Tang, J. [Paper]

3. Cell Level

3.1 Cell Type Annotation

  • Clustering single-cell RNA-seq data with a model-based deep learning approach. Nature Machine Intelligence. 2019. Tian, T., Wan, J., Song, Q., & Wei, Z. [Paper]

  • An interpretable deep-learning architecture of capsule networks for identifying cell-type gene expression programs from single-cell RNA-sequencing data. Nature Machine Intelligence. 2020. Wang, L., Nie, R., Yu, Z., Xin, R., Zheng, C., Zhang, Z., ... & Cai, J. [Paper]

  • Iterative transfer learning with neural network for clustering and cell type classification in single-cell RNA-seq analysis. Nature Machine Intelligence. 2020. Hu, J., Li, X., Hu, G., Lyu, Y., Susztak, K., & Li, M. [Paper]

  • Simultaneous deep generative modelling and clustering of single-cell genomic data. Nature Machine Intelligence. 2021. Liu, Q., Chen, S., Jiang, R., & Wong, W. H. [Paper]

  • scBERT as a large-scale pretrained deep language model for cell type annotation of single-cell RNA-seq data. Nature Machine Intelligence. 2022. Yang, F., Wang, W., Wang, F., Fang, Y., Tang, D., Huang, J., ... & Yao, J. [Paper] [Slides]

  • A multi-use deep learning method for CITE-seq and single-cell RNA-seq data integration with cell surface protein prediction and imputation. Nature Machine Intelligence. 2022. Lakkis, J., Schroeder, A., Su, K., Lee, M. Y., Bashore, A. C., Reilly, M. P., & Li, M. [Paper]

  • Contrastive learning enables rapid mapping to multimodal single-cell atlas of multimillion scale. Nature Machine Intelligence. 2022. Yang, M., Yang, Y., Xie, C., Ni, M., Liu, J., Yang, H., ... & Wang, J. [Paper]

  • Interpreting the B-cell receptor repertoire with single-cell gene expression using Benisse. Nature Machine Intelligence. 2022. Zhang, Z., Chang, W. Y., Wang, K., Yang, Y., Wang, X., Yao, C., ... & Wang, T. [Paper]

  • Simultaneous dimensionality reduction and integration for single-cell ATAC-seq data using deep learning. Nature Machine Intelligence. 2022. Kopp, W., Akalin, A., & Ohler, U. [Paper]

  • Cell type annotation of single-cell chromatin accessibility data via supervised Bayesian embedding. Nature Machine Intelligence. 2022. Chen, X., Chen, S., Song, S., Gao, Z., Hou, L., Zhang, X., ... & Jiang, R. [Paper]

  • Deep learning of cross-species single-cell landscapes identifies conserved regulatory programs underlying cell types. Nature Genetics. 2022. Li, J., Wang, J., Zhang, P., Wang, R., Mei, Y., Sun, Z., Fei, L., et al. [Paper]

3.2 Perturbation Modeling

3.2.1 Methods

  • [2022.12.15 Pre] GEARS: Predicting transcriptional outcomes of novel multi-gene perturbations. BioRxiv. 2022. Roohani, Y., Huang, K., & Leskovec, J. [Paper] [Slides]

  • [2023.01.26 Pre] Effective gene expression prediction from sequence by integrating long-range interactions. Nature Methods. 2021. Avsec, Ž., Agarwal, V., Visentin, D., Ledsam, J. R., Grabska-Barwinska, A., Taylor, K. R., ... & Kelley, D. R. [Paper] [Slides]

  • Compositional perturbation autoencoder for single-cell response modeling. BioRxiv. 2021. Lotfollahi, M., Susmelj, A. K., Donno, C. D., Ji, Y., Ibarra, I. L., Wolf, F. A., Yakubova, N., Theis, F. J., & Lopez-Paz, D. [Paper]

  • Predicting Cellular Responses to Novel Drug Perturbations at a Single-Cell Resolution. ArXiv. 2022. Hetzel, L., Böhm, S., Kilbertus, N., Günnemann, S., Lotfollahi, M., & Theis, F. [Paper]

  • MultiCPA: Multimodal Compositional Perturbation Autoencoder. BioRxiv. 2022. Inecik, K., Uhlmann, A., Lotfollahi, M., & Theis, F. [Paper]

  • Machine learning for perturbational single-cell omics. Cell Systems. 2021. Ji, Y., Lotfollahi, M., Wolf, F. A., & Theis, F. J. [Paper]

  • Learning Single-Cell Perturbation Responses using Neural Optimal Transport. BioRxiv. 2021. Bunne, C., Stark, S. G., Gut, G., Castillo, J. S. del, Lehmann, K.-V., Pelkmans, L., Krause, A., & Rätsch, G. [Paper]

  • Transfer learning enables predictions in network biology. Nature. 2023. Theodoris, C. V., Xiao, L., Chopra, A., Chaffin, M. D., Al Sayed, Z. R., Hill, M. C., ... & Ellinor, P. T. [Paper] [Slides]

3.2.2 Data

  • scPerturb: Harmonized Single-Cell Perturbation Data. bioRxiv. 2023. Peidli, S., Green, T. D., Shen, C., Gross, T., Min, J., Garda, S., Yuan, B., Schumacher, L. J., Taylor-King, J. P., Marks, D. S., Luna, A., Blüthgen, N., & Sander, C. [Paper]

  • SERGIO: A Single-Cell Expression Simulator Guided by Gene Regulatory Networks. Cell Systems. 2020. Dibaeinia, P., & Sinha, S. [Paper]

4. Others
