BioDataLearning/UDSM-LLPS-Syn

UDSM-LLPS-Syn

In this study, we applied the deep sequence model UDSMProt to two new protein classification tasks:

  1. Predicting proteins with liquid-liquid phase separation (LLPS) propensity
  2. Predicting synaptic proteins

Our results show that, using protein sequences alone and without prior domain knowledge, the fine-tuned language models achieved high classification accuracy and outperformed baseline models built on compositional k-mer features in both tasks. For details of this work, please refer to our paper, "Deep sequence representation learning for predicting human proteins with liquid-liquid phase separation propensity and synaptic functions" (Wei and Wang, 2022).
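As a rough illustration of what "compositional k-mer features" means, the sketch below turns a protein sequence into a vector of normalized k-mer frequencies. It is not the baseline code from the paper: the function name `kmer_composition`, the choice of k = 2, and the restriction to the 20 standard amino acids are assumptions made for this example.

```python
from collections import Counter
from itertools import product

# The 20 standard amino acids (assumption: non-standard residues are ignored).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kmer_composition(sequence, k=2):
    """Return normalized k-mer frequencies in a fixed, sorted vocabulary order."""
    vocab = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    # Count every overlapping window of length k in the sequence.
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    # Only k-mers made of standard residues contribute to the total.
    total = sum(counts[kmer] for kmer in vocab)
    return [counts[kmer] / total if total else 0.0 for kmer in vocab]


# Toy sequence, made up for illustration only.
features = kmer_composition("MKVLAAGIVK", k=2)
print(len(features))  # 20**2 = 400 features for k = 2
```

A vector like this can be passed to any standard classifier, which is the kind of baseline the fine-tuned language models are compared against.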

Dependencies

Please refer to the original UDSMProt repository for detailed information.

Application Documentation

Users are welcome to use the fine-tuned models from both learning tasks for comparison in their own research.
Here, we provide one example of applying the fine-tuned UDSM-LLPS models from the first learning task. As stated in our paper, in addition to the LLPSDB and PhaSepDB data, we also evaluated the performance of UDSM-LLPS on another well-known database, DrLLPS. DrLLPS is currently the most comprehensive database, with the largest collection of liquid-liquid phase separation (LLPS)-associated proteins, covering 164 eukaryotes. In DrLLPS, LLPS-associated proteins can be browsed by three LLPS types:

  • scaffolds, proteins that can drive or undergo LLPS;
  • clients, proteins that can be recruited by scaffolds for the formation of biomolecular condensates;
  • regulators, proteins that have not been identified to undergo LLPS but have been shown to be involved in regulating LLPS behavior.

Description of files

  • DrLLPS data: task_1/application/DrLLPS_data.csv stores 3,627 reviewed human LLPS-associated proteins categorized by the three types, consisting of 100 scaffolds, 2,998 clients, and 529 regulators.
  • Fine-tuned UDSM-LLPS models: UDSM-LLPS_Random.pkl and UDSM-LLPS_UniRef.pkl under task_1/
  • Utils file: model_utils.py downloaded from the original UDSMProt repository
  • Token file: tok_itos.npy
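The token file maps integer indices to tokens ("itos" = index to string). The sketch below shows how such a vocabulary could be inverted and used to encode a sequence. It is only an illustration: the special token names `_unk_`, `_pad_`, and `_bos_`, the toy vocabulary, and the helpers `build_stoi` and `encode` are assumptions, and the actual preprocessing is the one implemented in model_utils.py and the notebooks.

```python
# In practice the vocabulary would come from the token file, e.g. (untested here):
#   import numpy as np
#   itos = np.load("tok_itos.npy", allow_pickle=True)  # allow_pickle may be needed
# A tiny made-up vocabulary is used instead so the example is self-contained.
toy_itos = ["_unk_", "_pad_", "_bos_", "A", "C", "K", "M"]


def build_stoi(itos):
    """Invert an index-to-token list into a token-to-index dictionary."""
    return {tok: idx for idx, tok in enumerate(itos)}


def encode(sequence, stoi, bos="_bos_", unk="_unk_"):
    """Encode a sequence as token ids, prefixed with a begin-of-sequence token."""
    ids = [stoi[bos]]
    # Residues missing from the vocabulary fall back to the unknown token.
    ids.extend(stoi.get(aa, stoi[unk]) for aa in sequence)
    return ids


stoi = build_stoi(toy_itos)
ids = encode("MKAX", stoi)
print(ids)  # [2, 6, 5, 3, 0]
```

Using the same token file that the models were fine-tuned with matters, because the embedding layer is indexed by these ids.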

Jupyter Notebook Documentation

Please see the two Jupyter Notebooks under task_1/application/ for detailed steps:

  • 1. Predict LLPS propensity of DrLLPS data.ipynb
  • 2. UDSM-LLPS prediction results on DrLLPS data.ipynb
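One way predictions on DrLLPS could be summarized is the fraction of proteins of each LLPS type that score above a decision threshold. The sketch below is a hypothetical illustration and does not reproduce the notebooks: the record layout, the 0.5 threshold, the helper name `positive_rate_by_type`, and the toy probabilities are all made up for this example.

```python
from collections import defaultdict


def positive_rate_by_type(records, threshold=0.5):
    """Return, per LLPS type, the fraction of proteins scored at or above threshold."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for llps_type, probability in records:
        totals[llps_type] += 1
        if probability >= threshold:
            positives[llps_type] += 1
    return {t: positives[t] / totals[t] for t in totals}


# Toy (type, predicted probability) pairs; not real model output.
toy_records = [
    ("scaffold", 0.9),
    ("scaffold", 0.4),
    ("client", 0.7),
    ("regulator", 0.2),
]
rates = positive_rate_by_type(toy_records)
print(rates)  # {'scaffold': 0.5, 'client': 1.0, 'regulator': 0.0}
```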
