ehsanasgari/Deep-Proteomics

Continuous Distributed Representation of Biological Sequences for Deep Genomics and Deep Proteomics

Update (July 2020): A more recent model trained on UniRef50 can be downloaded from the following link.

wget http://deepbio.info/uniref_embeddings.zip

We introduce a new representation for biological sequences, called bio-vectors (BioVec). We use the name protein-vectors (ProtVec) for proteins (amino-acid sequences) and gene-vectors (GeneVec) for gene sequences. This representation can be widely used in applications of deep learning in proteomics and genomics. Bio-vectors are skip-gram word vectors trained on character n-grams of biological sequences (DNA, RNA, and protein). In this work, we explore the biophysical and biochemical meaning of the resulting embedding space, and we demonstrate the strength of this sequence representation on a variety of bioinformatics tasks.
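
As a rough illustration of the n-gram "words" that a skip-gram model consumes, the sketch below splits a sequence into non-overlapping n-grams, once per reading offset. It is a minimal sketch, not the code used for the published models: the function name `shifted_ngrams`, the default `n=3`, and the example sequence are assumptions made for illustration.

```python
from typing import List


def shifted_ngrams(sequence: str, n: int = 3) -> List[List[str]]:
    """Split a sequence into n lists of non-overlapping n-grams.

    Hypothetical helper (not from this repository): list k starts reading
    at offset k, so every n-gram of the sequence appears in exactly one list.
    Trailing residues that do not fill a complete n-gram are dropped.
    """
    sequence = sequence.strip().upper()
    splits = []
    for offset in range(n):
        grams = [
            sequence[i:i + n]
            for i in range(offset, len(sequence) - n + 1, n)
        ]
        splits.append(grams)
    return splits


if __name__ == "__main__":
    # Made-up example sequence, for illustration only.
    for words in shifted_ngrams("MKTAYIAKQR"):
        print(words)
```

Each returned list can be treated as one "sentence" of n-gram words and passed to any skip-gram trainer; the choice of trainer and its hyperparameters is left open here.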

http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0141287

@article{asgari2015continuous,
  title={Continuous Distributed Representation of Biological Sequences for Deep Proteomics and Genomics},
  author={Asgari, Ehsaneddin and Mofrad, Mohammad RK},
  journal={PloS one},
  volume={10},
  number={11},
  pages={e0141287},
  year={2015},
  publisher={Public Library of Science}
}

[Figure: journal.pone.0141287.g002]
