Dense Representation of the Functional Protein Space with Wasserstein Auto-Encoders

This project applies Wasserstein Auto-Encoders (WAE) to protein function determination, to explore whether the framework can guide biologists in the exploration of new protein sequences.

See the report (report.pdf) and the poster (poster.pdf) for details on the project.

Related Work

We were inspired by the work of Sinai et al., from the Marks Laboratory at Harvard. Instead of using the Variational Auto-Encoder (VAE) framework, we applied the WAE to the same task, hoping to improve on their results with this more recent method.

Implementation

For this work, we wrote our own implementation of the WAE framework in PyTorch, using Maximum Mean Discrepancy (MMD) as the divergence measure between the marginal distribution of the latent codes and the prior (see section 3.3 of the report for more information).
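To make the MMD penalty concrete, here is a minimal sketch of an unbiased MMD² estimate between a batch of latent codes and a batch of prior samples. It is not the code from this repository: it is written in plain Python rather than PyTorch so that it runs without dependencies, and the Gaussian (RBF) kernel, the `bandwidth` parameter, and the function names `rbf_kernel` and `mmd_squared` are assumptions made for illustration. The kernel actually used in the project is described in the report.

```python
import math
import random


def rbf_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two equal-length vectors (assumed kernel)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd_squared(codes, prior_samples, bandwidth=1.0):
    """Unbiased estimate of MMD^2 between two samples, each a list of vectors.

    Requires at least two vectors in each sample.
    """
    n, m = len(codes), len(prior_samples)
    # Within-sample terms skip the diagonal (i == j) to keep the estimate unbiased.
    k_qq = sum(
        rbf_kernel(codes[i], codes[j], bandwidth)
        for i in range(n) for j in range(n) if i != j
    ) / (n * (n - 1))
    k_pp = sum(
        rbf_kernel(prior_samples[i], prior_samples[j], bandwidth)
        for i in range(m) for j in range(m) if i != j
    ) / (m * (m - 1))
    # Cross term compares every code with every prior sample.
    k_qp = sum(
        rbf_kernel(q, p, bandwidth) for q in codes for p in prior_samples
    ) / (n * m)
    return k_qq + k_pp - 2.0 * k_qp


if __name__ == "__main__":
    rng = random.Random(0)
    dim, batch = 2, 50
    prior = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(batch)]
    # Codes drawn from the prior itself versus codes shifted away from it.
    matched = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(batch)]
    shifted = [[rng.gauss(3.0, 1.0) for _ in range(dim)] for _ in range(batch)]
    print("matched codes:", mmd_squared(matched, prior))
    print("shifted codes:", mmd_squared(shifted, prior))
```

In a WAE, a term of this kind is added to the reconstruction loss, weighted by a regularization coefficient, so that the encoded codes are pushed toward the prior. Because the estimate is unbiased, it can be slightly negative when the two samples come from the same distribution. A PyTorch version would replace the loops with tensor operations so that gradients can flow back to the encoder.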

The implementation can be found here. Task-specific implementations were added to the protein.py module in autoencoders/.

Dataset

We used the same dataset as Sinai et al. The folder data/ contains the processed data.

Training

The folder notebooks/ contains the notebooks that were used to train the models.

References

See the folder reference/ for a review of the literature surrounding this project.

For background on the closely related Variational Auto-Encoder framework, there are good high-level presentations, for example by Jaan Altosaar and Jeremy Jordan.
