Commit
0 parents
commit 855a0c0
Showing 22 changed files with 3,849 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,128 @@ | ||
|
||
# Created by https://www.gitignore.io/api/python,jupyternotebooks | ||
# Edit at https://www.gitignore.io/?templates=python,jupyternotebooks | ||
|
||
### JupyterNotebooks ### | ||
# gitignore template for Jupyter Notebooks | ||
# website: http://jupyter.org/ | ||
|
||
.ipynb_checkpoints | ||
*/.ipynb_checkpoints/* | ||
|
||
# IPython | ||
profile_default/ | ||
ipython_config.py | ||
|
||
# Remove previous ipynb_checkpoints | ||
# git rm -r .ipynb_checkpoints/ | ||
|
||
### Python ### | ||
# Byte-compiled / optimized / DLL files | ||
__pycache__/ | ||
*.py[cod] | ||
*$py.class | ||
|
||
# C extensions | ||
*.so | ||
|
||
# Distribution / packaging | ||
.Python | ||
build/ | ||
develop-eggs/ | ||
dist/ | ||
downloads/ | ||
eggs/ | ||
.eggs/ | ||
lib/ | ||
lib64/ | ||
parts/ | ||
sdist/ | ||
var/ | ||
wheels/ | ||
pip-wheel-metadata/ | ||
share/python-wheels/ | ||
*.egg-info/ | ||
.installed.cfg | ||
*.egg | ||
MANIFEST | ||
|
||
# PyInstaller | ||
# Usually these files are written by a python script from a template | ||
# before PyInstaller builds the exe, so as to inject date/other infos into it. | ||
*.manifest | ||
*.spec | ||
|
||
# Installer logs | ||
pip-log.txt | ||
pip-delete-this-directory.txt | ||
|
||
# Unit test / coverage reports | ||
htmlcov/ | ||
.tox/ | ||
.nox/ | ||
.coverage | ||
.coverage.* | ||
.cache | ||
nosetests.xml | ||
coverage.xml | ||
*.cover | ||
.hypothesis/ | ||
.pytest_cache/ | ||
|
||
# Translations | ||
*.mo | ||
*.pot | ||
|
||
# Scrapy stuff: | ||
.scrapy | ||
|
||
# Sphinx documentation | ||
docs/_build/ | ||
|
||
# PyBuilder | ||
target/ | ||
|
||
# pyenv | ||
.python-version | ||
|
||
# pipenv | ||
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control. | ||
# However, in case of collaboration, if having platform-specific dependencies or dependencies | ||
# having no cross-platform support, pipenv may install dependencies that don't work, or not | ||
# install all needed dependencies. | ||
#Pipfile.lock | ||
|
||
# celery beat schedule file | ||
celerybeat-schedule | ||
|
||
# SageMath parsed files | ||
*.sage.py | ||
|
||
# Spyder project settings | ||
.spyderproject | ||
.spyproject | ||
|
||
# Rope project settings | ||
.ropeproject | ||
|
||
# Mr Developer | ||
.mr.developer.cfg | ||
.project | ||
.pydevproject | ||
|
||
# mkdocs documentation | ||
/site | ||
|
||
# mypy | ||
.mypy_cache/ | ||
.dmypy.json | ||
dmypy.json | ||
|
||
# Pyre type checker | ||
.pyre/ | ||
|
||
# End of https://www.gitignore.io/api/python,jupyternotebooks | ||
|
||
|
||
deepgrp/*.c | ||
deepgrp/_mss/pymss.c |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,91 @@ | ||
================================================================== | ||
DeepGRP - Deep learning for Genomic Repetitive element Prediction | ||
================================================================== | ||
|
||
DeepGRP is a Python package for predicting genomic repetitive elements | |
with a deep learning model consisting of bidirectional gated recurrent units | ||
with attention. | ||
The idea of DeepGRP was initially based on `dna-nn`__, but was re-implemented | ||
and extended using `TensorFlow`__ 2.1. | ||
DeepGRP was tested for the prediction of HSAT2,3, alphoid, Alu | ||
and LINE-1 elements. | ||
|
||
.. __: https://github.com/lh3/dna-nn | ||
.. __: https://www.tensorflow.org | ||
|
||
Getting Started | ||
=============== | ||
|
||
Installation | ||
------------ | ||
|
||
To install DeepGRP, you can use the provided wheels with pip:: | |
|
||
    pip install deepgrp-0.1.0-cp37-cp37m-linux_x86_64.whl | |
|
||
Alternatively, you can install the development version with `poetry`__:: | |
|
||
    git clone https://github.com/fhausmann/deepgrp | |
    cd deepgrp | |
    poetry install | |
|
||
.. __: https://python-poetry.org/ | ||
|
||
Data preprocessing | ||
------------------ | ||
For training and hyperparameter optimization, the data have to be preprocessed. | |
For inference / prediction, the FASTA sequences can be used directly and you | |
can skip this step. | |
The provided script `parse_rm` extracts repeat annotations from | |
`RepeatMasker`__ output into a tab-separated format:: | |
|
||
    parse_rm GENOME.fa.out > GENOME.bed | |
|
||
.. __: http://www.repeatmasker.org/ | ||
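
The conversion that `parse_rm` performs can be pictured with the following minimal sketch. It is not the script shipped with DeepGRP: the function name `rm_out_to_bed` and the choice of output columns are assumptions made for illustration. It relies only on the standard whitespace-separated RepeatMasker `.out` layout (query sequence in column 5, begin and end in columns 6 and 7, repeat name and class/family in columns 10 and 11) and on the BED convention of 0-based, half-open intervals.

```python
def rm_out_to_bed(lines):
    """Yield (chrom, start, end, repeat_name, repeat_class) per annotation.

    Hypothetical helper; the real parse_rm may emit different columns.
    """
    for line in lines:
        fields = line.split()
        # Skip blank lines and the header rows, whose first field
        # is not the numeric Smith-Waterman score.
        if len(fields) < 11 or not fields[0].isdigit():
            continue
        chrom = fields[4]
        start = int(fields[5]) - 1  # RepeatMasker is 1-based, BED is 0-based
        end = int(fields[6])        # the end stays as-is for a half-open interval
        yield chrom, start, end, fields[9], fields[10]


sample = [
    "   SW   perc perc perc  query     position in query    matching  repeat       position in repeat",
    "score   div. del. ins.  sequence  begin end   (left)   repeat    class/family begin  end    (left)  ID",
    "",
    "  463    1.3  0.6  1.7  chr1      10001 10468 (248945954) + (TAACCC)n Simple_repeat 1 471 (0) 1",
]

for record in rm_out_to_bed(sample):
    # Join the fields with TABs, as in a BED-like file.
    print("\t".join(str(value) for value in record))
```

The sketch identifies data rows by their numeric first field, so the header rows need no special handling.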
|
||
The FASTA sequences have to be converted to a one-hot-encoded representation, | ||
which can be done with:: | ||
|
||
    preprocess_sequence FASTAFILE.fa.gz | |
|
||
`preprocess_sequence` creates a one-hot-encoded representation in NumPy | |
compressed format in the same directory. | |
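
As an illustration of what one-hot encoding a DNA sequence means, here is a minimal sketch using plain Python lists. It is not the encoding code used by `preprocess_sequence`, which stores a NumPy array. The function name `one_hot_encode`, the channel order `ACGT`, and the all-zero row for ambiguous bases such as `N` are assumptions made for this example.

```python
ALPHABET = "ACGT"  # assumed channel order; the real tool may differ


def one_hot_encode(sequence):
    """Return one row per base, with a single 1 in the column of that base.

    Bases outside ALPHABET (for example N) become an all-zero row,
    which is an assumption of this sketch.
    """
    index = {base: i for i, base in enumerate(ALPHABET)}
    encoded = []
    for base in sequence.upper():
        row = [0] * len(ALPHABET)
        if base in index:
            row[index[base]] = 1
        encoded.append(row)
    return encoded


for base, row in zip("acgN", one_hot_encode("acgN")):
    print(base, row)
```

Lower-case (soft-masked) bases are treated like their upper-case counterparts here. Whether DeepGRP does the same is not stated in this README.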
|
||
|
||
Hyperparameter optimization | ||
--------------------------- | ||
For hyperparameter optimization, the GitHub repository provides | |
a Jupyter `notebook`__. | |
|
||
.. __: https://github.com/fhausmann/deepgrp/blob/master/notebooks/DeepGRP.ipynb | ||
|
||
Hyperparameter optimization is based on the `hyperopt`__ package. | ||
|
||
.. __: https://github.com/hyperopt/hyperopt | ||
|
||
Training | ||
-------- | ||
|
||
A model can be trained with the provided Jupyter `notebook`__. | |
|
||
.. __: https://github.com/fhausmann/deepgrp/blob/master/notebooks/Training.ipynb | ||
|
||
Prediction | ||
---------- | ||
Prediction is run with the `deepgrp` main program as follows:: | |
|
||
    deepgrp <modelfile> <fastafile> [<fastafile>, ...] | |
|
||
where `<modelfile>` contains the trained model in `HDF5`__ | ||
format and `<fastafile>` is a (multi-)FASTA file containing DNA sequences. | ||
Several FASTA files can be given at once. | ||
|
||
.. __: https://www.tensorflow.org/tutorials/keras/save_and_load | ||
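
To make the term "(multi-)FASTA" concrete, the sketch below reads several records from FASTA-formatted lines. It is not DeepGRP's own reader: the function name `read_fasta` is hypothetical, and the sketch only assumes the usual FASTA convention that a record starts with a `>` header line followed by one or more sequence lines.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines.

    Hypothetical helper for illustration only.
    """
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    # Emit the last record once the input is exhausted.
    if header is not None:
        yield header, "".join(chunks)


example = [">chr1 test", "ACGT", "TTAA", "", ">chr2", "GG"]
for name, sequence in read_fasta(example):
    print(name, len(sequence))
```

An open file handle can be passed in place of the list, since the function only iterates over lines.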
|
||
Requirements | ||
============ | ||
Requirements are listed in `pyproject.toml`__. | ||
|
||
.. __: https://github.com/fhausmann/deepgrp/blob/master/pyproject.toml | ||
|
||
Additionally, a C compiler must be installed to compile the C/Cython code. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,21 @@ | ||
""" Build file for cython extensions """ | ||
from setuptools import Extension | |
import numpy | ||
from Cython.Build import cythonize | ||
|
||
_EXTENSIONS = [ | |
    Extension("deepgrp.mss", | |
              sources=["deepgrp/_mss/pymss.pyx", "./deepgrp/_mss/mss.c"], | |
              include_dirs=[numpy.get_include()] + ["./deepgrp"]), | |
    Extension("deepgrp.sequence", | |
              sources=["deepgrp/sequence.pyx"], | |
              include_dirs=[numpy.get_include()]), | |
] | |
|
||
|
||
def build(setup_kwargs): | |
    """ | |
    This function is mandatory in order to build the extensions. | |
    """ | |
|
||
    setup_kwargs.update({'ext_modules': cythonize(_EXTENSIONS)}) |