Binding site predictor

Data

DNA

DNA is stored in a 4D-tensor that holds, for each training range, a one-hot encoding (a 4-element vector) of every nucleotide in that range (the range length is assumed to always be 200). If a nucleotide is unknown ('N' in FASTA format), its entire one-hot vector is zero. Then, for each nucleotide separately, ranges are combined into bits in a uint8 array for compactness (see numpy.unpackbits). Finally, the data is pickled into dna.pkl with protocol 2.
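The encoding and bit-packing described above can be sketched roughly as follows. This is a minimal illustration, not the code in data.py: the helper names (one_hot, pack, unpack), the A/C/G/T channel order, and the choice to pack along the range axis are all assumptions made for the example.

```python
import numpy as np

NUCLEOTIDES = "ACGT"  # assumed channel order; the repository may use another


def one_hot(seq):
    """Return a (len(seq), 4) uint8 array; unknown bases such as 'N' stay all-zero."""
    out = np.zeros((len(seq), 4), dtype=np.uint8)
    for i, base in enumerate(seq.upper()):
        j = NUCLEOTIDES.find(base)
        if j >= 0:  # find() returns -1 for 'N' and other unknown symbols
            out[i, j] = 1
    return out


def pack(encoded):
    """Pack 8 ranges into one uint8 per position (assumed: packing along axis 0)."""
    return np.packbits(encoded, axis=0)


def unpack(packed, n_ranges):
    """Invert pack(); count trims the zero padding added by packbits."""
    return np.unpackbits(packed, axis=0, count=n_ranges)


# Ten hypothetical ranges of 200 nucleotides each, with some unknown bases.
ranges = ["ACGTN" * 40] * 10
encoded = np.stack([one_hot(s) for s in ranges])  # shape (10, 200, 4)
packed = pack(encoded)                            # shape (2, 200, 4)
restored = unpack(packed, len(ranges))            # shape (10, 200, 4)
```

Packing along the range axis stores eight ranges per byte, so the packed array is about eight times smaller than the one-hot array; the original number of ranges has to be kept to undo the padding.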

ChIP

Conservative ChIP-seq peak calls are collected per range and per protein-cell combination in a 3D-tensor. The data is then pickled into chip.conservative.pkl with protocol 2.

For experiments with a single protein (currently the only kind of experiment supported), one can generate a protein-cell specific dataset by running python3 pick_chip.py --cell X --protein Y, where X and Y are the indices of the cell and the protein in the original tensor.
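Conceptually, selecting one protein-cell pair amounts to slicing the 3D-tensor and pickling the result. The sketch below illustrates that idea only; it is not the code in pick_chip.py. The axis order (range, cell, protein), the pick_pair helper, and the random stand-in data are assumptions.

```python
import io
import pickle

import numpy as np

# Stand-in for the ChIP tensor; assumed layout is (range, cell, protein).
rng = np.random.default_rng(0)
chip = rng.integers(0, 2, size=(6, 3, 5), dtype=np.uint8)


def pick_pair(chip, cell, protein):
    """Return the per-range labels for one cell/protein index pair."""
    return chip[:, cell, protein]


labels = pick_pair(chip, cell=1, protein=4)  # shape (6,)

# Round-trip through pickle protocol 2, using an in-memory buffer
# instead of a file such as chip-X-Y.conservative.pkl.
buffer = io.BytesIO()
pickle.dump(labels, buffer, protocol=2)
buffer.seek(0)
restored = pickle.load(buffer)
```

If the real tensor uses a different axis order, only the indexing expression in pick_pair needs to change.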

Training

One can start training on a specific protein-cell pair by running python3 train.py --epochs E --chip-path chip-X-Y.conservative.pkl. Check python3 train.py --help for additional training options.

erikvdplas/binding-site-predictor

Folders and files

Latest commit

History

Repository files navigation

Binding site predictor

Data

DNA

ChIP

Training

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages