This repository contains code for generating the results in Kaplow et al., "Neural network modeling of differential binding between wild-type and mutant CTCF reveals putative binding preferences for zinc fingers 1-2," BMC Genomics, 2022. It includes the wrapper for TF-MoDISco (Shrikumar et al., "TF-MoDISco v0.4.4.2-alpha: Technical Note," arXiv, 2018) that was used to obtain TF-MoDISco motifs from deep convolutional neural networks trained to predict whether a CTCF ChIP-seq peak would have significantly lower signal in a dataset from CTCF with a mutated zinc finger, as well as ipython notebooks for visualizing the TF-MoDISco results from those neural networks. It also contains scripts for analyses involving the TF-MoDISco results.
- TF-MoDISco wrapper
- utilities used by
- ipython notebooks: code for visualizing results for each neural network, where the zinc finger number in the notebook name indicates the zinc finger mutant corresponding to the model; require data from
- converts an output file from FIMO (Grant et al., "FIMO: Scanning for occurrences of a given motif," Bioinformatics, 2011) to a bed file
- code for analyses involving CTCF-s data (Le et al., "An alternative CTCF isoform antagonizes canonical CTCF occupancy and changes chromatin architecture to promote apoptosis," Nature Communications, 2019)
- code for analyses of mouse activated B cell peaks (Nakahashi et al., "A genome-wide map of CTCF multivalency redefines the CTCF code," Cell Reports, 2013) overlapping the core, upstream, and downstream motifs with no FIMO motif hit cutoff
- code for analyses of mouse heart peaks (mouse ENCODE) overlapping the core, upstream, and downstream motifs with no FIMO motif hit cutoff
- code for analyses of mouse heart peaks overlapping the core, upstream, and downstream motifs with the FIMO motif hit q-value < 0.05 cutoff
- code for analyses of mouse heart peaks overlapping the core, upstream, and downstream motifs with the default FIMO motif hit cutoff
- code for analyses of mouse liver peaks (mouse ENCODE) overlapping the core, upstream, and downstream motifs with no FIMO motif hit cutoff
- code for analyses of mouse liver peaks overlapping the core, upstream, and downstream motifs with the FIMO motif hit q-value < 0.05 cutoff
- code for analyses of mouse liver peaks overlapping the core, upstream, and downstream motifs with the default FIMO motif hit cutoff
- code for analyses of mouse activated B cell peaks overlapping the core, upstream, and downstream motifs with the default FIMO motif hit cutoff
- code for analyses of mouse activated B cell peaks overlapping the core, upstream, and downstream motifs with the FIMO motif hit q-value < 0.05 cutoff
- deseq2Script.r: code for obtaining differential peaks between wild type CTCF ChIP-seq and CTCF ChIP-seq with the zinc finger 1 mutant
- code for obtaining DeepLIFT score bigwig files for each of the wild type CTCF versus mutant CTCF binding prediction models
- code for obtaining DeepLIFT scores for each of the wild type CTCF versus mutant CTCF binding prediction models
- code for running TF-MoDISco on DeepLIFT scores for each of the wild type CTCF versus mutant CTCF binding prediction models
- gets the best motif hit from FIMO in a bed file
- makes violin plots for visualizing the CTCF-s analyses
- wrapper for DeepLIFT (Shrikumar et al., "Learning important features through propagating activation differences," ICML, 2017) for models trained using Keras 0.3.2 with the Theano backend
- makes a single bedGraph file from a text file with per-position DeepLIFT scores
- makes a bedGraph file for each sequence from a text file with per-position DeepLIFT scores
- wrapper for DeepLIFT for models trained using Keras 0.3.2 with the Theano backend that iterates through cross-validation folds
- utilities for converting DNA sequence files into the numpy files for training deep learning models
- wrapper for an earlier version of TF-MoDISco that contains utilities for subsetting sequences based on deep learning model predictions
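
Two of the scripts listed above convert FIMO output to a bed file and pick the best motif hit. The following is a minimal sketch of that idea, not the repository's implementation. It assumes the FIMO text output is tab-separated with the column order given in `FIMO_COLUMNS`, that "best" means the highest score per sequence name, and that coordinates are reported relative to the FIMO sequence name; the function names are hypothetical.

```python
import csv

# Assumed FIMO text-output column order (an assumption, not taken from the repository).
FIMO_COLUMNS = ["motif_id", "motif_alt_id", "sequence_name", "start", "stop",
                "strand", "score", "p_value", "q_value", "matched_sequence"]


def fimo_to_bed_rows(fimo_lines):
    """Convert FIMO hit lines into BED6-style tuples (name, start, end, motif, score, strand)."""
    rows = []
    for fields in csv.reader(fimo_lines, delimiter="\t"):
        # Skip blank lines, comment/header lines, and truncated lines.
        if not fields or fields[0].startswith("#") or len(fields) < len(FIMO_COLUMNS):
            continue
        hit = dict(zip(FIMO_COLUMNS, fields))
        if not hit["start"].isdigit():
            continue  # header line without a leading '#'
        # FIMO starts are 1-based and inclusive; BED starts are 0-based, ends exclusive.
        rows.append((hit["sequence_name"], int(hit["start"]) - 1, int(hit["stop"]),
                     hit["motif_id"], float(hit["score"]), hit["strand"]))
    return rows


def best_hit_per_sequence(bed_rows):
    """Keep the highest-scoring hit for each sequence name (assumed meaning of 'best')."""
    best = {}
    for row in bed_rows:
        name, score = row[0], row[4]
        if name not in best or score > best[name][4]:
            best[name] = row
    return [best[name] for name in sorted(best)]


if __name__ == "__main__":
    # Made-up example hits for illustration only.
    example = [
        "# motif_id\tmotif_alt_id\tsequence_name\tstart\tstop\tstrand\tscore\tp-value\tq-value\tmatched_sequence\n",
        "CTCF_core\t\tpeak1\t11\t29\t+\t18.5\t1e-7\t0.001\tCCACCAGGGGGCGCTAGTG\n",
        "CTCF_core\t\tpeak1\t40\t58\t-\t9.2\t1e-4\t0.2\tCCACTAGATGGCAGTATTG\n",
        "CTCF_core\t\tpeak2\t5\t23\t+\t12.0\t1e-5\t0.01\tCCAGCAGGGGGCGCCAGAG\n",
    ]
    for row in best_hit_per_sequence(fimo_to_bed_rows(example)):
        print("\t".join(str(value) for value in row))
```

If FIMO was run on sequences extracted from peaks, the resulting coordinates would still need to be shifted by each peak's genomic start before being used as genome-wide bed intervals.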
- python 2.7.15 (required for ipython notebooks, evaluationScripts, and utils) or 3.7.1
- numpy 1.14.3 (python 2) or 1.17.0 (python 3)
- matplotlib 2.2.3 (python 2) or 3.0.2 (python 3)
- h5py 2.6.0 (python 2) or 2.10.0 (python 3)
- seaborn 0.9.0 (required for only ipython notebooks)
- modisco (python 2), (python 2), or (python 3)
- pybedtools 0.7.8 (python 2) or 0.8.1 (python 3)
- biopython 1.68 (python 2) or 1.73 (python 3)
- cython 0.29.12 (python 2) or 0.29.13 (python 3)
- meme 4.12.0 (evaluationScripts only)
- R 3.5.1 (evaluationScripts only)
- DESeq2 1.22.2 (evaluationScripts only)
- pybedtools 0.7.8 (evaluationScripts only)
- deeplift 0.5.5-theano (evaluationScripts only)
- keras 0.3.2 (evaluationScripts only)
- bedGraphToBigWig (evaluationScripts only)
- Biopython 1.68 (evaluationScripts only)
- cython 0.29.12 (evaluationScripts only)
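
The utilities for converting DNA sequence files into numpy files can be illustrated with numpy alone. The sketch below shows one common one-hot encoding and is not the repository's code: the A, C, G, T column order, the `(num_sequences, length, 4)` array shape, the all-zero encoding of non-ACGT bases, and the function names are all assumptions, and the models in this repository may expect a different layout.

```python
import numpy as np

BASES = "ACGT"  # assumed column order for the one-hot encoding


def one_hot_encode(sequence):
    """Return a (len(sequence), 4) float32 array; non-ACGT bases (e.g. N) stay all-zero."""
    encoded = np.zeros((len(sequence), len(BASES)), dtype=np.float32)
    for position, base in enumerate(sequence.upper()):
        index = BASES.find(base)
        if index >= 0:
            encoded[position, index] = 1.0
    return encoded


def encode_sequences(sequences):
    """Stack equal-length sequences into an array of shape (num_sequences, length, 4)."""
    return np.stack([one_hot_encode(sequence) for sequence in sequences])


if __name__ == "__main__":
    # Made-up sequences for illustration only.
    array = encode_sequences(["ACGT", "NNAC"])
    print(array.shape)
    # The array could then be written to disk with np.save for later model training.
```

Sequences of unequal length would need to be padded or trimmed to a common length before stacking.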
Kaplow IM*, Banerjee A, Foo CS*. Neural network modeling of differential binding between wild-type and mutant CTCF reveals putative binding preferences for zinc fingers 1-2. BMC Genomics, 23: 295, 2022.
Irene Kaplow:
Chuan Sheng Foo: