
SHICEDO: Single-cell Hi-C Resolution Enhancement with Reduced Over-smoothing

In this work, we introduce SHICEDO, a novel deep-learning model specifically designed for enhancing scHi-C resolution while addressing the over-smoothing issue. Built on a generative adversarial network (GAN) framework, SHICEDO's generator can process low-resolution scHi-C input of varying scales and sizes, generating an enhanced scHi-C matrix as the output. Leveraging our prior work on bulk Hi-C data, EnHiC, we have incorporated and improved its rank-one feature extraction and reconstruction techniques, along with our new feature refinement modules, into the SHICEDO framework.

Model overview (figure)


Installation

For the environment, install PyTorch according to the CUDA version of your machine.
Please check the PyTorch website for details.

In this demo, the machine has CUDA version 11.6.
To create the SCHICEDO environment, use:
conda env create -f SCHICEDO_environment.yml
To activate this environment, use:
conda activate SCHICEDO
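
After activating the environment, the following minimal sketch (nothing SHICEDO-specific is assumed) checks that PyTorch was installed with CUDA support:

import torch

print(torch.__version__)          # installed PyTorch build
print(torch.version.cuda)         # CUDA version the build targets (e.g. 11.6)
print(torch.cuda.is_available())  # True if a GPU is visible to PyTorch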

Download processed data

The processed data is available at the following link:
Download processed data.
Two processed datasets are available; in the following example, we demo with the processed Lee et al. dataset in the Lee folder.
The downloaded data may be split across several compressed files; after extraction, please move all files into one folder.

  1. mkdir data
  2. Please download the processed data to the data folder and use the correct path in the script for data loading (a loading sketch follows this list).
    If you wish to preprocess other datasets, please check the Data preprocessing section.
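
As a rough illustration of how the downloaded tensors might be inspected (a minimal sketch: the file name data/Lee/train_tensor.pt and the saved object layout are assumptions, the real paths and keys are defined by the repository scripts):

import torch

data = torch.load("data/Lee/train_tensor.pt")  # hypothetical file name; use the actual downloaded file

# Inspect what was saved; the structure depends on how the tensors were generated.
print(type(data))
if isinstance(data, dict):
    for key, value in data.items():
        print(key, getattr(value, "shape", value))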

Data preprocessing

If you wish to process raw data, please run the following commands.
In this example, we show how to process the Nagano et al. raw data, which is available at Download raw data.
cd data_preprocessing
mkdir process_data/Nagano

./data_preprocessing.sh
data_preprocessing.sh runs six scripts to save the processed data:

  1. Filter cells based on contact number: python data_filter.py
  2. Filter out inter-chromosomal interactions: python filter_true_data.py
  3. Down-sample the matrix to generate the low-resolution input: python down_sampling_sciHiC.py (see the sketch after this list)
  4. Run the R script for BandNorm: Rscript bandnorm.R
    (Please follow the BandNorm instructions to install it)
  5. Organize the normalized results: python run_bandnorm.py
  6. Divide large matrices into submatrices and save them as torch tensors: python generate_input.py
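
For intuition on step 3, down-sampling simulates a low-resolution matrix by randomly dropping contacts. A minimal sketch of one common approach (binomial thinning; the keep rate of 0.1, the toy matrix, and the use of NumPy are assumptions and not the exact logic of down_sampling_sciHiC.py):

import numpy as np

rng = np.random.default_rng(0)

def downsample(matrix, rate=0.1):
    # Keep each contact independently with probability `rate` (binomial thinning).
    counts = matrix.astype(np.int64)
    return rng.binomial(counts, rate).astype(matrix.dtype)

high_res = rng.poisson(2.0, size=(40, 40))              # toy scHi-C contact matrix
high_res = np.triu(high_res) + np.triu(high_res, 1).T   # symmetrize
low_res = downsample(high_res, rate=0.1)
print(high_res.sum(), low_res.sum())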

Training

For optimal performance when training on new data, hyper-parameter fine-tuning is essential.
The model and data settings are the same as described in the paper.
After choosing suitable hyper-parameters, the model can be trained with the following command:
python test_train.py
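
For orientation only, SHICEDO's generator and discriminator are trained in the usual alternating GAN fashion. The sketch below is a generic single training iteration with toy networks; the architectures, losses, shapes, and learning rates are all placeholder assumptions, and the actual training logic lives in test_train.py:

import torch
import torch.nn as nn

# Toy stand-ins for the real SHICEDO networks.
generator = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(), nn.Conv2d(8, 1, 3, padding=1))
discriminator = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(), nn.Flatten(), nn.LazyLinear(1))

opt_g = torch.optim.Adam(generator.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()
mse = nn.MSELoss()

low_res = torch.rand(4, 1, 40, 40)    # toy low-resolution submatrices
high_res = torch.rand(4, 1, 40, 40)   # toy high-resolution targets

# Discriminator step: push real submatrices toward 1 and generated ones toward 0.
opt_d.zero_grad()
fake = generator(low_res).detach()
loss_d = bce(discriminator(high_res), torch.ones(4, 1)) + bce(discriminator(fake), torch.zeros(4, 1))
loss_d.backward()
opt_d.step()

# Generator step: fool the discriminator while staying close to the target matrix.
opt_g.zero_grad()
fake = generator(low_res)
loss_g = bce(discriminator(fake), torch.ones(4, 1)) + mse(fake, high_res)
loss_g.backward()
opt_g.step()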

Prediction

After training, the enhanced scHi-C matrices can be predicted with the following command:
python test_prediction.py
Users can also use the provided pre-trained model to make predictions.
Please change the corresponding model loading path in the test_prediction.py file.

Prediction with pre-trained model

Users can use the provided pre-trained model to make predictions (a loading sketch follows the steps below):

  1. mkdir pretrained_model
  2. Please download the pre-trained model (Download pre-trained model) to the pretrained_model folder and use the correct path in the script.
  3. python test_pretrained_prediction.py
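
As a generic illustration of loading a trained checkpoint for inference (a minimal sketch: the checkpoint file name, the toy stand-in architecture, and the input shape are assumptions; test_pretrained_prediction.py builds the real generator and paths):

import torch
import torch.nn as nn

CHECKPOINT = "pretrained_model/shicedo_generator.pth"  # hypothetical file name; point this at the downloaded checkpoint

# Toy stand-in; the real generator class from the repository must be used so the weights match.
generator = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(), nn.Conv2d(8, 1, 3, padding=1))

generator.load_state_dict(torch.load(CHECKPOINT, map_location="cpu"))
generator.eval()

with torch.no_grad():
    low_res_batch = torch.rand(1, 1, 40, 40)   # toy low-resolution submatrix
    enhanced = generator(low_res_batch)        # enhanced scHi-C submatrix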

Evaluation

After prediction, users can compute the MSE and macro F1 of the low-resolution input and the prediction by running the following command:
python test_evaluation.py
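
As a rough illustration of the two metrics (a minimal sketch: the toy matrices, the binarization threshold of zero contacts, and the use of scikit-learn are assumptions, not necessarily what test_evaluation.py does):

import numpy as np
from sklearn.metrics import f1_score, mean_squared_error

rng = np.random.default_rng(0)
true = rng.poisson(1.0, size=(40, 40)).astype(float)   # toy true scHi-C matrix
pred = true + rng.normal(0.0, 0.5, size=(40, 40))      # toy predicted matrix

# MSE over all matrix entries.
mse = mean_squared_error(true.ravel(), pred.ravel())

# Macro F1 after binarizing entries into contact / no-contact classes.
macro_f1 = f1_score((true > 0).ravel().astype(int), (pred > 0).ravel().astype(int), average="macro")

print(mse, macro_f1)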

Heatmap and loss visualization

If you wish to check the heatmaps of the low-resolution input, the prediction, and the true scHi-C, please run the following command:
tensorboard --logdir=runs/heatmap
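
To log extra heatmaps of your own matrices to the same directory, a minimal sketch with torch.utils.tensorboard (the random toy matrix and the matplotlib colormap are assumptions; the log directory matches the command above):

import numpy as np
import matplotlib.pyplot as plt
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir="runs/heatmap")

matrix = np.random.rand(40, 40)    # toy matrix; replace with a real low-resolution, predicted, or true scHi-C matrix
fig, ax = plt.subplots()
ax.imshow(matrix, cmap="Reds")     # contact-intensity heatmap
ax.set_title("example heatmap")

writer.add_figure("heatmap/example", fig, global_step=0)
writer.close()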

Demo

Here we use the processed Lee et al. dataset (download from Download processed data) to demo the training, prediction, and evaluation process:
conda activate SCHICEDO
python test_train.py
python test_prediction.py
python test_evaluation.py
For heatmap and loss visualization:
tensorboard --logdir=runs/heatmap
