In this work, we introduce SHICEDO, a novel deep-learning model specifically designed for enhancing scHi-C resolution while addressing the over-smoothing issue. Built on a generative adversarial network (GAN) framework, SHICEDO's generator can process low-resolution scHi-C input of varying scales and sizes, generating an enhanced scHi-C matrix as the output. Leveraging our prior work on bulk Hi-C data, EnHiC, we have incorporated and improved its rank-one feature extraction and reconstruction techniques, along with our new feature refinement modules, into the SHICEDO framework.
- SHICEDO
- Demo
For the environment, install PyTorch according to the CUDA version of your machine.
Please check the PyTorch website for details.
In this demo, the machine has CUDA version 11.6.
To create the SCHICEDO environment, use:
conda env create -f SCHICEDO_environment.yml
To activate this environment, use:
conda activate SCHICEDO
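Once the environment is active, a quick check can confirm that PyTorch is importable and which CUDA build it targets. The following is a minimal sketch and is not part of the repository; the helper name `check_environment` is hypothetical, and it returns placeholder values instead of failing when PyTorch is missing.

```python
import importlib.util


def check_environment():
    """Report whether PyTorch is installed and which CUDA build it targets.

    Hypothetical helper for illustration; not shipped with the repository.
    """
    report = {"torch_installed": False, "torch_cuda_build": None, "gpu_available": None}
    # find_spec returns None instead of raising when the package is absent.
    if importlib.util.find_spec("torch") is None:
        return report
    import torch  # imported lazily so the function also works without PyTorch

    report["torch_installed"] = True
    # CUDA version PyTorch was built against (None for CPU-only builds).
    report["torch_cuda_build"] = torch.version.cuda
    # True only if a usable GPU and driver are visible at runtime.
    report["gpu_available"] = torch.cuda.is_available()
    return report


for key, value in check_environment().items():
    print(f"{key}: {value}")
```

If `torch_cuda_build` does not match the CUDA version of the machine, reinstalling PyTorch for the correct CUDA version is the likely fix.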
The processed data is available at the following link:
Download processed data.
Two processed datasets are available. The following example uses the processed Lee et al. dataset in the folder Lee.
The downloaded data may be split across several compressed files; after extracting them, please move all files into a single folder.
mkdir data
- Please download the processed data to the data folder and use the correct path in the script for data loading.
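Gathering the extracted files into one folder can be scripted. The sketch below is an illustration only: the function name `collect_files` and the folder names in the usage note are hypothetical, and it assumes that a flat folder of files is what the data-loading script expects.

```python
import shutil
from pathlib import Path


def collect_files(extracted_dirs, target_dir):
    """Move every file found under the extracted folders into one target folder."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    moved = []
    for folder in extracted_dirs:
        # sorted() materializes the listing before any file is moved.
        for path in sorted(Path(folder).rglob("*")):
            if not path.is_file():
                continue
            destination = target / path.name
            if destination.exists():
                # Refuse to overwrite so no part of the dataset is silently lost.
                raise FileExistsError(f"{destination} already exists")
            shutil.move(str(path), str(destination))
            moved.append(destination)
    return moved
```

For example, `collect_files(["Lee_part1", "Lee_part2"], "data/Lee")` would merge two extracted archives (hypothetical names) into `data/Lee`.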
If you wish to preprocess other datasets, please check the data preprocessing section.
If you wish to process raw data, please run the commands below.
In this example, we show how to process the Nagano et al. raw data, which is available at Download raw data.
cd data_preprocessing
mkdir -p process_data/Nagano
./data_preprocessing.sh
data_preprocessing.sh will run 6 scripts to save processed data:
- Filter the cells based on contact number
python data_filter.py
- Filter out the inter-chromosomal interactions
python filter_true_data.py
- Downsample the matrix to generate low-resolution input
python down_sampling_sciHiC.py
- Run the R script to perform BandNorm normalization
Rscript bandnorm.R
(Please follow the instructions to install BandNorm.)
- Organize the normalized results
python run_bandnorm.py
- Divide large matrices into submatrices and save them as torch tensors
python generate_input.py
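The filtering, downsampling, and submatrix steps listed above can be sketched in a few lines. This is an illustration on plain Python lists, not the repository's implementation: the function names, the contact layout `(chrom1, pos1, chrom2, pos2)`, the thresholds, the binomial thinning scheme, and the non-overlapping tiling are all assumptions, and the normalization and tensor-saving steps are omitted.

```python
import random


def filter_cells(cells, min_contacts):
    """Keep cells whose number of contacts reaches min_contacts."""
    return {name: contacts for name, contacts in cells.items()
            if len(contacts) >= min_contacts}


def keep_intra_chromosomal(contacts):
    """Drop contacts whose two ends lie on different chromosomes."""
    return [c for c in contacts if c[0] == c[2]]


def downsample(matrix, rate, seed=0):
    """Thin a symmetric integer count matrix, keeping each contact with probability rate."""
    rng = random.Random(seed)
    n = len(matrix)
    low = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):  # sample the upper triangle, then mirror it
            kept = sum(rng.random() < rate for _ in range(matrix[i][j]))
            low[i][j] = low[j][i] = kept
    return low


def split_into_submatrices(matrix, size):
    """Cut a square matrix into non-overlapping size x size tiles (ragged edges dropped)."""
    n = len(matrix)
    tiles = []
    for r in range(0, n - size + 1, size):
        for c in range(0, n - size + 1, size):
            tiles.append([row[c:c + size] for row in matrix[r:r + size]])
    return tiles
```

Fixing the seed makes the low-resolution input reproducible, which matters when comparing models trained on the same downsampled data.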
For optimal performance when training on new data, parameter fine-tuning is essential.
The model and data settings are the same as described in the paper.
After choosing suitable hyper-parameters, the model can be trained with the following command:
python test_train.py
After training, enhanced scHi-C matrices can be predicted with the following command:
python test_prediction.py
Users can also use the provided pre-trained model to make predictions.
Please change the corresponding model loading path in the test_prediction.py file.
To make predictions with the provided pre-trained model, set it up and run the prediction script as follows:
mkdir pretrained_model
- Please download the pre-trained model (Download pre-trained model) to the pretrained_model folder and use the correct path in the script.
python test_pretrained_prediction.py
After prediction, users can compute the MSE and macro F1 of both the low-resolution input and the prediction by running the following command:
python test_evaluation.py
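For reference, the two metrics can be written out in plain Python. This is a minimal sketch of the standard definitions, not the code in test_evaluation.py: the function names are hypothetical, and turning a contact matrix into class labels by thresholding (`binarize`) is an assumption about how macro F1 is applied to scHi-C matrices.

```python
def mse(a, b):
    """Mean squared error between two equally shaped matrices (lists of rows)."""
    diffs = [(x - y) ** 2 for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b)]
    return sum(diffs) / len(diffs)


def binarize(matrix, threshold=0):
    """Flatten a matrix into 0/1 labels: 1 where the entry exceeds threshold."""
    return [int(x > threshold) for row in matrix for x in row]


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for cls in classes:
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

Because macro F1 averages the classes without weighting, the rare "contact present" class counts as much as the abundant "no contact" class, which suits sparse scHi-C matrices.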
If you wish to check the heatmaps of the low-resolution, predicted, and true scHi-C matrices, please run the following command:
tensorboard --logdir=runs/heatmap
Here we use the processed Lee et al. dataset (available from Download processed data) to demo the training, prediction, and evaluation process:
conda activate SHICEDO
python test_train.py
python test_prediction.py
python test_evaluation.py
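The three demo scripts can also be chained so that a failure in one step stops the rest. The sketch below is an optional convenience, not part of the repository; `run_steps` and `DEMO_STEPS` are hypothetical names, and it assumes the scripts are run from the repository root with the conda environment already active.

```python
import subprocess
import sys


def run_steps(steps):
    """Run each command in order; stop at the first one that fails."""
    for command in steps:
        print("running:", " ".join(command))
        # check=True raises CalledProcessError on a non-zero exit code,
        # so later steps never run on the output of a failed one.
        subprocess.run(command, check=True)


# sys.executable is the interpreter of the currently active environment.
DEMO_STEPS = [
    [sys.executable, "test_train.py"],
    [sys.executable, "test_prediction.py"],
    [sys.executable, "test_evaluation.py"],
]

# From the repository root: run_steps(DEMO_STEPS)
```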
For heatmap and loss visualization:
tensorboard --logdir=runs/heatmap