ChenfuShi/hic_master_pipeline

HiC analyser for University of Manchester servers

Written for the SGE batch system: SUBMIT_script runs the pipeline and merger_script merges different samples together.

How to use:

Most parameters are set in the config.py file. The defaults should work, but you might need to reconfigure the location of the scratch folders for your account.

The software compares the folders in the reads_here directory with the folders in the TADs directory, then chooses one sample that is present in reads_here but missing from TADs and analyses it. Once it has chosen a sample, it creates a folder for it in the TADs directory so that the other parallel scripts skip that sample. The delay of 20 seconds is there to make sure two scripts don't collide with each other.

If you want to reset a run, remove the sample's folder from the TADs directory. You can also override the automatic selection by passing the folder as -i SAMPLE_PROTOCOL, and override the steps to run by adding -s STEP_1 -s STEP_2 ....
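The selection logic described above can be sketched as follows. This is an illustration only, not the pipeline's actual code: the function name `claim_next_sample` is hypothetical, and instead of the 20-second delay it relies on `os.mkdir` failing when the folder already exists, so that a sample is handed to only one job.

```python
import os
from typing import Optional


def claim_next_sample(reads_dir: str, tads_dir: str) -> Optional[str]:
    """Claim one sample that is in reads_dir but not yet in tads_dir.

    Hypothetical helper: returns the claimed sample name, or None when
    every sample already has a folder in tads_dir.
    """
    # os.path.isdir follows symbolic links, so linked sample folders count.
    samples = sorted(
        name for name in os.listdir(reads_dir)
        if os.path.isdir(os.path.join(reads_dir, name))
    )
    for name in samples:
        try:
            # Creating the folder is the claim; it fails if the folder exists.
            os.mkdir(os.path.join(tads_dir, name))
        except FileExistsError:
            continue  # already claimed or processed, try the next sample
        return name
    return None
```

With this scheme, deleting a sample's folder from the TADs directory makes that sample eligible again, which matches the reset behaviour described above.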

Processing Hi-C data

  1. Create symbolic links to the input data in the reads_here folder. Each sample needs to be within one folder, with all the reads for all the lanes in there. The folder name needs to end in one of the following: ARIMA, HIND or MBOI. Set this based on the restriction enzyme that was used for your Hi-C library. It is very important that the names of the paired-end files are identical except for the R1.fastq.gz and R2.fastq.gz endings. So your files should look like SAMPLENAME_lane_R1.fastq.gz and SAMPLENAME_lane_R2.fastq.gz.

  2. In SUBMIT_script.sh, set the number of samples you are going to process. To do so, set #$ -t 1-N, with N being the number of samples you have.

  3. Submit the script to the CSF.
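Before submitting, it can help to check the naming rules from step 1 and to count the samples for the N in step 2. The sketch below is not part of the pipeline; `check_sample_folder` and `count_samples` are hypothetical helper names, and the checks only cover the rules stated above.

```python
import os
from typing import List

VALID_SUFFIXES = ("ARIMA", "HIND", "MBOI")
R1_END = "R1.fastq.gz"
R2_END = "R2.fastq.gz"


def check_sample_folder(folder: str) -> List[str]:
    """Return a list of naming problems for one sample folder (empty if none)."""
    problems = []
    name = os.path.basename(os.path.normpath(folder))
    if not name.endswith(VALID_SUFFIXES):
        problems.append(f"{name}: folder name must end in ARIMA, HIND or MBOI")

    files = set(os.listdir(folder))
    r1_files = sorted(f for f in files if f.endswith(R1_END))
    r2_files = sorted(f for f in files if f.endswith(R2_END))
    if not r1_files and not r2_files:
        problems.append(f"{name}: no R1.fastq.gz / R2.fastq.gz files found")

    # Every R1 file needs an R2 file with an otherwise identical name.
    for r1 in r1_files:
        mate = r1[: -len(R1_END)] + R2_END
        if mate not in files:
            problems.append(f"{name}: {r1} has no matching {mate}")
    # And every R2 file needs its R1 counterpart.
    for r2 in r2_files:
        mate = r2[: -len(R2_END)] + R1_END
        if mate not in files:
            problems.append(f"{name}: {r2} has no matching {mate}")
    return problems


def count_samples(reads_dir: str) -> int:
    """Number of sample folders in reads_dir, i.e. N for '#$ -t 1-N'."""
    return sum(
        1 for name in os.listdir(reads_dir)
        if os.path.isdir(os.path.join(reads_dir, name))
    )
```

Running `check_sample_folder` on each folder in reads_here and fixing anything it reports before submission avoids wasting an array task on a sample with unpaired reads.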

Merging of different samples for combined datasets

  1. All the files need to be individually processed first.

  2. After that, modify merger_script.sh so that it calls the Python script in the following way: python hic_merger.py -i sample_1_ARIMA -i sample_2_ARIMA -i sample_N_ARIMA -o merged_ARIMA

  3. Submit the script to the CSF.
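When many samples are merged, the command in step 2 gets long. A small sketch for generating it from a list of sample names is shown below; `merger_command` is a hypothetical helper, and only the -i and -o flags shown in step 2 are used.

```python
import shlex
from typing import List


def merger_command(samples: List[str], output: str) -> List[str]:
    """Build the hic_merger.py call with one -i flag per input sample."""
    command = ["python", "hic_merger.py"]
    for sample in samples:
        command += ["-i", sample]
    command += ["-o", output]
    return command


if __name__ == "__main__":
    cmd = merger_command(["sample_1_ARIMA", "sample_2_ARIMA"], "merged_ARIMA")
    # Print a shell-safe line that can be pasted into merger_script.sh.
    print(" ".join(shlex.quote(part) for part in cmd))
```

The printed line can be pasted into merger_script.sh in place of the existing call.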

QC outputs

The pipeline generates QC files. Read-level QC from fastp will be located in the fastp_reports folder. General sequencing metrics will be located here.

Hi-C specific QC metrics will be located in the hic-pro_outputs folder. The suggested way to visualize these QC metrics is to combine the data using MultiQC. For more information about these metrics, see: http://nservant.github.io/HiC-Pro/RESULTS.html

Software versions

hic-pro: 3.0.0
fastp: 0.20.1
bedtools: 2.30.0
samtools: 1.9
bowtie2: 2.4.2
ontad: v1.2
java: 1.8.0
juicertools: 1.22.01
hic2cool: 0.8.3
