HiC analyser for University of Manchester servers

The pipeline runs on the SGE batch system. Use SUBMIT_script.sh to run the analysis and merger_script.sh to merge different samples together.

How to use:

There are many parameters in the config.py file. The defaults should work, but you might need to change the location of the scratch folders for your account.

The software lists the folders in the reads_here directory and compares them with the folders that exist in the TADs directory. It then chooses one of the folders that is present in reads_here but missing from TADs and analyses it. Once it has chosen a sample, it creates a folder for it in the TADs directory so that the other parallel scripts skip that sample. The delay of 20 seconds is there to make sure two scripts don't collide with each other.

To reset a run, remove the sample's folder from the TADs directory. You can also override the automatic selection by passing the folder as -i SAMPLE_PROTOCOL, and override which steps are run by adding -s STEP_1 -s STEP_2 ....
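The selection logic described above can be sketched as follows. This is a minimal illustration, not the code in master_hic_processor.py: the function name claim_next_sample is hypothetical, and the sketch relies on os.mkdir failing when the folder already exists to avoid collisions, rather than on the 20-second delay the pipeline uses.

```python
import os
import tempfile


def claim_next_sample(reads_dir, tads_dir):
    """Claim one sample that is in reads_dir but not yet in tads_dir.

    Returns the sample folder name, or None if nothing is left to process.
    (Hypothetical helper, written only to illustrate the idea.)
    """
    samples = sorted(
        d for d in os.listdir(reads_dir)
        if os.path.isdir(os.path.join(reads_dir, d))  # follows symbolic links
    )
    for name in samples:
        try:
            # mkdir raises FileExistsError if another worker already
            # created the folder, so each sample is claimed only once.
            os.mkdir(os.path.join(tads_dir, name))
        except FileExistsError:
            continue
        return name
    return None


# Small demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as root:
    reads = os.path.join(root, "reads_here")
    tads = os.path.join(root, "TADs")
    os.makedirs(os.path.join(reads, "sampleA_ARIMA"))
    os.makedirs(os.path.join(reads, "sampleB_HIND"))
    os.makedirs(os.path.join(tads, "sampleA_ARIMA"))  # already processed

    first = claim_next_sample(reads, tads)
    second = claim_next_sample(reads, tads)
    print(first, second)
```

In this sketch, deleting a folder from TADs makes the corresponding sample eligible again, which mirrors the reset procedure described above.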

Processing Hi-C data

In the input data folder (reads_here), create symbolic links to your data. Each sample needs to be in its own folder, with all the reads for all the lanes in there. The folder name needs to end in one of the following: ARIMA, HIND or MBOI. Choose the suffix based on the restriction enzyme that was used for your Hi-C library. It is very important that the two files of each pair have the same name except for the R1.fastq.gz and R2.fastq.gz endings, so your files should look like SAMPLENAME_lane_R1.fastq.gz and SAMPLENAME_lane_R2.fastq.gz.
In SUBMIT_script.sh, set the number of samples you are going to process. To do so, set #$ -t 1-N, with N being the number of samples you have.
Submit the script to the CSF.
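Before submitting, it can help to check the naming rules above automatically. The following is a rough sketch of such a check, assuming the layout described in this section; check_sample is a hypothetical helper and is not part of the pipeline.

```python
import os
import tempfile

ENZYME_SUFFIXES = ("ARIMA", "HIND", "MBOI")


def check_sample(folder):
    """Return a list of naming problems found in one sample folder."""
    problems = []
    name = os.path.basename(os.path.normpath(folder))
    if not name.endswith(ENZYME_SUFFIXES):
        problems.append(f"{name}: folder name must end in ARIMA, HIND or MBOI")
    files = set(os.listdir(folder))
    for fname in sorted(files):
        # Work out the expected name of the other file of the pair.
        if fname.endswith("R1.fastq.gz"):
            mate = fname[: -len("R1.fastq.gz")] + "R2.fastq.gz"
        elif fname.endswith("R2.fastq.gz"):
            mate = fname[: -len("R2.fastq.gz")] + "R1.fastq.gz"
        else:
            continue
        if mate not in files:
            problems.append(f"{name}: {fname} has no mate {mate}")
    return problems


# Demonstration with one valid and one invalid sample folder.
with tempfile.TemporaryDirectory() as reads_here:
    good = os.path.join(reads_here, "sampleA_ARIMA")
    bad = os.path.join(reads_here, "sampleB_DPNII")
    os.makedirs(good)
    os.makedirs(bad)
    for fname in ("sampleA_L001_R1.fastq.gz", "sampleA_L001_R2.fastq.gz"):
        open(os.path.join(good, fname), "w").close()
    open(os.path.join(bad, "sampleB_L001_R1.fastq.gz"), "w").close()

    good_problems = check_sample(good)
    bad_problems = check_sample(bad)

    # N for the array job is the number of samples that passed the check.
    n_valid = sum(
        1 for d in os.listdir(reads_here)
        if not check_sample(os.path.join(reads_here, d))
    )
    array_line = f"#$ -t 1-{n_valid}"
    print(bad_problems)
    print(array_line)
```

The last line printed has the form of the directive that needs to be set in SUBMIT_script.sh.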

Merging of different samples for combined datasets

All the files need to be individually processed first.
After that, modify merger_script.sh so that it calls the Python script in the following way: python hic_merger.py -i sample_1_ARIMA -i sample_2_ARIMA -i sample_N_ARIMA -o merged_ARIMA
Submit the script to the CSF.
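To illustrate how a repeated -i flag can collect several samples into one list, here is a small argparse sketch. It is an assumption about how such a command line could be parsed, not the actual contents of hic_merger.py; the attribute names inputs and output are made up for the example.

```python
import argparse


def build_parser():
    """Build a parser that mimics the merger command line shown above."""
    parser = argparse.ArgumentParser(description="Illustrative merger CLI sketch")
    # action="append" gathers every occurrence of -i into a single list.
    parser.add_argument("-i", dest="inputs", action="append", required=True,
                        help="processed sample folder; repeat once per sample")
    parser.add_argument("-o", dest="output", required=True,
                        help="name of the merged dataset")
    return parser


args = build_parser().parse_args(
    ["-i", "sample_1_ARIMA", "-i", "sample_2_ARIMA", "-o", "merged_ARIMA"]
)
print(args.inputs, args.output)
```

Each -i adds one entry to the list, so the number of samples that can be merged in one call is not fixed.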

QC outputs

The pipeline generates QC files. Read-level QC from fastp is located in the fastp_reports folder, which holds the general sequencing metrics.

Hi-C specific QC metrics are located in the hic-pro_outputs folder. The suggested way to visualise these QC metrics is to combine the data using MultiQC. For more information about these metrics, see: http://nservant.github.io/HiC-Pro/RESULTS.html
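As a rough sketch of the MultiQC step, the command can be assembled in Python as below. The helper multiqc_command and the output folder name multiqc_report are hypothetical; the sketch assumes MultiQC is installed and that it is run from the directory containing the two QC folders.

```python
def multiqc_command(input_dirs, outdir):
    """Return the argument list for a MultiQC run over the given folders."""
    # MultiQC scans each listed directory; --outdir sets the report location.
    return ["multiqc", *input_dirs, "--outdir", outdir]


cmd = multiqc_command(["fastp_reports", "hic-pro_outputs"], "multiqc_report")
print(" ".join(cmd))
```

The resulting list can be passed to subprocess.run, or the printed line can be pasted into a shell.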

Software versions

hic-pro: 3.0.0
fastp: 0.20.1
bedtools: 2.30.0
samtools: 1.9
bowtie2: 2.4.2
ontad: v1.2
java: 1.8.0
juicertools: 1.22.01
hic2cool: 0.8.3

ChenfuShi/hic_master_pipeline
