Running Hi-C pro yields no results #400

Closed
yamzaleg opened this issue Jan 27, 2021 · 20 comments

@yamzaleg

Hello,

I'm trying to perform a HiChIP analysis, which combines principles of ChIP and Hi-C to determine interactions between genomic loci. I'm running HiC-Pro on published data to make sure I know how to perform the analysis before proceeding to my own. I've placed the fastq files in one directory (each sample has its own subdirectory within the main fastq directory), and I've filled out config-hicpro.txt. When I run the command, my output directory gets a symbolic link to my raw data (called rawdata) and a new configuration file (with the same name). Nothing else seems to be running and I'm stuck. Can you help me out?

The command I'm running:
/home1/amzaleg/new/amzaleg/hichip_tools/HiC-Pro-2.11.4/bin/HiC-Pro -i ~/new/amzaleg/hichip/raw_data -o ~/new/amzaleg/hichip/run/new_output -c ~/new/amzaleg/hichip/run/config-hicpro.txt -s mapping -s proc_hic

I'll also attach my configuration file
config-hicpro.txt

@nservant
Owner

Hi,
Everything looks good. Could you show me the content of your raw_data folder please ?
N

@yamzaleg
Author

Yes, of course.

amzaleg@discovery2:~/new/amzaleg/hichip$ tree raw_data
raw_data
├── GM_rep1
│   ├── GM_HiChIP_H3K27Ac_rep1_1.fastq.gz
│   └── GM_HiChIP_H3K27Ac_rep1_2.fastq.gz
├── GM_rep2
│   ├── GM_HiChIP_H3K27Ac_rep2_1.fastq.gz
│   └── GM_HiChIP_H3K27Ac_rep2_2.fastq.gz
├── MyLa_rep1
│   ├── MyLa_HiChIP_H3K27Ac_rep1_1.fastq.gz
│   └── MyLa_HiChIP_H3K27Ac_rep1_2.fastq.gz
└── MyLa_rep2
    ├── MyLa_HiChIP_H3K27Ac_rep2_1.fastq.gz
    └── MyLa_HiChIP_H3K27Ac_rep2_2.fastq.gz

@nservant
Owner

OK. So update the config file with:

PAIR1_EXT=_1
PAIR2_EXT=_2

These options let HiC-Pro detect the R1 and R2 files using a single regexp.
Best
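
A quick way to see whether a given pair of suffixes would pick up both mates in a sample folder is sketched below. This is only an illustration of suffix matching, not HiC-Pro's own detection code; the helper names `count_mates` and `check_pairs` are hypothetical, and the sketch assumes file names end in `<suffix>.fastq` or `<suffix>.fastq.gz`.

```bash
# Hypothetical helpers (not part of HiC-Pro): count the files in one sample
# directory whose names end in <ext>.fastq or <ext>.fastq.gz.
count_mates() {
    local sample_dir="$1"
    local ext="$2"
    local n=0
    local f
    for f in "$sample_dir"/*"$ext".fastq "$sample_dir"/*"$ext".fastq.gz; do
        # An unmatched glob stays literal, so confirm the file really exists.
        if [ -e "$f" ]; then n=$((n + 1)); fi
    done
    echo "$n"
}

# Succeeds only if both suffixes match the same non-zero number of files.
check_pairs() {
    local sample_dir="$1"
    local ext1="$2"
    local ext2="$3"
    local n1 n2
    n1=$(count_mates "$sample_dir" "$ext1")
    n2=$(count_mates "$sample_dir" "$ext2")
    echo "$sample_dir: $n1 file(s) for mate 1, $n2 file(s) for mate 2"
    [ "$n1" -gt 0 ] && [ "$n1" -eq "$n2" ]
}
```

For the layout shown above, `check_pairs raw_data/GM_rep1 _1 _2` should report one file per mate, whereas a suffix pair that does not occur in the file names (for example `_R1`/`_R2`) would report zero and fail.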

@yamzaleg
Author

Thank you so much.
Unfortunately, I am now getting another error:
Run HiC-Pro 2.11.4

Wed Jan 27 00:44:36 PST 2021
Bowtie2 alignment step1 ...
Logs: logs/GM_rep1/mapping_step1.log
[main_samview] fail to read the header from "-".
[main_samview] fail to read the header from "-".
Exit: Error in reads alignment - Exit
make: *** [bowtie_global] Error 1

@nservant
Owner

Could you check whether there is an error message in logs/GM_rep1/mapping_step1.log, please?

@nservant
Owner

You did not provide the bowtie2 index in your config!
Please set BOWTIE2_IDX_PATH to the path of the bowtie2 indexes.
As you put hg19 as the reference genome, the indexes must be named with the hg19 prefix. Otherwise, update the reference genome.

@yamzaleg
Author

Thank you so much for your prompt responses!
I noticed my error and adjusted the config file to include the path to the bowtie2 indices. For some reason I keep getting an error that the directory with my indices is not found.

I'll send you my updated config file as well as what the directory with my hg19 bowtie2 indices looks like.
config-hicpro_1.txt
Screen Shot 2021-01-27 at 1 57 59 AM

@nservant
Owner

I know that's a bit unclear, but bowtie2 indexes are actually detected by concatenating BOWTIE2_IDX_PATH and REFERENCE_GENOME ...
So in your case, please use:

BOWTIE2_IDX_PATH = ~/new/amzaleg/chipseq/hg19/hg19_bt2/
REFERENCE_GENOME = hg19_bt2
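
The concatenation described above can be checked before launching a run. The sketch below is an assumption-laden illustration, not HiC-Pro code: the helper names are hypothetical, and it only tests for the first index file that `bowtie2-build` writes (`<prefix>.1.bt2`, or `<prefix>.1.bt2l` for large indexes).

```bash
# Hypothetical helpers (not part of HiC-Pro).
# Build the index prefix as <BOWTIE2_IDX_PATH>/<REFERENCE_GENOME>.
bowtie2_index_prefix() {
    local idx_path="$1"
    local genome="$2"
    # Drop one trailing slash so the result has no doubled separator.
    printf '%s/%s\n' "${idx_path%/}" "$genome"
}

# Succeeds if the first bowtie2 index file exists under that prefix.
check_bowtie2_index() {
    local prefix
    prefix=$(bowtie2_index_prefix "$1" "$2")
    if [ -e "${prefix}.1.bt2" ] || [ -e "${prefix}.1.bt2l" ]; then
        echo "OK: found index files for prefix ${prefix}"
    else
        echo "Missing: no ${prefix}.1.bt2 or ${prefix}.1.bt2l" >&2
        return 1
    fi
}
```

With the values above, `check_bowtie2_index ~/new/amzaleg/chipseq/hg19/hg19_bt2/ hg19_bt2` looks for `hg19_bt2.1.bt2` inside the `hg19_bt2` directory, so the second argument has to match the file-name prefix of the index files rather than the genome build name.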

@yamzaleg
Author

Thank you so much! It seems to be running now. I appreciate all your help.

@yamzaleg
Author

Hello!
I was able to run the code. My university uses Slurm, and based on the paper I requested 10 GB of memory per processor core, split into 10 tasks, for 20 hours. Unfortunately, the job used all 20 hours without completing the first fastq file. Should I ask for more memory? Am I going about this correctly?
Below is my Slurm script:

#!/bin/bash
#SBATCH --ntasks=10
#SBATCH --mem-per-cpu=10GB
#SBATCH --time=20:00:00

/home1/amzaleg/new/amzaleg/hichip_tools/HiC-Pro-2.11.4/bin/HiC-Pro -i ~/new/amzaleg/hichip/raw_data -o ~/new/amzaleg/hichip/run/new_output -c ~/new/amzaleg/hichip/run/config-hicpro.txt -s mapping -s proc_hic

@nservant
Owner

Hi,
If you want to speed up the processing, you should:

  • Split the fastq files into chunks (see bin/utils/split_reads.py)
  • Put all chunks in the same output folder
  • Run the analysis in parallel mode

In this case, all chunks will be processed in parallel and then merged before building the contact maps.
The number of cores that you set in the config is not extremely useful, as it only affects the bowtie2 mapping.
Most of the other analysis steps are not multi-threaded.

Best
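
To make the chunking step concrete, here is a minimal sketch of what splitting a FASTQ file into fixed-size chunks involves. It uses plain `awk` as a stand-in and is not the `split_reads.py` script itself; the function name `split_fastq`, its argument order, and the two-digit chunk prefix are assumptions for illustration. It expects an uncompressed file with four lines per read.

```bash
# Hypothetical stand-in for a FASTQ splitter (not bin/utils/split_reads.py).
# Usage: split_fastq <file.fastq> <reads_per_chunk> <output_dir>
# Writes <output_dir>/00_<file.fastq>, <output_dir>/01_<file.fastq>, ...
split_fastq() {
    local fastq="$1"
    local nreads="$2"
    local outdir="$3"
    local base
    base=$(basename "$fastq")
    mkdir -p "$outdir"
    # Each read occupies 4 lines, so a chunk holds nreads * 4 lines.
    awk -v lines="$((nreads * 4))" -v out="$outdir" -v base="$base" '
        (NR - 1) % lines == 0 {
            # Start a new chunk: close the previous file and pick the next name.
            if (file) close(file)
            file = sprintf("%s/%02d_%s", out, chunk++, base)
        }
        { print > file }
    ' "$fastq"
}
```

Both mates of a sample need to be split with the same chunk size so that chunk N of mate 1 and chunk N of mate 2 contain the same reads in the same order. Compressed inputs would have to be decompressed first.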

@yamzaleg
Author

yamzaleg commented Feb 8, 2021

Just a quick clarification: I was able to split every fastq file into chunks of 10 million reads. When you say "put all the chunks in the same output folder": I put the split files for both mates of each condition in a separate directory, and then ran the parallel command with the input pointing to the directory that holds all the split files.

When I ran the parallel command, both .sh files ran via SLURM. I'm just concerned that I'm doing something wrong because the split files look like this:

├── GM_rep1
│   ├── 00_GM_HiChIP_H3K27Ac_rep1_1.fastq
│   ├── 00_GM_HiChIP_H3K27Ac_rep1_2.fastq
│   ├── 01_GM_HiChIP_H3K27Ac_rep1_1.fastq
│   ├── 01_GM_HiChIP_H3K27Ac_rep1_2.fastq
│   ├── 02_GM_HiChIP_H3K27Ac_rep1_1.fastq
│   ├── 02_GM_HiChIP_H3K27Ac_rep1_2.fastq
│   ├── 03_GM_HiChIP_H3K27Ac_rep1_1.fastq
│   ├── 03_GM_HiChIP_H3K27Ac_rep1_2.fastq
│   ├── 04_GM_HiChIP_H3K27Ac_rep1_1.fastq
│   ├── 04_GM_HiChIP_H3K27Ac_rep1_2.fastq
│   ├── 05_GM_HiChIP_H3K27Ac_rep1_1.fastq
│   ├── 05_GM_HiChIP_H3K27Ac_rep1_2.fastq
│   ├── 06_GM_HiChIP_H3K27Ac_rep1_1.fastq
│   ├── 06_GM_HiChIP_H3K27Ac_rep1_2.fastq
│   ├── 07_GM_HiChIP_H3K27Ac_rep1_1.fastq
│   ├── 07_GM_HiChIP_H3K27Ac_rep1_2.fastq
│   ├── 08_GM_HiChIP_H3K27Ac_rep1_1.fastq
│   ├── 08_GM_HiChIP_H3K27Ac_rep1_2.fastq
│   ├── 09_GM_HiChIP_H3K27Ac_rep1_1.fastq
│   ├── 09_GM_HiChIP_H3K27Ac_rep1_2.fastq
│   ├── 10_GM_HiChIP_H3K27Ac_rep1_1.fastq
│   ├── 10_GM_HiChIP_H3K27Ac_rep1_2.fastq
│   ├── 11_GM_HiChIP_H3K27Ac_rep1_1.fastq
│   ├── 11_GM_HiChIP_H3K27Ac_rep1_2.fastq
│   ├── 12_GM_HiChIP_H3K27Ac_rep1_1.fastq
│   ├── 12_GM_HiChIP_H3K27Ac_rep1_2.fastq
│   ├── 13_GM_HiChIP_H3K27Ac_rep1_1.fastq
│   ├── 13_GM_HiChIP_H3K27Ac_rep1_2.fastq
│   ├── 14_GM_HiChIP_H3K27Ac_rep1_1.fastq
│   ├── 14_GM_HiChIP_H3K27Ac_rep1_2.fastq
│   ├── 15_GM_HiChIP_H3K27Ac_rep1_1.fastq
│   ├── 15_GM_HiChIP_H3K27Ac_rep1_2.fastq
│   ├── 16_GM_HiChIP_H3K27Ac_rep1_1.fastq
│   ├── 16_GM_HiChIP_H3K27Ac_rep1_2.fastq
│   ├── 17_GM_HiChIP_H3K27Ac_rep1_1.fastq
│   ├── 17_GM_HiChIP_H3K27Ac_rep1_2.fastq
│   ├── 18_GM_HiChIP_H3K27Ac_rep1_1.fastq
│   ├── 18_GM_HiChIP_H3K27Ac_rep1_2.fastq
│   ├── 19_GM_HiChIP_H3K27Ac_rep1_1.fastq
│   ├── 19_GM_HiChIP_H3K27Ac_rep1_2.fastq
│   ├── 20_GM_HiChIP_H3K27Ac_rep1_1.fastq
│   ├── 20_GM_HiChIP_H3K27Ac_rep1_2.fastq
│   ├── 21_GM_HiChIP_H3K27Ac_rep1_1.fastq
│   ├── 21_GM_HiChIP_H3K27Ac_rep1_2.fastq
│   ├── 22_GM_HiChIP_H3K27Ac_rep1_1.fastq
│   ├── 22_GM_HiChIP_H3K27Ac_rep1_2.fastq
│   ├── 23_GM_HiChIP_H3K27Ac_rep1_1.fastq
│   ├── 23_GM_HiChIP_H3K27Ac_rep1_2.fastq
│   ├── 24_GM_HiChIP_H3K27Ac_rep1_1.fastq
│   ├── 24_GM_HiChIP_H3K27Ac_rep1_2.fastq
│   ├── 25_GM_HiChIP_H3K27Ac_rep1_1.fastq
│   ├── 25_GM_HiChIP_H3K27Ac_rep1_2.fastq
│   ├── 26_GM_HiChIP_H3K27Ac_rep1_1.fastq
│   ├── 26_GM_HiChIP_H3K27Ac_rep1_2.fastq
│   ├── 27_GM_HiChIP_H3K27Ac_rep1_1.fastq
│   ├── 27_GM_HiChIP_H3K27Ac_rep1_2.fastq
│   ├── 28_GM_HiChIP_H3K27Ac_rep1_1.fastq
│   ├── 28_GM_HiChIP_H3K27Ac_rep1_2.fastq
│   ├── 29_GM_HiChIP_H3K27Ac_rep1_1.fastq
│   ├── 29_GM_HiChIP_H3K27Ac_rep1_2.fastq
│   ├── 30_GM_HiChIP_H3K27Ac_rep1_1.fastq
│   ├── 30_GM_HiChIP_H3K27Ac_rep1_2.fastq
│   ├── 31_GM_HiChIP_H3K27Ac_rep1_1.fastq
│   ├── 31_GM_HiChIP_H3K27Ac_rep1_2.fastq
│   ├── 32_GM_HiChIP_H3K27Ac_rep1_1.fastq
│   ├── 32_GM_HiChIP_H3K27Ac_rep1_2.fastq
│   ├── 33_GM_HiChIP_H3K27Ac_rep1_1.fastq
│   └── 33_GM_HiChIP_H3K27Ac_rep1_2.fastq

After I ran the part 1 .sh file, I got this result:

├── 00_GM_HiChIP_H3K27Ac_rep1_1_hg19_bt2.bwt2glob.bam
├── 00_GM_HiChIP_H3K27Ac_rep1_1_hg19_bt2.bwt2glob.unmap.fastq
├── 00_GM_HiChIP_H3K27Ac_rep1_2_hg19_bt2.bwt2glob.bam
├── 00_GM_HiChIP_H3K27Ac_rep1_2_hg19_bt2.bwt2glob.unmap.fastq
├── 01_GM_HiChIP_H3K27Ac_rep1_1_hg19_bt2.bwt2glob.bam
├── 01_GM_HiChIP_H3K27Ac_rep1_1_hg19_bt2.bwt2glob.unmap.fastq
├── 01_GM_HiChIP_H3K27Ac_rep1_2_hg19_bt2.bwt2glob.bam
├── 01_GM_HiChIP_H3K27Ac_rep1_2_hg19_bt2.bwt2glob.unmap.fastq
├── 02_GM_HiChIP_H3K27Ac_rep1_1_hg19_bt2.bwt2glob.bam
├── 02_GM_HiChIP_H3K27Ac_rep1_1_hg19_bt2.bwt2glob.unmap.fastq
├── 02_GM_HiChIP_H3K27Ac_rep1_2_hg19_bt2.bwt2glob.bam
├── 02_GM_HiChIP_H3K27Ac_rep1_2_hg19_bt2.bwt2glob.unmap.fastq
├── 03_GM_HiChIP_H3K27Ac_rep1_1_hg19_bt2.bwt2glob.bam
├── 03_GM_HiChIP_H3K27Ac_rep1_1_hg19_bt2.bwt2glob.unmap.fastq
├── 03_GM_HiChIP_H3K27Ac_rep1_2_hg19_bt2.bwt2glob.bam
├── 03_GM_HiChIP_H3K27Ac_rep1_2_hg19_bt2.bwt2glob.unmap.fastq
├── 04_GM_HiChIP_H3K27Ac_rep1_1_hg19_bt2.bwt2glob.bam
├── 04_GM_HiChIP_H3K27Ac_rep1_1_hg19_bt2.bwt2glob.unmap.fastq
├── 04_GM_HiChIP_H3K27Ac_rep1_2_hg19_bt2.bwt2glob.bam
├── 04_GM_HiChIP_H3K27Ac_rep1_2_hg19_bt2.bwt2glob.unmap.fastq
├── 05_GM_HiChIP_H3K27Ac_rep1_1_hg19_bt2.bwt2glob.bam
├── 05_GM_HiChIP_H3K27Ac_rep1_1_hg19_bt2.bwt2glob.unmap.fastq
├── 05_GM_HiChIP_H3K27Ac_rep1_2_hg19_bt2.bwt2glob.bam
├── 05_GM_HiChIP_H3K27Ac_rep1_2_hg19_bt2.bwt2glob.unmap.fastq
├── 06_GM_HiChIP_H3K27Ac_rep1_1_hg19_bt2.bwt2glob.bam
├── 06_GM_HiChIP_H3K27Ac_rep1_1_hg19_bt2.bwt2glob.unmap.fastq
├── 06_GM_HiChIP_H3K27Ac_rep1_2_hg19_bt2.bwt2glob.bam
├── 06_GM_HiChIP_H3K27Ac_rep1_2_hg19_bt2.bwt2glob.unmap.fastq
├── 07_GM_HiChIP_H3K27Ac_rep1_1_hg19_bt2.bwt2glob.bam
├── 07_GM_HiChIP_H3K27Ac_rep1_1_hg19_bt2.bwt2glob.unmap.fastq
├── 07_GM_HiChIP_H3K27Ac_rep1_2_hg19_bt2.bwt2glob.bam
├── 07_GM_HiChIP_H3K27Ac_rep1_2_hg19_bt2.bwt2glob.unmap.fastq
├── 08_GM_HiChIP_H3K27Ac_rep1_1_hg19_bt2.bwt2glob.bam
├── 08_GM_HiChIP_H3K27Ac_rep1_1_hg19_bt2.bwt2glob.unmap.fastq
├── 08_GM_HiChIP_H3K27Ac_rep1_2_hg19_bt2.bwt2glob.bam
├── 08_GM_HiChIP_H3K27Ac_rep1_2_hg19_bt2.bwt2glob.unmap.fastq
├── 09_GM_HiChIP_H3K27Ac_rep1_1_hg19_bt2.bwt2glob.bam
├── 09_GM_HiChIP_H3K27Ac_rep1_1_hg19_bt2.bwt2glob.unmap.fastq
├── 09_GM_HiChIP_H3K27Ac_rep1_2_hg19_bt2.bwt2glob.bam
├── 09_GM_HiChIP_H3K27Ac_rep1_2_hg19_bt2.bwt2glob.unmap.fastq
├── 10_GM_HiChIP_H3K27Ac_rep1_1_hg19_bt2.bwt2glob.bam
├── 10_GM_HiChIP_H3K27Ac_rep1_1_hg19_bt2.bwt2glob.unmap.fastq
├── 10_GM_HiChIP_H3K27Ac_rep1_2_hg19_bt2.bwt2glob.bam
├── 10_GM_HiChIP_H3K27Ac_rep1_2_hg19_bt2.bwt2glob.unmap.fastq
├── 11_GM_HiChIP_H3K27Ac_rep1_1_hg19_bt2.bwt2glob.bam
├── 11_GM_HiChIP_H3K27Ac_rep1_1_hg19_bt2.bwt2glob.unmap.fastq
├── 11_GM_HiChIP_H3K27Ac_rep1_2_hg19_bt2.bwt2glob.bam
├── 11_GM_HiChIP_H3K27Ac_rep1_2_hg19_bt2.bwt2glob.unmap.fastq
├── 12_GM_HiChIP_H3K27Ac_rep1_1_hg19_bt2.bwt2glob.bam
├── 12_GM_HiChIP_H3K27Ac_rep1_1_hg19_bt2.bwt2glob.unmap.fastq
├── 12_GM_HiChIP_H3K27Ac_rep1_2_hg19_bt2.bwt2glob.bam
├── 12_GM_HiChIP_H3K27Ac_rep1_2_hg19_bt2.bwt2glob.unmap.fastq
├── 13_GM_HiChIP_H3K27Ac_rep1_1_hg19_bt2.bwt2glob.bam
├── 13_GM_HiChIP_H3K27Ac_rep1_1_hg19_bt2.bwt2glob.unmap.fastq
├── 13_GM_HiChIP_H3K27Ac_rep1_2_hg19_bt2.bwt2glob.bam
├── 13_GM_HiChIP_H3K27Ac_rep1_2_hg19_bt2.bwt2glob.unmap.fastq
├── 14_GM_HiChIP_H3K27Ac_rep1_1_hg19_bt2.bwt2glob.bam
├── 14_GM_HiChIP_H3K27Ac_rep1_1_hg19_bt2.bwt2glob.unmap.fastq
├── 14_GM_HiChIP_H3K27Ac_rep1_2_hg19_bt2.bwt2glob.bam
├── 14_GM_HiChIP_H3K27Ac_rep1_2_hg19_bt2.bwt2glob.unmap.fastq
├── 15_GM_HiChIP_H3K27Ac_rep1_1_hg19_bt2.bwt2glob.bam
├── 15_GM_HiChIP_H3K27Ac_rep1_1_hg19_bt2.bwt2glob.unmap.fastq
├── 15_GM_HiChIP_H3K27Ac_rep1_2_hg19_bt2.bwt2glob.bam
├── 15_GM_HiChIP_H3K27Ac_rep1_2_hg19_bt2.bwt2glob.unmap.fastq
├── 16_GM_HiChIP_H3K27Ac_rep1_1_hg19_bt2.bwt2glob.bam
├── 16_GM_HiChIP_H3K27Ac_rep1_1_hg19_bt2.bwt2glob.unmap.fastq
├── 16_GM_HiChIP_H3K27Ac_rep1_2_hg19_bt2.bwt2glob.bam
├── 16_GM_HiChIP_H3K27Ac_rep1_2_hg19_bt2.bwt2glob.unmap.fastq
├── 17_GM_HiChIP_H3K27Ac_rep1_1_hg19_bt2.bwt2glob.bam
├── 17_GM_HiChIP_H3K27Ac_rep1_1_hg19_bt2.bwt2glob.unmap.fastq
├── 17_GM_HiChIP_H3K27Ac_rep1_2_hg19_bt2.bwt2glob.bam
├── 17_GM_HiChIP_H3K27Ac_rep1_2_hg19_bt2.bwt2glob.unmap.fastq
├── 18_GM_HiChIP_H3K27Ac_rep1_1_hg19_bt2.bwt2glob.bam
├── 18_GM_HiChIP_H3K27Ac_rep1_1_hg19_bt2.bwt2glob.unmap.fastq
├── 18_GM_HiChIP_H3K27Ac_rep1_2_hg19_bt2.bwt2glob.bam
├── 18_GM_HiChIP_H3K27Ac_rep1_2_hg19_bt2.bwt2glob.unmap.fastq
├── 19_GM_HiChIP_H3K27Ac_rep1_1_hg19_bt2.bwt2glob.bam
├── 19_GM_HiChIP_H3K27Ac_rep1_1_hg19_bt2.bwt2glob.unmap.fastq
├── 19_GM_HiChIP_H3K27Ac_rep1_2_hg19_bt2.bwt2glob.bam
├── 19_GM_HiChIP_H3K27Ac_rep1_2_hg19_bt2.bwt2glob.unmap.fastq
├── 20_GM_HiChIP_H3K27Ac_rep1_1_hg19_bt2.bwt2glob.bam
├── 20_GM_HiChIP_H3K27Ac_rep1_1_hg19_bt2.bwt2glob.unmap.fastq
├── 20_GM_HiChIP_H3K27Ac_rep1_2_hg19_bt2.bwt2glob.bam
├── 20_GM_HiChIP_H3K27Ac_rep1_2_hg19_bt2.bwt2glob.unmap.fastq
├── 21_GM_HiChIP_H3K27Ac_rep1_1_hg19_bt2.bwt2glob.bam
├── 21_GM_HiChIP_H3K27Ac_rep1_1_hg19_bt2.bwt2glob.unmap.fastq
├── 21_GM_HiChIP_H3K27Ac_rep1_2_hg19_bt2.bwt2glob.bam
├── 21_GM_HiChIP_H3K27Ac_rep1_2_hg19_bt2.bwt2glob.unmap.fastq
├── 22_GM_HiChIP_H3K27Ac_rep1_1_hg19_bt2.bwt2glob.bam
├── 22_GM_HiChIP_H3K27Ac_rep1_1_hg19_bt2.bwt2glob.unmap.fastq
├── 22_GM_HiChIP_H3K27Ac_rep1_2_hg19_bt2.bwt2glob.bam
├── 22_GM_HiChIP_H3K27Ac_rep1_2_hg19_bt2.bwt2glob.unmap.fastq
├── 23_GM_HiChIP_H3K27Ac_rep1_1_hg19_bt2.bwt2glob.bam
├── 23_GM_HiChIP_H3K27Ac_rep1_1_hg19_bt2.bwt2glob.unmap.fastq
├── 23_GM_HiChIP_H3K27Ac_rep1_2_hg19_bt2.bwt2glob.bam
├── 23_GM_HiChIP_H3K27Ac_rep1_2_hg19_bt2.bwt2glob.unmap.fastq
├── 24_GM_HiChIP_H3K27Ac_rep1_1_hg19_bt2.bwt2glob.bam
├── 24_GM_HiChIP_H3K27Ac_rep1_1_hg19_bt2.bwt2glob.unmap.fastq
├── 24_GM_HiChIP_H3K27Ac_rep1_2_hg19_bt2.bwt2glob.bam
├── 24_GM_HiChIP_H3K27Ac_rep1_2_hg19_bt2.bwt2glob.unmap.fastq
├── 25_GM_HiChIP_H3K27Ac_rep1_1_hg19_bt2.bwt2glob.bam
├── 25_GM_HiChIP_H3K27Ac_rep1_1_hg19_bt2.bwt2glob.unmap.fastq
├── 25_GM_HiChIP_H3K27Ac_rep1_2_hg19_bt2.bwt2glob.bam
├── 25_GM_HiChIP_H3K27Ac_rep1_2_hg19_bt2.bwt2glob.unmap.fastq
├── 26_GM_HiChIP_H3K27Ac_rep1_1_hg19_bt2.bwt2glob.bam
├── 26_GM_HiChIP_H3K27Ac_rep1_1_hg19_bt2.bwt2glob.unmap.fastq
├── 26_GM_HiChIP_H3K27Ac_rep1_2_hg19_bt2.bwt2glob.bam
├── 26_GM_HiChIP_H3K27Ac_rep1_2_hg19_bt2.bwt2glob.unmap.fastq
├── 27_GM_HiChIP_H3K27Ac_rep1_1_hg19_bt2.bwt2glob.bam
├── 27_GM_HiChIP_H3K27Ac_rep1_1_hg19_bt2.bwt2glob.unmap.fastq
├── 27_GM_HiChIP_H3K27Ac_rep1_2_hg19_bt2.bwt2glob.bam
├── 27_GM_HiChIP_H3K27Ac_rep1_2_hg19_bt2.bwt2glob.unmap.fastq
├── 28_GM_HiChIP_H3K27Ac_rep1_1_hg19_bt2.bwt2glob.bam
├── 28_GM_HiChIP_H3K27Ac_rep1_1_hg19_bt2.bwt2glob.unmap.fastq
├── 28_GM_HiChIP_H3K27Ac_rep1_2_hg19_bt2.bwt2glob.bam
├── 28_GM_HiChIP_H3K27Ac_rep1_2_hg19_bt2.bwt2glob.unmap.fastq
├── 29_GM_HiChIP_H3K27Ac_rep1_1_hg19_bt2.bwt2glob.bam
├── 29_GM_HiChIP_H3K27Ac_rep1_1_hg19_bt2.bwt2glob.unmap.fastq
├── 29_GM_HiChIP_H3K27Ac_rep1_2_hg19_bt2.bwt2glob.bam
├── 29_GM_HiChIP_H3K27Ac_rep1_2_hg19_bt2.bwt2glob.unmap.fastq
├── 30_GM_HiChIP_H3K27Ac_rep1_1_hg19_bt2.bwt2glob.bam
├── 30_GM_HiChIP_H3K27Ac_rep1_1_hg19_bt2.bwt2glob.unmap.fastq
├── 30_GM_HiChIP_H3K27Ac_rep1_2_hg19_bt2.bwt2glob.bam
├── 30_GM_HiChIP_H3K27Ac_rep1_2_hg19_bt2.bwt2glob.unmap.fastq
├── 31_GM_HiChIP_H3K27Ac_rep1_1_hg19_bt2.bwt2glob.bam
├── 31_GM_HiChIP_H3K27Ac_rep1_1_hg19_bt2.bwt2glob.unmap.fastq
├── 31_GM_HiChIP_H3K27Ac_rep1_2_hg19_bt2.bwt2glob.bam
├── 31_GM_HiChIP_H3K27Ac_rep1_2_hg19_bt2.bwt2glob.unmap.fastq
├── 32_GM_HiChIP_H3K27Ac_rep1_1_hg19_bt2.bwt2glob.bam
├── 32_GM_HiChIP_H3K27Ac_rep1_1_hg19_bt2.bwt2glob.unmap.fastq
├── 32_GM_HiChIP_H3K27Ac_rep1_2_hg19_bt2.bwt2glob.bam
├── 32_GM_HiChIP_H3K27Ac_rep1_2_hg19_bt2.bwt2glob.unmap.fastq
├── 33_GM_HiChIP_H3K27Ac_rep1_1_hg19_bt2.bwt2glob.bam
├── 33_GM_HiChIP_H3K27Ac_rep1_1_hg19_bt2.bwt2glob.unmap.fastq
├── 33_GM_HiChIP_H3K27Ac_rep1_2_hg19_bt2.bwt2glob.bam
└── 33_GM_HiChIP_H3K27Ac_rep1_2_hg19_bt2.bwt2glob.unmap.fastq

with some of those files being empty. Is this what you had in mind?

@nservant
Owner

nservant commented Feb 8, 2021

Hi
The input data looks OK, but indeed, you shouldn't have empty files in the end.
Any errors in the logs folder?
Do you have a bwt2 folder?
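
Both checks can be scripted. The following is a minimal sketch using standard `find` and `grep`; the function names are hypothetical and the directory arguments depend on where the output and logs folders sit in a given run.

```bash
# Hypothetical helpers for inspecting a HiC-Pro output tree.

# List zero-byte files under a directory.
list_empty_files() {
    # -size 0 matches only files that contain no data at all.
    find "$1" -type f -size 0 | sort
}

# List log files that mention "error" (case-insensitive).
logs_with_errors() {
    # -r recurse into subfolders, -i ignore case, -l print file names only.
    grep -ril 'error' "$1" | sort
}
```

Running `list_empty_files` on the mapping output folder and `logs_with_errors` on the logs folder narrows the search to the chunks that actually failed, which is easier than opening each per-replicate log by hand.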

@yamzaleg
Author

yamzaleg commented Feb 8, 2021

Hi,
I do have a logs folder, but I don't see any errors (there are many files in there for each replicate directory). Should I look for one in particular?
I also see the bwt2 folder and it looks like there are no empty files there, but for some reason one of my replicates for one of the samples doesn't have all the split files.

In MyLa_rep2 there should be split files from 0-21, but only 10-21 are in the bwt2 folder.
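
A comparison like this can be automated. The sketch below is hypothetical and assumes that each input chunk `<name>.fastq` should produce at least one output file named `<name>_<something>.bam`, as in the listing earlier in this thread; other output naming schemes would need a different pattern.

```bash
# Hypothetical helper: print input chunks that have no matching BAM.
# Usage: missing_chunks <dir_with_chunk_fastqs> <dir_with_bams>
missing_chunks() {
    local in_dir="$1"
    local out_dir="$2"
    local fq stem
    for fq in "$in_dir"/*.fastq; do
        # Skip the literal pattern when the directory holds no .fastq files.
        [ -e "$fq" ] || continue
        stem=$(basename "$fq" .fastq)
        # ls fails when the glob matches nothing, which marks a missing chunk.
        if ! ls "$out_dir/${stem}"_*.bam >/dev/null 2>&1; then
            echo "$stem"
        fi
    done
}
```

The printed names are the chunks worth looking up in the logs folder first.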

@yamzaleg
Author

yamzaleg commented Feb 8, 2021

So I checked the logs for that replicate, looking at one of the missing chunks, and this was the error in the file:

##HiC-Pro mapping
Error reading block of _offs[] array: 8188, 716196308Error Reading File!
Error: Encountered internal Bowtie 2 exception (#1)
Command: /home1/amzaleg/new/amzaleg/bin/bowtie2-align-s --wrapper basic-0 --very-sensitive -L 30 --score-min L,-0.6,-0.2 --end-to-end --reorder --rg-id BMG --rg SM:00_MyLa_HiChIP_H3K27Ac_rep2_2 -p 5 -x /home1/amzaleg/new/amzaleg/chipseq/hg19/hg19_bt2//hg19_bt2 --passthrough -U rawdata/MyLa_rep2/00_MyLa_HiChIP_H3K27Ac_rep2_2.fastq
(ERR): bowtie2-align exited with value 1

@nservant
Owner

nservant commented Feb 8, 2021

Never seen that before! It seems to be an internal bowtie2 error ...

@yamzaleg
Author

yamzaleg commented Feb 8, 2021

That's weird as it didn't happen to any of my other samples (or even all the chunks for that replicate)! I'll do some investigating. Thank you for your help!

@nservant
Owner

nservant commented Feb 8, 2021

http://seqanswers.com/forums/showthread.php?t=5318

"These types of errors occur when the files are genuinely either corrupt or incomplete (e.g. if the disk becomes exhausted during the index-building process). Can you send detailed output from one example where this happens, including a 'ls -l' on the index files after bowtie-build completes?"

@nservant
Owner

nservant commented Feb 8, 2021

Maybe you can check whether the files that crashed are complete (reads/qualities), or try to realign one of them manually to see if you can reproduce the error.
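
One way to check completeness is sketched below. It is a minimal illustration that assumes plain four-line FASTQ records (header, sequence, `+` line, qualities); the function name `check_fastq` is hypothetical. It stops at the first problem it finds and returns a non-zero status.

```bash
# Hypothetical helper: basic structural check of an uncompressed FASTQ file.
# Returns 0 if every record has 4 lines and matching sequence/quality lengths.
check_fastq() {
    awk '
        NR % 4 == 1 && substr($0, 1, 1) != "@" {
            printf "record at line %d: header does not start with @\n", NR
            bad = 1; exit
        }
        NR % 4 == 2 { seqlen = length($0) }
        NR % 4 == 0 && length($0) != seqlen {
            printf "record ending at line %d: %d bases but %d qualities\n", NR, seqlen, length($0)
            bad = 1; exit
        }
        END {
            if (!bad && NR == 0) {
                print "file is empty"
                bad = 1
            }
            if (!bad && NR % 4 != 0) {
                printf "file ends mid-record (%d lines, not a multiple of 4)\n", NR
                bad = 1
            }
            exit (bad ? 1 : 0)
        }
    ' "$1"
}
```

For example, `check_fastq rawdata/MyLa_rep2/00_MyLa_HiChIP_H3K27Ac_rep2_2.fastq` would test the chunk named in the failing command above. If the chunk passes, rerunning that logged bowtie2 command by hand is the next step for reproducing the error.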

@ProfH2SO4

I know that's a bit unclear but actually bowtie2 indexes are detected by a concatenation of BOWTIE2_IDX_PATH and REFERENCE_GENOME ... So in your case, please use:

BOWTIE2_IDX_PATH = ~/new/amzaleg/chipseq/hg19/hg19_bt2/
REFERENCE_GENOME = hg19_bt2

It works! Thank you very much :)
