Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

methylpy add-methylation-level issue #53

Closed
coralzhang opened this issue Apr 7, 2020 · 15 comments
Closed

methylpy add-methylation-level issue #53

coralzhang opened this issue Apr 7, 2020 · 15 comments

Comments

@coralzhang
Copy link

coralzhang commented Apr 7, 2020

I have a similar issue to #48
But my output tsv file has only the bed file
This is the test bed file
chromosome start end
2 1 40001
2 40001 80001
2 80001 120001
2 120001 160001
This is the output I get
chromosome start end methylation_level_ACA methylation_level_ACB methylation_level_pCMT3-RNAiA methylation_level_pCMT3-RNAiB
2 1 40001
2 40001 80001
2 80001 120001
2 120001 160001

I checked that my files are using proper tab as delimiter and the allc files are not empty....
2 1647 - CAT 2 20 1
2 1649 + CGT 10 12 1
2 1650 - CGT 16 21 1
2 1653 + CCT 0 12 0

Here is the code
methylpy add-methylation-level
--input-tsv-file methylpy_wgenome.tsv
--output-file $outfile
--allc-files allc_ACA.tsv.gz allc_ACB.tsv.gz allc_pCMT3-RNAiA.tsv.gz allc_pCMT3-RNAiB.tsv.gz
--samples ACA ACB pCMT3-RNAiA pCMT3-RNAiB
--mc-type CHH
--num-procs 10

======
update- this is interesting. In the end I figured that it is because the allc files are not sorted. Shouldn't this be sorted automatically? This is my pair end pipeline code, did I miss anything???

methylpy paired-end-pipeline
--read1-files ./_1.fq.gz --read2-files ./_2.fq.gz --sample ${folderall[$indi]}
--merge-by-max-mapq True
--binom-test True
--unmethylated-control chloroplast
--min-cov 3
--forward-ref $fref --reverse-ref $rref --ref-fasta $reff
--num-procs 20 --sort-mem 35000000000
--path-to-output $directout
--remove-clonal True
--path-to-picard="/home/softwares/picard/"
--aligner-options "-p 8"
--trim-read False
1>$stdoutfile 2>$errfile

@yupenghe
Copy link
Owner

yupenghe commented Apr 9, 2020

It is weird. Allc file should be sorted. I can look into this if you can share some files to reproduce the issue. Also, it would be helpful to check whether the bam file created is sorted.

@coralzhang
Copy link
Author

coralzhang commented Apr 10, 2020

Thanks for the response.
Looks like the "*processed_reads_no_clonal.bam " file is properly sorted.
What kind of data can I provide you?

@yupenghe
Copy link
Owner

It will be great if you can provide the processed_reads_no_clonal.bam file that was used to generate the unsorted allc files.

@coralzhang
Copy link
Author

How about this?

subSample.bam.zip

@yupenghe
Copy link
Owner

Thanks. Were you able to produce unsorted allc file from this bam file? Can you also point me to the genome (FASTA)?

@coralzhang
Copy link
Author

coralzhang commented Apr 22, 2020

Were you able to produce unsorted allc file from this bam file?
Yes. I simply used all the example code on the webpage.
The genome is downloaded from here
ftp://ftp.ensemblgenomes.org/pub/plants/release-46/fasta/solanum_lycopersicum/dna/Solanum_lycopersicum.SL3.0.dna.toplevel.fa.gz

@yupenghe
Copy link
Owner

I don't think so. The allc file I got from the bam file is sorted. Here is the command I used

methylpy call-methylation-state --input-file subSample.bam --sample test --ref-fasta Solanum_lycopersicum.SL3.0.dna.toplevel.fa --paired-end True

@coralzhang
Copy link
Author

That is good to know. Here is what I used. Is there anything wrong that I did not realize with this?
I looped each sample for paried end pipeline.

methylpy paired-end-pipeline
--read1-files ./_1.fq.gz --read2-files ./_2.fq.gz --sample ${folder[$indi]}
--merge-by-max-mapq True
--binom-test True
--unmethylated-control chloroplast
--min-cov 3
--forward-ref $fref --reverse-ref $rref --ref-fasta $reff
--num-procs 20 --sort-mem 35000000000
--path-to-output $directout
--remove-clonal True
--path-to-picard="/softwares/picard/"
--aligner-options "-p 8"
--trim-read False
1>$stdoutfile 2>$errfile

And used this function add-methylation-level

methylpy add-methylation-level
--input-tsv-file methylpy_wgenome.tsv
--output-file $outfile
--allc-files allc_ACA.tsv.gz allc_ACB.tsv.gz
--samples ACA-st ACB-st
--mc-type $mctype
--num-procs 4
1>$stdoutfile1 2>$errfile1

Then I do can call DMR on the allc file properly.

methylpy DMRfind
--allc-files allc_ACA.tsv.gz allc_ACB.tsv.gz allc_pCMT3-RNAiA.tsv.gz allc_pCMT3-RNAiB.tsv.gz
--samples ACA ACB pCMT3-RNAiA pCMT3-RNAiB
--mc-type $mctype
--sample-category wild wild RNAi RNAi
--min-cluster 2
--chroms 1 2 3 4 5 6 7 8 9 10 11 12
--num-procs 100
--dmr-max-dist 1000
--sig-cutoff 0.05
--output-prefix $prefixname1
1>$stdoutfile1 2>$errfile1

@yupenghe
Copy link
Owner

The commands look good to me. Did you get unsorted allc files and end up with some error in the DMR finding step?

@coralzhang
Copy link
Author

Right. I got unsorted allc files. I could do DMR finding. But I got errors when I try to add methylation levels using the allc files. In the end, I just unzipped the allc file, sorted them, and then feed the files back to add methylation level, it worked. Anyway, as long as I did not get any errors or warnings with the DMR finding. That should be fine, correct?

@yupenghe
Copy link
Owner

That is weird. I would recommend rerunning DMR finding on the sorted allc files to make sure things are working fine.

@coralzhang
Copy link
Author

I have a follow-up question- what does it mean if my output has NA?

@yupenghe
Copy link
Owner

It means that there is no reads to estimate methylation level in that region in that sample.

@coralzhang
Copy link
Author

coralzhang commented Jan 8, 2021

Hi
I went back and checked the output of this and it says this

samtools sort: couldn't allocate memory for bam_mem
[bam_sort_core] merging from 79 files and 1 in-memory blocks...
INFO 2020-09-21 20:19:00 MarkDuplicates

I guess my bam and allc file not sorted has to do with my settings of the number of processors and the sort memory being nonproportional to each other?
I am attaching the whole output here. Should I consider rerun?
https://www.dropbox.com/s/2vbkmiexmqqdzzc/ACA.log?dl=0

@yupenghe
Copy link
Owner

yupenghe commented Jan 9, 2021

Ah, that is good to know. You may want to drop --sort-mem 35000000000 option or replace it with something like --sort-mem 1G.

You don't have to rerun from scratch. Manually doing samtools sort to get sorted bam file and then regenerating allc files using methylpy call-methylation-state will do.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants