Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BSmooth error #104

Open
priyatamapandey opened this issue Sep 29, 2021 · 24 comments
Open

BSmooth error #104

priyatamapandey opened this issue Sep 29, 2021 · 24 comments

Comments

@priyatamapandey
Copy link

priyatamapandey commented Sep 29, 2021

Hi,
I am trying to use BSmooth function on the 16 WGBS samples and it showed the following error.

Screen Shot 2021-09-29 at 2 23 40 PM

Please suggest me what I need to correct?
I used the following code for reading my 16 coverage files and then performing the BSmooth.

bismarkBSseq <- read.bismark(files = file.list,
                               colData = DataFrame(row.names = sampleName),
                                BPPARAM = bpparam,
                               rmZeroCov = FALSE,
                               strandCollapse = FALSE,
                               verbose = TRUE)

pData(bismarkBSseq) <- sample_covariate

bismarkBSseq.fit <- BSmooth(
    BSseq = bismarkBSseq, 
    BPPARAM = MulticoreParam(workers = 8), 
    verbose = TRUE)

Thank you so much for your help.
Priya

@PeteHaitch
Copy link
Contributor

Please share the output of BiocManager::valid()

@kasperdanielhansen
Copy link
Contributor

kasperdanielhansen commented Sep 30, 2021 via email

@priyatamapandey
Copy link
Author

Hi Pete and Hansen,
Thanks so much for your reply. Output of BiocManager::valid() showed TRUE.
I am working on human WGBS samples. I ran for chr22 only and it worked with warning (attached).

Screen Shot 2021-09-29 at 7 18 42 PM

Please suggest.
Thank you,
Priya

@kasperdanielhansen
Copy link
Contributor

kasperdanielhansen commented Sep 30, 2021 via email

@PeteHaitch
Copy link
Contributor

PeteHaitch commented Sep 30, 2021

Also, please show the full output of BiocManager::valid(), which includes useful information like which version of R and Bioconductor you are using.

Sorry, it only returns that info that if it returns FALSE.
Instead, please share the output of sessionInfo() and BiocManager::version()

@priyatamapandey
Copy link
Author

Hi Kasper,

  1. Screen Shot 2021-09-30 at 9 36 50 AM
  2. I am new in this project and I don't know why I have 65MCpGs? According to the literature, human WGBS data usually have 28millions CpGs so your question is why I have more CpGs or less?
    Thank you,
    Priya

@priyatamapandey
Copy link
Author

Hi Pete,
When I ran again it returns the session info.

Screen Shot 2021-09-30 at 9 27 48 AM

Screen Shot 2021-09-30 at 9 25 33 AM

Thank you,
Priya

@PeteHaitch
Copy link
Contributor

Thanks, @priyatamapandey. Everything looks up to date, which makes debugging easier :)

Regarding the 65M CpGs, are your CpGs stranded or have they been aggregated across strands?
The easiest way to determine this is to look at rowData(BSseq) and check:

  1. the strand column and
  2. if there are adjacent CpGs, e.g., chr1:4 and chr1:5 would likely refer to the same CpG but the first one on the + strand and the second on the - strand.

If you share the output of running rowData() on your object we might be able to help debug.

We typically aggregate across strands before smoothing CpG methylation, and bsseq::combineList() should be able to help with this (if needed).

Unfortunately, we've also experienced sporadic failures when smoothing large BSseq objects that can lead to these obscure error message, particularly when using parallelisation on a shared computing environment (e.g., a university cluster).
One option is to smooth each chromosome separately and then rbind() the results.
Smoothing is independent across chromosomes (and samples), so you might manually split up the object into smaller subsets and smooth and then rbind() and/or cbind() the results back together (I'd probably advise against subsetting within a chromosome because that might give slightly different results to smoothing the entire chromosome).
It's a bit clunky but should work and allows you to re-smooth particular subsets if you experience one of these sporadic failures

@priyatamapandey
Copy link
Author

Hi Pete,

Thank you for all the suggestions. So, do you think I should not use multicore option even for the chromosome wise smoothing. It have split up the object with using 3 cores/parallel and it failed with the following error.

Screen Shot 2021-10-01 at 11 37 49 AM

Then, I have tried chromosome2 without using multiple core.
Here is my command.

bismarkBSseqOrdered2 <- orderBSseq(bismarkBSseq, seqOrder = "chr2" )

 bismarkBSseq.chr2 <- BSmooth(
    BSseq = bismarkBSseqOrdered2, 
    BPPARAM = SerialParam(progressbar = TRUE),
    verbose = TRUE)

It took few hours to complete and at the end it retuned error. Although for chr1 the above code worked after taking long hours. I am running on my iMAC for now which has 32gb memory and 8core. I do have 100 samples to process once I get successful, then I would probably use institute cluster to perform.

Here is the error for chr2 smoothing.

Screen Shot 2021-10-01 at 11 04 05 AM

I have also ran rowData() on the enitre BSseq object and on chr1 and chr2 BSseq object. Since it did not return any column information. I am attaching the BSseq object jist here. It might help to understand the stand information.

Screen Shot 2021-10-01 at 11 12 04 AM

This is the large BSseq object

Screen Shot 2021-10-01 at 11 14 40 AM

Thank you so much for your help,
Priya

@kasperdanielhansen
Copy link
Contributor

kasperdanielhansen commented Oct 1, 2021 via email

@priyatamapandey
Copy link
Author

Hi Kasper,
Ok. I will change that. I found that example in read.bismark R documentation so thought to start with.
Also, I checked my coverage file and found that I do have many more chromosome than Chr1 to chr22 and X and Y such as chr19_KI270883v1_alt, chr19_KI270890v1_alt, chr19_KI270919v1_alt, chrUn_KI270334v1, chrUn_KI270373v1, chrUn_KI270376v1.
I think that is causing many addition CpGs. I want to consider only chr1 to chr22 and chrX and chrY while reading files in the read.bismark but do not understand how to set that condition? Is there Loci parameter contain that option in read.bismark function?

Thank you,
Priya

@kasperdanielhansen
Copy link
Contributor

kasperdanielhansen commented Oct 2, 2021 via email

@PeteHaitch
Copy link
Contributor

I want to consider only chr1 to chr22 and chrX and chrY while reading files in the read.bismark but do not understand how to set that condition? Is there Loci parameter contain that option in read.bismark function?

There's a section 'Specifying loci' in the documentation for read.bismark() (see help("read.bismark", package = "bsseq")), so you can do it that way or by subsetting your BSseq object to only the chromosomes you're interested in.

@priyatamapandey
Copy link
Author

Happy New Year Dr. Hansen and Pete,

I have run bsmooth on the default setting and also want to customize the setting to see, whether different settings are improving our results but most of the setting getting failed at some point. I have divided my data chromosome wise and some of the setting is getting fail for chromosome 1, some for chromosome 2 etc.

Below is the setting list which I have already tried

Screen Shot 2022-01-05 at 3 43 01 PM

Here is the error returned

Screen Shot 2022-01-06 at 12 51 14 PM

Please suggest how to fix this issue. Any idea why it is throwing error?

Thank you so much for your help,
Priya

@PeteHaitch
Copy link
Contributor

It's very hard to know and I can only refer you to our previous comments.
If the data are (strand-aggregated) CpGs from human samples then I think the default smoothing parameters are a sensible choice.

@priyatamapandey
Copy link
Author

Thank you for your suggestion,

I am working on the DMRs now and want to understand the cutoff options. I don't understand how to choose cutoff vector? I calculated t-statistics and generated the following figure. I don't see much difference in corrected and uncorrected in figure.

Screen Shot 2022-01-24 at 8 26 47 PM

Screen Shot 2022-01-24 at 8 27 02 PM

Screen Shot 2022-01-24 at 8 32 09 PM

Please suggest me how to select cutoff while checking the t-statistics?
Thank you,
Priya

@kir1to455
Copy link

Hi,Priya
I also meet same problems about the parameters (ns=70,h=1000,maxGap=10^8) when I using Bsmooth . I found some chromosomes succeed but some chromosomes failed.
image
If you have solved this problems, can you give some advice?
Thank you in advance,
Kirito

@priyatamapandey
Copy link
Author

HI Kirito,
At default setting, BSmooth worked. I wanted to try BSmooth with different settings as my data is not cancer data but it did not work for the another setting mostly failed at chromosome 1. I would suggest if you are running on the cluster/HPC, try to provide more memory and keep minimum workers.

Best,
Priya

@PeteHaitch
Copy link
Contributor

We've used the default settings for many different human (and possibly mouse) sample types, not just cancer data.
I think the recommendation would be not to change the default settings unless you have good reason.

@kir1to455
Copy link

We've used the default settings for many different human (and possibly mouse) sample types, not just cancer data. I think the recommendation would be not to change the default settings unless you have good reason.

Hi Pete, I also meet some problems about the parameters (ns=70,h=1000,maxGap=10^8) when I using Bsmooth . I found some chromosomes succeed but some chromosomes failed. For example , chr 1 failed, chrX succeed ,chr21 succeed.
chr1
image
image
chrX
image
chr21
image
Also,the results with warning messages.
Would you give some advice?
Best wishes,
Kirito

@PeteHaitch
Copy link
Contributor

It's very hard without a reproducible example and further details.
What species is your sample?
What methylation context (e.g., CpGs)?
What computing resources are you running on?

@kir1to455
Copy link

It's very hard without a reproducible example and further details. What species is your sample? What methylation context (e.g., CpGs)? What computing resources are you running on?

Species is human, and context is GpC.I running it in my PC, and with 3.6GHz 6core processor.

@PeteHaitch
Copy link
Contributor

Is it really GpC (e.g., a NOMe-Seq-like protocol) and not CpG methylation (e.g., a standard bisulfite-seq-like protocol)?

bsseq was initially designed for analysing CpG methylation data from bisulfite-sequencing protocols.
This is what the default parameters to BSmooth() are aimed at (in particular, for human+mouse-like genomes).
More recently, we have used bsseq to analyse CpH (e.g., CpA) methylation data from bisulfite-sequencing protocols in human samples, but with different smoothing parameters; importantly, getting satisfactory smoothing results was a lot of trial and error.

As far as I'm aware, neither @kasperdanielhansen or I have used it to analyse GpC methylation from NOMe-Seq-like data.
And I very much doubt the default parameters would work out of the box since the distribution of GpCs along the genome is very different to the distribution of CpGs, as illustrated below:

> suppressPackageStartupMessages(library(bsseq))
> suppressPackageStartupMessages(library(BSgenome.Hsapiens.UCSC.hg38))
> # Identify all GpC and CpG dinucleotides on forward strand of chr1 in GRCh38
> gpc_chr1 <- findLoci("GC", BSgenome.Hsapiens.UCSC.hg38, "chr1", strand = "+")
> cpg_chr1 <- findLoci("CG", BSgenome.Hsapiens.UCSC.hg38, "chr1", strand = "+")
> # How many are there?
> length(gpc_chr1)
[1] 10145272
> length(cpg_chr1)
[1] 2375159
> # What is the distance between adjacent loci?
> par(mfrow = c(1, 2))
> plot(density(log10(start(gpc_chr1)[-1] - start(gpc_chr1)[-length(gpc_chr1)])), "GpC")
> plot(density(log10(start(cpg_chr1)[-1] - start(cpg_chr1)[-length(cpg_chr1)])), "CpG")

image

As bsseq is not designed for analysing GpC data and we have not tried this ourselves, we cannot really offer more substantial support.
bsseq may be suitable, but my guess it's going to take some substantial research effort to identify appropriate parameters.

@kasperdanielhansen
Copy link
Contributor

kasperdanielhansen commented Feb 10, 2022 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants