Error Estimation Issues #1990
Comments
Could you share the reverse file that is throwing this error from
Thank you for getting back to me so quickly! Unfortunately, there doesn't seem to be a specific reverse file that throws the error. When I subset the dataset by filename into 10 subcategories and run each one through the code, it always reaches selfConsist step 1, which I took to mean it's working, since it never reaches that step when I run it with all the data. However, I can let the code finish running for the subcategories to see whether it crashes somewhere midway through, if you think that would be helpful.
My goal is to have a small dataset that reproduces the error you are seeing. That helps me identify where the error is arising, since I'm not familiar with the error you are reporting. So any single sample that throws this error would be great.
I apologize for the long wait! I've been trying to identify specific files that might be causing the error so I could send you a sample, but I couldn't pinpoint any problematic files. The issue seems to arise only when I process the entire dataset. In my manual debugging, I tried leave-one-out testing with the files. Interestingly, excluding any single file made the process run smoothly: for our dataset of 881 samples, running the process with any random 880 files resolved the issue. This suggests the problem is related to the number of files rather than the content of any specific file. For now, I've decided to proceed with 880 files. The error learning gave me these plots (using option 4 from this GitHub thread about binned quality scores), and they look good to me. However, I would love to hear your input. Given this fix, can you think of anything that might have been causing the issue in error learning? Thank you for all your help!
The error model plots look good to me.
No idea at all. That is baffling that removing one (random) sample would fix the
Hello, I am also having a very similar issue with the learnErrors step on the reverse reads. We are analyzing a small dataset (24 samples) of the 16S rRNA gene V4 region. Our data are NovaSeq, and the facility delivered them in very good quality, with at least 40,000 reads per sample. I am using dada2 version 1.32.0 and R 4.4.1. I am attaching the R Markdown file and a screenshot of the error.
Hello,
We are having issues with the error estimation function in DADA2, specifically on our reverse reads. Whenever we run the code to learn errors (i.e. errR <- learnErrors(filtRs, multithread=TRUE, verbose = TRUE)), we get the following error message for the reverse reads:
Error rates could not be estimated (this is usually because of very few reads).
Error in getErrors(err, enforce = TRUE) : Error matrix is NULL.
Calls: learnErrors -> dada -> getErrors
Execution halted
In response, we plotted the number of reads for each file and found that they all should have more than enough reads to run (the smallest number of reads was in the thousands). We're working with binned quality scores, which is not ideal but didn't pose a problem for the forward reads as they plotted without issue.
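A read-count check like the one described above could be sketched in base R as follows. This is only an illustration, not the code the poster ran: `count_fastq_reads` is a hypothetical helper, and it assumes standard four-line FASTQ records (no wrapped sequence lines). `filtRs` is the vector of filtered reverse-read paths from the `learnErrors` call.

```r
# Hypothetical helper (not part of dada2): count reads in one FASTQ file.
# Assumes every record occupies exactly four lines.
count_fastq_reads <- function(path) {
  con <- gzfile(path, open = "r")  # gzfile() also reads uncompressed files
  on.exit(close(con))
  n_lines <- 0L
  repeat {
    # Read in chunks so large files are not loaded into memory at once.
    chunk <- readLines(con, n = 100000L)
    if (length(chunk) == 0L) break
    n_lines <- n_lines + length(chunk)
  }
  n_lines %/% 4L
}

# Example usage (commented out; requires the real file paths):
# read_counts <- vapply(filtRs, count_fastq_reads, integer(1))
# summary(read_counts)
```

Reading in chunks keeps memory use flat, at the cost of being slower than a dedicated FASTQ parser.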
My initial thought was that a handful of files might be causing this issue, so I ran the code on small subsets to narrow down where the problem is, but the code ran perfectly fine in that case. More specifically, I grouped the files by the last digit of the ID in their filenames and ran the code for all files ending in 0, then in 1, then 2, and so on up to 9, and saw no issues. However, when I flipped this method and left out one group at a time (for example, running everything except the files ending in 0), I got the strange error again.
For reference, here is an example of the forward and reverse files we're using in this code (converted to csv and truncated to only 5000 lines):
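The grouping and leave-one-group-out procedure described above could be sketched roughly like this. The helper names `group_by_last_digit` and `leave_one_group_out` are hypothetical, and the sketch assumes filenames of the form `<ID>_<suffix>`, with the sample ID before the first underscore. The fitting step is passed in as a function so that failures are recorded instead of halting the run.

```r
# Hypothetical helper: split file paths into groups keyed by the last
# character of the sample ID (e.g. "SRR24062132_R_filt.fastq.gz" -> "2").
group_by_last_digit <- function(paths) {
  ids <- sub("_.*$", "", basename(paths))    # text before the first "_"
  last_digit <- substring(ids, nchar(ids))   # final character of each ID
  split(paths, last_digit)
}

# Hypothetical helper: call `fit` on each leave-one-group-out subset and
# return "ok" or the error message, named by the group that was left out.
leave_one_group_out <- function(paths, fit) {
  groups <- group_by_last_digit(paths)
  vapply(names(groups), function(g) {
    kept <- setdiff(paths, groups[[g]])
    tryCatch({ fit(kept); "ok" },
             error = function(e) conditionMessage(e))
  }, character(1))
}

# Example usage (commented out; requires dada2 and the real file paths):
# library(dada2)
# status <- leave_one_group_out(
#   filtRs, function(f) learnErrors(f, multithread = TRUE))
```

Capturing the error message per subset makes it easier to compare which combinations of files fail without rerunning each case by hand.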
SRR24062132_forward.csv
SRR24062132_reverse.csv
We're a bit stumped as to where we should go from here, since the files look similar enough from our perspective and the code we use is identical to the tutorial (except for a change in the truncLen parameter). We haven't been able to narrow down where the issue might be, so any help would be appreciated! Thank you!