RuntimeError: CUDA error: an illegal memory access was encountered #209

I'm encountering the following problem with one of my runs. A different run finished successfully. The error reproducibly occurs for this run after 90605 reads, so there seems to be a specific issue related to the data. Any ideas?

Comments
same here!
Hi - I'm also getting this error:
This happens reproducibly while processing the run, suggesting that it is a specific read or file. Is there any way to log this better to work out the issue?
@mattloose @Flower9618 @thackl Would it be possible to share the data and command needed to reproduce this issue? As the error message suggests, could you also run with CUDA_LAUNCH_BLOCKING=1?
Hello! Sure - the command was:
I can also run with the suggested flag. I can share the data but will do that through a different channel! Cheers Matt
Yes please do - Mike dot Vella at Nanoporetech dot com
Just an update - running with CUDA_LAUNCH_BLOCKING=1 doesn't appear to help much: Traceback (most recent call last):
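As background, CUDA kernels are launched asynchronously, so the line that finally raises "illegal memory access" is often just the first operation that synchronises with the GPU (a .cpu() copy, for example) rather than the kernel that actually faulted. Below is a minimal sketch of the two usual ways to get a more truthful traceback; the locate_fault helper is a hypothetical name for illustration, not part of Bonito or koi.

```python
import os

# Must be set before CUDA is initialised (i.e. before the first CUDA call),
# which is why setting the environment variable outside the script is the
# more reliable route.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch


def locate_fault(fn, *args, **kwargs):
    """Run a GPU function and synchronise immediately afterwards, so any
    pending asynchronous CUDA error is raised here rather than at a later,
    unrelated call. Hypothetical debugging helper."""
    out = fn(*args, **kwargs)
    torch.cuda.synchronize()
    return out
```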
One comment is that this is running on an HPC in a shared GPU environment, although it should have a single card assigned to it.
Does look like a read issue though - the error is happening at exactly the same point each time: trial 1:
trial 2:
Okay, this looked like it was a scaling issue and I think 6e91a9d should sort it.
@mattloose let us know if the scaling change has resolved your issue; in the meantime I'm investigating a lower-level fix (which should have prevented the illegal memory access in the first place).
Can confirm that the scaling change has resolved this issue...
Fixed in v0.5.1.
I spoke too soon. It is still crashing with the same error message, but now on a later file:
Traceback (most recent call last):
I'm considering re-batching the files and running on smaller subsets.
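A minimal sketch of that re-batching idea, in case it is useful to others hitting this: basecall progressively smaller subsets of the fast5 files until the offending file is isolated. The model name, paths, and the crashes/bisect helpers are illustrative assumptions rather than commands taken from this thread, and the bisection assumes a single, deterministically failing file.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def crashes(fast5_files, model="dna_r9.4.1_e8_sup@v3.3"):
    """Basecall a subset of files in a temporary directory and report whether
    bonito exits with an error. The model name is a placeholder."""
    with tempfile.TemporaryDirectory() as tmp:
        for f in fast5_files:
            shutil.copy(f, tmp)
        result = subprocess.run(
            ["bonito", "basecaller", model, tmp],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        return result.returncode != 0


def bisect(fast5_files):
    """Halve the failing set repeatedly until one file remains.
    Assumes exactly one file reproducibly triggers the crash."""
    files = list(fast5_files)
    while len(files) > 1:
        first_half = files[: len(files) // 2]
        files = first_half if crashes(first_half) else files[len(files) // 2:]
    return files[0]


# Example usage (hypothetical path):
# bad_file = bisect(sorted(Path("reads/").glob("*.fast5")))
```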
This could be a different error - could you send me the files and instructions to reproduce? (let's pick this up over email).
Yep - will do.
I had missed the same scaling fix on the short-read scaling path - resolved in 3187198.
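For readers wondering what a "scaling issue" can mean in this context: Bonito normalises the raw signal before basecalling, and a degenerate chunk (for example a very short read with near-constant signal) can produce a zero or non-finite scale, which then pushes NaN or inf values into the GPU decoder. The snippet below is only an illustrative guard of that kind, assuming med/MAD style scaling; it is not the actual change made in 6e91a9d or 3187198.

```python
import numpy as np


def scale_signal(raw, eps=1e-6):
    """Centre and scale a raw signal chunk, guarding against degenerate input.
    Illustrative sketch only, not Bonito's implementation."""
    raw = np.asarray(raw, dtype=np.float32)
    med = np.median(raw)
    mad = np.median(np.abs(raw - med)) * 1.4826  # MAD scaled to be stddev-like
    if not np.isfinite(mad) or mad < eps:
        mad = 1.0  # constant or broken signal: fall back to unit scale
    scaled = (raw - med) / mad
    # Make sure nothing non-finite reaches the GPU kernels.
    return np.nan_to_num(scaled, nan=0.0, posinf=0.0, neginf=0.0)
```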
I've been having the same issue. I trained a model based on one of the pre-existing models and have been unable to successfully basecall all my data using this model; so far I've identified two separate fast5 files where I also receive the error below:

Traceback (most recent call last):
  File "/apps/python3/3.8.5/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/g/data/xf3/te4331/bonito-0.5.0/venv3/lib/python3.8/site-packages/ont_bonito_cuda113-0.5.0-py3.8.egg/bonito/multiprocessing.py", line 110, in run
    for item in self.iterator:
  File "/g/data/xf3/te4331/bonito-0.5.0/venv3/lib/python3.8/site-packages/ont_bonito_cuda113-0.5.0-py3.8.egg/bonito/crf/basecall.py", line 69, in
    (read, compute_scores(model, batch, reverse=reverse)) for read, batch in batches
  File "/g/data/xf3/te4331/bonito-0.5.0/venv3/lib/python3.8/site-packages/ont_bonito_cuda113-0.5.0-py3.8.egg/bonito/crf/basecall.py", line 35, in compute_scores
    sequence, qstring, moves = beam_search(
  File "/g/data/xf3/te4331/bonito-0.5.0/venv3/lib/python3.8/site-packages/ont_koi-0.0.7-py3.8-linux-x86_64.egg/koi/decode.py", line 58, in beam_search
    moves = moves.data.reshape(N, -1).cpu()
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

I'm working with fungal genomes, but as with the people above the issue has been recurring at the same file each time.
Hi, are you using the latest version of Bonito? We believe this problem was resolved.
I have Guppy 6.0.1 and have pulled the latest updates of Bonito, so I should have the most recent version; however, this does not appear to have solved the issue.
I've pulled the latest version and even set the HEAD specifically to 3187198, but I continue to run into this issue.
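One thing that may be worth double-checking here (a guess based on the paths in the traceback above, not something confirmed in this thread): the traceback points at an installed ont_bonito egg under site-packages, so pulling new commits into a git checkout will not take effect unless that checkout is what Python actually imports, or the package is reinstalled from it. A quick way to verify which build is in use:

```python
# Print where the bonito and koi packages are actually being imported from.
import bonito
import koi

print("bonito imported from:", bonito.__file__)
print("koi imported from:", koi.__file__)
```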
Apologies this is taking a while to resolve - we are working on it and will keep this thread updated.
@teenjes can you send me your model and data?