
CUDA Illegal memory access was encountered #1941

Closed
neurosynapse opened this issue Aug 19, 2022 · 8 comments
Comments

@neurosynapse

neurosynapse commented Aug 19, 2022

Hello,

I'm trying to test several different segmentation approaches on a custom dataset with three classes (background, object1, object2). In a lot of cases (for example sem_fpn, vit) I get "what(): CUDA error: an illegal memory access was encountered". I have tried the dataset with both reduce_zero_label=False and reduce_zero_label=True, with no change. It would be nice if you could help me with this.
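For reference, this is the kind of check I can run over the ground-truth masks; pixel values above num_classes - 1 (other than the 255 ignore index) would index out of bounds in the CUDA loss kernels. The directory path and num_classes below are placeholders for my setup:

```python
# Sanity check on the ground-truth masks: any pixel value >= num_classes
# (other than the 255 ignore index) can cause an out-of-bounds index in the
# CUDA loss kernels and surface as "illegal memory access".
# The directory path and num_classes are placeholders.
import glob
import numpy as np
from PIL import Image

num_classes = 3                              # background, object1, object2
mask_dir = "data/my_dataset/ann_dir/train"   # placeholder path

bad = set()
for path in glob.glob(f"{mask_dir}/*.png"):
    values = np.unique(np.array(Image.open(path)))
    bad.update(int(v) for v in values if v >= num_classes and v != 255)

print("unexpected label values:", sorted(bad) or "none")
```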

[screenshot attached]

Best regards,
Roberts

@sainivedh19pt

Hi @Franko9999, I'm trying the same kind of training with 3 classes (background, cls1, cls2), with both reduce_zero_label=False and reduce_zero_label=True. In both cases the training output is very bad; only the first class gets trained.

{"mode": "val", "epoch": 1300, "iter": 2, "lr": 0.00645, "aAcc": 0.6337, "mIoU": 0.2112, "mAcc": 0.3333, "IoU.background": 0.6337, "IoU.cat": 0.0, "IoU.dog": 0.0, "Acc.background": 1.0, "Acc.cat": 0.0, "Acc.dog": 0.0}

Not sure how to resolve this
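For reference, this is roughly what reduce_zero_label=True does to a mask in which 0 is a real background class, as I understand it (a small illustration, not the library's exact code):

```python
# Illustration (assumed semantics): with reduce_zero_label=True, label 0 is
# mapped to the ignore index 255 and the remaining labels are shifted down by
# one. If 0 is a real class (background) rather than "unlabelled", it is
# silently ignored during training and evaluation.
import numpy as np

mask = np.array([[0, 0, 1],
                 [1, 2, 2]], dtype=np.uint8)   # background, cat, dog

reduced = mask.copy()
reduced[mask == 0] = 255    # background becomes "ignore"
reduced[mask > 0] -= 1      # cat -> 0, dog -> 1

print(reduced)
# [[255 255   0]
#  [  0   1   1]]
```

If the dataset class and the pipeline disagree on this flag, the labels no longer line up with the declared CLASSES, which can produce exactly this kind of degenerate result.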

@neurosynapse
Author

Yes, the problem is that I have tried a lot of models (at least 50%) and the same problem persists for use cases with a 2- or 3-class dataset. Only the first class gets trained, or some weird errors appear. It would be nice if someone could look into it. Is that possible?

Best regards,
Roberts

@xiexinch
Collaborator

Hi, @Franko9999, @sainivedh19pt,
We would like to reproduce this error. If possible, please tell us what changes you have made to the code.

@xiexinch
Collaborator

xiexinch commented Aug 23, 2022

There were similar issues before: #270 and #1330.

@xiexinch xiexinch added awaiting response and removed WIP Work in process labels Aug 23, 2022
@neurosynapse
Author

Hi,

Thank you, I solved my problem. I had to change the masks so that pixels are labelled in the range (0, num_classes-1). However, I now experience the same problem as you, that is, only the background class gets trained.
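Something along these lines (a simplified sketch; the grayscale-value mapping and directory paths are placeholders):

```python
# Remap arbitrary mask pixel values (e.g. 0/127/255) to contiguous class
# indices 0..num_classes-1, which is what the loss expects.
# The value mapping and paths are placeholders.
import glob
import os
import numpy as np
from PIL import Image

value_to_index = {0: 0, 127: 1, 255: 2}   # background, object1, object2
src_dir = "data/my_dataset/ann_raw"       # original masks
dst_dir = "data/my_dataset/ann_dir"       # remapped masks used for training
os.makedirs(dst_dir, exist_ok=True)

for path in glob.glob(f"{src_dir}/*.png"):
    mask = np.array(Image.open(path))
    remapped = np.zeros_like(mask, dtype=np.uint8)
    for value, index in value_to_index.items():
        remapped[mask == value] = index
    Image.fromarray(remapped).save(os.path.join(dst_dir, os.path.basename(path)))
```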

[screenshot attached]

Best regards,
Roberts

@neurosynapse
Author

Is there some way to concentrate training on the non-background classes (or apply some weighting towards them rather than towards the background)?
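For example, would down-weighting the background class in the decode head's loss be the right approach? A config sketch of what I have in mind (the weights here are placeholders, not tuned values):

```python
# Sketch: MMSegmentation's CrossEntropyLoss accepts a class_weight list,
# so the background class can be down-weighted relative to the others.
model = dict(
    decode_head=dict(
        num_classes=3,
        loss_decode=dict(
            type='CrossEntropyLoss',
            use_sigmoid=False,
            loss_weight=1.0,
            class_weight=[0.5, 1.0, 1.0],   # background, object1, object2
        ),
    ),
)
```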

Best regards,
Roberts

@neurosynapse
Author

neurosynapse commented Aug 23, 2022

Hello,

Finally found the answer regarding model training. In my case, in the configs/_base_/datasets configuration file I had set dict(type='LoadAnnotations', reduce_zero_label=True). reduce_zero_label should be False, the same as in the mmseg/datasets/ file you create for your dataset.
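In other words, the two places have to agree. A sketch with illustrative file and class names (module paths as in MMSegmentation 0.x):

```python
# 1) configs/_base_/datasets/my_dataset.py  (illustrative file name)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', reduce_zero_label=False),  # 0 is a real class
    # ... remaining transforms
]

# 2) mmseg/datasets/my_dataset.py  (illustrative file name)
from mmseg.datasets.builder import DATASETS
from mmseg.datasets.custom import CustomDataset

@DATASETS.register_module()
class MyDataset(CustomDataset):
    CLASSES = ('background', 'object1', 'object2')
    PALETTE = [[0, 0, 0], [128, 0, 0], [0, 128, 0]]

    def __init__(self, **kwargs):
        super().__init__(img_suffix='.jpg',
                         seg_map_suffix='.png',
                         reduce_zero_label=False,  # must match the pipeline
                         **kwargs)
```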

[screenshots of the two configuration files attached]

Best regards,
Roberts

@timothylimyl

The weird things about this error in this repo are that:

  1. It is a recent error for custom dataset training; I have not seen it before.
  2. Sometimes training works, but then the error pops up during validation (see the note below).
  3. Sometimes, without any code changes, training and validation will suddenly work.
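On point 2: CUDA kernel launches are asynchronous, so an illegal access in one op is often only reported at a later, unrelated call (for example during validation). Forcing synchronous launches usually makes the traceback point at the op that actually faulted; this is a standard PyTorch debugging step, not specific to this repo:

```python
# Force synchronous CUDA kernel launches so the error is raised where it
# occurs. Set this before torch/CUDA is initialised, e.g. at the top of
# tools/train.py, or export CUDA_LAUNCH_BLOCKING=1 in the shell instead.
import os
os.environ['CUDA_LAUNCH_BLOCKING'] = '1'
```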
