Cannot start evaluation after running Train_MambaBCD.py #37

lin-ovvo1111 · 2024-06-12T07:09:09Z

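After 500 iterations the script prints "starting evaluation" and then hangs there. The source code appears to use the test set for evaluation, and I can't tell what is going wrong.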

ChenHongruixuan · 2024-06-12T20:16:18Z

Hi, thank you for your question. Did you solve your issue? The evaluation stage takes time. If it still doesn't work, you can try lowering the batch size.

NUAAZJY · 2024-06-15T10:16:49Z

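I have a similar problem. Training starts normally, but a CUDA out-of-memory error is raised at the first "starting evaluation" step, and adjusting the batch size has no effect. It does not look like a genuine shortage of GPU memory. Does this case call for some additional CUDA memory-management code?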

CUDA out of memory. Tried to allocate 24.00 MiB (GPU 0; 11.76 GiB total capacity; 10.47 GiB already allocated; 33.62 MiB free; 10.49 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
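As a reading aid for the error message above, the following minimal sketch parses the reported sizes and compares reserved with allocated memory. The message's own hint about `max_split_size_mb` applies when reserved memory is much larger than allocated memory; the helper name `parse_oom_message` is hypothetical and the regular expressions assume exactly this message format.

```python
import re

# The message text as reported in this thread (documentation hint omitted).
OOM_MESSAGE = (
    "CUDA out of memory. Tried to allocate 24.00 MiB (GPU 0; "
    "11.76 GiB total capacity; 10.47 GiB already allocated; "
    "33.62 MiB free; 10.49 GiB reserved in total by PyTorch)"
)

_UNITS_IN_MIB = {"MiB": 1.0, "GiB": 1024.0}


def parse_oom_message(message):
    """Return the sizes named in the message, converted to MiB.

    Hypothetical helper: it only understands the message layout shown above.
    """
    patterns = {
        "requested": r"Tried to allocate ([\d.]+) (MiB|GiB)",
        "total": r"([\d.]+) (MiB|GiB) total capacity",
        "allocated": r"([\d.]+) (MiB|GiB) already allocated",
        "free": r"([\d.]+) (MiB|GiB) free",
        "reserved": r"([\d.]+) (MiB|GiB) reserved",
    }
    sizes = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, message)
        if match:
            sizes[name] = float(match.group(1)) * _UNITS_IN_MIB[match.group(2)]
    return sizes


sizes = parse_oom_message(OOM_MESSAGE)
# Memory held by the caching allocator but not occupied by tensors.
slack_mib = sizes["reserved"] - sizes["allocated"]
print(f"reserved - allocated: {slack_mib:.2f} MiB")
print(f"allocated / total:    {sizes['allocated'] / sizes['total']:.1%}")
```

In this message the gap between reserved and allocated memory is only about 20 MiB, so reserved is not much larger than allocated. Under that reading, fragmentation is an unlikely cause and the card is simply close to full when evaluation starts.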

XiaoBaiHhy · 2024-06-16T13:53:20Z

Have you solved this problem? I'm running into it too.

NUAAZJY · 2024-06-16T15:15:28Z

Not yet. I added the 128 MB fragmentation-cleanup setting and also tried the tiny version of the model, but the result is the same.

ChenHongruixuan · 2024-06-17T10:04:02Z

Hi guys,

Thank you so much for your question. May I ask which dataset you are running?

Best,

NUAAZJY · 2024-06-17T10:06:36Z

levircd and SYSU


ChenHongruixuan · 2024-06-17T10:10:16Z

Are you running into this issue on both datasets, or just LEVIR-CD+?

NUAAZJY · 2024-06-17T10:21:58Z

both datasets

ChenHongruixuan · 2024-06-17T20:37:17Z

Hi,

That's quite weird. On the LEVIR-CD+ dataset the problem may occur because its images are 1024x1024, so you may need to crop them into smaller patches yourself. Evaluation on the SYSU dataset should not have that problem, though. We have updated the code; please try training again with the current version.
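The cropping step suggested above can be sketched as follows. This is only an illustration, not the repository's preprocessing: the helper name `tile_boxes` is hypothetical, the tile size of 256 is an assumption, and the sketch assumes the image size is an exact multiple of the tile size.

```python
def tile_boxes(height, width, tile):
    """Return non-overlapping (top, left, bottom, right) boxes covering the image.

    Hypothetical helper; assumes height and width are exact multiples of tile.
    """
    if height % tile or width % tile:
        raise ValueError("image size must be a multiple of the tile size")
    return [
        (top, left, top + tile, left + tile)
        for top in range(0, height, tile)
        for left in range(0, width, tile)
    ]


# A 1024x1024 image split into 256x256 patches (tile size is an assumption).
boxes = tile_boxes(1024, 1024, 256)
print(len(boxes), "patches per image")
print("first:", boxes[0], "last:", boxes[-1])
```

For change detection, the same boxes would have to be applied to the pre-change image, the post-change image and the label so that the three patches stay aligned.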

Best,

NUAAZJY · 2024-06-17T23:20:07Z

Thank you for your help and hard work! I'll try the new code. Good luck with your paper.


NUAAZJY · 2024-06-18T12:29:36Z

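Hello, thank you very much for your help. I have successfully reproduced the BCD code on the SYSU dataset, and the accuracy matches the paper. For LEVIR-CD, I will crop the images to 256x256 and then try again.
However, while reproducing the results I found two possible minor issues in your code:
1. In train_MambaCD.py, lines 149-156 are placed outside the for loop, perhaps for speed reasons, but this makes the evaluation look as if it has hung (in fact the evaluation is just slow).
2. In infer_MambaCD.py, the feature_map_saved_path parameter used on lines 69-70 is not defined, which makes the inference script fail. The parameter also does not appear to be used anywhere; after deleting lines 69-70 the script runs normally.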

ChenHongruixuan · 2024-06-18T14:06:40Z

Hi,

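Hello, thank you very much for your help. I have successfully reproduced the BCD code on the SYSU dataset, and the accuracy matches the paper. For LEVIR-CD, I will crop the images to 256x256 and then try again.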

Glad to hear that!

In train_MambaCD.py, lines 149-156 are placed outside the for loop, perhaps for speed reasons, but this makes the evaluation look as if it has hung (in fact the evaluation is just slow).

The evaluation code is placed outside the loop to get the final accuracy. To speed up evaluation, increase eval_batch_size; the current setting is 1.
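To make the effect of eval_batch_size concrete, here is a minimal sketch, not the repository's code. It counts how many evaluation iterations a given batch size implies and shows one generic way to log progress so that a slow evaluation does not look like a hang. The function names, the sample count of 4000 and the `log_every` parameter are all assumptions for illustration.

```python
import math


def num_eval_iterations(num_samples, eval_batch_size):
    """Number of forward passes needed to cover the evaluation set once."""
    return math.ceil(num_samples / eval_batch_size)


def evaluate_with_progress(batches, evaluate_batch, log_every=100, log=print):
    """Run evaluate_batch on each batch and log progress periodically.

    Hypothetical helper: batches is any iterable, evaluate_batch any callable.
    """
    results = []
    for index, batch in enumerate(batches, start=1):
        results.append(evaluate_batch(batch))
        if index % log_every == 0:
            log(f"evaluated {index} batches")
    return results


# Assumed evaluation-set size of 4000 samples, for illustration only.
for batch_size in (1, 8):
    print(batch_size, "->", num_eval_iterations(4000, batch_size), "iterations")

# Toy run: ten "batches", progress logged every five.
messages = []
outputs = evaluate_with_progress(range(10), lambda b: b * 2,
                                 log_every=5, log=messages.append)
print(messages)
```

A larger batch size reduces the number of iterations but also raises peak GPU memory per step, so on a card that is already close to full it may need to stay small.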

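In infer_MambaCD.py, the feature_map_saved_path parameter used on lines 69-70 is not defined, which makes the inference script fail. The parameter also does not appear to be used anywhere; after deleting lines 69-70 the script runs normally.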

Thank you for pointing this out. We will fix this error soon.

Best,

ChenHongruixuan pinned this issue Jun 18, 2024

ChenHongruixuan closed this as completed Jun 18, 2024

ChenHongruixuan changed the title ~~Train_MambaBCD.py运行后无法进行starting evaluation~~ Cannot start evaluation after running Train_MambaBCD.py Sep 10, 2024
