
About the difference between test/mIoU and test/mIoU_part #3

Closed
Sylva-Lin opened this issue Mar 29, 2024 · 12 comments


@Sylva-Lin

Hi, thank you for your work! When I trained the model on ScanNet, I found two types of mIoU in the results. Is there any difference between them? And which type of mIoU is reported in your paper?

@Aristo23333
Collaborator


Hello, thank you for your interest in our work. Our results are indeed reported as two numbers, test/mIoU and test/mIoU_part, and the latter is the one used in our paper. The main difference is that the latter also takes into account unimportant mask information such as the scene background; you can see exactly how the two are computed in segmentation.py.
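
To make the distinction concrete, here is a toy sketch of how two mIoU numbers can differ purely in which classes enter the average. This is not the repository's actual segmentation.py; it simply assumes class 0 plays the background/ignored role:

```python
# Toy illustration only -- NOT the code in segmentation.py.
# Assumption: class 0 stands in for the background/ignored masks.
import numpy as np

def per_class_iou(pred, gt, num_classes):
    """IoU per class id in [0, num_classes); NaN if the class never appears."""
    ious = np.full(num_classes, np.nan)
    for c in range(num_classes):
        inter = np.sum((pred == c) & (gt == c))
        union = np.sum((pred == c) | (gt == c))
        if union > 0:
            ious[c] = inter / union
    return ious

pred = np.array([0, 1, 1, 2, 2, 2])
gt   = np.array([0, 0, 1, 2, 2, 0])

ious = per_class_iou(pred, gt, num_classes=3)
miou_incl_background = np.nanmean(ious)      # average over every class
miou_excl_background = np.nanmean(ious[1:])  # drop the assumed background class
print(miou_incl_background, miou_excl_background)  # 0.5 vs ~0.583
```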

@Sylva-Lin
Author


Thank you for your reply. I trained the model on ScanNet for 700 epochs, but the best test/mIoU_part was only 70.26%, which is much lower than the 74.6% reported in the paper. I used a single RTX 3090, and the other configs are the same as the defaults. Do you have any idea what might have caused this?

@Aristo23333
Collaborator

Aristo23333 commented Mar 30, 2024 via email

@Sylva-Lin
Author

> It is important to point out that there is a lot of uncertainty when training Mamba networks. We have tried training on 3× RTX 4090 and on 2× RTX 4090 in different environments, and there is some variation, but almost all results are above 73%. For your result, I would first suggest making sure your environment is as consistent as possible with the reference we provided, and second, comparing several training runs. You can also verify this with the eval function and the ckpt we provide. Thank you!



Hi, I tried to use multiple GPUs for training, but the following error occurred. Did you encounter this problem? Could you tell me how to solve it? Thanks.
[Screenshot of the error, 2024-04-01 13:36]

@Sylva-Lin
Author

When I set "find_unused_parameters: True" in your default config, the error is as follows:
[Screenshot of the error, 2024-04-01 14:35]
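
For context, a config entry like find_unused_parameters is normally just forwarded to PyTorch's DistributedDataParallel wrapper. A minimal, generic sketch of that (plain PyTorch, not Point-Mamba's actual training code) looks like this:

```python
# Generic PyTorch sketch -- not Point-Mamba's training code.
# Shows where a config flag like `find_unused_parameters` usually ends up.
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def wrap_model(model: nn.Module, local_rank: int, find_unused_parameters: bool) -> DDP:
    # Assumes torch.distributed.init_process_group(...) has already been called.
    model = model.cuda(local_rank)
    # With find_unused_parameters=True, DDP traverses the autograd graph each
    # iteration to detect parameters that take no part in the backward pass,
    # at the cost of some extra overhead.
    return DDP(model, device_ids=[local_rank],
               find_unused_parameters=find_unused_parameters)
```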

@Sylva-Lin
Author

Sylva-Lin commented Apr 1, 2024 via email

@Aristo23333
Collaborator

Aristo23333 commented Apr 2, 2024 via email

@Aristo23333
Collaborator

Aristo23333 commented Apr 2, 2024 via email

@Sylva-Lin
Author

> I didn't change this parameter to True in my actual experiments. I think you may be running into a problem with unused parameters during training and want to use this check? I seem to have encountered a similar problem, but it does not affect normal training, so you may be able to keep it. In addition, this parameter is inherited from OctFormer, so you can also refer to that code for comparison.

I set up the environment following the OctFormer instructions and can train OctFormer in parallel without problems. However, even though I strictly follow the environment configuration you provided and keep every setting consistent with yours, I still cannot get parallel training to work.
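
A generic way to see which parameters DDP is complaining about (independent of this repository) is to run a single forward/backward pass on one GPU without the DDP wrapper and list the parameters whose gradients are still None. A minimal sketch, assuming you can build the model and compute one loss by hand:

```python
# Generic PyTorch debugging sketch -- not tied to Point-Mamba or OctFormer.
# After one backward pass without DDP, parameters that never received a
# gradient are the "unused parameters" DDP would complain about.
import torch.nn as nn

def report_unused_parameters(model: nn.Module, loss):
    model.zero_grad(set_to_none=True)  # clear stale gradients first
    loss.backward()
    unused = [name for name, p in model.named_parameters()
              if p.requires_grad and p.grad is None]
    print("Parameters without gradients:", unused)
```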

@Sylva-Lin
Author

My system is Ubuntu 20.04, with CUDA 11.8 installed system-wide.

@USTCLH

USTCLH commented Apr 6, 2024


Hello, this may help you

@Sylva-Lin
Author


Thank you, I can now train normally with multiple GPUs. How did you figure out that this code needed to be changed? Could you share your experience?
