
About the training speed on SemanticKITTI #6

Open
zzh-ecnu opened this issue Aug 2, 2021 · 3 comments
Comments

@zzh-ecnu

zzh-ecnu commented Aug 2, 2021

Hello, I ran your code on the SemanticKITTI dataset, keeping the default config settings.

However, training has already taken 3 days and the current epoch is only 30, with max_epoch set to 100. At this rate it will take about 10 days to train the whole pipeline.

The GPU I used is a Tesla V100 16G.

The training is too slow; do you have any suggestions?

Thanks sincerely~

@edwardzhou130
Owner

Sorry for the late reply. Here are some tips to speed up the training (see the sketch after this list):

  • In my experiments, the model needs only around 20 epochs before SAP starts and another 20 after to reach the highest validation PQ. So you can probably use a smaller epoch budget, e.g. a max of 50 epochs with SAP starting at the 20th epoch.
  • Another suggestion is to decrease the grid size. Grid size [320,240,32] loses about 1.5% PQ compared to the default [480,360,32], while training is much faster.
  • Validation during training takes a lot of time. You can increase the validation interval at the beginning of training to save some time.
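
For reference, here is a minimal sketch of how these settings could fit together. The config key names and helper functions below are illustrative placeholders, not the exact keys or functions used in this repo:

```python
# A minimal sketch of the tips above, not the repo's actual training script:
# the config key names and helper functions are illustrative placeholders.

config = {
    "max_epoch": 50,              # down from the default 100
    "SAP_start_epoch": 20,        # start SAP once the base model has converged
    "grid_size": [320, 240, 32],  # coarser grid: roughly 1.5% PQ drop, much faster training
    "val_interval": 5,            # validate every 5th epoch instead of every epoch
}

def train_one_epoch(epoch):
    # Placeholder for the real training step; SAP would be enabled here
    # once `epoch` reaches config["SAP_start_epoch"].
    sap_on = epoch >= config["SAP_start_epoch"]
    print(f"epoch {epoch}: training (SAP enabled: {sap_on})")

def validate(epoch):
    # Placeholder for the real validation pass that computes PQ.
    print(f"epoch {epoch}: validating")

for epoch in range(config["max_epoch"]):
    train_one_epoch(epoch)
    # Throttle validation to save time, especially early in training.
    if (epoch + 1) % config["val_interval"] == 0:
        validate(epoch)
```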

@zzh-ecnu
Author

zzh-ecnu commented Aug 16, 2021

Thanks for your careful reply~

Recently I tried to add PyTorch multi-GPU support to this codebase, but I ran into a problem with inconsistent dimensions.

According to the error information, it looks like the input visibility feature is inconsistent with the point feature output by the network.

Currently I have no idea what causes this error. Do you have any suggestions? Thanks sincerely~
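
For context, here is a minimal sketch of the kind of multi-GPU wrapping I mean, using torch.nn.DataParallel; the toy module and tensor names are placeholders, not this repo's actual code. One common source of such shape mismatches is that DataParallel splits batch tensors passed to forward() across GPUs, while tensors handled outside forward() keep the full batch size:

```python
import torch
import torch.nn as nn

# A toy stand-in for the real network; the actual model and its visibility
# feature handling are more involved.
class ToyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(9, 9)

    def forward(self, point_feature):
        return self.fc(point_feature)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = ToyNet().to(device)
if torch.cuda.device_count() > 1:
    # nn.DataParallel scatters dim 0 of every tensor argument of forward()
    # across GPUs and gathers the outputs back on the default device.
    model = nn.DataParallel(model)

point_feature = torch.randn(8, 9, device=device)       # batch of 8
visibility_feature = torch.randn(8, 9, device=device)  # must share the batch dim

out = model(point_feature)
# A tensor combined with the network output *outside* forward() (such as a
# precomputed visibility feature) keeps its full batch size, while tensors
# created *inside* forward() only see the per-GPU chunk -- a common source of
# dimension mismatches when adding DataParallel to existing code.
print(out.shape, visibility_feature.shape)
```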

@Carl12138aka

> The validation in training takes a lot of time. You can increase the validation interval at the beginning of training to save some time.

Could you please tell me how to increase the validation interval? Thanks a lot!
