
Training Speed #33

Open
jytime opened this issue Mar 9, 2024 · 5 comments


jytime commented Mar 9, 2024

I happened to find that the released training code seems to be much slower than the original (internal) implementation when training on 8 GPUs. Single-GPU training does not seem to suffer from this. Marking it here to investigate later.


jytime commented Mar 9, 2024

This appears to be because accelerate is not set up correctly, which makes data loading about 10x slower. I am leaving this issue here in case someone else hits the same problem. The sec/it number in the log indicates the time taken for each training step; it should be within 1-3 seconds. If a training step takes longer than that, something is usually wrong.
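As a quick sanity check on the sec/it number, a minimal timing sketch (the `step_fn` here is a hypothetical stand-in for one real training step, not the actual training loop of this repo):

```python
import time

def avg_sec_per_it(step_fn, num_steps):
    """Run `step_fn` repeatedly and return the average sec/it, as reported in the log."""
    times = []
    for _ in range(num_steps):
        start = time.perf_counter()
        step_fn()  # one training step: data loading + forward + backward
        times.append(time.perf_counter() - start)
    return sum(times) / len(times)

# Example with a dummy step; on a healthy 8-GPU run the real number
# should land in roughly the 1-3 sec/it range.
avg = avg_sec_per_it(lambda: time.sleep(0.01), 5)
print(f"{avg:.3f} sec/it")
```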

If someone meets this problem, the simplest solution may be to use PyTorch's own distributed training and remove accelerate/accelerator from our training code.
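For reference, a launch with PyTorch's own launcher could look like the fragment below (`train.py` is a placeholder, not necessarily this repo's actual entry point):

```shell
# Launch 8-GPU DDP training with torchrun instead of accelerate.
torchrun --nproc_per_node=8 train.py
```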


sungh66 commented Mar 11, 2024

Hi, I recently tried to reproduce this work. Training on 41 categories of CO3D data, the highest racc_15 on the training set is 0.93, tacc_15 is close to 0.8, and the speed is 0.8 sec/it. Is this result normal?


jytime commented Mar 11, 2024

Hi @sungh66, the result looks good. In my own logs, tacc_15 during training was slightly higher, close to 0.9. But it should be fine as long as the testing result is consistent, because accuracy during training is strongly affected by the degree of data augmentation.


sungh66 commented May 16, 2024

Hi @jytime, does normal inference time have to include the time to load the SuperGlue models and the time to extract and match features? I run inference on 200 pictures at a time, and this part alone takes close to 40 minutes, which is too long. Is it possible to load the model only once for inference on different videos?


jytime commented May 16, 2024

Hey, you could try LightGlue instead of SuperGlue, by changing this line:

matcher_conf = match_features.confs["superglue"]

to

matcher_conf = match_features.confs["superpoint+lightglue"]

It should give a basically similar result while being 2x or 3x faster.
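On loading the model only once across videos: one simple pattern is to cache the loaded matcher and reuse it for every subsequent call. A minimal sketch of that idea (the `load_matcher` function and its contents are hypothetical stand-ins for the actual, expensive model loading):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def load_matcher(conf_name):
    """Build the matching model once per configuration; later calls hit the cache."""
    # Stand-in for the expensive part: constructing the network and loading weights.
    return {"conf": conf_name}

def match_video(frames, conf_name="superpoint+lightglue"):
    matcher = load_matcher(conf_name)  # loaded on the first video, cached afterwards
    return [(matcher["conf"], f) for f in frames]

# The matcher is constructed only once across both videos.
match_video(["a.jpg", "b.jpg"])
match_video(["c.jpg"])
```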
