
Training Speed #33

Open
jytime opened this issue Mar 9, 2024 · 5 comments


jytime commented Mar 9, 2024

I happened to find that the released training code seems to be much slower than the original (internal) implementation when training on 8 GPUs. Single-GPU training does not seem to suffer from this. Marking it here to investigate later.


jytime commented Mar 9, 2024

This appears to be because accelerate is not set up correctly, which makes data loading about 10x slower. I am leaving this issue here in case someone else hits the same problem. The sec/it number in the log indicates the time taken for each training step; it should be within 1-3 seconds. If a training step takes longer than that, something is usually wrong.
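As a quick sanity check on the sec/it number, a minimal timing sketch (the `step_fn` here is a hypothetical stand-in for one real training step, not the actual training loop of this repo):

```python
import time

def avg_sec_per_it(step_fn, num_steps):
    """Run `step_fn` repeatedly and return the average sec/it, as reported in the log."""
    times = []
    for _ in range(num_steps):
        start = time.perf_counter()
        step_fn()  # one training step: data loading + forward + backward
        times.append(time.perf_counter() - start)
    return sum(times) / len(times)

# Example with a dummy step; on a healthy 8-GPU run the real number
# should land in roughly the 1-3 sec/it range.
avg = avg_sec_per_it(lambda: time.sleep(0.01), 5)
print(f"{avg:.3f} sec/it")
```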

If someone meets this problem, the simplest solution may be to use PyTorch's own distributed training and remove accelerate/accelerator from our training code.
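For reference, a launch with PyTorch's own launcher could look like the fragment below (`train.py` is a placeholder, not necessarily this repo's actual entry point):

```shell
# Launch 8-GPU DDP training with torchrun instead of accelerate.
torchrun --nproc_per_node=8 train.py
```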


sungh66 commented Mar 11, 2024

Hi, I recently tried to reproduce this work. Training on 41 categories of CO3D data, the highest racc_15 on the training set is 0.93, tacc_15 is close to 0.8, and the speed is 0.8 sec/it. Is this result normal?


jytime commented Mar 11, 2024

Hi @sungh66, the result looks good. In my own logs, tacc_15 during training was slightly higher, close to 0.9. But it should be fine as long as the testing result is consistent, because accuracy during training is strongly affected by the degree of data augmentation.


sungh66 commented May 16, 2024

Hi @jytime, does normal inference time have to include the time to load the SuperGlue models and the time to extract and match features? I run inference on 200 pictures at a time, and this part alone takes close to 40 minutes, which is too long. Is it possible to load the model only once for inference on different videos?


jytime commented May 16, 2024

Hey, you could try LightGlue instead of SuperGlue, by changing this line:

matcher_conf = match_features.confs["superglue"]

to

matcher_conf = match_features.confs["superpoint+lightglue"]

It should give a basically similar result while being 2x or 3x faster.
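On loading the model only once across videos: one simple pattern is to cache the loaded matcher and reuse it for every subsequent call. A minimal sketch of that idea (the `load_matcher` function and its contents are hypothetical stand-ins for the actual, expensive model loading):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def load_matcher(conf_name):
    """Build the matching model once per configuration; later calls hit the cache."""
    # Stand-in for the expensive part: constructing the network and loading weights.
    return {"conf": conf_name}

def match_video(frames, conf_name="superpoint+lightglue"):
    matcher = load_matcher(conf_name)  # loaded on the first video, cached afterwards
    return [(matcher["conf"], f) for f in frames]

# The matcher is constructed only once across both videos.
match_video(["a.jpg", "b.jpg"])
match_video(["c.jpg"])
```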
