Training concept Issue: Use of Repetition for Short Motion Sequences #31

Open · rohitpaul24 opened this issue Jan 15, 2025 · 4 comments
rohitpaul24 commented Jan 15, 2025

Thank you again for the training script; I just have a few doubts about the training approach.

In the current implementation, short motion sequences are handled by repeating the motion data until it reaches the required length of 2 * n_motions. While this ensures a uniform batch size, it can break motion continuity and smoothness: the first frame of the repeated clip may differ sharply from the last frame of the original clip, creating a discontinuity at the seam.
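To make the discontinuity concrete, here is a minimal sketch of how I understand the repetition strategy; the function name, the (T, D) clip layout, and the 200-frame target (2 * n_motions with n_motions = 100) are my own illustration, not the repository's actual code:

```python
import numpy as np

def pad_by_repetition(motion: np.ndarray, target_len: int) -> np.ndarray:
    """Tile a (T, D) motion clip until it reaches target_len frames.

    Hypothetical sketch of the repetition strategy; the actual
    training script may implement this differently.
    """
    reps = int(np.ceil(target_len / motion.shape[0]))
    return np.tile(motion, (reps, 1))[:target_len]

# A drifting 80-frame clip: the jump between frame 79 (end of the
# original clip) and frame 80 (start of the repeat) can be arbitrarily large.
clip = np.cumsum(np.random.randn(80, 63), axis=0)
padded = pad_by_repetition(clip, 200)  # assuming 2 * n_motions = 200
seam_jump = np.linalg.norm(padded[80] - padded[79])
print(f"discontinuity at the seam: {seam_jump:.2f}")
```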

Potential Issue:
When the repeated frames are processed during training, the model might struggle to maintain smooth transitions, resulting in artifacts or jitter in the generated motion. This discontinuity could negatively impact the model's ability to learn realistic and smooth motion sequences.

Instead of repeating the motion data, would the following approach be better?
Neutral Source Motion Padding: use a predefined neutral motion state (e.g., a rest pose) for padding, plus an indicator to mark the padded frames as non-informative.
This would preserve the continuity of the original motion data and prevent the model from learning unrealistic transitions.
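Something like this is what I have in mind, assuming (T, D) coefficients and a rest-pose vector `neutral`; the mask could then exclude the padded frames from the loss:

```python
import numpy as np

def pad_with_neutral(motion: np.ndarray, target_len: int,
                     neutral: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Pad a (T, D) clip to target_len with a fixed neutral pose and
    return a boolean mask marking which frames are informative."""
    T, D = motion.shape
    pad = np.broadcast_to(neutral, (target_len - T, D)).copy()
    padded = np.concatenate([motion, pad], axis=0)
    mask = np.zeros(target_len, dtype=bool)
    mask[:T] = True  # True = real frame, False = neutral padding
    return padded, mask

# Usage idea: exclude padded frames from the reconstruction loss,
# e.g. loss = ((pred - target) ** 2)[mask].mean()
```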

Questions:
What was the rationale behind using repetition instead of padding?
Did you test zero padding or neutral-source padding and decide against them?

Looking forward to your thoughts on this!

xuyangcao (Collaborator) commented:

Hi, thank you for your question. Yes, simply repeating the motion data may be one cause of unsmooth motions. In the latest version of the model we removed clips shorter than 4 s, so the repetition strategy is never triggered during training.

As for the two alternatives you mentioned, we have not tried them yet; feel free to experiment with them.

Looking forward to your feedback.

xuyangcao (Collaborator) commented:

Another strategy worth trying is to smooth the motions generated by LivePortrait during data preparation, inspired by this issue: KwaiVGI/LivePortrait#439
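For example, a temporal Savitzky-Golay filter over the extracted motion coefficients would be one simple way to do this; the window length and polynomial order below are only illustrative, not values from the referenced issue:

```python
import numpy as np
from scipy.signal import savgol_filter

def smooth_motion(coefs: np.ndarray, window: int = 9, polyorder: int = 2) -> np.ndarray:
    """Temporally smooth (T, D) motion coefficients extracted with
    LivePortrait before they are used for training."""
    return savgol_filter(coefs, window_length=window, polyorder=polyorder, axis=0)
```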

rohitpaul24 (Author) commented:

@xuyangcao Thanks for the reply

I have been testing a replacement for the repetition-based data: instead of cropping 200 frames, I use a sliding-window approach, since my video data is quite long. I also zero-pad the end of each sequence so its length is a multiple of 100 frames, similar to what we do at inference.
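A minimal sketch of my setup, assuming a (T, D) sequence; the 200-frame window and 100-frame stride are my own choices:

```python
import numpy as np

def make_windows(seq: np.ndarray, win: int = 200, stride: int = 100) -> list[np.ndarray]:
    """Zero-pad a long (T, D) sequence to a multiple of 100 frames
    (as at inference) and cut it into overlapping windows."""
    T, D = seq.shape
    pad = (-T) % 100  # frames needed to reach the next multiple of 100
    if pad:
        seq = np.concatenate([seq, np.zeros((pad, D), dtype=seq.dtype)], axis=0)
    return [seq[s:s + win] for s in range(0, len(seq) - win + 1, stride)]
```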

With this setup the validation loss is converging. However, the exp smooth loss is rising, even though its value stays on the order of 1e-6, and the exp velocity loss shows a zig-zag pattern while still converging.
Is this common, or is it something to do with my approach?

Thank you


johndpope commented Feb 9, 2025

In the VASA paper they use 50-frame windows with a stride of 25 to augment the data.
Presumably they found the 100-frame windows from DiffPoseTalk to be inferior; am I mistaken?
As I understand it, truncate_motion_coef_and_audio in utils/common.py helps the model produce output that is not tied to the window size (e.g., 7.3 seconds), while prev_motion / prev_audio should give the transformer enough context to create a smooth continuation; is that right?
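Rough arithmetic on how much the smaller stride augments the data (the clip length below is just an example):

```python
# Windows obtainable from a T-frame clip (illustrative numbers only).
def n_windows(T: int, win: int, stride: int) -> int:
    return max(0, (T - win) // stride + 1)

print(n_windows(500, 50, 25))    # 19 windows, VASA-style (win=50, stride=25)
print(n_windows(500, 100, 100))  # 5 windows, non-overlapping 100-frame crops
```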

Did removing the alignment mask do anything for training?
I'm running the experiment locally; roughly how long does it take to converge, and how many steps?
