Yes, it can run in real time; we have already deployed it in our internal applications.
Table 3 in our paper shows that the single-step inference time of the MotionDiT module is 62 ms (80 frames).
In offline mode, the overlap is 10 frames, so each inference yields 70 valid (new) frames. At 25 fps (40 ms per frame), the real-time factor (RTF) is 62 / (70 * 40) = 0.022, which is much faster than real time. However, the first-frame delay (FFD) is too large (more than 70 * 40 = 2800 ms).
In online mode, supporting streaming means sacrificing some RTF: we configure a larger overlap (70~75 frames), which leaves 5~10 valid frames per inference. The RTF then ranges from 62 / (10 * 40) = 0.155 to 62 / (5 * 40) = 0.31, still below 1, while the FFD stays reasonable (<400 ms).
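The arithmetic above can be reproduced with a small script. This is only a sketch: the 80-frame window, 62 ms inference time, and 25 fps come from the reply, while the function and constant names (`rtf`, `new_content_ms`, `WINDOW_FRAMES`, and so on) are made up for illustration and are not taken from the repository.

```python
# Sketch of the RTF arithmetic from the reply above.
# All names here are hypothetical; only the numbers come from the thread.

FPS = 25               # output frame rate
WINDOW_FRAMES = 80     # frames generated per inference
INFER_MS = 62.0        # single-step inference time for one window


def valid_frames(overlap_frames: int, window_frames: int = WINDOW_FRAMES) -> int:
    """Frames of new content produced by one inference."""
    valid = window_frames - overlap_frames
    if valid <= 0:
        raise ValueError("overlap must be smaller than the window")
    return valid


def new_content_ms(overlap_frames: int, fps: int = FPS) -> float:
    """Playback duration (ms) of the new frames from one inference."""
    return valid_frames(overlap_frames) * 1000.0 / fps


def rtf(overlap_frames: int, infer_ms: float = INFER_MS) -> float:
    """Real-time factor: compute time divided by playback time of new frames."""
    return infer_ms / new_content_ms(overlap_frames)


if __name__ == "__main__":
    for overlap in (10, 70, 75):
        print(
            f"overlap={overlap:2d}  valid={valid_frames(overlap):2d}  "
            f"new content={new_content_ms(overlap):6.0f} ms  RTF={rtf(overlap):.3f}"
        )
```

With an overlap of 10 this gives an RTF of about 0.022, and with overlaps of 70 and 75 it gives 0.155 and 0.31, matching the figures quoted above. A larger overlap shrinks the amount of new content per inference, which is what shortens the wait before the first frames at the cost of a higher RTF.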
Can this actually be real time if lmdm seq frames has to be 80?