Hi Mattia,
Thank you for your nice work!
I am trying to map sentence features to the corresponding frame-level features.
Because the features are extracted at 5 FPS, I assume the timestamps provided in the annotation files need to be multiplied by 5 to obtain the indices of the corresponding video features. Is that correct?
E.g., if the timestamps of a particular sentence are `[4239.133, 4240.508]`, then I would imagine that the video features run from `visual[movie_id][4239 x 5]` to `visual[movie_id][4241 x 5]`.
Could you please let me know if this is the right way to interpret the data?
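To make the question concrete, here is a minimal sketch of the mapping I have in mind. It assumes the features are stored one row per sampled frame at a fixed 5 FPS, starting at time zero; the function names and both rounding choices are hypothetical and not taken from the repository.

```python
import math

FPS = 5  # feature extraction rate stated above


def timestamps_to_indices_whole_seconds(start_s, end_s, fps=FPS):
    """Round the interval outward to whole seconds, then scale by fps.

    This mirrors the example above: 4239 x 5 and 4241 x 5.
    """
    return math.floor(start_s) * fps, math.ceil(end_s) * fps


def timestamps_to_indices_fine(start_s, end_s, fps=FPS):
    """Alternative (assumption): scale first, then round outward to frames."""
    return math.floor(start_s * fps), math.ceil(end_s * fps)


coarse = timestamps_to_indices_whole_seconds(4239.133, 4240.508)
fine = timestamps_to_indices_fine(4239.133, 4240.508)
print(coarse)  # (21195, 21205)
print(fine)    # (21195, 21203)
```

The two variants differ only in whether rounding happens before or after multiplying by the frame rate; I am unsure which one, if either, matches how the features were indexed.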
Kind regards.