Mapping sentences to features of video frames #18

akskuchi · 2024-12-09T15:42:17Z

Hi Mattia,

Thank you for your nice work!

I am trying to map sentence features with corresponding frame-level features.
Because the features are extracted at 5 FPS, the timestamps in the annotation files should be multiplied by 5 to obtain the indices of the corresponding video features, right?
For example, if the timestamps of a particular sentence are [4239.133, 4240.508], I would expect the corresponding video features to run from `visual[movie_id][4239 * 5]` to `visual[movie_id][4241 * 5]`.
Could you please let me know if this is the right way to interpret the data?

Kind regards.


Soldelli · 2024-12-18T07:40:15Z

Dear @akskuchi, your intuition is correct.
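For anyone landing here later, below is a minimal Python sketch of the mapping confirmed above. It assumes that `visual[movie_id]` is a sequence holding one feature per 1/5 s, with feature `i` covering the interval `[i / 5, (i + 1) / 5)` starting at t = 0; this alignment is an assumption, not something verified against the dataset loader. The helper names `timestamps_to_feature_range` and `sentence_features` are hypothetical. The sketch multiplies by the frame rate first, then floors the start and ceils the end, so every feature overlapping the sentence window is kept.

```python
import math
from typing import List, Optional, Sequence, Tuple

FPS = 5  # feature extraction rate stated in this thread


def timestamps_to_feature_range(
    start_s: float,
    end_s: float,
    fps: float = FPS,
    num_features: Optional[int] = None,
) -> Tuple[int, int]:
    """Map a sentence window [start_s, end_s] (seconds) to a half-open
    feature-index range [start_idx, end_idx).

    Assumption: feature i covers [i / fps, (i + 1) / fps), i.e. the first
    feature is aligned with t = 0.
    """
    if end_s < start_s:
        raise ValueError("end_s must be >= start_s")
    # Round before floor/ceil so float noise (e.g. 21203.000000000004)
    # does not shift an index by one.
    start_idx = math.floor(round(start_s * fps, 6))
    end_idx = math.ceil(round(end_s * fps, 6))
    # Keep at least one feature, even for a zero-length window.
    end_idx = max(end_idx, start_idx + 1)
    if num_features is not None:
        # Clip to the features actually stored for the movie; a window that
        # lies entirely past the end yields an empty range.
        start_idx = max(start_idx, 0)
        end_idx = min(end_idx, num_features)
    return start_idx, end_idx


def sentence_features(
    features: Sequence, start_s: float, end_s: float, fps: float = FPS
) -> List:
    """Return the per-frame features that overlap the sentence window."""
    start_idx, end_idx = timestamps_to_feature_range(
        start_s, end_s, fps, len(features)
    )
    return list(features[start_idx:end_idx])


if __name__ == "__main__":
    # Hypothetical stand-in for visual[movie_id]: one entry per 0.2 s.
    dummy_features = [f"feat_{i}" for i in range(30000)]
    start, end = timestamps_to_feature_range(4239.133, 4240.508)
    print(start, end)  # 21195 21203
    clip = sentence_features(dummy_features, 4239.133, 4240.508)
    print(len(clip), clip[0], clip[-1])  # 8 feat_21195 feat_21202
```

Rounding to whole seconds first, as in the question, gives the slightly wider range from index 21195 up to 21205; both cover the sentence, and the version above simply trims the extra frames at the end.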
