It seems that your training/evaluation data contains only one 2048-d 2D feature and one 2048-d 3D feature per sentence. However, the feature extractor at https://github.com/antoine77340/video_feature_extractor appears to produce n×2048 features per sentence: about n 2D features if the sentence lasts n seconds, and about n/1.5 3D features. How do I aggregate these n×2048 features into a single 2048-d feature using temporal max-pooling, as described in your paper? Do I just take the maximum value of each dimension?
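For reference, per-dimension temporal max-pooling can be sketched as below. This is a minimal illustration assuming the features for one sentence are an n×d sequence (one row per time step); it is not confirmed to be the authors' exact pipeline, and the function name `temporal_max_pool` is hypothetical.

```python
from typing import List, Sequence


def temporal_max_pool(features: Sequence[Sequence[float]]) -> List[float]:
    """Collapse an n x d sequence of per-clip features into one d-dim vector
    by taking, for each dimension, the maximum over the n time steps.

    Hypothetical helper for illustration; not from the original repository.
    """
    if not features:
        raise ValueError("need at least one feature vector to pool")
    dim = len(features[0])
    if any(len(f) != dim for f in features):
        raise ValueError("all feature vectors must have the same dimension")
    # zip(*features) yields one tuple per dimension, spanning all time steps
    return [max(column) for column in zip(*features)]


# Toy example: 3 time steps, 4 dimensions (real features would be n x 2048)
clip_features = [
    [0.1, 0.9, 0.0, 0.4],
    [0.7, 0.2, 0.3, 0.4],
    [0.5, 0.1, 0.8, 0.2],
]
pooled = temporal_max_pool(clip_features)
print(pooled)  # [0.7, 0.9, 0.8, 0.4]
```

With NumPy, calling `.max(axis=0)` on an `(n, 2048)` array gives the same result. Under this assumption, the 2D and 3D streams would be pooled separately, yielding one 2048-d vector each regardless of how many time steps each stream has.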
dixonhsiao changed the title from "how to aggregate n*2048 features into one 2048 feature ?" to "how to aggregate nx2048 features into one 2048 feature ?" on Sep 10, 2019.