
Commands to MQ Training with VSGN #1

Closed
JunweiLiang opened this issue Jul 8, 2022 · 10 comments
@JunweiLiang

Hi, thanks for releasing the code!

Could you provide some instructions on how to run VSGN training with EgoVLP features (hyper-parameters, learning rate, etc.)? Thanks!

Junwei

@QinghongLin
Collaborator

Hello Junwei,

Thanks for your interest in our work.
I will update the instructions and related details for MQ soon.

Thank you for your patience!

@QinghongLin
Collaborator

Hi Junwei,

I have uploaded the video features for the MQ task to Google Drive (train&val / test), so you can download them directly.
What you need to do is replace the input features with our features.
I have also attached the config of our best VSGN model here: config.txt.

Please try it out and let us know if you have new results.

@JunweiLiang
Author

I have downloaded the features, but they seem to be a single file. Is it a single pickle binary with dictionary keys? How do I read the features and map them to the videos (for example, slowfast8x8_r101_k400/ has 9645 *.pt files, each of which corresponds to a video)?

Thanks.

@QinghongLin
Collaborator

It is a gz file; after unzipping it (I unzipped it on my Mac), you will see a directory that contains multiple *.pt files.
For example, 0a8f6747-7f79-4176-85ca-f5ec01a15435.pt corresponds to the video features of the clip 0a8f6747-7f79-4176-85ca-f5ec01a15435.

The clip information is provided by the MQ metadata, i.e., clip xxx comes from video yyy with start time t1 and end time t2.
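
For example, here is a minimal sketch of loading one clip feature and mapping it back to its source video via the MQ metadata. The directory name, the (T, D) tensor layout, and the JSON key names are illustrative assumptions and may differ from the actual release.

import json
import torch

clip_uid = "0a8f6747-7f79-4176-85ca-f5ec01a15435"

# Each *.pt file holds the pre-extracted features of one clip (assumed to be a (T, D) tensor).
feats = torch.load(f"egovlp_feats/{clip_uid}.pt", map_location="cpu")
print(clip_uid, tuple(feats.shape))

# The clip -> video mapping (video uid, start/end time) lives in the MQ metadata,
# e.g. the clip_annotations.json used by the VSGN codebase; the key names below
# are assumptions and may differ in your copy of the annotations.
with open("Evaluation/ego4d/annot/clip_annotations.json") as f:
    clip_annos = json.load(f)

info = clip_annos.get(clip_uid, {})
print(info.get("video_id"), info.get("clip_start_sec"), info.get("clip_end_sec"))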

@JunweiLiang
Author

I see. The file you provided on Google Drive is a .tar.gz file; I extracted it with tar -zxf and got 2034 *.pt files for the train/val part. I will try them.

@JunweiLiang
Author

So 0a8f6747-7f79-4176-85ca-f5ec01a15435 is the clip ID instead of the video ID? Could you provide the feature files for the whole videos, as in the VSGN baseline? It reads the features of the whole video and then cuts out the corresponding clip (see here). To follow your instructions, I would need these video-level features.

Thanks.

@QinghongLin
Collaborator

Yes, it is the clip ID. And sorry, I am currently unable to provide video-level features; a solution is to rewrite the data loader so that it supports clip features as input.
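
For anyone taking that route, here is a minimal sketch of what a clip-level loading step might look like in place of the original load-whole-video-then-crop logic. It is not the released code: the function name, the assumption that each *.pt stores a (T, D) tensor, and the linear resampling to a fixed temporal_scale are all illustrative choices.

import os
import torch
import torch.nn.functional as F

def load_clip_feature(feature_path, clip_uid, temporal_scale):
    # Each *.pt already covers exactly one clip, so no temporal cropping is needed.
    feats = torch.load(os.path.join(feature_path, f"{clip_uid}.pt"),
                       map_location="cpu").float()            # (T, D)
    # Resample along time to the model's fixed temporal_scale, mirroring what the
    # baseline does after cropping the clip out of the full video.
    feats = F.interpolate(feats.t().unsqueeze(0),              # (1, D, T)
                          size=temporal_scale,
                          mode="linear",
                          align_corners=False)
    return feats.squeeze(0).t()                                # (temporal_scale, D)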

@srama2512

@QinghongLin - Thanks for providing the clip features. I tried training the VSGN model using the Ego4D episodic-memory codebase instructions. But I'm not able to reproduce the val results from the paper. The numbers are quite a bit lower than the paper results (2nd row vs. 3rd row in the figure below).
[figure: table comparing the reproduced val results (2nd row) with the paper's results (3rd row)]

Here is the training command I used. Note: I modified the data loader to use clip features instead of video features.

 python Train.py \
     --use_xGPN \
     --is_train true \
     --dataset ego4d \
     --feature_path data/egovlp_feats_official \
     --checkpoint_path checkpoints/ \
     --tb_dir tb/ \
     --batch_size 24 \
     --train_lr 0.00005 \
     --use_clip_features true \
     --input_feat_dim 256 \
     --num_epoch 100

@QinghongLin
Collaborator

QinghongLin commented Sep 19, 2022

Hi @srama2512,
I released the codebase here: MQ.zip. You can check the data loader details regarding clip-level feature loading.
Besides, I was able to check the config parameters; could you try the following parameters?

{'dataset': 'ego4d',
 'is_train': 'true',
 'out_prop_map': 'true',
 'feature_path': '/mnt/sdb1/Datasets/Ego4d/action_feature_canonical',
 'clip_anno': 'Evaluation/ego4d/annot/clip_annotations.json',
 'moment_classes': 'Evaluation/ego4d/annot/moment_classes_idx.json',
 'checkpoint_path': 'checkpoint',
 'output_path': './outputs/hps_search_egovlp_egonce_features/23/',
 'prop_path': 'proposals',
 'prop_result_file': 'proposals_postNMS.json',
 'detect_result_file': 'detections_postNMS.json',
 'retrieval_result_file': 'retreival_postNMS.json',
 'detad_sensitivity_file': 'detad_sensitivity',
 'batch_size': 32,
 'train_lr': 5e-05,
 'weight_decay': 0.0001,
 'num_epoch': 50,
 'step_size': 15,
 'step_gamma': 0.1,
 'focal_alpha': 0.01,
 'nms_alpha_detect': 0.46,
 'nms_alpha_prop': 0.75,
 'nms_thr': 0.4,
 'temporal_scale': 928,
 'input_feat_dim': 2304,
 'bb_hidden_dim': 256,
 'decoder_num_classes': 111,
 'num_levels': 5,
 'num_head_layers': 4,
 'nfeat_mode': 'feat_ctr',
 'num_neigh': 12,
 'edge_weight': 'false',
 'agg_type': 'max',
 'gcn_insert': 'par',
 'iou_thr': [0.5, 0.5, 0.7],
 'anchor_scale': [1, 10],
 'base_stride': 1,
 'stitch_gap': 30,
 'short_ratio': 0.4,
 'clip_win_size': 0.38,
 'use_xGPN': False,
 'use_VSS': False,
 'num_props': 200,
 'tIoU_thr': [0.1, 0.2, 0.3, 0.4, 0.5],
 'eval_stage': 'all',
 'infer_datasplit': 'val'}
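
In case it is useful, here is a small sketch (mine, not from the released MQ.zip) for applying these values programmatically; it assumes config.txt contains exactly the Python-style dict pasted above.

import ast
from argparse import Namespace

# Assumes config.txt holds the Python-style dict pasted above.
with open("config.txt") as f:
    cfg = ast.literal_eval(f.read().strip())

opt = Namespace(**cfg)
print(opt.train_lr, opt.batch_size, opt.temporal_scale, opt.input_feat_dim)
# 5e-05 32 928 2304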

@srama2512

srama2512 commented Nov 6, 2022

@QinghongLin - Thanks for sharing your code and the hyperparameters. I was able to obtain a similar performance. It turns out that there was a bug in the test_mq.py feature-extraction code that I used. I modified test_mq.py to increase the batch size here to 128.

EgoVLP/run/test_mq.py

Lines 77 to 87 in dc4a60f

batch = 4
times = data['video'].shape[0] // batch
for j in range(times):
    start = j*batch
    if (j+1) * batch > data['video'].shape[0]:
        end = data['video'].shape[0]
    else:
        end = (j+1)*batch
    outs[start:end,] = \
        model.compute_video(data['video'][start:end,])

The calculation times = data['video'].shape[0] // batch does not work when data['video'].shape[0] is not a multiple of batch: the floor division silently drops the final partial batch. It gets much worse when we increase batch, leaving a larger residual set of all-zero features at the end. After changing that part of the code to the snippet below, it works as expected.

if data['video'].shape[0] % batch == 0:
    times = data['video'].shape[0] // batch
else:
    times = data['video'].shape[0] // batch + 1
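
Equivalently (my rewording, not part of the proposed fix), the whole loop can be written with ceil division, reusing data, outs, and model from the snippet above:

import math

batch = 128
n = data['video'].shape[0]
# Ceil division keeps the trailing partial batch instead of silently
# dropping it and leaving all-zero features at the end of outs.
times = math.ceil(n / batch)
for j in range(times):
    start = j * batch
    end = min((j + 1) * batch, n)
    outs[start:end, ] = model.compute_video(data['video'][start:end, ])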

Happy to send a PR if you'd like this bug fix to be part of the EgoVLP repo. The same pattern affects most of the test_*.py scripts and causes a significant issue if anyone increases batch.
