The original conference version of this work was accepted at EMNLP 2022, and the extended journal version has been accepted by TPAMI.
The reorganized How2-MCLS text data can be downloaded here [Baidu Netdisk, Passcode: a9df], along with the video features [Baidu Netdisk, Passcode: eqqj] (derived from the original How2 dataset). The original How2 dataset for multimodal summarization is provided at https://github.com/srvk/how2-dataset.
Demo data is provided in the "data/demo_data" folder; you can replace it with the full How2-MCLS dataset, following the same format. Then run the following command to preprocess the data. The code takes the Pt2En (Portuguese-to-English) scenario as an example.
python preprocess.py  # modify the data storage path configuration before running
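For orientation, here is a hypothetical example of what the path configuration to edit inside preprocess.py might look like; the variable names and directory layout are illustrative only, not the script's actual ones.

# Hypothetical path configuration in preprocess.py; the real variable
# names and layout in the script may differ.
DATA_DIR = "data/demo_data"          # text data; swap in the full How2-MCLS data
FEATURE_DIR = "data/video_features"  # video features from the Netdisk link above
OUTPUT_DIR = "data/processed"        # where preprocessed files are written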
After data preprocessing, you can run the following scripts to execute the training and prediction procedures of the proposed models.
VDF
bash run_scripts/VDF.sh
VDF-TS-E
bash run_scripts/VDF-TS-E.sh
VDF-TS-V
bash run_scripts/VDF-TS-V.sh
VDF-TS-E2, which replaces adaptive pooling distillation with language-adaptive warping distillation (LAWD); see the sketch after these script commands.
bash run_scripts/VDF-TS-E2.sh
VDF-TS-V2, which replaces adaptive pooling distillation with LAWD.
bash run_scripts/VDF-TS-V2.sh
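For intuition, below is a minimal PyTorch sketch contrasting the two distillation objectives on teacher/student hidden-state sequences of different lengths. It is an illustration under assumptions, not the repository's actual implementation: the function names and tensor shapes are hypothetical, and the SoftDTW module is the one shipped as soft_dtw_cuda.py in pytorch-softdtw-cuda.

import torch
import torch.nn.functional as F
from soft_dtw_cuda import SoftDTW  # soft_dtw_cuda.py from pytorch-softdtw-cuda

def adaptive_pooling_distillation(teacher_h, student_h):
    # Pool the teacher states (batch, T_t, dim) to the student length T_s,
    # then match the two sequences position-by-position with MSE.
    pooled = F.adaptive_avg_pool1d(teacher_h.transpose(1, 2), student_h.size(1))
    return F.mse_loss(pooled.transpose(1, 2), student_h)

def warping_distillation(teacher_h, student_h, gamma=0.1):
    # LAWD-style alternative: a differentiable (soft) DTW searches for a
    # monotonic alignment between the sequences instead of imposing the
    # fixed many-to-one mapping of pooling.
    sdtw = SoftDTW(use_cuda=teacher_h.is_cuda, gamma=gamma)
    return sdtw(student_h, teacher_h).mean()

# Example: a 60-step teacher sequence distilled into a 20-step student.
t, s = torch.randn(2, 60, 512), torch.randn(2, 20, 512)
loss = adaptive_pooling_distillation(t, s) + warping_distillation(t, s)

The design difference is that pooling fixes how teacher states are merged in advance, while the soft-DTW term lets the alignment adapt to length and word-order differences between languages.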
The nmtpytorch library is used to evaluate the models; it provides the BLEU (1, 2, 3, 4), ROUGE-L, METEOR, and CIDEr metrics. Alternatively, the nlg-eval library yields the same evaluation scores as nmtpytorch. In addition, the ROUGE library is used to compute the ROUGE (1, 2, L) scores.
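As a quick reference, here is a minimal scoring sketch using the nlg-eval and rouge pip packages; "pred.txt" and "ref.txt" are placeholder names for line-aligned hypothesis and reference files.

from nlgeval import compute_metrics  # pip install nlg-eval
from rouge import Rouge              # pip install rouge

# BLEU-1..4, METEOR, ROUGE-L, and CIDEr; embedding-based metrics disabled.
scores = compute_metrics(hypothesis='pred.txt', references=['ref.txt'],
                         no_skipthoughts=True, no_glove=True)

# ROUGE-1/2/L precision, recall, and F1, averaged over the corpus.
hyps = open('pred.txt').read().splitlines()
refs = open('ref.txt').read().splitlines()
print(Rouge().get_scores(hyps, refs, avg=True))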
Our code builds on MFN, nmtpytorch, fairseq, machine-translation, pytorch-softdtw-cuda, and Transformers; we are very grateful to their authors.
@inproceedings{liu2022assist,
  title={Assist non-native viewers: Multimodal cross-lingual summarization for how2 videos},
  author={Liu, Nayu and Wei, Kaiwen and Sun, Xian and Yu, Hongfeng and Yao, Fanglong and Jin, Li and Zhi, Guo and Xu, Guangluan},
  booktitle={Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing},
  pages={6959--6969},
  year={2022}
}
@article{liu2024multimodal,
  title={Multimodal Cross-lingual Summarization for Videos: A Revisit in Knowledge Distillation Induced Triple-stage Training Method},
  author={Liu, Nayu and Wei, Kaiwen and Yang, Yong and Tao, Jianhua and Sun, Xian and Yao, Fanglong and Yu, Hongfeng and Jin, Li and Lv, Zhao and Fan, Cunhang},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year={2024},
  note={Early Access},
  publisher={IEEE}
}