Releases: PaddlePaddle/PaddleVideo
PaddleVideo v2.1.0
Release Note
PaddleVideo v2.1.0 includes the following upgrades.
Framework
- Refactored model.framework so that the forward interface is the same for single-card and multi-card training.
- Refactored utils.inference to support prediction with different models.
- Added interfaces for automatic mixed precision training and distributed training.
Model
- PP-TSM
(1) Added training tricks that raise accuracy under the uniform sampling evaluation strategy from 73.5 to 74.54.
(2) Added a dense sampling training strategy. With distillation, accuracy reaches 76.16, which exceeds SlowFast with the same ResNet50 backbone.
- SlowFast
(1) Added the multigrid training acceleration strategy. Training 358 epochs on the Kinetics-400 dataset takes only 6.7 days using V100.
(2) Improved evaluation accuracy from 74.35 to 75.84.
- BMN
(1) Added inference support.
Dataset
- Provided download links for the Kinetics-400 dataset, available through Baidu Netdisk and through a download script.
Application
- FootballAction
(1) Replaced TSN with PP-TSM as the base feature model, improving accuracy from 84% to 94%.
(2) Precision and recall both improved substantially, and the F1 score rose from 0.57 to 0.82.
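The uniform and dense sampling strategies named in the PP-TSM items of the v2.1.0 notes differ in how frame indices are picked from a video. The sketch below illustrates the general idea as it is commonly described for TSN/TSM-style samplers. It is not taken from the PaddleVideo code base: the function names `uniform_sample` and `dense_sample`, the default `stride=8`, and the wrap-around behavior are all assumptions made for illustration.

```python
import random


def uniform_sample(num_frames, num_segments, train=False, rng=random):
    """Split the video into equal segments and pick one frame index per segment.

    Hypothetical helper: picks a random frame per segment when train=True,
    otherwise the middle frame of each segment.
    """
    seg_len = num_frames / num_segments
    indices = []
    for i in range(num_segments):
        start = int(i * seg_len)
        end = max(start + 1, int((i + 1) * seg_len))  # exclusive upper bound
        if train:
            idx = rng.randrange(start, end)
        else:
            idx = (start + end - 1) // 2
        indices.append(min(idx, num_frames - 1))
    return indices


def dense_sample(num_frames, num_segments, stride=8, start=0):
    """Pick num_segments frames spaced `stride` apart, beginning at `start`.

    Hypothetical helper: indices wrap around when the clip runs past the
    end of the video (an assumption, other implementations may clamp).
    """
    return [(start + i * stride) % num_frames for i in range(num_segments)]


uniform_indices = uniform_sample(80, 8)
dense_indices = dense_sample(80, 8, stride=8, start=3)
```

Uniform sampling covers the whole video with one frame per segment, while dense sampling covers a short window at a fixed frame stride, so several dense clips are usually needed to cover a long video.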
PaddleVideo v2.0.0
Release Note
PaddleVideo is implemented on the Paddle 2.0 dynamic graph. It uses a modular design that decouples functionality into separate components, so components can be combined, configured, and customized to build video models quickly.
Basic capabilities
- Various datasets. PaddleVideo supports more datasets, including Kinetics-400, UCF-101, YouTube-8M, and ActivityNet.
- Various architectures. PaddleVideo provides video classification models (TSN, TSM, SlowFast, AttentionLSTM) and an action localization model (BMN).
- Deployable. The full deployment pipeline is in place, powered by Paddle Inference.
Highlights
- Higher performance. PP-TSM, a 2D SOTA model based on the standard TSM, reaches 73.5% Top-1 accuracy on Kinetics-400. That is 3.5% higher than the standard TSM with the same number of parameters, and training and inference are faster.
- Faster training strategies. SlowFast training is 100% faster than the original implementation, and TSN+DALI speeds up training by 3.6x compared with the original implementation.
Featured applications
- VideoTag. A large-scale video classification model: a video tagging model pretrained on a dataset of tens of millions of videos, supporting 3,000 practical tags drawn from industry use.
- FootballAction. A football action detection algorithm that locates the start and end time of each football action in a video and identifies its category.
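The v2.0.0 notes describe a modular design in which components are combined and configured to assemble a model. One common way to achieve this is a registry that maps names in a config to classes. The sketch below shows that pattern in plain Python. It is an illustration only and does not reproduce PaddleVideo's API: `Registry`, `BACKBONES`, `HEADS`, `ToyBackbone`, `ToyHead`, and `build_model` are hypothetical names, and the config keys are assumptions.

```python
class Registry:
    """Hypothetical registry mapping a class name to the class itself."""

    def __init__(self, name):
        self.name = name
        self._classes = {}

    def register(self, cls):
        # Used as a decorator: store the class under its own name.
        self._classes[cls.__name__] = cls
        return cls

    def build(self, cfg):
        # Copy so the caller's config is not modified.
        cfg = dict(cfg)
        cls = self._classes[cfg.pop("name")]
        return cls(**cfg)


BACKBONES = Registry("backbone")
HEADS = Registry("head")


@BACKBONES.register
class ToyBackbone:
    def __init__(self, depth=50):
        self.depth = depth


@HEADS.register
class ToyHead:
    def __init__(self, num_classes=400):
        self.num_classes = num_classes


def build_model(cfg):
    """Assemble a toy model (a plain dict here) from a nested config."""
    return {
        "backbone": BACKBONES.build(cfg["backbone"]),
        "head": HEADS.build(cfg["head"]),
    }


config = {
    "backbone": {"name": "ToyBackbone", "depth": 50},
    "head": {"name": "ToyHead", "num_classes": 400},
}
model = build_model(config)
```

With this layout, swapping a component means changing a name in the config rather than editing the model code, which is the kind of decoupling the release note refers to.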