v0.14.0
panyx0718 released this 03 Jul 08:29
Release Log
Major Features
- Enhanced the inference library: better memory buffer and several new demos.
- The inference library now supports the Anakin and TensorRT engines.
- ParallelExecutor now supports multi-threaded CPU training, in addition to multi-GPU training.
- Added mean IoU operator, argsort operator, etc. Improved the L2norm operator. Added a crop API.
- Released pre-trained ResNet50, Se-Resnext50, AlexNet, etc. Enhanced Transformer, etc.
- New data augmentation operators.
- Major documentation and API comment improvements.
- Enhanced the continuous evaluation system.
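For readers unfamiliar with two of the operators named in the list above, here is a minimal pure-Python sketch of what a mean IoU and an argsort computation typically return. It is an illustration only, not the library's implementation: the function names, argument order, and the choice to average only over classes that actually occur are assumptions made for this example.

```python
def mean_iou(pred, label, num_classes):
    """Mean intersection-over-union, averaged over classes that occur.

    Illustrative only: averaging over occurring classes is an assumption.
    """
    correct = [0] * num_classes  # per-class intersection (true positives)
    wrong = [0] * num_classes    # per-class false positives + false negatives
    for p, t in zip(pred, label):
        if p == t:
            correct[p] += 1
        else:
            wrong[p] += 1  # false positive for the predicted class
            wrong[t] += 1  # false negative for the true class
    # IoU = intersection / union, where union = correct + wrong
    ious = [c / (c + w) for c, w in zip(correct, wrong) if c + w > 0]
    return sum(ious) / len(ious) if ious else 0.0


def argsort(values):
    """Return (sorted values, original indices) in ascending order."""
    indices = sorted(range(len(values)), key=lambda i: values[i])
    return [values[i] for i in indices], indices


if __name__ == "__main__":
    print(mean_iou([0, 1, 1, 2], [0, 1, 2, 2], num_classes=3))
    print(argsort([3.0, 1.0, 2.0]))
```

The index list returned by `argsort` is what makes it useful beyond plain sorting: it can be used to reorder a second sequence (for example labels) consistently with the sorted scores.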
Performance Improvements
- More overlap of distributed training network operations with computation (~10% improvement).
- CPU performance improvements with more MKLDNN support.
Major Bug Fixes
- Fix memory leak issues.
- Fix concat operator.
- Fix ParallelExecutor input data memcpy issue.
- Fix ParallelExecutor deadlock issue.
- Fix distributed training client timeout.
- Fix distributed training pserver side learning rate decay.
- Thread-safe Scope implementation.
- Fix some issues when using the memory optimizer and ParallelExecutor together.
Known Issues
- IfElse has some bugs.
- BatchNorm is not stable if batch_size=1.
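The release notes do not say why BatchNorm misbehaves at batch_size=1. As general background, the sketch below shows one well-known property of batch normalization over a single feature: with one sample, the batch mean equals the sample and the variance is zero, so the normalized output carries no information about the input. The helper `batch_norm_1d` is hypothetical, written for this illustration, and omits the learned scale and shift; it is not the library's operator and may not reflect the specific issue referred to above.

```python
import math


def batch_norm_1d(batch, eps=1e-5):
    """Normalize one feature across a batch using batch statistics.

    Hypothetical helper for illustration; no learned scale/shift.
    """
    n = len(batch)
    mean = sum(batch) / n
    var = sum((x - mean) ** 2 for x in batch) / n  # biased batch variance
    return [(x - mean) / math.sqrt(var + eps) for x in batch]


if __name__ == "__main__":
    # With two samples the output reflects their relative positions.
    print(batch_norm_1d([1.0, 3.0]))
    # With a single sample the output is zero regardless of the input.
    print(batch_norm_1d([3.7]))
    print(batch_norm_1d([-100.0]))
```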