
Lei Wang

wangkuiyi

helinwang

tonyyang-svail

weixing

abhinavarora

CPPLint Progress

Directory                     Develop   15-Mar
fluid/framework                    29      227
fluid/framework/details             0        5
fluid/inference                     0       15
fluid/inference/tensorrt           14      N/A
fluid/memory                        0        2
fluid/operators                     0      303
fluid/operators/reader              0        8
fluid/operators/concurrency        0      N/A
fluid/operators/math              328      369
fluid/operators/detail              0       29
fluid/operators/nccl                2        2
fluid/platform                      0      155
fluid/pybind                        0       41
fluid/recordio                      0       18
fluid/string                        0        7

Chenxi

kexinzhao

Qingsheng Li

Xin Pan

luotao

wuyi

Baiyifan

tangwei

fengjiayi

Yu Yang

  • Fix a critical bug in the dynloader
  • Find a critical bug in the GPU memory allocator and memcpy
    • We found that we cannot synchronize a stream if we invoke cudaMemcpyAsync on CPU memory that was allocated by malloc rather than cudaMallocHost. It is suggested to use cudaMallocHost to allocate CPU memory when that memory is used for CPU <--> GPU communication (a minimal sketch follows this list).
    • After changing malloc to cudaMallocHost, we found that many memory copies are not synchronized. This is a critical bug for Paddle and a key reason our training process is unstable.
    • Currently, we added a cudaMemcpySync API to avoid the bug when feeding/fetching data. Resolving this bug thoroughly will take a week or longer.
  • Add a demo for parallel executor + reader to train and test a program
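
A minimal CUDA/C++ sketch of the pinned-memory point above (illustrative only, not Paddle's allocator code): cudaMemcpyAsync is only truly asynchronous when the host buffer is page-locked via cudaMallocHost; with plain malloc the copy may degrade to a staged, effectively synchronous transfer, which is why synchronizing the stream did not behave as expected.

```cpp
#include <cuda_runtime.h>
#include <cstdio>

int main() {
  const size_t n = 1 << 20;
  float* h_pinned = nullptr;
  float* d_buf = nullptr;
  cudaStream_t stream;

  cudaStreamCreate(&stream);
  // Page-locked (pinned) host memory instead of plain malloc, so the copy
  // below can be genuinely asynchronous with respect to the host.
  cudaMallocHost(&h_pinned, n * sizeof(float));
  cudaMalloc(&d_buf, n * sizeof(float));

  for (size_t i = 0; i < n; ++i) h_pinned[i] = 1.0f;

  // Enqueue the host -> device copy on the stream; with pinned memory this
  // call returns immediately and the transfer overlaps with host work.
  cudaMemcpyAsync(d_buf, h_pinned, n * sizeof(float),
                  cudaMemcpyHostToDevice, stream);

  // Synchronizing the stream now guarantees the copy has finished; with a
  // pageable (malloc'ed) buffer this guarantee is much weaker.
  cudaStreamSynchronize(stream);
  printf("copy finished\n");

  cudaFree(d_buf);
  cudaFreeHost(h_pinned);
  cudaStreamDestroy(stream);
  return 0;
}
```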

gongweibao

wanghaoshuang

Dang Qingqing

zhaochengduo

Liu Yiqun

yangyaming

qiaolongfei

Todo

  • Do more benchmarking of async training

Yan Xu

dongzhihong

Yibing Liu

Fluid2onnx converter:

guosheng

Yan Chunwei

daming-lu

cs2be(thuan)

sidgoyal78

jetfuel(Jeff)

Nicky
