sync master #39

qingshui · 2022-05-19T08:40:53Z

PR types

PR changes

Describe

* extract sub-graph * graph-engine merging * fix * fix * fix heter-ps config * test performance * test performance * test performance * test * test * update bfs * change cmake * test * test gpu speed * gpu_graph_engine optimization * add dsm sample method * add graph_neighbor_sample_v2 * Add graph_neighbor_sample_v2 * fix for loop * add cpu sample interface * fix kernel judgement * add ssd layer to graph_engine * fix allocation * fix syntax error * fix syntax error * fix pscore class * fix * change index settings * recover test * recover test * fix spelling * recover * fix * move cudamemcpy after cuda stream sync * fix linking problem * remove comment * add cpu test * test * add cpu test * change comment * combine feature table and graph table * test * test * pybind * test * test * test * test * pybind * pybind * fix cmake * pybind * fix * fix * add pybind * add pybind * optimize pybind * test * fix pybind * fix * pybind change * remove file Co-authored-by: DesmonDay <908660116@qq.com>


…llcases (#42285) * test op_test test=allcases * fix * avoid copy many same file * fix for win * test PYTHONPATH * change path adding way * fix win * use old way * use old way test=allcase * use old way test=allcase

* [KP] fix bug when phi kernel is *_raw * modify the static graph * delete useless comment * delete the phi multiply kernel case * add VLOG(3) message * add VLOG(3) message * fix static graph error in phi * fix bug in tranform model * modify the comment * delete useless code * fix CI bug * fix CI bug


* back fl * delete ssl cert * . * make warning * . * unittest paral degree * solve unittest * heter & multi cloud commm ready * . * . * arm_brpc compile * . * . * . * . * . * . * . * . * . * . * . * . * . * . * only output is ok * base is ok * . * . * . * . * . * . * . * . * add switch server bin * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * adapt brpc ssl * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * . * fix heter_server & heter_client * . * . * int->int64_t * . * safe map in multithread * fix heter unitest * . * fix code_style * .


* set scipy and numpy version suit for py3.6 * pynacl1.5.0 which is needed by PyGithub built failed in python36, change it to 1.4.0 also works;test=document_fix;test=windows_ci * np.corrcoef support parameter since 1.20 * delete test code


…ion mechanism (#41919) * Updated triple_grad_check func * add todo for gradient checker and refine some comments * remove additional code * add test for warnging in backward.py * format python code * support multi input in triple gradient checker * Add matmul triple grad kernel * Updated comments of TODO * Supported some special tests * Change code-format to follow CI std * Updated gradient_checker.py * Fix conflicts * Removed unnecessary printing log * Change code style to follow CI std * merge upstream * add priops.py * add_p * rm useless files * add sub_p mul_p div_p * add sqrt_p and tanh_p * add reshape_p * add broadcast_p * Add python primitive wrappers. * Jvp rules updated. * JVP rules done for all the 17 primops. * quick check and fixes. * add jvp(op, *args) * add broadcast_p fill_constant_p matmul_p reduce_p reshape_p transpose_p * add split_p and concat_p * add gather_p and scatter_add_p * add slice_select_p and slice_assign_p * Add transpose rules. * add multi input check for add_p, sub_p, mul_p, div_p * update concat_p * Linearize and transpose in progress.. * refine gather_p and scatter_add_p * updated. * update transpose. * refine slice_assign_p and slice_select_p * init commit for lower * Merged with primitive ops. * small update * add rules for orig2prim and prim2orig * add 9 test for prim ops * add more test and fix some bug * add more test * register proto * Adding primops test. * add shape valid check for broadcast_p op, and add keepdim attr into reduce_p op proto * support multi input and multi output for split_p and concat_p * Test updated. * update * fix slice bug for slice_select_p and slice_assign_p * updated. * Ops updated. * Refactor and bug fixes. * updated. * finish orig2prim and prim2orig rules * dtype for axis attr should be long int * update dtype for axis attr int64_t * update for iscan CI * Update primx. * Refactor vars in primx. 
* update for lower transform * add more shape and dtype check * update primx.py * change IndexTensor into int32 dtype * update * Fix linearize and transpose. * Update is_dot * Update is_dot * Update is_dot * add gradient aggregation, fix add_transpose. * pass first linearize+transpose test. * update test * refactor op registration and primx. * update rule for slice_assign * try test lower * update orig2prim and prim2orig * pass simple lower pass * update * Update input types in the unit test. * orig2prim segfault. * 50% for adam.minimize * test updated. * temp fix erros in removing vars. * primx updated. * update for matmul_v2 and reshape2 orig2prim * update for minimize * Refine primrules * Remove some code * supporting unused and unreachable vars. * update for use prim2orig in minimize * fix gather and scatter_add transpose. * Add rules UT * update scatter_add * Refine UT code * fix nonetype check in topo * Update gather_p pywrapper. * remove useless print * Merge tongxin PR and refine code * readd some test * rm useless print * polish code. * fix bug in minimize * add get_input_var_list and get_output_var_list and use it in lower * Fix scatter_add_p prim2orig * Update code and fix orig2prim/prim2orig UT * delete vars after block.desc._remove * Improve ops and vars clean up logics. * fix some bug in linearize and lower * update tanh transpose. * use set instead of list for var2remove * test updated. * polish code. * fix dot2bar delete. * merge tx/ad * add indextensor_dot for gather and scatter_add * add sorted for set * Fix scale_orig2prim params * fix some syntax bug * add golbal_lower_update list * Better handling of unused vars. * update tests. 
* Fix elementwise_sub orig2prim * support none for transpose rule * Merge and add transform UT * fix a bug in transpose * Fix transpose and UT * a hacky fix for cancat op * Fix exector place * Refine variable name * Add elementwise_mul orig2prim and support p_norm when p=1 * Add sqrt orig2prim rule and UT * merge wz test * rename files, add enable_prim, disable_prim, prim_enabled, delete global_lower_update * fix a bug in test_ad_transform_trans * revert modify in framework.py * add paddle.fluid.incubate.ad_transform to python/setup.py.in * Fix remove vars error * Fix p_norm_orig2prim * merge wz * Modify the code directory * Add utils.py and remove get_input/output_vars functions * Update maolin code * Rename UT and refine test_ad_transform_primops * Fix div_p jvp rule * Add higher derivatives UT * Remove UT to autograd dir * Fix comments * import paddle in primops.py * Add some error message for assert * Refine UT class name and refine some comments in primreg.py * update minimize of paddle/optimizer for supporting new autograd * resolve cicular importing between backward.py and optimizer.py * fill gradients and minimize unittest * Replace `assert isinstance` with `raise TypeError` * Add some assert message for primx.py * Polish variable name * Add some assert message * add some docstring * refine some name * update the format of english documents * Split test_transform.py to two files to avoid ci error * fix the document format of enable_prim/disable_prim/prim2orig/prim_enabled * polish test_gradients_and_minimize * add default value for prim_enabled api doc * Remove some UT to avoid windows ci error * Enlarge test_gradients_and_minimize limit time * Fix ut limit time Co-authored-by: veyron95 <veyron_wu@163.com> Co-authored-by: Jiabin Yang <360788950@qq.com> Co-authored-by: levi131 <limaolin01@baidu.com> Co-authored-by: Tongxin Bai <waffle.bai@gmail.com> Co-authored-by: Xiaoxu Chen <chenxx_id@163.com> Co-authored-by: levi131 
<83750468+levi131@users.noreply.github.com>


* enable graph-engine to return all id * change vector's dimension * change vector's dimension * enlarge returned ids dimensions * add actual_val * change vlog * fix bug * bug fix * bug fix * fix display test * singleton of gpu_graph_wrapper * change sample result's structure to fit training * recover sample code * fix * secondary sample * add graph partition * fix pybind * optimize buffer allocation * fix node transfer problem * remove log * support 32G+ graph on single gpu * remove logs * fix * fix * fix cpu query * display info * remove log * remove empyt file * distribute labeled data evenly in graph engine Co-authored-by: DesmonDay <908660116@qq.com>


* change the output format of C++ backward api * fix merge conflict * fix sparse api code auto-gen * fix eager_gen bug * fix bug of output is null * fix bug of conv2d_grad_impl * fix optional grad * fix bug of eager-gen double_grad * fix bug * fix multiply_double_grad bug * fix bug of higher order derivative * fix bug of FillZeroForEmptyGradInput * remove redundant vector in grad_node * fix bug of test_deformable_conv_v1_op * fix bug of test_deformable_conv_v1_op * some refacotr


…addle#46342) * add extra attr property set * add type_info for all context * add onednn context to all context * fix context compile error * simplify conv kernel args * pass runtime attr into dev_ctx * fix marco error * clear conv_grad_kernel extra args * merge conv_grad_grad into conv_grad * clear conv2d_grad_grad extra attrs * clear yaml and eager extra attr * fix conv1d error * change to thread local * fix npu compile failed * try to fix windows compile failed * add conv2d onednn phi kernel * fix ci bugs (#36) * fix compile bugs (#38) * fix extra input transform bug (#39) * support dynamic created attr (#40) * reset extra info gen code * rm conv_grad_grad kernel * reimpl pass attr adapting * add int attr support * remove vector inputnames creating * fix map at error * Update paddle/phi/kernels/onednn/conv_grad_kernel.cc Co-authored-by: Sławomir Siwek <slawomir.siwek@intel.com> * remove useless extra attrs * replace mkldnn_engine by onednn_engine Co-authored-by: YuanRisheng <yuanrisheng@baidu.com> Co-authored-by: Sławomir Siwek <slawomir.siwek@intel.com>

* add extra attr property set * add type_info for all context * add onednn context to all context * fix context compile error * simplify conv kernel args * pass runtime attr into dev_ctx * fix marco error * clear conv_grad_kernel extra args * merge conv_grad_grad into conv_grad * clear conv2d_grad_grad extra attrs * remove redundant imports * migrate softmax * clear yaml and eager extra attr * fix conv1d error * change to thread local * fix npu compile failed * try to fix windows compile failed * add conv2d onednn phi kernel * fix ci bugs (#36) * fix compile bugs (#38) * fix extra input transform bug (#39) * support dynamic created attr (#40) * reset extra info gen code * rm conv_grad_grad kernel * reimpl pass attr adapting * add int attr support * remove vector inputnames creating * merge dev * fix map at error * adjust attribute * adapt funcs to PHI Co-authored-by: Chen Weihang <chenweihang@baidu.com> Co-authored-by: YuanRisheng <yuanrisheng@baidu.com>

* add extra attr property set * add type_info for all context * add onednn context to all context * fix context compile error * simplify conv kernel args * pass runtime attr into dev_ctx * fix marco error * clear conv_grad_kernel extra args * merge conv_grad_grad into conv_grad * clear conv2d_grad_grad extra attrs * remove redundant imports * migrate softmax * clear yaml and eager extra attr * fix conv1d error * change to thread local * fix npu compile failed * try to fix windows compile failed * add conv2d onednn phi kernel * fix ci bugs (#36) * fix compile bugs (#38) * fix extra input transform bug (#39) * support dynamic created attr (#40) * reset extra info gen code * rm conv_grad_grad kernel * reimpl pass attr adapting * add int attr support * remove vector inputnames creating * merge dev * fix map at error * adjust attribute * adapt funcs to PHI * init * adjust imports * support postops * format codeblocks * revert changes to softmax Co-authored-by: Chen Weihang <chenweihang@baidu.com> Co-authored-by: YuanRisheng <yuanrisheng@baidu.com>

* add extra attr property set * add type_info for all context * add onednn context to all context * fix context compile error * simplify conv kernel args * pass runtime attr into dev_ctx * fix marco error * clear conv_grad_kernel extra args * merge conv_grad_grad into conv_grad * clear conv2d_grad_grad extra attrs * clear yaml and eager extra attr * fix conv1d error * change to thread local * fix npu compile failed * try to fix windows compile failed * add conv2d onednn phi kernel * fix ci bugs (#36) * fix compile bugs (#38) * fix extra input transform bug (#39) * support dynamic created attr (#40) * reset extra info gen code * rm conv_grad_grad kernel * reimpl pass attr adapting * add int attr support * remove vector inputnames creating * fix map at error * Update paddle/phi/kernels/onednn/conv_grad_kernel.cc Co-authored-by: Sławomir Siwek <slawomir.siwek@intel.com> * remove useless extra attrs * replace mkldnn_engine by onednn_engine * Migrate pool+grad to PHI * Update paddle/fluid/operators/mkldnn/test_mkldnn_op_nhwc.cc Co-authored-by: Sławomir Siwek <slawomir.siwek@intel.com> * Update paddle/phi/kernels/onednn/pool_grad_kernel.cc Co-authored-by: Sławomir Siwek <slawomir.siwek@intel.com> * Update paddle/phi/kernels/onednn/pool_kernel.cc Co-authored-by: Sławomir Siwek <slawomir.siwek@intel.com> Co-authored-by: Chen Weihang <chenweihang@baidu.com> Co-authored-by: YuanRisheng <yuanrisheng@baidu.com> Co-authored-by: Chen Weihang <chenwhpro@163.com> Co-authored-by: Sławomir Siwek <slawomir.siwek@intel.com>

tiancaishaonvjituizi and others added 30 commits April 27, 2022 15:19

fix sparse csr (#42271)

b9bfcf1

Optimize performance of dygraph (v4) (#42196)

37e2f02

* optimize performance of dygraph * optimize performance of dygraph and elementwise_add * optimize the trace op * fix bug * fix bug * fix unittest bug * fix code format

inplace addto (#42313)

748d2ae

fix bug (#42314)

00ed8b5

Fix the race condition in cumsum operator (#42205)

5d72945

* Fix the race condition in cumsum operator * Optimize cumsum operator

fix collections.Iterable in python3.10 (#42295)

3d6fb26

fix gcc warning of [-Wint-in-bool-context] (#42268)

cf78009

implement autotune python API (#42299)

2094a58

Added missing test for shuffle_channel_mkldnn_detect_pass (#42001)

5134f11

* added test for shuffle_channel_mkldnn_detect_pass * added UT using new framework * CI fix

fix collections.Sequence in python3.10 (#42242)

edb61a5

* fix collections.Sequence in python3.10 * fix format
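For context on the two python3.10 commits above: the bare `collections.Sequence` and `collections.Iterable` aliases were removed in Python 3.10, and the abstract base classes are available from `collections.abc` instead. The sketch below is not the code from the commit; `is_sequence` is a hypothetical helper that only illustrates the portable spelling.

```python
import collections.abc


def is_sequence(value):
    """Return True for list/tuple-like values, excluding str and bytes.

    Hypothetical helper for illustration; not taken from the commit.
    """
    # str and bytes are technically sequences, but callers that accept
    # "a list or tuple of items" usually want to treat them as scalars.
    if isinstance(value, (str, bytes)):
        return False
    # collections.abc.Sequence works on every Python 3 release, while the
    # bare collections.Sequence alias no longer exists on Python 3.10+.
    return isinstance(value, collections.abc.Sequence)
```

The same substitution applies to `Iterable`, `Mapping`, and the other abstract base classes.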

set device id of Place() to get GPUContext needed by LimitGridDim in ElemwiseGradBroadcast (#42320)

22d3c56

* set device id of Place() to get GPUContext needed by LimitGridDim in ElemwiseGradBroadcast * fix code style

polish attr get impl (#42337)

b972b0d

[Performance]Add static inline for MakeReturnPyObject (#42334)

2e1fb26

fix fused_multi_transformer compile failed in cuda arch < sm53 (#42315)

f450797

fix PIL sample mode deprecated warning (#42307)

c7a258f

* fix PIL sample mode deprecated warning * compatible with old pil version
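One common way to stay compatible with both old and new Pillow releases is to look the filter up on `Image.Resampling` when that enum exists and fall back to the module-level constant otherwise. The sketch below assumes that approach; it is not the code from the commit, `resolve_filter` is a hypothetical helper, and the two `SimpleNamespace` objects are stand-ins for `PIL.Image` so the example runs without Pillow installed.

```python
from types import SimpleNamespace


def resolve_filter(image_module, name):
    """Look up a resampling filter by name on old and new Pillow layouts.

    Hypothetical helper for illustration; not taken from the commit.
    """
    # Newer Pillow groups the filters under Image.Resampling; older
    # releases only expose them as module-level constants.
    container = getattr(image_module, "Resampling", image_module)
    return getattr(container, name)


# Stand-ins for PIL.Image (assumption: used only to keep the sketch
# self-contained). 2 is the integer value Pillow uses for BILINEAR.
old_image = SimpleNamespace(BILINEAR=2)
new_image = SimpleNamespace(Resampling=SimpleNamespace(BILINEAR=2), BILINEAR=2)

old_filter = resolve_filter(old_image, "BILINEAR")
new_filter = resolve_filter(new_image, "BILINEAR")
```

With Pillow installed, the same helper would be called as `resolve_filter(PIL.Image, "BILINEAR")` and the result passed to `Image.resize`.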

[CustomDevice] add amp support (#42035)

acbb5db

Add gradient merge for DistributedFusedLamb optimizer (#40177)

108aeb2

* add gradient merge for DistributedFusedLamb * use master acc gradient * fix CI ut * polish * remove math_function_impl.h change * fix test_update_loss_scaling_op.py * try to fix XPU/NPU CI * add gm ut

fix error report. (#42333)

afa846d

Bfloat16 refactor (#42238)

8ad3870

* Refactor Quantization * Refactor Dequantization * Classy solution * Style I * Style II * Style III * Use VLOG(4) for debug info * Style IV

fix FusedResidualDropoutBias nan in v100 (#42344)

687219f

Support more scenes for fused_fast_ln (#42282)

7cb4953

* Support more scenes for fused_fast_ln * fix

Optimize attribute selected performance (#42294)

5063546

* opt attr eaque perf * opt attr select code * fix one hot infermeta * polish get attr impl * fix tests failed * add testcases

optimize the pybind in dygraph (#42343)

7f14f78

[Dy2Stat]Fix losing pre/post hook from outermost layer while jit.save (#42273)

27cf7af

* [Dy2Stat]Fix losing pre/post hook from outermost layer while jit.save * fix kwargs * fix unittest

Using small vector for slot and merge edge into grad_slot_meta (#42350)

2bee99d

Add some double/triple grad kernel yaml file (#42361)

24ec6ed

* add double yaml * add inline func

betterpig and others added 24 commits May 18, 2022 11:15

Fix graph hang (#42768)

133d63f

* fix device_free * fix hang

[collective] dynamic shape for send_v2 and recv_v2 (#42765)

1f64c42

Add return in initial function (#42823)

bebaee3

fix summary trainable_params bug (#42798)

e33b9db

* fix summary trainable_params bug

[NPU] add take_along_axis and take_along_axis_grad kernels (#42773)

6f0a28f

* [NPU] add take_along_axis and take_along_axis_grad ops * [NPU] add take_along_axis and take_along_axis_grad ops * fix ut because cpu kernel can not be fallbacked

matmul and matmul_v2 refactor (#42732)

570d032

* matmul refactor * remove UT which only check ENFORCE output * code format * improve memory usage

[Eager] Polish eager code generation (#42822)

b9342a8

* [Eager] Polish eager code generation * Remove useless code in codegen

Add Code Generation for operators, op makers and argument mapping functions (#41772)

e339d3c

fix tensorrt dla int8 problem (#42826)

a51817d

[Dy2Stat]Modify all jit.save path into tempfile under dygraph_to_static directory (#42842)

16ce33b

* [Dy2Stat]Modify all jit.save path into tempfile * [Dy2Stat]Modify all jit.save path into tempfile
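The idea behind this commit, writing test artifacts into a temporary directory instead of a fixed path, can be sketched with the standard library alone. In the example below `save_model` is a hypothetical stand-in for a `jit.save`-style call and the `.model` suffix is made up for illustration; only `tempfile.TemporaryDirectory` is the real API being shown.

```python
import os
import tempfile


def save_model(path_prefix):
    """Hypothetical stand-in for a jit.save-style call: writes one file."""
    target = path_prefix + ".model"
    with open(target, "w") as f:
        f.write("placeholder")
    return target


def run_save_test():
    # TemporaryDirectory creates a unique directory and removes it, with
    # everything inside, when the with-block exits. Tests therefore leave
    # nothing behind and cannot collide with each other on a shared path.
    with tempfile.TemporaryDirectory() as tmp_dir:
        saved = save_model(os.path.join(tmp_dir, "model"))
        existed_during_test = os.path.exists(saved)
    existed_after_test = os.path.exists(saved)
    return existed_during_test, existed_after_test
```

In a `unittest.TestCase` the same effect is usually achieved by creating the directory in `setUp` and calling its `cleanup()` method in `tearDown`.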

[AutoParallel] split data in dataloader (#42838)

df47095

* slice data in dist_loader & flag to scale grad * bug fix * update unittest * enable static

Fix API Docs bug (#42816)

9f4d342

* update readme test=document_fix * fix api docs;test=document_fix * update logic.py;test=document_fix * update docs;test=document_fix

[TensorRT] Support yolov5s (#42688)

a777893

* support yolov5s static/int8 * fix eltwise_sub and div weight compute * fix delete_fill_constant_pass

[Auto Parallel] Support Primitive operators with Data Parallel (#42709)

6b8efc4

* auto parallel support primitive op with data parallel * add primitive change * 5 loss 3D cylinder acc aligned * add unitest

[CompileOpt] Refine enforce code and remove boost/variant include (#41093)

ca359fe

* refine enforce code * refine enforce code * fix compile failed * fix infrt failed

Fix typos in the comment doc of SimpleRNN, LSTM, GRU: hidden_size -> input_size. (#42770)

155fe05

test=document_fix

[Phi] Remove shared_storage (#42821)

7a171e3

* remove shared_storage * fix bug * fix rnn bug

【GPUPS】add ctr_dymf_accessor for pscore (#42827)

148582f

[NPU] minor changes for version control to support version without suffix (#42856)

892f685

【CI】run all demo ci before exit in windows (#42700)

6d0e4e4

* run all demo ci before exit;test=document_fix;test=windows_ci_inference * fix bug;test=document_fix;test=windows_ci_inference * improve log * comment test code * modify according to zhouwei's comments

qingshui merged commit 09dac41 into qingshui:develop May 19, 2022

qingshui pushed a commit that referenced this pull request Jun 29, 2022

fix adam with multi dim (#39)

7263442

* fix adam with multi dim; test=develop
