update #4

AnnaTrainingG · 2021-05-07T03:04:15Z

PR types

PR changes

Describe

…to develop (#32294) * [NPU] support GarbageCollector for npu (#31874) * support GarbageCollector for npu * fix typo * fix gather_grad * disable NPUDefaultStreamGarbageCollector on NPU * [NPU] support npu for memcpy op (#31808) * support npu for memcpy op * add ut * fix ut * fix typo * 【NPU】fix bug of using temp vector (#31963) * fix bug when beta1_pow on cpu (#31995) * [NPU] support npu profiler (#31684) * support npu profiler * add python api * fix bugs * add wrapper for incomplete type * update profile proto * record npu wait * add xpu placeholder * fix adam (#32016) * [NPU] enable async copy and add wait before sync operation (#31956) * enable async copy and add wait before sync operation * remove unneccessary wait * add FillNpuTensorWithConstant * refine * fix fill_constant * make TensorFromVector/TensorToVector sync * [NPU] Support dataloader on npu place. (#31867) * [NPU] Wait on NPUPlace (#32086) * [NPU] fix cast op (#32121) * fix npu kernel of cast op to handle casting to same dtype * add comments * [NPU] support cann 20.3 (#32044) * fix compile problem on cann 20.3 * fix ut * fix test_mul * fix check_finite_and_scale * fix lookup_table_v2_grad * fix cmake * support print op * [NPU] Support npu save load (#31893) * support save load for NPU * add save load npu unittest * support np.array transform in NPU * fix errors * delete dygraph in unittest * add Wait * fix unittest * fix review comment * fix unittest problem * fix little problem * change aclrtSynchronizeDevice to aclrtSynchronizeStream for better performance (#32196) * change aclrtSynchronizeDevice to aclrtSynchronizeStream for better performace * refine code * fix NPUDeviceContext in all c++ unittest (#32198) * fix NPUDeviceContext in all c++ unittest * refine log Co-authored-by: pangyoki <pangyoki@126.com> * [NPU] Remove TensorFromVector and avoid sync copy in npu op kernel for better performance (#31994) * enable async copy and add wait before sync operation * remove unneccessary wait * add 
FillNpuTensorWithConstant * refine * fix fill_constant * change TensorFromVector to FillNpuTensorWithConstant * fix ignored api * delete extra unittest * fix little error * fix update_loss_scaling_op_npu and check_finite_and_unscale_op_npu * change TensorCopySync to TensorCopy * delete useless Wait and add StreamWait * fix npu_stream error * fix check_finite_and_unscale_op_npu TensorCopy * only save stream wait * fix NPUDeviceContext in all c++ unittest * delete wait Co-authored-by: zhiqiu <chenqiuliang@baidu.com> * delete useless unittest file (#32206) * Fix op test (#32231) * fix conditional block (#32243) * fix adam bug again (#32246) * fix compile * fix ut * fix ut Co-authored-by: liym27 <33742067+liym27@users.noreply.github.com> Co-authored-by: pangyoki <pangyoki@126.com>

Change-Id: Ie35a09772e46f7d90cb68ca82c1d18b9201d1abe * large scale kv store optimize Change-Id: I582cc661afdaa20749ec7493eae1b88c32b967f7 * replace std::unorded_map with roundrobin map Change-Id: I48ee0efef38853876c92d982cdfcac6603c52c88 * remove license * fix cpp lint Change-Id: Ia21fafa65adc09bb9094f7dbc987e31d5af2686e

* graph engine demo * upload unsaved changes * fix dependency error * fix shard_num problem * py client * remove lock and graph-type * add load direct graph * add load direct graph * add load direct graph * batch random_sample * batch_sample_k * fix num_nodes size * batch brpc * batch brpc * add test * add test * add load_nodes; change add_node function * change sample return type to pair * resolve conflict * resolved conflict * resolved conflict * separate server and client * merge pair type * fix * resolved conflict * fixed segment fault; high-level VLOG for load edges and load nodes * random_sample return 0 * rm useless loop * test:load edge * fix ret -1 * test: rm sample * rm sample * random_sample return future * random_sample return int * test fake node * fixed here * memory leak * remove test code * fix return problem * add common_graph_table * random sample node &test & change data-structure from linkedList to vector * add common_graph_table * sample with srand * add node_types * optimize nodes sample * recover test * random sample * destruct weighted sampler * GraphEdgeBlob * WeightedGraphEdgeBlob to GraphEdgeBlob * WeightedGraphEdgeBlob to GraphEdgeBlob * pybind sample nodes api * pull nodes with step * fixed pull_graph_list bug; add test for pull_graph_list by step * add graph table;name * add graph table;name * add pybind * add pybind * add FeatureNode * add FeatureNode * add FeatureNode Serialize * add FeatureNode Serialize * get_feat_node * avoid local rpc * fix get_node_feat * fix get_node_feat * remove log * get_node_feat return py:bytes * merge develop with graph_engine * fix threadpool.h head * fix * fix typo * resolve conflict * fix conflict * recover lost content * fix pybind of FeatureNode * recover cmake * recover tools * resolve conflict * resolve linking problem * code style * change test_server port * fix code problems * remove shard_num config * remove redundent threads * optimize start server * remove logs * fix code problems by 
reviewers' suggestions * move graph files into a folder * code style change * remove graph operations from base table * optimize get_feat function of graph engine Co-authored-by: Huang Zhengjie <270018958@qq.com> Co-authored-by: Weiyue Su <weiyue.su@gmail.com> Co-authored-by: suweiyue <suweiyue@baidu.com> Co-authored-by: luobin06 <luobin06@baidu.com> Co-authored-by: liweibin02 <liweibin02@baidu.com> Co-authored-by: tangwei12 <tangwei12@baidu.com>

* fix test_unpool_op * fix test_inplace_addto_strategy * fix test_conv2d_fusion_op * fix test_imperative_lod_tensor_to_selected_rows, test_imperative_selected_rows_to_lod_tensor * fix test_dot_op * fix test_correlation_op * fix tracer * fix test_memcpy_op

* Add elementwise_sub_mkldnn_op without grad * Add test to static_mode_white_list * Refactor code, change license years * Remove invalid grad implementation * Fix element_wise_sub_op test * Fix CI Approval error * Remove unnecessary EltwiseSubMKLDNNGradKernel class * Fix CI Approval 2 * Fix CI Approval 3 * Fix CI Approval Attempt #4 * Fix CI Approve Attempt #5 * Fix CI Approval Attempt #6 * Fix CI Approval Attemt #7 * Change test names containing add to sub * Fix old tests testing add instead of sub * Copy grad implementation from elementwise_add_mkldnn * CI test fix attempt * Revert "CI test fix attempt" This reverts commit c647cacf41e6a87c715385a185de5cbf65fc8900. * Fix CI attempt 2 * Fix elementwise_sub tests, temporary mkldnn broadcast test disable * Add working implementation of elementwise_sub grad * Fix build errors caused by pull * Fix format error * Fix format error 2 * Disable elementwise_sub_mkldnn test on GPU * Apply fix for paddle.fluid import * Revert changes of test_elementwise_sub and Fix mkldnn test * Revert "Apply fix for paddle.fluid import" This reverts commit fc3b122. * fix bug of module 'paddle' has no attribute 'fluid' for python3.6 (PaddlePaddle#35862) * Add changes suggested by reviewers * Change @unittest.skipIf... to @OpTestTool.skip_if_not_cpu_bf16() to satisfy Approval CI * Remove check_dygraph=False to satisify CI Approval Co-authored-by: zhangbo9674 <82555433+zhangbo9674@users.noreply.github.com>

…ten::DenseTensor, test=allcases (PaddlePaddle#38473) * Added shared_ptr<Allocation> member & corresponding interfaces to Storage * Removed original pten::Allocation from Storage and adjusted the interfaces accordingly * Fixed issues with storage offset * Used place to malloc allocation for TensorStorage * [Unify Tensors PR #3]Ported framework::Tensor interfaces to pten::DenseTensor * Fixed issues with place * Added comments * Moved mutable_data with stream argument to DenseTensor * Added set_offset interface * Fixed CI issues,test=allcases * [Unify Tensors PR #4] Port LoDTensor interfaces to DenseTensor * Reverted changes too pten_layout() interface * Removed friend classes

…t=allcases (PaddlePaddle#38632) * Added shared_ptr<Allocation> member & corresponding interfaces to Storage * Removed original pten::Allocation from Storage and adjusted the interfaces accordingly * Fixed issues with storage offset * Used place to malloc allocation for TensorStorage * [Unify Tensors PR #3]Ported framework::Tensor interfaces to pten::DenseTensor * Fixed issues with place * Added comments * Moved mutable_data with stream argument to DenseTensor * Added set_offset interface * Fixed CI issues,test=allcases * [Unify Tensors PR #4] Port LoDTensor interfaces to DenseTensor * Removed friend class EigenTensor/EigenMatrix/EigenVector from Tensor * Modified framework::Tensor to inherit from DenseTensor * Reverted changes too pten_layout() interface * Removed friend classes * Rearranged cfunction calls from tensor.data<void>() to tensor.data() * Fixed CI issues * Fixed lite issues * Fixed data() interface issues,test=allcases * Resolved IsInitialized() issues * Fixed ResetHolder() issues * Fixed MKLDNN & Storage issues * Resolved ShareBufferWith() issues * Fixed LoD issues

…st=allcases (PaddlePaddle#38811) * Added shared_ptr<Allocation> member & corresponding interfaces to Storage * Removed original pten::Allocation from Storage and adjusted the interfaces accordingly * Fixed issues with storage offset * Used place to malloc allocation for TensorStorage * [Unify Tensors PR #3]Ported framework::Tensor interfaces to pten::DenseTensor * Fixed issues with place * Added comments * Moved mutable_data with stream argument to DenseTensor * Added set_offset interface * Fixed CI issues,test=allcases * [Unify Tensors PR #4] Port LoDTensor interfaces to DenseTensor * Removed friend class EigenTensor/EigenMatrix/EigenVector from Tensor * Modified framework::Tensor to inherit from DenseTensor * Reverted changes too pten_layout() interface * Removed friend classes * Rearranged cfunction calls from tensor.data<void>() to tensor.data() * Fixed CI issues * Fixed lite issues * Fixed data() interface issues,test=allcases * Resolved IsInitialized() issues * Fixed ResetHolder() issues * Fixed MKLDNN & Storage issues * Resolved ShareBufferWith() issues * Fixed LoD issues * Removed interfaces & members from lod_tensor,test=allcases

…ddlePaddle#39162) * Added selected_rows and rw_lock to pten * Renamed the unit test target to fix CI * Removed Class SelectedRows in Fluid, changed include/cmake relationship, use pten::SelectedRows in Fluid * Remove rw_lock.h,rw_lock_test.cc in fluid * Use pten::RWLock and pten::AutoRDLock, fix CI * Use pten::SelectedRows * Use pten::SelectedRows * Fix to pass NPU CI * Selected_Rows inherits from TensorBase * Use pten::SelectedRows, to pass NPU CI * To fix NPU CI * To fix NPU CI again * Use paddle/pten/core/enforce and polish code

…Paddle#41051) * [Refactor] refactored eager_gen.py PR #2 * [DoubleGrad PR #1] Decoupled code generation logics for Dygraph ForwardFunctions and GradNodes * Fixed minor issue * Adjusted logics of GenerateNodeCreationCodes and GenerateForwardDefinition * Fixed issues * Supported higher-order grad node generation * [DoubleGrad PR #4] Supported higher-order GradNode generation * Fixed yaml typo

…e#41121) * [Refactor] refactored eager_gen.py PR #2 * [DoubleGrad PR #1] Decoupled code generation logics for Dygraph ForwardFunctions and GradNodes * Fixed minor issue * Adjusted logics of GenerateNodeCreationCodes and GenerateForwardDefinition * Fixed issues * Supported higher-order grad node generation * [DoubleGrad PR #4] Supported higher-order GradNode generation * [DoubleGrad #4] Bug Fixes to Double Grad Node Generation * Fixed yaml typo * Fixed yaml typo * fixed minor issues * Fixed minor issue

…sed to paddle.grad() (PaddlePaddle#41198) * [Refactor] refactored eager_gen.py PR #2 * [DoubleGrad PR #1] Decoupled code generation logics for Dygraph ForwardFunctions and GradNodes * Fixed minor issue * Adjusted logics of GenerateNodeCreationCodes and GenerateForwardDefinition * Fixed issues * Supported higher-order grad node generation * [DoubleGrad PR #4] Supported higher-order GradNode generation * [DoubleGrad #4] Bug Fixes to Double Grad Node Generation * Fixed yaml typo * Fixed yaml typo * fixed minor issues * [DoubleGrad PR #5] Enabled gradient computations for grad_tensors passed to paddle.grad() * Fixed minor issue * Fixed CI-Inference issue * Fixed CI-inference issues

…rd run (PaddlePaddle#41306) * [Refactor] refactored eager_gen.py PR #2 * [DoubleGrad PR #1] Decoupled code generation logics for Dygraph ForwardFunctions and GradNodes * Fixed minor issue * Adjusted logics of GenerateNodeCreationCodes and GenerateForwardDefinition * Fixed issues * Supported higher-order grad node generation * [DoubleGrad PR #4] Supported higher-order GradNode generation * [DoubleGrad #4] Bug Fixes to Double Grad Node Generation * Fixed yaml typo * Fixed yaml typo * fixed minor issues * [DoubleGrad PR #5] Enabled gradient computations for grad_tensors passed to paddle.grad() * Fixed minor issue * Fixed CI-Inference issue * Fixed CI-inference issues * [DoubleGrad PR #7] paddle.grad() to copy backward graph before backward run * Fixed minor issues * Fixed issue with backward graph construction logic * Fixed implementation issues with backward graph reconstruction * Fixed unittest issue * Fixed issues

…ePaddle#41387) * [Refactor] refactored eager_gen.py PR #2 * [DoubleGrad PR #1] Decoupled code generation logics for Dygraph ForwardFunctions and GradNodes * Fixed minor issue * Adjusted logics of GenerateNodeCreationCodes and GenerateForwardDefinition * Fixed issues * Supported higher-order grad node generation * [DoubleGrad PR #4] Supported higher-order GradNode generation * [DoubleGrad #4] Bug Fixes to Double Grad Node Generation * Fixed yaml typo * Fixed yaml typo * fixed minor issues * [DoubleGrad PR #5] Enabled gradient computations for grad_tensors passed to paddle.grad() * Fixed minor issue * Fixed CI-Inference issue * Fixed CI-inference issues * [DoubleGrad PR #7] paddle.grad() to copy backward graph before backward run * Fixed minor issues * Fixed issue with backward graph construction logic * Fixed implementation issues with backward graph reconstruction * Fixed unittest issue * Fixed issues * [DoubleGrad PR #8] Enabled triple grads for sigmoid and matmul * Fixed issues with phi kernel * Added triple grad test case * Fixed minor issue

* add for alpha_fold2 * add some extra setting * fix some bugs * fix some changes * fix some bugs 2nd * Add another initition of Gmem_tile_qkv and Gmem_tile_o * add some compensation for try..catch * fix mistake in flash_attn_fwd * commit for code style and bug check * fix some bugs for flash_attn_with_bias-mask * add more print for pointer debug * add some bug test cases. * backward function * fix bugs * make some changes for backward * Fix compiling error. * quote all printf debug * quote all printf debug and fix interface error * quote all printf debug and fix interface error, fix typo * remove all printf * split files * remove useless debug code * split fwd and bwd execution function * split fwd and bwd execution function * remove useless codes * remove useless codes * remove useless codes 3rd times * remove useless codes 4th times * Fix compiling error. * Remove const.

wadefelix and others added 30 commits April 19, 2021 11:31

update get_api_md5, using the real api name as the map's key (#32224)

21dc044

* get_api_md5 should prefer use the real name rather than the alias names
* case for ArgSpec style. update the unittests test=document_fix

Add BF16 Constant Initializer and support for other initializer (#31935)

76cb83e

Fix sublayer (#31824)

4d69eea

* fix sublayer error with include_sublayers=False
* add ut
* refactor include_sublayers related api
* fix ut
* fix ut of transformer
* fix ut of transformer
* remove useless code
* change sublayer api
* polish code
* add test for include_self=True

[Hybrid Parallel] Support dp & mp in dygraph (#32323)

ffd4086

* support dp & mp

add npu check nan and inf (#32340)

1e3a94b

add npu check nan and inf (#32340)

add log to analyse mkldnn models (#32342)

f0cc188

support numpy.array/asarray(tensor) -> ndarray, test=develop (#32300)

43926c8

fix the bug that the error message is not displayed on mac ci (#32367)

0dd28b8

* test for mac task,notest,test=mac_py3
* fix the bug that the error message is not displayed

[heterps] optimize build task (#32358)

c09d645

* build task cost
* return pool

move REGISTER_OP_CUDA_KERNEL into cpp with eigen, test=develop (#32114)

f6f59e5

save/load program (#32336)

e0a52fd

[Sharding]: update config DOC (#32299)

e348901

* sharding: update config DOC
* update pipeline config
* sharding update doc

add paddle.nn.unfold #32297 (#32298)

186682f

* add paddle.nn.unfold
* update Parameters of Unfold

remove fluid for auto_checkpoint. (#32157)

1593ee2

* remove fluid for auto_checkpoint.
* fix bug.

Added oneDNN reduce_op GRAD kernel (#32280)

ead8342

add retry on gcda_clean.py (#32318)

229f930

* add retry on gcda_clean.py
* add exit code for paddle_coverage.sh
* fix format error
* fix format error

Modify the exit code of mac CI approval error (#32389)

a2cbbe8

add test=develop (#32380)

4898c38

Added bilinear and nearest interp v2 oneDNN FP32 kernels (#32312)

5d19f8d

flush denormal in the tracer op, test=develop (#32350)

9ff8556

* flush denormal in the tracer op, test=develop
* add cmake dependencies, test=develop
* add a macro, test=develop
* fix the windows case, test=develop

[Kunlun]add collective ops for multi XPU cards training and add Kunlun multi XPU cards CI (#32302)

2194ad1

remove thrust include files (#32395)

ab6f874

* remove thrust includes, test=develop
* fix compilation error, test=develop
* fix compilation of truncated_gaussian_random_op, test=develop

[NPU] register npu finalize on exit (#32390)

8e4c193

* [NPU] register finalize on exit
* fix

add get_loss_scaling to fleet (#32401)

37bb334

Update the error info for quantizaion (#32273)

3da2c7f

[CustomOP]Support find include/c++/v1 include dirs automatically (#32404)

661a1f6

[CustomOp]Fix MAC3-CI random failed with XXX_setup.py(#32369)

7bae5e9

lilong12 and others added 16 commits May 5, 2021 09:31

update, test=develop (#32726)

a259076

Change Paddle CI-Cverage Python3.8 (#32515)

8b1b214

Sum kernel for CPU supporting BF16 and SelectedRows (#32631)

9599c3b

add int64 support test=develop (#32736)

f1c68a0

add int64 support

Fix bugs of pipeline on ascend. (#32737)

c5ae21f

fix l1 decay for inplace (#32717)

efdb0a7

[Rocm] fix expand as (#32704)

2fe4580

* [Rocm] fix test_expand_as_op
* [Rocm] fix test_expand_as_op
* [Rocm] fix test_expand_as_op
* [Rocm] fix test_expand_as_op
* [Rocm] fix test_expand_as_op
* [Rocm] fix test_expand_as_op

change parameter name from softmax_switch to use_softmax, test=develop

28d42a9

update 2.0 public api in distributed (#32695)

70eb435

[Rocm] fix tests of inplace_abn_op & grid_sampler_op (#32703)

7c27541

* [Rocm] fix tests of inplace_abn_op & grid_sampler_op
* [Rocm] fix tests of inplace_abn_op & grid_sampler_op

[2.1 API] Enable printing deprecated warning info. (#32712)

51b39a9

* Add deprecated warning info.
* Add unittest for deprecated decorator.
* Add warning info for tensor.grad

Mechanism that converts startup_program initializers to BF16 (#32720)

ce2bdb0

* Add casting initializers for bf16 training
* Changes after review
* Correct test and add comment

Refactor dot op's CPU kernel for better performance (#32589)

97a9552

* OP dot: refactor CPU kernels and get better loop performance.
* Minor fix on code format.
* Fixed minor errors.

bug fix, test=develop (#32752)

9b65d4c

Remove paddle_custom_op dynamic libraries, and link to FLUID_CORE on Windows (#32583)

7610c2b

* Remove paddle_custom_op dynamic libraries, change link to FLUID_CORE on windows, and check copy_to
* fix CI

AnnaTrainingG merged commit d25ab26 into AnnaTrainingG:develop May 7, 2021

AnnaTrainingG pushed a commit that referenced this pull request Jun 9, 2022

Merge pull request #4 from PaddlePaddle/develop

5be3a45

Update forked paddle repo
