develop #2

esythan · 2021-08-23T13:20:45Z

PR types

PR changes

Describe

Refactor the organization of the layer_norm CUDA implementation so that it can be reused in the fused attention op. Extract the layer_norm CUDA implementation from layer_norm_op.cu into layer_norm_kernel.cu.h. Define fused/attention_layer_norm.h, which can be used by the fused attention op in the next PR.

* update fft api path (PaddlePaddle#36219) * update fft api path * add sample code for ihfft2 Co-authored-by: chenfeiyu <chenfeiyu@baidu.com> * fix fft axis (PaddlePaddle#36321) fix: `-1` is used when fft's axis is `0` * use unified external error message for cufft api (PaddlePaddle#36114) * fft: modify sample code result (PaddlePaddle#36325) * dynamic load mkl as a fft backend when it is available and requested (PaddlePaddle#36414) * add rocm support for fft api (PaddlePaddle#36415) * move signal apis * move fft and signal API path (#2) * move signal apis * move fft.py and signal.py to paddle/, fix typos * fix relative imports from fft.py and signal.py * fix typos in signal.py (#3) * move signal apis * move fft.py and signal.py to paddle/, fix typos * fix relative imports from fft.py and signal.py * fix typos * disable Cache when CUFFT_VERSION >= 10200 (#4) * move signal apis * move fft.py and signal.py to paddle/, fix typos * fix relative imports from fft.py and signal.py * fix typos * Add LRUCache for fft plans * add LRUCache for cufft and hipfft (#5) * move signal apis * move fft.py and signal.py to paddle/, fix typos * fix relative imports from fft.py and signal.py * fix typos * WIP: add cache * delete move constructor and operator= for CuFFTHandle and FFTConfig * remove log from CuFFTHandle and FFTConfig * add lrucache for fft rocm backend * disable LRUCache when CUFFT_VERSION >= 10200 * disable copy and move for hipFFTHandle; format code Co-authored-by: Xiaoxu Chen <chenxx_id@163.com> * remove debug message of cufftHandler * roll_op: support Tensor as input for shifts (PaddlePaddle#36727) * fix fftshift/ifftshift on static mode * update roll_op version * add more test cases for fftshift/ifftshift Co-authored-by: zhiboniu <31800336+zhiboniu@users.noreply.github.com> Co-authored-by: chenfeiyu <chenfeiyu@baidu.com> Co-authored-by: LJQ❤️ <33169170+lijiaqi0612@users.noreply.github.com>
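
The commit above mentions an LRU cache for FFT plans that can be disabled for some cuFFT versions. As a rough illustration of that idea only, here is a minimal Python sketch. The class name `PlanCache`, the `enabled` flag, and the `factory` callback are hypothetical and are not taken from the Paddle sources, where the cache is implemented in C++.

```python
from collections import OrderedDict


class PlanCache:
    """Tiny LRU cache; keys stand in for an FFT plan configuration (hypothetical)."""

    def __init__(self, capacity, enabled=True):
        if capacity < 1:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        # Assumption: a single switch models "disable the cache for some versions".
        self.enabled = enabled
        self._plans = OrderedDict()

    def get_or_create(self, key, factory):
        """Return the cached plan for `key`, building it with `factory` on a miss."""
        if not self.enabled:
            return factory(key)  # caching is off: always build a fresh plan
        if key in self._plans:
            self._plans.move_to_end(key)  # mark as most recently used
            return self._plans[key]
        plan = factory(key)
        self._plans[key] = plan
        if len(self._plans) > self.capacity:
            self._plans.popitem(last=False)  # evict the least recently used entry
        return plan

    def keys(self):
        """Cached keys, ordered from least to most recently used."""
        return list(self._plans)

    def __len__(self):
        return len(self._plans)
```

A caller would pass a hashable description of the transform (for example a tuple of shape and transform type) as `key`, and a function that builds the plan as `factory`. With `enabled=False` nothing is stored, so every call rebuilds the plan.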

…addlePaddle#38275) * Replaced pten::LoD with paddle::framework::LoD * Overrode CPUVector with CUDAVector * Refactored paddle::framework::Vector

…addlePaddle#39087) * Renamed selected_rows.* -> selected_rows_utils.* * Added selected_rows and rw_lock to pten * Removed useless header * Renamed the unit test target to fix CI * Use pten::framework::DDim * Set selceted_rows_test properties timeout * Polish code to pten style Co-authored-by: Chen Weihang <chenweihang@baidu.com>

…rdFunctions and GradNodes (PaddlePaddle#40937) * [Refactor] refactored eager_gen.py PR #2 * [DoubleGrad PR #1] Decoupled code generation logics for Dygraph ForwardFunctions and GradNodes * Fixed minor issue

…enerateForwardDefinition (PaddlePaddle#41016) * [Refactor] refactored eager_gen.py PR #2 * [DoubleGrad PR #1] Decoupled code generation logics for Dygraph ForwardFunctions and GradNodes * Fixed minor issue * Adjusted logics of GenerateNodeCreationCodes and GenerateForwardDefinition * Fixed issues * Fixed minor issue

…Paddle#41051) * [Refactor] refactored eager_gen.py PR #2 * [DoubleGrad PR #1] Decoupled code generation logics for Dygraph ForwardFunctions and GradNodes * Fixed minor issue * Adjusted logics of GenerateNodeCreationCodes and GenerateForwardDefinition * Fixed issues * Supported higher-order grad node generation * [DoubleGrad PR #4] Supported higher-order GradNode generation * Fixed yaml typo

…e#41121) * [Refactor] refactored eager_gen.py PR #2 * [DoubleGrad PR #1] Decoupled code generation logics for Dygraph ForwardFunctions and GradNodes * Fixed minor issue * Adjusted logics of GenerateNodeCreationCodes and GenerateForwardDefinition * Fixed issues * Supported higher-order grad node generation * [DoubleGrad PR #4] Supported higher-order GradNode generation * [DoubleGrad #4] Bug Fixes to Double Grad Node Generation * Fixed yaml typo * Fixed yaml typo * fixed minor issues * Fixed minor issue

…sed to paddle.grad() (PaddlePaddle#41198) * [Refactor] refactored eager_gen.py PR #2 * [DoubleGrad PR #1] Decoupled code generation logics for Dygraph ForwardFunctions and GradNodes * Fixed minor issue * Adjusted logics of GenerateNodeCreationCodes and GenerateForwardDefinition * Fixed issues * Supported higher-order grad node generation * [DoubleGrad PR #4] Supported higher-order GradNode generation * [DoubleGrad #4] Bug Fixes to Double Grad Node Generation * Fixed yaml typo * Fixed yaml typo * fixed minor issues * [DoubleGrad PR #5] Enabled gradient computations for grad_tensors passed to paddle.grad() * Fixed minor issue * Fixed CI-Inference issue * Fixed CI-inference issues

…rd run (PaddlePaddle#41306) * [Refactor] refactored eager_gen.py PR #2 * [DoubleGrad PR #1] Decoupled code generation logics for Dygraph ForwardFunctions and GradNodes * Fixed minor issue * Adjusted logics of GenerateNodeCreationCodes and GenerateForwardDefinition * Fixed issues * Supported higher-order grad node generation * [DoubleGrad PR #4] Supported higher-order GradNode generation * [DoubleGrad #4] Bug Fixes to Double Grad Node Generation * Fixed yaml typo * Fixed yaml typo * fixed minor issues * [DoubleGrad PR #5] Enabled gradient computations for grad_tensors passed to paddle.grad() * Fixed minor issue * Fixed CI-Inference issue * Fixed CI-inference issues * [DoubleGrad PR #7] paddle.grad() to copy backward graph before backward run * Fixed minor issues * Fixed issue with backward graph construction logic * Fixed implementation issues with backward graph reconstruction * Fixed unittest issue * Fixed issues

…ePaddle#41387) * [Refactor] refactored eager_gen.py PR #2 * [DoubleGrad PR #1] Decoupled code generation logics for Dygraph ForwardFunctions and GradNodes * Fixed minor issue * Adjusted logics of GenerateNodeCreationCodes and GenerateForwardDefinition * Fixed issues * Supported higher-order grad node generation * [DoubleGrad PR #4] Supported higher-order GradNode generation * [DoubleGrad #4] Bug Fixes to Double Grad Node Generation * Fixed yaml typo * Fixed yaml typo * fixed minor issues * [DoubleGrad PR #5] Enabled gradient computations for grad_tensors passed to paddle.grad() * Fixed minor issue * Fixed CI-Inference issue * Fixed CI-inference issues * [DoubleGrad PR #7] paddle.grad() to copy backward graph before backward run * Fixed minor issues * Fixed issue with backward graph construction logic * Fixed implementation issues with backward graph reconstruction * Fixed unittest issue * Fixed issues * [DoubleGrad PR #8] Enabled triple grads for sigmoid and matmul * Fixed issues with phi kernel * Added triple grad test case * Fixed minor issue

Aurelius84 and others added 30 commits August 19, 2021 09:46

Abstract DeviceEvent to manage cross-platform Event implementation (#…

22da190

…34922) * add device_context * add gtest for device_event_gpu * Remove duplicate DeviceType * push for test * add unittest * fix macros * fix MSVC using usage

Fix op-benchmark cpu/gpu error (#34997)

c4e05e1

Fix Inference CI CPU/GPU (#34931)

26213a7

* notest;test=gpu-inference * notest;test=gpu-inference * notest;test=gpu-inference * notest;test=gpu-inference * fix error * notest;test=gpu-inference * notest;test=gpu-inference * notest;test=gpu-inference * test=gpu-inference

add the auto scan test for TensorRT convert,test=develop (#34980)

255fc7d

fix batch_norm and instance norm when input is [] (#34107)

ca7f520

* fix batch_norm and instance norm when input is []

Add dimension check for inverse to avoid dividing by 0 error when inp…

a2e0865

…ut's shape is [0, 0, 0]. (#34996)

add resnet50_quant model in PR-CI-INFERENCE (#35012)

97cae5e

* add slim resnet50 quant model in pr-ci-inference * enable resnet50_quant multi_thread4_trt_int8_bz1 * remove LOG(FATAL)

remove unused statements in test_dist_base.py (#35017)

ef024c8

Fix op-benchmark cpu/gpu; test=document_fix (#35027)

ed9a14e

fix reshape when is a number (#35016)

866c1ea

[NPU] Support npu kernel for sin op (#34844)

4641e8f

* add npu sin op * [NPU] Support npu kernel for sin op * modify support npu kernel for sin op * modify support npu kernel for sin op * modify npu sin op * modify npu sin op * add sin op npu

Add op benchmark run function log (#35034)

096b0f2

* Add run function log * test=document_fix

[bug fix] fix spectral_norm bug (#35005)

1aa2bde

add (N,C,*) input support for GroupNorm (#34773)

4637151

* add (N,C,*) input support for GroupNorm * --amend

temporarily disable resnet50-quant multi-thread test (#35035)

f927b65

[NPU] Support npu op where and where grad (#34587)

d082955

* [NPU] Support npu op where and where grad * fix use const_cast * delete a test

[NPU] Support npu op depthwise_conv2d (#34853)

4c115a8

* add depthwise_conv2d npu * add some tests * Delete test_unique_op_npu.py * delete trans input

fix set_lod in data_feed (#35000)

4416c79

* add trainer desc config to distributed strategy * code style modified * data_feed set lod

use spin lock in auto growth allocator (#34910)

6bacfb0

* use spin lock in auto growth allocator, test=develop * use pthread spin lock, test=develop * use lock guard, test=develop * use malloc spin lock, test=develop * use lock_guard, test=develop

[NPU] Support npu kernel for pad3d op (#34815)

ef517a5

* [NPU] Support npu kernel for pad3d op * fix for comment of zhouwei25 * fix some bugs according to qili93's comments * add support and test for paddings in input * delete VLOG used for debug

[npu]Add argsort op (#34865)

99ffeff

* add rmsprop npu * add argsort npu * add argsort npu * modify according to review * modify sharedatawith according to review * modify reshape according to review * rm dygraph=false

fix model-benchmark build error (#35041)

f6015d0

[hybrid performance] Grad fuse for gradient merge under pipeline mode (…

4d9b2d6

…#35004)

Add paddle.linalg.matrix_power OP (#34667)

e2241a4

implementation of broadcast add backward by reduce (#34143)

56c5e21

Add cuda.device_count api (#34811)

cf99c0d

* Add cuda device count api * update code format * fix unittest error * update code format * update comment

add adamw cuda kernel (#35020)

77a8a39

* adamw support cuda * adamw support cuda

set node feature (#34994)

c3efabe

Fix a bug of strided_slice op, about the axes parameter access memory…

aefec22

… out of bounds (#35062)

add fill_constant_batch_size_like npu op (#34563)

7d86737

pangyoki and others added 11 commits August 23, 2021 14:40

add beam_search_decode npu op (#34967)

4ce272e

Revert "use spin lock in auto growth allocator (#34910)" (#35069)

97fef01

This reverts commit 6bacfb0.

Support getitem by Bool index (#35026)

b6dc16c

* Support getitem by Bool index * delete some debug info of bool index * support the case that the shape of bool index is different from indexed tensor
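
To illustrate what indexing by a boolean mask means, including a mask whose shape differs from the indexed tensor, here is a small pure-Python sketch on nested lists. It follows NumPy-style semantics, which is an assumption about the intended behavior rather than something stated in this PR. The helper name `select_by_mask` is hypothetical and is not a Paddle API.

```python
def select_by_mask(data, mask):
    """Select from nested lists `data` using a nested boolean `mask`.

    Assumed NumPy-style semantics: the mask may cover every dimension of
    `data` (result is a flat list of elements) or only the leading
    dimensions (result keeps the trailing dimensions intact).
    """
    if isinstance(mask, bool):
        # Reached a mask leaf: keep or drop whatever remains of `data` here.
        return [data] if mask else []
    if len(mask) != len(data):
        raise IndexError("mask length does not match the indexed dimension")
    selected = []
    for item, flag in zip(data, mask):
        selected.extend(select_by_mask(item, flag))
    return selected


x = [[1, 2], [3, 4], [5, 6]]

# Mask over the first dimension only: whole rows are kept.
rows = select_by_mask(x, [True, False, True])

# Mask with the same shape as x: individual elements are kept.
elems = select_by_mask(x, [[True, False], [False, True], [True, True]])
```

Here `rows` holds the first and third rows of `x`, and `elems` is the flat list of the elements whose mask entry is `True`.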

[hybrid performance] optim the grad fuse for pipeline mode by sorting…

fad4b3b

… the grad by dtype (#35070)

support infer_ut on windows nightly build (#35049)

4f86aae

* enable infer_ut on windows * remove lib calculation & time * unset http_proxy when download bos file on windows

upgrade oneDNN to v2.3.2 (#35040)

a047c13

[oneDNN] disable caching for interpolate and batch Norm (#35030)

673bf71

* - disabled interpolate onednn * - compilation fix * - draft of batch_norm cache disabling * - fixes to UT

remove old data check (#35077)

5b814fd

fix model-benchmark build error (#35081)

a95db6a

trt convert ut add dynamic_shape and int8, etc. (#35061)

17188e8

esythan merged commit 69c797a into esythan:develop Aug 23, 2021

esythan pushed a commit that referenced this pull request Sep 30, 2021

Merge pull request #2 from seemingwang/accessor_merge

91c7536

Accessor merge

esythan pushed a commit that referenced this pull request Nov 30, 2021

Added fluid dependencies to Eager Dygraph #2 (PaddlePaddle#37556)

471fa1e

esythan pushed a commit that referenced this pull request Nov 30, 2021

Added Eager Dygraph AutoCodeGen dependencies #2 (PaddlePaddle#37575)

e7bda1d

esythan pushed a commit that referenced this pull request Mar 28, 2022

[Refactor] refactored eager_gen.py PR #2 (PaddlePaddle#40907)

f027b2a
