sync master #39

Merged
merged 6,326 commits into from
May 19, 2022

Conversation

qingshui
Owner

PR types

PR changes

Describe

tiancaishaonvjituizi and others added 30 commits April 27, 2022 15:19
* extract sub-graph

* graph-engine merging

* fix

* fix

* fix heter-ps config

* test performance

* test performance

* test performance

* test

* test

* update bfs

* change cmake

* test

* test gpu speed

* gpu_graph_engine optimization

* add dsm sample method

* add graph_neighbor_sample_v2

* Add graph_neighbor_sample_v2

* fix for loop

* add cpu sample interface

* fix kernel judgement

* add ssd layer to graph_engine

* fix allocation

* fix syntax error

* fix syntax error

* fix pscore class

* fix

* change index settings

* recover test

* recover test

* fix spelling

* recover

* fix

* move cudamemcpy after cuda stream sync

* fix linking problem

* remove comment

* add cpu test

* test

* add cpu test

* change comment

* combine feature table and graph table

* test

* test

* pybind

* test

* test

* test

* test

* pybind

* pybind

* fix cmake

* pybind

* fix

* fix

* add pybind

* add pybind

* optimize pybind

* test

* fix pybind

* fix

* pybind change

* remove file

Co-authored-by: DesmonDay <908660116@qq.com>
* optimize performance of dygraph

* optimize performance of dygraph and elementwise_add

* optimize the trace op

* fix bug

* fix bug

* fix unittest bug

* fix code format
* Fix the race condition in cumsum operator

* Optimize cumsum operator
* added test for shuffle_channel_mkldnn_detect_pass

* added UT using new framework

* CI fix
* fix collections.Sequence in python3.10

* fix format
…ElemwiseGradBroadcast (#42320)

* set device id of Place() to get GPUContext needed by LimitGridDim in ElemwiseGradBroadcast

* fix code style
…llcases (#42285)

* test op_test test=allcases

* fix

* avoid copying the same file many times

* fix for win

* test PYTHONPATH

* change path adding way

* fix win

* use old way

* use old way test=allcase

* use old way test=allcase
* [KP] fix bug when phi kernel is *_raw

* modify the static graph

* delete useless comment

* delete the phi multiply kernel case

* add VLOG(3) message

* add VLOG(3) message

* fix static graph error in phi

* fix bug in transform model

* modify the comment

* delete useless code

* fix CI bug

* fix CI bug
* fix PIL sample mode deprecated warning

* compatible with old pil version
* add gradient merge for DistributedFusedLamb

* use master acc gradient

* fix CI ut

* polish

* remove math_function_impl.h change

* fix test_update_loss_scaling_op.py

* try to fix XPU/NPU CI

* add gm ut
* Refactor Quantization

* Refactor Dequantization

* Classy solution

* Style I

* Style II

* Style III

* Use VLOG(4) for debug info

* Style IV
* Support more scenes for fused_fast_ln

* fix
* opt attr equal perf

* opt attr select code

* fix one hot infermeta

* polish get attr impl

* fix tests failed

* add testcases
* back fl

* delete ssl cert

* .

* make warning

* .

* unittest paral degree

* solve unittest

* heter & multi cloud comm ready

* .

* .

* arm_brpc compile

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* only output is ok

* base is ok

* .

* .

* .

* .

* .

* .

* .

* .

* add switch server bin

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* adapt brpc ssl

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* .

* fix heter_server & heter_client

* .

* .

* int->int64_t

* .

* safe map in multithread

* fix heter unittest

* .

* fix code_style

* .
#42273)

* [Dy2Stat]Fix losing pre/post hook from outermost layer while jit.save

* fix kwargs

* fix unittest
* add double yaml

* add inline func
betterpig and others added 24 commits May 18, 2022 11:15
* set scipy and numpy versions suitable for py3.6

* pynacl 1.5.0, which PyGithub needs, fails to build on python36; change it to 1.4.0, which also works;test=document_fix;test=windows_ci

* np.corrcoef support parameter since 1.20

* delete test code
* fix device_free

* fix hang
* fix summary trainable_params bug
* [NPU] add take_along_axis and take_along_axis_grad ops

* [NPU] add take_along_axis and take_along_axis_grad ops

* fix ut because cpu kernel can not be fallbacked
* matmul refactor

* remove UT which only check ENFORCE output

* code format

* improve memory usage
* [Eager] Polish eager code generation

* Remove useless code in codegen
…ion mechanism (#41919)

* Updated triple_grad_check func

* add todo for gradient checker and refine some comments

* remove additional code

* add test for warning in backward.py

* format python code

* support multi input in triple gradient checker

* Add matmul triple grad kernel

* Updated comments of TODO

* Supported some special tests

* Change code-format to follow CI std

* Updated gradient_checker.py

* Fix conflicts

* Removed unnecessary printing log

* Change code style to follow CI std

* merge upstream

* add priops.py

* add_p

* rm useless files

* add sub_p mul_p div_p

* add sqrt_p and tanh_p

* add reshape_p

* add broadcast_p

* Add python primitive wrappers.

* Jvp rules updated.

* JVP rules done for all the 17 primops.

* quick check and fixes.

* add jvp(op, *args)

* add broadcast_p fill_constant_p matmul_p reduce_p reshape_p transpose_p

* add split_p and concat_p

* add gather_p and scatter_add_p

* add slice_select_p and slice_assign_p

* Add transpose rules.

* add multi input check for add_p, sub_p, mul_p, div_p

* update concat_p

* Linearize and transpose in progress..

* refine gather_p and scatter_add_p

* updated.

* update transpose.

* refine slice_assign_p and slice_select_p

* init commit for lower

* Merged with primitive ops.

* small update

* add rules for orig2prim and prim2orig

* add 9 test for prim ops

* add more test and fix some bug

* add more test

* register proto

* Adding primops test.

* add shape valid check for broadcast_p op, and add keepdim attr into reduce_p op proto

* support multi input and multi output for split_p and concat_p

* Test updated.

* update

* fix slice bug for slice_select_p and slice_assign_p

* updated.

* Ops updated.

* Refactor and bug fixes.

* updated.

* finish orig2prim and prim2orig rules

* dtype for axis attr should be long int

* update dtype for axis attr int64_t

* update for iscan CI

* Update primx.

* Refactor vars in primx.

* update for lower transform

* add more shape and dtype check

* update primx.py

* change IndexTensor into int32 dtype

* update

* Fix linearize and transpose.

* Update is_dot

* Update is_dot

* Update is_dot

* add gradient aggregation, fix add_transpose.

* pass first linearize+transpose test.

* update test

* refactor op registration and primx.

* update rule for slice_assign

* try test lower

* update orig2prim and prim2orig

* pass simple lower pass

* update

* Update input types in the unit test.

* orig2prim segfault.

* 50% for adam.minimize

* test updated.

* temp fix errors in removing vars.

* primx updated.

* update for matmul_v2 and reshape2 orig2prim

* update for minimize

* Refine primrules

* Remove some code

* supporting unused and unreachable vars.

* update for use prim2orig in minimize

* fix gather and scatter_add transpose.

* Add rules UT

* update scatter_add

* Refine UT code

* fix nonetype check in topo

* Update gather_p pywrapper.

* remove useless print

* Merge tongxin PR and refine code

* readd some test

* rm useless print

* polish code.

* fix bug in minimize

* add get_input_var_list and get_output_var_list and use it in lower

* Fix scatter_add_p prim2orig

* Update code and fix orig2prim/prim2orig UT

* delete vars after block.desc._remove

* Improve ops and vars clean up logics.

* fix some bug in linearize and lower

* update tanh transpose.

* use set instead of list for var2remove

* test updated.

* polish code.

* fix dot2bar delete.

* merge tx/ad

* add indextensor_dot for gather and scatter_add

* add sorted for set

* Fix scale_orig2prim params

* fix some syntax bug

* add global_lower_update list

* Better handling of unused vars.

* update tests.

* Fix elementwise_sub orig2prim

* support none for transpose rule

* Merge and add transform UT

* fix a bug in transpose

* Fix transpose and UT

* a hacky fix for concat op

* Fix executor place

* Refine variable name

* Add elementwise_mul orig2prim and support p_norm when p=1

* Add sqrt orig2prim rule and UT

* merge wz test

* rename files, add enable_prim, disable_prim, prim_enabled, delete global_lower_update

* fix a bug in test_ad_transform_trans

* revert modify in framework.py

* add paddle.fluid.incubate.ad_transform to  python/setup.py.in

* Fix remove vars error

* Fix p_norm_orig2prim

* merge wz

* Modify the code directory

* Add utils.py and remove get_input/output_vars functions

* Update maolin code

* Rename UT and refine test_ad_transform_primops

* Fix div_p jvp rule

* Add higher derivatives UT

* Move UT to autograd dir

* Fix comments

* import paddle in primops.py

* Add some error message for assert

* Refine UT class name and refine some comments in primreg.py

* update minimize of paddle/optimizer for supporting new autograd

* resolve circular importing between backward.py and optimizer.py

* fill gradients and minimize unittest

* Replace `assert isinstance` with `raise TypeError`

* Add some assert message for primx.py

* Polish variable name

* Add some assert message

* add some docstring

* refine some name

* update the format of english documents

* Split test_transform.py to two files to avoid ci error

* fix the document format of enable_prim/disable_prim/prim2orig/prim_enabled

* polish test_gradients_and_minimize

* add default value for prim_enabled api doc

* Remove some UT to avoid windows ci error

* Enlarge test_gradients_and_minimize limit time

* Fix ut limit time

Co-authored-by: veyron95 <veyron_wu@163.com>
Co-authored-by: Jiabin Yang <360788950@qq.com>
Co-authored-by: levi131 <limaolin01@baidu.com>
Co-authored-by: Tongxin Bai <waffle.bai@gmail.com>
Co-authored-by: Xiaoxu Chen <chenxx_id@163.com>
Co-authored-by: levi131 <83750468+levi131@users.noreply.github.com>
…ctions (#41772)

Add Code Generation for operators,  op makers and argument mapping functions (#41772)
…ic directory (#42842)

* [Dy2Stat]Modify all jit.save path into tempfile

* [Dy2Stat]Modify all jit.save path into tempfile
* slice data in dist_loader & flag to scale grad

* bug fix

* update unittest

* enable static
* update readme test=document_fix

* fix api docs;test=document_fix

* update logic.py;test=document_fix

* update docs;test=document_fix
* support yolov5s static/int8

* fix eltwise_sub and div weight compute

* fix delete_fill_constant_pass
* auto parallel support primitive op with data parallel

* add primitive change

* 5 loss 3D cylinder acc aligned

* add unittest
* enable graph-engine to return all id

* change vector's dimension

* change vector's dimension

* enlarge returned ids dimensions

* add actual_val

* change vlog

* fix bug

* bug fix

* bug fix

* fix display test

* singleton of gpu_graph_wrapper

* change sample result's structure to fit training

* recover sample code

* fix

* secondary sample

* add graph partition

* fix pybind

* optimize buffer allocation

* fix node transfer problem

* remove log

* support 32G+ graph on single gpu

* remove logs

* fix

* fix

* fix cpu query

* display info

* remove log

* remove empty file

* distribute labeled data evenly in graph engine

Co-authored-by: DesmonDay <908660116@qq.com>
…1093)

* refine enforce code

* refine enforce code

* fix compile failed

* fix infrt failed
* remove shared_storage

* fix bug

* fix rnn bug
* change the output format of C++ backward api

* fix merge conflict

* fix sparse api code auto-gen

* fix eager_gen bug

* fix bug of output is null

* fix bug of conv2d_grad_impl

* fix optional grad

* fix bug of eager-gen double_grad

* fix bug

* fix multiply_double_grad bug

* fix bug of higher order derivative

* fix bug of FillZeroForEmptyGradInput

* remove redundant vector in grad_node

* fix bug of test_deformable_conv_v1_op

* fix bug of test_deformable_conv_v1_op

* some refactor
* run all demo ci before exit;test=document_fix;test=windows_ci_inference

* fix bug;test=document_fix;test=windows_ci_inference

* improve log

* comment test code

* modify according to zhouwei's comments
@qingshui qingshui merged commit 09dac41 into qingshui:develop May 19, 2022
qingshui pushed a commit that referenced this pull request Jun 29, 2022
* fix adam with multi dim; test=develop
qingshui pushed a commit that referenced this pull request Feb 14, 2023
…addle#46342)

* add extra attr property set

* add type_info for all context

* add onednn context to all context

* fix context compile error

* simplify conv kernel args

* pass runtime attr into dev_ctx

* fix macro error

* clear conv_grad_kernel extra args

* merge conv_grad_grad into conv_grad

* clear conv2d_grad_grad extra attrs

* clear yaml and eager extra attr

* fix conv1d error

* change to thread local

* fix npu compile failed

* try to fix windows compile failed

* add conv2d onednn phi kernel

* fix ci bugs (#36)

* fix compile bugs (#38)

* fix extra input transform bug (#39)

* support dynamic created attr (#40)

* reset extra info gen code

* rm conv_grad_grad kernel

* reimpl pass attr adapting

* add int attr support

* remove vector inputnames creating

* fix map at error

* Update paddle/phi/kernels/onednn/conv_grad_kernel.cc

Co-authored-by: Sławomir Siwek <slawomir.siwek@intel.com>

* remove useless extra attrs

* replace mkldnn_engine by onednn_engine

Co-authored-by: YuanRisheng <yuanrisheng@baidu.com>
Co-authored-by: Sławomir Siwek <slawomir.siwek@intel.com>
qingshui pushed a commit that referenced this pull request Feb 14, 2023
* add extra attr property set

* add type_info for all context

* add onednn context to all context

* fix context compile error

* simplify conv kernel args

* pass runtime attr into dev_ctx

* fix macro error

* clear conv_grad_kernel extra args

* merge conv_grad_grad into conv_grad

* clear conv2d_grad_grad extra attrs

* remove redundant imports

* migrate softmax

* clear yaml and eager extra attr

* fix conv1d error

* change to thread local

* fix npu compile failed

* try to fix windows compile failed

* add conv2d onednn phi kernel

* fix ci bugs (#36)

* fix compile bugs (#38)

* fix extra input transform bug (#39)

* support dynamic created attr (#40)

* reset extra info gen code

* rm conv_grad_grad kernel

* reimpl pass attr adapting

* add int attr support

* remove vector inputnames creating

* merge dev

* fix map at error

* adjust attribute

* adapt funcs to PHI

Co-authored-by: Chen Weihang <chenweihang@baidu.com>
Co-authored-by: YuanRisheng <yuanrisheng@baidu.com>
qingshui pushed a commit that referenced this pull request Feb 14, 2023
* add extra attr property set

* add type_info for all context

* add onednn context to all context

* fix context compile error

* simplify conv kernel args

* pass runtime attr into dev_ctx

* fix macro error

* clear conv_grad_kernel extra args

* merge conv_grad_grad into conv_grad

* clear conv2d_grad_grad extra attrs

* remove redundant imports

* migrate softmax

* clear yaml and eager extra attr

* fix conv1d error

* change to thread local

* fix npu compile failed

* try to fix windows compile failed

* add conv2d onednn phi kernel

* fix ci bugs (#36)

* fix compile bugs (#38)

* fix extra input transform bug (#39)

* support dynamic created attr (#40)

* reset extra info gen code

* rm conv_grad_grad kernel

* reimpl pass attr adapting

* add int attr support

* remove vector inputnames creating

* merge dev

* fix map at error

* adjust attribute

* adapt funcs to PHI

* init

* adjust imports

* support postops

* format codeblocks

* revert changes to softmax

Co-authored-by: Chen Weihang <chenweihang@baidu.com>
Co-authored-by: YuanRisheng <yuanrisheng@baidu.com>
qingshui pushed a commit that referenced this pull request Feb 14, 2023
* add extra attr property set

* add type_info for all context

* add onednn context to all context

* fix context compile error

* simplify conv kernel args

* pass runtime attr into dev_ctx

* fix macro error

* clear conv_grad_kernel extra args

* merge conv_grad_grad into conv_grad

* clear conv2d_grad_grad extra attrs

* clear yaml and eager extra attr

* fix conv1d error

* change to thread local

* fix npu compile failed

* try to fix windows compile failed

* add conv2d onednn phi kernel

* fix ci bugs (#36)

* fix compile bugs (#38)

* fix extra input transform bug (#39)

* support dynamic created attr (#40)

* reset extra info gen code

* rm conv_grad_grad kernel

* reimpl pass attr adapting

* add int attr support

* remove vector inputnames creating

* fix map at error

* Update paddle/phi/kernels/onednn/conv_grad_kernel.cc

Co-authored-by: Sławomir Siwek <slawomir.siwek@intel.com>

* remove useless extra attrs

* replace mkldnn_engine by onednn_engine

* Migrate pool+grad to PHI

* Update paddle/fluid/operators/mkldnn/test_mkldnn_op_nhwc.cc

Co-authored-by: Sławomir Siwek <slawomir.siwek@intel.com>

* Update paddle/phi/kernels/onednn/pool_grad_kernel.cc

Co-authored-by: Sławomir Siwek <slawomir.siwek@intel.com>

* Update paddle/phi/kernels/onednn/pool_kernel.cc

Co-authored-by: Sławomir Siwek <slawomir.siwek@intel.com>

Co-authored-by: Chen Weihang <chenweihang@baidu.com>
Co-authored-by: YuanRisheng <yuanrisheng@baidu.com>
Co-authored-by: Chen Weihang <chenwhpro@163.com>
Co-authored-by: Sławomir Siwek <slawomir.siwek@intel.com>