Skip to content

Commit

Permalink
Dev nd nccl send recv boxing (#8467)
Browse files Browse the repository at this point in the history
* nd nccl_send_recv_boxing

* rm print

* support num_axes > 2

* Add distributed optional run (#8372)

* Add

* change deps

* add install

* add skip

* autoprof supports bandwidth (#8367)

* autoprof supports bandwidth

Signed-off-by: daquexian <daquexian566@gmail.com>

* print bandwidth

Signed-off-by: daquexian <daquexian566@gmail.com>

* auto format by CI

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>

* remove tmp buffer of cumprod cpu backward kernel (#8369)

* remove tmp buffer of cumprod cpu backward kernel

* refine

* refine

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* Move tensor api to cpython part3 (#8342)

* add tensor_functions

* concat py methods

* add hash, restore tensor.py

* check replacement

* refine code, remove commented tensor.py

* refine code

* move some api

* add cpu and cuda api

* add triu tril norm and etc.

* remove tensor_functions.h

* move more api

* move more api, refine size

* fix typo

* format code, remove useless include

* refine code

* refine code, fix typo

* align .cuda to python

* refine code

* split some api to part3 for review

* remove positional only arguments of argmax and argmin

* remove arguments parse

* modify arguments name in matmul and floor_divide

* rename BINARY_FUNC to DIRECT_PASS_FUNC, modify some functions

* refine code, format code

* add inplace /=, add comments

* remove name in macros

* remove python api

* remove redundant include

* remove cout

* format code

* refactor tensor.size by directly call shape.at, refactor tensor.sub_ by calling nb_sub_

* remove redundant code

* auto format by CI

* fix typo, fix wrong call

* modify idx datatype from int32 to int64 in tensor.size

* add some DIRECT_PASS_FUNC

* add cpu cuda var pow and etc.

* add masked_fill any all

* make REDUCE_FUNC macro, add reduce_* functions

* add 0dim check in ReduceSumWhole, refine yaml

* fix bug

* restore add add_ sub sub_

* add unittest for tensor.half tensor.add tensor.add_

* refine code

* refine code

* fix typo

* fix bug of tensor.std()

* refactor var std and cuda, using c++ functional api

* add beta and threshold in softplus

* auto format by CI

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* Add nn_functor Check (#7910)

* add bias_add_check

* add bias_add error test

* fix conv2d nhwc bias_add error

* add nhwc conv test

* add bias_add_error test

* Add bias add error check

* Rename

* add batch matmul error check

* add matmul check error msg

* remove annotation

* add fused mlp error msg check

* Add pixel shuffle check test

* add more test until normalization add relu functor

* refine error message

* finish all nnfunctor check msg

* handle type error

* remove useless symbol

* modify back to TypeError

* fix all comment

* Remove redundant code

* Remove pad ndim check

* fix bias add space

* fix check logic cause ci gpu not always gpu:0

Co-authored-by: hjchen2 <chenhoujiangcug@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* Add FusedMatmulBiasAddReluDropout [OneEmbedding] (#8222)

* previous version for fused_matmul_bias_add_relu_dropout

* add op infer

* fix detail

* finish forward

* support dropout rate list

* add forward test

* fix bug for output buffer

* Configurable alpha params

* try to add bit mask logic

* Add bitmask first version!

* Add row col bitmask logic

* support not align4 reludropout

* simplify relu dropout ld logic

* Add naive relu dropout grad kernel

* add simple relu dropout grad kernel

* Rename

* support relu_dropout bitmask backward

* add vectorized optimization

* fix tmp buffer

* add to amp list

* add lazy backward logic

* Refine kernel

* add indextype dispatch

* simplify functor logic

* fix cublas fused mlp aux_ld shape bug

* Add more relu dropout kernel

* add full unittest

* fix bug in skip final activation

* refine

* Remove dump func

* fix format

* Remove cmake

* remove redundant divide

* add padded version

* fix dropout

* oneflow curand

* refine

* remove redundant kernel

* add unroll logic

* add unroll and ballot sync

* refine format

* Remove fast curand

* Refine python interface

* Add if branch for memset

* fix python logic

* just for debug

* not use matmul bias add grad

* add launch 1 block limit

* fix unittest

* Refine

* fix graph backward bug

* limit to 11060

* change to use int32_t dtype for cublas aux

* Fix jc comment

* fix comment

* fix convert

* fix static_analysis

* fix at

* fix userops td

* fix userops td

* fix const ref

* fix compile error for bfloat16

* limit to 11060

* fix bug

Co-authored-by: Juncheng <liujuncheng1022@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* fix gather 0-dim tensor bug (#8376)

* fix 0-dim tensor bug

* refine

* support input 0-dim tensor for gather

* refine

* refine

* refine dim_scatter_kernel check

* refine

* refine check

* fix clang_tidy error

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* add api to apply external job pass (#8370)

* Add condition to find-test-cache-distributed (#8387)

* add condition to find-test-cache-distributed

* fix

* warp dim util (#8382)

* warp dim util

* format

* use more maybe_wrap_dim

* refine array functor

* add more

* refine math_functor

* fix_bug_in_broadcast_min_max_grad_and_broadcast_like (#8379)

* fix_bug_in_broadcast_min_max_grad_and_broadcast_like

* refine

* fix static check error

* fix bug about index (#8388)

* fix bug about index

* add test case

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* LogicalSliceAssign support full slice sbp (#8344)

* feat(SliceOp): slice ops support 2d sbp

* fix(SliceOp): fix [B, P] 2d sbp bug

* refine error message

* fix bug in parallel_num == 1

* add comment

* add warning and format

* add NOLINT for boxing check

* feat(LogicalSliceOps): support all nd_sbp

* feat(LogicalSlice): support nd_sbp

* add error message

* fix(AutoTest): fix auto_test bug in module.parameter pass

* auto format by CI

* fix(LogicalSliceAssign): skip test when 1n1d

* fix SliceParams memset error

* remove memset

* add CHECK_JUST

* fix(*): make sure split_axis >= 0 or equal to SPLIT_AXIS_FOR_NON_SPLIT

* remove memset

* fix spilit_info.axis bug

* feat(LogicalSliceOps): support grad

* add logical_slice gradient_funcs

* feat(LogicalSliceAssign): LogicalSliceAssign support full slice sbp

* auto format by CI

* test(LogicalSlice): fix logical_slice dims

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: Houjiang Chen <chenhoujiangcug@gmail.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>

* fix_tensor_from_numpy_mem_leak_bug (#8391)

* fix_tensor_from_numpy_mem_leak_bug

* add note

* refine note

* refine

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* Make of_pyext_obj static only to make sure only a python ext so has python symbols (#8393)

* make of_pyext_obj static only

* refine note

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* Adjust tolerance setting in embedding_renorm unit test (#8394)

* support front end compile for job to iree (#8249)

* support frontend dev version

* polish name

* add tosa-to-elf.mlir

* tosa to elf by llvm

* conv2d partial

* an enhanced frontend runner

* support numpy as input

* enable multiple using nn graph with different input(jobname make it  it cd /home/yuhao/frontend/oneflow ; /usr/bin/env /usr/bin/python3 /home/yuhao/.vscode-server/extensions/ms-python.python-2022.6.2/pythonFiles/lib/python/debugpy/launcher 40873 -- /home/yuhao/frontend/oneflow/oneflow/ir/test/Frontend/runner.py )

* enable multiple input

* enable cpu and cuda

* change full_name to _full_name

* support exchange cuda with cpu seamlessly

* remove pip

* lit config

* polish

* trim

* auto format by CI

* modify

* auto format by CI

* last line polish

* use unittest

* auto format by CI

* use allclose

* auto format by CI

* pulish

* optimize convert oneflow to tosa

* conv2d

* conv2d enhanced && conv2d examples add

* add road map

* add add_n2Op and boardcast_addOp conversion

* add matmulOp conversion

* support converting normailzation op to tosa(partically)

* update roadmap

* support i64 tensor to dense elem attr

* support 100% resnet op conversion

* add test mlir

* add test iree resnet python script

* auto format by CI

* done

* enhance iree resnet test script

* auto format by CI

* rebuild code

* auto format by CI

* rebuild test script

* update

* auto format by CI

* pub

* trim test scripts

* move

* move

* input and output add block arg judgement

* emit error in variable conversion

* error handle for ci

* modify err info

* auto format by CI

* merge

* auto format by CI

* output not block

* flow ones

* rm const

* trim maybe

* trim maybe with header file

* const auto

* solve clangd error

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* Feat/zero mix with mp (#8036)

* add zero limit

* add debug

* add mix zero test

* refactor zero api

* zero test with mp

* add 2d test

* add zero nd

* add nd zero

* add sbp cast

* test passed soft limit consumer

* refine size api

* zero use stage 2

* add limit consumer api

* add new api

* refine zero s select

* fix index out of range

* rm zero limit on device type

* zero test with activation checkpointing

* add indentity when dp sequence len is 1

* move to base with master

* fix

* fix

* fix

* add test

* debug bad case

* refine test for eager and graph boxing

* test case ready

* simplify

* refine test

* fix buff size

* fix conflict

* refine zero nd

* refine

* add full test

* revert change

* refine split check

* fix typo

* rm log

* spit long func

* restore test

* Update optimizer_placement_optimization_pass.cpp

* auto format by CI

* auto format by CI

* fix static check

* add tips for zero api change

* auto format by CI

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* Revert embedding normal path and fix amp list (#8374)

* revert embedding normal path, fix amp list

* fix amp

* fix memset bug in gather cpu kernel

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* replace fixed_vector with small_vector and make Shape inherit from it (#8365)

* Replace fixed_vector with llvm::SmallVector

Signed-off-by: daquexian <daquexian566@gmail.com>

* Shape inherited from llvm::SmallVector

Signed-off-by: daquexian <daquexian566@gmail.com>

* refine cmake

Signed-off-by: daquexian <daquexian566@gmail.com>

* rename fixed_vector to small_vector

Signed-off-by: daquexian <daquexian566@gmail.com>

* fix reviews

Signed-off-by: daquexian <daquexian566@gmail.com>

* auto format by CI

* update Shape constructor

Signed-off-by: daquexian <daquexian566@gmail.com>

* add 'PUBLIC' keyword to all target_link_libraries

Signed-off-by: daquexian <daquexian566@gmail.com>

* auto format by CI

* update cmake

Signed-off-by: daquexian <daquexian566@gmail.com>

* auto format by CI

* update cmake

Signed-off-by: daquexian <daquexian566@gmail.com>

* update cmake

Signed-off-by: daquexian <daquexian566@gmail.com>

* auto format by CI

* set is_initialized_ default to true

Signed-off-by: daquexian <daquexian566@gmail.com>

* override some methods to set is_initialized_

Signed-off-by: daquexian <daquexian566@gmail.com>

* auto format by CI

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>

* Light plan for debug (#8396)

* Light plan for debug

* fix note

* disable terminfo to fix missing terminfo symbols (#8400)

* disable terminfo to fix missing terminfo symbols

Signed-off-by: daquexian <daquexian566@gmail.com>

* auto format by CI

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* fix bug of ZeRO MP in complex case (#8404)

* Remove redundant output_lbns in ir (#8409)

* mv case

* remove redundant info

* Dev FusedCrossInteraction[OneEmbedding] (#8335)

* add simple fused cross interaction forward

* add packed fused

* Add cross interaction grad

* simplify code

* fix bug

* support crossnet v2

* support cross interaction v2

* add lazy backward

* Rename and add test

* fix jc comment

* fix comment

* fix bug

* fix userops td elem_cnt for FUSED Group

* fix header file

* fix clang static analysis

* fix unittest

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* add exe graph physical shape check msg (#8002)

* fix index select op in graph

* add exe graph physical shape check msg

* improve the debug information for the python stack trace

1. add a parameter 'max_stack_depth' to specify the max depth for the stack trace
2. refactor other debug related classes.

* remove parens

* update

* resolve PR comments

* update

* update graph debug test file.

* restore self._debug in class Graph and class ModuleBlock

* Do not shorten the stack frame string if it is in debug mode

* delete TODOs

* disable conv3d test (#7969)

Signed-off-by: daquexian <daquexian566@gmail.com>

* skip layernorm random_data_warp test (#7941)

* skip layernorm random_data_warp test

* warp/block/uncached case only test gpu

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* Lock click version (#7967)

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* add global avgpool unittest (#7585)

* fix (#7978)

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* Support negative dim in scatter op (#7934)

* support negative dim in scatter op

* refine scatter test

* refine scatter test again

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* run barrier callback in BarrierPhyInstrOperand::~BarrierPhyInstrOperand (#7702)

* run barrier callback in BarrierPhyInstrOperand::~BarrierPhyInstrOperand

* lock gil in vm Callback thread

* more comments for VirtualMachineEngine::Callback()

* the Env is never destroyed.

* export Env into python

* more unittests

* wait shared_ptr.use_count() == 0

* export unittest.TestCase in framework/unittest.py

* SwitchToShuttingDownPhase

* optional is_normal_exit

* VirtualMachine::CloseVMThreads

* Delete env_api.h

env_api.h is deleted by master

* reshape_only_one_dim_infered

* address pr comments

* fix a ref-cnt bug in TryRunBarrierInstruction.

* rollback flow.env.all_device_placement

* no distributed running test_shutting_down.py

* auto format by CI

* expand lifetime of module oneflow in test_shutting_down.py

* refine del depend on of

* capture oneflow._oneflow_internal.eager when calling sync in __del__

* add try in flaky test

Co-authored-by: Luyang <flowingsun007@163.com>
Co-authored-by: chengtbf <472491134@qq.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: Xiaoyu Xu <xiaoyulink@gmail.com>

* Fix one hot scalar tensor bug (#7975)

* fix reduce_sum scalar check bug

* fix one_hot scalar tensor bug

* fix clang tidy error

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* support ctor np array from of tensor (#7970)

* support ctor np array from of tensor

* add test case constructing np array from tensor

* refine

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* add_manual_seed_all_api (#7957)

* add_manual_seed_all_api

* Update conf.py

* refine

* add test case

* auto format by CI

* Update random_generator.cpp

* auto format by CI

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* one_embedding add doc string (#7902)

* add doc string

* add example

* add

* fix doc

* refine

* address review

* mb to MB

* add make_table_option

* option to options

* refine

* add forward

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* Support numpy scalar parameters (#7935)

* feat(functional): support numpy scalar parameters

* rename inferface

* feat(*): TensorIndex support numpy scalar

* feat(TensorIndex): support advance indexing

* add unittest and int32 support for branch feat-param_support_np_scalar (#7939)

* add unittest

* refactor unittest

* add todo for int16 advanced indexing

* add int32 supporting for advance indexing

* auto format by CI

Co-authored-by: Wang Yi <53533850+marigoold@users.noreply.github.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>

* fix tensor_scatter_nd_update (#7953)

* fix tensor_scatter_nd_update

* auto backward

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* fix one_embedding adam (#7974)

* fix one_embedding adam

* fix tidy

* fix normal

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* speed test with score (#7990)

Signed-off-by: daquexian <daquexian566@gmail.com>

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* Feat/graph del by ref (#7857)

* remove IsMultiClient() and single client logic

Signed-off-by: daquexian <daquexian566@gmail.com>

* rename eager.multi_client to eager

Signed-off-by: daquexian <daquexian566@gmail.com>

* auto format by CI

* add py ref

* refine new session

* clean code

* make scope api inner use

* use session with ref cnt

* run barrier callback in BarrierPhyInstrOperand::~BarrierPhyInstrOperand

* test pass

* lock gil in vm Callback thread

* more comments for VirtualMachineEngine::Callback()

* merge

* merge rm single client

* rm initenv

* merge and fix master

* refactor env c api

* add debug code

* fix and serving test pass

* test passed

* rm useless

* rm useless code

* format

* rm useless include

* rm sync in py

* the Env is never destroyed.

* export Env into python

* more unittests

* fix and pass tests

* revert virtual_machine.cpp

* revert core/vm

* remove outdated python class oneflow.unittest.TestCase

* graph test passed

* wait shared_ptr.use_count() == 0

* export unittest.TestCase in framework/unittest.py

* SwitchToShuttingDownPhase

* optional is_normal_exit

* VirtualMachine::CloseVMThreads

* Delete env_api.h

env_api.h is deleted by master

* address pr comments

* rm is env init

* Clear empty thread when graph destroy (#7633)

* Revert "Clear empty thread when graph destroy (#7633)" (#7860)

This reverts commit 3e8585e.

* fix a ref-cnt bug in TryRunBarrierInstruction.

* rm env_api

* fix clang-tidy error

* fix clang-tidy in env_imp

* refine env api

* format

* refine graph del and sync at shuttingdown

* fix typo

* add comment

* rm useless

* rm useless

Co-authored-by: daquexian <daquexian566@gmail.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: lixinqi <lixinqi0703106@163.com>
Co-authored-by: Li Xinqi <lixinqi2010@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: Luyang <flowingsun007@163.com>
Co-authored-by: cheng cheng <472491134@qq.com>

* [PersistentTable] Fix num blocks (#7986)

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* Add auto benchmark for flowvision (#7806)

* update yml

* update workflow

* add resnet50

* [PersistentTable] Async write (#7946)

* [PersistentTable] Async write

* fix

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* save log in separate dir by default (#7825)

Signed-off-by: daquexian <daquexian566@gmail.com>

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* fix index select op in graph

* add exe graph physical shape check msg

* improve the debug information for the python stack trace

1. add a parameter 'max_stack_depth' to specify the max depth for the stack trace
2. refactor other debug related classes.

* remove parens

* update

* resolve PR comments

* update

* update graph debug test file.

* restore self._debug in class Graph and class ModuleBlock

* Do not shorten the stack frame string if it is in debug mode

* delete TODOs

* Revert "Merge branch 'master' into fea/graph_check_msg"

This reverts commit 28833b7, reversing
changes made to baadf60.

* Revert "Revert "Merge branch 'master' into fea/graph_check_msg""

This reverts commit 1d5e196.

* update

* resolve conflicts

* resolve conflicts

Co-authored-by: Cijie Xia <cijie.xia@mail.utoronto.ca>
Co-authored-by: daquexian <daquexian566@gmail.com>
Co-authored-by: guo ran <360112263@qq.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: Shenghang Tsai <jackalcooper@gmail.com>
Co-authored-by: Houjiang Chen <chenhoujiangcug@gmail.com>
Co-authored-by: Peihong Liu <mosout@qq.com>
Co-authored-by: Li Xinqi <lixinqi2010@gmail.com>
Co-authored-by: Luyang <flowingsun007@163.com>
Co-authored-by: chengtbf <472491134@qq.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>
Co-authored-by: liufengwei0103 <2472937968@qq.com>
Co-authored-by: binbinHan <han_binbin@163.com>
Co-authored-by: Yinggang Wang <wyg19970408@gmail.com>
Co-authored-by: Wang Yi <53533850+marigoold@users.noreply.github.com>
Co-authored-by: Shijie <821898965@qq.com>
Co-authored-by: lixinqi <lixinqi0703106@163.com>
Co-authored-by: Juncheng <liujuncheng1022@gmail.com>

* add batch_matmul sbp (#8385)

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* suppress gcc11 false positive warning (#8401)

Signed-off-by: daquexian <daquexian566@gmail.com>

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* fix variable op conversion to tosa error in ninja c1 (#8412)

* pub

* move test iree resnet python script to oneflow_iree repo

* add bracket

* rename const_val to const_val_ and restore resnet.py test script

Co-authored-by: Shenghang Tsai <jackalcooper@gmail.com>

* nccl send/recv support different placement

* refine

* auto format by CI

* rm out ctrl

* auto format by CI

Co-authored-by: guo-ran <360112263@qq.com>
Co-authored-by: Shenghang Tsai <jackalcooper@gmail.com>
Co-authored-by: daquexian <daquexian566@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: liufengwei0103 <2472937968@qq.com>
Co-authored-by: Wang Yi <53533850+marigoold@users.noreply.github.com>
Co-authored-by: ZZK <359521840@qq.com>
Co-authored-by: hjchen2 <chenhoujiangcug@gmail.com>
Co-authored-by: Juncheng <liujuncheng1022@gmail.com>
Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>
Co-authored-by: Luyang <flowingsun007@163.com>
Co-authored-by: binbinHan <han_binbin@163.com>
Co-authored-by: Yinggang Wang <wyg19970408@gmail.com>
Co-authored-by: Yao Zihang <1162526220@qq.com>
Co-authored-by: yuhao <72971170+howin98@users.noreply.github.com>
Co-authored-by: Xiaoyu Xu <xiaoyulink@gmail.com>
Co-authored-by: cheng cheng <472491134@qq.com>
Co-authored-by: Cijie Xia <cijie.xia@mail.utoronto.ca>
Co-authored-by: Peihong Liu <mosout@qq.com>
Co-authored-by: Li Xinqi <lixinqi2010@gmail.com>
Co-authored-by: Shijie <821898965@qq.com>
Co-authored-by: lixinqi <lixinqi0703106@163.com>
  • Loading branch information
1 parent e7164a7 commit 2b16f1b
Show file tree
Hide file tree
Showing 149 changed files with 7,580 additions and 1,562 deletions.
28 changes: 25 additions & 3 deletions .github/workflows/test.yml
Original file line number Diff line number Diff line change
Expand Up @@ -347,7 +347,7 @@ jobs:

find-test-cache-distributed:
name: "Find test cache (distributed)"
if: github.event.pull_request.draft == false && github.base_ref == 'master' && false
if: github.event.pull_request.draft == false && github.base_ref == 'master' && contains(github.event.pull_request.labels.*.name, 'need-test-distributed')
runs-on: ubuntu-latest
needs: [build-oneflow]
env:
Expand Down Expand Up @@ -411,10 +411,10 @@ jobs:
test-distributed:
name: Distributed test suite
needs: [wait_for_gpu_slot, find-test-cache-distributed]
needs: [wait_for_gpu_slot, find-test-cache-distributed, test]
runs-on: ${{ matrix.runs-on }}
timeout-minutes: 120
if: github.event.pull_request.draft == false && github.base_ref == 'master' && false
if: github.event.pull_request.draft == false && github.base_ref == 'master' && contains(github.event.pull_request.labels.*.name, 'need-test-distributed')
concurrency:
group: distributed-test-${{ matrix.entry }}-rank-${{ matrix.rank }}
cancel-in-progress: false
Expand All @@ -439,6 +439,22 @@ jobs:
with:
ref: ${{ github.event.pull_request.head.sha }}
repository: ${{github.event.pull_request.head.repo.full_name}}
- name: Checkout Oneflow-Inc/vision
if: ${{ !fromJson(matrix.cache-hit) && contains(matrix.runs-on, 'self-hosted') }}
uses: actions/checkout@v2
with:
repository: Oneflow-Inc/vision
# please use a commit here
ref: ${{ env.FLOW_VISION_COMMIT}}
path: ${{ env.FLOW_VISION_SRC}}
- name: Checkout Oneflow-Inc/libai
if: ${{ !fromJson(matrix.cache-hit) && contains(matrix.runs-on, 'self-hosted') }}
uses: actions/checkout@v2
with:
repository: Oneflow-Inc/libai
# please use a commit here
ref: ${{ env.LIBAI_COMMIT}}
path: ${{ env.LIBAI_SRC}}
- name: Remove container
timeout-minutes: 45
if: ${{ contains(matrix.runs-on, 'self-hosted') }}
Expand Down Expand Up @@ -537,6 +553,12 @@ jobs:
ls ${ONEFLOW_WHEEL_PATH}
docker exec ${TEST_CONTAINER_NAME} python3 -m pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
docker exec ${TEST_CONTAINER_NAME} python3 -m pip install --find-links=${ONEFLOW_WHEEL_PATH} oneflow
- name: Install downstream libs
if: ${{ !fromJson(matrix.cache-hit) && contains(matrix.runs-on, 'self-hosted') }}
run: |
docker exec ${TEST_CONTAINER_NAME} python3 -m pip install -e ${{ env.FLOW_VISION_SRC}}
docker exec ${TEST_CONTAINER_NAME} python3 -m pip install pybind11 --user
docker exec ${TEST_CONTAINER_NAME} python3 -m pip install -e ${{ env.LIBAI_SRC}}
- name: Module API test (distributed)
timeout-minutes: 90
if: ${{ !fromJson(matrix.cache-hit) && matrix.test-type == 'module' && matrix.device == 'cuda' && fromJson(matrix.is-distributed) }}
Expand Down
24 changes: 21 additions & 3 deletions cmake/oneflow.cmake
Original file line number Diff line number Diff line change
Expand Up @@ -197,7 +197,7 @@ generate_functional_api_and_pybind11_cpp(FUNCTIONAL_GENERATED_SRCS FUNCTIONAL_GE
FUNCTIONAL_PYBIND11_SRCS ${PROJECT_SOURCE_DIR})
oneflow_add_library(of_functional_obj STATIC ${FUNCTIONAL_GENERATED_SRCS}
${FUNCTIONAL_GENERATED_HRCS})
target_link_libraries(of_functional_obj glog::glog)
target_link_libraries(of_functional_obj LLVMSupportWithHeader glog::glog)
add_dependencies(of_functional_obj prepare_oneflow_third_party)

if(BUILD_PYTHON)
Expand All @@ -214,7 +214,7 @@ if(BUILD_PYTHON)
of_functional_tensor_obj STATIC ${FUNCTIONAL_TENSOR_GENERATED_SRCS}
${FUNCTIONAL_TENSOR_GENERATED_HRCS} ${FUNCTIONAL_OPS_GENERATED_SRCS}
${FUNCTIONAL_OPS_GENERATED_HRCS})
target_link_libraries(of_functional_tensor_obj glog::glog)
target_link_libraries(of_functional_tensor_obj LLVMSupportWithHeader glog::glog)
add_dependencies(of_functional_tensor_obj prepare_oneflow_third_party)
target_include_directories(of_functional_tensor_obj PRIVATE ${Python_INCLUDE_DIRS}
${Python_NumPy_INCLUDE_DIRS})
Expand Down Expand Up @@ -274,6 +274,22 @@ if(WITH_MLIR)
set(ONEFLOW_MLIR_LIBS -Wl,--no-as-needed MLIROneFlowExtension -Wl,--as-needed)
endif()

if("${LLVM_PROVIDER}" STREQUAL "install")
get_property(LLVM_INSTALL_DIR GLOBAL PROPERTY LLVM_INSTALL_DIR)
check_variable_defined(LLVM_INSTALL_DIR)
find_library(LLVMSupportLib LLVMSupport PATHS ${LLVM_INSTALL_DIR}/lib REQUIRED)
add_library(LLVMSupportWithHeader UNKNOWN IMPORTED)
set_property(TARGET LLVMSupportWithHeader PROPERTY IMPORTED_LOCATION ${LLVMSupportLib})
else()
add_library(LLVMSupportWithHeader INTERFACE IMPORTED)
target_link_libraries(LLVMSupportWithHeader INTERFACE LLVMSupport)
endif()
check_variable_defined(LLVM_INCLUDE_DIRS)
set_property(TARGET LLVMSupportWithHeader PROPERTY INTERFACE_INCLUDE_DIRECTORIES
${LLVM_INCLUDE_DIRS})

list(APPEND oneflow_third_party_libs LLVMSupportWithHeader)

include(op_schema)

get_property(EXTERNAL_INCLUDE_DIRS GLOBAL PROPERTY EXTERNAL_INCLUDE_DIRS)
Expand Down Expand Up @@ -317,7 +333,9 @@ endif()
if(BUILD_PYTHON)

# py ext lib
oneflow_add_library(of_pyext_obj SHARED ${of_pyext_obj_cc})
# This library should be static to make sure all python symbols are included in the final ext shared lib,
# so that it is safe to do wheel audits of multiple pythons version in parallel.
oneflow_add_library(of_pyext_obj STATIC ${of_pyext_obj_cc})
target_include_directories(of_pyext_obj PRIVATE ${Python_INCLUDE_DIRS}
${Python_NumPy_INCLUDE_DIRS})
target_link_libraries(of_pyext_obj oneflow pybind11::headers)
Expand Down
2 changes: 1 addition & 1 deletion cmake/op_schema.cmake
Original file line number Diff line number Diff line change
Expand Up @@ -81,5 +81,5 @@ set_source_files_properties(${GENERATED_OP_SCHEMA_H} ${GENERATED_OP_SCHEMA_CPP}
TRUE)

oneflow_add_library(of_op_schema OBJECT ${GENERATED_OP_SCHEMA_H} ${GENERATED_OP_SCHEMA_CPP})
target_link_libraries(of_op_schema glog::glog)
target_link_libraries(of_op_schema LLVMSupportWithHeader glog::glog)
add_dependencies(of_op_schema prepare_oneflow_third_party)
6 changes: 6 additions & 0 deletions cmake/util.cmake
Original file line number Diff line number Diff line change
Expand Up @@ -269,6 +269,12 @@ function(set_compile_options_to_oneflow_target target)
endif()
endfunction()

function(check_variable_defined variable)
if(NOT DEFINED ${variable})
message(FATAL_ERROR "Variable ${variable} is not defined")
endif()
endfunction()

function(checkDirAndAppendSlash)
set(singleValues DIR;OUTPUT)
set(prefix ARG)
Expand Down
3 changes: 1 addition & 2 deletions docs/source/graph.rst
Original file line number Diff line number Diff line change
Expand Up @@ -20,12 +20,11 @@ Base class for running neural networks in Static Graph Mode.

.. autoclass:: oneflow.nn.graph.graph_config.GraphConfig
:members: enable_amp,
enable_zero,
allow_fuse_model_update_ops,
allow_fuse_add_to_output,
allow_fuse_cast_scale,
set_gradient_accumulation_steps,
set_zero_redundancy_optimizer_mode,
set_zero_redundancy_optimizer_min_size_after_split,
enable_cudnn_conv_heuristic_search_algo,
:member-order: bysource

Expand Down
4 changes: 4 additions & 0 deletions oneflow/api/common/variable_tensor_mgr.h
Original file line number Diff line number Diff line change
Expand Up @@ -28,6 +28,10 @@ inline Maybe<void> FillVariableTensorMgr(
auto mgr = Global<VariableTensorMgr>::Get();
return mgr->Fill(variable_op_names, variable_tensors);
}
inline void ClearVariableTensorMgr() {
auto mgr = Global<VariableTensorMgr>::Get();
mgr->Clear();
}

inline std::tuple<std::vector<std::string>, std::vector<std::shared_ptr<one::Tensor>>>
DumpVariableTensorMgr() {
Expand Down
36 changes: 35 additions & 1 deletion oneflow/api/cpp/framework/graph.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -127,6 +127,9 @@ class Graph::GraphImpl final {
std::vector<Tensor> Forward(const std::vector<Tensor>& inputs);
void set_batch_size(int batch_size) { batch_size_ = batch_size; }

of::Maybe<void> RegisterJobPass(
const std::function<std::string(const std::string& job)>& pass_fn);

private:
of::Maybe<void> CollectInputOutputInfos();
of::Maybe<void> Compile(const std::vector<Tensor>& inputs);
Expand All @@ -135,6 +138,7 @@ class Graph::GraphImpl final {
of::Maybe<void> BuildGraph();
of::Maybe<void> LoadCheckpoint();
of::Maybe<void> RegisterTensors(const std::vector<Tensor>& inputs);
of::Maybe<of::Job> ApplyJobPasses(const of::Job& job);

std::shared_ptr<of::NNGraph> graph_ = nullptr;
std::string model_path_;
Expand All @@ -149,6 +153,7 @@ class Graph::GraphImpl final {
of::HashMap<std::string, std::shared_ptr<of::one::Tensor>> variable_op_name_to_tensor_;
std::shared_ptr<of::one::TensorTuple> output_tensor_tuple_;
std::shared_ptr<of::one::TensorTuple> parameter_tensor_tuple_;
std::vector<std::function<std::string(const std::string&)>> registered_job_passes_;
};

Graph::Graph(const std::string& model_path, const Device& device)
Expand All @@ -168,6 +173,10 @@ InputOutputInfos Graph::GetInputInfos() { return graph_->GetInputInfos(); }

InputOutputInfos Graph::GetOutputInfos() { return graph_->GetOutputInfos(); }

void Graph::RegisterJobPass(const std::function<std::string(const std::string& job)>& pass_fn) {
CHECK_JUST(graph_->RegisterJobPass(pass_fn));
}

IValue Graph::Forward(const IValue& inputs) {
std::vector<Tensor> input_tensors;
if (inputs.IsNone()) {
Expand Down Expand Up @@ -234,6 +243,28 @@ of::Maybe<void> Graph::GraphImpl::CollectInputOutputInfos() {
return of::Maybe<void>::Ok();
}

of::Maybe<void> Graph::GraphImpl::RegisterJobPass(
const std::function<std::string(const std::string& job)>& pass_fn) {
if (is_compiled_) {
return of::Error::RuntimeError() << "job pass should be registered before compile and forward";
}
registered_job_passes_.emplace_back(pass_fn);
return of::Maybe<void>::Ok();
}

of::Maybe<of::Job> Graph::GraphImpl::ApplyJobPasses(const of::Job& job) {
auto current_job = std::make_shared<of::Job>(job);
for (const auto& pass_fn : registered_job_passes_) {
std::string new_serialized_job = pass_fn(current_job->SerializeAsString());
of::Job new_job;
if (!new_job.ParseFromString(new_serialized_job)) {
return of::Error::RuntimeError() << "invalid serialized job after pass applied";
}
current_job->Swap(&new_job);
}
return current_job;
}

std::vector<Tensor> Graph::GraphImpl::Forward(const std::vector<Tensor>& inputs) {
if (!is_compiled_) {
static std::mutex mtx;
Expand Down Expand Up @@ -299,9 +330,12 @@ of::Maybe<void> Graph::GraphImpl::BuildGraph() {
}
JUST(LoadCheckpoint());
JUST(of::CurJobBuildAndInferCtx_Complete());
const std::shared_ptr<of::Job> complete_job = JUST(of::GetCurrentJob());
std::shared_ptr<of::Job> complete_job = JUST(of::GetCurrentJob());
int64_t job_id = JUST(of::JobBuildAndInferCtx_GetCurrentJobId());
CHECK(of::Global<OneFlowEnv>::Get() != nullptr);

// apply custom job passes
complete_job = JUST(ApplyJobPasses(*complete_job));
graph_ = std::make_shared<of::NNGraph>(job_.job_conf().job_name(), *complete_job, job_id,
of::Global<OneFlowEnv>::Get()->GetSessionCtx());
{
Expand Down
3 changes: 3 additions & 0 deletions oneflow/api/cpp/framework/graph.h
Original file line number Diff line number Diff line change
Expand Up @@ -24,6 +24,7 @@ limitations under the License.
#include "tensor.h"
#include <cstddef>
#include <string>
#include <functional>
#include <unordered_map>

namespace oneflow {
Expand Down Expand Up @@ -64,6 +65,8 @@ class Graph {
IValue Forward(const IValue& inputs);
void set_batch_size(int batch_size);

void RegisterJobPass(const std::function<std::string(const std::string& job)>& pass_fn);

static Graph Load(const std::string& model_path, const Device& device = Device("cpu"));

private:
Expand Down
10 changes: 8 additions & 2 deletions oneflow/api/python/framework/nn_graph.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -80,12 +80,18 @@ ONEFLOW_API_PYBIND11_MODULE("nn.graph.", m) {
m.def("RunLazyNNGraph", &RunLazyNNGraph);
m.def("SoftSyncNNGraphBuffers", &SoftSyncNNGraphBuffers);
m.def("AddTensorAsGraphLoss", &AddTensorAsGraphLoss);
m.def("ConvertJobToTosaIR", [](const std::string& serialized_job) -> Maybe<std::string> {
Job job;
CHECK_OR_RETURN(TxtString2PbMessage(serialized_job, &job))
<< "serialized job conversion failed.";
return ConvertJobToTosaIR(&job);
});
m.def("SaveJobToIR",
[](const std::string& serialized_job, const std::string& path) -> Maybe<void> {
Job job;
CHECK_OR_RETURN(TxtString2PbMessage(serialized_job, &job));
CHECK_OR_RETURN(TxtString2PbMessage(serialized_job, &job))
<< "serialized job conversion failed.";
return SaveJobToIR(&job, path);
;
});
m.def("LoadSerializedJobFromIR", [](const std::string& path) -> Maybe<py::bytes> {
Job job;
Expand Down
1 change: 1 addition & 0 deletions oneflow/api/python/framework/tensor.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -337,6 +337,7 @@ static PyObject* PyTensorObject_type(PyObject* self, PyObject* args, PyObject* k
ASSERT(CopyBetweenMirroredTensorAndNumpy<T>(PyTensor_Unpack(self), copied, \
BlobNumpyCopyUtil<T>::From, "mut", \
/*block_host_until_done=*/false)); \
Py_DECREF(copied); \
Py_RETURN_NONE; \
END_HANDLE_ERRORS \
}
Expand Down
Loading

0 comments on commit 2b16f1b

Please sign in to comment.