Merge branch 'master' into feat-general_basic_communication (#8477)
* Add distributed optional run (#8372)

* Add

* change deps

* add install

* add skip

* autoprof supports bandwidth (#8367)

* autoprof supports bandwidth

Signed-off-by: daquexian <daquexian566@gmail.com>

* print bandwidth

Signed-off-by: daquexian <daquexian566@gmail.com>

* auto format by CI

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>

* remove tmp buffer of cumprod cpu backward kernel (#8369)

* remove tmp buffer of cumprod cpu backward kernel

* refine

* refine

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* Move tensor api to cpython part3 (#8342)

* add tensor_functions

* concat py methods

* add hash, restore tensor.py

* check replacement

* refine code, remove commented tensor.py

* refine code

* move some api

* add cpu and cuda api

* add triu tril norm and etc.

* remove tensor_functions.h

* move more api

* move more api, refine size

* fix typo

* format code, remove useless include

* refine code

* refine code, fix typo

* align .cuda to python

* refine code

* split some api to part3 for review

* remove positional only arguments of argmax and argmin

* remove arguments parse

* modify arguments name in matmul and floor_divide

* rename BINARY_FUNC to DIRECT_PASS_FUNC, modify some functions

* refine code, format code

* add inplace /=, add comments

* remove name in macros

* remove python api

* remove redundant include

* remove cout

* format code

* refactor tensor.size by directly call shape.at, refactor tensor.sub_ by calling nb_sub_

* remove redundant code

* auto format by CI

* fix typo, fix wrong call

* modify idx datatype from int32 to int64 in tensor.size

* add some DIRECT_PASS_FUNC

* add cpu cuda var pow and etc.

* add masked_fill any all

* make REDUCE_FUNC macro, add reduce_* functions

* add 0dim check in ReduceSumWhole, refine yaml

* fix bug

* restore add add_ sub sub_

* add unittest for tensor.half tensor.add tensor.add_

* refine code

* refine code

* fix typo

* fix bug of tensor.std()

* refactor var std and cuda, using c++ functional api

* add beta and threshold in softplus

* auto format by CI

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* Add nn_functor Check (#7910)

* add bias_add_check

* add bias_add error test

* fix conv2d nhwc bias_add error

* add nhwc conv test

* add bias_add_error test

* Add bias add error check

* Rename

* add batch matmul error check

* add matmul check error msg

* remove annotation

* add fused mlp error msg check

* Add pixel shuffle check test

* add more test until normalization add relu functor

* refine error message

* finish all nnfunctor check msg

* handle type error

* remove useless symbol

* modify back to TypeError

* fix all comment

* Remove redundant code

* Remove pad ndim check

* fix bias add space

* fix check logic because the CI GPU is not always gpu:0

Co-authored-by: hjchen2 <chenhoujiangcug@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* Add FusedMatmulBiasAddReluDropout [OneEmbedding] (#8222)

* previous version for fused_matmul_bias_add_relu_dropout

* add op infer

* fix detail

* finish forward

* support dropout rate list

* add forward test

* fix bug for output buffer

* Configurable alpha params

* try to add bit mask logic

* Add bitmask first version!

* Add row col bitmask logic

* support not align4 reludropout

* simplify relu dropout ld logic

* Add naive relu dropout grad kernel

* add simple relu dropout grad kernel

* Rename

* support relu_dropout bitmask backward

* add vectorized optimization

* fix tmp buffer

* add to amp list

* add lazy backward logic

* Refine kernel

* add indextype dispatch

* simplify functor logic

* fix cublas fused mlp aux_ld shape bug

* Add more relu dropout kernel

* add full unittest

* fix bug in skip final activation

* refine

* Remove dump func

* fix format

* Remove cmake

* remove redundant divide

* add padded version

* fix dropout

* oneflow curand

* refine

* remove redundant kernel

* add unroll logic

* add unroll and ballot sync

* refine format

* Remove fast curand

* Refine python interface

* Add if branch for memset

* fix python logic

* just for debug

* not use matmul bias add grad

* add launch 1 block limit

* fix unittest

* Refine

* fix graph backward bug

* limit to 11060

* change to use int32_t dtype for cublas aux

* Fix jc comment

* fix comment

* fix convert

* fix static_analysis

* fix at

* fix userops td

* fix userops td

* fix const ref

* fix compile error for bfloat16

* limit to 11060

* fix bug

Co-authored-by: Juncheng <liujuncheng1022@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* fix gather 0-dim tensor bug (#8376)

* fix 0-dim tensor bug

* refine

* support input 0-dim tensor for gather

* refine

* refine

* refine dim_scatter_kernel check

* refine

* refine check

* fix clang_tidy error

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* add api to apply external job pass (#8370)

* Add condition to find-test-cache-distributed (#8387)

* add condition to find-test-cache-distributed

* fix

* wrap dim util (#8382)

* wrap dim util

* format

* use more maybe_wrap_dim

* refine array functor

* add more

* refine math_functor
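
The dim-wrapping helper adopted above (`maybe_wrap_dim`) normalizes a possibly negative dimension index against a tensor's rank. Below is a minimal Python sketch of that convention, assuming PyTorch-style semantics (valid range `[-ndim, ndim - 1]`, with a 0-dim tensor treated as rank 1). The function is a hypothetical illustration, not OneFlow's C++ implementation:

```python
def maybe_wrap_dim(dim: int, ndim: int) -> int:
    """Map a possibly negative dim into [0, ndim). Hypothetical sketch."""
    # Assumption: a 0-dim tensor still accepts dim in {-1, 0}, as in PyTorch.
    rank = max(ndim, 1)
    if not -rank <= dim < rank:
        raise IndexError(
            f"Dimension out of range (expected to be in range of "
            f"[{-rank}, {rank - 1}], but got {dim})"
        )
    # Negative dims count from the end: -1 is the last dimension.
    return dim + rank if dim < 0 else dim


print(maybe_wrap_dim(-1, 4))  # 3
```

Centralizing the check in one helper keeps the error message identical across functors instead of re-implementing the range test in each one.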

* fix_bug_in_broadcast_min_max_grad_and_broadcast_like (#8379)

* fix_bug_in_broadcast_min_max_grad_and_broadcast_like

* refine

* fix static check error

* fix bug about index (#8388)

* fix bug about index

* add test case

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* LogicalSliceAssign support full slice sbp (#8344)

* feat(SliceOp): slice ops support 2d sbp

* fix(SliceOp): fix [B, P] 2d sbp bug

* refine error message

* fix bug in parallel_num == 1

* add comment

* add warning and format

* add NOLINT for boxing check

* feat(LogicalSliceOps): support all nd_sbp

* feat(LogicalSlice): support nd_sbp

* add error message

* fix(AutoTest): fix auto_test bug in module.parameter pass

* auto format by CI

* fix(LogicalSliceAssign): skip test when 1n1d

* fix SliceParams memset error

* remove memset

* add CHECK_JUST

* fix(*): make sure split_axis >= 0 or equal to SPLIT_AXIS_FOR_NON_SPLIT

* remove memset

* fix split_info.axis bug

* feat(LogicalSliceOps): support grad

* add logical_slice gradient_funcs

* feat(LogicalSliceAssign): LogicalSliceAssign support full slice sbp

* auto format by CI

* test(LogicalSlice): fix logical_slice dims

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: Houjiang Chen <chenhoujiangcug@gmail.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>

* fix_tensor_from_numpy_mem_leak_bug (#8391)

* fix_tensor_from_numpy_mem_leak_bug

* add note

* refine note

* refine

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* Make of_pyext_obj static only to make sure only one Python extension .so has Python symbols (#8393)

* make of_pyext_obj static only

* refine note

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* Adjust tolerance setting in embedding_renorm unit test (#8394)

* support front end compile for job to iree (#8249)

* support frontend dev version

* polish name

* add tosa-to-elf.mlir

* tosa to elf by llvm

* conv2d partial

* an enhanced frontend runner

* support numpy as input

* enable multiple uses of nn graph with different inputs

* enable multiple input

* enable cpu and cuda

* change full_name to _full_name

* support exchange cuda with cpu seamlessly

* remove pip

* lit config

* polish

* trim

* auto format by CI

* modify

* auto format by CI

* last line polish

* use unittest

* auto format by CI

* use allclose

* auto format by CI

* polish

* optimize convert oneflow to tosa

* conv2d

* conv2d enhanced && conv2d examples add

* add road map

* add add_n2Op and broadcast_addOp conversion

* add matmulOp conversion

* support converting normalization op to tosa (partially)

* update roadmap

* support i64 tensor to dense elem attr

* support 100% resnet op conversion

* add test mlir

* add test iree resnet python script

* auto format by CI

* done

* enhance iree resnet test script

* auto format by CI

* rebuild code

* auto format by CI

* rebuild test script

* update

* auto format by CI

* pub

* trim test scripts

* move

* move

* input and output add block arg judgement

* emit error in variable conversion

* error handle for ci

* modify err info

* auto format by CI

* merge

* auto format by CI

* output not block

* flow ones

* rm const

* trim maybe

* trim maybe with header file

* const auto

* solve clangd error

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* Feat/zero mix with mp (#8036)

* add zero limit

* add debug

* add mix zero test

* refactor zero api

* zero test with mp

* add 2d test

* add zero nd

* add nd zero

* add sbp cast

* test passed soft limit consumer

* refine size api

* zero use stage 2

* add limit consumer api

* add new api

* refine zero s select

* fix index out of range

* rm zero limit on device type

* zero test with activation checkpointing

* add identity when dp sequence len is 1

* move to base with master

* fix

* fix

* fix

* add test

* debug bad case

* refine test for eager and graph boxing

* test case ready

* simplify

* refine test

* fix buff size

* fix conflict

* refine zero nd

* refine

* add full test

* revert change

* refine split check

* fix typo

* rm log

* split long func

* restore test

* Update optimizer_placement_optimization_pass.cpp

* auto format by CI

* auto format by CI

* fix static check

* add tips for zero api change

* auto format by CI

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* Revert embedding normal path and fix amp list (#8374)

* revert embedding normal path, fix amp list

* fix amp

* fix memset bug in gather cpu kernel

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* replace fixed_vector with small_vector and make Shape inherit from it (#8365)

* Replace fixed_vector with llvm::SmallVector

Signed-off-by: daquexian <daquexian566@gmail.com>

* Shape inherited from llvm::SmallVector

Signed-off-by: daquexian <daquexian566@gmail.com>

* refine cmake

Signed-off-by: daquexian <daquexian566@gmail.com>

* rename fixed_vector to small_vector

Signed-off-by: daquexian <daquexian566@gmail.com>

* fix reviews

Signed-off-by: daquexian <daquexian566@gmail.com>

* auto format by CI

* update Shape constructor

Signed-off-by: daquexian <daquexian566@gmail.com>

* add 'PUBLIC' keyword to all target_link_libraries

Signed-off-by: daquexian <daquexian566@gmail.com>

* auto format by CI

* update cmake

Signed-off-by: daquexian <daquexian566@gmail.com>

* auto format by CI

* update cmake

Signed-off-by: daquexian <daquexian566@gmail.com>

* update cmake

Signed-off-by: daquexian <daquexian566@gmail.com>

* auto format by CI

* set is_initialized_ default to true

Signed-off-by: daquexian <daquexian566@gmail.com>

* override some methods to set is_initialized_

Signed-off-by: daquexian <daquexian566@gmail.com>

* auto format by CI

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>

* Light plan for debug (#8396)

* Light plan for debug

* fix note

* disable terminfo to fix missing terminfo symbols (#8400)

* disable terminfo to fix missing terminfo symbols

Signed-off-by: daquexian <daquexian566@gmail.com>

* auto format by CI

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* fix bug of ZeRO MP in complex case (#8404)

* Remove redundant output_lbns in ir (#8409)

* mv case

* remove redundant info

* Dev FusedCrossInteraction[OneEmbedding] (#8335)

* add simple fused cross interaction forward

* add packed fused

* Add cross interaction grad

* simplify code

* fix bug

* support crossnet v2

* support cross interaction v2

* add lazy backward

* Rename and add test

* fix jc comment

* fix comment

* fix bug

* fix userops td elem_cnt for FUSED Group

* fix header file

* fix clang static analysis

* fix unittest

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* add exe graph physical shape check msg (#8002)

* fix index select op in graph

* add exe graph physical shape check msg

* improve the debug information for the python stack trace

1. add a parameter 'max_stack_depth' to specify the max depth for the stack trace
2. refactor other debug related classes.

* remove parens

* update

* resolve PR comments

* update

* update graph debug test file.

* restore self._debug in class Graph and class ModuleBlock

* Do not shorten the stack frame string if it is in debug mode

* delete TODOs

* disable conv3d test (#7969)

Signed-off-by: daquexian <daquexian566@gmail.com>

* skip layernorm random_data_warp test (#7941)

* skip layernorm random_data_warp test

* warp/block/uncached case only test gpu

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* Lock click version (#7967)

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* add global avgpool unittest (#7585)

* fix (#7978)

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* Support negative dim in scatter op (#7934)

* support negative dim in scatter op

* refine scatter test

* refine scatter test again

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* run barrier callback in BarrierPhyInstrOperand::~BarrierPhyInstrOperand (#7702)

* run barrier callback in BarrierPhyInstrOperand::~BarrierPhyInstrOperand

* lock gil in vm Callback thread

* more comments for VirtualMachineEngine::Callback()

* the Env is never destroyed.

* export Env into python

* more unittests

* wait shared_ptr.use_count() == 0

* export unittest.TestCase in framework/unittest.py

* SwitchToShuttingDownPhase

* optional is_normal_exit

* VirtualMachine::CloseVMThreads

* Delete env_api.h

env_api.h is deleted by master

* reshape_only_one_dim_infered

* address pr comments

* fix a ref-cnt bug in TryRunBarrierInstruction.

* rollback flow.env.all_device_placement

* no distributed running test_shutting_down.py

* auto format by CI

* expand lifetime of module oneflow in test_shutting_down.py

* refine del depend on of

* capture oneflow._oneflow_internal.eager when calling sync in __del__

* add try in flaky test

Co-authored-by: Luyang <flowingsun007@163.com>
Co-authored-by: chengtbf <472491134@qq.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: Xiaoyu Xu <xiaoyulink@gmail.com>

* Fix one hot scalar tensor bug (#7975)

* fix reduce_sum scalar check bug

* fix one_hot scalar tensor bug

* fix clang tidy error

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* support ctor np array from of tensor (#7970)

* support ctor np array from of tensor

* add test case constructing np array from tensor

* refine

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* add_manual_seed_all_api (#7957)

* add_manual_seed_all_api

* Update conf.py

* refine

* add test case

* auto format by CI

* Update random_generator.cpp

* auto format by CI

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* one_embedding add doc string (#7902)

* add doc string

* add example

* add

* fix doc

* refine

* address review

* mb to MB

* add make_table_option

* option to options

* refine

* add forward

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* Support numpy scalar parameters (#7935)

* feat(functional): support numpy scalar parameters

* rename interface

* feat(*): TensorIndex support numpy scalar

* feat(TensorIndex): support advance indexing

* add unittest and int32 support for branch feat-param_support_np_scalar (#7939)

* add unittest

* refactor unittest

* add todo for int16 advanced indexing

* add int32 supporting for advance indexing

* auto format by CI

Co-authored-by: Wang Yi <53533850+marigoold@users.noreply.github.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>

* fix tensor_scatter_nd_update (#7953)

* fix tensor_scatter_nd_update

* auto backward

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* fix one_embedding adam (#7974)

* fix one_embedding adam

* fix tidy

* fix normal

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* speed test with score (#7990)

Signed-off-by: daquexian <daquexian566@gmail.com>

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* Feat/graph del by ref (#7857)

* remove IsMultiClient() and single client logic

Signed-off-by: daquexian <daquexian566@gmail.com>

* rename eager.multi_client to eager

Signed-off-by: daquexian <daquexian566@gmail.com>

* auto format by CI

* add py ref

* refine new session

* clean code

* make scope api inner use

* use session with ref cnt

* run barrier callback in BarrierPhyInstrOperand::~BarrierPhyInstrOperand

* test pass

* lock gil in vm Callback thread

* more comments for VirtualMachineEngine::Callback()

* merge

* merge rm single client

* rm initenv

* merge and fix master

* refactor env c api

* add debug code

* fix and serving test pass

* test passed

* rm useless

* rm useless code

* format

* rm useless include

* rm sync in py

* the Env is never destroyed.

* export Env into python

* more unittests

* fix and pass tests

* revert virtual_machine.cpp

* revert core/vm

* remove outdated python class oneflow.unittest.TestCase

* graph test passed

* wait shared_ptr.use_count() == 0

* export unittest.TestCase in framework/unittest.py

* SwitchToShuttingDownPhase

* optional is_normal_exit

* VirtualMachine::CloseVMThreads

* Delete env_api.h

env_api.h is deleted by master

* address pr comments

* rm is env init

* Clear empty thread when graph destroy (#7633)

* Revert "Clear empty thread when graph destroy (#7633)" (#7860)

This reverts commit 3e8585e.

* fix a ref-cnt bug in TryRunBarrierInstruction.

* rm env_api

* fix clang-tidy error

* fix clang-tidy in env_imp

* refine env api

* format

* refine graph del and sync at shutting down

* fix typo

* add comment

* rm useless

* rm useless

Co-authored-by: daquexian <daquexian566@gmail.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: lixinqi <lixinqi0703106@163.com>
Co-authored-by: Li Xinqi <lixinqi2010@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: Luyang <flowingsun007@163.com>
Co-authored-by: cheng cheng <472491134@qq.com>

* [PersistentTable] Fix num blocks (#7986)

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* Add auto benchmark for flowvision (#7806)

* update yml

* update workflow

* add resnet50

* [PersistentTable] Async write (#7946)

* [PersistentTable] Async write

* fix

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* save log in separate dir by default (#7825)

Signed-off-by: daquexian <daquexian566@gmail.com>

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* fix index select op in graph

* add exe graph physical shape check msg

* improve the debug information for the python stack trace

1. add a parameter 'max_stack_depth' to specify the max depth for the stack trace
2. refactor other debug related classes.

* remove parens

* update

* resolve PR comments

* update

* update graph debug test file.

* restore self._debug in class Graph and class ModuleBlock

* Do not shorten the stack frame string if it is in debug mode

* delete TODOs

* Revert "Merge branch 'master' into fea/graph_check_msg"

This reverts commit 28833b7, reversing
changes made to baadf60.

* Revert "Revert "Merge branch 'master' into fea/graph_check_msg""

This reverts commit 1d5e196.

* update

* resolve conflicts

* resolve conflicts

Co-authored-by: Cijie Xia <cijie.xia@mail.utoronto.ca>
Co-authored-by: daquexian <daquexian566@gmail.com>
Co-authored-by: guo ran <360112263@qq.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: Shenghang Tsai <jackalcooper@gmail.com>
Co-authored-by: Houjiang Chen <chenhoujiangcug@gmail.com>
Co-authored-by: Peihong Liu <mosout@qq.com>
Co-authored-by: Li Xinqi <lixinqi2010@gmail.com>
Co-authored-by: Luyang <flowingsun007@163.com>
Co-authored-by: chengtbf <472491134@qq.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>
Co-authored-by: liufengwei0103 <2472937968@qq.com>
Co-authored-by: binbinHan <han_binbin@163.com>
Co-authored-by: Yinggang Wang <wyg19970408@gmail.com>
Co-authored-by: Wang Yi <53533850+marigoold@users.noreply.github.com>
Co-authored-by: Shijie <821898965@qq.com>
Co-authored-by: lixinqi <lixinqi0703106@163.com>
Co-authored-by: Juncheng <liujuncheng1022@gmail.com>

* add batch_matmul sbp (#8385)

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* suppress gcc11 false positive warning (#8401)

Signed-off-by: daquexian <daquexian566@gmail.com>

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* fix variable op conversion to tosa error in ninja c1 (#8412)

* pub

* move test iree resnet python script to oneflow_iree repo

* add bracket

* rename const_val to const_val_ and restore resnet.py test script

Co-authored-by: Shenghang Tsai <jackalcooper@gmail.com>

* Fix eval error in FusedMLP (#8413)

Fix eval error

* Init NCCL communicator in graph mode unifiedly (#8263)

* centralized comm init

* address review

* revert

* rename

* ref nccl logical send recv

* fix cpu only

Co-authored-by: cheng cheng <472491134@qq.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* fix dim_scatter 0-dim tensor bug (#8418)

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* target based external libraries (#8421)

Signed-off-by: daquexian <daquexian566@gmail.com>

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* Refine hardcoded attr setting/getting in ir (#8420)

* use names in trait static func

* more changes on op name attr

* use wrapped func

* Replace cu115 with cu116 in nightly (#8423)

update workflows

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* fix repeat interleave 0-size tensor bug (#8414)

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* Autotest support print input in ci (#8383)

* support print tensor value in autotest to provide more details in ci

* revert

* refine

* auto format by CI

* control precision to 1e-5 when record

* fix bug

* auto format by CI

* relax tensor_size_mb

* fix bug

* fix bug

* refine

* relax

* refine

* refine

* fix bug

* relax

* refine

* restruct

* auto format by CI

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* Modify sbp.split()'s karg: axis to dim (#8411)

* Modify sbp.split()'s axis karg to dim

* Refine

* Refine

* Refine

* Refine

* Feat/graph logical op debug repr (#8131)

* add zero limit

* add debug

* add mix zero test

* refactor zero api

* zero test with mp

* add 2d test

* add zero nd

* add nd zero

* add sbp cast

* test passed soft limit consumer

* refine size api

* add module config

* save nn.Module info in job.proto for better debugging

* add new line

* add ModuleBlock.ops_proto() API

* zero use stage 2

* print operators' info when print ModuleBlock

* handle VariableOpConf

* update

* update

* fix

* move operators repr method to graph util

* add limit consumer api

* add new api

* refine zero s select

* add module block

* fix

* refact for rm op in module conf

* fix

* add sbp debug

* add sbp repr

* add shape

* refine

* add sys op in repr

* add full op debug

* fix index out of range

* rm zero limit on device type

* add no scope op to graph

* zero test with activation checkpointing

* fix order

* add identity when dp sequence len is 1

* add debug repr

* refine repr of op

* refine and fix

* rm useless log

* move to base with master

* fix

* fix

* fix

* fix proto

* refine test

* fix type

* add test

* debug bad case

* refine test for eager and graph boxing

* test case ready

* simplify

* refine test

* fix buff size

* fix conflict

* refine zero nd

* refine

* add full test

* revert change

* refine split check

* fix typo

* rm log

* split long func

* refine

* restore test

* refine pass and mem debug

* merge master

* repr dtype

* add placement

* Update optimizer_placement_optimization_pass.cpp

* auto format by CI

* auto format by CI

* fix static check

* add tips for zero api change

* auto format by CI

* fix merge

* auto format by CI

* auto format by CI

* refine get job api

* refine graph util import order

* auto format by CI

* fix static check

* auto format by CI

* fix special case

* refine level print and add full dtype repr

* rm useless

Co-authored-by: Cijie Xia <cijie.xia@mail.utoronto.ca>
Co-authored-by: Cijie Xia <xiacijie1998@163.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* rm some test case in test_fused_dot_feature_interaction_pooling_sum (#8425)

rm some case in test

* Remove unused linkages (#8426)

remove unused linkages

* refactor stride (#8402)

* Stride inherits DimVector

Signed-off-by: daquexian <daquexian566@gmail.com>

* auto format by CI

* fix argument type of OFStrideToNumpyStride

Signed-off-by: daquexian <daquexian566@gmail.com>

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>

* Move Tensor.__setitem__  and global related api to Python/C api (#8375)

* add local_to_global, global_to_global, to_global. global_to_global still have bugs

* fix bug of global_to_global

* remove python api

* add setitem

* remove local_to_global sbp pack, format code

* format code

* remove redundant code

* add error msg, refine check of to_global

* fix bug of check

* add error msg

* fix clang static check error

* remove useless api in tensor.py, remove redundant code, remove useless CHECK

* add to_local

* fix wrong exception type in unittest for to_local exception message

* cuda add default error msg (#8427)

default error

Co-authored-by: Shenghang Tsai <jackalcooper@gmail.com>

* Refactor ShapeView (#8422)

* update

Signed-off-by: daquexian <daquexian566@gmail.com>

* update and add docs

Signed-off-by: daquexian <daquexian566@gmail.com>

* turn on view slice (#8302)

* turn_on_view_slice

* inplace scalar math handle non-contiguous input

* fix clang check

* add docs

* refactor

* auto format by CI

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>

* Add flow env init rdma api (#8415)

* add_flow_env_init_rdma_api

* adjust persistent_workers logic for RDMA support

* adjust persistent_workers logic for RDMA support

* add rdma_inited api

* minor fix

* add docs

* Update python/oneflow/utils/data/dataloader.py

Co-authored-by: daquexian <daquexian566@gmail.com>

* fix typo

* refine

* fix RDMAIsInitialized

* minor fix

* refine

* rename InitRdma to InitRDMA

* refine

Co-authored-by: Flowingsun007 <flowingsun007@163.com>
Co-authored-by: daquexian <daquexian566@gmail.com>

* add 1d send recv in nccl logical (#8355)

* add 1d send recv in nccl logical

* Update insert_nccl_logical_op_pass.cpp

* auto format by CI

Co-authored-by: cheng cheng <472491134@qq.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* Support iree ci (#8419)

* create mlir cpu and modify build gcc 7 shell script

* fix the bug of test_iree_resnet.py cuda test in cpu version error

* fix constant folding tests

* support oneflow_test_cpu_only

* pub

* build script add flag

* modify test yml

* add python3 into $PATH

* don't use pretrain model

* install flowvision

Co-authored-by: mosout <mosout@qq.com>
Co-authored-by: jackalcooper <jackalcooper@gmail.com>

* Feat straighten task nodes (#8347)

* Add a fast topological traversal

* Add an initial implementation of straighten nodes

* Add the straighten nodes algorithm

* Change algorithm structure

* Remove some debug information

* Finalize the straighten algorithm after
deciding the parameters by experiments

* Notify the usage of straighten algorithm

* Of format

* Update oneflow/core/graph/straighten_nodes.cpp

Of format

Co-authored-by: daquexian <daquexian566@gmail.com>

* Of format

* Stop using visual string before we find a better key

* Remove magic numbers and Of format

* Remove starts

* Of format

* Fix a bug of using GetMaxVal<int32_t>() as an
initial number for comparing

* Refactor add straighten algo interface (#8435)

* feat(*): export straighten nodes algorithm interface

* export documentation

* Update python/oneflow/nn/graph/graph_config.py

Co-authored-by: Yipeng Li <jamesonli1313@gmail.com>

Co-authored-by: Yipeng Li <jamesonli1313@gmail.com>

* Use TopoForEachNodeFast as default. (#8436)

* Use TopoForEachNodeFast as default.
Rename the original one as TopoForEachNodeDynamic

* Speed up TopoForEachNodeFast when traversing a subgraph

* Rename the switch and code clean up

* Hide the class TopoStruct

* Hide all the other functions

* Grammar

* Of format

Co-authored-by: daquexian <daquexian566@gmail.com>
Co-authored-by: Yinggang Wang <wyg19970408@gmail.com>
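
The topological traversal work above (#8347, #8436) visits task nodes so that every node comes after all of its producers. As a generic reference point, here is a minimal sketch of Kahn's algorithm in Python. It is not OneFlow's `TopoForEachNodeFast` (a C++ routine whose internals are not described here); `topo_order` and its adjacency-dict input are hypothetical, and every edge target is assumed to appear in `nodes`:

```python
from collections import deque
from typing import Dict, Hashable, Iterable, List


def topo_order(nodes: Iterable[Hashable],
               edges: Dict[Hashable, List[Hashable]]) -> List[Hashable]:
    """Return nodes in a topological order (Kahn's algorithm). Hypothetical sketch."""
    nodes = list(nodes)
    # Count incoming edges for every node.
    in_degree = {n: 0 for n in nodes}
    for src in nodes:
        for dst in edges.get(src, []):
            in_degree[dst] += 1
    # Start from nodes with no producers; FIFO keeps the order deterministic.
    ready = deque(n for n in nodes if in_degree[n] == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        # Releasing a node may make its consumers ready.
        for dst in edges.get(node, []):
            in_degree[dst] -= 1
            if in_degree[dst] == 0:
                ready.append(dst)
    if len(order) != len(nodes):
        raise ValueError("graph contains a cycle")
    return order
```

Each node and edge is touched a constant number of times, so the traversal runs in O(V + E). A straightening pass could then reorder nodes within the `ready` queue by a priority instead of plain FIFO order.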

* Refactor NLLLoss to support split class dim (#8380)

* refactor

* RuntimeError

* avoid atomic add

* test

* fixes

* update test

* update test

* update test

* fix kernel

* improve backward

* update test

* out_weight to be required

* address static analysis error

* fix static analysis error

* fix static analysis error

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* Strict ordering in memory reuse algorithm (#8441)

* Support broadcast in fused_softmax kernel (#8321)

* support broadcast

* refine

* Remove shape check

* fix sbp when broadcast

* rollback softmax grad threshold

* increase threshold of test conv bn folding

* tol to 1e-2

* check error msg of fuse softmax ops

* add more dispatch

* remove double datatype test and add broadcast test

Co-authored-by: cheng cheng <472491134@qq.com>

* Merge slice and logical slice (#8416)

* remove Slice, SliceUpdate, SliceGrad op

* rename logical_slice to slice and logical_slice_assign to slice_update

* move gradient_func logical_slice.cpp to slice.cpp

* fix some bug and refine local test

* feat(SliceUpdate): support 0size tensor

* test(Slice): refine consistent slice test

* test(SliceUpdate): refine consistent slice_update test

* not export slice_update's inplace parameter

* auto format by CI

* restore slice_grad_op

* fix slice_view bug

* add error message and attr judgement

* modified old test

* auto format by CI

* update test README

* update tensor_string code

* fix test bug

* auto format by CI

* fix(hsplit): hsplit functor bug

* fix vsplit doc test bug

* refine

* fix test

* fix pin_memory bug

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* Graph block.config.set_stage() for recommended Pipeline api. (#8442)

* Graph block.config.set_stage() for recommended Pipeline api.

* revert diff

* refine api doc

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* Update PolynomialLR's doc and parameter (#8430)

* update PolynomialLR doc, current_batch = min(decay_batch, current_batch)

* * update PolynomialLR doc, current_batch = min(decay_batch, current_batch)
* rename the steps parameter to decay_batch

* update PolynomialLR test case

Co-authored-by: Yinggang Wang <wyg19970408@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
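
The doc change clamps the step with current_batch = min(decay_batch, current_batch). A minimal sketch of polynomial decay with that clamp, assuming the usual formula lr = (base_lr - end_learning_rate) * (1 - current_batch / decay_batch) ** power + end_learning_rate and ignoring any cycle option; the function name and default values are illustrative, not OneFlow's PolynomialLR signature.

```python
def polynomial_lr(base_lr, current_batch, decay_batch, end_learning_rate=0.0001, power=1.0):
    """Polynomial learning-rate decay (hypothetical helper; defaults are illustrative).

    Past decay_batch the step is clamped, so the rate stays at end_learning_rate.
    """
    current_batch = min(decay_batch, current_batch)
    return (base_lr - end_learning_rate) * (1 - current_batch / decay_batch) ** power + end_learning_rate
```

With the clamp, calling polynomial_lr(0.1, 20, 10) returns the same value as polynomial_lr(0.1, 10, 10), i.e. the end learning rate, instead of letting the base term grow again past the decay window.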

* Add mv op (#8445)

* add mv op (with a known bug: Int is incompatible)

* add test

* update test_mv.py

* fix based on comments

* fix based on comments

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
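
The mv op is presumably modeled on torch.mv: a 2-D (n, m) matrix times a 1-D length-m vector gives a length-n vector. The plain-Python reference below (not OneFlow's kernel; the name and error message are assumptions) shows that shape rule and could serve as a simple test oracle.

```python
def mv(matrix, vec):
    """Reference matrix-vector product: (n, m) matrix times length-m vector."""
    for row in matrix:
        if len(row) != len(vec):
            raise ValueError(f"size mismatch: row has {len(row)} elements, vec has {len(vec)}")
    return [sum(a * b for a, b in zip(row, vec)) for row in matrix]
```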

* Enable the oneflow_iree Python package and its tests in CI (#8431)

* update test.yml

* add pytest for oneflow_iree examples

* add oneflow frontend test

* Dev: tensor.is_pinned API (#8447)

* support tensor.is_pinned

* add test case

* add docs

* auto format by CI

* refine

* auto format by CI

* refine

* auto format by CI

* refine

* refine

* refine

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>

* Nd sbp tensor str (#8458)

* nd sbp tensor str

* add nd sbp tensor str test

* bigger input size

* refine

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* Patch sbp cost (#8378)

* Add a slight cost for B->S and B->P in 2d sbp

* Add penalty for P in consumer

* Add the slight penalty for eager

* Consider B -> (B, B) for a scalar

* Do not consider parallel description in priority ratio

* Of format

* Fix a bug in the old version group boxing with 2D SBP (#8448)

* Update group boxing to deal with hierarchy [1, 2]

* Use a uniform sbp while grouping consumers

* Steal "ParallelDimReduce"
from "hierarchical_sub_task_graph_builder_impl" to "sbp_infer_util"

* Fix bugs of patch-sbp_cost (#8456)

* Update group boxing to deal with hierarchy [1, 2]

* Use a uniform sbp while grouping consumers

* Steal "ParallelDimReduce"
from "hierarchical_sub_task_graph_builder_impl" to "sbp_infer_util"

* Reduce to uniform B for 1 device.
Use the actual parallel description for each tensor

* Fix a bug in fix-group_boxing-bug

* Group boxing reduces [2, 2]: (S0, S0) to [4]: S0,
so we might infer a 1D SBP from a 2D SBP hint

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: cheng cheng <472491134@qq.com>
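
The idea behind reducing [2, 2]: (S0, S0) to [4]: S0, and hierarchy [1, 2] to a 1-D form, can be sketched as: drop hierarchy dims of size 1, then merge adjacent dims that carry the same SBP. This is a hypothetical helper written for illustration, not the ParallelDimReduce function in sbp_infer_util; it uses string SBP labels and ignores uneven splits.

```python
def reduce_parallel_dims(hierarchy, nd_sbp):
    """Return an equivalent (hierarchy, nd_sbp) with redundant dims removed.

    Hypothetical sketch of the reduction idea:
    - a dim of size 1 is dropped, since one device sees the whole tensor;
    - adjacent dims with identical SBP are merged by multiplying their sizes,
      e.g. [2, 2]: (S0, S0) -> [4]: S0.
    """
    if len(hierarchy) != len(nd_sbp):
        raise ValueError("hierarchy and nd_sbp must have the same length")
    reduced_h, reduced_sbp = [], []
    for size, sbp in zip(hierarchy, nd_sbp):
        if size == 1:
            continue
        if reduced_sbp and reduced_sbp[-1] == sbp:
            reduced_h[-1] *= size
        else:
            reduced_h.append(size)
            reduced_sbp.append(sbp)
    if not reduced_h:
        # Everything lives on a single device: use a uniform broadcast.
        return [1], ["B"]
    return reduced_h, reduced_sbp
```

Merging (S0, S0) works because, with row-major device order, device (i, j) holds chunk 2*i + j of axis 0, exactly what a 1-D split over 4 devices assigns.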

* Decouple stream and instruction (#7607)

* remove deprecated python api

* backup code

* backup code

* fix compiler complaints

* fix typo in refactoring

* kMockDevice

* add unit test test_mock.py

* revert mock kernels

* revert DEVICE_TYPE_SEQ

* mock placement

* address pr comments

* register device kCriticalSectionDevice and kLazyJobLauncher

* kControlDevice

* Stream::vm_stream_

* fix compiler complaints

* backup code

* rename StreamIsTransport to IsCommNetStream

* decouple vm::StreamType and vm::InstructionType

* fix compiler complaints

* remove 'gpu' related code

* address static analyzer complaints

* address static analyzer complaints

* remove unused module in test_mock.py

* the Env is never destroyed.

* export Env into python

* more unittests

* export unittest.TestCase in framework/unittest.py

* SwitchToShuttingDownPhase

* optional is_normal_exit

* VirtualMachine::CloseVMThreads

* Delete env_api.h

env_api.h was deleted in master

* reshape_only_one_dim_infered

* address pr comments

* rollback flow.env.all_device_placement

* no distributed running test_shutting_down.py

* auto format by CI

* expand lifetime of module oneflow in test_shutting_down.py

* refine del depend on of

* fix oneflow.placement.__str__

* revert GlobalSync

* init_producer_stream in oneflow.from_numpy

* debug code for vm

* init disable_vm_threads_ in VirtualMachine::VirtualMachine

* Update oneflow/core/vm/virtual_machine.h

Co-authored-by: daquexian <daquexian566@gmail.com>

* create stream in forked subprocesses.

* refactor StreamRoleSwitch to StreamRoleVisitor

* ThreadLocalGuard

* auto format by CI

* fix compiler complaints

* fix static analyzer complaints

* VirtualMachine::GetVmStream

* fix static analyzer complaints

* reimplement AddAndReadVector by std::deque

* reimplement AddAndReadVector

* merge master

* increase atol for test_consistent_rnn_cell.py

* StreamRole::AsyncLaunchedCommNet is bound to EventRecordedCudaStreamType

* auto format by CI

* remove StreamRoleVisitor<T>::VisitInvalid

* no copy in AddAndReadVector

* fix bug of AddAndReadVector::size_

* disable terminfo to fix missing terminfo symbols

Signed-off-by: daquexian <daquexian566@gmail.com>

* auto format by CI

* fix AddAndReadVector::GetGranularity

* remove bad unittest

* auto format by CI

* rename CallInstructionType to OpCallInstructionType

* static variable GlobalSingletonPtr is a unique_ptr

* replace ++atomic_cnt with atomic_cnt.fetch_add(1, std::memory_order_relaxed)

* AddAndReadVector::operator[]

* change comments 'lock free' to 'thread safe'

* rename StatefulLocalOpKernel to StatefulOpKernel

* rename VirtualMachine::vm_ to VirtualMachine::engine_

* mark VirtualMachine::NoMoreErasedInstructions private

* mark VirtualMachine::FindOrCreateScheduleLocalDepObject private

* remove unused version of VirtualMachineEngine::Receive

* rename argname for VirtualMachineEngine::Receive

* rename unused PendingInstructionList

* rename AddAndReadVector to SteadyVector

* optimize SteadyVector::operator[] by __builtin_clzll

* refactor SteadyVector::granularity2vector_ to SteadyVector::granularity2data_

* reduce usage of steady_vector::size_

* rename unused anonymous namespace

* greater atol for test_consistent_tensordot.py

* fix BarrierInstructionType::ComputeInFuseMode

* revert container_util.h

* run AccessBlobByCallback in default stream of tensor->device

* resolve static check

* resolve static check

* SteadyVector::MutableOrAdd

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: chengtbf <472491134@qq.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: Xiaoyu Xu <xiaoyulink@gmail.com>
Co-authored-by: daquexian <daquexian566@gmail.com>
Co-authored-by: binbinHan <han_binbin@163.com>
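
The SteadyVector bullets (granularity2data_, operator[] via __builtin_clzll) suggest an append-only container whose elements never move. A plausible layout, assumed here rather than taken from OneFlow's source, stores block g with 2**g slots so that element i sits in block floor(log2(i + 1)); in C++ that is 63 - __builtin_clzll(i + 1), and Python's int.bit_length plays the same role. The sketch is single-threaded and does not model the thread-safety the real class provides.

```python
class SteadyVector:
    """Append-only sequence whose stored elements never move (hypothetical layout).

    Block g has 2**g slots, so element i lives in block
    g = (i + 1).bit_length() - 1 at offset (i + 1) - 2**g.
    No locking is modeled; this is a single-threaded sketch.
    """

    def __init__(self):
        self._blocks = []  # block g is a list of 2**g slots
        self._size = 0

    @staticmethod
    def _locate(index):
        g = (index + 1).bit_length() - 1  # C++ analogue: 63 - __builtin_clzll(index + 1)
        return g, index + 1 - (1 << g)

    def push_back(self, value):
        g, offset = self._locate(self._size)
        if g == len(self._blocks):
            # Allocate a new, larger block; existing blocks are never reallocated.
            self._blocks.append([None] * (1 << g))
        self._blocks[g][offset] = value
        self._size += 1

    def __getitem__(self, index):
        if not 0 <= index < self._size:
            raise IndexError(index)
        g, offset = self._locate(index)
        return self._blocks[g][offset]

    def __len__(self):
        return self._size
```

Since growing only appends a new block, a reader holding an index (or, in C++, a reference) to an existing element is never invalidated by later pushes.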

* fix_tensor_numpy_to_avoid_gpu_mem_increase (#8449)

* fix_tensor_numpy_to_avoid_gpu_mem_increase

* Update tensor.py

* auto format by CI

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>

* Rename user op tensor shape to shape view (#8433)

* ThreadLocalGuard

* rename user_op::Tensor::shape to user_op::Tensor::shape_view

* auto format by CI

* fix static analyzer complaints

* more verbose code for HobDataType

* larger timeout

* larger timeout

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: jackalcooper <jackalcooper@gmail.com>
Co-authored-by: binbinHan <han_binbin@163.com>

* speedup global test (#8468)

* speedup global test

* Test refine slice ops test (#8471)

* refine consistent_slice test: 112s -> 30s on 4 devices

* test(SliceUpdate): refine test: 119s -> 28s on 4 devices

* delete useless code

* auto format by CI

Co-authored-by: Yinggang Wang <wyg19970408@gmail.com>
Co-authored-by: wyg1997 <wangyinggang@foxmail.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>

* Set the minimum mtu value for IB communication connection (#8451)

* Set the minimum mtu value for IB communication connection

* refine

* refine

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
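
"Set the minimum mtu value" most likely means each side of an InfiniBand connection uses the smaller of the two ports' active MTUs as the queue pair's path_mtu, so both ends can handle every packet. That interpretation is an assumption; the sketch below only models the choice in Python, using the real ibv_mtu enum values from libibverbs (IBV_MTU_256 = 1 through IBV_MTU_4096 = 5). The helper name is hypothetical.

```python
# Values of enum ibv_mtu in libibverbs' <infiniband/verbs.h>.
IBV_MTU_256, IBV_MTU_512, IBV_MTU_1024, IBV_MTU_2048, IBV_MTU_4096 = 1, 2, 3, 4, 5
MTU_BYTES = {
    IBV_MTU_256: 256,
    IBV_MTU_512: 512,
    IBV_MTU_1024: 1024,
    IBV_MTU_2048: 2048,
    IBV_MTU_4096: 4096,
}


def negotiate_path_mtu(local_active_mtu, peer_active_mtu):
    """Return the ibv_mtu value to use as path_mtu (hypothetical helper).

    Assumption: each side reports its port's active_mtu (as returned by
    ibv_query_port) and the connection uses the smaller one. The enum values
    grow with the MTU size, so min() picks the smaller MTU.
    """
    for mtu in (local_active_mtu, peer_active_mtu):
        if mtu not in MTU_BYTES:
            raise ValueError(f"unknown ibv_mtu value: {mtu}")
    return min(local_active_mtu, peer_active_mtu)
```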

* Merge branch 'master' into feat-general_basic_communication

Co-authored-by: Shenghang Tsai <jackalcooper@gmail.com>
Co-authored-by: daquexian <daquexian566@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: liufengwei0103 <2472937968@qq.com>
Co-authored-by: Wang Yi <53533850+marigoold@users.noreply.github.com>
Co-authored-by: ZZK <359521840@qq.com>
Co-authored-by: hjchen2 <chenhoujiangcug@gmail.com>
Co-authored-by: Juncheng <liujuncheng1022@gmail.com>
Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>
Co-authored-by: Luyang <flowingsun007@163.com>
Co-authored-by: binbinHan <han_binbin@163.com>
Co-authored-by: Yinggang Wang <wyg19970408@gmail.com>
Co-authored-by: Yao Zihang <1162526220@qq.com>
Co-authored-by: yuhao <72971170+howin98@users.noreply.github.com>
Co-authored-by: Xiaoyu Xu <xiaoyulink@gmail.com>
Co-authored-by: cheng cheng <472491134@qq.com>
Co-authored-by: Cijie Xia <cijie.xia@mail.utoronto.ca>
Co-authored-by: guo ran <360112263@qq.com>
Co-authored-by: Peihong Liu <mosout@qq.com>
Co-authored-by: Li Xinqi <lixinqi2010@gmail.com>
Co-authored-by: Shijie <821898965@qq.com>
Co-authored-by: lixinqi <lixinqi0703106@163.com>
Co-authored-by: leaves-zwx <kunta0932@gmail.com>
Co-authored-by: Li Xiang <54010254+lixiang007666@users.noreply.github.com>
Co-authored-by: Cijie Xia <xiacijie1998@163.com>
Co-authored-by: Jia <basicv8vc@gmail.com>
Co-authored-by: Shanshan Zhong <62104945+zhongshsh@users.noreply.github.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Co-authored-by: wyg1997 <wangyinggang@foxmail.com>
Co-authored-by: Yu OuYang <xuanjiuye@gmail.com>
1 parent 2a1810c commit f46efa1
Showing 537 changed files with 8,549 additions and 7,677 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/canary.yml
@@ -55,7 +55,7 @@ jobs:
- name: Checkout Oneflow-Inc/oneflow
if: ${{ github.event.inputs.oneflow-ref == '' }}
uses: actions/checkout@v2
- uses: Oneflow-Inc/get-oneflow@single-matrix-for-efficiency
- uses: Oneflow-Inc/get-oneflow@support-iree-ci
name: Build manylinux
id: build-cuda
with:
2 changes: 1 addition & 1 deletion .github/workflows/on_merge.yml
@@ -15,6 +15,6 @@ jobs:
if: github.event.pull_request.merged == true
runs-on: ubuntu-latest
steps:
- uses: Oneflow-Inc/get-oneflow/update-benchmark-history@single-matrix-for-efficiency
- uses: Oneflow-Inc/get-oneflow/update-benchmark-history@support-iree-ci
name: Update benchmark history
timeout-minutes: 10
8 changes: 4 additions & 4 deletions .github/workflows/release.yml
@@ -33,7 +33,7 @@ jobs:
with:
ref: ${{ github.event.pull_request.head.sha }}
repository: ${{github.event.pull_request.head.repo.full_name}}
- uses: Oneflow-Inc/get-oneflow/cache-complete/matrix/build@single-matrix-for-efficiency
- uses: Oneflow-Inc/get-oneflow/cache-complete/matrix/build@support-iree-ci
name: find cache
id: find-cache
timeout-minutes: 5
@@ -45,7 +45,7 @@ jobs:
release
oneflow-src: ${{ env.ONEFLOW_SRC }}
entries: |
cu115
cu116
cu112
cu102
cpu
@@ -74,7 +74,7 @@ jobs:
python3 -m pip install -U pip setuptools wheel --user
python3 -m pip install oss2 --user
- uses: actions/checkout@v2
- uses: Oneflow-Inc/get-oneflow@single-matrix-for-efficiency
- uses: Oneflow-Inc/get-oneflow@support-iree-ci
name: Build ${{ matrix.entry }}
if: ${{ matrix.entry !='cpu' }}
with:
@@ -98,7 +98,7 @@ jobs:
3.8
3.9
3.10
- uses: Oneflow-Inc/get-oneflow@single-matrix-for-efficiency
- uses: Oneflow-Inc/get-oneflow@support-iree-ci
name: Build ${{ matrix.entry }}
if: ${{ matrix.entry =='cpu' }}
with:
4 changes: 2 additions & 2 deletions .github/workflows/simple.yml
@@ -245,7 +245,7 @@ jobs:
repository: Oneflow-Inc/conda-env
ref: 30a7f00eb48ee9009d85a848e720823e5054c66b
path: conda-env
- uses: Oneflow-Inc/get-oneflow@single-matrix-for-efficiency
- uses: Oneflow-Inc/get-oneflow@support-iree-ci
name: Build with gcc7
if: ${{ matrix.build-type == 'gcc7'}}
with:
@@ -254,7 +254,7 @@ jobs:
oneflow-build-env: conda
conda-env-file: conda-env/dev/gcc7/environment-v2.yml
conda-env-name: oneflow-dev-gcc7-v2
- uses: Oneflow-Inc/get-oneflow@single-matrix-for-efficiency
- uses: Oneflow-Inc/get-oneflow@support-iree-ci
name: Build with clang10
if: ${{ matrix.build-type == 'clang10'}}
with:
64 changes: 45 additions & 19 deletions .github/workflows/test.yml
@@ -16,6 +16,8 @@ env:
FLOW_VISION_COMMIT: ca8ebc663b58667cf8cd1b6ef0c861522780b7bb
LIBAI_SRC: libai
LIBAI_COMMIT: 7d31d9781e5f2d559dc0820f599e0bed798488ca
ONEFLOW_IREE_SRC: oneflow_iree
ONEFLOW_IREE_COMMIT: 4322cbad2545877b1664aa8e0f17a17f6b5f687c
TEST_WITH_TORCH_IMG_TAG: registry.cn-beijing.aliyuncs.com/oneflow/test-with-pytorch-1.10.0-cuda11.3-cudnn8-runtime:afaf913e02a4ba02db92260daee22f99121cef62
MLIR_DOCKER_ARGS: "-e ONEFLOW_MLIR_ENABLE_ROUND_TRIP=1 -e ONEFLOW_MLIR_PREFER_NHWC=0 -e ONEFLOW_MLIR_ENABLE_INFERENCE_OPTIMIZATION=1"

@@ -25,7 +27,7 @@ jobs:
runs-on: ubuntu-latest
if: github.event.pull_request.draft == false && github.base_ref == 'master' && contains(github.event.pull_request.requested_reviewers.*.login, 'oneflow-ci-bot')
steps:
- uses: Oneflow-Inc/get-oneflow/priority-pr@single-matrix-for-efficiency
- uses: Oneflow-Inc/get-oneflow/priority-pr@support-iree-ci
name: Check priority PR closed
id: save-cache
timeout-minutes: 5
@@ -159,7 +161,7 @@ jobs:
fi
echo "is_secrets_accessible=1" >> $GITHUB_ENV
- name: Wait for GPU slot
uses: Oneflow-Inc/get-oneflow/wait-for-gpu@single-matrix-for-efficiency
uses: Oneflow-Inc/get-oneflow/wait-for-gpu@support-iree-ci
if: env.is_secrets_accessible == '1'
timeout-minutes: 90
continue-on-error: true
@@ -183,7 +185,7 @@ jobs:
with:
ref: ${{ github.event.pull_request.head.sha }}
repository: ${{github.event.pull_request.head.repo.full_name}}
- uses: Oneflow-Inc/get-oneflow/cache-complete/matrix/build@single-matrix-for-efficiency
- uses: Oneflow-Inc/get-oneflow/cache-complete/matrix/build@support-iree-ci
name: find cache
id: find-cache
timeout-minutes: 5
@@ -230,7 +232,7 @@ jobs:
with:
ref: ${{ github.event.pull_request.head.sha }}
repository: ${{github.event.pull_request.head.repo.full_name}}
- uses: Oneflow-Inc/get-oneflow/cache-complete@single-matrix-for-efficiency
- uses: Oneflow-Inc/get-oneflow/cache-complete@support-iree-ci
name: Save cache if successful
id: save-cache
timeout-minutes: 5
@@ -244,13 +246,14 @@ jobs:
run: |
echo "::error file=test.yml,line=204,col=10::steps.save-cache.outputs.cache-hit != matrix.cache-hit"
exit 1
- uses: Oneflow-Inc/get-oneflow@single-matrix-for-efficiency
- uses: Oneflow-Inc/get-oneflow@support-iree-ci
name: Build manylinux ${{ matrix.entry }}
id: build-cpu
if: ${{ matrix.entry =='cpu' && !matrix.cache-hit }}
with:
cmake-init-cache: ${{ env.ONEFLOW_SRC }}/cmake/caches/ci/cpu.cmake
build-script: ${{ env.ONEFLOW_SRC }}/ci/manylinux/build.sh
run-lit: true
oneflow-src: ${{ env.ONEFLOW_SRC }}
oneflow-build-env: manylinux
wheelhouse-dir: ${{ env.WHEELHOUSE_DIR }}
@@ -265,7 +268,7 @@ jobs:
python-versions: |
3.6
3.7
- uses: Oneflow-Inc/get-oneflow@single-matrix-for-efficiency
- uses: Oneflow-Inc/get-oneflow@support-iree-ci
name: Build manylinux ${{ matrix.entry }}
id: build-cuda
if: ${{ matrix.entry =='cu102' && !matrix.cache-hit }}
@@ -285,7 +288,7 @@ jobs:
clean-ccache: ${{ contains(github.event.pull_request.labels.*.name, 'need-clean-ccache') }}
python-versions: |
3.7
- uses: Oneflow-Inc/get-oneflow@single-matrix-for-efficiency
- uses: Oneflow-Inc/get-oneflow@support-iree-ci
name: Build ${{ matrix.entry }}
if: ${{ matrix.entry == 'llvm13' && !matrix.cache-hit }}
with:
@@ -324,7 +327,7 @@ jobs:
})
- name: Upload packed liboneflow
if: ${{ !fromJson(matrix.cache-hit) && matrix.entry != 'llvm13' && matrix.entry != 'cu102_xla' }}
uses: Oneflow-Inc/get-oneflow/digest/upload@single-matrix-for-efficiency
uses: Oneflow-Inc/get-oneflow/digest/upload@support-iree-ci
timeout-minutes: 10
with:
digest: ${{ steps.save-cache.outputs.build-digest }}
@@ -335,7 +338,7 @@ jobs:
dst-dir: cpack
- name: Upload whl
if: ${{ !fromJson(matrix.cache-hit) && matrix.entry != 'llvm13' && matrix.entry != 'cu102_xla' }}
uses: Oneflow-Inc/get-oneflow/digest/upload@single-matrix-for-efficiency
uses: Oneflow-Inc/get-oneflow/digest/upload@support-iree-ci
timeout-minutes: 10
with:
digest: ${{ steps.save-cache.outputs.build-digest }}
@@ -360,7 +363,7 @@ jobs:
with:
ref: ${{ github.event.pull_request.head.sha }}
repository: ${{github.event.pull_request.head.repo.full_name}}
- uses: Oneflow-Inc/get-oneflow/cache-complete/matrix/test@single-matrix-for-efficiency
- uses: Oneflow-Inc/get-oneflow/cache-complete/matrix/test@support-iree-ci
name: find cache
id: find-cache
timeout-minutes: 5
@@ -391,7 +394,7 @@ jobs:
with:
ref: ${{ github.event.pull_request.head.sha }}
repository: ${{github.event.pull_request.head.repo.full_name}}
- uses: Oneflow-Inc/get-oneflow/cache-complete/matrix/test@single-matrix-for-efficiency
- uses: Oneflow-Inc/get-oneflow/cache-complete/matrix/test@support-iree-ci
name: find cache
id: find-cache
timeout-minutes: 5
@@ -455,12 +458,20 @@ jobs:
# please use a commit here
ref: ${{ env.LIBAI_COMMIT}}
path: ${{ env.LIBAI_SRC}}
- name: Checkout Oneflow-Inc/oneflow_iree
if: ${{ !fromJson(matrix.cache-hit) && contains(matrix.runs-on, 'self-hosted') }}
uses: actions/checkout@v2
with:
repository: Oneflow-Inc/oneflow_iree
# please use a commit here
ref: ${{ env.ONEFLOW_IREE_COMMIT}}
path: ${{ env.ONEFLOW_IREE_SRC}}
- name: Remove container
timeout-minutes: 45
if: ${{ contains(matrix.runs-on, 'self-hosted') }}
run: |
docker rm -f ${{ env.TEST_CONTAINER_NAME }} || true
- uses: Oneflow-Inc/get-oneflow/cache-complete@single-matrix-for-efficiency
- uses: Oneflow-Inc/get-oneflow/cache-complete@support-iree-ci
name: Save cache if successful
id: save-cache
timeout-minutes: 5
@@ -476,7 +487,7 @@ jobs:
exit 1
- name: Download wheel and packed liboneflow
if: ${{ !fromJson(matrix.cache-hit) && contains(matrix.runs-on, 'self-hosted') }}
uses: Oneflow-Inc/get-oneflow/digest/download@single-matrix-for-efficiency
uses: Oneflow-Inc/get-oneflow/digest/download@support-iree-ci
id: download-digest
timeout-minutes: 10
with:
@@ -486,7 +497,7 @@ jobs:
ssh-tank-path: ${{ env.SSH_TANK_PATH }}
- name: Get primary node
if: ${{ !fromJson(matrix.cache-hit) && contains(matrix.runs-on, 'self-hosted') }}
uses: Oneflow-Inc/get-oneflow/master-address@single-matrix-for-efficiency
uses: Oneflow-Inc/get-oneflow/master-address@support-iree-ci
id: get-primary-node
with:
rank: ${{ matrix.rank }}
@@ -559,6 +570,7 @@ jobs:
docker exec ${TEST_CONTAINER_NAME} python3 -m pip install -e ${{ env.FLOW_VISION_SRC}}
docker exec ${TEST_CONTAINER_NAME} python3 -m pip install pybind11 --user
docker exec ${TEST_CONTAINER_NAME} python3 -m pip install -e ${{ env.LIBAI_SRC}}
docker exec ${TEST_CONTAINER_NAME} python3 -m pip install -e ${{ env.ONEFLOW_IREE_SRC}}
- name: Module API test (distributed)
timeout-minutes: 90
if: ${{ !fromJson(matrix.cache-hit) && matrix.test-type == 'module' && matrix.device == 'cuda' && fromJson(matrix.is-distributed) }}
@@ -648,12 +660,20 @@ jobs:
# please use a commit here
ref: ${{ env.LIBAI_COMMIT}}
path: ${{ env.LIBAI_SRC}}
- name: Checkout Oneflow-Inc/oneflow_iree
if: ${{ !fromJson(matrix.cache-hit) && contains(matrix.runs-on, 'self-hosted') }}
uses: actions/checkout@v2
with:
repository: Oneflow-Inc/oneflow_iree
# please use a commit here
ref: ${{ env.ONEFLOW_IREE_COMMIT}}
path: ${{ env.ONEFLOW_IREE_SRC}}
- name: Remove container
timeout-minutes: 45
if: ${{ contains(matrix.runs-on, 'self-hosted') }}
run: |
docker rm -f ${{ env.TEST_CONTAINER_NAME }} || true
- uses: Oneflow-Inc/get-oneflow/cache-complete@single-matrix-for-efficiency
- uses: Oneflow-Inc/get-oneflow/cache-complete@support-iree-ci
name: Save cache if successful
id: save-cache
timeout-minutes: 5
@@ -669,7 +689,7 @@ jobs:
exit 1
- name: Download wheel and packed liboneflow
if: ${{ !fromJson(matrix.cache-hit) && contains(matrix.runs-on, 'self-hosted') }}
uses: Oneflow-Inc/get-oneflow/digest/download@single-matrix-for-efficiency
uses: Oneflow-Inc/get-oneflow/digest/download@support-iree-ci
id: download-digest
timeout-minutes: 10
with:
@@ -781,6 +801,7 @@ jobs:
docker exec ${TEST_CONTAINER_NAME} python3 -m pip install -e ${{ env.FLOW_VISION_SRC}}
docker exec ${TEST_CONTAINER_NAME} python3 -m pip install pybind11 --user
docker exec ${TEST_CONTAINER_NAME} python3 -m pip install -e ${{ env.LIBAI_SRC}}
docker exec ${TEST_CONTAINER_NAME} python3 -m pip install -e ${{ env.ONEFLOW_IREE_SRC}}
- name: Run OneFlow doctor
if: ${{ !fromJson(matrix.cache-hit) && contains(matrix.runs-on, 'self-hosted') }}
run: |
@@ -865,7 +886,7 @@ jobs:
body: "<details>\n <summary>Speed stats:</summary>\n\n ``` \n${{ steps.speed.outputs.stats }}\n ``` \n\n</details>".replace(/\\n/g, '\n')
})
- name: Module API test
timeout-minutes: 45
timeout-minutes: 60
if: ${{ !fromJson(matrix.cache-hit) && matrix.test-type == 'module' && !fromJson(matrix.is-distributed) }}
run: |
docker exec -e ONEFLOW_TEST_DIR=$PWD/python/oneflow/test/modules ${{ env.TEST_CONTAINER_NAME }} bash ci/test/generic_test_multi_client.sh
@@ -883,6 +904,11 @@ jobs:
docker exec -e ONEFLOW_TEST_DEVICE_NUM=4 -w $PWD/${{ env.LIBAI_SRC }} ${{ env.TEST_CONTAINER_NAME }} python3 -m oneflow.distributed.launch --nproc_per_node 4 -m unittest -f tests/models/test_gpt.py
docker exec -e ONEFLOW_TEST_DEVICE_NUM=4 -w $PWD/${{ env.LIBAI_SRC }} ${{ env.TEST_CONTAINER_NAME }} python3 -m oneflow.distributed.launch --nproc_per_node 4 -m unittest -f tests/models/test_t5.py
docker exec -e ONEFLOW_TEST_DEVICE_NUM=4 -w $PWD/${{ env.LIBAI_SRC }} ${{ env.TEST_CONTAINER_NAME }} python3 -m oneflow.distributed.launch --nproc_per_node 4 -m unittest -f tests/models/test_vit.py
- name: oneflow_iree test
timeout-minutes: 45
if: ${{ !fromJson(matrix.cache-hit) && matrix.test-type == 'misc' }}
run: |
docker exec -w $PWD/${{ env.ONEFLOW_IREE_SRC }} ${{ env.TEST_CONTAINER_NAME }} python3 -m pytest examples
- name: Expensive tests (models, cases require exclusive access to GPU)
timeout-minutes: 45
if: ${{ !fromJson(matrix.cache-hit) && (matrix.test-type == 'speed-test' || (matrix.test-type == 'misc' && matrix.device == 'cpu')) && !fromJson(matrix.is-distributed) }}
@@ -908,7 +934,7 @@ jobs:
- name: Benchmark Test
timeout-minutes: 100
if: ${{ !fromJson(matrix.cache-hit) && matrix.test-type == 'benchmark' && matrix.device == 'cuda' }}
uses: Oneflow-Inc/get-oneflow/pytest-benchmark@single-matrix-for-efficiency
uses: Oneflow-Inc/get-oneflow/pytest-benchmark@support-iree-ci
with:
collect-path: ${{ env.FLOW_VISION_SRC }}/benchmark
container-name: ${{ env.TEST_CONTAINER_NAME }}
@@ -961,7 +987,7 @@ jobs:
ref: ${{ github.event.pull_request.head.sha }}
repository: ${{github.event.pull_request.head.repo.full_name}}
fetch-depth: 0
- uses: Oneflow-Inc/get-oneflow/cache-complete@single-matrix-for-efficiency
- uses: Oneflow-Inc/get-oneflow/cache-complete@support-iree-ci
name: Save cache if successful
id: save-cache
timeout-minutes: 5
5 changes: 5 additions & 0 deletions ci/manylinux/build-gcc7.sh
@@ -31,6 +31,11 @@ cmake -S ${ONEFLOW_CI_SRC_DIR} -C ${ONEFLOW_CI_CMAKE_INIT_CACHE} -DPython3_EXECU
# cmake build
cd ${ONEFLOW_CI_BUILD_DIR}
cmake --build . --parallel ${ONEFLOW_CI_BUILD_PARALLEL}
if [ ! -z "$ONEFLOW_CI_BUILD_RUN_LIT" ]; then
${ONEFLOW_CI_PYTHON_EXE} -m pip install -i https://mirrors.aliyun.com/pypi/simple --user flowvision==0.1.0
export PATH=$PATH:$(dirname $ONEFLOW_CI_PYTHON_EXE)
cmake --build . -t c1
fi

# build pip
cd ${ONEFLOW_CI_SRC_DIR}
5 changes: 5 additions & 0 deletions ci/manylinux/build.sh
@@ -27,6 +27,11 @@ cmake -S ${ONEFLOW_CI_SRC_DIR} -C ${ONEFLOW_CI_CMAKE_INIT_CACHE} -DPython3_EXECU
# cmake build
cd ${ONEFLOW_CI_BUILD_DIR}
cmake --build . --parallel ${ONEFLOW_CI_BUILD_PARALLEL}
if [ ! -z "$ONEFLOW_CI_BUILD_RUN_LIT" ]; then
${ONEFLOW_CI_PYTHON_EXE} -m pip install -i https://mirrors.aliyun.com/pypi/simple --user flowvision==0.1.0
export PATH=$PATH:$(dirname $ONEFLOW_CI_PYTHON_EXE)
cmake --build . -t c1
fi

# build pip
cd ${ONEFLOW_CI_SRC_DIR}
24 changes: 24 additions & 0 deletions cmake/caches/cn/fast/mlir-cpu.cmake
@@ -0,0 +1,24 @@
set(BUILD_SHARED_LIBS YES CACHE BOOL "")
# uncomment only if you know what you are doing
# set(CMAKE_LINK_DEPENDS_NO_SHARED YES CACHE BOOL "")
set(BUILD_CUDA NO CACHE BOOL "")
set(BUILD_GIT_VERSION NO CACHE BOOL "")
set(TREAT_WARNINGS_AS_ERRORS YES CACHE BOOL "")
set(BUILD_HWLOC NO CACHE BOOL "")
set(BUILD_TESTING OFF CACHE BOOL "")
set(WITH_MLIR YES CACHE BOOL "")
set(WITH_MLIR_CUDA_CODEGEN NO CACHE BOOL "")
set(THIRD_PARTY_MIRROR aliyun CACHE STRING "")
set(PIP_INDEX_MIRROR "https://pypi.tuna.tsinghua.edu.cn/simple" CACHE STRING "")
set(CMAKE_BUILD_TYPE RelWithDebInfo CACHE STRING "")
set(CMAKE_GENERATOR Ninja CACHE STRING "")
set(CMAKE_C_COMPILER_LAUNCHER ccache CACHE STRING "")
set(CMAKE_CXX_COMPILER_LAUNCHER ccache CACHE STRING "")
set(CMAKE_INTERPROCEDURAL_OPTIMIZATION OFF CACHE BOOL "")
set(CMAKE_EXE_LINKER_FLAGS_INIT "-fuse-ld=lld" CACHE STRING "")
set(CMAKE_MODULE_LINKER_FLAGS_INIT "-fuse-ld=lld" CACHE STRING "")
set(CMAKE_SHARED_LINKER_FLAGS_INIT "-fuse-ld=lld" CACHE STRING "")
set(CPU_THREADING_RUNTIME SEQ CACHE STRING
"when using lld with TBB enabled, there will be linkage error")
set(BUILD_HWLOC OFF CACHE BOOL "")
set(WITH_ONEDNN OFF CACHE BOOL "")