add api to apply external job pass #8370

hjchen2 · 2022-06-06T08:52:31Z

No description provided.

github-actions · 2022-06-06T11:32:33Z

Speed stats:

github-actions · 2022-06-07T10:23:12Z

CI failed when running job: cpu-misc. PR label automerge has been removed

github-actions · 2022-06-07T10:23:50Z

Speed stats:

daquexian · 2022-06-07T11:20:07Z

oneflow/api/cpp/framework/graph.cpp

+  std::string new_serialized_job = pass_fn(graph_->job().SerializeAsString());
+  of::Job new_job;
+  if (!new_job.ParseFromString(new_serialized_job)) {
+    LOG(FATAL) << "invalid serialized job after xrt pass applied";


这里可以返回 Error

…low-Inc/oneflow into dev_apply_external_job_pass

hjchen2 · 2022-06-08T03:10:26Z

对外接口从ApplyJobPass改成了RegisterJobPass，因为batch size的原因，我们不能直接把BuildGraph提前到GraphImpl的构造函数中，所以就改成了先注册一个job pass，在compile的时候BuildGraph再执行这些pass

github-actions · 2022-06-08T05:12:08Z

View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/8370/

* nd nccl_send_recv_boxing * rm print * support num_axes > 2 * Add distributed optional run (#8372) * Add * change deps * add install * add skip * autoprof supports bandwidth (#8367) * autoprof supports bandwidth Signed-off-by: daquexian <daquexian566@gmail.com> * print bandwidth Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * remove tmp buffer of cumprod cpu backward kernel (#8369) * remove tmp buffer of cumprod cpu backward kernel * refine * refine Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Move tensor api to cpython part3 (#8342) * add tensor_functions * concat py methods * add hash, restore tensor.py * check replacement * refine code, remove commented tensor.py * refine code * move some api * add cpu and cuda api * add triu tril norm and etc. * remove tensor_functions.h * move more api * move more api, refine size * fix typo * format code, remove useless include * refine code * refine code, fix typo * align .cuda to python * refine code * split some api to part3 for review * remove positional only arguments of argmax and argmin * remove arguments parse * modify arguments name in matmul and floor_divide * rename BINARY_FUNC to DIRECT_PASS_FUNC, modify some functions * refine code, format code * add inplace /=, add comments * remove name in macros * remove python api * remove redundant include * remove cout * format code * refactor tensor.size by directly call shape.at, refactor tensor.sub_ by calling nb_sub_ * remove redundant code * auto format by CI * fix typo, fix wrong call * modify idx datatype from int32 to int64 in tensor.size * add some DIRECT_PASS_FUNC * add cpu cuda var pow and etc. * add masked_fill any all * make REDUCE_FUNC macro, add reduce_* functions * add 0dim check in ReduceSumWhole, refine yaml * fix bug * restore add add_ sub sub_ * add unittest for tensor.half tensor.add tensor.add_ * refine code * refine code * fix typo * fix bug of tensor.std() * refactor var std and cuda, using c++ functional api * add beta and threshold in softplus * auto format by CI Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Add nn_functor Check (#7910) * add bias_add_check * add bias_add error test * fix conv2d nhwc bias_add error * add nhwc conv test * add bias_add_error test * Add bias add error check * Rename * add batch matmul error check * add matmul check error msg * remove annotation * add fused mlp error msg check * Add pixel shuffle check test * add more test until normalization add relu functor * refine error message * finish all nnfunctor check msg * handle type error * remove useless symbol * modify back to TypeError * fix all comment * Remove redundant code * Remove pad ndim check * fix bias add space * fix check logic cause ci gpu not always gpu:0 Co-authored-by: hjchen2 <chenhoujiangcug@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Add FusedMatmulBiasAddReluDropout [OneEmbedding] (#8222) * previous version for fused_matmul_bias_add_relu_dropout * add op infer * fix detail * finish forward * support dropout rate list * add forward test * fix bug for output buffer * Configurable alpha params * try to add bit mask logic * Add bitmask first version! * Add row col bitmask logic * support not align4 reludropout * simplify relu dropout ld logic * Add naive relu dropout grad kernel * add simple relu dropout grad kernel * Rename * support relu_dropout bitmask backward * add vectorized optimization * fix tmp buffer * add to amp list * add lazy backward logic * Refine kernel * add indextype dispatch * simplify functor logic * fix cublas fused mlp aux_ld shape bug * Add more relu dropout kernel * add full unittest * fix bug in skip final activation * refine * Remove dump func * fix format * Remove cmake * remove redundant divide * add padded version * fix dropout * oneflow curand * refine * remove redundant kernel * add unroll logic * add unroll and ballot sync * refine format * Remove fast curand * Refine python interface * Add if branch for memset * fix python logic * just for debug * not use matmul bias add grad * add launch 1 block limit * fix unittest * Refine * fix graph backward bug * limit to 11060 * change to use int32_t dtype for cublas aux * Fix jc comment * fix comment * fix convert * fix static_analysis * fix at * fix userops td * fix userops td * fix const ref * fix compile error for bfloat16 * limit to 11060 * fix bug Co-authored-by: Juncheng <liujuncheng1022@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * fix gather 0-dim tensor bug (#8376) * fix 0-dim tensor bug * refine * support input 0-dim tensor for gather * refine * refine * refine dim_scatter_kernel check * refine * refine check * fix clang_tidy error Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * add api to apply external job pass (#8370) * Add condition to find-test-cache-distributed (#8387) * add condition to find-test-cache-distributed * fix * warp dim util (#8382) * warp dim util * format * use more maybe_wrap_dim * refine array functor * add more * refine math_functor * fix_bug_in_broadcast_min_max_grad_and_broadcast_like (#8379) * fix_bug_in_broadcast_min_max_grad_and_broadcast_like * refine * fix static check error * fix bug about index (#8388) * fix bug about index * add test case Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * LogicalSliceAssign support full slice sbp (#8344) * feat(SliceOp): slice ops support 2d sbp * fix(SliceOp): fix [B, P] 2d sbp bug * refine error message * fix bug in parallel_num == 1 * add comment * add warning and format * add NOLINT for boxing check * feat(LogicalSliceOps): support all nd_sbp * feat(LogicalSlice): support nd_sbp * add error message * fix(AutoTest): fix auto_test bug in module.parameter pass * auto format by CI * fix(LogicalSliceAssign): skip test when 1n1d * fix SliceParams memset error * remove memset * add CHECK_JUST * fix(*): make sure split_axis >= 0 or equal to SPLIT_AXIS_FOR_NON_SPLIT * remove memset * fix spilit_info.axis bug * feat(LogicalSliceOps): support grad * add logical_slice gradient_funcs * feat(LogicalSliceAssign): LogicalSliceAssign support full slice sbp * auto format by CI * test(LogicalSlice): fix logical_slice dims Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: Houjiang Chen <chenhoujiangcug@gmail.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * fix_tensor_from_numpy_mem_leak_bug (#8391) * fix_tensor_from_numpy_mem_leak_bug * add note * refine note * refine Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Make of_pyext_obj static only to make sure only a python ext so has python symbols (#8393) * make of_pyext_obj static only * refine note Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Adjust tolerance setting in embedding_renorm unit test (#8394) * support front end compile for job to iree (#8249) * support frontend dev version * polish name * add tosa-to-elf.mlir * tosa to elf by llvm * conv2d partial * an enhanced frontend runner * support numpy as input * enable multiple using nn graph with different input(jobname make it it cd /home/yuhao/frontend/oneflow ; /usr/bin/env /usr/bin/python3 /home/yuhao/.vscode-server/extensions/ms-python.python-2022.6.2/pythonFiles/lib/python/debugpy/launcher 40873 -- /home/yuhao/frontend/oneflow/oneflow/ir/test/Frontend/runner.py ) * enable multiple input * enable cpu and cuda * change full_name to _full_name * support exchange cuda with cpu seamlessly * remove pip * lit config * polish * trim * auto format by CI * modify * auto format by CI * last line polish * use unittest * auto format by CI * use allclose * auto format by CI * pulish * optimize convert oneflow to tosa * conv2d * conv2d enhanced && conv2d examples add * add road map * add add_n2Op and boardcast_addOp conversion * add matmulOp conversion * support converting normailzation op to tosa(partically) * update roadmap * support i64 tensor to dense elem attr * support 100% resnet op conversion * add test mlir * add test iree resnet python script * auto format by CI * done * enhance iree resnet test script * auto format by CI * rebuild code * auto format by CI * rebuild test script * update * auto format by CI * pub * trim test scripts * move * move * input and output add block arg judgement * emit error in variable conversion * error handle for ci * modify err info * auto format by CI * merge * auto format by CI * output not block * flow ones * rm const * trim maybe * trim maybe with header file * const auto * solve clangd error Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Feat/zero mix with mp (#8036) * add zero limit * add debug * add mix zero test * refactor zero api * zero test with mp * add 2d test * add zero nd * add nd zero * add sbp cast * test passed soft limit consumer * refine size api * zero use stage 2 * add limit consumer api * add new api * refine zero s select * fix index out of range * rm zero limit on device type * zero test with activation checkpointing * add indentity when dp sequence len is 1 * move to base with master * fix * fix * fix * add test * debug bad case * refine test for eager and graph boxing * test case ready * simplify * refine test * fix buff size * fix conflict * refine zero nd * refine * add full test * revert change * refine split check * fix typo * rm log * spit long func * restore test * Update optimizer_placement_optimization_pass.cpp * auto format by CI * auto format by CI * fix static check * add tips for zero api change * auto format by CI Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Revert embedding normal path and fix amp list (#8374) * revert embedding normal path, fix amp list * fix amp * fix memset bug in gather cpu kernel Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * replace fixed_vector with small_vector and make Shape inherit from it (#8365) * Replace fixed_vector with llvm::SmallVector Signed-off-by: daquexian <daquexian566@gmail.com> * Shape inherited from llvm::SmallVector Signed-off-by: daquexian <daquexian566@gmail.com> * refine cmake Signed-off-by: daquexian <daquexian566@gmail.com> * rename fixed_vector to small_vector Signed-off-by: daquexian <daquexian566@gmail.com> * fix reviews Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI * update Shape constructor Signed-off-by: daquexian <daquexian566@gmail.com> * add 'PUBLIC' keyword to all target_link_libraries Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI * update cmake Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI * update cmake Signed-off-by: daquexian <daquexian566@gmail.com> * update cmake Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI * set is_initialized_ default to true Signed-off-by: daquexian <daquexian566@gmail.com> * override some methods to set is_initialized_ Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * Light plan for debug (#8396) * Light plan for debug * fix note * disable terminfo to fix missing terminfo symbols (#8400) * disable terminfo to fix missing terminfo symbols Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * fix bug of ZeRO MP in complex case (#8404) * Remove redundant output_lbns in ir (#8409) * mv case * remove redundant info * Dev FusedCrossInteraction[OneEmbedding] (#8335) * add simple fused cross interaction forward * add packed fused * Add cross interaction grad * simplify code * fix bug * support crossnet v2 * support cross interaction v2 * add lazy backward * Rename and add test * fix jc comment * fix comment * fix bug * fix userops td elem_cnt for FUSED Group * fix header file * fix clang static analysis * fix unittest Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * add exe graph physical shape check msg (#8002) * fix index select op in graph * add exe graph physical shape check msg * improve the debug information for the python stack trace 1. add a parameter 'max_stack_depth' to specify the max depth for the stack trace 2. refactor other debug related classes. * remove parens * update * resolve PR comments * update * update graph debug test file. * restore self._debug in class Graph and class ModuleBlock * Do not shorten the stack frame string if it is in debug mode * delete TODOs * disable conv3d test (#7969) Signed-off-by: daquexian <daquexian566@gmail.com> * skip layernorm random_data_warp test (#7941) * skip layernorm random_data_warp test * warp/block/uncached case only test gpu Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Lock click version (#7967) Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * add global avgpool unittest (#7585) * fix (#7978) Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Support negative dim in scatter op (#7934) * support negative dim in scatter op * refine scatter test * refine scatter test again Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * run barrier callback in BarrierPhyInstrOperand::~BarrierPhyInstrOperand (#7702) * run barrier callback in BarrierPhyInstrOperand::~BarrierPhyInstrOperand * lock gil in vm Callback thread * more comments for VirtualMachineEngine::Callback() * the Env is never destroyed. * export Env into python * more unittests * wait shared_ptr.use_count() == 0 * export unittest.TestCase in framework/unittest.py * SwitchToShuttingDownPhase * optional is_normal_exit * VirtualMachine::CloseVMThreads * Delete env_api.h env_api.h is deleted by master * reshape_only_one_dim_infered * address pr comments * fix a ref-cnt bug in TryRunBarrierInstruction. * rollback flow.env.all_device_placement * no distributed running test_shutting_down.py * auto format by CI * expand lifetime of module oneflow in test_shutting_down.py * refine del depend on of * capture oneflow._oneflow_internal.eager when calling sync in __del__ * add try in flaky test Co-authored-by: Luyang <flowingsun007@163.com> Co-authored-by: chengtbf <472491134@qq.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: Xiaoyu Xu <xiaoyulink@gmail.com> * Fix one hot scalar tensor bug (#7975) * fix reduce_sum scalar check bug * fix one_hot scalar tensor bug * fix clang tidy error Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * support ctor np array from of tensor (#7970) * support ctor np array from of tensor * add test case constructing np array from tensor * refine Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * add_manual_seed_all_api (#7957) * add_manual_seed_all_api * Update conf.py * refine * add test case * auto format by CI * Update random_generator.cpp * auto format by CI Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * one_embedding add doc string (#7902) * add doc string * add example * add * fix doc * refine * address review * mb to MB * add make_table_option * option to options * refine * add forward Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Support numpy scalar parameters (#7935) * feat(functional): support numpy scalar parameters * rename inferface * feat(*): TensorIndex support numpy scalar * feat(TensorIndex): support advance indexing * add unittest and int32 support for branch feat-param_support_np_scalar (#7939) * add unittest * refactor unittest * add todo for int16 advanced indexing * add int32 supporting for advance indexing * auto format by CI Co-authored-by: Wang Yi <53533850+marigoold@users.noreply.github.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * fix tensor_scatter_nd_update (#7953) * fix tensor_scatter_nd_update * auto backward Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * fix one_embedding adam (#7974) * fix one_embedding adam * fix tidy * fix normal Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * speed test with score (#7990) Signed-off-by: daquexian <daquexian566@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Feat/graph del by ref (#7857) * remove IsMultiClient() and single client logic Signed-off-by: daquexian <daquexian566@gmail.com> * rename eager.multi_client to eager Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI * add py ref * refine new session * clean code * make scope api inner use * use session with ref cnt * run barrier callback in BarrierPhyInstrOperand::~BarrierPhyInstrOperand * test pass * lock gil in vm Callback thread * more comments for VirtualMachineEngine::Callback() * merge * merge rm single client * rm initenv * merge and fix master * refactor env c api * add debug code * fix and serving test pass * test passed * rm useless * rm useless code * format * rm useless include * rm sync in py * the Env is never destroyed. * export Env into python * more unittests * fix and pass tests * revert virtual_machine.cpp * revert core/vm * remove outdated python class oneflow.unittest.TestCase * graph test passed * wait shared_ptr.use_count() == 0 * export unittest.TestCase in framework/unittest.py * SwitchToShuttingDownPhase * optional is_normal_exit * VirtualMachine::CloseVMThreads * Delete env_api.h env_api.h is deleted by master * address pr comments * rm is env init * Clear empty thread when graph destroy (#7633) * Revert "Clear empty thread when graph destroy (#7633)" (#7860) This reverts commit 3e8585e. * fix a ref-cnt bug in TryRunBarrierInstruction. * rm env_api * fix clang-tidy error * fix clang-tidy in env_imp * refine env api * format * refine graph del and sync at shuttingdown * fix typo * add comment * rm useless * rm useless Co-authored-by: daquexian <daquexian566@gmail.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: lixinqi <lixinqi0703106@163.com> Co-authored-by: Li Xinqi <lixinqi2010@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: Luyang <flowingsun007@163.com> Co-authored-by: cheng cheng <472491134@qq.com> * [PersistentTable] Fix num blocks (#7986) Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Add auto benchmark for flowvision (#7806) * update yml * update workflow * add resnet50 * [PersistentTable] Async write (#7946) * [PersistentTable] Async write * fix Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * save log in separate dir by default (#7825) Signed-off-by: daquexian <daquexian566@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * fix index select op in graph * add exe graph physical shape check msg * improve the debug information for the python stack trace 1. add a parameter 'max_stack_depth' to specify the max depth for the stack trace 2. refactor other debug related classes. * remove parens * update * resolve PR comments * update * update graph debug test file. * restore self._debug in class Graph and class ModuleBlock * Do not shorten the stack frame string if it is in debug mode * delete TODOs * Revert "Merge branch 'master' into fea/graph_check_msg" This reverts commit 28833b7, reversing changes made to baadf60. * Revert "Revert "Merge branch 'master' into fea/graph_check_msg"" This reverts commit 1d5e196. * update * resolve conflicts * resolve conflicts Co-authored-by: Cijie Xia <cijie.xia@mail.utoronto.ca> Co-authored-by: daquexian <daquexian566@gmail.com> Co-authored-by: guo ran <360112263@qq.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: Shenghang Tsai <jackalcooper@gmail.com> Co-authored-by: Houjiang Chen <chenhoujiangcug@gmail.com> Co-authored-by: Peihong Liu <mosout@qq.com> Co-authored-by: Li Xinqi <lixinqi2010@gmail.com> Co-authored-by: Luyang <flowingsun007@163.com> Co-authored-by: chengtbf <472491134@qq.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com> Co-authored-by: liufengwei0103 <2472937968@qq.com> Co-authored-by: binbinHan <han_binbin@163.com> Co-authored-by: Yinggang Wang <wyg19970408@gmail.com> Co-authored-by: Wang Yi <53533850+marigoold@users.noreply.github.com> Co-authored-by: Shijie <821898965@qq.com> Co-authored-by: lixinqi <lixinqi0703106@163.com> Co-authored-by: Juncheng <liujuncheng1022@gmail.com> * add batch_matmul sbp (#8385) Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * suppress gcc11 false positive warning (#8401) Signed-off-by: daquexian <daquexian566@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * fix variable op conversion to tosa error in ninja c1 (#8412) * pub * move test iree resnet python script to oneflow_iree repo * add bracket * rename const_val to const_val_ and restore resnet.py test script Co-authored-by: Shenghang Tsai <jackalcooper@gmail.com> * nccl send/recv support different placement * refine * auto format by CI * rm out ctrl * auto format by CI Co-authored-by: guo-ran <360112263@qq.com> Co-authored-by: Shenghang Tsai <jackalcooper@gmail.com> Co-authored-by: daquexian <daquexian566@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: liufengwei0103 <2472937968@qq.com> Co-authored-by: Wang Yi <53533850+marigoold@users.noreply.github.com> Co-authored-by: ZZK <359521840@qq.com> Co-authored-by: hjchen2 <chenhoujiangcug@gmail.com> Co-authored-by: Juncheng <liujuncheng1022@gmail.com> Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com> Co-authored-by: Luyang <flowingsun007@163.com> Co-authored-by: binbinHan <han_binbin@163.com> Co-authored-by: Yinggang Wang <wyg19970408@gmail.com> Co-authored-by: Yao Zihang <1162526220@qq.com> Co-authored-by: yuhao <72971170+howin98@users.noreply.github.com> Co-authored-by: Xiaoyu Xu <xiaoyulink@gmail.com> Co-authored-by: cheng cheng <472491134@qq.com> Co-authored-by: Cijie Xia <cijie.xia@mail.utoronto.ca> Co-authored-by: Peihong Liu <mosout@qq.com> Co-authored-by: Li Xinqi <lixinqi2010@gmail.com> Co-authored-by: Shijie <821898965@qq.com> Co-authored-by: lixinqi <lixinqi0703106@163.com>

* Add distributed optional run (#8372) * Add * change deps * add install * add skip * autoprof supports bandwidth (#8367) * autoprof supports bandwidth Signed-off-by: daquexian <daquexian566@gmail.com> * print bandwidth Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * remove tmp buffer of cumprod cpu backward kernel (#8369) * remove tmp buffer of cumprod cpu backward kernel * refine * refine Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Move tensor api to cpython part3 (#8342) * add tensor_functions * concat py methods * add hash, restore tensor.py * check replacement * refine code, remove commented tensor.py * refine code * move some api * add cpu and cuda api * add triu tril norm and etc. * remove tensor_functions.h * move more api * move more api, refine size * fix typo * format code, remove useless include * refine code * refine code, fix typo * align .cuda to python * refine code * split some api to part3 for review * remove positional only arguments of argmax and argmin * remove arguments parse * modify arguments name in matmul and floor_divide * rename BINARY_FUNC to DIRECT_PASS_FUNC, modify some functions * refine code, format code * add inplace /=, add comments * remove name in macros * remove python api * remove redundant include * remove cout * format code * refactor tensor.size by directly call shape.at, refactor tensor.sub_ by calling nb_sub_ * remove redundant code * auto format by CI * fix typo, fix wrong call * modify idx datatype from int32 to int64 in tensor.size * add some DIRECT_PASS_FUNC * add cpu cuda var pow and etc. * add masked_fill any all * make REDUCE_FUNC macro, add reduce_* functions * add 0dim check in ReduceSumWhole, refine yaml * fix bug * restore add add_ sub sub_ * add unittest for tensor.half tensor.add tensor.add_ * refine code * refine code * fix typo * fix bug of tensor.std() * refactor var std and cuda, using c++ functional api * add beta and threshold in softplus * auto format by CI Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Add nn_functor Check (#7910) * add bias_add_check * add bias_add error test * fix conv2d nhwc bias_add error * add nhwc conv test * add bias_add_error test * Add bias add error check * Rename * add batch matmul error check * add matmul check error msg * remove annotation * add fused mlp error msg check * Add pixel shuffle check test * add more test until normalization add relu functor * refine error message * finish all nnfunctor check msg * handle type error * remove useless symbol * modify back to TypeError * fix all comment * Remove redundant code * Remove pad ndim check * fix bias add space * fix check logic cause ci gpu not always gpu:0 Co-authored-by: hjchen2 <chenhoujiangcug@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Add FusedMatmulBiasAddReluDropout [OneEmbedding] (#8222) * previous version for fused_matmul_bias_add_relu_dropout * add op infer * fix detail * finish forward * support dropout rate list * add forward test * fix bug for output buffer * Configurable alpha params * try to add bit mask logic * Add bitmask first version! * Add row col bitmask logic * support not align4 reludropout * simplify relu dropout ld logic * Add naive relu dropout grad kernel * add simple relu dropout grad kernel * Rename * support relu_dropout bitmask backward * add vectorized optimization * fix tmp buffer * add to amp list * add lazy backward logic * Refine kernel * add indextype dispatch * simplify functor logic * fix cublas fused mlp aux_ld shape bug * Add more relu dropout kernel * add full unittest * fix bug in skip final activation * refine * Remove dump func * fix format * Remove cmake * remove redundant divide * add padded version * fix dropout * oneflow curand * refine * remove redundant kernel * add unroll logic * add unroll and ballot sync * refine format * Remove fast curand * Refine python interface * Add if branch for memset * fix python logic * just for debug * not use matmul bias add grad * add launch 1 block limit * fix unittest * Refine * fix graph backward bug * limit to 11060 * change to use int32_t dtype for cublas aux * Fix jc comment * fix comment * fix convert * fix static_analysis * fix at * fix userops td * fix userops td * fix const ref * fix compile error for bfloat16 * limit to 11060 * fix bug Co-authored-by: Juncheng <liujuncheng1022@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * fix gather 0-dim tensor bug (#8376) * fix 0-dim tensor bug * refine * support input 0-dim tensor for gather * refine * refine * refine dim_scatter_kernel check * refine * refine check * fix clang_tidy error Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * add api to apply external job pass (#8370) * Add condition to find-test-cache-distributed (#8387) * add condition to find-test-cache-distributed * fix * warp dim util (#8382) * warp dim util * format * use more maybe_wrap_dim * refine array functor * add more * refine math_functor * fix_bug_in_broadcast_min_max_grad_and_broadcast_like (#8379) * fix_bug_in_broadcast_min_max_grad_and_broadcast_like * refine * fix static check error * fix bug about index (#8388) * fix bug about index * add test case Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * LogicalSliceAssign support full slice sbp (#8344) * feat(SliceOp): slice ops support 2d sbp * fix(SliceOp): fix [B, P] 2d sbp bug * refine error message * fix bug in parallel_num == 1 * add comment * add warning and format * add NOLINT for boxing check * feat(LogicalSliceOps): support all nd_sbp * feat(LogicalSlice): support nd_sbp * add error message * fix(AutoTest): fix auto_test bug in module.parameter pass * auto format by CI * fix(LogicalSliceAssign): skip test when 1n1d * fix SliceParams memset error * remove memset * add CHECK_JUST * fix(*): make sure split_axis >= 0 or equal to SPLIT_AXIS_FOR_NON_SPLIT * remove memset * fix spilit_info.axis bug * feat(LogicalSliceOps): support grad * add logical_slice gradient_funcs * feat(LogicalSliceAssign): LogicalSliceAssign support full slice sbp * auto format by CI * test(LogicalSlice): fix logical_slice dims Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: Houjiang Chen <chenhoujiangcug@gmail.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * fix_tensor_from_numpy_mem_leak_bug (#8391) * fix_tensor_from_numpy_mem_leak_bug * add note * refine note * refine Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Make of_pyext_obj static only to make sure only a python ext so has python symbols (#8393) * make of_pyext_obj static only * refine note Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Adjust tolerance setting in embedding_renorm unit test (#8394) * support front end compile for job to iree (#8249) * support frontend dev version * polish name * add tosa-to-elf.mlir * tosa to elf by llvm * conv2d partial * an enhanced frontend runner * support numpy as input * enable multiple using nn graph with different input(jobname make it it cd /home/yuhao/frontend/oneflow ; /usr/bin/env /usr/bin/python3 /home/yuhao/.vscode-server/extensions/ms-python.python-2022.6.2/pythonFiles/lib/python/debugpy/launcher 40873 -- /home/yuhao/frontend/oneflow/oneflow/ir/test/Frontend/runner.py ) * enable multiple input * enable cpu and cuda * change full_name to _full_name * support exchange cuda with cpu seamlessly * remove pip * lit config * polish * trim * auto format by CI * modify * auto format by CI * last line polish * use unittest * auto format by CI * use allclose * auto format by CI * pulish * optimize convert oneflow to tosa * conv2d * conv2d enhanced && conv2d examples add * add road map * add add_n2Op and boardcast_addOp conversion * add matmulOp conversion * support converting normailzation op to tosa(partically) * update roadmap * support i64 tensor to dense elem attr * support 100% resnet op conversion * add test mlir * add test iree resnet python script * auto format by CI * done * enhance iree resnet test script * auto format by CI * rebuild code * auto format by CI * rebuild test script * update * auto format by CI * pub * trim test scripts * move * move * input and output add block arg judgement * emit error in variable conversion * error handle for ci * modify err info * auto format by CI * merge * auto format by CI * output not block * flow ones * rm const * trim maybe * trim maybe with header file * const auto * solve clangd error Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Feat/zero mix with mp (#8036) * add zero limit * add debug * add mix zero test * refactor zero api * zero test with mp * add 2d test * add zero nd * add nd zero * add sbp cast * test passed soft limit consumer * refine size api * zero use stage 2 * add limit consumer api * add new api * refine zero s select * fix index out of range * rm zero limit on device type * zero test with activation checkpointing * add indentity when dp sequence len is 1 * move to base with master * fix * fix * fix * add test * debug bad case * refine test for eager and graph boxing * test case ready * simplify * refine test * fix buff size * fix conflict * refine zero nd * refine * add full test * revert change * refine split check * fix typo * rm log * spit long func * restore test * Update optimizer_placement_optimization_pass.cpp * auto format by CI * auto format by CI * fix static check * add tips for zero api change * auto format by CI Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Revert embedding normal path and fix amp list (#8374) * revert embedding normal path, fix amp list * fix amp * fix memset bug in gather cpu kernel Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * replace fixed_vector with small_vector and make Shape inherit from it (#8365) * Replace fixed_vector with llvm::SmallVector Signed-off-by: daquexian <daquexian566@gmail.com> * Shape inherited from llvm::SmallVector Signed-off-by: daquexian <daquexian566@gmail.com> * refine cmake Signed-off-by: daquexian <daquexian566@gmail.com> * rename fixed_vector to small_vector Signed-off-by: daquexian <daquexian566@gmail.com> * fix reviews Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI * update Shape constructor Signed-off-by: daquexian <daquexian566@gmail.com> * add 'PUBLIC' keyword to all target_link_libraries Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI * update cmake Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI * update cmake Signed-off-by: daquexian <daquexian566@gmail.com> * update cmake Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI * set is_initialized_ default to true Signed-off-by: daquexian <daquexian566@gmail.com> * override some methods to set is_initialized_ Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * Light plan for debug (#8396) * Light plan for debug * fix note * disable terminfo to fix missing terminfo symbols (#8400) * disable terminfo to fix missing terminfo symbols Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * fix bug of ZeRO MP in complex case (#8404) * Remove redundant output_lbns in ir (#8409) * mv case * remove redundant info * Dev FusedCrossInteraction[OneEmbedding] (#8335) * add simple fused cross interaction forward * add packed fused * Add cross interaction grad * simplify code * fix bug * support crossnet v2 * support cross interaction v2 * add lazy backward * Rename and add test * fix jc comment * fix comment * fix bug * fix userops td elem_cnt for FUSED Group * fix header file * fix clang static analysis * fix unittest Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * add exe graph physical shape check msg (#8002) * fix index select op in graph * add exe graph physical shape check msg * improve the debug information for the python stack trace 1. add a parameter 'max_stack_depth' to specify the max depth for the stack trace 2. refactor other debug related classes. * remove parens * update * resolve PR comments * update * update graph debug test file. * restore self._debug in class Graph and class ModuleBlock * Do not shorten the stack frame string if it is in debug mode * delete TODOs * disable conv3d test (#7969) Signed-off-by: daquexian <daquexian566@gmail.com> * skip layernorm random_data_warp test (#7941) * skip layernorm random_data_warp test * warp/block/uncached case only test gpu Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Lock click version (#7967) Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * add global avgpool unittest (#7585) * fix (#7978) Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Support negative dim in scatter op (#7934) * support negative dim in scatter op * refine scatter test * refine scatter test again Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * run barrier callback in BarrierPhyInstrOperand::~BarrierPhyInstrOperand (#7702) * run barrier callback in BarrierPhyInstrOperand::~BarrierPhyInstrOperand * lock gil in vm Callback thread * more comments for VirtualMachineEngine::Callback() * the Env is never destroyed. * export Env into python * more unittests * wait shared_ptr.use_count() == 0 * export unittest.TestCase in framework/unittest.py * SwitchToShuttingDownPhase * optional is_normal_exit * VirtualMachine::CloseVMThreads * Delete env_api.h env_api.h is deleted by master * reshape_only_one_dim_infered * address pr comments * fix a ref-cnt bug in TryRunBarrierInstruction. * rollback flow.env.all_device_placement * no distributed running test_shutting_down.py * auto format by CI * expand lifetime of module oneflow in test_shutting_down.py * refine del depend on of * capture oneflow._oneflow_internal.eager when calling sync in __del__ * add try in flaky test Co-authored-by: Luyang <flowingsun007@163.com> Co-authored-by: chengtbf <472491134@qq.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: Xiaoyu Xu <xiaoyulink@gmail.com> * Fix one hot scalar tensor bug (#7975) * fix reduce_sum scalar check bug * fix one_hot scalar tensor bug * fix clang tidy error Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * support ctor np array from of tensor (#7970) * support ctor np array from of tensor * add test case constructing np array from tensor * refine Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * add_manual_seed_all_api (#7957) * add_manual_seed_all_api * Update conf.py * refine * add test case * auto format by CI * Update random_generator.cpp * auto format by CI Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * one_embedding add doc string (#7902) * add doc string * add example * add * fix doc * refine * address review * mb to MB * add make_table_option * option to options * refine * add forward Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Support numpy scalar parameters (#7935) * feat(functional): support numpy scalar parameters * rename inferface * feat(*): TensorIndex support numpy scalar * feat(TensorIndex): support advance indexing * add unittest and int32 support for branch feat-param_support_np_scalar (#7939) * add unittest * refactor unittest * add todo for int16 advanced indexing * add int32 supporting for advance indexing * auto format by CI Co-authored-by: Wang Yi <53533850+marigoold@users.noreply.github.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * fix tensor_scatter_nd_update (#7953) * fix tensor_scatter_nd_update * auto backward Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * fix one_embedding adam (#7974) * fix one_embedding adam * fix tidy * fix normal Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * speed test with score (#7990) Signed-off-by: daquexian <daquexian566@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Feat/graph del by ref (#7857) * remove IsMultiClient() and single client logic Signed-off-by: daquexian <daquexian566@gmail.com> * rename eager.multi_client to eager Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI * add py ref * refine new session * clean code * make scope api inner use * use session with ref cnt * run barrier callback in BarrierPhyInstrOperand::~BarrierPhyInstrOperand * test pass * lock gil in vm Callback thread * more comments for VirtualMachineEngine::Callback() * merge * merge rm single client * rm initenv * merge and fix master * refactor env c api * add debug code * fix and serving test pass * test passed * rm useless * rm useless code * format * rm useless include * rm sync in py * the Env is never destroyed. * export Env into python * more unittests * fix and pass tests * revert virtual_machine.cpp * revert core/vm * remove outdated python class oneflow.unittest.TestCase * graph test passed * wait shared_ptr.use_count() == 0 * export unittest.TestCase in framework/unittest.py * SwitchToShuttingDownPhase * optional is_normal_exit * VirtualMachine::CloseVMThreads * Delete env_api.h env_api.h is deleted by master * address pr comments * rm is env init * Clear empty thread when graph destroy (#7633) * Revert "Clear empty thread when graph destroy (#7633)" (#7860) This reverts commit 3e8585e. * fix a ref-cnt bug in TryRunBarrierInstruction. * rm env_api * fix clang-tidy error * fix clang-tidy in env_imp * refine env api * format * refine graph del and sync at shuttingdown * fix typo * add comment * rm useless * rm useless Co-authored-by: daquexian <daquexian566@gmail.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: lixinqi <lixinqi0703106@163.com> Co-authored-by: Li Xinqi <lixinqi2010@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: Luyang <flowingsun007@163.com> Co-authored-by: cheng cheng <472491134@qq.com> * [PersistentTable] Fix num blocks (#7986) Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Add auto benchmark for flowvision (#7806) * update yml * update workflow * add resnet50 * [PersistentTable] Async write (#7946) * [PersistentTable] Async write * fix Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * save log in separate dir by default (#7825) Signed-off-by: daquexian <daquexian566@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * fix index select op in graph * add exe graph physical shape check msg * improve the debug information for the python stack trace 1. add a parameter 'max_stack_depth' to specify the max depth for the stack trace 2. refactor other debug related classes. * remove parens * update * resolve PR comments * update * update graph debug test file. * restore self._debug in class Graph and class ModuleBlock * Do not shorten the stack frame string if it is in debug mode * delete TODOs * Revert "Merge branch 'master' into fea/graph_check_msg" This reverts commit 28833b7, reversing changes made to baadf60. * Revert "Revert "Merge branch 'master' into fea/graph_check_msg"" This reverts commit 1d5e196. * update * resolve conflicts * resolve conflicts Co-authored-by: Cijie Xia <cijie.xia@mail.utoronto.ca> Co-authored-by: daquexian <daquexian566@gmail.com> Co-authored-by: guo ran <360112263@qq.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: Shenghang Tsai <jackalcooper@gmail.com> Co-authored-by: Houjiang Chen <chenhoujiangcug@gmail.com> Co-authored-by: Peihong Liu <mosout@qq.com> Co-authored-by: Li Xinqi <lixinqi2010@gmail.com> Co-authored-by: Luyang <flowingsun007@163.com> Co-authored-by: chengtbf <472491134@qq.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com> Co-authored-by: liufengwei0103 <2472937968@qq.com> Co-authored-by: binbinHan <han_binbin@163.com> Co-authored-by: Yinggang Wang <wyg19970408@gmail.com> Co-authored-by: Wang Yi <53533850+marigoold@users.noreply.github.com> Co-authored-by: Shijie <821898965@qq.com> Co-authored-by: lixinqi <lixinqi0703106@163.com> Co-authored-by: Juncheng <liujuncheng1022@gmail.com> * add batch_matmul sbp (#8385) Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * suppress gcc11 false positive warning (#8401) Signed-off-by: daquexian <daquexian566@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * fix variable op conversion to tosa error in ninja c1 (#8412) * pub * move test iree resnet python script to oneflow_iree repo * add bracket * rename const_val to const_val_ and restore resnet.py test script Co-authored-by: Shenghang Tsai <jackalcooper@gmail.com> * Fix eval error in FusedMLP (#8413) Fix eval error * Init NCCL communicator in graph mode unifiedly (#8263) * centralized comm init * address review * revert * rename * ref nccl logical send recv * fix cpu only Co-authored-by: cheng cheng <472491134@qq.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * fix dim_scatter 0-dim tensor bug (#8418) Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * target based external libraries (#8421) Signed-off-by: daquexian <daquexian566@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Refine hardcoded attr setting/getting in ir (#8420) * use names in trait static func * more changes on op name attr * use wrapped func * Replace cu115 with cu116 in nightly (#8423) update workflows Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * fix repeat interleave 0-size tensor bug (#8414) Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Autotest support print input in ci (#8383) * support print tensor value in autotest to provide more details in ci * revert * refine * auto format by CI * control precision to 1e-5 when record * fix bug * auto format by CI * relax tensor_size_mb * fix bug * fix bug * refine * releax * refinew * refine * fix bug * relax * refine * restruct * auto format by CI Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Modify sbp.split()'s karg: axis to dim (#8411) * Modify sbp.split()'s axis karg to dim * Refine * Refine * Refine * Refine * Feat/graph logical op debug repr (#8131) * add zero limit * add debug * add mix zero test * refactor zero api * zero test with mp * add 2d test * add zero nd * add nd zero * add sbp cast * test passed soft limit consumer * refine size api * add module config * save nn.Module info in job.proto for better debugging * add new line * add ModuleBlock.ops_proto() API * zero use stage 2 * print operators' info when print ModuleBlock * handle VariableOpConf * update * update * fix * move operators repr method to graph util * add limit consumer api * add new api * refine zero s select * add module block * fix * refact for rm op in module conf * fix * add sbp debug * add sbp repr * add shape * refine * add sys op in repr * add full op debug * fix index out of range * rm zero limit on device type * add no scope op to graph * zero test with activation checkpointing * fix order * add indentity when dp sequence len is 1 * add debug repr * refine repr of op * refine and fix * rm useless log * move to base with master * fix * fix * fix * fix proto * refine test * fix type * add test * debug bad case * refine test for eager and graph boxing * test case ready * simplify * refine test * fix buff size * fix conflict * refine zero nd * refine * add full test * revert change * refine split check * fix typo * rm log * spit long func * refine * restore test * refine pass and mem debug * merge master * repr dtype * add placement * Update optimizer_placement_optimization_pass.cpp * auto format by CI * auto format by CI * fix static check * add tips for zero api change * auto format by CI * fix merge * auto format by CI * auto format by CI * refine get job api * refine graph util import order * auto format by CI * fix static check * auto format by CI * fix special case * refine level print and add full dtype repr * rm useless Co-authored-by: Cijie Xia <cijie.xia@mail.utoronto.ca> Co-authored-by: Cijie Xia <xiacijie1998@163.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * rm some test case in test_fused_dot_feature_interaction_pooling_sum (#8425) rm some case in test * Remove unused linkages (#8426) remove unused linkages * refactor stride (#8402) * Stride inherits DimVector Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI * fix argument type of OFStrideToNumpyStride Signed-off-by: daquexian <daquexian566@gmail.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * Move Tensor.__setitem__ and global related api to Python/C api (#8375) * add local_to_global, global_to_global, to_global. global_to_global still have bugs * fix bug of global_to_global * remove python api * add setitem * remove local_to_global sbp pack, format code * format code * remove redundant code * add error msg, refine check of to_global * fix bug of check * add error msg * fix clang static check error * remove useless api in tensor.py, remove redundant code, remove useless CHECK * add to_local * fix wrong exception type in unittest for to_local exception message * cuda add default error msg (#8427) default error Co-authored-by: Shenghang Tsai <jackalcooper@gmail.com> * Refactor ShapeView (#8422) * update Signed-off-by: daquexian <daquexian566@gmail.com> * update and add docs Signed-off-by: daquexian <daquexian566@gmail.com> * turn on view slice (#8302) * turn_on_view_slice * inplace scalar math hnandle non-contiguous input * fix clang check * add docs * refactor * auto format by CI Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * Add flow env init rdma api (#8415) * add_flow_env_init_rdma_api * adjust persistent_workers logic for RDMA support * adjust persistent_workers logic for RDMA support * add rmda_inited api * minro fix * add docs * Update python/oneflow/utils/data/dataloader.py Co-authored-by: daquexian <daquexian566@gmail.com> * fix typo * refine * fix RDMAIsInitialized * minor fix * refine * rename InitRdma to InitRDMA * refine Co-authored-by: Flowingsun007 <flowingsun007@163.com> Co-authored-by: daquexian <daquexian566@gmail.com> * add 1d send recv in nccl logical (#8355) * add 1d send recv in nccl logical * Update insert_nccl_logical_op_pass.cpp * auto format by CI Co-authored-by: cheng cheng <472491134@qq.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Support iree ci (#8419) * create mlir cpu and modify build gcc 7 shell script * fix the bug of test_iree_resnet.py cuda test in cpu version error * fix constant folding tests * suport oneflow_test_cpu_only * pub * build script add flag * modify test yml * add python3 into \PATH * don't use pretrain model * install flowvision Co-authored-by: mosout <mosout@qq.com> Co-authored-by: jackalcooper <jackalcooper@gmail.com> * Feat straighten task nodes (#8347) * Add a fast topological traversal * Add an initial implementation of straighen nodes * Add the straighen nodes algorithm * Change algorithm structure * Remove some debug information * Finalize the straighten algorithm after deciding the parameters by experiments * Notify the usage of straighten algorithm * Of format * Update oneflow/core/graph/straighten_nodes.cpp Of format Co-authored-by: daquexian <daquexian566@gmail.com> * Of format * Stop using visual string before we find a better key * Remove magic numbers and Of format * Remove starts * Of format * Fix a bug of using GetMaxVal<int32_t>() as an initial number for comparing * Refactor add straighten algo interface (#8435) * feat(*): export straighten nodes algorithm inferface * export documentation * Update python/oneflow/nn/graph/graph_config.py Co-authored-by: Yipeng Li <jamesonli1313@gmail.com> Co-authored-by: Yipeng Li <jamesonli1313@gmail.com> * Use TopoForEachNodeFast as default. (#8436) * Use TopoForEachNodeFast as default. Rename the original one as TopoForEachNodeDynamic * Speed up TopoForEachNodeFast when traversing a subgraph * Rename the switch and code clean up * Hide the class TopoStruct * Hide all the other functions * Grammar * Of format Co-authored-by: daquexian <daquexian566@gmail.com> Co-authored-by: Yinggang Wang <wyg19970408@gmail.com> * Refactor NLLLoss to support split class dim (#8380) * refactor * RuntimeError * avoid atomic add * test * fixes * update test * update test * update test * fix kernel * improve backward * update test * out_weight to be required * address static analysis errer * fix static analysis error * fix static analysis error Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Strict ordering in memory reuse algorithm (#8441) * Support broadcast in fused_softmax kernel (#8321) * support broadcast * refine * Remove shape check * fix sbp when broadcast * rollback softmax grad threshold * increase threshold of test conv bn folding * tol to 1e-2 * check error msg of fuse softmax ops * add more dispatch * remove double datatype test and add broadcast test Co-authored-by: cheng cheng <472491134@qq.com> * Merge slice and logical slice (#8416) * remove Slice, SliceUpdate, SliceGrad op * rename logical_slice to slice and logical_slice_assign to slice_update * move gradient_func logical_slice.cpp to slice.cpp * fix some bug and refine local test * feat(SliceUpdate): support 0size tensor * test(Slice): refine consistent slice test * test(SliceUpdate): refine consistent slice_update test * not export slice_update's inplace parameter * auto format by CI * recovery slice_grad_op * fix slice_view bug * add error message and attr judgement * modified old test * auto format by CI * update test README * update tensor_string code * fix test bug * auto format by CI * fix(hsplit): hsplit functor bug * fix vsplit doc test bug * refine * fix test * fix pin_memory bug Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Graph block.config.set_stage() for recommended Pipeline api. (#8442) * Graph block.config.set_stage() for recommended Pipeline api. * revert diff * refine api doc Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Update PolynomialLR's doc and paramater (#8430) * update PolynomialLR doc, current_batch = min(decay_batch, current_batch) * * update PolynomialLR doc, current_batch = min(decay_batch, current_batch) * rename the steps to decay_batch in parameters * update PolynomialLR test case Co-authored-by: Yinggang Wang <wyg19970408@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Add mv op (#8445) * add mv op with bug that Int is incompatible * add test * update test_mv.py * fix based on comments * fix based on comments Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * enable oneflow_iree(python package) and corresponding test works in ci (#8431) * update test.yml * add pytest for oneflow_iree examples * add oneflow frontend test * Dev tensor is pinned api (#8447) * support tensor.is_pinned * add test case * add docs * auto format by CI * refine * auto format by CI * refine * auto format by CI * refine * refine * refine Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * Nd sbp tensor str (#8458) * nd sbp tensor str * add nd sbp tensor str test * bigger input size * refine Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Patch sbp cost (#8378) * Add a slight cost for B->S and B->P in 2d sbp * Add penalty for P in consumer * Add the slight penalty for eager * Consider B -> (B, B) for a scalar * Do not consider parallel description in priority ratio * Of format * Fix a bug in the old version group boxing with 2D SBP (#8448) * Update group boxing to deal with hierarchy [1, 2] * Use a uniform sbp while grouping consumers * Steal "ParallelDimReduce" from "hierarchical_sub_task_graph_builder_impl" to "sbp_infer_util" * Fix bugs of patch-sbp_cost (#8456) * Update group boxing to deal with hierarchy [1, 2] * Use a uniform sbp while grouping consumers * Steal "ParallelDimReduce" from "hierarchical_sub_task_graph_builder_impl" to "sbp_infer_util" * Reduce to uniform B for 1 device. Use the actual parallel description for each tensor * Fix a bug of fix-group_boxing-bug * Group boxing reduce [2, 2]: (S0, S0) to [4]: S0, then we might infer a 1D SBP from a 2D SBP hint Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: cheng cheng <472491134@qq.com> * Decouple stream and instruction (#7607) * remove deprecated python api * backup code * backup code * fix compiler complaints * fix typo in refactoring * kMockDevice * add unit test test_mock.py * revert mock kernels * vert DEVICE_TYPE_SEQ * mock placement * address pr comments * register device kCriticalSectionDevice and kLazyJobLauncher * kControlDevice * Stream::vm_stream_ * fix compiler complaints * backup code * rename StreamIsTransport to IsCommNetStream * decouple vm::StreamType and vm::InstructionType * fix compiler complaints * remove 'gpu' related code * address static analyzer complaints * address static analyzer complaints * remove unused module in test_mock.py * the Env is never destroyed. * export Env into python * more unittests * export unittest.TestCase in framework/unittest.py * SwitchToShuttingDownPhase * optional is_normal_exit * VirtualMachine::CloseVMThreads * Delete env_api.h env_api.h is deleted by master * reshape_only_one_dim_infered * address pr comments * rollback flow.env.all_device_placement * no distributed running test_shutting_down.py * auto format by CI * expand lifetime of module oneflow in test_shutting_down.py * refine del depend on of * fix oneflow.placement.__str__ * revert GlobalSync * init_producer_stream in oneflow.from_numpy * debug code for vm * init disable_vm_threads_ in VirtualMachine::VirtualMachine * Update oneflow/core/vm/virtual_machine.h Co-authored-by: daquexian <daquexian566@gmail.com> * create stream in forked subprocesses. * refactor StreamRoleSwitch to StreamRoleVisistor * ThreadLocalGuard * auto format by CI * fix compiler complaints * fix static analyzer complaints * VirtualMachine::GetVmStream * fix static analyzer complaints * reimplement AddAndReadVector by std::deque * reimplement AddAndReadVector * merge master * increase atol for test_consistent_rnn_cell.py * StreamRole::AsyncLaunchedCommNet is bound to EventRecordedCudaStreamType * auto format by CI * remove StreamRoleVisitor<T>::VisitInvalid * no copy in AddAndReadVector * fix bug of AddAndReadVector::size_ * disable terminfo to fix missing terminfo symbols Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI * fix AddAndReadVector::GetGranularity * remove bad unittest * auto format by CI * rename CallInstructionType to OpCallInstructionType * static variable GlobalSingletonPtr is a unique_ptr * replace ++atomic_cnt with atomic_cnt.fetch_add(1, std::memory_order_relaxed) * AddAndReadVector::operator[] * change comments 'lock free' to 'thread safe' * rename StatefulLocalOpKernel to StatefulOpKernel * rename VirtualMachine::vm_ to VirtualMachine::engine_ * mark VirtualMachine::NoMoreErasedInstructions private * mark VirtualMachine::FindOrCreateScheduleLocalDepObject private * remove unused version of VirtualMachineEngine::Receive * rename argname for VirtualMachineEngine::Receive * rename unused PendingInstructionList * rename AddAndReadVector to SteadyVector * optimize SteadyVector::operator[] by __builtin_clzll * refactor SteadyVector::granularity2vector_ to SteadyVector::granularity2data_ * reduce usage of steady_vector::size_ * rename unused anounymous namespace * greater atol for test_consistent_tensordot.py * fix BarrierInstructionType::ComputeInFuseMode * revert container_util.h * run AccessBlobByCallback in default stream of tensor->device * reslove static check * reslove static check * SteadyVector::MutableOrAdd Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: chengtbf <472491134@qq.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: Xiaoyu Xu <xiaoyulink@gmail.com> Co-authored-by: daquexian <daquexian566@gmail.com> Co-authored-by: binbinHan <han_binbin@163.com> * fix_tensor_numpy_to_avoid_gpu_mem_increase (#8449) * fix_tensor_numpy_to_avoid_gpu_mem_increase * Update tensor.py * auto format by CI Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * Rename user op tensor shape to shape view (#8433) * ThreadLocalGuard * rename user_op::Tensor::shape to user_op::Tensor::shape_view * auto format by CI * fix static analyzer complaints * more verbose code for HobDataType * larger timeout * larger timeout Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: jackalcooper <jackalcooper@gmail.com> Co-authored-by: binbinHan <han_binbin@163.com> * speedup global test (#8468) * speedup global test * Test refine slice ops test (#8471) * refine consistent_slice test from 112s -> 30s in 4 device * test(SliceUpdate): refine test from 119s -> 28s in 4 device * delete useless code * auto format by CI Co-authored-by: Yinggang Wang <wyg19970408@gmail.com> Co-authored-by: wyg1997 <wangyinggang@foxmail.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * Set the minimum mtu value for IB communication connection (#8451) * Set the minimum mtu value for IB communication connection * refine * refine Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Merge branch 'master' into feat-general_basic_communication Co-authored-by: Shenghang Tsai <jackalcooper@gmail.com> Co-authored-by: daquexian <daquexian566@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: liufengwei0103 <2472937968@qq.com> Co-authored-by: Wang Yi <53533850+marigoold@users.noreply.github.com> Co-authored-by: ZZK <359521840@qq.com> Co-authored-by: hjchen2 <chenhoujiangcug@gmail.com> Co-authored-by: Juncheng <liujuncheng1022@gmail.com> Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com> Co-authored-by: Luyang <flowingsun007@163.com> Co-authored-by: binbinHan <han_binbin@163.com> Co-authored-by: Yinggang Wang <wyg19970408@gmail.com> Co-authored-by: Yao Zihang <1162526220@qq.com> Co-authored-by: yuhao <72971170+howin98@users.noreply.github.com> Co-authored-by: Xiaoyu Xu <xiaoyulink@gmail.com> Co-authored-by: cheng cheng <472491134@qq.com> Co-authored-by: Cijie Xia <cijie.xia@mail.utoronto.ca> Co-authored-by: guo ran <360112263@qq.com> Co-authored-by: Peihong Liu <mosout@qq.com> Co-authored-by: Li Xinqi <lixinqi2010@gmail.com> Co-authored-by: Shijie <821898965@qq.com> Co-authored-by: lixinqi <lixinqi0703106@163.com> Co-authored-by: leaves-zwx <kunta0932@gmail.com> Co-authored-by: Li Xiang <54010254+lixiang007666@users.noreply.github.com> Co-authored-by: Cijie Xia <xiacijie1998@163.com> Co-authored-by: Jia <basicv8vc@gmail.com> Co-authored-by: Shanshan Zhong <62104945+zhongshsh@users.noreply.github.com> Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Co-authored-by: wyg1997 <wangyinggang@foxmail.com> Co-authored-by: Yu OuYang <xuanjiuye@gmail.com>

* Add a slight cost for B->S and B->P in 2d sbp * Add penalty for P in consumer * Fix a slight bug * Add at most 1 middle node for general basic communication * Add the cost for general basic communication * Add the slight penalty for eager * Skip initialization of boxing collector if not needed * Fix a bug * Dev nd nccl send recv boxing (#8467) * nd nccl_send_recv_boxing * rm print * support num_axes > 2 * Add distributed optional run (#8372) * Add * change deps * add install * add skip * autoprof supports bandwidth (#8367) * autoprof supports bandwidth Signed-off-by: daquexian <daquexian566@gmail.com> * print bandwidth Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * remove tmp buffer of cumprod cpu backward kernel (#8369) * remove tmp buffer of cumprod cpu backward kernel * refine * refine Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Move tensor api to cpython part3 (#8342) * add tensor_functions * concat py methods * add hash, restore tensor.py * check replacement * refine code, remove commented tensor.py * refine code * move some api * add cpu and cuda api * add triu tril norm and etc. * remove tensor_functions.h * move more api * move more api, refine size * fix typo * format code, remove useless include * refine code * refine code, fix typo * align .cuda to python * refine code * split some api to part3 for review * remove positional only arguments of argmax and argmin * remove arguments parse * modify arguments name in matmul and floor_divide * rename BINARY_FUNC to DIRECT_PASS_FUNC, modify some functions * refine code, format code * add inplace /=, add comments * remove name in macros * remove python api * remove redundant include * remove cout * format code * refactor tensor.size by directly call shape.at, refactor tensor.sub_ by calling nb_sub_ * remove redundant code * auto format by CI * fix typo, fix wrong call * modify idx datatype from int32 to int64 in tensor.size * add some DIRECT_PASS_FUNC * add cpu cuda var pow and etc. * add masked_fill any all * make REDUCE_FUNC macro, add reduce_* functions * add 0dim check in ReduceSumWhole, refine yaml * fix bug * restore add add_ sub sub_ * add unittest for tensor.half tensor.add tensor.add_ * refine code * refine code * fix typo * fix bug of tensor.std() * refactor var std and cuda, using c++ functional api * add beta and threshold in softplus * auto format by CI Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Add nn_functor Check (#7910) * add bias_add_check * add bias_add error test * fix conv2d nhwc bias_add error * add nhwc conv test * add bias_add_error test * Add bias add error check * Rename * add batch matmul error check * add matmul check error msg * remove annotation * add fused mlp error msg check * Add pixel shuffle check test * add more test until normalization add relu functor * refine error message * finish all nnfunctor check msg * handle type error * remove useless symbol * modify back to TypeError * fix all comment * Remove redundant code * Remove pad ndim check * fix bias add space * fix check logic cause ci gpu not always gpu:0 Co-authored-by: hjchen2 <chenhoujiangcug@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Add FusedMatmulBiasAddReluDropout [OneEmbedding] (#8222) * previous version for fused_matmul_bias_add_relu_dropout * add op infer * fix detail * finish forward * support dropout rate list * add forward test * fix bug for output buffer * Configurable alpha params * try to add bit mask logic * Add bitmask first version! * Add row col bitmask logic * support not align4 reludropout * simplify relu dropout ld logic * Add naive relu dropout grad kernel * add simple relu dropout grad kernel * Rename * support relu_dropout bitmask backward * add vectorized optimization * fix tmp buffer * add to amp list * add lazy backward logic * Refine kernel * add indextype dispatch * simplify functor logic * fix cublas fused mlp aux_ld shape bug * Add more relu dropout kernel * add full unittest * fix bug in skip final activation * refine * Remove dump func * fix format * Remove cmake * remove redundant divide * add padded version * fix dropout * oneflow curand * refine * remove redundant kernel * add unroll logic * add unroll and ballot sync * refine format * Remove fast curand * Refine python interface * Add if branch for memset * fix python logic * just for debug * not use matmul bias add grad * add launch 1 block limit * fix unittest * Refine * fix graph backward bug * limit to 11060 * change to use int32_t dtype for cublas aux * Fix jc comment * fix comment * fix convert * fix static_analysis * fix at * fix userops td * fix userops td * fix const ref * fix compile error for bfloat16 * limit to 11060 * fix bug Co-authored-by: Juncheng <liujuncheng1022@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * fix gather 0-dim tensor bug (#8376) * fix 0-dim tensor bug * refine * support input 0-dim tensor for gather * refine * refine * refine dim_scatter_kernel check * refine * refine check * fix clang_tidy error Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * add api to apply external job pass (#8370) * Add condition to find-test-cache-distributed (#8387) * add condition to find-test-cache-distributed * fix * warp dim util (#8382) * warp dim util * format * use more maybe_wrap_dim * refine array functor * add more * refine math_functor * fix_bug_in_broadcast_min_max_grad_and_broadcast_like (#8379) * fix_bug_in_broadcast_min_max_grad_and_broadcast_like * refine * fix static check error * fix bug about index (#8388) * fix bug about index * add test case Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * LogicalSliceAssign support full slice sbp (#8344) * feat(SliceOp): slice ops support 2d sbp * fix(SliceOp): fix [B, P] 2d sbp bug * refine error message * fix bug in parallel_num == 1 * add comment * add warning and format * add NOLINT for boxing check * feat(LogicalSliceOps): support all nd_sbp * feat(LogicalSlice): support nd_sbp * add error message * fix(AutoTest): fix auto_test bug in module.parameter pass * auto format by CI * fix(LogicalSliceAssign): skip test when 1n1d * fix SliceParams memset error * remove memset * add CHECK_JUST * fix(*): make sure split_axis >= 0 or equal to SPLIT_AXIS_FOR_NON_SPLIT * remove memset * fix spilit_info.axis bug * feat(LogicalSliceOps): support grad * add logical_slice gradient_funcs * feat(LogicalSliceAssign): LogicalSliceAssign support full slice sbp * auto format by CI * test(LogicalSlice): fix logical_slice dims Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: Houjiang Chen <chenhoujiangcug@gmail.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * fix_tensor_from_numpy_mem_leak_bug (#8391) * fix_tensor_from_numpy_mem_leak_bug * add note * refine note * refine Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Make of_pyext_obj static only to make sure only a python ext so has python symbols (#8393) * make of_pyext_obj static only * refine note Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Adjust tolerance setting in embedding_renorm unit test (#8394) * support front end compile for job to iree (#8249) * support frontend dev version * polish name * add tosa-to-elf.mlir * tosa to elf by llvm * conv2d partial * an enhanced frontend runner * support numpy as input * enable multiple using nn graph with different input(jobname make it it cd /home/yuhao/frontend/oneflow ; /usr/bin/env /usr/bin/python3 /home/yuhao/.vscode-server/extensions/ms-python.python-2022.6.2/pythonFiles/lib/python/debugpy/launcher 40873 -- /home/yuhao/frontend/oneflow/oneflow/ir/test/Frontend/runner.py ) * enable multiple input * enable cpu and cuda * change full_name to _full_name * support exchange cuda with cpu seamlessly * remove pip * lit config * polish * trim * auto format by CI * modify * auto format by CI * last line polish * use unittest * auto format by CI * use allclose * auto format by CI * pulish * optimize convert oneflow to tosa * conv2d * conv2d enhanced && conv2d examples add * add road map * add add_n2Op and boardcast_addOp conversion * add matmulOp conversion * support converting normailzation op to tosa(partically) * update roadmap * support i64 tensor to dense elem attr * support 100% resnet op conversion * add test mlir * add test iree resnet python script * auto format by CI * done * enhance iree resnet test script * auto format by CI * rebuild code * auto format by CI * rebuild test script * update * auto format by CI * pub * trim test scripts * move * move * input and output add block arg judgement * emit error in variable conversion * error handle for ci * modify err info * auto format by CI * merge * auto format by CI * output not block * flow ones * rm const * trim maybe * trim maybe with header file * const auto * solve clangd error Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Feat/zero mix with mp (#8036) * add zero limit * add debug * add mix zero test * refactor zero api * zero test with mp * add 2d test * add zero nd * add nd zero * add sbp cast * test passed soft limit consumer * refine size api * zero use stage 2 * add limit consumer api * add new api * refine zero s select * fix index out of range * rm zero limit on device type * zero test with activation checkpointing * add indentity when dp sequence len is 1 * move to base with master * fix * fix * fix * add test * debug bad case * refine test for eager and graph boxing * test case ready * simplify * refine test * fix buff size * fix conflict * refine zero nd * refine * add full test * revert change * refine split check * fix typo * rm log * spit long func * restore test * Update optimizer_placement_optimization_pass.cpp * auto format by CI * auto format by CI * fix static check * add tips for zero api change * auto format by CI Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Revert embedding normal path and fix amp list (#8374) * revert embedding normal path, fix amp list * fix amp * fix memset bug in gather cpu kernel Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * replace fixed_vector with small_vector and make Shape inherit from it (#8365) * Replace fixed_vector with llvm::SmallVector Signed-off-by: daquexian <daquexian566@gmail.com> * Shape inherited from llvm::SmallVector Signed-off-by: daquexian <daquexian566@gmail.com> * refine cmake Signed-off-by: daquexian <daquexian566@gmail.com> * rename fixed_vector to small_vector Signed-off-by: daquexian <daquexian566@gmail.com> * fix reviews Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI * update Shape constructor Signed-off-by: daquexian <daquexian566@gmail.com> * add 'PUBLIC' keyword to all target_link_libraries Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI * update cmake Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI * update cmake Signed-off-by: daquexian <daquexian566@gmail.com> * update cmake Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI * set is_initialized_ default to true Signed-off-by: daquexian <daquexian566@gmail.com> * override some methods to set is_initialized_ Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * Light plan for debug (#8396) * Light plan for debug * fix note * disable terminfo to fix missing terminfo symbols (#8400) * disable terminfo to fix missing terminfo symbols Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * fix bug of ZeRO MP in complex case (#8404) * Remove redundant output_lbns in ir (#8409) * mv case * remove redundant info * Dev FusedCrossInteraction[OneEmbedding] (#8335) * add simple fused cross interaction forward * add packed fused * Add cross interaction grad * simplify code * fix bug * support crossnet v2 * support cross interaction v2 * add lazy backward * Rename and add test * fix jc comment * fix comment * fix bug * fix userops td elem_cnt for FUSED Group * fix header file * fix clang static analysis * fix unittest Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * add exe graph physical shape check msg (#8002) * fix index select op in graph * add exe graph physical shape check msg * improve the debug information for the python stack trace 1. add a parameter 'max_stack_depth' to specify the max depth for the stack trace 2. refactor other debug related classes. * remove parens * update * resolve PR comments * update * update graph debug test file. * restore self._debug in class Graph and class ModuleBlock * Do not shorten the stack frame string if it is in debug mode * delete TODOs * disable conv3d test (#7969) Signed-off-by: daquexian <daquexian566@gmail.com> * skip layernorm random_data_warp test (#7941) * skip layernorm random_data_warp test * warp/block/uncached case only test gpu Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Lock click version (#7967) Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * add global avgpool unittest (#7585) * fix (#7978) Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Support negative dim in scatter op (#7934) * support negative dim in scatter op * refine scatter test * refine scatter test again Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * run barrier callback in BarrierPhyInstrOperand::~BarrierPhyInstrOperand (#7702) * run barrier callback in BarrierPhyInstrOperand::~BarrierPhyInstrOperand * lock gil in vm Callback thread * more comments for VirtualMachineEngine::Callback() * the Env is never destroyed. * export Env into python * more unittests * wait shared_ptr.use_count() == 0 * export unittest.TestCase in framework/unittest.py * SwitchToShuttingDownPhase * optional is_normal_exit * VirtualMachine::CloseVMThreads * Delete env_api.h env_api.h is deleted by master * reshape_only_one_dim_infered * address pr comments * fix a ref-cnt bug in TryRunBarrierInstruction. * rollback flow.env.all_device_placement * no distributed running test_shutting_down.py * auto format by CI * expand lifetime of module oneflow in test_shutting_down.py * refine del depend on of * capture oneflow._oneflow_internal.eager when calling sync in __del__ * add try in flaky test Co-authored-by: Luyang <flowingsun007@163.com> Co-authored-by: chengtbf <472491134@qq.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: Xiaoyu Xu <xiaoyulink@gmail.com> * Fix one hot scalar tensor bug (#7975) * fix reduce_sum scalar check bug * fix one_hot scalar tensor bug * fix clang tidy error Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * support ctor np array from of tensor (#7970) * support ctor np array from of tensor * add test case constructing np array from tensor * refine Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * add_manual_seed_all_api (#7957) * add_manual_seed_all_api * Update conf.py * refine * add test case * auto format by CI * Update random_generator.cpp * auto format by CI Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * one_embedding add doc string (#7902) * add doc string * add example * add * fix doc * refine * address review * mb to MB * add make_table_option * option to options * refine * add forward Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Support numpy scalar parameters (#7935) * feat(functional): support numpy scalar parameters * rename inferface * feat(*): TensorIndex support numpy scalar * feat(TensorIndex): support advance indexing * add unittest and int32 support for branch feat-param_support_np_scalar (#7939) * add unittest * refactor unittest * add todo for int16 advanced indexing * add int32 supporting for advance indexing * auto format by CI Co-authored-by: Wang Yi <53533850+marigoold@users.noreply.github.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * fix tensor_scatter_nd_update (#7953) * fix tensor_scatter_nd_update * auto backward Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * fix one_embedding adam (#7974) * fix one_embedding adam * fix tidy * fix normal Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * speed test with score (#7990) Signed-off-by: daquexian <daquexian566@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Feat/graph del by ref (#7857) * remove IsMultiClient() and single client logic Signed-off-by: daquexian <daquexian566@gmail.com> * rename eager.multi_client to eager Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI * add py ref * refine new session * clean code * make scope api inner use * use session with ref cnt * run barrier callback in BarrierPhyInstrOperand::~BarrierPhyInstrOperand * test pass * lock gil in vm Callback thread * more comments for VirtualMachineEngine::Callback() * merge * merge rm single client * rm initenv * merge and fix master * refactor env c api * add debug code * fix and serving test pass * test passed * rm useless * rm useless code * format * rm useless include * rm sync in py * the Env is never destroyed. * export Env into python * more unittests * fix and pass tests * revert virtual_machine.cpp * revert core/vm * remove outdated python class oneflow.unittest.TestCase * graph test passed * wait shared_ptr.use_count() == 0 * export unittest.TestCase in framework/unittest.py * SwitchToShuttingDownPhase * optional is_normal_exit * VirtualMachine::CloseVMThreads * Delete env_api.h env_api.h is deleted by master * address pr comments * rm is env init * Clear empty thread when graph destroy (#7633) * Revert "Clear empty thread when graph destroy (#7633)" (#7860) This reverts commit 3e8585e5fa20b97229d6b0be46a7ff814dc8cd83. * fix a ref-cnt bug in TryRunBarrierInstruction. * rm env_api * fix clang-tidy error * fix clang-tidy in env_imp * refine env api * format * refine graph del and sync at shuttingdown * fix typo * add comment * rm useless * rm useless Co-authored-by: daquexian <daquexian566@gmail.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: lixinqi <lixinqi0703106@163.com> Co-authored-by: Li Xinqi <lixinqi2010@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: Luyang <flowingsun007@163.com> Co-authored-by: cheng cheng <472491134@qq.com> * [PersistentTable] Fix num blocks (#7986) Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Add auto benchmark for flowvision (#7806) * update yml * update workflow * add resnet50 * [PersistentTable] Async write (#7946) * [PersistentTable] Async write * fix Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * save log in separate dir by default (#7825) Signed-off-by: daquexian <daquexian566@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * fix index select op in graph * add exe graph physical shape check msg * improve the debug information for the python stack trace 1. add a parameter 'max_stack_depth' to specify the max depth for the stack trace 2. refactor other debug related classes. * remove parens * update * resolve PR comments * update * update graph debug test file. * restore self._debug in class Graph and class ModuleBlock * Do not shorten the stack frame string if it is in debug mode * delete TODOs * Revert "Merge branch 'master' into fea/graph_check_msg" This reverts commit 28833b73a8041463e5e3d130784be386ee248bd8, reversing changes made to baadf6045f2fce69c090e442a755229c1c949773. * Revert "Revert "Merge branch 'master' into fea/graph_check_msg"" This reverts commit 1d5e196d8530ffd2b9bf781abcf168b94ff9ca41. * update * resolve conflicts * resolve conflicts Co-authored-by: Cijie Xia <cijie.xia@mail.utoronto.ca> Co-authored-by: daquexian <daquexian566@gmail.com> Co-authored-by: guo ran <360112263@qq.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: Shenghang Tsai <jackalcooper@gmail.com> Co-authored-by: Houjiang Chen <chenhoujiangcug@gmail.com> Co-authored-by: Peihong Liu <mosout@qq.com> Co-authored-by: Li Xinqi <lixinqi2010@gmail.com> Co-authored-by: Luyang <flowingsun007@163.com> Co-authored-by: chengtbf <472491134@qq.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com> Co-authored-by: liufengwei0103 <2472937968@qq.com> Co-authored-by: binbinHan <han_binbin@163.com> Co-authored-by: Yinggang Wang <wyg19970408@gmail.com> Co-authored-by: Wang Yi <53533850+marigoold@users.noreply.github.com> Co-authored-by: Shijie <821898965@qq.com> Co-authored-by: lixinqi <lixinqi0703106@163.com> Co-authored-by: Juncheng <liujuncheng1022@gmail.com> * add batch_matmul sbp (#8385) Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * suppress gcc11 false positive warning (#8401) Signed-off-by: daquexian <daquexian566@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * fix variable op conversion to tosa error in ninja c1 (#8412) * pub * move test iree resnet python script to oneflow_iree repo * add bracket * rename const_val to const_val_ and restore resnet.py test script Co-authored-by: Shenghang Tsai <jackalcooper@gmail.com> * nccl send/recv support different placement * refine * auto format by CI * rm out ctrl * auto format by CI Co-authored-by: guo-ran <360112263@qq.com> Co-authored-by: Shenghang Tsai <jackalcooper@gmail.com> Co-authored-by: daquexian <daquexian566@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: liufengwei0103 <2472937968@qq.com> Co-authored-by: Wang Yi <53533850+marigoold@users.noreply.github.com> Co-authored-by: ZZK <359521840@qq.com> Co-authored-by: hjchen2 <chenhoujiangcug@gmail.com> Co-authored-by: Juncheng <liujuncheng1022@gmail.com> Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com> Co-authored-by: Luyang <flowingsun007@163.com> Co-authored-by: binbinHan <han_binbin@163.com> Co-authored-by: Yinggang Wang <wyg19970408@gmail.com> Co-authored-by: Yao Zihang <1162526220@qq.com> Co-authored-by: yuhao <72971170+howin98@users.noreply.github.com> Co-authored-by: Xiaoyu Xu <xiaoyulink@gmail.com> Co-authored-by: cheng cheng <472491134@qq.com> Co-authored-by: Cijie Xia <cijie.xia@mail.utoronto.ca> Co-authored-by: Peihong Liu <mosout@qq.com> Co-authored-by: Li Xinqi <lixinqi2010@gmail.com> Co-authored-by: Shijie <821898965@qq.com> Co-authored-by: lixinqi <lixinqi0703106@163.com> * Support different hierarchy * Merge branch 'master' into feat-general_basic_communication (#8477) * Add distributed optional run (#8372) * Add * change deps * add install * add skip * autoprof supports bandwidth (#8367) * autoprof supports bandwidth Signed-off-by: daquexian <daquexian566@gmail.com> * print bandwidth Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * remove tmp buffer of cumprod cpu backward kernel (#8369) * remove tmp buffer of cumprod cpu backward kernel * refine * refine Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Move tensor api to cpython part3 (#8342) * add tensor_functions * concat py methods * add hash, restore tensor.py * check replacement * refine code, remove commented tensor.py * refine code * move some api * add cpu and cuda api * add triu tril norm and etc. * remove tensor_functions.h * move more api * move more api, refine size * fix typo * format code, remove useless include * refine code * refine code, fix typo * align .cuda to python * refine code * split some api to part3 for review * remove positional only arguments of argmax and argmin * remove arguments parse * modify arguments name in matmul and floor_divide * rename BINARY_FUNC to DIRECT_PASS_FUNC, modify some functions * refine code, format code * add inplace /=, add comments * remove name in macros * remove python api * remove redundant include * remove cout * format code * refactor tensor.size by directly call shape.at, refactor tensor.sub_ by calling nb_sub_ * remove redundant code * auto format by CI * fix typo, fix wrong call * modify idx datatype from int32 to int64 in tensor.size * add some DIRECT_PASS_FUNC * add cpu cuda var pow and etc. * add masked_fill any all * make REDUCE_FUNC macro, add reduce_* functions * add 0dim check in ReduceSumWhole, refine yaml * fix bug * restore add add_ sub sub_ * add unittest for tensor.half tensor.add tensor.add_ * refine code * refine code * fix typo * fix bug of tensor.std() * refactor var std and cuda, using c++ functional api * add beta and threshold in softplus * auto format by CI Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Add nn_functor Check (#7910) * add bias_add_check * add bias_add error test * fix conv2d nhwc bias_add error * add nhwc conv test * add bias_add_error test * Add bias add error check * Rename * add batch matmul error check * add matmul check error msg * remove annotation * add fused mlp error msg check * Add pixel shuffle check test * add more test until normalization add relu functor * refine error message * finish all nnfunctor check msg * handle type error * remove useless symbol * modify back to TypeError * fix all comment * Remove redundant code * Remove pad ndim check * fix bias add space * fix check logic cause ci gpu not always gpu:0 Co-authored-by: hjchen2 <chenhoujiangcug@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Add FusedMatmulBiasAddReluDropout [OneEmbedding] (#8222) * previous version for fused_matmul_bias_add_relu_dropout * add op infer * fix detail * finish forward * support dropout rate list * add forward test * fix bug for output buffer * Configurable alpha params * try to add bit mask logic * Add bitmask first version! * Add row col bitmask logic * support not align4 reludropout * simplify relu dropout ld logic * Add naive relu dropout grad kernel * add simple relu dropout grad kernel * Rename * support relu_dropout bitmask backward * add vectorized optimization * fix tmp buffer * add to amp list * add lazy backward logic * Refine kernel * add indextype dispatch * simplify functor logic * fix cublas fused mlp aux_ld shape bug * Add more relu dropout kernel * add full unittest * fix bug in skip final activation * refine * Remove dump func * fix format * Remove cmake * remove redundant divide * add padded version * fix dropout * oneflow curand * refine * remove redundant kernel * add unroll logic * add unroll and ballot sync * refine format * Remove fast curand * Refine python interface * Add if branch for memset * fix python logic * just for debug * not use matmul bias add grad * add launch 1 block limit * fix unittest * Refine * fix graph backward bug * limit to 11060 * change to use int32_t dtype for cublas aux * Fix jc comment * fix comment * fix convert * fix static_analysis * fix at * fix userops td * fix userops td * fix const ref * fix compile error for bfloat16 * limit to 11060 * fix bug Co-authored-by: Juncheng <liujuncheng1022@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * fix gather 0-dim tensor bug (#8376) * fix 0-dim tensor bug * refine * support input 0-dim tensor for gather * refine * refine * refine dim_scatter_kernel check * refine * refine check * fix clang_tidy error Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * add api to apply external job pass (#8370) * Add condition to find-test-cache-distributed (#8387) * add condition to find-test-cache-distributed * fix * warp dim util (#8382) * warp dim util * format * use more maybe_wrap_dim * refine array functor * add more * refine math_functor * fix_bug_in_broadcast_min_max_grad_and_broadcast_like (#8379) * fix_bug_in_broadcast_min_max_grad_and_broadcast_like * refine * fix static check error * fix bug about index (#8388) * fix bug about index * add test case Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * LogicalSliceAssign support full slice sbp (#8344) * feat(SliceOp): slice ops support 2d sbp * fix(SliceOp): fix [B, P] 2d sbp bug * refine error message * fix bug in parallel_num == 1 * add comment * add warning and format * add NOLINT for boxing check * feat(LogicalSliceOps): support all nd_sbp * feat(LogicalSlice): support nd_sbp * add error message * fix(AutoTest): fix auto_test bug in module.parameter pass * auto format by CI * fix(LogicalSliceAssign): skip test when 1n1d * fix SliceParams memset error * remove memset * add CHECK_JUST * fix(*): make sure split_axis >= 0 or equal to SPLIT_AXIS_FOR_NON_SPLIT * remove memset * fix spilit_info.axis bug * feat(LogicalSliceOps): support grad * add logical_slice gradient_funcs * feat(LogicalSliceAssign): LogicalSliceAssign support full slice sbp * auto format by CI * test(LogicalSlice): fix logical_slice dims Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: Houjiang Chen <chenhoujiangcug@gmail.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * fix_tensor_from_numpy_mem_leak_bug (#8391) * fix_tensor_from_numpy_mem_leak_bug * add note * refine note * refine Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Make of_pyext_obj static only to make sure only a python ext so has python symbols (#8393) * make of_pyext_obj static only * refine note Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Adjust tolerance setting in embedding_renorm unit test (#8394) * support front end compile for job to iree (#8249) * support frontend dev version * polish name * add tosa-to-elf.mlir * tosa to elf by llvm * conv2d partial * an enhanced frontend runner * support numpy as input * enable multiple using nn graph with different input(jobname make it it cd /home/yuhao/frontend/oneflow ; /usr/bin/env /usr/bin/python3 /home/yuhao/.vscode-server/extensions/ms-python.python-2022.6.2/pythonFiles/lib/python/debugpy/launcher 40873 -- /home/yuhao/frontend/oneflow/oneflow/ir/test/Frontend/runner.py ) * enable multiple input * enable cpu and cuda * change full_name to _full_name * support exchange cuda with cpu seamlessly * remove pip * lit config * polish * trim * auto format by CI * modify * auto format by CI * last line polish * use unittest * auto format by CI * use allclose * auto format by CI * pulish * optimize convert oneflow to tosa * conv2d * conv2d enhanced && conv2d examples add * add road map * add add_n2Op and boardcast_addOp conversion * add matmulOp conversion * support converting normailzation op to tosa(partically) * update roadmap * support i64 tensor to dense elem attr * support 100% resnet op conversion * add test mlir * add test iree resnet python script * auto format by CI * done * enhance iree resnet test script * auto format by CI * rebuild code * auto format by CI * rebuild test script * update * auto format by CI * pub * trim test scripts * move * move * input and output add block arg judgement * emit error in variable conversion * error handle for ci * modify err info * auto format by CI * merge * auto format by CI * output not block * flow ones * rm const * trim maybe * trim maybe with header file * const auto * solve clangd error Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Feat/zero mix with mp (#8036) * add zero limit * add debug * add mix zero test * refactor zero api * zero test with mp * add 2d test * add zero nd * add nd zero * add sbp cast * test passed soft limit consumer * refine size api * zero use stage 2 * add limit consumer api * add new api * refine zero s select * fix index out of range * rm zero limit on device type * zero test with activation checkpointing * add indentity when dp sequence len is 1 * move to base with master * fix * fix * fix * add test * debug bad case * refine test for eager and graph boxing * test case ready * simplify * refine test * fix buff size * fix conflict * refine zero nd * refine * add full test * revert change * refine split check * fix typo * rm log * spit long func * restore test * Update optimizer_placement_optimization_pass.cpp * auto format by CI * auto format by CI * fix static check * add tips for zero api change * auto format by CI Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Revert embedding normal path and fix amp list (#8374) * revert embedding normal path, fix amp list * fix amp * fix memset bug in gather cpu kernel Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * replace fixed_vector with small_vector and make Shape inherit from it (#8365) * Replace fixed_vector with llvm::SmallVector Signed-off-by: daquexian <daquexian566@gmail.com> * Shape inherited from llvm::SmallVector Signed-off-by: daquexian <daquexian566@gmail.com> * refine cmake Signed-off-by: daquexian <daquexian566@gmail.com> * rename fixed_vector to small_vector Signed-off-by: daquexian <daquexian566@gmail.com> * fix reviews Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI * update Shape constructor Signed-off-by: daquexian <daquexian566@gmail.com> * add 'PUBLIC' keyword to all target_link_libraries Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI * update cmake Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI * update cmake Signed-off-by: daquexian <daquexian566@gmail.com> * update cmake Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI * set is_initialized_ default to true Signed-off-by: daquexian <daquexian566@gmail.com> * override some methods to set is_initialized_ Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * Light plan for debug (#8396) * Light plan for debug * fix note * disable terminfo to fix missing terminfo symbols (#8400) * disable terminfo to fix missing terminfo symbols Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * fix bug of ZeRO MP in complex case (#8404) * Remove redundant output_lbns in ir (#8409) * mv case * remove redundant info * Dev FusedCrossInteraction[OneEmbedding] (#8335) * add simple fused cross interaction forward * add packed fused * Add cross interaction grad * simplify code * fix bug * support crossnet v2 * support cross interaction v2 * add lazy backward * Rename and add test * fix jc comment * fix comment * fix bug * fix userops td elem_cnt for FUSED Group * fix header file * fix clang static analysis * fix unittest Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * add exe graph physical shape check msg (#8002) * fix index select op in graph * add exe graph physical shape check msg * improve the debug information for the python stack trace 1. add a parameter 'max_stack_depth' to specify the max depth for the stack trace 2. refactor other debug related classes. * remove parens * update * resolve PR comments * update * update graph debug test file. * restore self._debug in class Graph and class ModuleBlock * Do not shorten the stack frame string if it is in debug mode * delete TODOs * disable conv3d test (#7969) Signed-off-by: daquexian <daquexian566@gmail.com> * skip layernorm random_data_warp test (#7941) * skip layernorm random_data_warp test * warp/block/uncached case only test gpu Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Lock click version (#7967) Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * add global avgpool unittest (#7585) * fix (#7978) Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Support negative dim in scatter op (#7934) * support negative dim in scatter op * refine scatter test * refine scatter test again Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * run barrier callback in BarrierPhyInstrOperand::~BarrierPhyInstrOperand (#7702) * run barrier callback in BarrierPhyInstrOperand::~BarrierPhyInstrOperand * lock gil in vm Callback thread * more comments for VirtualMachineEngine::Callback() * the Env is never destroyed. * export Env into python * more unittests * wait shared_ptr.use_count() == 0 * export unittest.TestCase in framework/unittest.py * SwitchToShuttingDownPhase * optional is_normal_exit * VirtualMachine::CloseVMThreads * Delete env_api.h env_api.h is deleted by master * reshape_only_one_dim_infered * address pr comments * fix a ref-cnt bug in TryRunBarrierInstruction. * rollback flow.env.all_device_placement * no distributed running test_shutting_down.py * auto format by CI * expand lifetime of module oneflow in test_shutting_down.py * refine del depend on of * capture oneflow._oneflow_internal.eager when calling sync in __del__ * add try in flaky test Co-authored-by: Luyang <flowingsun007@163.com> Co-authored-by: chengtbf <472491134@qq.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: Xiaoyu Xu <xiaoyulink@gmail.com> * Fix one hot scalar tensor bug (#7975) * fix reduce_sum scalar check bug * fix one_hot scalar tensor bug * fix clang tidy error Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * support ctor np array from of tensor (#7970) * support ctor np array from of tensor * add test case constructing np array from tensor * refine Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * add_manual_seed_all_api (#7957) * add_manual_seed_all_api * Update conf.py * refine * add test case * auto format by CI * Update random_generator.cpp * auto format by CI Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * one_embedding add doc string (#7902) * add doc string * add example * add * fix doc * refine * address review * mb to MB * add make_table_option * option to options * refine * add forward Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Support numpy scalar parameters (#7935) * feat(functional): support numpy scalar parameters * rename inferface * feat(*): TensorIndex support numpy scalar * feat(TensorIndex): support advance indexing * add unittest and int32 support for branch feat-param_support_np_scalar (#7939) * add unittest * refactor unittest * add todo for int16 advanced indexing * add int32 supporting for advance indexing * auto format by CI Co-authored-by: Wang Yi <53533850+marigoold@users.noreply.github.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * fix tensor_scatter_nd_update (#7953) * fix tensor_scatter_nd_update * auto backward Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * fix one_embedding adam (#7974) * fix one_embedding adam * fix tidy * fix normal Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * speed test with score (#7990) Signed-off-by: daquexian <daquexian566@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Feat/graph del by ref (#7857) * remove IsMultiClient() and single client logic Signed-off-by: daquexian <daquexian566@gmail.com> * rename eager.multi_client to eager Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI * add py ref * refine new session * clean code * make scope api inner use * use session with ref cnt * run barrier callback in BarrierPhyInstrOperand::~BarrierPhyInstrOperand * test pass * lock gil in vm Callback thread * more comments for VirtualMachineEngine::Callback() * merge * merge rm single client * rm initenv * merge and fix master * refactor env c api * add debug code * fix and serving test pass * test passed * rm useless * rm useless code * format * rm useless include * rm sync in py * the Env is never destroyed. * export Env into python * more unittests * fix and pass tests * revert virtual_machine.cpp * revert core/vm * remove outdated python class oneflow.unittest.TestCase * graph test passed * wait shared_ptr.use_count() == 0 * export unittest.TestCase in framework/unittest.py * SwitchToShuttingDownPhase * optional is_normal_exit * VirtualMachine::CloseVMThreads * Delete env_api.h env_api.h is deleted by master * address pr comments * rm is env init * Clear empty thread when graph destroy (#7633) * Revert "Clear empty thread when graph destroy (#7633)" (#7860) This reverts commit 3e8585e5fa20b97229d6b0be46a7ff814dc8cd83. * fix a ref-cnt bug in TryRunBarrierInstruction. * rm env_api * fix clang-tidy error * fix clang-tidy in env_imp * refine env api * format * refine graph del and sync at shuttingdown * fix typo * add comment * rm useless * rm useless Co-authored-by: daquexian <daquexian566@gmail.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: lixinqi <lixinqi0703106@163.com> Co-authored-by: Li Xinqi <lixinqi2010@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: Luyang <flowingsun007@163.com> Co-authored-by: cheng cheng <472491134@qq.com> * [PersistentTable] Fix num blocks (#7986) Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Add auto benchmark for flowvision (#7806) * update yml * update workflow * add resnet50 * [PersistentTable] Async write (#7946) * [PersistentTable] Async write * fix Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * save log in separate dir by default (#7825) Signed-off-by: daquexian <daquexian566@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * fix index select op in graph * add exe graph physical shape check msg * improve the debug information for the python stack trace 1. add a parameter 'max_stack_depth' to specify the max depth for the stack trace 2. refactor other debug related classes. * remove parens * update * resolve PR comments * update * update graph debug test file. * restore self._debug in class Graph and class ModuleBlock * Do not shorten the stack frame string if it is in debug mode * delete TODOs * Revert "Merge branch 'master' into fea/graph_check_msg" This reverts commit 28833b73a8041463e5e3d130784be386ee248bd8, reversing changes made to baadf6045f2fce69c090e442a755229c1c949773. * Revert "Revert "Merge branch 'master' into fea/graph_check_msg"" This reverts commit 1d5e196d8530ffd2b9bf781abcf168b94ff9ca41. * update * resolve conflicts * resolve conflicts Co-authored-by: Cijie Xia <cijie.xia@mail.utoronto.ca> Co-authored-by: daquexian <daquexian566@gmail.com> Co-authored-by: guo ran <360112263@qq.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: Shenghang Tsai <jackalcooper@gmail.com> Co-authored-by: Houjiang Chen <chenhoujiangcug@gmail.com> Co-authored-by: Peihong Liu <mosout@qq.com> Co-authored-by: Li Xinqi <lixinqi2010@gmail.com> Co-authored-by: Luyang <flowingsun007@163.com> Co-authored-by: chengtbf <472491134@qq.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com> Co-authored-by: liufengwei0103 <2472937968@qq.com> Co-authored-by: binbinHan <han_binbin@163.com> Co-authored-by: Yinggang Wang <wyg19970408@gmail.com> Co-authored-by: Wang Yi <53533850+marigoold@users.noreply.github.com> Co-authored-by: Shijie <821898965@qq.com> Co-authored-by: lixinqi <lixinqi0703106@163.com> Co-authored-by: Juncheng <liujuncheng1022@gmail.com> * add batch_matmul sbp (#8385) Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * suppress gcc11 false positive warning (#8401) Signed-off-by: daquexian <daquexian566@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * fix variable op conversion to tosa error in ninja c1 (#8412) * pub * move test iree resnet python script to oneflow_iree repo * add bracket * rename const_val to const_val_ and restore resnet.py test script Co-authored-by: Shenghang Tsai <jackalcooper@gmail.com> * Fix eval error in FusedMLP (#8413) Fix eval error * Init NCCL communicator in graph mode unifiedly (#8263) * centralized comm init * address review * revert * rename * ref nccl logical send recv * fix cpu only Co-authored-by: cheng cheng <472491134@qq.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * fix dim_scatter 0-dim tensor bug (#8418) Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * target based external libraries (#8421) Signed-off-by: daquexian <daquexian566@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Refine hardcoded attr setting/getting in ir (#8420) * use names in trait static func * more changes on op name attr * use wrapped func * Replace cu115 with cu116 in nightly (#8423) update workflows Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * fix repeat interleave 0-size tensor bug (#8414) Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Autotest support print input in ci (#8383) * support print tensor value in autotest to provide more details in ci * revert * refine * auto format by CI * control precision to 1e-5 when record * fix bug * auto format by CI * relax tensor_size_mb * fix bug * fix bug * refine * releax * refinew * refine * fix bug * relax * refine * restruct * auto format by CI Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Modify sbp.split()'s karg: axis to dim (#8411) * Modify sbp.split()'s axis karg to dim * Refine * Refine * Refine * Refine * Feat/graph logical op debug repr (#8131) * add zero limit * add debug * add mix zero test * refactor zero api * zero test with mp * add 2d test * add zero nd * add nd zero * add sbp cast * test passed soft limit consumer * refine size api * add module config * save nn.Module info in job.proto for better debugging * add new line * add ModuleBlock.ops_proto() API * zero use stage 2 * print operators' info when print ModuleBlock * handle VariableOpConf * update * update * fix * move operators repr method to graph util * add limit consumer api * add new api * refine zero s select * add module block * fix * refact for rm op in module conf * fix * add sbp debug * add sbp repr * add shape * refine * add sys op in repr * add full op debug * fix index out of range * rm zero limit on device type * add no scope op to graph * zero test with activation checkpointing * fix order * add indentity when dp sequence len is 1 * add debug repr * refine repr of op * refine and fix * rm useless log * move to base with master * fix * fix * fix * fix proto * refine test * fix type * add test * debug bad case * refine test for eager and graph boxing * test case ready * simplify * refine test * fix buff size * fix conflict * refine zero nd * refine * add full test * revert change * refine split check * fix typo * rm log * spit long func * refine * restore test * refine pass and mem debug * merge master * repr dtype * add placement * Update optimizer_placement_optimization_pass.cpp * auto format by CI * auto format by CI * fix static check * add tips for zero api change * auto format by CI * fix merge * auto format by CI * auto format by CI * refine get job api * refine graph util import order * auto format by CI * fix static check * auto format by CI * fix special case * refine level print and add full dtype repr * rm useless Co-authored-by: Cijie Xia <cijie.xia@mail.utoronto.ca> Co-authored-by: Cijie Xia <xiacijie1998@163.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * rm some test case in test_fused_dot_feature_interaction_pooling_sum (#8425) rm some case in test * Remove unused linkages (#8426) remove unused linkages * refactor stride (#8402) * Stride inherits DimVector Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI * fix argument type of OFStrideToNumpyStride Signed-off-by: daquexian <daquexian566@gmail.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * Move Tensor.__setitem__ and global related api to Python/C api (#8375) * add local_to_global, global_to_global, to_global. global_to_global still have bugs * fix bug of global_to_global * remove python api * add setitem * remove local_to_global sbp pack, format code * format code * remove redundant code * add error msg, refine check of to_global * fix bug of check * add error msg * fix clang static check error * remove useless api in tensor.py, remove redundant code, remove useless CHECK * add to_local * fix wrong exception type in unittest for to_local exception message * cuda add default error msg (#8427) default error Co-authored-by: Shenghang Tsai <jackalcooper@gmail.com> * Refactor ShapeView (#8422) * update Signed-off-by: daquexian <daquexian566@gmail.com> * update and add docs Signed-off-by: daquexian <daquexian566@gmail.com> * turn on view slice (#8302) * turn_on_view_slice * inplace scalar math hnandle non-contiguous input * fix clang check * add docs * refactor * auto format by CI Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * Add flow env init rdma api (#8415) * add_flow_env_init_rdma_api * adjust persistent_workers logic for RDMA support * adjust persistent_workers logic for RDMA support * add rmda_inited api * minro fix * add docs * Update python/oneflow/utils/data/dataloader.py Co-authored-by: daquexian <daquexian566@gmail.com> * fix typo * refine * fix RDMAIsInitialized * minor fix * refine * rename InitRdma to InitRDMA * refine Co-authored-by: Flowingsun007 <flowingsun007@163.com> Co-authored-by: daquexian <daquexian566@gmail.com> * add 1d send recv in nccl logical (#8355) * add 1d send recv in nccl logical * Update insert_nccl_logical_op_pass.cpp * auto format by CI Co-authored-by: cheng cheng <472491134@qq.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Support iree ci (#8419) * create mlir cpu and modify build gcc 7 shell script * fix the bug of test_iree_resnet.py cuda test in cpu version error * fix constant folding tests * suport oneflow_test_cpu_only * pub * build script add flag * modify test yml * add python3 into \PATH * don't use pretrain model * install flowvision Co-authored-by: mosout <mosout@qq.com> Co-authored-by: jackalcooper <jackalcooper@gmail.com> * Feat straighten task nodes (#8347) * Add a fast topological traversal * Add an initial implementation of straighen nodes * Add the straighen nodes algorithm * Change algorithm structure * Remove some debug information * Finalize the straighten algorithm after deciding the parameters by experiments * Notify the usage of straighten algorithm * Of format * Update oneflow/core/graph/straighten_nodes.cpp Of format Co-authored-by: daquexian <daquexian566@gmail.com> * Of format * Stop using visual string before we find a better key * Remove magic numbers and Of format * Remove starts * Of format * Fix a bug of using GetMaxVal<int32_t>() as an initial number for comparing * Refactor add straighten algo interface (#8435) * feat(*): export straighten nodes algorithm inferface * export documentation * Update python/oneflow/nn/graph/graph_config.py Co-authored-by: Yipeng Li <jamesonli1313@gmail.com> Co-authored-by: Yipeng Li <jamesonli1313@gmail.com> * Use TopoForEachNodeFast as default. (#8436) * Use TopoForEachNodeFast as default. Rename the original one as TopoForEachNodeDynamic * Speed up TopoForEachNodeFast when traversing a subgraph * Rename the switch and code clean up * Hide the class TopoStruct * Hide all the other functions * Grammar * Of format Co-authored-by: daquexian <daquexian566@gmail.com> Co-authored-by: Yinggang Wang <wyg19970408@gmail.com> * Refactor NLLLoss to support split class dim (#8380) * refactor * RuntimeError * avoid atomic add * test * fixes * update test * update test * update test * fix kernel * improve backward * update test * out_weight to be required * address static analysis errer * fix static analysis error * fix static analysis error Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Strict ordering in memory reuse algorithm (#8441) * Support broadcast in fused_softmax kernel (#8321) * support broadcast * refine * Remove shape check * fix sbp when broadcast * rollback softmax grad threshold * increase threshold of test conv bn folding * tol to 1e-2 * check error msg of fuse softmax ops * add more dispatch * remove double datatype test and add broadcast test Co-authored-by: cheng cheng <472491134@qq.com> * Merge slice and logical slice (#8416) * remove Slice, SliceUpdate, SliceGrad op * rename logical_slice to slice and logical_slice_assign to slice_update * move gradient_func logical_slice.cpp to slice.cpp * fix some bug and refine local test * feat(SliceUpdate): support 0size tensor * test(Slice): refine consistent slice test * test(SliceUpdate): refine consistent slice_update test * not export slice_update's inplace parameter * auto format by CI * recovery slice_grad_op * fix slice_view bug * add error message and attr judgement * modified old test * auto format by CI * update test README * update tensor_string code * fix test bug * auto format by CI * fix(hsplit): hsplit functor bug * fix vsplit doc test bug * refine * fix test * fix pin_memory bug Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Graph block.config.set_stage() for recommended Pipeline api. (#8442) * Graph block.config.set_stage() for recommended Pipeline api. * revert diff * refine api doc Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Update PolynomialLR's doc and paramater (#8430) * update PolynomialLR doc, current_batch = min(decay_batch, current_batch) * * update PolynomialLR doc, current_batch = min(decay_batch, current_batch) * rename the steps to decay_batch in parameters * update PolynomialLR test case Co-authored-by: Yinggang Wang <wyg19970408@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Add mv op (#8445) * add mv op with bug that Int is incompatible * add test * update test_mv.py * fix based on comments * fix based on comments Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * enable oneflow_iree(python package) and corresponding test works in ci (#8431) * update test.yml * add pytest for oneflow_iree examples * add oneflow frontend test * Dev tensor is pinned api (#8447) * support tensor.is_pinned * add test case * add docs * auto format by CI * refine * auto format by CI * refine * auto format by CI * refine * refine * refine Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * Nd sbp tensor str (#8458) * nd sbp tensor str * add nd sbp tensor str test * bigger input size * refine Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Patch sbp cost (#8378) * Add a slight cost for B->S and B->P in 2d sbp * Add penalty for P in consumer * Add the slight penalty for eager * Consider B -> (B, B) for a scalar * Do not consider parallel description in priority ratio * Of format * Fix a bug in the old version group boxing with 2D SBP (#8448) * Update group boxing to deal with hierarchy [1, 2] * Use a uniform sbp while grouping consumers * Steal "ParallelDimReduce" from "hierarchical_sub_task_graph_builder_impl" to "sbp_infer_util" * Fix bugs of patch-sbp_cost (#8456) * Update group boxing to deal with hierarchy [1, 2] * Use a uniform sbp while grouping consumers * Steal "ParallelDimReduce" from "hierarchical_sub_task_graph_builder_impl" to "sbp_infer_util" * Reduce to uniform B for 1 device. Use the actual parallel description for each tensor * Fix a bug of fix-group_boxing-bug * Group boxing reduce [2, 2]: (S0, S0) to [4]: S0, then we might infer a 1D SBP from a 2D SBP hint Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: cheng cheng <472491134@qq.com> * Decouple stream and instruction (#7607) * remove deprecated python api * backup code * backup code * fix compiler complaints * fix typo in refactoring * kMockDevice * add unit test test_mock.py * revert mock kernels * vert DEVICE_TYPE_SEQ * mock placement * address pr comments * register device kCriticalSectionDevice and kLazyJobLauncher * kControlDevice * Stream::vm_stream_ * fix compiler complaints * backup code * rename StreamIsTransport to IsCommNetStream * decouple vm::StreamType and vm::InstructionType * fix compiler complaints * remove 'gpu' related code * address static analyzer complaints * address static analyzer complaints * remove unused module in test_mock.py * the Env is never destroyed. * export Env into python * more unittests * export unittest.TestCase in framework/unittest.py * SwitchToShuttingDownPhase * optional is_normal_exit * VirtualMachine::CloseVMThreads * Delete env_api.h env_api.h is deleted by master * reshape_only_one_dim_infered * address pr comments * rollback flow.env.all_device_placement * no distributed running test_shutting_down.py * auto format by CI * expand lifetime of module oneflow in test_shutting_down.py * refine del depend on of * fix oneflow.placement.__str__ * revert GlobalSync * init_producer_stream in oneflow.from_numpy * debug code for vm * init disable_vm_threads_ in VirtualMachine::VirtualMachine * Update oneflow/core/vm/virtual_machine.h Co-authored-by: daquexian <daquexian566@gmail.com> * create stream in forked subprocesses. * refactor StreamRoleSwitch to StreamRoleVisistor * ThreadLocalGuard * auto format by CI * fix compiler complaints * fix static analyzer complaints * VirtualMachine::GetVmStream * fix static analyzer complaints * reimplement AddAndReadVector by std::deque * reimplement AddAndReadVector * merge master * increase atol for test_consistent_rnn_cell.py * StreamRole::AsyncLaunchedCommNet is bound to EventRecordedCudaStreamType * auto format by CI * remove StreamRoleVisitor<T>::VisitInvalid * no copy in AddAndReadVector * fix bug of AddAndReadVector::size_ * disable terminfo to fix missing terminfo symbols Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI * fix AddAndReadVector::GetGranularity * remove bad unittest * auto format by CI * rename CallInstructionType to OpCallInstructionType * sta…

* Multi Tensor apply Optimizer (#8373) * Add optim_cast and modify sgd * Remove * try to add fuseUpdatecast pass logic * use pass * still have bug in inplace * ban inplace and fix sgd update * fix regst num * add env var * remove cuda graph wrong use * add support for graph * initialize * add functional impl * add simple job rewrite * delete redundant sgd update kernel * support half * add kernel * use single loop kernel * refine * when in eval mode, we turn off multi tensor update * refine format * use juncheng kernel * Refine * group multi tensor op by some attr * add parallel conf to key * refine * Add unroll logic * fix bug * restruct * use pointer list * add adam kernel * support multi tensor adam update * Remove cpu * support skip if and scale by tensor * support sgd adam unittest * add more check * Remove config * Restruct tensorparams * support fused cast in multi tensor update * support cast in multi tensor * fix bug in model update cast pass * fix multi tensor sgd update with cast Pass check logic * refine * support multi tensor adam update with cast * refine format * Remove redundant template args * merge modify for fused cast * only allow fused cast in train mode * only support data parallel in multi tensor update * rewrite fuse update cast pass logic * remove redundant if * fix format * add new line * rename * Remove print * rename and add LOG * Add more type and test * still have bug in multi tensor adam * Fix multi tensor adam update bug * add multi tensor adam update with cast test * simplify code * fix format * Add model diff datatype in optimizer key * remove random seed * fix comment * fix comment * fix to use model copy * use for loop * Fix comment * use hashcombine * fix clang analysis error * add with cuda macro * fix env var in unittest * remove redundant unittest Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Fix doc and ops template auto gen (#8546) * fix doc and add op calculator * fix bug * fix gen_ops * fix diag 0size tensr shape infer bug (#8557) * fix diag 0size tensr shape infer bug * refine * refine * auto format by CI * auto format by CI Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Format tensor on cpu (#8548) * Format tensor on cpu * use tensor.detach * Remove useless WITH_CUDAs (#8562) * unique identity (#8509) * unique identity * fix * add identit name * rm debug log * mv identity form class to graph * auto format by CI * fix unique iden with having multiple stage * auto format by CI * Update block.py Co-authored-by: cheng cheng <472491134@qq.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Add GenericStreamContext (#8560) * Modify some file and add test (#8556) * Modify some file and add test * modify the content * modify the format and test function name * modify the format and aligned with pytorch * delete print * modity the function name * auto format by CI Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * Move some op into amp gray list (#8545) enlarge gray list Co-authored-by: cheng cheng <472491134@qq.com> * Refine inplace expand runtime_error (#8561) * Refine inplace expand runtime_error * Opt * Refine * Add Note * OneEmbedding use malloc async (#8543) * in out ptrs * ops and test * test pass * prefetch tmp buffer * embedding shuffle tmp buffer * gradient shuffle * tmp buffer size * mem pool * cuda 11.2 * add id_shuffle to setNumunique in update tests * default not use dynamic alloc * fix of_tidy * add fused op * address review * init tmp_buffer * mv memset * fix * one_embedding fused_lookup_init_cast and fused_update_put (#8564) * add fused op * mv memset * fix * address review * rm fullcache n_missing check Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * fix cpu aligned_alloc size (#8569) Signed-off-by: daquexian <daquexian566@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Add flow norm (#8535) * add flow norm * rm import * rm doctest.testmod * fix pad_packed_sequence method input requires_grad==True (#8574) * fix pad_packed_sequence method input requires_grad==True * fix append error when batch_first=True Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * fix embedding manager tmp buffer (#8585) * fix embedding manager * format * fix reduce_ops 0size bug (#8551) * fix reduce_ops 0size bug * fix commnet * auto format by CI * fix bug Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Align Momentum Optimizer (#8549) * fix moemntum update * align momentum * fix bug and finish eager unittest * Support Graph optimizer * fix momentum bug * refine beta Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Fill GetSbp bug and consistent test bug (#8576) fix(FillOp): fill GetSbp bug and consistent test bug Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Dev Fully fused MLP Grad[OneEmbedding] (#8462) * support fully fused mlp grad in eager * support lazy backward * fix output size * add fallback to tmp_buf logic when ones buffer is not enough * build sbp * overlap allreduce * fix overlap order * fix format * CUDA Graphs delayed capture * Add ifcomm create for graph * insert weight event roughly * fix dbias allreduce error * simplify code * Add 11060 limit * Remove print * Rename * fix fill bug and remove comm to cache * Rename variable and add debug code for cache * Use kernel state and fix bug * remove print * fix allreduce dbias bug * fix header file * fix comment * remove redundant headerfile * fix userops build error * refine * init nccl comm before execute kernel * fix comment Co-authored-by: liujuncheng <liujuncheng1022@gmail.com> * rename mirrored to local (#8503) * rename mirrored to local * rename files * rename files * auto format by CI * revert change of package_mirror.py * rename LocalObject to Dependence * rename fn LocalObject to Dependence * merge master * handle clang check * fix * refine * rename local_object to dependence Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * Implement BroadcastElementwiseUnary primitive (#8384) * Add code skeleton for broadcast unary primitive * first try * finish impl * finish impl * format * fix build error * address review * refine * address review comments * use broadcast unary primitive in fill_tensor_ kernel * handle pack tail statically * fix * address review * address review * Fix SimplifyBroadcastDims * fix * revert fill_kernel Co-authored-by: Juncheng <liujuncheng1022@gmail.com> * skip cpu autotest for graph global (#8593) * TODO * skip cpu autotest for graph global * Refine Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Add function_library.h Exception (#8241) * add RuntimeError for checking * add RuntimeError to CHECK_EQ * auto format by CI Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * Refactor shrink (#8573) * caching allocator * auto format by CI * Update ep_device_context.h * EpDeviceCtx with CachingAllocator * rm RawAllocator typename * auto format by CI * specific allo in EpDeviceCtx * auto format by CI * rm outdated alloc * simplify thread safe guard * auto format by CI * avoid return mutex * auto format by CI Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Speed up SliceKernel (#8589) * perf(SliceKernel): descrease number of cuda kernel and speed up * perf(SliceKernel): use old kernel when small tensor is all fullslice * use std::copy to copy contiguous memory * fix cpu kernel bug * Update readme and vsn for 0.8.0 (#8600) * update version * remove py3.6 * modify some file and improve error message (#8592) * modify some file and improve error message * modify scalar_by_tensor_op.cpp * Update scalar_by_tensor_op.cpp * Update slice_op.cpp * Update test_slice_op.py * Update test_slice_op.py * auto format by CI * auto format by CI Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * rename consistent to global (#8505) * rename consistent to global * rename consistent to global * rename files * rename files * refine * auto format by CI * refine * fix clang check * fix * fix * fix * rm to_consistent docs * auto format by CI * refine * fix * fix * revert changes * auto format by CI * revert changes * revert changes * rename * rename Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * add module releated container docs (#8580) * add module releated container docs * auto format by CI * fix comment * refine * refine Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * fix rnn util extra memory usage when requires_grad=False (#8603) * fix rnn util extra memory usage when requires_grad=False * add comments * refine comments Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * use bracket format slice in tensor str (#8489) Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Perf TensorInfo constructor (#8606) * perf(Autograd): perf TensorInfo constructor * rename consistent to global Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * print operators' python location when print nn_graph (#8558) 1. add a flag in nn.Graph.debug() named print_op_loc for printing operator location. 2. add a flag in nn.Graph.debug() named only_print_user_code_loc for only print users' code location * Add randint like (#8598) * add randnint_like op * add docs for random * refine * auto format by CI * add randint_like global test * refine doc * refine randint_like docs * fix bug Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Add full_like api (#8595) * add full_like_op api * refine * add test * refine * refine docs * refine * add consistent_full test * add full_like op * fix docs commnet * change scalar sbp return value from list to tuple * auto format by CI * merge conflict * revert Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * fix cumsum GenBackwardOpConfFn (#8604) * fix cumsum GenBackwardOpConfFn * add test case Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * revert change (#8613) * fix test graph optimization conf CI bug (#8617) * restore resource config after random tests * refine * refine * Release pod tensor (#8552) * ThreadLocalGuard * split ReleaseTensor into ReleasePodTensor and ReleaseNonPodTensor. * rename Co-authored-by: luyang <flowingsun007@163.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Add param group for optimizer (#8611) * add add_param_group interface for Optimize * add test for add_param_group * revert * fix comment * refine * auto format by CI Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * fix broadcast_elementwise_binary cpu (#8625) fix broadcast_elementwise_binary_cpu Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * align exception msg to torch (#8627) * align exception msg to torch * auto format by CI Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * skip unstable global test in ci, reduce failture rate (#8635) * fuse embedding interaction (#8586) * fuse embedding interaction * fix of_tidy * refine * fix * address review Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * fix flip gen backward opconf (#8605) * fix flip gen backward opconf * use new opconf api Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Add ONEFLOW_ONE_EMBEDDING_PERSISTENT_TABLE_SNAPSHOT_LOAD_MMAP_LOCKED (#8597) * Add ONEFLOW_ONE_EMBEDDING_PERSISTENT_TABLE_SNAPSHOT_LOAD_MMAP_LOCKED * refine * use MAP_POPULATE Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Profiling main thread (#8601) * ThreadLocalGuard * refactor EagerBlobObjectList * op_args_reserved_size * remove useless comments Co-authored-by: binbinHan <han_binbin@163.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Fully Memory Log V2 with more details (#8565) * Fully Memory Log V2 with more details * refine log and long op name * fix clang tidy * fix test Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: Xiaoyu Xu <xiaoyulink@gmail.com> * Stream policy (#8590) * ThreadLocalGuard * refactor signature of StreamType::InitDeviceCtx * refactor hint * add StreamPolicy * remove DeviceCtx args * refine OpCallInstructionUtil::Prepare & Compute * merge EpDeviceCtx and LazyJobDeviceCtx into StreamPolicy * minor fix * minor fix * del useless code * fix error * fix merge error * fix segment fault bug * fix complie error * del methods belong to Subclass * reslove comment Co-authored-by: binbinHan <han_binbin@163.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Add fully support for broadcast matmul (#6937) * fix arange bug * fully support broadcast matmul * add more check * remove check * add fully sbp * fix full sbp * Fix broadcast matmul grad * remove old broadcast matmul grad * add broadcast grad back and when B numaxes is 2, we use broadcast_gradB instead of matmul+reduce * add lazy backward * Add restrict when transpose_a is false we can use bmatmul_grad_b * revert * fix broadcast matmul backward * fix single client dispatch matmul logic * revert old bcast matmul grad b kernel * fix eager functional matmul backward * add more test case * remove redundant code * add more special case * when b num axes is 2, we only save tensor a * fix annotation * fix conflict and format * remove single client matmul code * Fix eval error * fix conflict * fix unittest * Add init value * support matrix vector matmul * add vector matrix product * Use matmul primitive to rewrite matrix vector product forward and backward * Add fullllllllly support for vector matrix product * Fix sbp * fix bug * add unittest * Add consistent test for broadcast matmul * Remove redundant code * fix userops annotation * fix * refine * Fix clang static analysis * fix clang analysis * set check graph as false * fix * fix for unittest * fix broadcast sbp bug * try to fix unittest * Fix consistent test * fix multiplier to 4 for unittest Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Revert "skip cpu autotest for graph global" (#8608) * Revert "skip cpu autotest for graph global (#8593)" This reverts commit b076be782fd8f21e50ee4915f2d1562f3a9ab4c0. * cherry pick from master Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * OneEmbedding add tmp_buffer allocator (#8588) * fix embedding manager * format * refine embedding_manager tmp_buffer allocator * fix * format * refine * refine * auto format by CI Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * refine error msg for some user ops (#8579) * refine error msg for some user ops * refine error msg for some user ops * optimize * optimize the writing * optimize the writing * optimize the writing * auto format by CI * optimize writing Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Add tril fill value (#8655) add tril fill value Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * fix_non_pod_data_allocate_bug (#8657) Co-authored-by: Li Xinqi <lixinqi2010@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Fix norm (#8629) * fix norm * add doc * add bool & * update math_functor.cpp * add note * fix_decorate_mem_leak_bug_in_eager_boxing (#8661) Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * add higher order derivative for leaky_relu and negative op (#8643) * add higher derivative for leakyrelu and negative * fix a typo * remove functor * add initialize alpha * fix incorrect dim size in global test * fix incorrect dim size in global test * optimize testcase Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * update oneflow intro to show the difference (#8669) * update oneflow intro * refine * refine * refine * refine * refine * refine * refine * refine * refine * refine oneflow intro * Stacked error (#8671) * ThreadLocalGuard * StackedError * StackedError Co-authored-by: Shenghang Tsai <jackalcooper@gmail.com> * Refactor tensor initializer (#8626) * fix(*): fix xavier_initializer * refactor(Initializer): refactor initializer * fix function name * auto format by CI * refine * fix interface in tensor.py * fix(trunc_normal_): fix init bug and add test * auto format by CI * fix bug * add oneflow.nn.init.normal_ test Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * Fix nn doc (#8650) * fix hsplit doc * add doc for module * fix dtype * fix formula * add ref * fix row length * Fix reduce max min bool dtype bug (#8651) * fix reduce_max_min_bool_dtype * fix bug * auto format by CI Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Remove redundant exception wrapper (#8631) * remove redundant ExceptionWrapper * refine KeyErrorMessage * refine * auto format by CI Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * Refactor MemoryCase to eliminate determine statements of device_type (#7727) * ref memory_case_util * ref BlobObject::CheckMemCase * ref mem_case using * address review * address review * namespace memcase -> memory * fix conflict * address review * address static analysis * rm check * cpu device_id is always 0 * fix conflict * timeout-minutes: 50 * revert change * increase thrd limit in container * skip 2x2 TestEinsumConsistent * skip failed case of distributed test * auto format by CI * fix_non_pod_data_allocate_bug Co-authored-by: Li Xinqi <lixinqi2010@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: tsai <jackalcooper@gmail.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: clackhan <han_binbin@163.com> * fix some data races in c++ api and SteadyVector (#8654) * fix some data races in c++ api and SteadyVector Signed-off-by: daquexian <daquexian566@gmail.com> * skip self copy in MutShapeView::ToShape Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * Fix sin/cos higher order derivative (#8648) * fix(GradGrad): fix sin/cos higher order derivative * fix(GradGrad): fix calculate error * refine autograd global test * auto format by CI * refine sin/cos grad_grad calculate * fix static analysis * merge conflict Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: Ping Zhu <58718936+REYGU@users.noreply.github.com> Co-authored-by: Zhu, Ping <pingzhuu@outlook.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * refine_eager_boxing_to_adapt_ep (#8568) * refine_eager_boxing_to_adapt_ep * fix typo * refine * refine symmetric-acyclic-nd-sbp-to-nd-sbp * refine * fix error * fix static check * add NOLINT Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Fix repeat bug (#8645) * make result contiguous * add test case * auto format by CI Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * Instruction policy (#8583) * ThreadLocalGuard * vm::InstructionPolicy * fix compile error (#8623) * fix compile error * change MirroredObject to Dependence * Modify DependenceVector * rm include stream type * fix stream type * auto format by CI Co-authored-by: Yu OuYang <xuanjiuye@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * handle non-contiguous input (#8665) * handle non-contiguous input * refine * auto format by CI Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * rename define CONSISTENT to GLOBAL (#8652) Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Refine naive interpret (#8672) * ThreadLocalGuard * refactor EagerBlobObjectList * op_args_reserved_size * remove useless comments * rename one::EagerBlobObjectList* to vm::EagerBlobObject* * refactor signature of InstructionsBuiler::Call * PhysicalRun * refactor InstructionsBuilder::Call * remove unused StatefulOpKernel::need_check_mem_case * remove EagerLocalTensorImpl::is_shape_synced_ * refactor SoftSync * move SmallVector from common/container_util.h to framework/instructions_builder.cpp * explicit scalar initialization Co-authored-by: clackhan <han_binbin@163.com> * Rebuild Docs V0.8.0 (#8392) * rebuild for 5 module * fix bug * fix for doctree and content in nn and * fix * fix * fix * add some * fix for oneflow.rst * update oneflow oneflow.nn * update tensor * update tensor module * update * test * update * update * fix for undone desc * docs: oneflow.utils.data (#8485) * feat(utils.data): add oneflow.utils.data * docs(dataloader): change the docstring of DataLoader * docs(tensor): add methods to oneflow.Tensor document * docs(optim): change docstring of optimizer and add a note to the doucument * nn.graph * fix for graph * fix bug * review nn and linalg document (#8515) * docs(nn): add contents to oneflow.nn document * docs(linalg): refactor oneflow.linalg document * change attributes.rst and review nn.functional.rst (#8514) * change attributes.rst and review nn.functional.rst * reconstruction oneflow.cuda * fix cuda and rebuild comm demo (#8582) * update image * add distributed * oneembedding & refine graph * update for sdisributed one_embedding * fix rnn.py (#8616) * 重构 oneflow.nn.init 文档 (#8622) docs(nn.init): refactore nn.init document * docs(nn.init): remove the comments * docs(utils.data): remove the comments * update and fix bug * docs(review): refine the documents (#8646) * docs(review): refine oneflow, nn, Tensor, nn.init, linalg, utils.data, optim modules * docs(optim): modify the code examples * docs(tensor): edit note * 重构 oneflow.autograd 文档 (#8594) * docs(autograd): refactor oneflow.autograd * docs(autograd): edit "Default gradient layouts". * docs(autograd): reedit "Default gradient layouts" * docs(autograd): add comment * docs(autograd): add reference * update * docs(tensor): change autoclass to autosummary * update * update * add oneflow.linalg.diagonal (#8653) * docs(linalg): add oneflow.linalg.diagonal * update enviorment variable * Update docs/source/distributed.rst Co-authored-by: Houjiang Chen <chenhoujiangcug@gmail.com> * Update docs/source/distributed.rst Co-authored-by: Houjiang Chen <chenhoujiangcug@gmail.com> * update enviorment variable * update for ev & distributed * update distribued * update ev * update distribute desc * Update docs/source/distributed.rst Co-authored-by: Houjiang Chen <chenhoujiangcug@gmail.com> * update * 修改 docstring 描述 (#8656) * docs: move pytorch refernce to end * docs: add some docstring * docs(refs): add refs * Update docs/source/distributed.rst * updte for distributed details and environment_variable * docs(docstring): Modify all reference links to version 1.10 (#8663) * fix bug * fix bug * fix all warning Co-authored-by: Guoliang Cheng <1876953310@qq.com> Co-authored-by: liu xuan <85344642+laoliu97@users.noreply.github.com> Co-authored-by: Guoliang Cheng <lmyybh_lazy@163.com> Co-authored-by: laoliu97 <841637247@qq.com> Co-authored-by: Yao Chi <later@usopp.net> Co-authored-by: Houjiang Chen <chenhoujiangcug@gmail.com> * Fix zeros like and ones_like api (#8632) * fix zeros_like and ones_like bug * refine * revert * refine * fix tensor_slice_view infer physic_shape bug * add test * refine * auto format by CI * fix bug * refine * auto format by CI * fix import error * fix bug Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Fix sbp print bug (#8689) * Add a normal priority with no transfer but different sbp * Fix the bug for printing no boxing edge * Do not use P for weights * auto format by CI Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * eager_local_interpreter_with_infer_cache (#8619) * ThreadLocalGuard * refactor EagerBlobObjectList * op_args_reserved_size * remove useless comments * rename one::EagerBlobObjectList* to vm::EagerBlobObject* * refactor signature of InstructionsBuiler::Call * PhysicalRun * refactor InstructionsBuilder::Call * remove unused StatefulOpKernel::need_check_mem_case * remove EagerLocalTensorImpl::is_shape_synced_ * eager_local_interpreter_with_infer_cache * remove useless code * reslove comments * refactor TensorMeta::TensorMeta(const TensorMeta) * use small vector * add kMaxNumDims * fix error include * fix split Symbol LocalTensorMeta error * refactor SoftSync * move SmallVector from common/container_util.h to framework/instructions_builder.cpp * mone ONEFLOW_EAGER_ENABLE_LOCAL_INFER_CACHE to eager.h * add blank line * reslove comments * minor fix * refine * explicit scalar initialization * fix static check error * auto format by CI * of_format * reslove comment * refine * refine * refine Co-authored-by: lixinqi <lixinqi0703106@163.com> Co-authored-by: Li Xinqi <lixinqi2010@gmail.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * fix gelu nn.Module bug and support tanh mode. (#8693) * add gelu2 api * refine test * refine docs * refine * restuct * delete useless headfile * format * rm doc of tensor.gelu (#8696) Co-authored-by: Shanshan Zhong <62104945+zhongshsh@users.noreply.github.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Fix bug in CrossFeatureInteraction LazyBackward (#8677) fix bug Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * fix floating-point scalar tensor in arange (#8673) Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Add nn functional fold (#8667) * add fold * update fold.py * add test * fix doc * fix comment Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * modify some file and improve the error message (#8566) * modify some file and improve the error message * modify the content * modify the content * auto format by CI * Update roi_align_op.cpp * Update roi_align_op.cpp * Update reshape_user_op_util.cpp * auto format by CI * Update roi_align_op.cpp Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * [OneEmbedding] add id_shuffle_copy_out (#8683) add id_shuffle_copy_out Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * fix add_param_group step key not match error (#8698) * fix add_param_group step key not match error * auto format by CI Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * add env ONEFLOW_EP_CUDA_DEVICE_FLAGS and ONEFLOW_EP_CUDA_STREAM_FLAGS (#8703) Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * fix for docsv0.8 (#8710) * fix repeat op 0-size releated bug (both in FW and AD) (#8707) * fix repeat op 0-size releated bug (both in FW and AD) * refine * refine static check * refine * fix commnet * fix comment * refine * fix test * auto format by CI * auto format by CI Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Support Dropout Scale in FusedMLPGrad[OneEmbedding] (#8633) * support alpha list * Remove redundant modify * remove redundant alpha set * refine Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Fix bug of Tensor.type (#8697) * fix bug of tensor.type(flow.Tensor) * fix bug of tensor.type(flow.Tensor) about device * Fix tensor type doc (#8699) fix doc of tensor.type * add test for tensor.type(flow.Tensor) * move PyTensorMetaCls_CheckExact to header file Co-authored-by: Shanshan Zhong <62104945+zhongshsh@users.noreply.github.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * ONEFLOW_GRAPH_PLACE_TRAINING_STATE_ON_ALL_RANKS (#8706) * ONEFLOW_GRAPH_PLACE_TRAINING_STATE_ON_ALL_RANKS * auto format by CI Co-authored-by: liujuncheng <liujuncheng1022@gmail.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * define_mut_output_shape_and_mut_output_stride_in_infer_ctx (#8709) * define_mut_output_shape_and_mut_output_stride_in_infer_ctx * fix merge master error * fix typo Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Add qat conv modules (#8368) * add qat conv modules * add quantization related modules to doc * refine qatconv modules doc * add qat conv module tests * refine * refine Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * add unsqueeze_multiple_op (#8714) * add unsqueeze_multiple_op * modify the format * Update functional_api.yaml Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * modify broadcast_like_op.cpp and add test (#8720) * modify broadcast_like_op.cpp and add test * modify broadcast_like_op.cpp * Update broadcast_like_op.cpp Co-authored-by: Yinggang Wang <wyg19970408@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * JIT LR (#8500) * add example code * Update cosine_annealing_lr.py * enable self params transformer * enable pass ast to c++ api * enable jit backend for lr * enable jit global register and invoke * convert Global to Singleton for new merge * enable pybind11 walk on python ast * enable test all existent get_lr of oneflow in python * enable py_ast_wrapper pass ast from python to mlir * switch all ast to ast-wrapper in mlir scope * define python ast partially * partial python ast definition * trim asdl of python ast * mlir gen * add symbol table * from ast to jit done * switch llvm::errs() to mlir::emitError and convert switch to typeSwitch * trim duplicate namespace use * fix LIT header * add some docs * enable compare with or_else, if with return seamless in branch and mutable variable * trim code and refine struct * register pybind11 ast node for shared_ptr * enable cpp class in python * go through python to mlir to llvm to jit to run * add addf subf op * work well on stepLR linearLR exponentialLR coseineDecayLR cosineAnnealingLR constantLR * enable maxf minf conversion to llvm ir * rename LR_JIT to LRJITRegister * remove LR_JIT_Engine and swith Invoke to std::function ret by lookup * refine struct * enable bisect_right and python resigter api have dump option arg * add bisect_left and bisect_transformer specially, delete former test python script * remove c++17 standard * restore double hash to iterator * publish * publish * publish * use llvm classof and typeswitch rightly * trim * commit * commit * commit * commit * commit * commit * auto format by CI * Update ir.cpp * Update OneFlowLRJITRegistry.h * auto format by CI * Update AstMlirGen.h * Update lr_jit.cpp * auto format by CI * Naming conventions * auto format by CI * auto format by CI * deploy _ behind Co-authored-by: leaves-zwx <kunta0932@gmail.com> Co-authored-by: yuhao <1171760467@qq.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: yuhao <72971170+howin98@users.noreply.github.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Add logspace (#8599) * add logspace * add global test * restore rand * fix doc * rename consistent to global * adjust import order * add todo * Add hann_window (#8615) * add hann_window * rm useless include * add check * adjust import order * add ONEFLOW_VM_PENDING_HANDLE_WINDOW_SIZE (#8730) * add ONEFLOW_VM_PENDING_HANDLE_WINDOW_SIZE * add environment to vm.h Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Fix as strided bool type and view bug (#8713) * fix as_stride bug * refine * refine * refine * delete useless head file * refine Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Add functional binary cross entropy (#8708) * add gelu2 api * refine test * refine docs * refine * restuct * delete useless headfile * format * rm doc of tensor.gelu * add functional binary cross entropy Co-authored-by: BBuf <1182563586@qq.com> Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * support map_location in flow.load (#8666) * support map_location in flow.load Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI * fix tests Signed-off-by: daquexian <daquexian566@gmail.com> * fix bug when map_location is None Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * Add addcdiv (#8581) * add addcdiv * fix tensor_functions * fix inplace * add test number * rename consistent to global * Inner most dim case for cumsum cumprod op (#8403) * cumsum use cub scansum in some case * prod use cub scan * refine name * refine * optimize cum op * format * fix * get device properties by cuda stream class * revert useless code * refine * outer dim use parallel sweep algo * refine * fix a fraction of threads hit __syncthreads * revert * refine kernel define * refine * refine * refine * refine * move comment * fix * fix * refine Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Define mut output dtype and mut output is dynamic in infer ctx (#8716) * define_mut_output_shape_and_mut_output_stride_in_infer_ctx * fix merge master error * fix typo * define_mut_output_dtype_and_mut_output_is_dynamic_in_infer_ctx * replce const DataType& with DataType * replace const DataType& with DataType ret * split TensorDesc4ArgNameAndIndex and MutTensorDesc4ArgNameAndIndex * refine * minor fix * refine * fix static check error * Update op_expr.cpp * Update op_expr.cpp * Update stateful_opkernel.cpp * refine * fix static check error * refine * refine Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Dev refactor fuse instruction policy (#8624) * ThreadLocalGuard * vm::InstructionPolicy * refactor fuse instruction policy * fix compile error (#8623) * fix compile error * change MirroredObject to Dependence * Modify DependenceVector * add instruction policy util * add instruction policy util * remove include * add include * rm fuse instruction type * Modifying variable properties * add stream_sequential_dependence_ to instruction_policy Co-authored-by: lixinqi <lixinqi0703106@163.com> Co-authored-by: Li Xinqi <lixinqi2010@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * fix bug of batchnorm num_batches_tracked global error when loading state_dict (#8723) add condition for assign num_batches_tracked Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * add launch master port limit (#8563) * add launch master port limit * Update python/oneflow/distributed/launch.py Co-authored-by: daquexian <daquexian566@gmail.com> Co-authored-by: daquexian <daquexian566@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Fix docs import distance (#8691) * fix import distance * add functional apis * add smooth_l1_loss docs * refine activation.py * add deleted api * review * 添加oneflow, nn 等模块文档中遗漏的接口 (#8704) * docs: add api * docs(nn): refactor nn * review Co-authored-by: Guoliang Cheng <lmyybh_lazy@163.com> Co-authored-by: ChenQiaoling <48576019+Chenqll@users.noreply.github.com> * refactor control stream type (#8647) * refactor control stream type * auto format by CI * Add method implementation * refine * refien Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: Li Xinqi <lixinqi2010@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Define mut output tensor desc (#8717) * define_mut_output_shape_and_mut_output_stride_in_infer_ctx * fix merge master error * fix typo * define_mut_output_dtype_and_mut_output_is_dynamic_in_infer_ctx * define_mut_output_dtype_and_mut_output_tensor_desc * replce const DataType& with DataType * replace const DataType& with DataType ret * split TensorDesc4ArgNameAndIndex and MutTensorDesc4ArgNameAndIndex * refine * minor fix * fix merge error * fix warning error * refine * fix static check error * Update op_expr.cpp * Update op_expr.cpp * Update stateful_opkernel.cpp * refine * fix static check error * refine * refine Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Symbolic local tensor meta (#8662) * ThreadLocalGuard * refactor EagerBlobObjectList * op_args_reserved_size * remove useless comments * rename one::EagerBlobObjectList* to vm::EagerBlobObject* * refactor signature of InstructionsBuiler::Call * PhysicalRun * refactor InstructionsBuilder::Call * remove unused StatefulOpKernel::need_check_mem_case * remove EagerLocalTensorImpl::is_shape_synced_ * eager_local_interpreter_with_infer_cache * remove useless code * reslove comments * refactor TensorMeta::TensorMeta(const TensorMeta) * use small vector * Symbolic LocalTensorMeta * check shape in critical_sectio * add kMaxNumDims * fix error include * fix split Symbol LocalTensorMeta error * fix split cache and symbolic local tensor meta error * refactor SoftSync * move SmallVector from common/container_util.h to framework/instructions_builder.cpp * mone ONEFLOW_EAGER_ENABLE_LOCAL_INFER_CACHE to eager.h * add blank line * reslove comments * minor fix * refine * explicit scalar initialization * fix static check error * auto format by CI * of_format * reslove comment * refine * refine * refine * fix error * define MutOutputShape and MutOutputStride in InferContext * define_mut_output_shape_and_mut_output_stride_in_infer_ctx * fix merge master error * fix typo * fix static check error * define_mut_output_dtype_and_mut_output_is_dynamic_in_infer_ctx * define_mut_output_dtype_and_mut_output_tensor_desc * replce const DataType& with DataType * split const and mut func in LocalTensorMeta * replace const DataType& with DataType ret * split TensorDesc4ArgNameAndIndex and MutTensorDesc4ArgNameAndIndex * refine * minor fix * fix merge error * fix warning error * refine * fix static check error * Update op_expr.cpp * Update op_expr.cpp * split MutTensorMeta and MutLocalTensorMeta * Update stateful_opkernel.cpp * refine * fix static check error * refine * refine * reslove comment * refine * fix typo Co-authored-by: Houjiang Chen <chenhoujiangcug@gmail.com> * fxi typo * use OpArgsVector Co-authored-by: lixinqi <lixinqi0703106@163.com> Co-authored-by: Li Xinqi <lixinqi2010@gmail.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: Houjiang Chen <chenhoujiangcug@gmail.com> * Feat general basic communication (#8437) * Add a slight cost for B->S and B->P in 2d sbp * Add penalty for P in consumer * Fix a slight bug * Add at most 1 middle node for general basic communication * Add the cost for general basic communication * Add the slight penalty for eager * Skip initialization of boxing collector if not needed * Fix a bug * Dev nd nccl send recv boxing (#8467) * nd nccl_send_recv_boxing * rm print * support num_axes > 2 * Add distributed optional run (#8372) * Add * change deps * add install * add skip * autoprof supports bandwidth (#8367) * autoprof supports bandwidth Signed-off-by: daquexian <daquexian566@gmail.com> * print bandwidth Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * remove tmp buffer of cumprod cpu backward kernel (#8369) * remove tmp buffer of cumprod cpu backward kernel * refine * refine Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Move tensor api to cpython part3 (#8342) * add tensor_functions * concat py methods * add hash, restore tensor.py * check replacement * refine code, remove commented tensor.py * refine code * move some api * add cpu and cuda api * add triu tril norm and etc. * remove tensor_functions.h * move more api * move more api, refine size * fix typo * format code, remove useless include * refine code * refine code, fix typo * align .cuda to python * refine code * split some api to part3 for review * remove positional only arguments of argmax and argmin * remove arguments parse * modify arguments name in matmul and floor_divide * rename BINARY_FUNC to DIRECT_PASS_FUNC, modify some functions * refine code, format code * add inplace /=, add comments * remove name in macros * remove python api * remove redundant include * remove cout * format code * refactor tensor.size by directly call shape.at, refactor tensor.sub_ by calling nb_sub_ * remove redundant code * auto format by CI * fix typo, fix wrong call * modify idx datatype from int32 to int64 in tensor.size * add some DIRECT_PASS_FUNC * add cpu cuda var pow and etc. * add masked_fill any all * make REDUCE_FUNC macro, add reduce_* functions * add 0dim check in ReduceSumWhole, refine yaml * fix bug * restore add add_ sub sub_ * add unittest for tensor.half tensor.add tensor.add_ * refine code * refine code * fix typo * fix bug of tensor.std() * refactor var std and cuda, using c++ functional api * add beta and threshold in softplus * auto format by CI Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Add nn_functor Check (#7910) * add bias_add_check * add bias_add error test * fix conv2d nhwc bias_add error * add nhwc conv test * add bias_add_error test * Add bias add error check * Rename * add batch matmul error check * add matmul check error msg * remove annotation * add fused mlp error msg check * Add pixel shuffle check test * add more test until normalization add relu functor * refine error message * finish all nnfunctor check msg * handle type error * remove useless symbol * modify back to TypeError * fix all comment * Remove redundant code * Remove pad ndim check * fix bias add space * fix check logic cause ci gpu not always gpu:0 Co-authored-by: hjchen2 <chenhoujiangcug@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Add FusedMatmulBiasAddReluDropout [OneEmbedding] (#8222) * previous version for fused_matmul_bias_add_relu_dropout * add op infer * fix detail * finish forward * support dropout rate list * add forward test * fix bug for output buffer * Configurable alpha params * try to add bit mask logic * Add bitmask first version! * Add row col bitmask logic * support not align4 reludropout * simplify relu dropout ld logic * Add naive relu dropout grad kernel * add simple relu dropout grad kernel * Rename * support relu_dropout bitmask backward * add vectorized optimization * fix tmp buffer * add to amp list * add lazy backward logic * Refine kernel * add indextype dispatch * simplify functor logic * fix cublas fused mlp aux_ld shape bug * Add more relu dropout kernel * add full unittest * fix bug in skip final activation * refine * Remove dump func * fix format * Remove cmake * remove redundant divide * add padded version * fix dropout * oneflow curand * refine * remove redundant kernel * add unroll logic * add unroll and ballot sync * refine format * Remove fast curand * Refine python interface * Add if branch for memset * fix python logic * just for debug * not use matmul bias add grad * add launch 1 block limit * fix unittest * Refine * fix graph backward bug * limit to 11060 * change to use int32_t dtype for cublas aux * Fix jc comment * fix comment * fix convert * fix static_analysis * fix at * fix userops td * fix userops td * fix const ref * fix compile error for bfloat16 * limit to 11060 * fix bug Co-authored-by: Juncheng <liujuncheng1022@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * fix gather 0-dim tensor bug (#8376) * fix 0-dim tensor bug * refine * support input 0-dim tensor for gather * refine * refine * refine dim_scatter_kernel check * refine * refine check * fix clang_tidy error Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * add api to apply external job pass (#8370) * Add condition to find-test-cache-distributed (#8387) * add condition to find-test-cache-distributed * fix * warp dim util (#8382) * warp dim util * format * use more maybe_wrap_dim * refine array functor * add more * refine math_functor * fix_bug_in_broadcast_min_max_grad_and_broadcast_like (#8379) * fix_bug_in_broadcast_min_max_grad_and_broadcast_like * refine * fix static check error * fix bug about index (#8388) * fix bug about index * add test case Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * LogicalSliceAssign support full slice sbp (#8344) * feat(SliceOp): slice ops support 2d sbp * fix(SliceOp): fix [B, P] 2d sbp bug * refine error message * fix bug in parallel_num == 1 * add comment * add warning and format * add NOLINT for boxing check * feat(LogicalSliceOps): support all nd_sbp * feat(LogicalSlice): support nd_sbp * add error message * fix(AutoTest): fix auto_test bug in module.parameter pass * auto format by CI * fix(LogicalSliceAssign): skip test when 1n1d * fix SliceParams memset error * remove memset * add CHECK_JUST * fix(*): make sure split_axis >= 0 or equal to SPLIT_AXIS_FOR_NON_SPLIT * remove memset * fix spilit_info.axis bug * feat(LogicalSliceOps): support grad * add logical_slice gradient_funcs * feat(LogicalSliceAssign): LogicalSliceAssign support full slice sbp * auto format by CI * test(LogicalSlice): fix logical_slice dims Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: Houjiang Chen <chenhoujiangcug@gmail.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * fix_tensor_from_numpy_mem_leak_bug (#8391) * fix_tensor_from_numpy_mem_leak_bug * add note * refine note * refine Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Make of_pyext_obj static only to make sure only a python ext so has python symbols (#8393) * make of_pyext_obj static only * refine note Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Adjust tolerance setting in embedding_renorm unit test (#8394) * support front end compile for job to iree (#8249) * support frontend dev version * polish name * add tosa-to-elf.mlir * tosa to elf by llvm * conv2d partial * an enhanced frontend runner * support numpy as input * enable multiple using nn graph with different input(jobname make it it cd /home/yuhao/frontend/oneflow ; /usr/bin/env /usr/bin/python3 /home/yuhao/.vscode-server/extensions/ms-python.python-2022.6.2/pythonFiles/lib/python/debugpy/launcher 40873 -- /home/yuhao/frontend/oneflow/oneflow/ir/test/Frontend/runner.py ) * enable multiple input * enable cpu and cuda * change full_name to _full_name * support exchange cuda with cpu seamlessly * remove pip * lit config * polish * trim * auto format by CI * modify * auto format by CI * last line polish * use unittest * auto format by CI * use allclose * auto format by CI * pulish * optimize convert oneflow to tosa * conv2d * conv2d enhanced && conv2d examples add * add road map * add add_n2Op and boardcast_addOp conversion * add matmulOp conversion * support converting normailzation op to tosa(partically) * update roadmap * support i64 tensor to dense elem attr * support 100% resnet op conversion * add test mlir * add test iree resnet python script * auto format by CI * done * enhance iree resnet test script * auto format by CI * rebuild code * auto format by CI * rebuild test script * update * auto format by CI * pub * trim test scripts * move * move * input and output add block arg judgement * emit error in variable conversion * error handle for ci * modify err info * auto format by CI * merge * auto format by CI * output not block * flow ones * rm const * trim maybe * trim maybe with header file * const auto * solve clangd error Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Feat/zero mix with mp (#8036) * add zero limit * add debug * add mix zero test * refactor zero api * zero test with mp * add 2d test * add zero nd * add nd zero * add sbp cast * test passed soft limit consumer * refine size api * zero use stage 2 * add limit consumer api * add new api * refine zero s select * fix index out of range * rm zero limit on device type * zero test with activation checkpointing * add indentity when dp sequence len is 1 * move to base with master * fix * fix * fix * add test * debug bad case * refine test for eager and graph boxing * test case ready * simplify * refine test * fix buff size * fix conflict * refine zero nd * refine * add full test * revert change * refine split check * fix typo * rm log * spit long func * restore test * Update optimizer_placement_optimization_pass.cpp * auto format by CI * auto format by CI * fix static check * add tips for zero api change * auto format by CI Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Revert embedding normal path and fix amp list (#8374) * revert embedding normal path, fix amp list * fix amp * fix memset bug in gather cpu kernel Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * replace fixed_vector with small_vector and make Shape inherit from it (#8365) * Replace fixed_vector with llvm::SmallVector Signed-off-by: daquexian <daquexian566@gmail.com> * Shape inherited from llvm::SmallVector Signed-off-by: daquexian <daquexian566@gmail.com> * refine cmake Signed-off-by: daquexian <daquexian566@gmail.com> * rename fixed_vector to small_vector Signed-off-by: daquexian <daquexian566@gmail.com> * fix reviews Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI * update Shape constructor Signed-off-by: daquexian <daquexian566@gmail.com> * add 'PUBLIC' keyword to all target_link_libraries Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI * update cmake Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI * update cmake Signed-off-by: daquexian <daquexian566@gmail.com> * update cmake Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI * set is_initialized_ default to true Signed-off-by: daquexian <daquexian566@gmail.com> * override some methods to set is_initialized_ Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * Light plan for debug (#8396) * Light plan for debug * fix note * disable terminfo to fix missing terminfo symbols (#8400) * disable terminfo to fix missing terminfo symbols Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * fix bug of ZeRO MP in complex case (#8404) * Remove redundant output_lbns in ir (#8409) * mv case * remove redundant info * Dev FusedCrossInteraction[OneEmbedding] (#8335) * add simple fused cross interaction forward * add packed fused * Add cross interaction grad * simplify code * fix bug * support crossnet v2 * support cross interaction v2 * add lazy backward * Rename and add test * fix jc comment * fix comment * fix bug * fix userops td elem_cnt for FUSED Group * fix header file * fix clang static analysis * fix unittest Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * add exe graph physical shape check msg (#8002) * fix index select op in graph * add exe graph physical shape check msg * improve the debug information for the python stack trace 1. add a parameter 'max_stack_depth' to specify the max depth for the stack trace 2. refactor other debug related classes. * remove parens * update * resolve PR comments * update * update graph debug test file. * restore self._debug in class Graph and class ModuleBlock * Do not shorten the stack frame string if it is in debug mode * delete TODOs * disable conv3d test (#7969) Signed-off-by: daquexian <daquexian566@gmail.com> * skip layernorm random_data_warp test (#7941) * skip layernorm random_data_warp test * warp/block/uncached case only test gpu Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Lock click version (#7967) Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * add global avgpool unittest (#7585) * fix (#7978) Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Support negative dim in scatter op (#7934) * support negative dim in scatter op * refine scatter test * refine scatter test again Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * run barrier callback in BarrierPhyInstrOperand::~BarrierPhyInstrOperand (#7702) * run barrier callback in BarrierPhyInstrOperand::~BarrierPhyInstrOperand * lock gil in vm Callback thread * more comments for VirtualMachineEngine::Callback() * the Env is never destroyed. * export Env into python * more unittests * wait shared_ptr.use_count() == 0 * export unittest.TestCase in framework/unittest.py * SwitchToShuttingDownPhase * optional is_normal_exit * VirtualMachine::CloseVMThreads * Delete env_api.h env_api.h is deleted by master * reshape_only_one_dim_infered * address pr comments * fix a ref-cnt bug in TryRunBarrierInstruction. * rollback flow.env.all_device_placement * no distributed running test_shutting_down.py * auto format by CI * expand lifetime of module oneflow in test_shutting_down.py * refine del depend on of * capture oneflow._oneflow_internal.eager when calling sync in __del__ * add try in flaky test Co-authored-by: Luyang <flowingsun007@163.com> Co-authored-by: chengtbf <472491134@qq.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: Xiaoyu Xu <xiaoyulink@gmail.com> * Fix one hot scalar tensor bug (#7975) * fix reduce_sum scalar check bug * fix one_hot scalar tensor bug * fix clang tidy error Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * support ctor np array from of tensor (#7970) * support ctor np array from of tensor * add test case constructing np array from tensor * refine Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * add_manual_seed_all_api (#7957) * add_manual_seed_all_api * Update conf.py * refine * add test case * auto format by CI * Update random_generator.cpp * auto format by CI Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * one_embedding add doc string (#7902) * add doc string * add example * add * fix doc * refine * address review * mb to MB * add make_table_option * option to options * refine * add forward Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Support numpy scalar parameters (#7935) * feat(functional): support numpy scalar parameters * rename inferface * feat(*): TensorIndex support numpy scalar * feat(TensorIndex): support advance indexing * add unittest and int32 support for branch feat-param_support_np_scalar (#7939) * add unittest * refactor unittest * add todo for int16 advanced indexing * add int32 supporting for advance indexing * auto format by CI Co-authored-by: Wang Yi <53533850+marigoold@users.noreply.github.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * fix tensor_scatter_nd_update (#7953) * fix tensor_scatter_nd_update * auto backward Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * fix one_embedding adam (#7974) * fix one_embedding adam * fix tidy * fix normal Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * speed test with score (#7990) Signed-off-by: daquexian <daquexian566@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Feat/graph del by ref (#7857) * remove IsMultiClient() and single client logic Signed-off-by: daquexian <daquexian566@gmail.com> * rename eager.multi_client to eager Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI * add py ref * refine new session * clean code * make scope api inner use * use session with ref cnt * run barrier callback in BarrierPhyInstrOperand::~BarrierPhyInstrOperand * test pass * lock gil in vm Callback thread * more comments for VirtualMachineEngine::Callback() * merge * merge rm single client * rm initenv * merge and fix master * refactor env c api * add debug code * fix and serving test pass * test passed * rm useless * rm useless code * format * rm useless include * rm sync in py * the Env is never destroyed. * export Env into python * more unittests * fix and pass tests * revert virtual_machine.cpp * revert core/vm * remove outdated python class oneflow.unittest.TestCase * graph test passed * wait shared_ptr.use_count() == 0 * export unittest.TestCase in framework/unittest.py * SwitchToShuttingDownPhase * optional is_normal_exit * VirtualMachine::CloseVMThreads * Delete env_api.h env_api.h is deleted by master * address pr comments * rm is env init * Clear empty thread when graph destroy (#7633) * Revert "Clear empty thread when graph destroy (#7633)" (#7860) This reverts commit 3e8585e5fa20b97229d6b0be46a7ff814dc8cd83. * fix a ref-cnt bug in TryRunBarrierInstruction. * rm env_api * fix clang-tidy error * fix clang-tidy in env_imp * refine env api * format * refine graph del and sync at shuttingdown * fix typo * add comment * rm useless * rm useless Co-authored-by: daquexian <daquexian566@gmail.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: lixinqi <lixinqi0703106@163.com> Co-authored-by: Li Xinqi <lixinqi2010@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: Luyang <flowingsun007@163.com> Co-authored-by: cheng cheng <472491134@qq.com> * [PersistentTable] Fix num blocks (#7986) Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Add auto benchmark for flowvision (#7806) * update yml * update workflow * add resnet50 * [PersistentTable] Async write (#7946) * [PersistentTable] Async write * fix Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * save log in separate dir by default (#7825) Signed-off-by: daquexian <daquexian566@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * fix index select op in graph * add exe graph physical shape check msg * improve the debug inform…

* edit tanh to a closure op (#5) Co-authored-by: yoonlee888 <qiuyunlei@zhejianglab.com> * Dev sin loop grad (#7) * edit tanh to a closure op * add grad-looped sin_cos_negative * add test case Co-authored-by: yoonlee888 <qiuyunlei@zhejianglab.com> Co-authored-by: Zhenhua <1209435+hengzi@users.noreply.github.com> * add log_grad_grad (#12) * Add exp_grad_grad (#11) * Revert "Dev sin loop grad (#7)" (#13) This reverts commit c256a5a326d7e04c2ad4af802318661d18f72441. * fix bugs (#16) * fix ScalarSub param * Add test case * code format * fix * add higher order derivative Interface draft (#6) * add higher order derivative Interface draft * solve bugs of no Tensor.is_sparse attrs * rm some Interface comments * fix & format Co-authored-by: Zhenhua <1209435+hengzi@users.noreply.github.com> Co-authored-by: Huang Zhenhua <huangzhenhua@zhejianglab.com> * add Higher derivative vjp (#9) * add Higher derivative vjp * add autotest code * add autograd.functional.vhp and motified functional * Merge Testcase * Rm chinese chars Co-authored-by: Zhenhua <1209435+hengzi@users.noreply.github.com> Co-authored-by: Huang Zhenhua <huangzhenhua@zhejianglab.com> * merge Master into zj/develop (#21) * Multi Tensor apply Optimizer (#8373) * Add optim_cast and modify sgd * Remove * try to add fuseUpdatecast pass logic * use pass * still have bug in inplace * ban inplace and fix sgd update * fix regst num * add env var * remove cuda graph wrong use * add support for graph * initialize * add functional impl * add simple job rewrite * delete redundant sgd update kernel * support half * add kernel * use single loop kernel * refine * when in eval mode, we turn off multi tensor update * refine format * use juncheng kernel * Refine * group multi tensor op by some attr * add parallel conf to key * refine * Add unroll logic * fix bug * restruct * use pointer list * add adam kernel * support multi tensor adam update * Remove cpu * support skip if and scale by tensor * support sgd adam unittest * add more check * Remove config * Restruct tensorparams * support fused cast in multi tensor update * support cast in multi tensor * fix bug in model update cast pass * fix multi tensor sgd update with cast Pass check logic * refine * support multi tensor adam update with cast * refine format * Remove redundant template args * merge modify for fused cast * only allow fused cast in train mode * only support data parallel in multi tensor update * rewrite fuse update cast pass logic * remove redundant if * fix format * add new line * rename * Remove print * rename and add LOG * Add more type and test * still have bug in multi tensor adam * Fix multi tensor adam update bug * add multi tensor adam update with cast test * simplify code * fix format * Add model diff datatype in optimizer key * remove random seed * fix comment * fix comment * fix to use model copy * use for loop * Fix comment * use hashcombine * fix clang analysis error * add with cuda macro * fix env var in unittest * remove redundant unittest Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Fix doc and ops template auto gen (#8546) * fix doc and add op calculator * fix bug * fix gen_ops * fix diag 0size tensr shape infer bug (#8557) * fix diag 0size tensr shape infer bug * refine * refine * auto format by CI * auto format by CI Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Format tensor on cpu (#8548) * Format tensor on cpu * use tensor.detach * Remove useless WITH_CUDAs (#8562) * unique identity (#8509) * unique identity * fix * add identit name * rm debug log * mv identity form class to graph * auto format by CI * fix unique iden with having multiple stage * auto format by CI * Update block.py Co-authored-by: cheng cheng <472491134@qq.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Add GenericStreamContext (#8560) * Modify some file and add test (#8556) * Modify some file and add test * modify the content * modify the format and test function name * modify the format and aligned with pytorch * delete print * modity the function name * auto format by CI Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * Move some op into amp gray list (#8545) enlarge gray list Co-authored-by: cheng cheng <472491134@qq.com> * Refine inplace expand runtime_error (#8561) * Refine inplace expand runtime_error * Opt * Refine * Add Note * OneEmbedding use malloc async (#8543) * in out ptrs * ops and test * test pass * prefetch tmp buffer * embedding shuffle tmp buffer * gradient shuffle * tmp buffer size * mem pool * cuda 11.2 * add id_shuffle to setNumunique in update tests * default not use dynamic alloc * fix of_tidy * add fused op * address review * init tmp_buffer * mv memset * fix * one_embedding fused_lookup_init_cast and fused_update_put (#8564) * add fused op * mv memset * fix * address review * rm fullcache n_missing check Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * fix cpu aligned_alloc size (#8569) Signed-off-by: daquexian <daquexian566@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Add flow norm (#8535) * add flow norm * rm import * rm doctest.testmod * fix pad_packed_sequence method input requires_grad==True (#8574) * fix pad_packed_sequence method input requires_grad==True * fix append error when batch_first=True Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * fix embedding manager tmp buffer (#8585) * fix embedding manager * format * fix reduce_ops 0size bug (#8551) * fix reduce_ops 0size bug * fix commnet * auto format by CI * fix bug Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Align Momentum Optimizer (#8549) * fix moemntum update * align momentum * fix bug and finish eager unittest * Support Graph optimizer * fix momentum bug * refine beta Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Fill GetSbp bug and consistent test bug (#8576) fix(FillOp): fill GetSbp bug and consistent test bug Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Dev Fully fused MLP Grad[OneEmbedding] (#8462) * support fully fused mlp grad in eager * support lazy backward * fix output size * add fallback to tmp_buf logic when ones buffer is not enough * build sbp * overlap allreduce * fix overlap order * fix format * CUDA Graphs delayed capture * Add ifcomm create for graph * insert weight event roughly * fix dbias allreduce error * simplify code * Add 11060 limit * Remove print * Rename * fix fill bug and remove comm to cache * Rename variable and add debug code for cache * Use kernel state and fix bug * remove print * fix allreduce dbias bug * fix header file * fix comment * remove redundant headerfile * fix userops build error * refine * init nccl comm before execute kernel * fix comment Co-authored-by: liujuncheng <liujuncheng1022@gmail.com> * rename mirrored to local (#8503) * rename mirrored to local * rename files * rename files * auto format by CI * revert change of package_mirror.py * rename LocalObject to Dependence * rename fn LocalObject to Dependence * merge master * handle clang check * fix * refine * rename local_object to dependence Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * Implement BroadcastElementwiseUnary primitive (#8384) * Add code skeleton for broadcast unary primitive * first try * finish impl * finish impl * format * fix build error * address review * refine * address review comments * use broadcast unary primitive in fill_tensor_ kernel * handle pack tail statically * fix * address review * address review * Fix SimplifyBroadcastDims * fix * revert fill_kernel Co-authored-by: Juncheng <liujuncheng1022@gmail.com> * skip cpu autotest for graph global (#8593) * TODO * skip cpu autotest for graph global * Refine Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Add function_library.h Exception (#8241) * add RuntimeError for checking * add RuntimeError to CHECK_EQ * auto format by CI Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * Refactor shrink (#8573) * caching allocator * auto format by CI * Update ep_device_context.h * EpDeviceCtx with CachingAllocator * rm RawAllocator typename * auto format by CI * specific allo in EpDeviceCtx * auto format by CI * rm outdated alloc * simplify thread safe guard * auto format by CI * avoid return mutex * auto format by CI Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Speed up SliceKernel (#8589) * perf(SliceKernel): descrease number of cuda kernel and speed up * perf(SliceKernel): use old kernel when small tensor is all fullslice * use std::copy to copy contiguous memory * fix cpu kernel bug * Update readme and vsn for 0.8.0 (#8600) * update version * remove py3.6 * modify some file and improve error message (#8592) * modify some file and improve error message * modify scalar_by_tensor_op.cpp * Update scalar_by_tensor_op.cpp * Update slice_op.cpp * Update test_slice_op.py * Update test_slice_op.py * auto format by CI * auto format by CI Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * rename consistent to global (#8505) * rename consistent to global * rename consistent to global * rename files * rename files * refine * auto format by CI * refine * fix clang check * fix * fix * fix * rm to_consistent docs * auto format by CI * refine * fix * fix * revert changes * auto format by CI * revert changes * revert changes * rename * rename Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * add module releated container docs (#8580) * add module releated container docs * auto format by CI * fix comment * refine * refine Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * fix rnn util extra memory usage when requires_grad=False (#8603) * fix rnn util extra memory usage when requires_grad=False * add comments * refine comments Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * use bracket format slice in tensor str (#8489) Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Perf TensorInfo constructor (#8606) * perf(Autograd): perf TensorInfo constructor * rename consistent to global Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * print operators' python location when print nn_graph (#8558) 1. add a flag in nn.Graph.debug() named print_op_loc for printing operator location. 2. add a flag in nn.Graph.debug() named only_print_user_code_loc for only print users' code location * Add randint like (#8598) * add randnint_like op * add docs for random * refine * auto format by CI * add randint_like global test * refine doc * refine randint_like docs * fix bug Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Add full_like api (#8595) * add full_like_op api * refine * add test * refine * refine docs * refine * add consistent_full test * add full_like op * fix docs commnet * change scalar sbp return value from list to tuple * auto format by CI * merge conflict * revert Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * fix cumsum GenBackwardOpConfFn (#8604) * fix cumsum GenBackwardOpConfFn * add test case Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * revert change (#8613) * fix test graph optimization conf CI bug (#8617) * restore resource config after random tests * refine * refine * Release pod tensor (#8552) * ThreadLocalGuard * split ReleaseTensor into ReleasePodTensor and ReleaseNonPodTensor. * rename Co-authored-by: luyang <flowingsun007@163.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Add param group for optimizer (#8611) * add add_param_group interface for Optimize * add test for add_param_group * revert * fix comment * refine * auto format by CI Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * fix broadcast_elementwise_binary cpu (#8625) fix broadcast_elementwise_binary_cpu Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * align exception msg to torch (#8627) * align exception msg to torch * auto format by CI Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * skip unstable global test in ci, reduce failture rate (#8635) * fuse embedding interaction (#8586) * fuse embedding interaction * fix of_tidy * refine * fix * address review Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * fix flip gen backward opconf (#8605) * fix flip gen backward opconf * use new opconf api Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Add ONEFLOW_ONE_EMBEDDING_PERSISTENT_TABLE_SNAPSHOT_LOAD_MMAP_LOCKED (#8597) * Add ONEFLOW_ONE_EMBEDDING_PERSISTENT_TABLE_SNAPSHOT_LOAD_MMAP_LOCKED * refine * use MAP_POPULATE Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Profiling main thread (#8601) * ThreadLocalGuard * refactor EagerBlobObjectList * op_args_reserved_size * remove useless comments Co-authored-by: binbinHan <han_binbin@163.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Fully Memory Log V2 with more details (#8565) * Fully Memory Log V2 with more details * refine log and long op name * fix clang tidy * fix test Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: Xiaoyu Xu <xiaoyulink@gmail.com> * Stream policy (#8590) * ThreadLocalGuard * refactor signature of StreamType::InitDeviceCtx * refactor hint * add StreamPolicy * remove DeviceCtx args * refine OpCallInstructionUtil::Prepare & Compute * merge EpDeviceCtx and LazyJobDeviceCtx into StreamPolicy * minor fix * minor fix * del useless code * fix error * fix merge error * fix segment fault bug * fix complie error * del methods belong to Subclass * reslove comment Co-authored-by: binbinHan <han_binbin@163.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Add fully support for broadcast matmul (#6937) * fix arange bug * fully support broadcast matmul * add more check * remove check * add fully sbp * fix full sbp * Fix broadcast matmul grad * remove old broadcast matmul grad * add broadcast grad back and when B numaxes is 2, we use broadcast_gradB instead of matmul+reduce * add lazy backward * Add restrict when transpose_a is false we can use bmatmul_grad_b * revert * fix broadcast matmul backward * fix single client dispatch matmul logic * revert old bcast matmul grad b kernel * fix eager functional matmul backward * add more test case * remove redundant code * add more special case * when b num axes is 2, we only save tensor a * fix annotation * fix conflict and format * remove single client matmul code * Fix eval error * fix conflict * fix unittest * Add init value * support matrix vector matmul * add vector matrix product * Use matmul primitive to rewrite matrix vector product forward and backward * Add fullllllllly support for vector matrix product * Fix sbp * fix bug * add unittest * Add consistent test for broadcast matmul * Remove redundant code * fix userops annotation * fix * refine * Fix clang static analysis * fix clang analysis * set check graph as false * fix * fix for unittest * fix broadcast sbp bug * try to fix unittest * Fix consistent test * fix multiplier to 4 for unittest Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Revert "skip cpu autotest for graph global" (#8608) * Revert "skip cpu autotest for graph global (#8593)" This reverts commit b076be782fd8f21e50ee4915f2d1562f3a9ab4c0. * cherry pick from master Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * OneEmbedding add tmp_buffer allocator (#8588) * fix embedding manager * format * refine embedding_manager tmp_buffer allocator * fix * format * refine * refine * auto format by CI Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * refine error msg for some user ops (#8579) * refine error msg for some user ops * refine error msg for some user ops * optimize * optimize the writing * optimize the writing * optimize the writing * auto format by CI * optimize writing Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Add tril fill value (#8655) add tril fill value Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * fix_non_pod_data_allocate_bug (#8657) Co-authored-by: Li Xinqi <lixinqi2010@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Fix norm (#8629) * fix norm * add doc * add bool & * update math_functor.cpp * add note * fix_decorate_mem_leak_bug_in_eager_boxing (#8661) Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * add higher order derivative for leaky_relu and negative op (#8643) * add higher derivative for leakyrelu and negative * fix a typo * remove functor * add initialize alpha * fix incorrect dim size in global test * fix incorrect dim size in global test * optimize testcase Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * update oneflow intro to show the difference (#8669) * update oneflow intro * refine * refine * refine * refine * refine * refine * refine * refine * refine * refine oneflow intro * Stacked error (#8671) * ThreadLocalGuard * StackedError * StackedError Co-authored-by: Shenghang Tsai <jackalcooper@gmail.com> * Refactor tensor initializer (#8626) * fix(*): fix xavier_initializer * refactor(Initializer): refactor initializer * fix function name * auto format by CI * refine * fix interface in tensor.py * fix(trunc_normal_): fix init bug and add test * auto format by CI * fix bug * add oneflow.nn.init.normal_ test Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * Fix nn doc (#8650) * fix hsplit doc * add doc for module * fix dtype * fix formula * add ref * fix row length * Fix reduce max min bool dtype bug (#8651) * fix reduce_max_min_bool_dtype * fix bug * auto format by CI Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Remove redundant exception wrapper (#8631) * remove redundant ExceptionWrapper * refine KeyErrorMessage * refine * auto format by CI Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * Refactor MemoryCase to eliminate determine statements of device_type (#7727) * ref memory_case_util * ref BlobObject::CheckMemCase * ref mem_case using * address review * address review * namespace memcase -> memory * fix conflict * address review * address static analysis * rm check * cpu device_id is always 0 * fix conflict * timeout-minutes: 50 * revert change * increase thrd limit in container * skip 2x2 TestEinsumConsistent * skip failed case of distributed test * auto format by CI * fix_non_pod_data_allocate_bug Co-authored-by: Li Xinqi <lixinqi2010@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: tsai <jackalcooper@gmail.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: clackhan <han_binbin@163.com> * fix some data races in c++ api and SteadyVector (#8654) * fix some data races in c++ api and SteadyVector Signed-off-by: daquexian <daquexian566@gmail.com> * skip self copy in MutShapeView::ToShape Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * Fix sin/cos higher order derivative (#8648) * fix(GradGrad): fix sin/cos higher order derivative * fix(GradGrad): fix calculate error * refine autograd global test * auto format by CI * refine sin/cos grad_grad calculate * fix static analysis * merge conflict Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: Ping Zhu <58718936+REYGU@users.noreply.github.com> Co-authored-by: Zhu, Ping <pingzhuu@outlook.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * refine_eager_boxing_to_adapt_ep (#8568) * refine_eager_boxing_to_adapt_ep * fix typo * refine * refine symmetric-acyclic-nd-sbp-to-nd-sbp * refine * fix error * fix static check * add NOLINT Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Fix repeat bug (#8645) * make result contiguous * add test case * auto format by CI Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * Instruction policy (#8583) * ThreadLocalGuard * vm::InstructionPolicy * fix compile error (#8623) * fix compile error * change MirroredObject to Dependence * Modify DependenceVector * rm include stream type * fix stream type * auto format by CI Co-authored-by: Yu OuYang <xuanjiuye@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * handle non-contiguous input (#8665) * handle non-contiguous input * refine * auto format by CI Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * rename define CONSISTENT to GLOBAL (#8652) Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Refine naive interpret (#8672) * ThreadLocalGuard * refactor EagerBlobObjectList * op_args_reserved_size * remove useless comments * rename one::EagerBlobObjectList* to vm::EagerBlobObject* * refactor signature of InstructionsBuiler::Call * PhysicalRun * refactor InstructionsBuilder::Call * remove unused StatefulOpKernel::need_check_mem_case * remove EagerLocalTensorImpl::is_shape_synced_ * refactor SoftSync * move SmallVector from common/container_util.h to framework/instructions_builder.cpp * explicit scalar initialization Co-authored-by: clackhan <han_binbin@163.com> * Rebuild Docs V0.8.0 (#8392) * rebuild for 5 module * fix bug * fix for doctree and content in nn and * fix * fix * fix * add some * fix for oneflow.rst * update oneflow oneflow.nn * update tensor * update tensor module * update * test * update * update * fix for undone desc * docs: oneflow.utils.data (#8485) * feat(utils.data): add oneflow.utils.data * docs(dataloader): change the docstring of DataLoader * docs(tensor): add methods to oneflow.Tensor document * docs(optim): change docstring of optimizer and add a note to the doucument * nn.graph * fix for graph * fix bug * review nn and linalg document (#8515) * docs(nn): add contents to oneflow.nn document * docs(linalg): refactor oneflow.linalg document * change attributes.rst and review nn.functional.rst (#8514) * change attributes.rst and review nn.functional.rst * reconstruction oneflow.cuda * fix cuda and rebuild comm demo (#8582) * update image * add distributed * oneembedding & refine graph * update for sdisributed one_embedding * fix rnn.py (#8616) * 重构 oneflow.nn.init 文档 (#8622) docs(nn.init): refactore nn.init document * docs(nn.init): remove the comments * docs(utils.data): remove the comments * update and fix bug * docs(review): refine the documents (#8646) * docs(review): refine oneflow, nn, Tensor, nn.init, linalg, utils.data, optim modules * docs(optim): modify the code examples * docs(tensor): edit note * 重构 oneflow.autograd 文档 (#8594) * docs(autograd): refactor oneflow.autograd * docs(autograd): edit "Default gradient layouts". * docs(autograd): reedit "Default gradient layouts" * docs(autograd): add comment * docs(autograd): add reference * update * docs(tensor): change autoclass to autosummary * update * update * add oneflow.linalg.diagonal (#8653) * docs(linalg): add oneflow.linalg.diagonal * update enviorment variable * Update docs/source/distributed.rst Co-authored-by: Houjiang Chen <chenhoujiangcug@gmail.com> * Update docs/source/distributed.rst Co-authored-by: Houjiang Chen <chenhoujiangcug@gmail.com> * update enviorment variable * update for ev & distributed * update distribued * update ev * update distribute desc * Update docs/source/distributed.rst Co-authored-by: Houjiang Chen <chenhoujiangcug@gmail.com> * update * 修改 docstring 描述 (#8656) * docs: move pytorch refernce to end * docs: add some docstring * docs(refs): add refs * Update docs/source/distributed.rst * updte for distributed details and environment_variable * docs(docstring): Modify all reference links to version 1.10 (#8663) * fix bug * fix bug * fix all warning Co-authored-by: Guoliang Cheng <1876953310@qq.com> Co-authored-by: liu xuan <85344642+laoliu97@users.noreply.github.com> Co-authored-by: Guoliang Cheng <lmyybh_lazy@163.com> Co-authored-by: laoliu97 <841637247@qq.com> Co-authored-by: Yao Chi <later@usopp.net> Co-authored-by: Houjiang Chen <chenhoujiangcug@gmail.com> * Fix zeros like and ones_like api (#8632) * fix zeros_like and ones_like bug * refine * revert * refine * fix tensor_slice_view infer physic_shape bug * add test * refine * auto format by CI * fix bug * refine * auto format by CI * fix import error * fix bug Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Fix sbp print bug (#8689) * Add a normal priority with no transfer but different sbp * Fix the bug for printing no boxing edge * Do not use P for weights * auto format by CI Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * eager_local_interpreter_with_infer_cache (#8619) * ThreadLocalGuard * refactor EagerBlobObjectList * op_args_reserved_size * remove useless comments * rename one::EagerBlobObjectList* to vm::EagerBlobObject* * refactor signature of InstructionsBuiler::Call * PhysicalRun * refactor InstructionsBuilder::Call * remove unused StatefulOpKernel::need_check_mem_case * remove EagerLocalTensorImpl::is_shape_synced_ * eager_local_interpreter_with_infer_cache * remove useless code * reslove comments * refactor TensorMeta::TensorMeta(const TensorMeta) * use small vector * add kMaxNumDims * fix error include * fix split Symbol LocalTensorMeta error * refactor SoftSync * move SmallVector from common/container_util.h to framework/instructions_builder.cpp * mone ONEFLOW_EAGER_ENABLE_LOCAL_INFER_CACHE to eager.h * add blank line * reslove comments * minor fix * refine * explicit scalar initialization * fix static check error * auto format by CI * of_format * reslove comment * refine * refine * refine Co-authored-by: lixinqi <lixinqi0703106@163.com> Co-authored-by: Li Xinqi <lixinqi2010@gmail.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * fix gelu nn.Module bug and support tanh mode. (#8693) * add gelu2 api * refine test * refine docs * refine * restuct * delete useless headfile * format * rm doc of tensor.gelu (#8696) Co-authored-by: Shanshan Zhong <62104945+zhongshsh@users.noreply.github.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Fix bug in CrossFeatureInteraction LazyBackward (#8677) fix bug Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * fix floating-point scalar tensor in arange (#8673) Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Add nn functional fold (#8667) * add fold * update fold.py * add test * fix doc * fix comment Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * modify some file and improve the error message (#8566) * modify some file and improve the error message * modify the content * modify the content * auto format by CI * Update roi_align_op.cpp * Update roi_align_op.cpp * Update reshape_user_op_util.cpp * auto format by CI * Update roi_align_op.cpp Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * [OneEmbedding] add id_shuffle_copy_out (#8683) add id_shuffle_copy_out Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * fix add_param_group step key not match error (#8698) * fix add_param_group step key not match error * auto format by CI Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * add env ONEFLOW_EP_CUDA_DEVICE_FLAGS and ONEFLOW_EP_CUDA_STREAM_FLAGS (#8703) Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * fix for docsv0.8 (#8710) * fix repeat op 0-size releated bug (both in FW and AD) (#8707) * fix repeat op 0-size releated bug (both in FW and AD) * refine * refine static check * refine * fix commnet * fix comment * refine * fix test * auto format by CI * auto format by CI Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Support Dropout Scale in FusedMLPGrad[OneEmbedding] (#8633) * support alpha list * Remove redundant modify * remove redundant alpha set * refine Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Fix bug of Tensor.type (#8697) * fix bug of tensor.type(flow.Tensor) * fix bug of tensor.type(flow.Tensor) about device * Fix tensor type doc (#8699) fix doc of tensor.type * add test for tensor.type(flow.Tensor) * move PyTensorMetaCls_CheckExact to header file Co-authored-by: Shanshan Zhong <62104945+zhongshsh@users.noreply.github.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * ONEFLOW_GRAPH_PLACE_TRAINING_STATE_ON_ALL_RANKS (#8706) * ONEFLOW_GRAPH_PLACE_TRAINING_STATE_ON_ALL_RANKS * auto format by CI Co-authored-by: liujuncheng <liujuncheng1022@gmail.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * define_mut_output_shape_and_mut_output_stride_in_infer_ctx (#8709) * define_mut_output_shape_and_mut_output_stride_in_infer_ctx * fix merge master error * fix typo Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Add qat conv modules (#8368) * add qat conv modules * add quantization related modules to doc * refine qatconv modules doc * add qat conv module tests * refine * refine Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * add unsqueeze_multiple_op (#8714) * add unsqueeze_multiple_op * modify the format * Update functional_api.yaml Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * modify broadcast_like_op.cpp and add test (#8720) * modify broadcast_like_op.cpp and add test * modify broadcast_like_op.cpp * Update broadcast_like_op.cpp Co-authored-by: Yinggang Wang <wyg19970408@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * JIT LR (#8500) * add example code * Update cosine_annealing_lr.py * enable self params transformer * enable pass ast to c++ api * enable jit backend for lr * enable jit global register and invoke * convert Global to Singleton for new merge * enable pybind11 walk on python ast * enable test all existent get_lr of oneflow in python * enable py_ast_wrapper pass ast from python to mlir * switch all ast to ast-wrapper in mlir scope * define python ast partially * partial python ast definition * trim asdl of python ast * mlir gen * add symbol table * from ast to jit done * switch llvm::errs() to mlir::emitError and convert switch to typeSwitch * trim duplicate namespace use * fix LIT header * add some docs * enable compare with or_else, if with return seamless in branch and mutable variable * trim code and refine struct * register pybind11 ast node for shared_ptr * enable cpp class in python * go through python to mlir to llvm to jit to run * add addf subf op * work well on stepLR linearLR exponentialLR coseineDecayLR cosineAnnealingLR constantLR * enable maxf minf conversion to llvm ir * rename LR_JIT to LRJITRegister * remove LR_JIT_Engine and swith Invoke to std::function ret by lookup * refine struct * enable bisect_right and python resigter api have dump option arg * add bisect_left and bisect_transformer specially, delete former test python script * remove c++17 standard * restore double hash to iterator * publish * publish * publish * use llvm classof and typeswitch rightly * trim * commit * commit * commit * commit * commit * commit * auto format by CI * Update ir.cpp * Update OneFlowLRJITRegistry.h * auto format by CI * Update AstMlirGen.h * Update lr_jit.cpp * auto format by CI * Naming conventions * auto format by CI * auto format by CI * deploy _ behind Co-authored-by: leaves-zwx <kunta0932@gmail.com> Co-authored-by: yuhao <1171760467@qq.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: yuhao <72971170+howin98@users.noreply.github.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Add logspace (#8599) * add logspace * add global test * restore rand * fix doc * rename consistent to global * adjust import order * add todo * Add hann_window (#8615) * add hann_window * rm useless include * add check * adjust import order * add ONEFLOW_VM_PENDING_HANDLE_WINDOW_SIZE (#8730) * add ONEFLOW_VM_PENDING_HANDLE_WINDOW_SIZE * add environment to vm.h Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Fix as strided bool type and view bug (#8713) * fix as_stride bug * refine * refine * refine * delete useless head file * refine Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Add functional binary cross entropy (#8708) * add gelu2 api * refine test * refine docs * refine * restuct * delete useless headfile * format * rm doc of tensor.gelu * add functional binary cross entropy Co-authored-by: BBuf <1182563586@qq.com> Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * support map_location in flow.load (#8666) * support map_location in flow.load Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI * fix tests Signed-off-by: daquexian <daquexian566@gmail.com> * fix bug when map_location is None Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * Add addcdiv (#8581) * add addcdiv * fix tensor_functions * fix inplace * add test number * rename consistent to global * Inner most dim case for cumsum cumprod op (#8403) * cumsum use cub scansum in some case * prod use cub scan * refine name * refine * optimize cum op * format * fix * get device properties by cuda stream class * revert useless code * refine * outer dim use parallel sweep algo * refine * fix a fraction of threads hit __syncthreads * revert * refine kernel define * refine * refine * refine * refine * move comment * fix * fix * refine Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Define mut output dtype and mut output is dynamic in infer ctx (#8716) * define_mut_output_shape_and_mut_output_stride_in_infer_ctx * fix merge master error * fix typo * define_mut_output_dtype_and_mut_output_is_dynamic_in_infer_ctx * replce const DataType& with DataType * replace const DataType& with DataType ret * split TensorDesc4ArgNameAndIndex and MutTensorDesc4ArgNameAndIndex * refine * minor fix * refine * fix static check error * Update op_expr.cpp * Update op_expr.cpp * Update stateful_opkernel.cpp * refine * fix static check error * refine * refine Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Dev refactor fuse instruction policy (#8624) * ThreadLocalGuard * vm::InstructionPolicy * refactor fuse instruction policy * fix compile error (#8623) * fix compile error * change MirroredObject to Dependence * Modify DependenceVector * add instruction policy util * add instruction policy util * remove include * add include * rm fuse instruction type * Modifying variable properties * add stream_sequential_dependence_ to instruction_policy Co-authored-by: lixinqi <lixinqi0703106@163.com> Co-authored-by: Li Xinqi <lixinqi2010@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * fix bug of batchnorm num_batches_tracked global error when loading state_dict (#8723) add condition for assign num_batches_tracked Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * add launch master port limit (#8563) * add launch master port limit * Update python/oneflow/distributed/launch.py Co-authored-by: daquexian <daquexian566@gmail.com> Co-authored-by: daquexian <daquexian566@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Fix docs import distance (#8691) * fix import distance * add functional apis * add smooth_l1_loss docs * refine activation.py * add deleted api * review * 添加oneflow, nn 等模块文档中遗漏的接口 (#8704) * docs: add api * docs(nn): refactor nn * review Co-authored-by: Guoliang Cheng <lmyybh_lazy@163.com> Co-authored-by: ChenQiaoling <48576019+Chenqll@users.noreply.github.com> * refactor control stream type (#8647) * refactor control stream type * auto format by CI * Add method implementation * refine * refien Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: Li Xinqi <lixinqi2010@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Define mut output tensor desc (#8717) * define_mut_output_shape_and_mut_output_stride_in_infer_ctx * fix merge master error * fix typo * define_mut_output_dtype_and_mut_output_is_dynamic_in_infer_ctx * define_mut_output_dtype_and_mut_output_tensor_desc * replce const DataType& with DataType * replace const DataType& with DataType ret * split TensorDesc4ArgNameAndIndex and MutTensorDesc4ArgNameAndIndex * refine * minor fix * fix merge error * fix warning error * refine * fix static check error * Update op_expr.cpp * Update op_expr.cpp * Update stateful_opkernel.cpp * refine * fix static check error * refine * refine Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Symbolic local tensor meta (#8662) * ThreadLocalGuard * refactor EagerBlobObjectList * op_args_reserved_size * remove useless comments * rename one::EagerBlobObjectList* to vm::EagerBlobObject* * refactor signature of InstructionsBuiler::Call * PhysicalRun * refactor InstructionsBuilder::Call * remove unused StatefulOpKernel::need_check_mem_case * remove EagerLocalTensorImpl::is_shape_synced_ * eager_local_interpreter_with_infer_cache * remove useless code * reslove comments * refactor TensorMeta::TensorMeta(const TensorMeta) * use small vector * Symbolic LocalTensorMeta * check shape in critical_sectio * add kMaxNumDims * fix error include * fix split Symbol LocalTensorMeta error * fix split cache and symbolic local tensor meta error * refactor SoftSync * move SmallVector from common/container_util.h to framework/instructions_builder.cpp * mone ONEFLOW_EAGER_ENABLE_LOCAL_INFER_CACHE to eager.h * add blank line * reslove comments * minor fix * refine * explicit scalar initialization * fix static check error * auto format by CI * of_format * reslove comment * refine * refine * refine * fix error * define MutOutputShape and MutOutputStride in InferContext * define_mut_output_shape_and_mut_output_stride_in_infer_ctx * fix merge master error * fix typo * fix static check error * define_mut_output_dtype_and_mut_output_is_dynamic_in_infer_ctx * define_mut_output_dtype_and_mut_output_tensor_desc * replce const DataType& with DataType * split const and mut func in LocalTensorMeta * replace const DataType& with DataType ret * split TensorDesc4ArgNameAndIndex and MutTensorDesc4ArgNameAndIndex * refine * minor fix * fix merge error * fix warning error * refine * fix static check error * Update op_expr.cpp * Update op_expr.cpp * split MutTensorMeta and MutLocalTensorMeta * Update stateful_opkernel.cpp * refine * fix static check error * refine * refine * reslove comment * refine * fix typo Co-authored-by: Houjiang Chen <chenhoujiangcug@gmail.com> * fxi typo * use OpArgsVector Co-authored-by: lixinqi <lixinqi0703106@163.com> Co-authored-by: Li Xinqi <lixinqi2010@gmail.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: Houjiang Chen <chenhoujiangcug@gmail.com> * Feat general basic communication (#8437) * Add a slight cost for B->S and B->P in 2d sbp * Add penalty for P in consumer * Fix a slight bug * Add at most 1 middle node for general basic communication * Add the cost for general basic communication * Add the slight penalty for eager * Skip initialization of boxing collector if not needed * Fix a bug * Dev nd nccl send recv boxing (#8467) * nd nccl_send_recv_boxing * rm print * support num_axes > 2 * Add distributed optional run (#8372) * Add * change deps * add install * add skip * autoprof supports bandwidth (#8367) * autoprof supports bandwidth Signed-off-by: daquexian <daquexian566@gmail.com> * print bandwidth Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * remove tmp buffer of cumprod cpu backward kernel (#8369) * remove tmp buffer of cumprod cpu backward kernel * refine * refine Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Move tensor api to cpython part3 (#8342) * add tensor_functions * concat py methods * add hash, restore tensor.py * check replacement * refine code, remove commented tensor.py * refine code * move some api * add cpu and cuda api * add triu tril norm and etc. * remove tensor_functions.h * move more api * move more api, refine size * fix typo * format code, remove useless include * refine code * refine code, fix typo * align .cuda to python * refine code * split some api to part3 for review * remove positional only arguments of argmax and argmin * remove arguments parse * modify arguments name in matmul and floor_divide * rename BINARY_FUNC to DIRECT_PASS_FUNC, modify some functions * refine code, format code * add inplace /=, add comments * remove name in macros * remove python api * remove redundant include * remove cout * format code * refactor tensor.size by directly call shape.at, refactor tensor.sub_ by calling nb_sub_ * remove redundant code * auto format by CI * fix typo, fix wrong call * modify idx datatype from int32 to int64 in tensor.size * add some DIRECT_PASS_FUNC * add cpu cuda var pow and etc. * add masked_fill any all * make REDUCE_FUNC macro, add reduce_* functions * add 0dim check in ReduceSumWhole, refine yaml * fix bug * restore add add_ sub sub_ * add unittest for tensor.half tensor.add tensor.add_ * refine code * refine code * fix typo * fix bug of tensor.std() * refactor var std and cuda, using c++ functional api * add beta and threshold in softplus * auto format by CI Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Add nn_functor Check (#7910) * add bias_add_check * add bias_add error test * fix conv2d nhwc bias_add error * add nhwc conv test * add bias_add_error test * Add bias add error check * Rename * add batch matmul error check * add matmul check error msg * remove annotation * add fused mlp error msg check * Add pixel shuffle check test * add more test until normalization add relu functor * refine error message * finish all nnfunctor check msg * handle type error * remove useless symbol * modify back to TypeError * fix all comment * Remove redundant code * Remove pad ndim check * fix bias add space * fix check logic cause ci gpu not always gpu:0 Co-authored-by: hjchen2 <chenhoujiangcug@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Add FusedMatmulBiasAddReluDropout [OneEmbedding] (#8222) * previous version for fused_matmul_bias_add_relu_dropout * add op infer * fix detail * finish forward * support dropout rate list * add forward test * fix bug for output buffer * Configurable alpha params * try to add bit mask logic * Add bitmask first version! * Add row col bitmask logic * support not align4 reludropout * simplify relu dropout ld logic * Add naive relu dropout grad kernel * add simple relu dropout grad kernel * Rename * support relu_dropout bitmask backward * add vectorized optimization * fix tmp buffer * add to amp list * add lazy backward logic * Refine kernel * add indextype dispatch * simplify functor logic * fix cublas fused mlp aux_ld shape bug * Add more relu dropout kernel * add full unittest * fix bug in skip final activation * refine * Remove dump func * fix format * Remove cmake * remove redundant divide * add padded version * fix dropout * oneflow curand * refine * remove redundant kernel * add unroll logic * add unroll and ballot sync * refine format * Remove fast curand * Refine python interface * Add if branch for memset * fix python logic * just for debug * not use matmul bias add grad * add launch 1 block limit * fix unittest * Refine * fix graph backward bug * limit to 11060 * change to use int32_t dtype for cublas aux * Fix jc comment * fix comment * fix convert * fix static_analysis * fix at * fix userops td * fix userops td * fix const ref * fix compile error for bfloat16 * limit to 11060 * fix bug Co-authored-by: Juncheng <liujuncheng1022@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * fix gather 0-dim tensor bug (#8376) * fix 0-dim tensor bug * refine * support input 0-dim tensor for gather * refine * refine * refine dim_scatter_kernel check * refine * refine check * fix clang_tidy error Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * add api to apply external job pass (#8370) * Add condition to find-test-cache-distributed (#8387) * add condition to find-test-cache-distributed * fix * warp dim util (#8382) * warp dim util * format * use more maybe_wrap_dim * refine array functor * add more * refine math_functor * fix_bug_in_broadcast_min_max_grad_and_broadcast_like (#8379) * fix_bug_in_broadcast_min_max_grad_and_broadcast_like * refine * fix static check error * fix bug about index (#8388) * fix bug about index * add test case Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * LogicalSliceAssign support full slice sbp (#8344) * feat(SliceOp): slice ops support 2d sbp * fix(SliceOp): fix [B, P] 2d sbp bug * refine error message * fix bug in parallel_num == 1 * add comment * add warning and format * add NOLINT for boxing check * feat(LogicalSliceOps): support all nd_sbp * feat(LogicalSlice): support nd_sbp * add error message * fix(AutoTest): fix auto_test bug in module.parameter pass * auto format by CI * fix(LogicalSliceAssign): skip test when 1n1d * fix SliceParams memset error * remove memset * add CHECK_JUST * fix(*): make sure split_axis >= 0 or equal to SPLIT_AXIS_FOR_NON_SPLIT * remove memset * fix spilit_info.axis bug * feat(LogicalSliceOps): support grad * add logical_slice gradient_funcs * feat(LogicalSliceAssign): LogicalSliceAssign support full slice sbp * auto format by CI * test(LogicalSlice): fix logical_slice dims Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: Houjiang Chen <chenhoujiangcug@gmail.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * fix_tensor_from_numpy_mem_leak_bug (#8391) * fix_tensor_from_numpy_mem_leak_bug * add note * refine note * refine Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Make of_pyext_obj static only to make sure only a python ext so has python symbols (#8393) * make of_pyext_obj static only * refine note Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Adjust tolerance setting in embedding_renorm unit test (#8394) * support front end compile for job to iree (#8249) * support frontend dev version * polish name * add tosa-to-elf.mlir * tosa to elf by llvm * conv2d partial * an enhanced frontend runner * support numpy as input * enable multiple using nn graph with different input(jobname make it it cd /home/yuhao/frontend/oneflow ; /usr/bin/env /usr/bin/python3 /home/yuhao/.vscode-server/extensions/ms-python.python-2022.6.2/pythonFiles/lib/python/debugpy/launcher 40873 -- /home/yuhao/frontend/oneflow/oneflow/ir/test/Frontend/runner.py ) * enable multiple input * enable cpu and cuda * change full_name to _full_name * support exchange cuda with cpu seamlessly * remove pip * lit config * polish * trim * auto format by CI * modify * auto format by CI * last line polish * use unittest * auto format by CI * use allclose * auto format by CI * pulish * optimize convert oneflow to tosa * conv2d * conv2d enhanced && conv2d examples add * add road map * add add_n2Op and boardcast_addOp conversion * add matmulOp conversion * support converting normailzation op to tosa(partically) * update roadmap * support i64 tensor to dense elem attr * support 100% resnet op conversion * add test mlir * add test iree resnet python script * auto format by CI * done * enhance iree resnet test script * auto format by CI * rebuild code * auto format by CI * rebuild test script * update * auto format by CI * pub * trim test scripts * move * move * input and output add block arg judgement * emit error in variable conversion * error handle for ci * modify err info * auto format by CI * merge * auto format by CI * output not block * flow ones * rm const * trim maybe * trim maybe with header file * const auto * solve clangd error Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Feat/zero mix with mp (#8036) * add zero limit * add debug * add mix zero test * refactor zero api * zero test with mp * add 2d test * add zero nd * add nd zero * add sbp cast * test passed soft limit consumer * refine size api * zero use stage 2 * add limit consumer api * add new api * refine zero s select * fix index out of range * rm zero limit on device type * zero test with activation checkpointing * add indentity when dp sequence len is 1 * move to base with master * fix * fix * fix * add test * debug bad case * refine test for eager and graph boxing * test case ready * simplify * refine test * fix buff size * fix conflict * refine zero nd * refine * add full test * revert change * refine split check * fix typo * rm log * spit long func * restore test * Update optimizer_placement_optimization_pass.cpp * auto format by CI * auto format by CI * fix static check * add tips for zero api change * auto format by CI Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Revert embedding normal path and fix amp list (#8374) * revert embedding normal path, fix amp list * fix amp * fix memset bug in gather cpu kernel Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * replace fixed_vector with small_vector and make Shape inherit from it (#8365) * Replace fixed_vector with llvm::SmallVector Signed-off-by: daquexian <daquexian566@gmail.com> * Shape inherited from llvm::SmallVector Signed-off-by: daquexian <daquexian566@gmail.com> * refine cmake Signed-off-by: daquexian <daquexian566@gmail.com> * rename fixed_vector to small_vector Signed-off-by: daquexian <daquexian566@gmail.com> * fix reviews Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI * update Shape constructor Signed-off-by: daquexian <daquexian566@gmail.com> * add 'PUBLIC' keyword to all target_link_libraries Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI * update cmake Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI * update cmake Signed-off-by: daquexian <daquexian566@gmail.com> * update cmake Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI * set is_initialized_ default to true Signed-off-by: daquexian <daquexian566@gmail.com> * override some methods to set is_initialized_ Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * Light plan for debug (#8396) * Light plan for debug * fix note * disable terminfo to fix missing terminfo symbols (#8400) * disable terminfo to fix missing terminfo symbols Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * fix bug of ZeRO MP in complex case (#8404) * Remove redundant output_lbns in ir (#8409) * mv case * remove redundant info * Dev FusedCrossInteraction[OneEmbedding] (#8335) * add simple fused cross interaction forward * add packed fused * Add cross interaction grad * simplify code * fix bug * support crossnet v2 * support cross interaction v2 * add lazy backward * Rename and add test * fix jc comment * fix comment * fix bug * fix userops td elem_cnt for FUSED Group * fix header file * fix clang static analysis * fix unittest Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * add exe graph physical shape check msg (#8002) * fix index select op in graph * add exe graph physical shape check msg * improve the debug information for the python stack trace 1. add a parameter 'max_stack_depth' to specify the max depth for the stack trace 2. refactor other debug related classes. * remove parens * update * resolve PR comments * update * update graph debug test file. * restore self._debug in class Graph and class ModuleBlock * Do not shorten the stack frame string if it is in debug mode * delete TODOs * disable conv3d test (#7969) Signed-off-by: daquexian <daquexian566@gmail.com> * skip layernorm random_data_warp test (#7941) * skip layernorm random_data_warp test * warp/block/uncached case only test gpu Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Lock click version (#7967) Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * add global avgpool unittest (#7585) * fix (#7978) Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Support negative dim in scatter op (#7934) * support negative dim in scatter op * refine scatter test * refine scatter test again Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * run barrier callback in BarrierPhyInstrOperand::~BarrierPhyInstrOperand (#7702) * run barrier callback in BarrierPhyInstrOperand::~BarrierPhyInstrOperand * lock gil in vm Callback thread * more comments for VirtualMachineEngine::Callback() * the Env is never destroyed. * export Env into python * more unittests * wait shared_ptr.use_count() == 0 * export unittest.TestCase in framework/unittest.py * SwitchToShuttingDownPhase * optional is_normal_exit * VirtualMachine::CloseVMThreads * Delete env_api.h env_api.h is deleted by master * reshape_only_one_dim_infered * address pr comments * fix a ref-cnt bug in TryRunBarrierInstruction. * rollback flow.env.all_device_placement * no distributed running test_shutting_down.py * auto format by CI * expand lifetime of module oneflow in test_shutting_down.py * refine del depend on of * capture oneflow._oneflow_internal.eager when calling sync in __del__ * add try in flaky test Co-authored-by: Luyang <flowingsun007@163.com> Co-authored-by: chengtbf <472491134@qq.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: Xiaoyu Xu <xiaoyulink@gmail.com> * Fix one hot scalar tensor bug (#7975) * fix reduce_sum scalar check bug * fix one_hot scalar tensor bug * fix clang tidy error Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * support ctor np array from of tensor (#7970) * support ctor np array from of tensor * add test case constructing np array from tensor * refine Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * add_manual_seed_all_api (#7957) * add_manual_seed_all_api * Update conf.py * refine * add test case * auto format by CI * Update random_generator.cpp * auto format by CI Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * one_embedding add doc string (#7902) * add doc string * add example * add * fix doc * refine * address review * mb to MB * add make_table_option * option to options * refine * add forward Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Support numpy scalar parameters (#7935) * feat(functional): support numpy scalar parameters * rename inferface * feat(*): TensorIndex support numpy scalar * feat(TensorIndex): support advance indexing * add unittest and int32 support for branch feat-param_support_np_scalar (#7939) * add unittest * refactor unittest * add todo for int16 advanced indexing * add int32 supporting for advance indexing * auto format by CI Co-authored-by: Wang Yi <53533850+marigoold@users.noreply.github.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * fix tensor_scatter_nd_update (#7953) * fix tensor_scatter_nd_update * auto backward Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * fix one_embedding adam (#7974) * fix one_embedding adam * fix tidy * fix normal Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * speed test with score (#7990) Signed-off-by: daquexian <daquexian566@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Feat/graph del by ref (#7857) * remove IsMultiClient() and single client logic Signed-off-by: daquexian <daquexian566@gmail.com> * rename eager.multi_client to eager Signed-off-by: daquexian <daquexian566@gmail.com> * auto format by CI * add py ref * refine new session * clean code * make scope api inner use * use session with ref cnt * run barrier callback in BarrierPhyInstrOperand::~BarrierPhyInstrOperand * test pass * lock gil in vm Callback thread * more comments for VirtualMachineEngine::Callback() * merge * merge rm single client * rm initenv * merge and fix master * refactor env c api * add debug code * fix and serving test pass * test passed * rm useless * rm useless code * format * rm useless include * rm sync in py * the Env is never destroyed. * export Env into python * more unittests * fix and pass tests * revert virtual_machine.cpp * revert core/vm * remove outdated python class oneflow.unittest.TestCase * graph test passed * wait shared_ptr.use_count() == 0 * export unittest.TestCase in framework/unittest.py * SwitchToShuttingDownPhase * optional is_normal_exit * VirtualMachine::CloseVMThreads * Delete env_api.h env_api.h is deleted by master * address pr comments * rm is env init * Clear empty thread when graph destroy (#7633) * Revert "Clear empty thread when graph destroy (#7633)" (#7860) This reverts commit 3e8585e5fa20b97229d6b0be46a7ff814dc8cd83. * fix a ref-cnt bug in TryRunBarrierInstruction. * rm env_api * …

add api to apply external job pass

edd2e39

hjchen2 requested review from zzk0, mosout and daquexian and removed request for zzk0 June 6, 2022 08:52

Merge branch 'master' into dev_apply_external_job_pass

7b5bc78

hjchen2 enabled auto-merge (squash) June 6, 2022 08:52

hjchen2 requested a review from oneflow-ci-bot June 6, 2022 08:52

hjchen2 added automerge api serving enhancement labels Jun 6, 2022

mosout approved these changes Jun 7, 2022

View reviewed changes

Merge branch 'master' into dev_apply_external_job_pass

ef242f3

github-actions bot removed the automerge label Jun 7, 2022

daquexian approved these changes Jun 7, 2022

View reviewed changes

hjchen2 added 3 commits June 8, 2022 11:04

build graph in complie and fix based on comments

3644e69

Merge branch 'dev_apply_external_job_pass' of https://github.com/Onef…

70e6747

…low-Inc/oneflow into dev_apply_external_job_pass

Merge branch 'master' into dev_apply_external_job_pass

397f7db

hjchen2 added the automerge label Jun 8, 2022

hjchen2 requested review from oneflow-ci-bot and removed request for oneflow-ci-bot June 8, 2022 07:43

mergify bot added 2 commits June 8, 2022 11:07

Merge branch 'master' into dev_apply_external_job_pass

0e00773

Merge branch 'master' into dev_apply_external_job_pass

9c8a53d

oneflow-ci-bot requested review from oneflow-ci-bot and removed request for oneflow-ci-bot June 8, 2022 16:24

hjchen2 merged commit 33634a2 into master Jun 8, 2022

hjchen2 deleted the dev_apply_external_job_pass branch June 8, 2022 18:10

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

add api to apply external job pass #8370

add api to apply external job pass #8370

hjchen2 commented Jun 6, 2022

github-actions bot commented Jun 6, 2022

github-actions bot commented Jun 7, 2022

github-actions bot commented Jun 7, 2022

daquexian Jun 7, 2022

hjchen2 Jun 8, 2022

hjchen2 commented Jun 8, 2022

github-actions bot commented Jun 8, 2022

add api to apply external job pass #8370

add api to apply external job pass #8370

Conversation

hjchen2 commented Jun 6, 2022

github-actions bot commented Jun 6, 2022

github-actions bot commented Jun 7, 2022

github-actions bot commented Jun 7, 2022

daquexian Jun 7, 2022

Choose a reason for hiding this comment

hjchen2 Jun 8, 2022

Choose a reason for hiding this comment

hjchen2 commented Jun 8, 2022

github-actions bot commented Jun 8, 2022