Fix python apis and xla implementation (#7183)

* Support save/load for lr_scheduler (#6948)
* feat(LrScheduler): support save/load for lr_scheduler
* refine document
* auto format by CI
* Refine test
* auto format by CI
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
* Fix eye_op attr (#6973)
* fix
* add graph test
* Update python/oneflow/test/graph/test_graph_eye.py
Co-authored-by: daquexian <daquexian566@gmail.com>
* refine
* Update python/oneflow/test/graph/test_graph_eye.py
Co-authored-by: daquexian <daquexian566@gmail.com>
* auto format by CI
Co-authored-by: daquexian <daquexian566@gmail.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
* softmax double use uncached impl to accelerate compile (#6992)
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
* Add [[nodiscard]] for cpp api (#6997)
* add [[nodiscard]]
* refine
* reformat
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
* Support Arange delta to decide dtype (#6998)
* support delta dtype to decide output dtype
* add more unittest
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
* Add clang as CUDA FE compiler in CI (#6954)
* update action use
* refine
* refine
* refine
* refine
* refine
* refine
* refine
* refine
* refine
* refine
* refine
* refine
* fix
* add 80 and 86
* refine
* refine
* add CUDA_NVCC_THREADS_NUMBER
* refine
* address review
* set CUDA_NVCC_THREADS_NUMBER 8
* fix
* fix clang in init cmake
* add script
* refine
* refine
* refine
* refine
* refine
* refine
* refine
* add flags to skip zlib
* refine
* refine
* refine
* refine
* refine
* refine
* refine
* refine
* refine
* refine
* refine
* refine
* refine
* refine
* refine
* refine
* Migrate chunk python layer to functor (#6983)
* Migrate chunk Python layer logic to functor
* fix runtime
* Fix splits bug and CI
* Modify push to emplace
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
* Reduce memory usage when compiling oneflow dialect ops (#7000)
* CudaAllocator device reset before OOM (#6976)
* CudaAllocator device reset before OOM
* Add NOTE
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
* Refactor vm stream desc (#6989)
* remove StreamDesc::num_machines
* Prepare one thread for one stream_type
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
* Add Diagonal Op (#6016)
* format complete
* python to cpp
* py2cpp error
* rm
* auto format by CI
* revise
* auto format by CI
* license
* docstring
* docstring
* tensor
* tensor attribute
* auto format by CI
* docstring
* revise
* test
* revise
* revise
* rename
* half
* docs
* doc,test
* test times
* revise
* format
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
* add all to all op (#6283)
* add all to all op
* add barrier
* format
* add import
* fix test
* delete barrier
* delete barrier
* Revert "delete barrier"
This reverts commit aa397ea.
* Revert "delete barrier"
This reverts commit 7ddf79a.
* check tensor meta between ranks
* add more assert
* all_reduce operate in place
* all_reduce operate in place
* fix bug
* assert tensor.is_local
* fix bug in scatter
* add more assert
* delete meta check
* add pytorch comparison test
* add pytorch comparison test
* refine
* add ONEFLOW_TEST_CPU_ONLY
* fix bug from torch gloo
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
* Dev ivalue for cpp api (#6890)
* add api tensor
* refine
* add nn.relu
* refine
* clean shape & refine relu test
* support void* for from_blob
* add multithreading relu test
* refine test
* refine
* refine
* add comment for __internal_tensor()
* convert to copy_util
* reformat
* refine
* add ivalue
* refine directory structure
* refine cpp api test
* refine test
* add ivalue
* refine ivalue
* refine ivalue
* refine
* refine
* refine
* refine
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
* default use cpu generator (#7001)
* optimize reshape/slice/transpose functor (#6956)
* optimize reshape/slice/transpose functor
* update code according to reviewer's suggestion
* judge negative dimension number besides -1
* judge negative shape value in view::Reshape
* remove is_full_slice logic in SliceFunctor
* update code according to yinggang's advice
* move ordered permute judge to TransposeKernel
* remove print sentence
* abstract IsOrderedPermute func
* support negative permute value in TransposeFunctor
* delete tranpose_kernel optimization
* Revert "delete tranpose_kernel optimization"
This reverts commit e026434.
* not return original tensor when reshape do nothing
* simplify code
* correct spell error
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
* fix IsContinuosSubspace error (#6968)
* fix IsContinuosSubspace error
* recover original IsContinuosSubspace code
* add test case
* auto format by CI
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
* add cpu group deconve impl (#6980)
* add cpu group deconv impl
* remove useless lines
* remove useless lines
* add deconv2d import
* add groups test
* remove check_allclose=False
* add tf_prelu
* add cpu group deconv impl
* remove useless lines
* remove useless lines
* add deconv2d
* add groups test
* remove check_allclose=False
* add tf_prelu
* auto format by CI
* add deconv2d impl
* add deconv2d impl
* remove useless lines
* add deconv2d in functional api
* auto format by CI
* auto format by CI
* Add variable initial
* Add variable initial
* auto format by CI
* add conv2d impl
* add conv2d impl
* auto format by CI
* remove useless lines
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
* Migrate the python layer logic of broadcastlike to functor (#7007)
* Migrate the python layer logic of broadcastlike to functor
* add var name
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
* Temporarily skip comm test cases (#7015)
* Temporarily skip comm test cases
* auto format by CI
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
* Fix nd_sbp attribute type and set nd_sbp in random functors (#7017)
* fix
* fix compile
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
* Save Job to IR and load Job from IR (#6885)
* save to ir
* test
* fix bugs
* impl load and test
* rm useless code
* fix conflict
* fix issues
* JobOp
* fix issues
* fix test_fuse_tril_scale
* fix test jit-outline-func
* fix test_mlir_opt.py
* save
* fix ods gen for max and avg pool
* rename oneflow to oneflow_foundation
* fix files checks
* refine
* refine
* refine
* refine
* refine
* refine
* refine
* refine
* refine
* refine
* refine
* refine
* refine
* refine
* refine
* refine
* auto format by CI
* check in changes
* refine
* Update oneflow/ir/test/OneFlow/test_mlir_opt.py
* Update oneflow/ir/include/OneFlow/OneFlowOps.td
* refine includes
* printer & parser & verifier
* code tidy
* tidy include
* address review
* rm duplicated GetDataTypeType
* TensorSource trait
Co-authored-by: jackalcooper <jackalcooper@gmail.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
* Fix Simple CI linkage (#6986)
* fix-simple-ci-linkage
* refine
* refine
* fix
* refine
* refine
* refine
* refine
* refine
* refine
* revert
* refine
* auto format by CI
* refine
* revert
* refine
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
* fix sbp when weight is optional (#6984)
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
* Feat from numpy (#7013)
* feat(Tensor): support share memory with ndarray
* test(FromNumpy): add test
* enhancement test and add document
* Fix merge error
* fix bug in numpy c api
* Fix(doctest): fix doctest error
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
* Add custom ShapeAttr in ODS (#7023)
* add ShapeAttr
* refine
* fix doc
* refine
* fix (#7028)
* Add linspace op (#7006)
* add linspace op
* refine doc
* refine
* fix comments
* fix comment
* auto format by CI
* fix ci doc error
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
* fix fasterrcnn infer (#7014)
* fix fasterrcnn infer
* roi_align 0shape
* refine
* refine
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
* separate kernel state and cache (#6655)
* support eager state except lazy dynamic
Signed-off-by: daquexian <daquexian566@gmail.com>
* modularize kernel contexts
Signed-off-by: daquexian <daquexian566@gmail.com>
* fix warning
Signed-off-by: daquexian <daquexian566@gmail.com>
* reformat
Signed-off-by: daquexian <daquexian566@gmail.com>
* remove duplicated license
Signed-off-by: daquexian <daquexian566@gmail.com>
* fix static check error
Signed-off-by: daquexian <daquexian566@gmail.com>
* make test gpu only
Signed-off-by: daquexian <daquexian566@gmail.com>
* temp
Signed-off-by: daquexian <daquexian566@gmail.com>
* revert opkernel context changes, align with master
Signed-off-by: daquexian <daquexian566@gmail.com>
* reformat
Signed-off-by: daquexian <daquexian566@gmail.com>
* refine cachecontext
Signed-off-by: daquexian <daquexian566@gmail.com>
* add separate cache context interface, remove out-dated files
Signed-off-by: daquexian <daquexian566@gmail.com>
* add init and cache context aliases
Signed-off-by: daquexian <daquexian566@gmail.com>
* update eager kernel
Signed-off-by: daquexian <daquexian566@gmail.com>
* fix wrong AttrMayChanged value
Signed-off-by: daquexian <daquexian566@gmail.com>
* rename and add comment
Signed-off-by: daquexian <daquexian566@gmail.com>
* auto format by CI
* fix combined_margin_loss_kernel.cpp
Signed-off-by: daquexian <daquexian566@gmail.com>
* rename op_kernel_state_wrapper.h to op_kernel_wrapper.h
Signed-off-by: daquexian <daquexian566@gmail.com>
* rename more classes, fix old cache in stateful op kernel
Signed-off-by: daquexian <daquexian566@gmail.com>
* rename more classes
Signed-off-by: daquexian <daquexian566@gmail.com>
* may changed -> not changed
Signed-off-by: daquexian <daquexian566@gmail.com>
* optimize away genrepeatedbn
Signed-off-by: daquexian <daquexian566@gmail.com>
* reformat
Signed-off-by: daquexian <daquexian566@gmail.com>
* refine
Signed-off-by: daquexian <daquexian566@gmail.com>
* update stateful local opkernel, use Cache** if possible
Signed-off-by: daquexian <daquexian566@gmail.com>
* remove TensorDesc4ArgNameAndIndex base method
Signed-off-by: daquexian <daquexian566@gmail.com>
* auto format by CI
* fix clang-tidy error
Signed-off-by: daquexian <daquexian566@gmail.com>
* auto format by CI
* fix conv kernel bug
Signed-off-by: daquexian <daquexian566@gmail.com>
* auto format by CI
* fix group conv bug and fix warning
Signed-off-by: daquexian <daquexian566@gmail.com>
* fix avgpool error
Signed-off-by: daquexian <daquexian566@gmail.com>
* fix maxpool error
Signed-off-by: daquexian <daquexian566@gmail.com>
* auto format by CI
* respect flag in deconv cpu kernel, rename cache to cache_ptr
Signed-off-by: daquexian <daquexian566@gmail.com>
* fix compile error
Signed-off-by: daquexian <daquexian566@gmail.com>
* auto format by CI
* fix deconv cache bug
Signed-off-by: daquexian <daquexian566@gmail.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: Li Xinqi <lixinqi2010@gmail.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
* Add fully support for all datatype (#7025)
* add fully support for all datatype
* Use max array size
* add clang-format off to maintain the matrix
* fix format
* remove redundant numpy dtype
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
* Migrate split python layer to functor (#7030)
* Migrate split python layer to functor
* modify dim
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
* Add add_sparse_optimizer for Graph (#6988)
* add_sparse_optimizer
* format
* fix bug
* refine new interface by discuss
* auto format by CI
* address review
* correct syntax
* correct error message
* rm debug print
* auto format by CI
* fix cpu-only test
Co-authored-by: XIE Xuan <xiexuanx2@gmail.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
* Refine RUN_CUDA_KERNEL (#7003)
* Refine RUN_CUDA_KERNEL
* Added LaunchConfig
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
* Support llvm in tree build (#6995)
* refine
* refine
* refine
* refine
* add61
* refine
* refine
* refine
* refine
* refine
* refine
* refine
* refine
* refine
* refine
* refine
* refine
* refine
* rm
* revert
* refine
* refine
* refine
* refine
* return_self_in_to_consistent_if_necessary (#7004)
* return_self_in_to_consistent_if_necessary
* fix error and add test case
* skip cpu test
* fix error
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
* Decouple ep and global (#7027)
* Decouple ep and global
* NOLINT
* fix
* fix import
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
* arange doc fix (#7035)
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
* add_consistency_check_in_consistent_tensor_set_data (#7002)
* add_consistency_check_in_consistent_tensor_set_data
* auto format by CI
* minor fix
* add just wrap
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
* [cmake] add liboneflow_cpp target (#7005)
* add cmake changes for liboneflow_cpp.so
Signed-off-by: daquexian <daquexian566@gmail.com>
* add separate target for cpp api test
Signed-off-by: daquexian <daquexian566@gmail.com>
* add cpp api test in ci
Signed-off-by: daquexian <daquexian566@gmail.com>
* reverse the order of cudnn and cuda library
Signed-off-by: daquexian <daquexian566@gmail.com>
* update logic of BUILD_MONOLITHIC_LIBONEFLOW
Signed-off-by: daquexian <daquexian566@gmail.com>
* rename BUILD_MONOLITHIC_LIBONEFLOW to BUILD_MONOLITHIC_LIBONEFLOW_CPP_SO
Signed-off-by: daquexian <daquexian566@gmail.com>
* share lib directory in test container
Signed-off-by: daquexian <daquexian566@gmail.com>
* add github actions debug
Signed-off-by: daquexian <daquexian566@gmail.com>
* Revert "add github actions debug"
This reverts commit 7d9aef6.
* add upterm debug after exe test
Signed-off-by: daquexian <daquexian566@gmail.com>
* sleep after fail
Signed-off-by: daquexian <daquexian566@gmail.com>
* set LD_LIBRARY_PATH in yml for cpp api test exe
Signed-off-by: daquexian <daquexian566@gmail.com>
* sleep
Signed-off-by: daquexian <daquexian566@gmail.com>
* upload liboneflow_cpp.so
Signed-off-by: daquexian <daquexian566@gmail.com>
* modify cmake to trigger compilation
Signed-off-by: daquexian <daquexian566@gmail.com>
* remove sleep
Signed-off-by: daquexian <daquexian566@gmail.com>
* build cpp api in cpu mode
Signed-off-by: daquexian <daquexian566@gmail.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
* Fix CUDA 52 and add it to CI (#7031)
* refine
* refine
* refine
* refine
* revert
* fix
* refine
* refine
* refine
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
* Add check of placement constructor (#6991)
* add_check_of_placement_constructor
* move CheckDeviceIdsIsValid to runtime
* handle comment
* fix error
* fix error
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
* Fix(FromNumpy): fix bug in stride (#7042)
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
* add non virtual destructor back (#6999)
Signed-off-by: daquexian <daquexian566@gmail.com>
Co-authored-by: Houjiang Chen <chenhoujiangcug@gmail.com>
* move python code to cpp: eye (#7036)
* 80% Sbp signature left to finish
* refine functional_api.yaml
* 90% docstr left to update
* refine
* add sbp check
* refine docs
* auto format by CI
* refine
* refine docstr
* auto format by CI
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
* Fix l2norm block_size (#7044)
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
* fix undefined symbol: cudaGetDeviceCount (#7052)
* fix_worker_orphan_process (#7048)
* fix_worker_orphan_process
* use SIGTERM instead
* broadcast elemwise binary (#6871)
* add
* broadcast elementwise binary
* fix
* refine
* fix
* refine
* refine
* for compile
* refine
* refine
* refine
* refine
* refine
* revert kernels
* revert kernel
* refine
* refine
* refine
* refine
* nvcc thread to 4
Co-authored-by: ZZK <42901638+MARD1NO@users.noreply.github.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
* Source op per critical section (#6472)
* backup code
* EventRecord
* auto format by CI
* backup code
* remove deprecated binary test cases
* refactor volatile to atomic
* add StreamType::InitInstructionStatusIf/StreamType::DeleteInstructionStatusIf
* merge from branch profiling_nn_graph
* address comments
* EventRecordProvider
* more comments for XXXStatusQuerier::SetLaunched
* more comments for SharedEventRecord::Init
* wait source op per critical section
* rename a task_node.cpp
* minor fix
* backup code
* fix compiler complaints
* 1) remove AddCtrlEdgeBetweenSrcDstTickAndInputOutputInSameRank; 2) create CriticalSectionInstance buffers
* fix compiler complaints
* more profiler code
* refactor vm preschedule
* TryMoveFromWaitingToReady
* revert flying_instruction_cnt
* revert to single position to call DispatchInstruction
* revert several code
* reset instruction watermark
* remove is_xxx_hook_empty
* build with profiler
* merge master
* insert device ticks before and after critical sections
* refactor register_num of cs_wait/cs_callback from 2 to 128
* fix static analysis complaints
* fix compiler complaints about JobBuilder::ParallelConf4OpName
* Update oneflow/core/operator/critical_section_wait_tick_op.cpp
Co-authored-by: daquexian <daquexian566@gmail.com>
* address pr comments
* add job example for InstructionsBuilder::LaunchLazyJob
* address pr comments
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Co-authored-by: ouyangyu <xuanjiuye@gmail.com>
Co-authored-by: daquexian <daquexian566@gmail.com>
* More details of error of getting op matched sbp signature (#7077)
* more details of error msg
* minor change
* address review comment
* avoid namesake iterator
* Module apply only once (#7055)
* add once apply of param
* apply once on buffer
* test reuse var on module to
* test reuse var
* rm useless test
* finish test
* refine test
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
* distributed test bugfix (#7057)
* change spawn_shell to spawn_shell_and_check, sleep in script
Signed-off-by: daquexian <daquexian566@gmail.com>
* fix distributed test master addr
Signed-off-by: daquexian <daquexian566@gmail.com>
* remove sleep
Signed-off-by: daquexian <daquexian566@gmail.com>
* spawn_shell -> spawn_shell_ignoring_failure
Signed-off-by: daquexian <daquexian566@gmail.com>
* auto format by CI
* fix bug
Signed-off-by: daquexian <daquexian566@gmail.com>
* auto format by CI
* fix the reversed logic
Signed-off-by: daquexian <daquexian566@gmail.com>
* improve error msg
Signed-off-by: daquexian <daquexian566@gmail.com>
* resolve name conflict of MASTER_ADDR
Signed-off-by: daquexian <daquexian566@gmail.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
* fix promote_type matrix (#7066)
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
* fix chunk op dim=-1 bug (#7073)
* fix chunk op dim=-1 bug
* Update oneflow/core/functional/impl/array_functor.cpp
Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>
* Update oneflow/core/functional/impl/array_functor.cpp
Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>
Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
* Fix resource desc dump cudnn conf bug (#7038)
* fix Resource::DumpCudnnConf
* fix typo and error msg
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
* fix concat bug (#7075)
* fix
* support concat single input
* Clean TensorNameScope after graph build (#7076)
* Clear tensor name scope after graph build
* Add test case of 2 graph caught same free eager tensor
* auto format by CI
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
* fix_abnormal_printing (#7099)
* Fix bias add dropout fuse (#7081)
* fix bias_add dropout fuse when p=0.0
* remove redundant op
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
* Support 1d to 2d eager boxing (#7083)
* fix Resource::DumpCudnnConf
* support_1d_to_2d_eager_boxing
* rename stack to unflatten
* add test case
* of format
* refine test case
* Revert "fix Resource::DumpCudnnConf"
This reverts commit f07278d.
* support nd to 1d * add 2d to 1d test case Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> * Implement all User Ops with Op Schema (#7032) * add oneflow-tblgen: generate op schema (OpInterpCtx) from ods * cmake: add inja * tblgen: add oneflow_datatype * tblgen: use option cat * tblgen: fix error * tblgen: put impl in .cpp * tblgen: fix null attrs * tblgen: fix null ops * refine * refine * reifne * Refine op schema template and compilation * add base OpInterpCtx to finish compilation * fix * refine * fix * add custom infer code * generate op registrants automatically * refine * fix * update user op ods and fix shape attr * refine * refine * add custom code in op base * refine comments * add same_output_regst_num and infer * support declare hasxx * update op schema emitter * refine * emit output regist num * refine * refine * migrate acc op * migrate onerec_reader, ones_like, send, pack and padding ops * add has_sbp_signature_infer_fn * refine * migrate pad, parallel_cast, partial_fc and pooling ops * rm redundant has_device_infer_fn * migrate prelu, quantization, randperm, reduce and repeat ops * migrate reshape, reshape_like, roi_align, same_pad, selu and scalar related ops * back port * backport * migrate ops * refine * refine * refine * refine * add new op * fix llvm not found * fix mlir headers * fix mlir headers * fix llvm not found * irefine * mark override * fix merge * fix * fix * set op schema as obj lib to speed up * rewrite ops * add addn * add grdi * refien * add more def (#7051) * affine grid * refien * refine * refine * refine * fix * refien * refine * refine * refine * refine * refine * refien * refine * refine * refein * refine * refine * refine * refine * refien * refine * refine * refine * refien * refien * refien * refine * refine * refien * refine * refine * refine * refein * refine * refine * refine * refine * refine * refien * refine * refine * refine * refine * refine * refine * refine * refine * refine * 
refine * refine * refine * refine * refine * refine * refine * refine * refein * refine * refine * refine * move more ops * fix math_binary_broadcast/elementwise_ops * fix hardtanh * add norm * rename file and add CpuOnly no_grad * fix ir & fix norm op * fix oneflow-tblgen * fix math_unary_elementwise_op * fix norm * fix bn * fix op schema * refine * fix * refine physical_tensor_desc_infer_fn * refine * add ScalarLogicalNotEqualOp & RecvOp * refine * auto format by CI * fix fmt * add cuda only trait * delete unused inja * del inja_copy_headers_to_destination * delete unused inja * del inja_copy_headers_to_destination * add cuda only to tblgen * fix json inja url and md5 not used * fix json inja url and md5 not used * refine * revert * add with cuda * refine * delete GenUserOpODS * remove cuda only * revert cuda only after meeting * fix Co-authored-by: PragmaTwice <i@twice.moe> Co-authored-by: hjchen2 <chenhoujiangcug@gmail.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * Feat/debug pass (#7054) * add pass debug * debug pass * refine comment of fuse add pass * auto format by CI Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> * Fix error message (#6930) * fix error message * fix dot doc * fix dot elem cnt * auto format by CI Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> * fix simple ci: add of_op_schema target to tidy check (#7105) Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> * Rename AnyType in .td (#7109) * AnyType => Tensor * refine * refine Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> * Feat graph reuse var (#7080) * add once apply of param * apply once on buffer * test reuse var on module to * test resue var * rm useless test * finish test * refine test * Clear tensor name scope after graph build * Add test 
case of 2 graph caught same free eager tensor * auto format by CI * refactor var build draft * add full func; add check * done * add test of call parameter ousite its moudule * fix break test Co-authored-by: chengtbf <472491134@qq.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> * Fix l2_normalize & add nn.functional.normalize (#6940) * fix l2_normalize * add normalize * add test for normalize * refine * clean l2_normalize and refine normalize * simplify normalize test * Fix l2norm block_size * refine Co-authored-by: Juncheng <liujuncheng1022@gmail.com> Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> * Align api in swin transformer (#7058) * add linspace op * fix align error in swintransformer * add @ magic method * fix conflict * support tensor list * fix meshgrid bug * revert Co-authored-by: hjchen2 <chenhoujiangcug@gmail.com> * set CMAKE_LINK_DEPENDS_NO_SHARED to ON (#7063) Signed-off-by: daquexian <daquexian566@gmail.com> Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> * Add other api graph autotest (#7091) * Clear tensor name scope after graph build * Add test case of 2 graph caught same free eager tensor * auto format by CI * add other api graph autotest * add more samples * fix comments * refine * refine * refine * refine * refine * fix error * fix test error * fix bug * fix flip bug * fix bug * fix bug * fix ci bug * fix ci error * fix bug * fix ci error Co-authored-by: chengtbf <472491134@qq.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Co-authored-by: Li Xiang <54010254+lixiang007666@users.noreply.github.com> * [serving] dev graph run (#7008) * add cmake changes for liboneflow_cpp.so Signed-off-by: daquexian <daquexian566@gmail.com> * add separate target for cpp api test Signed-off-by: daquexian 
<daquexian566@gmail.com> * add cpp api test in ci Signed-off-by: daquexian <daquexian566@gmail.com> * graph run * reverse the order of cudnn and cuda library Signed-off-by: daquexian <daquexian566@gmail.com> * update logic of BUILD_MONOLITHIC_LIBONEFLOW Signed-off-by: daquexian <daquexian566@gmail.com> * rename BUILD_MONOLITHIC_LIBONEFLOW to BUILD_MONOLITHIC_LIBONEFLOW_CPP_SO Signed-off-by: daquexian <daquexian566@gmail.com> * refine * [draft] implement graph parameter load and save (#7010) * implement parameter save (python) and load (c++) Signed-off-by: daquexian <daquexian566@gmail.com> * revert accident changes Signed-off-by: daquexian <daquexian566@gmail.com> * fix circular reference Signed-off-by: daquexian <daquexian566@gmail.com> * pimpl * batching * share lib directory in test container Signed-off-by: daquexian <daquexian566@gmail.com> * fix typo; * add github actions debug Signed-off-by: daquexian <daquexian566@gmail.com> * Revert "add github actions debug" This reverts commit 7d9aef6. 
* add upterm debug after exe test
Signed-off-by: daquexian <daquexian566@gmail.com>
* sleep after fail
Signed-off-by: daquexian <daquexian566@gmail.com>
* set LD_LIBRARY_PATH in yml for cpp api test exe
Signed-off-by: daquexian <daquexian566@gmail.com>
* refine
* add test file && input order
* sleep
Signed-off-by: daquexian <daquexian566@gmail.com>
* upload liboneflow_cpp.so
Signed-off-by: daquexian <daquexian566@gmail.com>
* modify cmake to trigger compilation
Signed-off-by: daquexian <daquexian566@gmail.com>
* load job from ir && clean && add mlir model
* [remove useless python code]save to .pb
* add target of_common_obj to remove duplicate REGISTER_PASS && run of_format
* remove openvino
* remove openvino test
* refine
* IValue
* Update oneflow/api/cpp/framework/graph.h
Co-authored-by: daquexian <daquexian566@gmail.com>
* refine
* refine
* refine
* refine
* refine
* refine
* rename in oneflow.cmake
* refine oneflow.cmake
* make of_api_common object library
* move device util function in api to core
* remove device check in New and ThreadLocalGetOrNew
* refine
* fix device test
* refine graph test
* refine GetExeDir()
* refine GetExeDir() again
* fix
* refine
* fix
Co-authored-by: daquexian <daquexian566@gmail.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Co-authored-by: mosout <mosout@qq.com>
* disable autograd in lazy mode (#7070)
* disable autograd in lazy mode
* refine
* Fix/rand source op in graph (#7092)
* add test
* fix rand consistent
* add test
* Fix powf (#7106)
* quick fix power
* add int scalar test case
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
* Dispatch stateful ops using functional api (#7046)
* Dispatch functional stateful ops
* fix
* fix cmake
* fix
* disable attr check since it may not given when creating op expr.
* fix
* fix
* fix
* fix
* fix
* fix
* fix
* fix
* refine
Co-authored-by: VertexC <bob2420083992@gmail.com>
* Fix HWLoc memory affinity (#7115)
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
* add_env_api_docs (#7100)
* add_env_api_docs
* minor fix
* fix grammatical errors
Co-authored-by: Yao Chi <later@usopp.net>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
* tmp skip s0 print because of slice (#7065)
* tmp skip s0 print because of slice
* tmp skip s0 print in test case
* fix
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
* indexing first version (#7012)
* indexing first version
* complete
* test
* out loop
* test skip
* revise
* revise
* shape
* docs
* formatted
* conflict1
* conflict2
* conflict2
* conflict
* revise
* auto format by CI
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
* fix maybe: add Maybe(T&&) to allow constructing from rvalue T (#7125)
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
* autotest_add_graph_log (#7126)
* Meta info consistency check (#7085)
* meta_info_consistency_check
* refine check function
* Update consistent_cast.cpp
* move check to opinterpreter
* refine
* add note
* refactor MetaInfoConsistencyCheck
* of_format
* refine
* NonRecursiveMetaInfoConsistencyCheck
* fix func name
* add IsMetaInfoConsistencyCheckDisable()
* minor fix
* refine
* minor fix
* format
* minor fix
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
* cmake: use interface target instead of include_directories in pybind11 (#7128)
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
* Import cmake dependence json and inja using FetchContent (#7124)
* import cmake dependence json and inja using FetchContent
* install-llvm: fix url hash
* fix inja config
* add cache var
* fix ninja build
* fix ninja build
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
* Add environment variable to set GRPC_ARG_MAX_MESSAGE_LENGTH (#7130)
* env ONEFLOW_GRPC_MAX_MESSAGE_BYTE_SIZE
* set default to -1
* Fea/nhwc (#6811)
* legacy maxpool2d module
* add legacy avgpool2d
* add graph cudnn conv alg config
* add conv2d nhwc
* lazy create cuda_stream in CudaCopyD2HDeviceCtx CudaStreamHandleDeviceCtx
* refine
* conv bn pool nhwc for resnet perf
* one hot with float
* use BiasAddRowGpu
* rm l2 with 0
* reformat
* add nhwc env var
* legacy pool merged into new
* refine
* fix style
* fix and refine
* address review
* fix and refine
* fix doc test
Co-authored-by: luyang <flowingsun007@163.com>
Co-authored-by: guo-ran <360112263@qq.com>
Co-authored-by: lixinqi <lixinqi0703106@163.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
* reduce memory usage caused by slice grad (#7144)
* cmake: fix THIRD_PARTY build (#7146)
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
* fix fold op (#7156)
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
* Support inplace for lazy consistent (#7112)
* Support inplace for lazy consistent
* fix single client sbp hint
* refine
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
* fix prelu bug (#7118)
* support dtype and device in prelu
* optimize PreluFunctor
* fix prelu 1-dim error
* update
* update
* auto format by CI
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
* use ibn2nd_sbp to get nd_sbp (#7155)
Co-authored-by: Houjiang Chen <chenhoujiangcug@gmail.com>
* fix copy bug (#7159)
* fix copy bug
* add to test case
* refine
* fix test case
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
* Fix laynorm backward bug (#7164)
* fix layernorm backward index bug
* add
layernorm test case * auto format by CI Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> * [Fix] graph support 0-Size tensor (#6957) * Add nn.functional.glu graph test * add filter to motify functional autotest * motify code * add test example * add test else * add test judging condition for test_masked_fill.py,test_constant.py,test_tile.py、test_repeat.py,test_expand.py * add test ok example * Clear tensor name scope after graph build * Add test case of 2 graph caught same free eager tensor * auto format by CI * Dev cc clean tensor name scope (#7082) * Clear tensor name scope after graph build * Add test case of 2 graph caught same free eager tensor * auto format by CI Co-authored-by: chengtbf <472491134@qq.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * submit test success example * test success example * submit test code * fix a bug about relu module with 0 shape data * fixed a bug about relu module with 0 shape data * fix a bug about relu module with 0 shape data * fix a bug about relu module with 0 shape data * 0shape and 0d autotest * fix a bug about relu module with 0 shape data * 0shape changed to 0_size * modify test_var.py * modify test_eye.py * modify test_reshape.py * modify test_.py * modify ReshapeFunctor * modify some file * Fixed graph autotest bug with reshape op test * Fixed graph autotest bug with reshape op test * fixed test_sub.py * modify test_sub.py * modify tensor_methods.cpp * modify array_functor.cpp * graph support 0-Size tensor * rename 0shape to 0 size * modified check_graph=True * fix and refine Co-authored-by: Zhenhua <huangzhenhua@zhejianglab.com> Co-authored-by: tangnana925 <85614052+tangnana925@users.noreply.github.com> Co-authored-by: tangnana <tnn_personal@163.com> Co-authored-by: Zhenhua <1209435+hengzi@users.noreply.github.com> Co-authored-by: chengtbf <472491134@qq.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> 
Co-authored-by: Xiaoyu Xu <xiaoyulink@gmail.com> Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> * Cumsum op implementation (#7050) * add cumsum op's forward definition * add cumsum forward test case * cumsum ver3 * remove calculating time * add cumsum forward gpu implementation * fix gpu forward error * change var name * remove annotation * add cumsum cpu forward multi-thread support * add multi-thread annotation * add cumsum grad definition * update * add cumsum cpu backward * add cumsum cpu backward functor * add cumsum autograd * update * remove user interface * use random method to test cumsum forward * add cumsum gpu backward * add cumsum gpu test * fix gpu backward bug * add a 3d cuda kernel try * Revert "add cumsum gpu test" This reverts commit 05c31556ba28ecb827b25e54c2f5fa38984e8096. * Revert "Revert "add cumsum gpu test"" This reverts commit 918ee1569863b008c1d419c3528257416cffd840. * change nele to ele_cnt * add test_cumsum.py in oneflow/test/modules * change original test_cumsum to autotest version * optimize cumsum for special up_space and down_space * add two special cu func * add cumsum doc * update doc * update doc * update code according to bbuf's review * ditto * change pin/pout to in_ptr/out_ptr * remove multi-thread func * update doc * use tensor processor * update by review * update by review * update * update * auto format by CI * auto format by CI * update doc * update Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * Logical slice in tenosr str (#7116) * using logical slice in tensor str * add tensor str util file * refine * refine * refine * refine * add logical slice docs * fix bug * fix comment * auto format by CI * fix doc test bug * delete TODO Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> * Add install for oneflow py (#7107) * Add install for oneflow py * refine * refine * refine * refine * refine * refine * 
refine * refine * refien * refine * refine * refine * refine * refine * refine * refine * refine * refine * refine Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> * fix bug: output key not exists when SavaJobToIR (#7139) * fix bug: output key not exists when SavaJobToIR * [test] makedirs when path not exists * remove useless comment Co-authored-by: Peihong Liu <mosout@qq.com> Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> * Add linalg 2d norm op for clip_grad (#7160) * add linalg_2d_norm op for clip_grad * code format * revert sqrt * fix comment * refine * fix comment * fix ci error * fix ci error * fix docs bug * fix ci error * fix ci error Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> * refine nn.graph autotest (#7111) * add linspace op * refine graph autotest * revert * add graph error trace * fix bug * fix autotest bug * auto format by CI * fix set_printoptions error * auto format by CI * CI test bug * auto format by CI * For CI * auto format by CI * For CI test * fix ci error * revert for ci * fix bug * fix ci error * fix bug * fix bug Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: Li Xiang <54010254+lixiang007666@users.noreply.github.com> Co-authored-by: lixiang <88304454@qq.com> * add oneflow/pytorch cudnn.deterministic (#7172) * add cudnn.deterministic * fix bug * auto format by CI * fix bug * fix generate fake program input bug * auto format by CI Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * fix linalg vector norm scalar tensor print bug (#7178) * fix linalg vector norm scalar tensor print bug * auto format by CI Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> * format * refine * format Co-authored-by: Yinggang Wang <wyg19970408@gmail.com> Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com> Co-authored-by: 
oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: liufengwei0103 <2472937968@qq.com> Co-authored-by: daquexian <daquexian566@gmail.com> Co-authored-by: guo ran <360112263@qq.com> Co-authored-by: Peihong Liu <mosout@qq.com> Co-authored-by: ZZK <42901638+MARD1NO@users.noreply.github.com> Co-authored-by: Shenghang Tsai <jackalcooper@gmail.com> Co-authored-by: Li Xiang <54010254+lixiang007666@users.noreply.github.com> Co-authored-by: cheng cheng <472491134@qq.com> Co-authored-by: Li Xinqi <lixinqi2010@gmail.com> Co-authored-by: lichunyou <33850693+lcylcy@users.noreply.github.com> Co-authored-by: Luyang <flowingsun007@163.com> Co-authored-by: wyushun <wyushun@foxmail.com> Co-authored-by: zhu wang <33675639+olojuwin@users.noreply.github.com> Co-authored-by: leaves-zwx <kunta0932@gmail.com> Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com> Co-authored-by: Shijie <821898965@qq.com> Co-authored-by: XIE Xuan <xiexuanx2@gmail.com> Co-authored-by: Juncheng <liujuncheng1022@gmail.com> Co-authored-by: binbinHan <han_binbin@163.com> Co-authored-by: CHI LIU <42956025+thinksoso@users.noreply.github.com> Co-authored-by: Yao Chi <later@usopp.net> Co-authored-by: ouyangyu <xuanjiuye@gmail.com> Co-authored-by: Xiaoyu Xu <xiaoyulink@gmail.com> Co-authored-by: PragmaTwice <i@twice.moe> Co-authored-by: luqiang guo <702572275@qq.com> Co-authored-by: ZeKai Zhou <30856589+zzk0@users.noreply.github.com> Co-authored-by: VertexC <bob2420083992@gmail.com> Co-authored-by: lixinqi <lixinqi0703106@163.com> Co-authored-by: fengdaozhuo <52237830+grybd@users.noreply.github.com> Co-authored-by: Zhenhua <huangzhenhua@zhejianglab.com> Co-authored-by: tangnana925 <85614052+tangnana925@users.noreply.github.com> Co-authored-by: tangnana <tnn_personal@163.com> Co-authored-by: Zhenhua <1209435+hengzi@users.noreply.github.com> Co-authored-by: lixiang <88304454@qq.com>
Oneflow-Inc · Jan 4, 2022 · 322a36b
1 parent 9e59561
Showing 815 changed files with 38,707 additions and 18,043 deletions.
diff --git a/.github/workflows/canary.yml b/.github/workflows/canary.yml
@@ -4,7 +4,7 @@ on:
   push:
     branches:
       - master
-      - add-canary-release
+      - add-support-clang-12
   workflow_dispatch:
     inputs:
       oneflow-ref:
@@ -43,7 +43,7 @@ jobs:
       - name: Checkout Oneflow-Inc/oneflow
         if: ${{ github.event.inputs.oneflow-ref == '' }}
         uses: actions/checkout@v2
-      - uses: Oneflow-Inc/get-oneflow@canary-release
+      - uses: Oneflow-Inc/get-oneflow@support-clang-12
         name: Build manylinux
         id: build-cuda
         with:

diff --git a/.github/workflows/simple.yml b/.github/workflows/simple.yml
@@ -50,7 +50,7 @@ jobs:
           cmake .. -C ../cmake/caches/international/cpu.cmake \
             -DCMAKE_BUILD_TYPE=Release \
             -DBUILD_TESTING=ON
-          cmake --build . -j$(nproc) --target oneflow_deps of_cfgobj of_protoobj of_functional_obj of_functional_tensor_obj
+          cmake --build . -j$(nproc) --target oneflow_deps of_cfgobj of_protoobj of_functional_obj of_functional_tensor_obj of_op_schema
       - name: Run clang-tidy for all translation units
         # use clang as compiler for correct compiler flags
         run: |
@@ -247,7 +247,7 @@ jobs:
           repository: Oneflow-Inc/conda-env
           ref: 30a7f00eb48ee9009d85a848e720823e5054c66b
           path: conda-env
-      - uses: Oneflow-Inc/get-oneflow@canary-release
+      - uses: Oneflow-Inc/get-oneflow@support-clang-12
         name: Build with gcc7
         if: ${{ matrix.build-type == 'gcc7'}}
         with:
@@ -256,7 +256,7 @@ jobs:
           oneflow-build-env: conda
           conda-env-file: conda-env/dev/gcc7/environment-v2.yml
           conda-env-name: oneflow-dev-gcc7-v2
-      - uses: Oneflow-Inc/get-oneflow@canary-release
+      - uses: Oneflow-Inc/get-oneflow@support-clang-12
         name: Build with clang10
         if: ${{ matrix.build-type == 'clang10'}}
         with:

diff --git a/.github/workflows/test.yml b/.github/workflows/test.yml
@@ -182,7 +182,7 @@ jobs:
         with:
           ref: ${{ github.event.pull_request.head.sha }}
           repository: ${{github.event.pull_request.head.repo.full_name}}
-      - uses: Oneflow-Inc/get-oneflow/cache-complete/matrix/build@canary-release
+      - uses: Oneflow-Inc/get-oneflow/cache-complete/matrix/build@support-clang-12
         name: find cache
         id: find-cache
         timeout-minutes: 5
@@ -228,7 +228,7 @@ jobs:
         with:
           ref: ${{ github.event.pull_request.head.sha }}
           repository: ${{github.event.pull_request.head.repo.full_name}}
-      - uses: Oneflow-Inc/get-oneflow/cache-complete@canary-release
+      - uses: Oneflow-Inc/get-oneflow/cache-complete@support-clang-12
         name: Save cache if successful
         id: save-cache
         timeout-minutes: 5
@@ -242,7 +242,7 @@ jobs:
         run: |
           echo "::error file=test.yml,line=204,col=10::steps.save-cache.outputs.cache-hit != matrix.cache-hit"
           exit 1
-      - uses: Oneflow-Inc/get-oneflow@canary-release
+      - uses: Oneflow-Inc/get-oneflow@support-clang-12
         name: Build manylinux cpu only
         id: build-cpu
         if: ${{ matrix.entry =='cpu' && !matrix.cache-hit }}
@@ -263,7 +263,7 @@ jobs:
           python-versions: |
             3.6
             3.7
-      - uses: Oneflow-Inc/get-oneflow@canary-release
+      - uses: Oneflow-Inc/get-oneflow@support-clang-12
         name: Build manylinux cu102
         id: build-cuda
         if: ${{ matrix.entry =='cu102' && !matrix.cache-hit }}
@@ -284,7 +284,7 @@ jobs:
           python-versions: |
             3.6
             3.7
-      - uses: Oneflow-Inc/get-oneflow@canary-release
+      - uses: Oneflow-Inc/get-oneflow@support-clang-12
         name: Build manylinux cu101_xla
         id: build-xla
         if: ${{ matrix.entry =='cu101_xla' && !matrix.cache-hit && needs.changed_files.outputs.should_run_single_client_tests == '1' }}
@@ -306,7 +306,7 @@ jobs:
             3.6
       - name: Upload bin
         if: ${{ !fromJson(matrix.cache-hit) && contains(matrix.runs-on, 'self-hosted') && (steps.build-cpu.outcome == 'success' || steps.build-cuda.outcome == 'success' || steps.build-xla.outcome == 'success') }}
-        uses: Oneflow-Inc/get-oneflow/digest/upload@canary-release
+        uses: Oneflow-Inc/get-oneflow/digest/upload@support-clang-12
         timeout-minutes: 10
         with:
           digest: ${{ steps.save-cache.outputs.build-digest }}
@@ -315,9 +315,20 @@ jobs:
           ssh-tank-path: ${{ env.SSH_TANK_PATH }}
           src-dir: ${{ env.MANYLINUX_CACHE_DIR }}/build/bin
           dst-dir: bin
+      - name: Upload liboneflow_cpp library
+        if: ${{ !fromJson(matrix.cache-hit) && contains(matrix.runs-on, 'self-hosted') && (steps.build-cpu.outcome == 'success' || steps.build-cuda.outcome == 'success') }}
+        uses: Oneflow-Inc/get-oneflow/digest/upload@support-clang-12
+        timeout-minutes: 10
+        with:
+          digest: ${{ steps.save-cache.outputs.build-digest }}
+          entry: ${{ matrix.entry }}
+          ssh-tank-host: ${{ env.SSH_TANK_HOST }}
+          ssh-tank-path: ${{ env.SSH_TANK_PATH }}
+          src-dir: ${{ env.MANYLINUX_CACHE_DIR }}/build/liboneflow_cpp/lib
+          dst-dir: liboneflow_cpp/lib
       - name: Upload whl
         if: ${{ !fromJson(matrix.cache-hit) && contains(matrix.runs-on, 'self-hosted') && (steps.build-cpu.outcome == 'success' || steps.build-cuda.outcome == 'success' || steps.build-xla.outcome == 'success') }}
-        uses: Oneflow-Inc/get-oneflow/digest/upload@canary-release
+        uses: Oneflow-Inc/get-oneflow/digest/upload@support-clang-12
         timeout-minutes: 10
         with:
           digest: ${{ steps.save-cache.outputs.build-digest }}
@@ -331,14 +342,19 @@ jobs:
     name: Build with clang
     if: github.event.pull_request.draft == false && github.base_ref == 'master' && contains(github.event.pull_request.requested_reviewers.*.login, 'oneflow-ci-bot')
     runs-on: [self-hosted, linux, build]
+    env:
+      ONEFLOW_SRC: .
+      MANYLINUX_CACHE_DIR: ~/manylinux-cache-dir/clang13
+      CUDA_VERSION: "10.1"
+      WHEELHOUSE_DIR: ./wheelhouse
     steps:
       - name: Fix permissions
         run: |
           set -x
           docker run --rm -v $PWD:$PWD -w $PWD busybox rm -rf *
       - name: Checkout Oneflow-Inc/oneflow
         uses: actions/checkout@v2
-      - uses: Oneflow-Inc/get-oneflow/cache-complete@canary-release
+      - uses: Oneflow-Inc/get-oneflow/cache-complete@support-clang-12
         name: Save cache if successful
         id: save-cache
         timeout-minutes: 5
@@ -347,25 +363,26 @@ jobs:
           entry: build-with-clang
           digest-type: build
           mark-as-completed: ${{ github.event.pull_request.head.repo.full_name == github.repository }}
-      - name: Checkout Oneflow-Inc/conda-env
-        if: ${{ !fromJSON(steps.save-cache.outputs.cache-hit) }}
-        uses: actions/checkout@v2
-        with:
-          repository: Oneflow-Inc/conda-env
-          ref: 30a7f00eb48ee9009d85a848e720823e5054c66b
-          path: conda-env
-      - uses: Oneflow-Inc/get-oneflow@canary-release
+      - name: Build with Clang
+        uses: Oneflow-Inc/get-oneflow@support-clang-12
         if: ${{ !fromJSON(steps.save-cache.outputs.cache-hit) }}
-        name: Build with clang10
         with:
-          cmake-init-cache: cmake/caches/ci/gh-hosted/cpu-clang.cmake
-          oneflow-src: .
-          oneflow-build-env: conda
-          conda-env-file: conda-env/dev/clang10/environment-v2.yml
-          conda-env-name: oneflow-dev-clang10-v2
-          conda-installer-url: https://oneflow-static.oss-cn-beijing.aliyuncs.com/downloads/conda-installers/Miniconda3-py39_4.10.3-Linux-x86_64.sh
-          conda-prefix: ~/miniconda3-prefixes/py39_4.10.3
+          cmake-init-cache: ${{ env.ONEFLOW_SRC }}/cmake/caches/ci/llvm/cuda-75-clang.cmake
+          build-script: ${{ env.ONEFLOW_SRC }}/ci/clang/build-llvm.sh
+          oneflow-src: ${{ env.ONEFLOW_SRC }}
+          oneflow-build-env: llvm
+          wheelhouse-dir: ${{ env.WHEELHOUSE_DIR }}
+          clear-wheelhouse-dir: true
           self-hosted: true
+          cuda-version: ${{ env.CUDA_VERSION }}
+          manylinux-cache-dir: ${{ env.MANYLINUX_CACHE_DIR }}
+          docker-run-use-system-http-proxy: false
+          docker-run-use-lld: false
+          retry-failed-build: true
+          clean-ccache: ${{ contains(github.event.pull_request.labels.*.name, 'need-clean-ccache') }}
+          wheel-audit: false
+          python-versions: |
+            3.8
 
   find-test-cache:
     name: "Find test cache"
@@ -382,7 +399,7 @@ jobs:
         with:
           ref: ${{ github.event.pull_request.head.sha }}
           repository: ${{github.event.pull_request.head.repo.full_name}}
-      - uses: Oneflow-Inc/get-oneflow/cache-complete/matrix/test@canary-release
+      - uses: Oneflow-Inc/get-oneflow/cache-complete/matrix/test@support-clang-12
         name: find cache
         id: find-cache
         timeout-minutes: 5
@@ -424,7 +441,7 @@ jobs:
         if: ${{ contains(matrix.runs-on, 'self-hosted') }}
         run: |
           docker rm -f ${{ env.TEST_CONTAINER_NAME }} || true
-      - uses: Oneflow-Inc/get-oneflow/cache-complete@canary-release
+      - uses: Oneflow-Inc/get-oneflow/cache-complete@support-clang-12
         name: Save cache if successful
         id: save-cache
         timeout-minutes: 5
@@ -438,9 +455,9 @@ jobs:
         run: |
           echo "::error file=test.yml,line=204,col=10::steps.save-cache.outputs.cache-hit != matrix.cache-hit"
           exit 1
-      - name: Download wheel and binary
+      - name: Download wheel, binary and liboneflow_cpp lib
         if: ${{ !fromJson(matrix.cache-hit) && contains(matrix.runs-on, 'self-hosted') && (!fromJson(matrix.is-xla) || (fromJson(matrix.is-xla) && needs.changed_files.outputs.should_run_single_client_tests == '1')) }}
-        uses: Oneflow-Inc/get-oneflow/digest/download@canary-release
+        uses: Oneflow-Inc/get-oneflow/digest/download@support-clang-12
         id: download-digest
         timeout-minutes: 10
         with:
@@ -492,13 +509,15 @@ jobs:
         working-directory: ${{ env.ONEFLOW_SRC }}
         env:
           ONEFLOW_BIN_PATH: ${{ steps.download-digest.outputs.entry-dir }}/bin
+          ONEFLOW_CPP_API_LIB_PATH: ${{ steps.download-digest.outputs.entry-dir }}/liboneflow_cpp/lib
         run: |
           docker run -d --rm --privileged --shm-size=8g \
             --cap-add=SYS_PTRACE --security-opt seccomp=unconfined \
             --runtime=nvidia \
             -v /dataset:/dataset:ro -v /model_zoo:/model_zoo:ro \
             -v ${ONEFLOW_WHEEL_PATH}:${ONEFLOW_WHEEL_PATH}:ro \
             -v ${ONEFLOW_BIN_PATH}:${ONEFLOW_BIN_PATH}:ro \
+            -v ${ONEFLOW_CPP_API_LIB_PATH}:${ONEFLOW_CPP_API_LIB_PATH}:ro \
             -v $HOME/test-container-cache/dot-local:/root/.local \
             -v $HOME/test-container-cache/dot-cache:/root/.cache \
             -e ONEFLOW_WHEEL_PATH=${ONEFLOW_WHEEL_PATH} \
@@ -527,11 +546,13 @@ jobs:
         run: |
           docker exec ${{ env.TEST_CONTAINER_NAME }} python3 -m oneflow --doctor
       - name: Exe test
-        if: ${{ !fromJson(matrix.cache-hit) && matrix.test-type == 'misc' }}
+        if: ${{ !fromJson(matrix.cache-hit) && matrix.test-type == 'misc' && matrix.device == 'cpu' }}
         timeout-minutes: 10
         run: |
           chmod +x ${{ steps.download-digest.outputs.entry-dir }}/bin/oneflow_testexe
           docker exec ${{ env.TEST_CONTAINER_NAME }} ${{ steps.download-digest.outputs.entry-dir }}/bin/oneflow_testexe
+          chmod +x ${{ steps.download-digest.outputs.entry-dir }}/bin/oneflow_cpp_api_testexe
+          docker exec -e LD_LIBRARY_PATH=${{ steps.download-digest.outputs.entry-dir }}/liboneflow_cpp/lib ${{ env.TEST_CONTAINER_NAME }} ${{ steps.download-digest.outputs.entry-dir }}/bin/oneflow_cpp_api_testexe
       - name: Build documentation
         timeout-minutes: 10
         if: ${{ !fromJson(matrix.cache-hit) && matrix.test-type == 'misc' && matrix.device == 'cpu' }}
@@ -744,7 +765,7 @@ jobs:
           ref: ${{ github.event.pull_request.head.sha }}
           repository: ${{github.event.pull_request.head.repo.full_name}}
           fetch-depth: 0
-      - uses: Oneflow-Inc/get-oneflow/cache-complete@canary-release
+      - uses: Oneflow-Inc/get-oneflow/cache-complete@support-clang-12
         name: Save cache if successful
         id: save-cache
         timeout-minutes: 5
@@ -785,7 +806,7 @@ jobs:
             -DBUILD_TESTING=ON \
             -DCMAKE_C_COMPILER_LAUNCHER=ccache \
             -DCMAKE_CXX_COMPILER_LAUNCHER=ccache
-          cmake --build . -j$(nproc) --target oneflow_deps of_cfgobj of_protoobj of_functional_obj of_functional_tensor_obj
+          cmake --build . -j$(nproc) --target oneflow_deps of_cfgobj of_protoobj of_functional_obj of_functional_tensor_obj of_op_schema
       - name: Fetch upstream
         if: ${{ !fromJSON(steps.save-cache.outputs.cache-hit) && github.event.pull_request.head.repo.full_name != github.event.pull_request.base.repo.full_name }}
         run: |

diff --git a/CMakeLists.txt b/CMakeLists.txt
@@ -1,6 +1,7 @@
 # Minimum CMake required
 cmake_minimum_required(VERSION 3.18.0)
 
+set(CMAKE_INSTALL_MESSAGE LAZY CACHE STRING "")
 if (NOT CMAKE_BUILD_TYPE)
   message(STATUS "No build type selected, default to Release")
   set(CMAKE_BUILD_TYPE "Release" CACHE STRING "Build type (default Release)" FORCE)
@@ -23,7 +24,8 @@ endif()
 option(USE_CLANG_FORMAT "" OFF)
 option(USE_CLANG_TIDY "" OFF)
 option(BUILD_PYTHON "" ON)
-option(BUILD_MONOLITHIC_LIBONEFLOW "" ON)
+option(BUILD_CPP_API "Option to build OneFlow C++ API (beta)" OFF)
+option(BUILD_MONOLITHIC_LIBONEFLOW_CPP_SO "Option to build a monolithic liboneflow_cpp.so (only meaningful when BUILD_CPP_API is ON)" ON)
 option(BUILD_RDMA "" OFF)
 option(BUILD_CUDA "" ON)
 option(WITH_ONEDNN "" OFF)
@@ -33,7 +35,12 @@ option(WITH_TENSORRT "Option to build with TensorRT" OFF)
 option(WITH_OPENVINO "Option to build with OpenVINO" OFF)
 option(WITH_MLIR "" OFF)
 option(WITH_MLIR_CUDA_CODEGEN "" OFF)
+set(LLVM_PROVIDER "in-tree" CACHE STRING "in-tree, install")
+if (NOT WITH_MLIR)
+  set(LLVM_PROVIDER "install" CACHE STRING "in-tree will build LLVM's ALL, not what we want when not building MLIR" FORCE)
+endif(NOT WITH_MLIR)
 option(WITH_COCOAPI "Option to build with COCO API" ON)
+option(WITH_ZLIB "" ON)
 option(BUILD_GIT_VERSION "" ON)
 option(BUILD_PROFILER "" OFF)
 option(OF_SOFTMAX_USE_FAST_MATH "" ON)
@@ -201,28 +208,22 @@ endif()
 
 if(BUILD_PYTHON)
   set(ONEFLOW_INCLUDE_DIR "${ONEFLOW_PYTHON_DIR}/oneflow/include")
-else() # build_python
-  set(ONEFLOW_INCLUDE_DIR "${PROJECT_BINARY_DIR}/liboneflow/include/oneflow")
-  set(ONEFLOW_LIBRARY_DIR "${PROJECT_BINARY_DIR}/liboneflow/lib")
-  set(ONEFLOW_SHARE_DIR "${PROJECT_BINARY_DIR}/liboneflow/share")
-  make_directory(${ONEFLOW_INCLUDE_DIR})
-  make_directory(${ONEFLOW_LIBRARY_DIR})
-  make_directory(${ONEFLOW_SHARE_DIR})
+endif(BUILD_PYTHON)
+
+if(BUILD_CPP_API)
+  set(LIBONEFLOW_LIBRARY_DIR "${PROJECT_BINARY_DIR}/liboneflow_cpp/lib")
+  set(LIBONEFLOW_SHARE_DIR "${PROJECT_BINARY_DIR}/liboneflow_cpp/share")
+  make_directory(${LIBONEFLOW_LIBRARY_DIR})
+  make_directory(${LIBONEFLOW_SHARE_DIR})
 
   if(BUILD_SHARED_LIBS)
-    if(BUILD_MONOLITHIC_LIBONEFLOW)
-      set(BUILD_SHARED_LIBS OFF)
+    if(BUILD_MONOLITHIC_LIBONEFLOW_CPP_SO)
+      message(FATAL_ERROR "BUILD_MONOLITHIC_LIBONEFLOW_CPP_SO is incompatible with BUILD_SHARED_LIBS. Please set either of them to OFF.")
     else()
-      set(LIBRARY_OUTPUT_PATH ${ONEFLOW_LIBRARY_DIR})
-    endif(BUILD_MONOLITHIC_LIBONEFLOW)
-    set(BUILD_SHARED_LIBONEFLOW ON)
-  else()
-    if(BUILD_MONOLITHIC_LIBONEFLOW)
-      message(WARNING "BUILD_MONOLITHIC_LIBONEFLOW=ON is meaningless when BUILD_SHARED_LIBS=OFF")
-    endif()
-    set(BUILD_SHARED_LIBONEFLOW OFF)
+      set(LIBRARY_OUTPUT_PATH ${LIBONEFLOW_LIBRARY_DIR})
+    endif(BUILD_MONOLITHIC_LIBONEFLOW_CPP_SO)
   endif(BUILD_SHARED_LIBS)
-endif(BUILD_PYTHON)
+endif(BUILD_CPP_API)
 
 include(third_party)
 
@@ -261,7 +262,7 @@ if (BUILD_CUDA)
 
   if ("${CMAKE_CUDA_COMPILER_ID}" STREQUAL "NVIDIA")
     if(CMAKE_CUDA_COMPILER_VERSION VERSION_GREATER_EQUAL "11.2")
-      set(CUDA_NVCC_THREADS_NUMBER "1" CACHE STRING "")
+      set(CUDA_NVCC_THREADS_NUMBER "4" CACHE STRING "")
       list(APPEND CUDA_NVCC_FLAGS -t ${CUDA_NVCC_THREADS_NUMBER})
     endif()
     message(STATUS "CUDA_NVCC_FLAGS: " ${CUDA_NVCC_FLAGS})
@@ -276,3 +277,4 @@ add_custom_target(oneflow_deps ALL DEPENDS prepare_oneflow_third_party)
 if (ONEFLOW)
   include(oneflow)
 endif()
+add_subdirectory(ci)
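
The CMakeLists.txt hunks above make `BUILD_CPP_API` opt-in and turn the combination of `BUILD_MONOLITHIC_LIBONEFLOW_CPP_SO=ON` with `BUILD_SHARED_LIBS=ON` into a `FATAL_ERROR`. A minimal sketch of a wrapper that rejects that combination before invoking CMake could look like this; the function name `cpp_api_cmake_flags` is hypothetical and not part of the commit, and only the three option names are taken from the diff.

```shell
# Hypothetical helper, not part of the commit: print the CMake flags for a
# C++ API build, refusing the combination that CMakeLists.txt now rejects.
#   $1 = BUILD_MONOLITHIC_LIBONEFLOW_CPP_SO (ON/OFF, defaults to ON as in the diff)
#   $2 = BUILD_SHARED_LIBS (ON/OFF, OFF assumed here for illustration)
cpp_api_cmake_flags() {
    monolithic=${1:-ON}
    shared=${2:-OFF}
    if [ "$monolithic" = ON ] && [ "$shared" = ON ]; then
        echo "BUILD_MONOLITHIC_LIBONEFLOW_CPP_SO is incompatible with BUILD_SHARED_LIBS" >&2
        return 1
    fi
    echo "-DBUILD_CPP_API=ON -DBUILD_MONOLITHIC_LIBONEFLOW_CPP_SO=$monolithic -DBUILD_SHARED_LIBS=$shared"
}
```

It might be used as `cmake -S . -B build $(cpp_api_cmake_flags ON OFF)`, relying on word splitting of the unquoted substitution to pass three separate arguments.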
diff --git a/ci/CMakeLists.txt b/ci/CMakeLists.txt
@@ -0,0 +1 @@
+add_subdirectory(test)
diff --git a/ci/clang/build-llvm.sh b/ci/clang/build-llvm.sh
@@ -0,0 +1,28 @@
+set -ex
+export PATH=/usr/lib/llvm-12/bin:/usr/lib/llvm-13/bin:/usr/lib64/ccache:/root/.local/bin:$PATH
+
+# clean python dir
+cd ${ONEFLOW_CI_SRC_DIR}
+${ONEFLOW_CI_PYTHON_EXE} -m pip install -i https://mirrors.aliyun.com/pypi/simple --user -r ci/fixed-dev-requirements.txt
+cd python
+git clean -nXd -e \!dist -e \!dist/**
+git clean -fXd -e \!dist -e \!dist/**
+
+# cmake config
+mkdir -p ${ONEFLOW_CI_BUILD_DIR}
+cd ${ONEFLOW_CI_BUILD_DIR}
+find ${ONEFLOW_CI_BUILD_DIR} -name CMakeCache.txt
+find ${ONEFLOW_CI_BUILD_DIR} -name CMakeCache.txt -delete
+if [ ! -f "$ONEFLOW_CI_CMAKE_INIT_CACHE" ]; then
+    echo "$ONEFLOW_CI_CMAKE_INIT_CACHE does not exist."
+    exit 1
+fi
+cmake -S ${ONEFLOW_CI_SRC_DIR} -C ${ONEFLOW_CI_CMAKE_INIT_CACHE} -DPython3_EXECUTABLE=${ONEFLOW_CI_PYTHON_EXE}
+# cmake build
+cd ${ONEFLOW_CI_BUILD_DIR}
+cmake --build . -j $(nproc)
+
+# build pip
+cd ${ONEFLOW_CI_SRC_DIR}
+cd python
+${ONEFLOW_CI_PYTHON_EXE} setup.py bdist_wheel
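
The script above dereferences four environment variables (`ONEFLOW_CI_SRC_DIR`, `ONEFLOW_CI_PYTHON_EXE`, `ONEFLOW_CI_BUILD_DIR`, `ONEFLOW_CI_CMAKE_INIT_CACHE`) that it never sets itself; presumably the `get-oneflow` action exports them. For running the script by hand, a pre-flight check along the following lines may help. This is a sketch under that assumption, and the function name `check_ci_env` is hypothetical, not part of the commit.

```shell
# Hypothetical helper, not part of the commit: confirm that every variable
# ci/clang/build-llvm.sh reads is set and non-empty.
# The body runs in a subshell, so its temporary variables do not leak.
check_ci_env() (
    missing=0
    for name in ONEFLOW_CI_SRC_DIR ONEFLOW_CI_PYTHON_EXE \
                ONEFLOW_CI_BUILD_DIR ONEFLOW_CI_CMAKE_INIT_CACHE; do
        # Look up the value of the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing required variable: $name" >&2
            missing=1
        fi
    done
    exit "$missing"
)
```

A manual run could then be guarded with `check_ci_env && sh ci/clang/build-llvm.sh`, with the four variables exported so that the child script sees them.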
diff --git a/ci/test/1node_op_test.sh b/ci/test/1node_op_test.sh
@@ -37,5 +37,3 @@ then
 else
     echo "deadlock unsolved, skipping multi-card eager"
 fi
-
-ONEFLOW_TEST_MULTI_PROCESS=1 python3 test/ops/test_multi_process.py --failfast --verbose
diff --git a/ci/test/2node_op_test_multi_client.sh b/ci/test/2node_op_test_multi_client.sh
@@ -17,7 +17,7 @@ cd ${test_tmp_dir}/$(basename $test_dir)
 
 for device_num in 1 2 4
 do
-    ONEFLOW_TEST_NODE_NUM=2 ONEFLOW_TEST_DEVICE_NUM=$device_num python3 -m oneflow.distributed.launch --nproc_per_node $device_num --nnodes=2 --node_rank=$NODE_RANK --master_addr 192.168.1.12 -m unittest discover ${PWD} --failfast --verbose
+    ONEFLOW_TEST_NODE_NUM=2 ONEFLOW_TEST_DEVICE_NUM=$device_num python3 -m oneflow.distributed.launch --nproc_per_node $device_num --nnodes=2 --node_rank=$NODE_RANK --master_addr $_MASTER_ADDR -m unittest discover ${PWD} --failfast --verbose
     # use a invalid ibverbs lib to test if falling back to epoll works
-    ONEFLOW_TEST_NODE_NUM=2 ONEFLOW_TEST_DEVICE_NUM=$device_num ONEFLOW_LIBIBVERBS_PATH=invalid_lib python3 -m oneflow.distributed.launch --nproc_per_node $device_num --nnodes=2 --node_rank=$NODE_RANK --master_addr 192.168.1.12 -m unittest discover ${PWD} --failfast --verbose
+    ONEFLOW_TEST_NODE_NUM=2 ONEFLOW_TEST_DEVICE_NUM=$device_num ONEFLOW_LIBIBVERBS_PATH=invalid_lib python3 -m oneflow.distributed.launch --nproc_per_node $device_num --nnodes=2 --node_rank=$NODE_RANK --master_addr $_MASTER_ADDR -m unittest discover ${PWD} --failfast --verbose
 done
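
The hunk above replaces the hard-coded `192.168.1.12` with `$_MASTER_ADDR`, so the script now depends on the caller providing that variable. Because the expansion is unquoted, an empty value would leave `--master_addr` without an argument. A small guard such as the one below could fail early instead; the function name `require_master_addr` is hypothetical and not part of the commit.

```shell
# Hypothetical guard, not part of the commit: stop before launching the
# distributed tests when _MASTER_ADDR is unset or empty.
require_master_addr() {
    if [ -z "${_MASTER_ADDR:-}" ]; then
        echo "_MASTER_ADDR must be set to the address of the master node" >&2
        return 1
    fi
}
```

Calling `require_master_addr || exit 1` before the `for device_num` loop would surface the misconfiguration before any test process starts.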