Module apply only once #7055

Merged
oneflow-ci-bot merged 11 commits into master from feat/to_consistent_once on Dec 22, 2021

Conversation

strint (Contributor) commented Dec 20, 2021

Support Module.to and Module.to_consistent for Modules that reuse (share) a Parameter.

strint marked this pull request as ready for review December 20, 2021 10:35

# A dict to store tensors that have already been applied.
# There is no need to apply multiple times to the same tensor.
if applied_dict is None:
    applied_dict = dict()
Contributor Author

Supports apply-only-once across modules.

This affects every apply call.

    applied_dict[param] = param_applied
else:
    # The parameter's data has already been set when it can use assign copy.
    pass
Contributor Author

It has already been changed, so there is no need to change it again.

    self._parameters[key] = new_param
    applied_dict[param] = new_param
else:
    self._parameters[key] = applied_dict[param]
Contributor Author

Reuse the parameter that was already applied.

if param.grad is not None:
    assert param.grad.is_leaf
need_apply = False
if param not in applied_dict:
Contributor Author

Apply only to tensors that have not been applied yet.
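The review comments above describe a memo dict that is threaded through the whole module tree. A minimal sketch of that idea in plain Python follows. `Param` and `TinyModule` are hypothetical stand-ins invented for illustration, not OneFlow classes, and the device is just a string label; the real `_apply` also handles gradients and buffers, which this sketch leaves out.

```python
class Param:
    """Hypothetical stand-in for a framework Parameter (not OneFlow's class)."""

    def __init__(self, data, device="cpu"):
        self.data = data
        self.device = device


class TinyModule:
    """Hypothetical module that keeps parameters and child modules in dicts."""

    def __init__(self):
        self._parameters = {}
        self._modules = {}

    def _apply(self, fn, applied_dict=None):
        # One dict is shared by the whole module tree, so a parameter
        # reached through several modules is converted exactly once.
        if applied_dict is None:
            applied_dict = dict()

        for child in self._modules.values():
            child._apply(fn, applied_dict)

        for key, param in self._parameters.items():
            if param is None:
                continue
            if param not in applied_dict:
                # First visit: apply fn and remember the result.
                applied_dict[param] = fn(param)
            # Later visits reuse the stored result, which preserves sharing.
            self._parameters[key] = applied_dict[param]
        return self

    def to(self, device):
        return self._apply(lambda p: Param(p.data, device))


# Two child modules share one parameter, as in the tests of this PR.
shared = Param([1.0, 2.0])
linear1, linear2, net = TinyModule(), TinyModule(), TinyModule()
linear1._parameters["weight"] = shared
linear2._parameters["weight"] = shared
net._modules = {"linear1": linear1, "linear2": linear2}

net.to("cuda")
assert linear1._parameters["weight"] is linear2._parameters["weight"]
assert linear1._parameters["weight"].device == "cuda"
```

Keying the dict by the original parameter object (identity-based hashing here) is what lets the second module find the result produced for the first one instead of creating a separate copy.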

super().__init__()
self.linear1 = flow.nn.Linear(3, 4)
self.linear2 = flow.nn.Linear(3, 4)
self.linear2.weight = self.linear1.weight
Contributor Author

Tests Module.to.

super().__init__()
self.linear1 = flow.nn.Linear(3, 4)
self.linear2 = flow.nn.Linear(3, 4)
self.linear2.weight = self.linear1.weight
Contributor Author

Tests Module.to_consistent.

oneflow-ci-bot requested review from oneflow-ci-bot and removed request for oneflow-ci-bot December 22, 2021 13:16
@github-actions
Contributor

Speed stats:
GPU Name: GeForce GTX 1080 

OneFlow resnet50 time: 137.2ms (= 13724.3ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 140.4ms (= 14036.0ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.02 (= 140.4ms / 137.2ms)

OneFlow resnet50 time: 80.3ms (= 8025.1ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 85.8ms (= 8580.7ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.07 (= 85.8ms / 80.3ms)

OneFlow resnet50 time: 54.5ms (= 10899.9ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 55.9ms (= 11174.2ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.03 (= 55.9ms / 54.5ms)

OneFlow resnet50 time: 41.3ms (= 8253.6ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 48.1ms (= 9629.8ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.17 (= 48.1ms / 41.3ms)

OneFlow resnet50 time: 37.4ms (= 7483.7ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 42.0ms (= 8390.6ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.12 (= 42.0ms / 37.4ms)

OneFlow resnet50 time: 155.5ms (= 15549.9ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 159.1ms (= 15907.3ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.02 (= 159.1ms / 155.5ms)

OneFlow resnet50 time: 94.6ms (= 9462.7ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 106.1ms (= 10605.8ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.12 (= 106.1ms / 94.6ms)

OneFlow resnet50 time: 68.8ms (= 13753.9ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 74.7ms (= 14934.2ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.09 (= 74.7ms / 68.8ms)

OneFlow resnet50 time: 64.3ms (= 12859.7ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 65.0ms (= 12998.5ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.01 (= 65.0ms / 64.3ms)

OneFlow resnet50 time: 71.5ms (= 14309.8ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 63.4ms (= 12680.8ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 0.89 (= 63.4ms / 71.5ms)
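
For readers checking the numbers, the figures in each row appear to follow from the totals shown in parentheses: per-iteration time is total time divided by the iteration count, and relative speed is the PyTorch time divided by the OneFlow time. A small sketch of that arithmetic, where the function name `relative_speed` is made up for illustration and is not part of the CI script:

```python
def relative_speed(oneflow_total_ms, pytorch_total_ms, iterations):
    """Return (oneflow_ms, pytorch_ms, ratio) for one benchmark row."""
    oneflow_ms = oneflow_total_ms / iterations
    pytorch_ms = pytorch_total_ms / iterations
    # A ratio above 1.0 means OneFlow took less time per iteration.
    return oneflow_ms, pytorch_ms, pytorch_ms / oneflow_ms


# First row of the report: input_shape=[16, 3, 224, 224], 100 iterations.
of_ms, pt_ms, ratio = relative_speed(13724.3, 14036.0, 100)
print(f"{of_ms:.1f}ms {pt_ms:.1f}ms {ratio:.2f}")  # 137.2ms 140.4ms 1.02
```

By the same formula the last DDP row (batch size 1, world size 2) gives 0.89, the only configuration in this report where OneFlow was slower than PyTorch.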

oneflow-ci-bot removed their request for review December 22, 2021 14:23
oneflow-ci-bot merged commit 8c9cdd3 into master Dec 22, 2021
oneflow-ci-bot deleted the feat/to_consistent_once branch December 22, 2021 14:29
hjchen2 added a commit that referenced this pull request Jan 4, 2022
* Support save/load for lr_scheduler (#6948)

* feat(LrScheduler): support save/load for lr_scheduler

* refine document

* auto format by CI

* Refine test

* auto format by CI

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>

* Fix eye_op attr (#6973)

* fix

* add graph test

* Update python/oneflow/test/graph/test_graph_eye.py

Co-authored-by: daquexian <daquexian566@gmail.com>

* refine

* Update python/oneflow/test/graph/test_graph_eye.py

Co-authored-by: daquexian <daquexian566@gmail.com>

* auto format by CI

Co-authored-by: daquexian <daquexian566@gmail.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>

* softmax double use uncached impl to accelerate compile (#6992)

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Add [[nodiscard]] for cpp api (#6997)

* add [[nodiscard]]

* refine

* reformat

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Support Arange delta to decide dtype (#6998)

* support delta dtype to decide output dtype

* add more unittest

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Add clang as CUDA FE compiler in CI (#6954)

* update action use

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* fix

* add 80 and 86

* refine

* refine

* add CUDA_NVCC_THREADS_NUMBER

* refine

* address review

* set CUDA_NVCC_THREADS_NUMBER 8

* fix

* fix clang in init cmake

* add script

* refine

* refine

* refine

* refine

* refine

* refien

* refine

* add flags to skip zlib

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* Migrate chunk python layer to functor (#6983)

* Migrate chunk Python layer logic to functor

* fix runtime

* Fix splits bug and CI

* Modify push to emplace

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Reduce memory usage when compiling oneflow dialect ops (#7000)

* CudaAllocator device reset before OOM (#6976)

* CudaAllocator device reset before OOM

* Add NOTE

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Refactor vm stream desc (#6989)

* remove StreamDesc::num_machines

* Prepare one thread for one stream_type

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Add Diagonal Op (#6016)

* format complete

* python to cpp

* py2cpp error

* rm

* auto format by CI

* revise

* auto format by CI

* license

* docstring

* docstring

* tensor

* tensor attribute

* auto format by CI

* docstring

* revise

* test

* revise

* revise

* rename

* half

* docs

* doc,test

* test times

* revise

* format

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* add all to all op (#6283)

* add all to all op

* add barrier

* format

* add import

* fix test

* delete barrier

* delete barrier

* Revert "delete barrier"

This reverts commit aa397ea.

* Revert "delete barrier"

This reverts commit 7ddf79a.

* check tensor meta between ranks

* add more assert

* all_reduce operate in place

* all_reduce operate in place

* fix bug

* assert tensor.is_local

* fix bug in scatter

* add more assert

* delete meta check

* add pytorch comparison test

* add pytorch comparison test

* refine

* add ONEFLOW_TEST_CPU_ONLY

* fix bug from torch gloo

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Dev ivalue for cpp api (#6890)

* add api tensor

* refine

* add nn.relu

* refine

* clean shape & refine relu test

* support void* for from_blob

* add multithreading relu test

* refine test

* refine

* refine

* add comment for __internal_tensor()

* convert to copy_util

* reformat

* refine

* add ivalue

* refine directory structure

* refine cpp api test

* refine test

* add ivalue

* refine ivalue

* refine ivalue

* refine

* refine

* refine

* refine

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* default use cpu generator (#7001)

* optimize reshape/slice/transpose functor (#6956)

* optimize reshape/slice/transpose functor

* update code according to reviewer's suggestion

* judge negative dimension number besides -1

* judge negative shape value in view::Reshape

* remove is_full_slice logic in SliceFunctor

* update code according to yinggang's advice

* move ordered permute judge to TransposeKernel

* remove print sentence

* abstract IsOrderedPermute func

* support negative permute value in TransposeFunctor

* delete tranpose_kernel optimization

* Revert "delete tranpose_kernel optimization"

This reverts commit e026434.

* not return original tensor when reshape do nothing

* simplify code

* correct spell error

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* fix IsContinuosSubspace error (#6968)

* fix IsContinuosSubspace error

* recover original IsContinuosSubspace code

* add test case

* auto format by CI

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* add cpu group deconve impl (#6980)

* add cpu group deconv impl

* remove useless lines

* remove useless lines

* add deconv2d import

* add groups test

* remove check_allclose=False

* add tf_prelu

* add cpu group deconv impl

* remove useless lines

* remove useless lines

* add deconv2d

* add groups test

* remove check_allclose=False

* add tf_prelu

* auto format by CI

* add deconv2d impl

* add deconv2d impl

* remove useless lines

* add deconv2d in functional api

* auto format by CI

* auto format by CI

* Add variable initial

* Add variable initial

* auto format by CI

* add conv2d impl

* add conv2d impl

* auto format by CI

* remove useless lines

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>

* Migrate the python layer logic of broadcastlike to functor (#7007)

* Migrate the python layer logic of broadcastlike to functor

* add var name

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Temporarily skip comm test cases (#7015)

* Temporarily skip comm test cases

* auto format by CI

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Fix nd_sbp attribute type and set nd_sbp in random functors (#7017)

* fix

* fix compile

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Save Job to IR and load Job from IR (#6885)

* save to ir

* test

* fix bugs

* impl load and test

* rm useless code

* fix conflict

* fix issues

* JobOp

* fix issues

* fix test_fuse_tril_scale

* fix test jit-outline-func

* fix test_mlir_opt.py

* save

* fix ods gen for max and avg pool

* rename oneflow to oneflow_foundation

* fix files checks

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* auto format by CI

* check in changes

* refine

* Update oneflow/ir/test/OneFlow/test_mlir_opt.py

* Update oneflow/ir/include/OneFlow/OneFlowOps.td

* refine includes

* printer & parser & verifier

* code tidy

* tidy include

* address review

* rm duplicated GetDataTypeType

* TensorSource trait

Co-authored-by: jackalcooper <jackalcooper@gmail.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>

* Fix Simple CI linkage (#6986)

* fix-simple-ci-linkage

* refine

* refine

* fix

* refine

* refine

* refine

* refine

* refien

* refine

* revert

* refine

* auto format by CI

* refine

* revert

* refine

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>

* fix sbp when weight is optional (#6984)

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Feat from numpy (#7013)

* feat(Tensor): support share memory with ndarray

* test(FromNumpy): add test

* enhancement test and add document

* Fix merge error

* fix bug in numpy c api

* Fix(doctest): fix doctest error

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Add custom ShapeAttr in ODS (#7023)

* add ShapeAttr

* refine

* fix doc

* refine

* fix (#7028)

* Add linspace op (#7006)

* add linspace op

* refine doc

* refine

* fix comments

* fix comment

* auto format by CI

* fix ci doc error

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* fix fasterrcnn infer (#7014)

* fix fasterrcnn infer

* roi_align 0shape

* refine

* refine

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* separate kernel state and cache (#6655)

* support eager state except lazy dynamic

Signed-off-by: daquexian <daquexian566@gmail.com>

* modularize kernel contexts

Signed-off-by: daquexian <daquexian566@gmail.com>

* fix warning

Signed-off-by: daquexian <daquexian566@gmail.com>

* reformat

Signed-off-by: daquexian <daquexian566@gmail.com>

* remove duplicated license

Signed-off-by: daquexian <daquexian566@gmail.com>

* fix static check error

Signed-off-by: daquexian <daquexian566@gmail.com>

* make test gpu only

Signed-off-by: daquexian <daquexian566@gmail.com>

* temp

Signed-off-by: daquexian <daquexian566@gmail.com>

* revert opkernel context changes, align with master

Signed-off-by: daquexian <daquexian566@gmail.com>

* reformat

Signed-off-by: daquexian <daquexian566@gmail.com>

* refine cachecontext

Signed-off-by: daquexian <daquexian566@gmail.com>

* add separate cache context interface, remove out-dated files

Signed-off-by: daquexian <daquexian566@gmail.com>

* add init and cache context aliases

Signed-off-by: daquexian <daquexian566@gmail.com>

* update eager kernel

Signed-off-by: daquexian <daquexian566@gmail.com>

* fix wrong AttrMayChanged value

Signed-off-by: daquexian <daquexian566@gmail.com>

* rename and add comment

Signed-off-by: daquexian <daquexian566@gmail.com>

* auto format by CI

* fix combined_margin_loss_kernel.cpp

Signed-off-by: daquexian <daquexian566@gmail.com>

* rename op_kernel_state_wrapper.h to op_kernel_wrapper.h

Signed-off-by: daquexian <daquexian566@gmail.com>

* rename more classes, fix old cache in stateful op kernel

Signed-off-by: daquexian <daquexian566@gmail.com>

* rename more classes

Signed-off-by: daquexian <daquexian566@gmail.com>

* may changed -> not changed

Signed-off-by: daquexian <daquexian566@gmail.com>

* optimize away genrepeatedbn

Signed-off-by: daquexian <daquexian566@gmail.com>

* reformat

Signed-off-by: daquexian <daquexian566@gmail.com>

* refine

Signed-off-by: daquexian <daquexian566@gmail.com>

* update stateful local opkernel, use Cache** if possible

Signed-off-by: daquexian <daquexian566@gmail.com>

* remove TensorDesc4ArgNameAndIndex base method

Signed-off-by: daquexian <daquexian566@gmail.com>

* auto format by CI

* fix clang-tidy error

Signed-off-by: daquexian <daquexian566@gmail.com>

* auto format by CI

* fix conv kernel bug

Signed-off-by: daquexian <daquexian566@gmail.com>

* auto format by CI

* fix group conv bug and fix warning

Signed-off-by: daquexian <daquexian566@gmail.com>

* fix avgpool error

Signed-off-by: daquexian <daquexian566@gmail.com>

* fix maxpool error

Signed-off-by: daquexian <daquexian566@gmail.com>

* auto format by CI

* respect flag in deconv cpu kernel, rename cache to cache_ptr

Signed-off-by: daquexian <daquexian566@gmail.com>

* fix compile error

Signed-off-by: daquexian <daquexian566@gmail.com>

* auto format by CI

* fix deconv cache bug

Signed-off-by: daquexian <daquexian566@gmail.com>

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: Li Xinqi <lixinqi2010@gmail.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Add fully support for all datatype (#7025)

* add fully support for all datatype

* Use max array size

* add clang-format off to maintain the matrix

* fix format

* remove redundant numpy dtype

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Migrate split python layer to functor (#7030)

* Migrate split python layer to functor

* modify dim

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Add add_sparse_optimizer  for Graph (#6988)

* add_sparse_optimizer

* format

* fix bug

* refine new interface by discuss

* auto format by CI

* address review

* correct syntax

* correct error message

* rm debug print

* auto format by CI

* fix cpu-only test

Co-authored-by: XIE Xuan <xiexuanx2@gmail.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Refine RUN_CUDA_KERNEL (#7003)

* Refine RUN_CUDA_KERNEL

* Added LaunchConfig

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Support llvm in tree build (#6995)

* refine

* refine

* refine

* refine

* add61

* refien

* refine

* refine

* refine

* refine

* refien

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* rm

* revert

* refine

* refine

* refine

* refine

* return_self_in_to_consistent_if_necessary (#7004)

* return_self_in_to_consistent_if_necessary

* fix error and add test case

* skip cpu test

* fix error

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Decouple ep and global (#7027)

* Decouple ep and global

* NOLINT

* fix

* fix import

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* arange doc fix (#7035)

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* add_consistency_check_in_consistent_tensor_set_data (#7002)

* add_consistency_check_in_consistent_tensor_set_data

* auto format by CI

* minor fix

* add just wrap

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* [cmake] add liboneflow_cpp target (#7005)

* add cmake changes for liboneflow_cpp.so

Signed-off-by: daquexian <daquexian566@gmail.com>

* add separate target for cpp api test

Signed-off-by: daquexian <daquexian566@gmail.com>

* add cpp api test in ci

Signed-off-by: daquexian <daquexian566@gmail.com>

* reverse the order of cudnn and cuda library

Signed-off-by: daquexian <daquexian566@gmail.com>

* update logic of BUILD_MONOLITHIC_LIBONEFLOW

Signed-off-by: daquexian <daquexian566@gmail.com>

* rename BUILD_MONOLITHIC_LIBONEFLOW to BUILD_MONOLITHIC_LIBONEFLOW_CPP_SO

Signed-off-by: daquexian <daquexian566@gmail.com>

* share lib directory in test container

Signed-off-by: daquexian <daquexian566@gmail.com>

* add github actions debug

Signed-off-by: daquexian <daquexian566@gmail.com>

* Revert "add github actions debug"

This reverts commit 7d9aef6.

* add upterm debug after exe test

Signed-off-by: daquexian <daquexian566@gmail.com>

* sleep after fail

Signed-off-by: daquexian <daquexian566@gmail.com>

* set LD_LIBRARY_PATH in yml for cpp api test exe

Signed-off-by: daquexian <daquexian566@gmail.com>

* sleep

Signed-off-by: daquexian <daquexian566@gmail.com>

* upload liboneflow_cpp.so

Signed-off-by: daquexian <daquexian566@gmail.com>

* modify cmake to trigger compilation

Signed-off-by: daquexian <daquexian566@gmail.com>

* remove sleep

Signed-off-by: daquexian <daquexian566@gmail.com>

* build cpp api in cpu mode

Signed-off-by: daquexian <daquexian566@gmail.com>

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Fix CUDA 52 and add it to CI (#7031)

* refine

* refine

* refine

* refine

* revert

* fix

* refine

* refine

* refine

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Add check of placement constructor (#6991)

* add_check_of_placement_constructor

* move CheckDeviceIdsIsValid to runtime

* handle comment

* fix error

* fix error

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Fix(FromNumpy): fix bug in stride (#7042)

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* add non virtual destructor back (#6999)

Signed-off-by: daquexian <daquexian566@gmail.com>

Co-authored-by: Houjiang Chen <chenhoujiangcug@gmail.com>

* move python code to cpp: eye (#7036)

* 80% Sbp signature left to finish

* refine functional_api.yaml

* 90% docstr left to update

* refine

* add sbp check

* refine docs

* auto format by CI

* refine

* refine docstr

* auto format by CI

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Fix l2norm block_size (#7044)

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* fix undefined symbol: cudaGetDeviceCount (#7052)

* fix_worker_orphan_process (#7048)

* fix_worker_orphan_process

* use SIGTERM instead

* broadcast elemwise binary (#6871)

* add

* broadcast elementwise binary

* fix

* refine

* fix

* refine

* refine

* for compile

* refine

* refine

* refine

* refine

* refine

* revert kernels

* revert kernel

* refine

* refine

* refine

* refine

* nvcc thread to 4

Co-authored-by: ZZK <42901638+MARD1NO@users.noreply.github.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Source op per critical section (#6472)

* backup code

* EventRecord

* auto format by CI

* backup code

* remove deprecated binary test cases

* refactor volatile to atomic

* add StreamType::InitInstructionStatusIf/StreamType::DeleteInstructionStatusIf

* merge from branch profiling_nn_graph

* address comments

* EventRecordProvider

* more comments for XXXStatusQuerier::SetLaunched

* more comments for SharedEventRecord::Init

* wait source op per critical section

* rename a task_node.cpp

* minor fix

* backup code

* fix compiler complaints

* 1) remove AddCtrlEdgeBetweenSrcDstTickAndInputOutputInSameRank; 2) create CriticalSectionInstance buffers

* fix compiler complaints

* more profiler code

* refactor vm preschedule

* TryMoveFromWaitingToReady

* revert flying_instruction_cnt

* revert to single position to call DispatchInstruction

* revert several code

* reset instruction watermark

* remove is_xxx_hook_empty

* build with profiler

* merge master

* insert device ticks before and after critical sections

* refactor register_num of cs_wait/cs_callback from 2 to 128

* fix static analysis complaints

* fix compiler complaints about JobBuilder::ParallelConf4OpName

* Update oneflow/core/operator/critical_section_wait_tick_op.cpp

Co-authored-by: daquexian <daquexian566@gmail.com>

* address pr comments

* add job example for InstructionsBuilder::LaunchLazyJob

* address pr comments

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Co-authored-by: ouyangyu <xuanjiuye@gmail.com>
Co-authored-by: daquexian <daquexian566@gmail.com>

* More details of error of getting op matched sbp signature (#7077)

* more details of error msg

* minor change

* address review comment

* avoid namesake iterator

* Module apply only once (#7055)

* add once apply of param

* apply once on buffer

* test reuse var on module to

* test reuse var

* rm useless test

* finish test

* refine test

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* distributed test bugfix (#7057)

* change spawn_shell to spawn_shell_and_check, sleep in script

Signed-off-by: daquexian <daquexian566@gmail.com>

* fix distributed test master addr

Signed-off-by: daquexian <daquexian566@gmail.com>

* remove sleep

Signed-off-by: daquexian <daquexian566@gmail.com>

* spawn_shell -> spawn_shell_ignoring_failure

Signed-off-by: daquexian <daquexian566@gmail.com>

* auto format by CI

* fix bug

Signed-off-by: daquexian <daquexian566@gmail.com>

* auto format by CI

* fix the reversed logic

Signed-off-by: daquexian <daquexian566@gmail.com>

* improve error msg

Signed-off-by: daquexian <daquexian566@gmail.com>

* resolve name conflict of MASTER_ADDR

Signed-off-by: daquexian <daquexian566@gmail.com>

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* fix promote_type matrix (#7066)

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* fix chunk op dim=-1 bug (#7073)

* fix chunk op dim=-1 bug

* Update oneflow/core/functional/impl/array_functor.cpp

Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>

* Update oneflow/core/functional/impl/array_functor.cpp

Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>

Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Fix resource desc dump cudnn conf bug (#7038)

* fix Resource::DumpCudnnConf

* fix typo and error msg

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* fix concat bug (#7075)

* fix

* support concat single input

* Clean TensorNameScope after graph build (#7076)

* Clear tensor name scope after graph build

* Add test case of 2 graph caught same free eager tensor

* auto format by CI

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* fix_abnormal_printing (#7099)

* Fix bias add dropout fuse (#7081)

* fix bias_add dropout fuse when p=0.0

* remove redundant op

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Support 1d to 2d eager boxing (#7083)

* fix Resource::DumpCudnnConf

* support_1d_to_2d_eager_boxing

* rename stack to unflatten

* add test case

* of format

* refine test case

* Revert "fix Resource::DumpCudnnConf"

This reverts commit f07278d.

* support nd to 1d

* add 2d to 1d test case

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Implement all User Ops with Op Schema (#7032)

* add oneflow-tblgen: generate op schema (OpInterpCtx) from ods

* cmake: add inja

* tblgen: add oneflow_datatype

* tblgen: use option cat

* tblgen: fix error

* tblgen: put impl in .cpp

* tblgen: fix null attrs

* tblgen: fix null ops

* refine

* refine

* reifne

* Refine op schema template and compilation

* add base OpInterpCtx to finish compilation

* fix

* refine

* fix

* add custom infer code

* generate op registrants automatically

* refine

* fix

* update user op ods and fix shape attr

* refine

* refine

* add custom code in op base

* refine comments

* add same_output_regst_num and infer

* support declare hasxx

* update op schema emitter

* refine

* emit output regist num

* refine

* refine

* migrate acc op

* migrate onerec_reader, ones_like, send, pack and padding ops

* add has_sbp_signature_infer_fn

* refine

* migrate pad, parallel_cast, partial_fc and pooling ops

* rm redundant has_device_infer_fn

* migrate prelu, quantization, randperm, reduce and repeat ops

* migrate reshape, reshape_like, roi_align, same_pad, selu and scalar related ops

* back port

* backport

* migrate ops

* refine

* refine

* refine

* refine

* add new op

* fix llvm not found

* fix mlir headers

* fix mlir headers

* fix llvm not found

* irefine

* mark override

* fix merge

* fix

* fix

* set op schema as obj lib to speed up

* rewrite ops

* add addn

* add grdi

* refien

* add more def (#7051)

* affine grid

* refien

* refine

* refine

* refine

* fix

* refien

* refine

* refine

* refine

* refine

* refine

* refien

* refine

* refine

* refein

* refine

* refine

* refine

* refine

* refien

* refine

* refine

* refine

* refien

* refien

* refien

* refine

* refine

* refien

* refine

* refine

* refine

* refein

* refine

* refine

* refine

* refine

* refine

* refien

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refein

* refine

* refine

* refine

* move more ops

* fix math_binary_broadcast/elementwise_ops

* fix hardtanh

* add norm

* rename file and add CpuOnly no_grad

* fix ir & fix norm op

* fix oneflow-tblgen

* fix math_unary_elementwise_op

* fix norm

* fix bn

* fix op schema

* refine

* fix

* refine physical_tensor_desc_infer_fn

* refine

* add ScalarLogicalNotEqualOp & RecvOp

* refine

* auto format by CI

* fix fmt

* add cuda only trait

* delete unused inja

* del inja_copy_headers_to_destination

* delete unused inja

* del inja_copy_headers_to_destination

* add cuda only to tblgen

* fix json inja url and md5 not used

* fix json inja url and md5 not used

* refine

* revert

* add with cuda

* refine

* delete GenUserOpODS

* remove cuda only

* revert cuda only after meeting

* fix

Co-authored-by: PragmaTwice <i@twice.moe>
Co-authored-by: hjchen2 <chenhoujiangcug@gmail.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>

* Feat/debug pass (#7054)

* add pass debug

* debug pass

* refine comment of fuse add pass

* auto format by CI

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Fix error message (#6930)

* fix error message

* fix dot doc

* fix dot elem cnt

* auto format by CI

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* fix simple ci: add of_op_schema target to tidy check (#7105)

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Rename AnyType in .td (#7109)

* AnyType => Tensor

* refine

* refine

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Feat graph reuse var (#7080)

* add once apply of param

* apply once on buffer

* test reuse var on module to

* test reuse var

* rm useless test

* finish test

* refine test

* Clear tensor name scope after graph build

* Add test case of 2 graph caught same free eager tensor

* auto format by CI

* refactor var build draft

* add full func; add check

* done

* add test of calling a parameter outside its module

* fix break test

Co-authored-by: chengtbf <472491134@qq.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Fix l2_normalize & add nn.functional.normalize (#6940)

* fix l2_normalize

* add normalize

* add test for normalize

* refine

* clean l2_normalize and refine normalize

* simplify normalize test

* Fix l2norm block_size

* refine

Co-authored-by: Juncheng <liujuncheng1022@gmail.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Align api in swin transformer (#7058)

* add linspace op

* fix align error in swintransformer

* add @ magic method

* fix conflict

* support tensor list

* fix meshgrid bug

* revert

Co-authored-by: hjchen2 <chenhoujiangcug@gmail.com>

* set CMAKE_LINK_DEPENDS_NO_SHARED to ON (#7063)

Signed-off-by: daquexian <daquexian566@gmail.com>

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Add other api graph autotest (#7091)

* Clear tensor name scope after graph build

* Add test case of 2 graph caught same free eager tensor

* auto format by CI

* add other api graph autotest

* add more samples

* fix comments

* refine

* refine

* refine

* refine

* refine

* fix error

* fix test error

* fix bug

* fix flip bug

* fix bug

* fix bug

* fix ci bug

* fix ci error

* fix bug

* fix ci error

Co-authored-by: chengtbf <472491134@qq.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Co-authored-by: Li Xiang <54010254+lixiang007666@users.noreply.github.com>

* [serving] dev graph run (#7008)

* add cmake changes for liboneflow_cpp.so

Signed-off-by: daquexian <daquexian566@gmail.com>

* add separate target for cpp api test

Signed-off-by: daquexian <daquexian566@gmail.com>

* add cpp api test in ci

Signed-off-by: daquexian <daquexian566@gmail.com>

* graph run

* reverse the order of cudnn and cuda library

Signed-off-by: daquexian <daquexian566@gmail.com>

* update logic of BUILD_MONOLITHIC_LIBONEFLOW

Signed-off-by: daquexian <daquexian566@gmail.com>

* rename BUILD_MONOLITHIC_LIBONEFLOW to BUILD_MONOLITHIC_LIBONEFLOW_CPP_SO

Signed-off-by: daquexian <daquexian566@gmail.com>

* refine

* [draft] implement graph parameter load and save (#7010)

* implement parameter save (python) and load (c++)

Signed-off-by: daquexian <daquexian566@gmail.com>

* revert accident changes

Signed-off-by: daquexian <daquexian566@gmail.com>

* fix circular reference

Signed-off-by: daquexian <daquexian566@gmail.com>

* pimpl

* batching

* share lib directory in test container

Signed-off-by: daquexian <daquexian566@gmail.com>

* fix typo;

* add github actions debug

Signed-off-by: daquexian <daquexian566@gmail.com>

* Revert "add github actions debug"

This reverts commit 7d9aef6.

* add upterm debug after exe test

Signed-off-by: daquexian <daquexian566@gmail.com>

* sleep after fail

Signed-off-by: daquexian <daquexian566@gmail.com>

* set LD_LIBRARY_PATH in yml for cpp api test exe

Signed-off-by: daquexian <daquexian566@gmail.com>

* refine

* add test file && input order

* sleep

Signed-off-by: daquexian <daquexian566@gmail.com>

* upload liboneflow_cpp.so

Signed-off-by: daquexian <daquexian566@gmail.com>

* modify cmake to trigger compilation

Signed-off-by: daquexian <daquexian566@gmail.com>

* load job from ir && clean && add mlir model

* [remove useless python code]save to .pb

* add target of_common_obj to remove duplicate REGISTER_PASS  && run of_format

* remove openvino

* remove openvino test

* refine

* IValue

* Update oneflow/api/cpp/framework/graph.h

Co-authored-by: daquexian <daquexian566@gmail.com>

* refine

* refine

* refine

* refine

* refine

* refine

* rename in oneflow.cmake

* refine oneflow.cmake

* make of_api_common object library

* move device util function in api to core

* remove device check in New and ThreadLocalGetOrNew

* refine

* fix device test

* refine graph test

* refine GetExeDir()

* refine GetExeDir() again

* fix

* refine

* fix

Co-authored-by: daquexian <daquexian566@gmail.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Co-authored-by: mosout <mosout@qq.com>

* disable autograd in lazy mode (#7070)

* disable autograd in lazy mode

* refine

* Fix/rand source op in graph (#7092)

* add test

* fix rand consistent

* add test

* Fix powf (#7106)

* quick fix power

* add int scalar test case

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Dispatch stateful ops using functional api (#7046)

* Dispatch functional stateful ops

* fix

* fix cmake

* fix

* disable attr check since it may not be given when creating op expr.

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* refine

Co-authored-by: VertexC <bob2420083992@gmail.com>

* Fix HWLoc memory affinity (#7115)

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* add_env_api_docs (#7100)

* add_env_api_docs

* minor fix

* fix grammatical errors

Co-authored-by: Yao Chi <later@usopp.net>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* tmp skip s0 print because of slice (#7065)

* tmp skip s0 print because of slice

* tmp skip s0 print in test case

* fix

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* indexing first version (#7012)

* indexing first version

* complete

* test

* out loop

* test skip

* revise

* revise

* shape

* docs

* formatted

* conflict1

* conflict2

* conflict2

* conflict

* revise

* auto format by CI

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>

* fix maybe: add Maybe(T&&) to allow constructing from rvalue T (#7125)

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* autotest_add_graph_log (#7126)

* Meta info consistency check (#7085)

* meta_info_consistency_check

* refine check function

* Update consistent_cast.cpp

* move check to opinterpreter

* refine

* add note

* refactor MetaInfoConsistencyCheck

* of_format

* refine

* NonRecursiveMetaInfoConsistencyCheck

* fix func name

* add IsMetaInfoConsistencyCheckDisable()

* minor fix

* refine

* minor fix

* format

* minor fix

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* cmake: use interface target instead of include_directories in pybind11 (#7128)

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Import cmake dependence json and inja using FetchContent (#7124)

* import cmake dependence json and inja using FetchContent

* install-llvm: fix url hash

* fix inja config

* add cache var

* fix ninja build

* fix ninja build

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Add environment variable to set GRPC_ARG_MAX_MESSAGE_LENGTH (#7130)

* env ONEFLOW_GRPC_MAX_MESSAGE_BYTE_SIZE

* set default to -1

* Fea/nhwc (#6811)

* legacy maxpool2d module

* add legacy avgpool2d

* add graph cudnn conv alg config

* add conv2d nhwc

* lazy create cuda_stream in CudaCopyD2HDeviceCtx CudaStreamHandleDeviceCtx

* refine

* conv bn pool nhwc for resnet perf

* one hot with float

* use BiasAddRowGpu

* rm l2 with 0

* reformat

* add nhwc env var

* legacy pool merged into new

* refine

* fix style

* fix and refine

* address review

* fix and refine

* fix doc test

Co-authored-by: luyang <flowingsun007@163.com>
Co-authored-by: guo-ran <360112263@qq.com>
Co-authored-by: lixinqi <lixinqi0703106@163.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* reduce memory usage caused by slice grad (#7144)

* cmake: fix THIRD_PARTY build (#7146)

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* fix fold op (#7156)

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Support inplace for lazy consistent (#7112)

* Support inplace for lazy consistent

* fix single client sbp hint

* refine

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* fix prelu bug (#7118)

* support dtype and device in prelu

* optimize PreluFunctor

* fix prelu 1-dim error

* update

* update

* auto format by CI

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>

* use ibn2nd_sbp to get nd_sbp (#7155)

Co-authored-by: Houjiang Chen <chenhoujiangcug@gmail.com>

* fix copy bug (#7159)

* fix copy bug

* add to test case

* refine

* fix test case

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Fix layernorm backward bug (#7164)

* fix layernorm backward index bug

* add layernorm test case

* auto format by CI

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* [Fix] graph support 0-Size tensor (#6957)

* Add nn.functional.glu graph test

* add filter to modify functional autotest

* modify code

* add test example

* add test else

* add test judging condition for test_masked_fill.py, test_constant.py, test_tile.py, test_repeat.py, test_expand.py

* add test ok example

* Clear tensor name scope after graph build

* Add test case of 2 graph caught same free eager tensor

* auto format by CI

* Dev cc clean tensor name scope (#7082)

* Clear tensor name scope after graph build

* Add test case of 2 graph caught same free eager tensor

* auto format by CI

Co-authored-by: chengtbf <472491134@qq.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>

* submit test success example

* test success example

* submit test code

* fix a bug about relu module with 0 shape data

* fixed a bug about relu module with 0 shape data

* fix a bug about relu module with 0 shape data

* fix a bug about relu module with 0 shape data

* 0shape and 0d autotest

* fix a bug about relu module with 0 shape data

* 0shape changed to 0_size

* modify test_var.py

* modify test_eye.py

* modify test_reshape.py

* modify test_.py

* modify ReshapeFunctor

* modify some file

* Fixed graph autotest bug with reshape op test

* Fixed graph autotest bug with reshape op test

* fixed test_sub.py

* modify test_sub.py

* modify tensor_methods.cpp

* modify array_functor.cpp

* graph support 0-Size tensor

* rename 0shape to 0 size

* modified check_graph=True

* fix and refine

Co-authored-by: Zhenhua <huangzhenhua@zhejianglab.com>
Co-authored-by: tangnana925 <85614052+tangnana925@users.noreply.github.com>
Co-authored-by: tangnana <tnn_personal@163.com>
Co-authored-by: Zhenhua <1209435+hengzi@users.noreply.github.com>
Co-authored-by: chengtbf <472491134@qq.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: Xiaoyu Xu <xiaoyulink@gmail.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Cumsum op implementation (#7050)

* add cumsum op's forward definition

* add cumsum forward test case

* cumsum ver3

* remove calculating time

* add cumsum forward gpu implementation

* fix gpu forward error

* change var name

* remove annotation

* add cumsum cpu forward multi-thread support

* add multi-thread annotation

* add cumsum grad definition

* update

* add cumsum cpu backward

* add cumsum cpu backward functor

* add cumsum autograd

* update

* remove user interface

* use random method to test cumsum forward

* add cumsum gpu backward

* add cumsum gpu test

* fix gpu backward bug

* add a 3d cuda kernel try

* Revert "add cumsum gpu test"

This reverts commit 05c31556ba28ecb827b25e54c2f5fa38984e8096.

* Revert "Revert "add cumsum gpu test""

This reverts commit 918ee1569863b008c1d419c3528257416cffd840.

* change nele to ele_cnt

* add test_cumsum.py in oneflow/test/modules

* change original test_cumsum to autotest version

* optimize cumsum for special up_space and down_space

* add two special cu func

* add cumsum doc

* update doc

* update doc

* update code according to bbuf's review

* ditto

* change pin/pout to in_ptr/out_ptr

* remove multi-thread func

* update doc

* use tensor processor

* update by review

* update by review

* update

* update

* auto format by CI

* auto format by CI

* update doc

* update

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>

* Logical slice in tensor str (#7116)

* using logical slice in tensor str

* add tensor str util file

* refine

* refine

* refine

* refine

* add logical slice docs

* fix bug

* fix comment

* auto format by CI

* fix doc test bug

* delete TODO

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Add install for oneflow py (#7107)

* Add install for oneflow py

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* fix bug: output key not exists when SavaJobToIR (#7139)

* fix bug: output key not exists when SavaJobToIR

* [test] makedirs when path not exists

* remove useless comment

Co-authored-by: Peihong Liu <mosout@qq.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Add linalg 2d norm op for clip_grad (#7160)

* add linalg_2d_norm op for clip_grad

* code format

* revert sqrt

* fix comment

* refine

* fix comment

* fix ci error

* fix ci error

* fix docs bug

* fix ci error

* fix ci error

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* refine nn.graph autotest (#7111)

* add linspace op

* refine graph autotest

* revert

* add graph error trace

* fix bug

* fix autotest bug

* auto format by CI

* fix set_printoptions error

* auto format by CI

* CI test bug

* auto format by CI

* For CI

* auto format by CI

* For CI test

* fix ci error

* revert for ci

* fix bug

* fix ci error

* fix bug

* fix bug

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: Li Xiang <54010254+lixiang007666@users.noreply.github.com>
Co-authored-by: lixiang <88304454@qq.com>

* add oneflow/pytorch cudnn.deterministic (#7172)

* add cudnn.deterministic

* fix bug

* auto format by CI

* fix bug

* fix generate fake program input bug

* auto format by CI

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>

* fix linalg vector norm scalar tensor print bug (#7178)

* fix linalg vector norm scalar tensor print bug

* auto format by CI

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>

* format

* refine

* format

Co-authored-by: Yinggang Wang <wyg19970408@gmail.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: liufengwei0103 <2472937968@qq.com>
Co-authored-by: daquexian <daquexian566@gmail.com>
Co-authored-by: guo ran <360112263@qq.com>
Co-authored-by: Peihong Liu <mosout@qq.com>
Co-authored-by: ZZK <42901638+MARD1NO@users.noreply.github.com>
Co-authored-by: Shenghang Tsai <jackalcooper@gmail.com>
Co-authored-by: Li Xiang <54010254+lixiang007666@users.noreply.github.com>
Co-authored-by: cheng cheng <472491134@qq.com>
Co-authored-by: Li Xinqi <lixinqi2010@gmail.com>
Co-authored-by: lichunyou <33850693+lcylcy@users.noreply.github.com>
Co-authored-by: Luyang <flowingsun007@163.com>
Co-authored-by: wyushun <wyushun@foxmail.com>
Co-authored-by: zhu wang <33675639+olojuwin@users.noreply.github.com>
Co-authored-by: leaves-zwx <kunta0932@gmail.com>
Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>
Co-authored-by: Shijie <821898965@qq.com>
Co-authored-by: XIE Xuan <xiexuanx2@gmail.com>
Co-authored-by: Juncheng <liujuncheng1022@gmail.com>
Co-authored-by: binbinHan <han_binbin@163.com>
Co-authored-by: CHI LIU <42956025+thinksoso@users.noreply.github.com>
Co-authored-by: Yao Chi <later@usopp.net>
Co-authored-by: ouyangyu <xuanjiuye@gmail.com>
Co-authored-by: Xiaoyu Xu <xiaoyulink@gmail.com>
Co-authored-by: PragmaTwice <i@twice.moe>
Co-authored-by: luqiang guo <702572275@qq.com>
Co-authored-by: ZeKai Zhou <30856589+zzk0@users.noreply.github.com>
Co-authored-by: VertexC <bob2420083992@gmail.com>
Co-authored-by: lixinqi <lixinqi0703106@163.com>
Co-authored-by: fengdaozhuo <52237830+grybd@users.noreply.github.com>
Co-authored-by: Zhenhua <huangzhenhua@zhejianglab.com>
Co-authored-by: tangnana925 <85614052+tangnana925@users.noreply.github.com>
Co-authored-by: tangnana <tnn_personal@163.com>
Co-authored-by: Zhenhua <1209435+hengzi@users.noreply.github.com>
Co-authored-by: lixiang <88304454@qq.com>
hjchen2 added a commit that referenced this pull request Jan 17, 2022
* layer_norm forward

* test case

* rm useless

* layer_norm backward dx

* layer norm param grad

* int count to T count

* fix

* fix T mask to int mask, refine code

* refine

* refine

* test case

* refine

* format

* fix

* add dtype bfloat16

* refine

* refine

* refine

* refine

* sum_loss to sum_stats

* x_buf to normalized_buf

* refine

* refine

* address review

* refine

* add testcase

* double use uncached impl to reduce compile time

* Fix python apis and xla implementation (#7183)

* Support save/load for lr_scheduler (#6948)

* feat(LrScheduler): support save/load for lr_scheduler

* refine document

* auto format by CI

* Refine test

* auto format by CI

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>

* Fix eye_op attr (#6973)

* fix

* add graph test

* Update python/oneflow/test/graph/test_graph_eye.py

Co-authored-by: daquexian <daquexian566@gmail.com>

* refine

* Update python/oneflow/test/graph/test_graph_eye.py

Co-authored-by: daquexian <daquexian566@gmail.com>

* auto format by CI

Co-authored-by: daquexian <daquexian566@gmail.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>

* softmax double use uncached impl to accelerate compile (#6992)

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Add [[nodiscard]] for cpp api (#6997)

* add [[nodiscard]]

* refine

* reformat

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Support Arange delta to decide dtype (#6998)

* support delta dtype to decide output dtype

* add more unittest

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Add clang as CUDA FE compiler in CI (#6954)

* update action use

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* fix

* add 80 and 86

* refine

* refine

* add CUDA_NVCC_THREADS_NUMBER

* refine

* address review

* set CUDA_NVCC_THREADS_NUMBER 8

* fix

* fix clang in init cmake

* add script

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* add flags to skip zlib

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* Migrate chunk python layer to functor (#6983)

* Migrate chunk Python layer logic to functor

* fix runtime

* Fix splits bug and CI

* Modify push to emplace

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Reduce memory usage when compiling oneflow dialect ops (#7000)

* CudaAllocator device reset before OOM (#6976)

* CudaAllocator device reset before OOM

* Add NOTE

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Refactor vm stream desc (#6989)

* remove StreamDesc::num_machines

* Prepare one thread for one stream_type

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Add Diagonal Op (#6016)

* format complete

* python to cpp

* py2cpp error

* rm

* auto format by CI

* revise

* auto format by CI

* license

* docstring

* docstring

* tensor

* tensor attribute

* auto format by CI

* docstring

* revise

* test

* revise

* revise

* rename

* half

* docs

* doc,test

* test times

* revise

* format

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* add all to all op (#6283)

* add all to all op

* add barrier

* format

* add import

* fix test

* delete barrier

* delete barrier

* Revert "delete barrier"

This reverts commit aa397ea.

* Revert "delete barrier"

This reverts commit 7ddf79a.

* check tensor meta between ranks

* add more assert

* all_reduce operate in place

* all_reduce operate in place

* fix bug

* assert tensor.is_local

* fix bug in scatter

* add more assert

* delete meta check

* add pytorch comparison test

* add pytorch comparison test

* refine

* add ONEFLOW_TEST_CPU_ONLY

* fix bug from torch gloo

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Dev ivalue for cpp api (#6890)

* add api tensor

* refine

* add nn.relu

* refine

* clean shape & refine relu test

* support void* for from_blob

* add multithreading relu test

* refine test

* refine

* refine

* add comment for __internal_tensor()

* convert to copy_util

* reformat

* refine

* add ivalue

* refine directory structure

* refine cpp api test

* refine test

* add ivalue

* refine ivalue

* refine ivalue

* refine

* refine

* refine

* refine

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* default use cpu generator (#7001)

* optimize reshape/slice/transpose functor (#6956)

* optimize reshape/slice/transpose functor

* update code according to reviewer's suggestion

* judge negative dimension number besides -1

* judge negative shape value in view::Reshape

* remove is_full_slice logic in SliceFunctor

* update code according to yinggang's advice

* move ordered permute judge to TransposeKernel

* remove print sentence

* abstract IsOrderedPermute func

* support negative permute value in TransposeFunctor

* delete tranpose_kernel optimization

* Revert "delete tranpose_kernel optimization"

This reverts commit e026434.

* not return original tensor when reshape do nothing

* simplify code

* correct spell error

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* fix IsContinuosSubspace error (#6968)

* fix IsContinuosSubspace error

* recover original IsContinuosSubspace code

* add test case

* auto format by CI

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* add cpu group deconv impl (#6980)

* add cpu group deconv impl

* remove useless lines

* remove useless lines

* add deconv2d import

* add groups test

* remove check_allclose=False

* add tf_prelu

* add cpu group deconv impl

* remove useless lines

* remove useless lines

* add deconv2d

* add groups test

* remove check_allclose=False

* add tf_prelu

* auto format by CI

* add deconv2d impl

* add deconv2d impl

* remove useless lines

* add deconv2d in functional api

* auto format by CI

* auto format by CI

* Add variable initial

* Add variable initial

* auto format by CI

* add conv2d impl

* add conv2d impl

* auto format by CI

* remove useless lines

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>

* Migrate the python layer logic of broadcastlike to functor (#7007)

* Migrate the python layer logic of broadcastlike to functor

* add var name

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Temporarily skip comm test cases (#7015)

* Temporarily skip comm test cases

* auto format by CI

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Fix nd_sbp attribute type and set nd_sbp in random functors (#7017)

* fix

* fix compile

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Save Job to IR and load Job from IR (#6885)

* save to ir

* test

* fix bugs

* impl load and test

* rm useless code

* fix conflict

* fix issues

* JobOp

* fix issues

* fix test_fuse_tril_scale

* fix test jit-outline-func

* fix test_mlir_opt.py

* save

* fix ods gen for max and avg pool

* rename oneflow to oneflow_foundation

* fix files checks

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* auto format by CI

* check in changes

* refine

* Update oneflow/ir/test/OneFlow/test_mlir_opt.py

* Update oneflow/ir/include/OneFlow/OneFlowOps.td

* refine includes

* printer & parser & verifier

* code tidy

* tidy include

* address review

* rm duplicated GetDataTypeType

* TensorSource trait

Co-authored-by: jackalcooper <jackalcooper@gmail.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>

* Fix Simple CI linkage (#6986)

* fix-simple-ci-linkage

* refine

* refine

* fix

* refine

* refine

* refine

* refine

* refine

* refine

* revert

* refine

* auto format by CI

* refine

* revert

* refine

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>

* fix sbp when weight is optional (#6984)

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Feat from numpy (#7013)

* feat(Tensor): support share memory with ndarray

* test(FromNumpy): add test

* enhance test and add documentation

* Fix merge error

* fix bug in numpy c api

* Fix(doctest): fix doctest error

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Add custom ShapeAttr in ODS (#7023)

* add ShapeAttr

* refine

* fix doc

* refine

* fix (#7028)

* Add linspace op (#7006)

* add linspace op

* refine doc

* refine

* fix comments

* fix comment

* auto format by CI

* fix ci doc error

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* fix fasterrcnn infer (#7014)

* fix fasterrcnn infer

* roi_align 0shape

* refine

* refine

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* separate kernel state and cache (#6655)

* support eager state except lazy dynamic

Signed-off-by: daquexian <daquexian566@gmail.com>

* modularize kernel contexts

Signed-off-by: daquexian <daquexian566@gmail.com>

* fix warning

Signed-off-by: daquexian <daquexian566@gmail.com>

* reformat

Signed-off-by: daquexian <daquexian566@gmail.com>

* remove duplicated license

Signed-off-by: daquexian <daquexian566@gmail.com>

* fix static check error

Signed-off-by: daquexian <daquexian566@gmail.com>

* make test gpu only

Signed-off-by: daquexian <daquexian566@gmail.com>

* temp

Signed-off-by: daquexian <daquexian566@gmail.com>

* revert opkernel context changes, align with master

Signed-off-by: daquexian <daquexian566@gmail.com>

* reformat

Signed-off-by: daquexian <daquexian566@gmail.com>

* refine cachecontext

Signed-off-by: daquexian <daquexian566@gmail.com>

* add separate cache context interface, remove out-dated files

Signed-off-by: daquexian <daquexian566@gmail.com>

* add init and cache context aliases

Signed-off-by: daquexian <daquexian566@gmail.com>

* update eager kernel

Signed-off-by: daquexian <daquexian566@gmail.com>

* fix wrong AttrMayChanged value

Signed-off-by: daquexian <daquexian566@gmail.com>

* rename and add comment

Signed-off-by: daquexian <daquexian566@gmail.com>

* auto format by CI

* fix combined_margin_loss_kernel.cpp

Signed-off-by: daquexian <daquexian566@gmail.com>

* rename op_kernel_state_wrapper.h to op_kernel_wrapper.h

Signed-off-by: daquexian <daquexian566@gmail.com>

* rename more classes, fix old cache in stateful op kernel

Signed-off-by: daquexian <daquexian566@gmail.com>

* rename more classes

Signed-off-by: daquexian <daquexian566@gmail.com>

* may changed -> not changed

Signed-off-by: daquexian <daquexian566@gmail.com>

* optimize away genrepeatedbn

Signed-off-by: daquexian <daquexian566@gmail.com>

* reformat

Signed-off-by: daquexian <daquexian566@gmail.com>

* refine

Signed-off-by: daquexian <daquexian566@gmail.com>

* update stateful local opkernel, use Cache** if possible

Signed-off-by: daquexian <daquexian566@gmail.com>

* remove TensorDesc4ArgNameAndIndex base method

Signed-off-by: daquexian <daquexian566@gmail.com>

* auto format by CI

* fix clang-tidy error

Signed-off-by: daquexian <daquexian566@gmail.com>

* auto format by CI

* fix conv kernel bug

Signed-off-by: daquexian <daquexian566@gmail.com>

* auto format by CI

* fix group conv bug and fix warning

Signed-off-by: daquexian <daquexian566@gmail.com>

* fix avgpool error

Signed-off-by: daquexian <daquexian566@gmail.com>

* fix maxpool error

Signed-off-by: daquexian <daquexian566@gmail.com>

* auto format by CI

* respect flag in deconv cpu kernel, rename cache to cache_ptr

Signed-off-by: daquexian <daquexian566@gmail.com>

* fix compile error

Signed-off-by: daquexian <daquexian566@gmail.com>

* auto format by CI

* fix deconv cache bug

Signed-off-by: daquexian <daquexian566@gmail.com>

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: Li Xinqi <lixinqi2010@gmail.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Add fully support for all datatype (#7025)

* add fully support for all datatype

* Use max array size

* add clang-format off to maintain the matrix

* fix format

* remove redundant numpy dtype

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Migrate split python layer to functor (#7030)

* Migrate split python layer to functor

* modify dim

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Add add_sparse_optimizer  for Graph (#6988)

* add_sparse_optimizer

* format

* fix bug

* refine new interface after discussion

* auto format by CI

* address review

* correct syntax

* correct error message

* rm debug print

* auto format by CI

* fix cpu-only test

Co-authored-by: XIE Xuan <xiexuanx2@gmail.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Refine RUN_CUDA_KERNEL (#7003)

* Refine RUN_CUDA_KERNEL

* Added LaunchConfig

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Support llvm in tree build (#6995)

* refine

* refine

* refine

* refine

* add61

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* rm

* revert

* refine

* refine

* refine

* refine

* return_self_in_to_consistent_if_necessary (#7004)

* return_self_in_to_consistent_if_necessary

* fix error and add test case

* skip cpu test

* fix error

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Decouple ep and global (#7027)

* Decouple ep and global

* NOLINT

* fix

* fix import

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* arange doc fix (#7035)

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* add_consistency_check_in_consistent_tensor_set_data (#7002)

* add_consistency_check_in_consistent_tensor_set_data

* auto format by CI

* minor fix

* add just wrap

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* [cmake] add liboneflow_cpp target (#7005)

* add cmake changes for liboneflow_cpp.so

Signed-off-by: daquexian <daquexian566@gmail.com>

* add separate target for cpp api test

Signed-off-by: daquexian <daquexian566@gmail.com>

* add cpp api test in ci

Signed-off-by: daquexian <daquexian566@gmail.com>

* reverse the order of cudnn and cuda library

Signed-off-by: daquexian <daquexian566@gmail.com>

* update logic of BUILD_MONOLITHIC_LIBONEFLOW

Signed-off-by: daquexian <daquexian566@gmail.com>

* rename BUILD_MONOLITHIC_LIBONEFLOW to BUILD_MONOLITHIC_LIBONEFLOW_CPP_SO

Signed-off-by: daquexian <daquexian566@gmail.com>

* share lib directory in test container

Signed-off-by: daquexian <daquexian566@gmail.com>

* add github actions debug

Signed-off-by: daquexian <daquexian566@gmail.com>

* Revert "add github actions debug"

This reverts commit 7d9aef6.

* add upterm debug after exe test

Signed-off-by: daquexian <daquexian566@gmail.com>

* sleep after fail

Signed-off-by: daquexian <daquexian566@gmail.com>

* set LD_LIBRARY_PATH in yml for cpp api test exe

Signed-off-by: daquexian <daquexian566@gmail.com>

* sleep

Signed-off-by: daquexian <daquexian566@gmail.com>

* upload liboneflow_cpp.so

Signed-off-by: daquexian <daquexian566@gmail.com>

* modify cmake to trigger compilation

Signed-off-by: daquexian <daquexian566@gmail.com>

* remove sleep

Signed-off-by: daquexian <daquexian566@gmail.com>

* build cpp api in cpu mode

Signed-off-by: daquexian <daquexian566@gmail.com>

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Fix CUDA 52 and add it to CI (#7031)

* refine

* refine

* refine

* refine

* revert

* fix

* refine

* refine

* refine

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Add check of placement constructor (#6991)

* add_check_of_placement_constructor

* move CheckDeviceIdsIsValid to runtime

* handle comment

* fix error

* fix error

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Fix(FromNumpy): fix bug in stride (#7042)

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* add non virtual destructor back (#6999)

Signed-off-by: daquexian <daquexian566@gmail.com>

Co-authored-by: Houjiang Chen <chenhoujiangcug@gmail.com>

* move python code to cpp: eye (#7036)

* 80% Sbp signature left to finish

* refine functional_api.yaml

* 90% docstr left to update

* refine

* add sbp check

* refine docs

* auto format by CI

* refine

* refine docstr

* auto format by CI

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Fix l2norm block_size (#7044)

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* fix undefined symbol: cudaGetDeviceCount (#7052)

* fix_worker_orphan_process (#7048)

* fix_worker_orphan_process

* use SIGTERM instead

* broadcast elemwise binary (#6871)

* add

* broadcast elementwise binary

* fix

* refine

* fix

* refine

* refine

* for compile

* refine

* refine

* refine

* refine

* refine

* revert kernels

* revert kernel

* refine

* refine

* refine

* refine

* nvcc thread to 4

Co-authored-by: ZZK <42901638+MARD1NO@users.noreply.github.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Source op per critical section (#6472)

* backup code

* EventRecord

* auto format by CI

* backup code

* remove deprecated binary test cases

* refactor volatile to atomic

* add StreamType::InitInstructionStatusIf/StreamType::DeleteInstructionStatusIf

* merge from branch profiling_nn_graph

* address comments

* EventRecordProvider

* more comments for XXXStatusQuerier::SetLaunched

* more comments for SharedEventRecord::Init

* wait source op per critical section

* rename a task_node.cpp

* minor fix

* backup code

* fix compiler complaints

* 1) remove AddCtrlEdgeBetweenSrcDstTickAndInputOutputInSameRank; 2) create CriticalSectionInstance buffers

* fix compiler complaints

* more profiler code

* refactor vm preschedule

* TryMoveFromWaitingToReady

* revert flying_instruction_cnt

* revert to single position to call DispatchInstruction

* revert several code

* reset instruction watermark

* remove is_xxx_hook_empty

* build with profiler

* merge master

* insert device ticks before and after critical sections

* refactor register_num of cs_wait/cs_callback from 2 to 128

* fix static analysis complaints

* fix compiler complaints about JobBuilder::ParallelConf4OpName

* Update oneflow/core/operator/critical_section_wait_tick_op.cpp

Co-authored-by: daquexian <daquexian566@gmail.com>

* address pr comments

* add job example for InstructionsBuilder::LaunchLazyJob

* address pr comments

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Co-authored-by: ouyangyu <xuanjiuye@gmail.com>
Co-authored-by: daquexian <daquexian566@gmail.com>

* More details of error of getting op matched sbp signature (#7077)

* more details of error msg

* minor change

* address review comment

* avoid namesake iterator

* Module apply only once (#7055)

* add once apply of param

* apply once on buffer

* test reuse var on module to

* test reuse var

* rm useless test

* finish test

* refine test

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* distributed test bugfix (#7057)

* change spawn_shell to spawn_shell_and_check, sleep in script

Signed-off-by: daquexian <daquexian566@gmail.com>

* fix distributed test master addr

Signed-off-by: daquexian <daquexian566@gmail.com>

* remove sleep

Signed-off-by: daquexian <daquexian566@gmail.com>

* spawn_shell -> spawn_shell_ignoring_failure

Signed-off-by: daquexian <daquexian566@gmail.com>

* auto format by CI

* fix bug

Signed-off-by: daquexian <daquexian566@gmail.com>

* auto format by CI

* fix the reversed logic

Signed-off-by: daquexian <daquexian566@gmail.com>

* improve error msg

Signed-off-by: daquexian <daquexian566@gmail.com>

* resolve name conflict of MASTER_ADDR

Signed-off-by: daquexian <daquexian566@gmail.com>

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* fix promote_type matrix (#7066)

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* fix chunk op dim=-1 bug (#7073)

* fix chunk op dim=-1 bug

* Update oneflow/core/functional/impl/array_functor.cpp

Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>

* Update oneflow/core/functional/impl/array_functor.cpp

Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>

Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Fix resource desc dump cudnn conf bug (#7038)

* fix Resource::DumpCudnnConf

* fix typo and error msg

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* fix concat bug (#7075)

* fix

* support concat single input

* Clean TensorNameScope after graph build (#7076)

* Clear tensor name scope after graph build

* Add test case of 2 graph caught same free eager tensor

* auto format by CI

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* fix_abnormal_printing (#7099)

* Fix bias add dropout fuse (#7081)

* fix bias_add dropout fuse when p=0.0

* remove redundant op

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Support 1d to 2d eager boxing (#7083)

* fix Resource::DumpCudnnConf

* support_1d_to_2d_eager_boxing

* rename stack to unflatten

* add test case

* of format

* refine test case

* Revert "fix Resource::DumpCudnnConf"

This reverts commit f07278d.

* support nd to 1d

* add 2d to 1d test case

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Implement all User Ops with Op Schema (#7032)

* add oneflow-tblgen: generate op schema (OpInterpCtx) from ods

* cmake: add inja

* tblgen: add oneflow_datatype

* tblgen: use option cat

* tblgen: fix error

* tblgen: put impl in .cpp

* tblgen: fix null attrs

* tblgen: fix null ops

* refine

* refine

* refine

* Refine op schema template and compilation

* add base OpInterpCtx to finish compilation

* fix

* refine

* fix

* add custom infer code

* generate op registrants automatically

* refine

* fix

* update user op ods and fix shape attr

* refine

* refine

* add custom code in op base

* refine comments

* add same_output_regst_num and infer

* support declare hasxx

* update op schema emitter

* refine

* emit output regist num

* refine

* refine

* migrate acc op

* migrate onerec_reader, ones_like, send, pack and padding ops

* add has_sbp_signature_infer_fn

* refine

* migrate pad, parallel_cast, partial_fc and pooling ops

* rm redundant has_device_infer_fn

* migrate prelu, quantization, randperm, reduce and repeat ops

* migrate reshape, reshape_like, roi_align, same_pad, selu and scalar related ops

* back port

* backport

* migrate ops

* refine

* refine

* refine

* refine

* add new op

* fix llvm not found

* fix mlir headers

* fix mlir headers

* fix llvm not found

* refine

* mark override

* fix merge

* fix

* fix

* set op schema as obj lib to speed up

* rewrite ops

* add addn

* add grid

* refine

* add more def (#7051)

* affine grid

* refine

* refine

* refine

* refine

* fix

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* move more ops

* fix math_binary_broadcast/elementwise_ops

* fix hardtanh

* add norm

* rename file and add CpuOnly no_grad

* fix ir & fix norm op

* fix oneflow-tblgen

* fix math_unary_elementwise_op

* fix norm

* fix bn

* fix op schema

* refine

* fix

* refine physical_tensor_desc_infer_fn

* refine

* add ScalarLogicalNotEqualOp & RecvOp

* refine

* auto format by CI

* fix fmt

* add cuda only trait

* delete unused inja

* del inja_copy_headers_to_destination

* delete unused inja

* del inja_copy_headers_to_destination

* add cuda only to tblgen

* fix json inja url and md5 not used

* fix json inja url and md5 not used

* refine

* revert

* add with cuda

* refine

* delete GenUserOpODS

* remove cuda only

* revert cuda only after meeting

* fix

Co-authored-by: PragmaTwice <i@twice.moe>
Co-authored-by: hjchen2 <chenhoujiangcug@gmail.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>

* Feat/debug pass (#7054)

* add pass debug

* debug pass

* refine comment of fuse add pass

* auto format by CI

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Fix error message (#6930)

* fix error message

* fix dot doc

* fix dot elem cnt

* auto format by CI

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* fix simple ci: add of_op_schema target to tidy check (#7105)

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Rename AnyType in .td (#7109)

* AnyType => Tensor

* refine

* refine

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Feat graph reuse var (#7080)

* add once apply of param

* apply once on buffer

* test reuse var on module to

* test reuse var

* rm useless test

* finish test

* refine test

* Clear tensor name scope after graph build

* Add test case of 2 graph caught same free eager tensor

* auto format by CI

* refactor var build draft

* add full func; add check

* done

* add test of calling a parameter outside its module

* fix break test

Co-authored-by: chengtbf <472491134@qq.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Fix l2_normalize & add nn.functional.normalize (#6940)

* fix l2_normalize

* add normalize

* add test for normalize

* refine

* clean l2_normalize and refine normalize

* simplify normalize test

* Fix l2norm block_size

* refine

Co-authored-by: Juncheng <liujuncheng1022@gmail.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Align api in swin transformer (#7058)

* add linspace op

* fix align error in swintransformer

* add @ magic method

* fix conflict

* support tensor list

* fix meshgrid bug

* revert

Co-authored-by: hjchen2 <chenhoujiangcug@gmail.com>

* set CMAKE_LINK_DEPENDS_NO_SHARED to ON (#7063)

Signed-off-by: daquexian <daquexian566@gmail.com>

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Add other api graph autotest (#7091)

* Clear tensor name scope after graph build

* Add test case of 2 graph caught same free eager tensor

* auto format by CI

* add other api graph autotest

* add more samples

* fix comments

* refine

* refine

* refine

* refine

* refine

* fix error

* fix test error

* fix bug

* fix flip bug

* fix bug

* fix bug

* fix ci bug

* fix ci error

* fix bug

* fix ci error

Co-authored-by: chengtbf <472491134@qq.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Co-authored-by: Li Xiang <54010254+lixiang007666@users.noreply.github.com>

* [serving] dev graph run (#7008)

* add cmake changes for liboneflow_cpp.so

Signed-off-by: daquexian <daquexian566@gmail.com>

* add separate target for cpp api test

Signed-off-by: daquexian <daquexian566@gmail.com>

* add cpp api test in ci

Signed-off-by: daquexian <daquexian566@gmail.com>

* graph run

* reverse the order of cudnn and cuda library

Signed-off-by: daquexian <daquexian566@gmail.com>

* update logic of BUILD_MONOLITHIC_LIBONEFLOW

Signed-off-by: daquexian <daquexian566@gmail.com>

* rename BUILD_MONOLITHIC_LIBONEFLOW to BUILD_MONOLITHIC_LIBONEFLOW_CPP_SO

Signed-off-by: daquexian <daquexian566@gmail.com>

* refine

* [draft] implement graph parameter load and save (#7010)

* implement parameter save (python) and load (c++)

Signed-off-by: daquexian <daquexian566@gmail.com>

* revert accident changes

Signed-off-by: daquexian <daquexian566@gmail.com>

* fix circular reference

Signed-off-by: daquexian <daquexian566@gmail.com>

* pimpl

* batching

* share lib directory in test container

Signed-off-by: daquexian <daquexian566@gmail.com>

* fix typo;

* add github actions debug

Signed-off-by: daquexian <daquexian566@gmail.com>

* Revert "add github actions debug"

This reverts commit 7d9aef6.

* add upterm debug after exe test

Signed-off-by: daquexian <daquexian566@gmail.com>

* sleep after fail

Signed-off-by: daquexian <daquexian566@gmail.com>

* set LD_LIBRARY_PATH in yml for cpp api test exe

Signed-off-by: daquexian <daquexian566@gmail.com>

* refine

* add test file && input order

* sleep

Signed-off-by: daquexian <daquexian566@gmail.com>

* upload liboneflow_cpp.so

Signed-off-by: daquexian <daquexian566@gmail.com>

* modify cmake to trigger compilation

Signed-off-by: daquexian <daquexian566@gmail.com>

* load job from ir && clean && add mlir model

* [remove useless python code]save to .pb

* add target of_common_obj to remove duplicate REGISTER_PASS && run of_format

* remove openvino

* remove openvino test

* refine

* IValue

* Update oneflow/api/cpp/framework/graph.h

Co-authored-by: daquexian <daquexian566@gmail.com>

* refine

* refine

* refine

* refine

* refine

* refine

* rename in oneflow.cmake

* refine oneflow.cmake

* make of_api_common object library

* move device util function in api to core

* remove device check in New and ThreadLocalGetOrNew

* refine

* fix device test

* refine graph test

* refine GetExeDir()

* refine GetExeDir() again

* fix

* refine

* fix

Co-authored-by: daquexian <daquexian566@gmail.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Co-authored-by: mosout <mosout@qq.com>

* disable autograd in lazy mode (#7070)

* disable autograd in lazy mode

* refine

* Fix/rand source op in graph (#7092)

* add test

* fix rand consistent

* add test

* Fix powf (#7106)

* quick fix power

* add int scalar test case

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Dispatch stateful ops using functional api (#7046)

* Dispatch functional stateful ops

* fix

* fix cmake

* fix

* disable attr check since it may not given when creating op expr.

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* refine

Co-authored-by: VertexC <bob2420083992@gmail.com>

* Fix HWLoc memory affinity (#7115)

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* add_env_api_docs (#7100)

* add_env_api_docs

* minor fix

* fix grammatical errors

Co-authored-by: Yao Chi <later@usopp.net>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* tmp skip s0 print because of slice (#7065)

* tmp skip s0 print because of slice

* tmp skip s0 print in test case

* fix

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* indexing first version (#7012)

* indexing first version

* complete

* test

* out loop

* test skip

* revise

* revise

* shape

* docs

* formatted

* conflict1

* conflict2

* conflict2

* conflict

* revise

* auto format by CI

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>

* fix maybe: add Maybe(T&&) to allow constructing from rvalue T (#7125)

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* autotest_add_graph_log (#7126)

* Meta info consistency check (#7085)

* meta_info_consistency_check

* refine check function

* Update consistent_cast.cpp

* move check to opinterpreter

* refine

* add note

* refactor MetaInfoConsistencyCheck

* of_format

* refine

* NonRecursiveMetaInfoConsistencyCheck

* fix func name

* add IsMetaInfoConsistencyCheckDisable()

* minor fix

* refine

* minor fix

* format

* minor fix

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* cmake: use interface target instead of include_directories in pybind11 (#7128)

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Import cmake dependence json and inja using FetchContent (#7124)

* import cmake dependence json and inja using FetchContent

* install-llvm: fix url hash

* fix inja config

* add cache var

* fix ninja build

* fix ninja build

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Add environment variable to set GRPC_ARG_MAX_MESSAGE_LENGTH (#7130)

* env ONEFLOW_GRPC_MAX_MESSAGE_BYTE_SIZE

* set default to -1

* Fea/nhwc (#6811)

* legacy maxpool2d module

* add legacy avgpool2d

* add graph cudnn conv alg config

* add conv2d nhwc

* lazy create cuda_stream in CudaCopyD2HDeviceCtx CudaStreamHandleDeviceCtx

* refine

* conv bn pool nhwc for resnet perf

* one hot with float

* use BiasAddRowGpu

* rm l2 with 0

* reformat

* add nhwc env var

* legacy pool merged into new

* refine

* fix style

* fix and refine

* address review

* fix and refine

* fix doc test

Co-authored-by: luyang <flowingsun007@163.com>
Co-authored-by: guo-ran <360112263@qq.com>
Co-authored-by: lixinqi <lixinqi0703106@163.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* reduce memory usage caused by slice grad (#7144)

* cmake: fix THIRD_PARTY build (#7146)

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* fix fold op (#7156)

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Support inplace for lazy consistent (#7112)

* Support inplace for lazy consistent

* fix single client sbp hint

* refine

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* fix prelu bug (#7118)

* support dtype and device in prelu

* optimize PreluFunctor

* fix prelu 1-dim error

* update

* update

* auto format by CI

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>

* use ibn2nd_sbp to get nd_sbp (#7155)

Co-authored-by: Houjiang Chen <chenhoujiangcug@gmail.com>

* fix copy bug (#7159)

* fix copy bug

* add to test case

* refine

* fix test case

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Fix layernorm backward bug (#7164)

* fix layernorm backward index bug

* add layernorm test case

* auto format by CI

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* [Fix] graph support 0-Size tensor (#6957)

* Add nn.functional.glu graph test

* add filter to modify functional autotest

* modify code

* add test example

* add test else

* add test judging condition for test_masked_fill.py, test_constant.py, test_tile.py, test_repeat.py, test_expand.py

* add test ok example

* Clear tensor name scope after graph build

* Add test case of 2 graph caught same free eager tensor

* auto format by CI

* Dev cc clean tensor name scope (#7082)

* Clear tensor name scope after graph build

* Add test case of 2 graph caught same free eager tensor

* auto format by CI

Co-authored-by: chengtbf <472491134@qq.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>

* submit test success example

* test success example

* submit test code

* fix a bug about relu module with 0 shape data

* fixed a bug about relu module with 0 shape data

* fix a bug about relu module with 0 shape data

* fix a bug about relu module with 0 shape data

* 0shape and 0d autotest

* fix a bug about relu module with 0 shape data

* 0shape changed to 0_size

* modify test_var.py

* modify test_eye.py

* modify test_reshape.py

* modify test_.py

* modify ReshapeFunctor

* modify some file

* Fixed graph autotest bug with reshape op test

* Fixed graph autotest bug with reshape op test

* fixed test_sub.py

* modify test_sub.py

* modify tensor_methods.cpp

* modify array_functor.cpp

* graph support 0-Size tensor

* rename 0shape to 0 size

* modified check_graph=True

* fix and refine

Co-authored-by: Zhenhua <huangzhenhua@zhejianglab.com>
Co-authored-by: tangnana925 <85614052+tangnana925@users.noreply.github.com>
Co-authored-by: tangnana <tnn_personal@163.com>
Co-authored-by: Zhenhua <1209435+hengzi@users.noreply.github.com>
Co-authored-by: chengtbf <472491134@qq.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: Xiaoyu Xu <xiaoyulink@gmail.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Cumsum op implementation (#7050)

* add cumsum op's forward definition

* add cumsum forward test case

* cumsum ver3

* remove calculating time

* add cumsum forward gpu implementation

* fix gpu forward error

* change var name

* remove annotation

* add cumsum cpu forward multi-thread support

* add multi-thread annotation

* add cumsum grad definition

* update

* add cumsum cpu backward

* add cumsum cpu backward functor

* add cumsum autograd

* update

* remove user interface

* use random method to test cumsum forward

* add cumsum gpu backward

* add cumsum gpu test

* fix gpu backward bug

* add a 3d cuda kernel try

* Revert "add cumsum gpu test"

This reverts commit 05c31556ba28ecb827b25e54c2f5fa38984e8096.

* Revert "Revert "add cumsum gpu test""

This reverts commit 918ee1569863b008c1d419c3528257416cffd840.

* change nele to ele_cnt

* add test_cumsum.py in oneflow/test/modules

* change original test_cumsum to autotest version

* optimize cumsum for special up_space and down_space

* add two special cu func

* add cumsum doc

* update doc

* update doc

* update code according to bbuf's review

* ditto

* change pin/pout to in_ptr/out_ptr

* remove multi-thread func

* update doc

* use tensor processor

* update by review

* update by review

* update

* update

* auto format by CI

* auto format by CI

* update doc

* update

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>

* Logical slice in tensor str (#7116)

* using logical slice in tensor str

* add tensor str util file

* refine

* refine

* refine

* refine

* add logical slice docs

* fix bug

* fix comment

* auto format by CI

* fix doc test bug

* delete TODO

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Add install for oneflow py (#7107)

* Add install for oneflow py

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* fix bug: output key not exists when SavaJobToIR (#7139)

* fix bug: output key not exists when SavaJobToIR

* [test] makedirs when path not exists

* remove useless comment

Co-authored-by: Peihong Liu <mosout@qq.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Add linalg 2d norm op for clip_grad (#7160)

* add linalg_2d_norm op for clip_grad

* code format

* revert sqrt

* fix comment

* refine

* fix comment

* fix ci error

* fix ci error

* fix docs bug

* fix ci error

* fix ci error

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* refine nn.graph autotest (#7111)

* add linspace op

* refine graph autotest

* revert

* add graph error trace

* fix bug

* fix autotest bug

* auto format by CI

* fix set_printoptions error

* auto format by CI

* CI test bug

* auto format by CI

* For CI

* auto format by CI

* For CI test

* fix ci error

* revert for ci

* fix bug

* fix ci error

* fix bug

* fix bug

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: Li Xiang <54010254+lixiang007666@users.noreply.github.com>
Co-authored-by: lixiang <88304454@qq.com>

* add oneflow/pytorch cudnn.deterministic (#7172)

* add cudnn.deterministic

* fix bug

* auto format by CI

* fix bug

* fix generate fake program input bug

* auto format by CI

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>

* fix linalg vector norm scalar tensor print bug (#7178)

* fix linalg vector norm scalar tensor print bug

* auto format by CI

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>

* format

* refine

* format

Co-authored-by: Yinggang Wang <wyg19970408@gmail.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: liufengwei0103 <2472937968@qq.com>
Co-authored-by: daquexian <daquexian566@gmail.com>
Co-authored-by: guo ran <360112263@qq.com>
Co-authored-by: Peihong Liu <mosout@qq.com>
Co-authored-by: ZZK <42901638+MARD1NO@users.noreply.github.com>
Co-authored-by: Shenghang Tsai <jackalcooper@gmail.com>
Co-authored-by: Li Xiang <54010254+lixiang007666@users.noreply.github.com>
Co-authored-by: cheng cheng <472491134@qq.com>
Co-authored-by: Li Xinqi <lixinqi2010@gmail.com>
Co-authored-by: lichunyou <33850693+lcylcy@users.noreply.github.com>
Co-authored-by: Luyang <flowingsun007@163.com>
Co-authored-by: wyushun <wyushun@foxmail.com>
Co-authored-by: zhu wang <33675639+olojuwin@users.noreply.github.com>
Co-authored-by: leaves-zwx <kunta0932@gmail.com>
Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>
Co-authored-by: Shijie <821898965@qq.com>
Co-authored-by: XIE Xuan <xiexuanx2@gmail.com>
Co-authored-by: Juncheng <liujuncheng1022@gmail.com>
Co-authored-by: binbinHan <han_binbin@163.com>
Co-authored-by: CHI LIU <42956025+thinksoso@users.noreply.github.com>
Co-authored-by: Yao Chi <later@usopp.net>
Co-authored-by: ouyangyu <xuanjiuye@gmail.com>
Co-authored-by: Xiaoyu Xu <xiaoyulink@gmail.com>
Co-authored-by: PragmaTwice <i@twice.moe>
Co-authored-by: luqiang guo <702572275@qq.com>
Co-authored-by: ZeKai Zhou <30856589+zzk0@users.noreply.github.com>
Co-authored-by: VertexC <bob2420083992@gmail.com>
Co-authored-by: lixinqi <lixinqi0703106@163.com>
Co-authored-by: fengdaozhuo <52237830+grybd@users.noreply.github.com>
Co-authored-by: Zhenhua <huangzhenhua@zhejianglab.com>
Co-authored-by: tangnana925 <85614052+tangnana925@users.noreply.github.com>
Co-authored-by: tangnana <tnn_personal@163.com>
Co-authored-by: Zhenhua <1209435+hengzi@users.noreply.github.com>
Co-authored-by: lixiang <88304454@qq.com>

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Co-authored-by: Houjiang Chen <chenhoujiangcug@gmail.com>
Yipeng1994 pushed a commit that referenced this pull request Jan 20, 2022
* Source op per critical section (#6472)

* backup code

* EventRecord

* auto format by CI

* backup code

* remove deprecated binary test cases

* refactor volatile to atomic

* add StreamType::InitInstructionStatusIf/StreamType::DeleteInstructionStatusIf

* merge from branch profiling_nn_graph

* address comments

* EventRecordProvider

* more comments for XXXStatusQuerier::SetLaunched

* more comments for SharedEventRecord::Init

* wait source op per critical section

* rename a task_node.cpp

* minor fix

* backup code

* fix compiler complaints

* 1) remove AddCtrlEdgeBetweenSrcDstTickAndInputOutputInSameRank; 2) create CriticalSectionInstance buffers

* fix compiler complaints

* more profiler code

* refactor vm preschedule

* TryMoveFromWaitingToReady

* revert flying_instruction_cnt

* revert to single position to call DispatchInstruction

* revert several code

* reset instruction watermark

* remove is_xxx_hook_empty

* build with profiler

* merge master

* insert device ticks before and after critical sections

* refactor register_num of cs_wait/cs_callback from 2 to 128

* fix static analysis complaints

* fix compiler complaints about JobBuilder::ParallelConf4OpName

* Update oneflow/core/operator/critical_section_wait_tick_op.cpp

Co-authored-by: daquexian <daquexian566@gmail.com>

* address pr comments

* add job example for InstructionsBuilder::LaunchLazyJob

* address pr comments

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Co-authored-by: ouyangyu <xuanjiuye@gmail.com>
Co-authored-by: daquexian <daquexian566@gmail.com>

* More details of error of getting op matched sbp signature (#7077)

* more details of error msg

* minor change

* address review comment

* avoid namesake iterator

* Module apply only once (#7055)

* add once apply of param

* apply once on buffer

* test reuse var on module to

* test reuse var

* rm useless test

* finish test

* refine test

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* distributed test bugfix (#7057)

* change spawn_shell to spawn_shell_and_check, sleep in script

Signed-off-by: daquexian <daquexian566@gmail.com>

* fix distributed test master addr

Signed-off-by: daquexian <daquexian566@gmail.com>

* remove sleep

Signed-off-by: daquexian <daquexian566@gmail.com>

* spawn_shell -> spawn_shell_ignoring_failure

Signed-off-by: daquexian <daquexian566@gmail.com>

* auto format by CI

* fix bug

Signed-off-by: daquexian <daquexian566@gmail.com>

* auto format by CI

* fix the reversed logic

Signed-off-by: daquexian <daquexian566@gmail.com>

* improve error msg

Signed-off-by: daquexian <daquexian566@gmail.com>

* resolve name conflict of MASTER_ADDR

Signed-off-by: daquexian <daquexian566@gmail.com>

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* fix promote_type matrix (#7066)

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* fix chunk op dim=-1 bug (#7073)

* fix chunk op dim=-1 bug

* Update oneflow/core/functional/impl/array_functor.cpp

Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>

* Update oneflow/core/functional/impl/array_functor.cpp

Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>

Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Fix resource desc dump cudnn conf bug (#7038)

* fix Resource::DumpCudnnConf

* fix typo and error msg

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* fix concat bug (#7075)

* fix

* support concat single input

* Clean TensorNameScope after graph build (#7076)

* Clear tensor name scope after graph build

* Add test case of 2 graph caught same free eager tensor

* auto format by CI

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* fix_abnormal_printing (#7099)

* Fix bias add dropout fuse (#7081)

* fix bias_add dropout fuse when p=0.0

* remove redundant op

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Support 1d to 2d eager boxing (#7083)

* fix Resource::DumpCudnnConf

* support_1d_to_2d_eager_boxing

* rename stack to unflatten

* add test case

* of format

* refine test case

* Revert "fix Resource::DumpCudnnConf"

This reverts commit f07278d.

* support nd to 1d

* add 2d to 1d test case

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Implement all User Ops with Op Schema (#7032)

* add oneflow-tblgen: generate op schema (OpInterpCtx) from ods

* cmake: add inja

* tblgen: add oneflow_datatype

* tblgen: use option cat

* tblgen: fix error

* tblgen: put impl in .cpp

* tblgen: fix null attrs

* tblgen: fix null ops

* refine

* refine

* refine

* Refine op schema template and compilation

* add base OpInterpCtx to finish compilation

* fix

* refine

* fix

* add custom infer code

* generate op registrants automatically

* refine

* fix

* update user op ods and fix shape attr

* refine

* refine

* add custom code in op base

* refine comments

* add same_output_regst_num and infer

* support declare hasxx

* update op schema emitter

* refine

* emit output regist num

* refine

* refine

* migrate acc op

* migrate onerec_reader, ones_like, send, pack and padding ops

* add has_sbp_signature_infer_fn

* refine

* migrate pad, parallel_cast, partial_fc and pooling ops

* rm redundant has_device_infer_fn

* migrate prelu, quantization, randperm, reduce and repeat ops

* migrate reshape, reshape_like, roi_align, same_pad, selu and scalar related ops

* back port

* backport

* migrate ops

* refine

* refine

* refine

* refine

* add new op

* fix llvm not found

* fix mlir headers

* fix mlir headers

* fix llvm not found

* refine

* mark override

* fix merge

* fix

* fix

* set op schema as obj lib to speed up

* rewrite ops

* add addn

* add grid

* refine

* add more def (#7051)

* affine grid

* refine

* refine

* refine

* refine

* fix

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* move more ops

* fix math_binary_broadcast/elementwise_ops

* fix hardtanh

* add norm

* rename file and add CpuOnly no_grad

* fix ir & fix norm op

* fix oneflow-tblgen

* fix math_unary_elementwise_op

* fix norm

* fix bn

* fix op schema

* refine

* fix

* refine physical_tensor_desc_infer_fn

* refine

* add ScalarLogicalNotEqualOp & RecvOp

* refine

* auto format by CI

* fix fmt

* add cuda only trait

* delete unused inja

* del inja_copy_headers_to_destination

* delete unused inja

* del inja_copy_headers_to_destination

* add cuda only to tblgen

* fix json inja url and md5 not used

* fix json inja url and md5 not used

* refine

* revert

* add with cuda

* refine

* delete GenUserOpODS

* remove cuda only

* revert cuda only after meeting

* fix

Co-authored-by: PragmaTwice <i@twice.moe>
Co-authored-by: hjchen2 <chenhoujiangcug@gmail.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>

* Feat/debug pass (#7054)

* add pass debug

* debug pass

* refine comment of fuse add pass

* auto format by CI

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Fix error message (#6930)

* fix error message

* fix dot doc

* fix dot elem cnt

* auto format by CI

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* fix simple ci: add of_op_schema target to tidy check (#7105)

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Rename AnyType in .td (#7109)

* AnyType => Tensor

* refine

* refine

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Feat graph reuse var (#7080)

* add once apply of param

* apply once on buffer

* test reuse var on module to

* test reuse var

* rm useless test

* finish test

* refine test

* Clear tensor name scope after graph build

* Add test case of 2 graph caught same free eager tensor

* auto format by CI

* refactor var build draft

* add full func; add check

* done

* add test of calling a parameter outside its module

* fix break test

Co-authored-by: chengtbf <472491134@qq.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Fix l2_normalize & add nn.functional.normalize (#6940)

* fix l2_normalize

* add normalize

* add test for normalize

* refine

* clean l2_normalize and refine normalize

* simplify normalize test

* Fix l2norm block_size

* refine

Co-authored-by: Juncheng <liujuncheng1022@gmail.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Align api in swin transformer (#7058)

* add linspace op

* fix align error in swintransformer

* add @ magic method

* fix conflict

* support tensor list

* fix meshgrid bug

* revert

Co-authored-by: hjchen2 <chenhoujiangcug@gmail.com>

* set CMAKE_LINK_DEPENDS_NO_SHARED to ON (#7063)

Signed-off-by: daquexian <daquexian566@gmail.com>

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Add other api graph autotest (#7091)

* Clear tensor name scope after graph build

* Add test case of 2 graph caught same free eager tensor

* auto format by CI

* add other api graph autotest

* add more samples

* fix comments

* refine

* refine

* refine

* refine

* refine

* fix error

* fix test error

* fix bug

* fix flip bug

* fix bug

* fix bug

* fix ci bug

* fix ci error

* fix bug

* fix ci error

Co-authored-by: chengtbf <472491134@qq.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Co-authored-by: Li Xiang <54010254+lixiang007666@users.noreply.github.com>

* [serving] dev graph run (#7008)

* add cmake changes for liboneflow_cpp.so

Signed-off-by: daquexian <daquexian566@gmail.com>

* add separate target for cpp api test

Signed-off-by: daquexian <daquexian566@gmail.com>

* add cpp api test in ci

Signed-off-by: daquexian <daquexian566@gmail.com>

* graph run

* reverse the order of cudnn and cuda library

Signed-off-by: daquexian <daquexian566@gmail.com>

* update logic of BUILD_MONOLITHIC_LIBONEFLOW

Signed-off-by: daquexian <daquexian566@gmail.com>

* rename BUILD_MONOLITHIC_LIBONEFLOW to BUILD_MONOLITHIC_LIBONEFLOW_CPP_SO

Signed-off-by: daquexian <daquexian566@gmail.com>

* refine

* [draft] implement graph parameter load and save (#7010)

* implement parameter save (python) and load (c++)

Signed-off-by: daquexian <daquexian566@gmail.com>

* revert accident changes

Signed-off-by: daquexian <daquexian566@gmail.com>

* fix circular reference

Signed-off-by: daquexian <daquexian566@gmail.com>

* pimpl

* batching

* share lib directory in test container

Signed-off-by: daquexian <daquexian566@gmail.com>

* fix typo;

* add github actions debug

Signed-off-by: daquexian <daquexian566@gmail.com>

* Revert "add github actions debug"

This reverts commit 7d9aef6.

* add upterm debug after exe test

Signed-off-by: daquexian <daquexian566@gmail.com>

* sleep after fail

Signed-off-by: daquexian <daquexian566@gmail.com>

* set LD_LIBRARY_PATH in yml for cpp api test exe

Signed-off-by: daquexian <daquexian566@gmail.com>

* refine

* add test file && input order

* sleep

Signed-off-by: daquexian <daquexian566@gmail.com>

* upload liboneflow_cpp.so

Signed-off-by: daquexian <daquexian566@gmail.com>

* modify cmake to trigger compilation

Signed-off-by: daquexian <daquexian566@gmail.com>

* load job from ir && clean && add mlir model

* [remove useless python code]save to .pb

* add target of_common_obj to remove duplicate REGISTER_PASS && run of_format

* remove openvino

* remove openvino test

* refine

* IValue

* Update oneflow/api/cpp/framework/graph.h

Co-authored-by: daquexian <daquexian566@gmail.com>

* refine

* refine

* refine

* refine

* refine

* refine

* rename in oneflow.cmake

* refine oneflow.cmake

* make of_api_common object library

* move device util function in api to core

* remove device check in New and ThreadLocalGetOrNew

* refine

* fix device test

* refine graph test

* refine GetExeDir()

* refine GetExeDir() again

* fix

* refine

* fix

Co-authored-by: daquexian <daquexian566@gmail.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Co-authored-by: mosout <mosout@qq.com>

* disable autograd in lazy mode (#7070)

* disable autograd in lazy mode

* refine

* Fix/rand source op in graph (#7092)

* add test

* fix rand consistent

* add test

* Fix powf (#7106)

* quick fix power

* add int scalar test case

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Dispatch stateful ops using functional api (#7046)

* Dispatch functional stateful ops

* fix

* fix cmake

* fix

* disable attr check since it may not given when creating op expr.

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* refine

Co-authored-by: VertexC <bob2420083992@gmail.com>

* Fix HWLoc memory affinity (#7115)

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* add_env_api_docs (#7100)

* add_env_api_docs

* minor fix

* fix grammatical errors

Co-authored-by: Yao Chi <later@usopp.net>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* tmp skip s0 print because of slice (#7065)

* tmp skip s0 print because of slice

* tmp skip s0 print in test case

* fix

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* indexing first version (#7012)

* indexing first version

* complete

* test

* out loop

* test skip

* revise

* revise

* shape

* docs

* formatted

* conflict1

* conflict2

* conflict2

* conflict

* revise

* auto format by CI

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>

* fix maybe: add Maybe(T&&) to allow constructing from rvalue T (#7125)

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* autotest_add_graph_log (#7126)

* Meta info consistency check (#7085)

* meta_info_consistency_check

* refine check function

* Update consistent_cast.cpp

* move check to opinterpreter

* refine

* add note

* refactor MetaInfoConsistencyCheck

* of_format

* refine

* NonRecursiveMetaInfoConsistencyCheck

* fix func name

* add IsMetaInfoConsistencyCheckDisable()

* minor fix

* refine

* minor fix

* format

* minor fix

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* cmake: use interface target instead of include_directories in pybind11 (#7128)

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Import cmake dependence json and inja using FetchContent (#7124)

* import cmake dependence json and inja using FetchContent

* install-llvm: fix url hash

* fix inja config

* add cache var

* fix ninja build

* fix ninja build

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Add environment variable to set GRPC_ARG_MAX_MESSAGE_LENGTH (#7130)

* env ONEFLOW_GRPC_MAX_MESSAGE_BYTE_SIZE

* set default to -1

* Fea/nhwc (#6811)

* legacy maxpool2d module

* add legacy avgpool2d

* add graph cudnn conv alg config

* add conv2d nhwc

* lazy create cuda_stream in CudaCopyD2HDeviceCtx CudaStreamHandleDeviceCtx

* refine

* conv bn pool nhwc for resnet perf

* one hot with float

* use BiasAddRowGpu

* rm l2 with 0

* reformat

* add nhwc env var

* legacy pool merged into new

* refine

* fix style

* fix and refine

* address review

* fix and refine

* fix doc test

Co-authored-by: luyang <flowingsun007@163.com>
Co-authored-by: guo-ran <360112263@qq.com>
Co-authored-by: lixinqi <lixinqi0703106@163.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* reduce memory usage caused by slice grad (#7144)

* cmake: fix THIRD_PARTY build (#7146)

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* fix fold op (#7156)

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Support inplace for lazy consistent (#7112)

* Support inplace for lazy consistent

* fix single client sbp hint

* refine

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* fix prelu bug (#7118)

* support dtype and device in prelu

* optimize PreluFunctor

* fix prelu 1-dim error

* update

* update

* auto format by CI

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>

* use ibn2nd_sbp to get nd_sbp (#7155)

Co-authored-by: Houjiang Chen <chenhoujiangcug@gmail.com>

* fix copy bug (#7159)

* fix copy bug

* add to test case

* refine

* fix test case

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Fix layernorm backward bug (#7164)

* fix layernorm backward index bug

* add layernorm test case

* auto format by CI

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* [Fix] graph support 0-Size tensor (#6957)

* Add nn.functional.glu graph test

* add filter to modify functional autotest

* modify code

* add test example

* add test else

* add test judging condition for test_masked_fill.py, test_constant.py, test_tile.py, test_repeat.py, test_expand.py

* add test ok example

* Clear tensor name scope after graph build

* Add test case of 2 graph caught same free eager tensor

* auto format by CI

* Dev cc clean tensor name scope (#7082)

* Clear tensor name scope after graph build

* Add test case of 2 graph caught same free eager tensor

* auto format by CI

Co-authored-by: chengtbf <472491134@qq.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>

* submit test success example

* test success example

* submit test code

* fix a bug about relu module with 0 shape data

* fixed a bug about relu module with 0 shape data

* fix a bug about relu module with 0 shape data

* fix a bug about relu module with 0 shape data

* 0shape and 0d autotest

* fix a bug about relu module with 0 shape data

* 0shape changed to 0_size

* modify test_var.py

* modify test_eye.py

* modify test_reshape.py

* modify test_.py

* modify ReshapeFunctor

* modify some file

* Fixed graph autotest bug with reshape op test

* Fixed graph autotest bug with reshape op test

* fixed test_sub.py

* modify test_sub.py

* modify tensor_methods.cpp

* modify array_functor.cpp

* graph support 0-Size tensor

* rename 0shape to 0 size

* modified check_graph=True

* fix and refine

Co-authored-by: Zhenhua <huangzhenhua@zhejianglab.com>
Co-authored-by: tangnana925 <85614052+tangnana925@users.noreply.github.com>
Co-authored-by: tangnana <tnn_personal@163.com>
Co-authored-by: Zhenhua <1209435+hengzi@users.noreply.github.com>
Co-authored-by: chengtbf <472491134@qq.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: Xiaoyu Xu <xiaoyulink@gmail.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Cumsum op implementation (#7050)

* add cumsum op's forward definition

* add cumsum forward test case

* cumsum ver3

* remove calculating time

* add cumsum forward gpu implementation

* fix gpu forward error

* change var name

* remove annotation

* add cumsum cpu forward multi-thread support

* add multi-thread annotation

* add cumsum grad definition

* update

* add cumsum cpu backward

* add cumsum cpu backward functor

* add cumsum autograd

* update

* remove user interface

* use random method to test cumsum forward

* add cumsum gpu backward

* add cumsum gpu test

* fix gpu backward bug

* add a 3d cuda kernel try

* Revert "add cumsum gpu test"

This reverts commit 05c31556ba28ecb827b25e54c2f5fa38984e8096.

* Revert "Revert "add cumsum gpu test""

This reverts commit 918ee1569863b008c1d419c3528257416cffd840.

* change nele to ele_cnt

* add test_cumsum.py in oneflow/test/modules

* change original test_cumsum to autotest version

* optimize cumsum for special up_space and down_space

* add two special cu func

* add cumsum doc

* update doc

* update doc

* update code according to bbuf's review

* ditto

* change pin/pout to in_ptr/out_ptr

* remove multi-thread func

* update doc

* use tensor processor

* update by review

* update by review

* update

* update

* auto format by CI

* auto format by CI

* update doc

* update

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>

* Logical slice in tensor str (#7116)

* using logical slice in tensor str

* add tensor str util file

* refine

* refine

* refine

* refine

* add logical slice docs

* fix bug

* fix comment

* auto format by CI

* fix doc test bug

* delete TODO

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Add install for oneflow py (#7107)

* Add install for oneflow py

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* fix bug: output key not exists when SavaJobToIR (#7139)

* fix bug: output key not exists when SavaJobToIR

* [test] makedirs when path not exists

* remove useless comment

Co-authored-by: Peihong Liu <mosout@qq.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Add linalg 2d norm op for clip_grad (#7160)

* add linalg_2d_norm op for clip_grad

* code format

* revert sqrt

* fix comment

* refine

* fix comment

* fix ci error

* fix ci error

* fix docs bug

* fix ci error

* fix ci error

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* refine nn.graph autotest (#7111)

* add linspace op

* refine graph autotest

* revert

* add graph error trace

* fix bug

* fix autotest bug

* auto format by CI

* fix set_printoptions error

* auto format by CI

* CI test bug

* auto format by CI

* For CI

* auto format by CI

* For CI test

* fix ci error

* revert for ci

* fix bug

* fix ci error

* fix bug

* fix bug

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: Li Xiang <54010254+lixiang007666@users.noreply.github.com>
Co-authored-by: lixiang <88304454@qq.com>

* add oneflow/pytorch cudnn.deterministic (#7172)

* add cudnn.deterministic

* fix bug

* auto format by CI

* fix bug

* fix generate fake program input bug

* auto format by CI

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>

* fix linalg vector norm scalar tensor print bug (#7178)

* fix linalg vector norm scalar tensor print bug

* auto format by CI

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>

* use op schema for cumsum (#7175)

* add op schema for cumsum

* change cumsum's td definition to math group

* update

* fix get_sbp for scalar math ops (#7184)

* add inplace mul for clip_grad (#7180)

* add inplace mul for clip_grad

* auto format by CI

* fix format error

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>

* Add swapaxes op (#7179)

* Add swapaxes op

* Modify runtime

* fix docstr

* Modify functor

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* fix install cuda include (#7191)

* Support uneven split in eager slice boxing (#7123)

* fix Resource::DumpCudnnConf

* add shape para in boxing check function

* fix GetBoxingFunction para

* asymmetric_x_to_b support cpu

* forbid uneven split in collective boxing

* refine slice boxing kernel to support uneven split

* add test case and fix balanced_splitter error

* fix test case

* fix op/kernel bug

* fix bug in symmetric_s_to_p

* revert boxing_dividor_util.cpp

* use const Shape&

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Add stack kernel (#7152)

* fix arange bug

* build init kernel

* add stack backward

* remove annotation

* reformat and fix sbp

* fix ops td format

* fix format

* fix comment

* add more test case in dim

* fix user ops td

* fix to use size_t

* fix annotation

* fix less than

* fix userop tablegen

* fix bug when num of inputs greater than 128

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Add erfinv op (#7163)

* Add erfinv op pre

* fix

* add erfinv op

* Add test

* fix comment

* add inplace version of erfinv

* add inplace version docs

* fix inplace cpu version kernel and ops td

* add test and docs

* fix back

* fix unittest

* fix const &

Co-authored-by: MARD1NO <359521840@qq.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* add doc for pybind type (#7193)

* fix linspace bug (#7185)

* fix linspace bug

* auto format by CI

* fix comment

* annotation adaptive_avgpool3d

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Add floor inplace version (#7187)

* add floor inplace version

* add docs

* fix comment

* fix comment

* fix comment

* auto format by CI

* fix comment

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Co-authored-by: ZZK <42901638+MARD1NO@users.noreply.github.com>

* remove is_lazy check in nn.Graph inplace output (#7190)

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Fix test case about eye (#7194)

* fix eye test case

* add test case

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* fix(narrow): fix consistent narrow gradient bug (#7195)

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Fused kernel with broadcast (#6977)

* add broadcast for fused kernel

* fix cuda memcpy illegal access error

* add broadcast for fused_softmax kernel

* fix errors

* add more test sample

* reformat

* add one_elif

* reformat

* use different dispatch logic

* Use simplified dims

* add simplified dims for fused_scale_mask_softmax_dropout

* add simplified broadcast for fused_scale_mask_softmax_dropout

* add simplified dims for fused_scale_mask_softmax

* try to merge duplicate code

* simplified kernel code

* fix test case

* fix check

* remove annotation

* add new line

Co-authored-by: MARD1NO <359521840@qq.com>
Co-authored-by: ZZK <42901638+MARD1NO@users.noreply.github.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* skip drop if drop rate is zero (#7186)

* Dev inplace clamp (#7182)

* add inplace for clamp

* first commit

* fix conflict

* add clip alias and docs

* fix bug and add test

* add more test case

* skip functional adaptive pool3d test

Co-authored-by: Zhanghuihong <garfield.gzhh@gmail.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>
Co-authored-by: Li Xiang <54010254+lixiang007666@users.noreply.github.com>

* Revert "Fused kernel with broadcast (#6977)" (#7207)

This reverts commit 80099aa.

* [BUG] Fixed graph autotest bug with sub op (#7142)

* fixed graph autotest bug with sub op test

* fixed 0size data graph autotest bug with randperm op

Co-authored-by: Xiaoyu Xu <xiaoyulink@gmail.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* var kernel (#7024)

* var forward kernel

* variance backward

* add var backward

* refine

* refine

* refine

* refine

* add GetSbpFn

* refine

* refine

* refine

* refine

* refine

* add TODO

* replace 'axis' str using 'dim' str

* change the way of getting cuda stream

* add comment

* auto format by CI

* fix ref bug

* fix static check error

* auto format by CI

* fix build many linux error

* format

* fix static check error

* fix mut dptr error when size is 0

* refine

* support 0 shape and nan

* auto format by CI

* refine

* fix doctest because of accuracy error

* fix backward unsqueeze dim bug

* fix bug backward

* refine

* fix out of order bug

Co-authored-by: Yao Chi <later@usopp.net>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Add Python code frame, debug(2) and debug(3) in nn.Graph (#7110)

* add frame

* test pass

* refine loc str

* refine code

* refine code

* refine debug

* add debug

* block forward with glog scope

* refine debug

* glog to stderr when v 2

* refine py str api

* refine and fix py obj repr

* refine pystr; use GetOrThrow at pyfunc; use alsologtostderr

* refine pystr

* move str

* fix test

* log 2 alsolog

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* update readme 0.6.0 (#7202)

* update readme

* add Publication section

* reorder

* update default version

* Fix check graph bug part1 (#7197)

* support randperm graph test

* add diagonal graph test

* fix eye op check graph bug

* refine

* fix to bug

* refine

* fix

* format

* restructure nn.graph autotest

* format

* fix bug

* auto format by CI

* fix where test bug

* comment diagonal op

* fix comment

* fix ci error

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Co-authored-by: Li Xiang <54010254+lixiang007666@users.noreply.github.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>

* Add documentation file for nn.init.xxxx (#7181)

* Add documentation file for nn.init.xxxx (#7168)

* Modify document index order (#7168)

Co-authored-by: Yao Chi <later@usopp.net>

* Refactor to numpy (#7097)

* tensor numpy method

* to numpy

* delete useless file

* replace CHECK_JUST with JUST

* tensor cpu method return self if it is in cpu

* delete tensor buffer

* delete useless code

* refine

* Update python/oneflow/nn/modules/tensor_ops.py

Co-authored-by: Yinggang Wang <wyg19970408@gmail.com>

* refine

* add docstr of cpu method

* delete useless code

* refine

* add comment

* refine

* add 'assert' info

* refine

* do .cpu if tensor is not in cpu memory

* revert format change

* fix tensor buffer numpy

* support tensor buffer to invoke numpy

* fix bug

* fix nd sbp numpy bug

* fix bug about test case because of numpy sharing memory with tensor

* auto format by CI

Co-authored-by: Yinggang Wang <wyg19970408@gmail.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>

* Fix eager boxing bug (#7196)

* fix_eager_boxing_bug

* remove EagerBoxingCall

* minor fix

* fix error

* fix error

* rename d to dim

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Feat eager consistent 2d sbp infer (#7143)

* feat(EagerConsistent): support 2d sbp infer

* feat(EagerConsistent): support compute copy cost

* refine 2d sbp cannot find error message

* refactor(EagerConsistent): move functions to sbp_infer_util

* feat(EagerConsistent): add same sbp judgement

* refine code

* feat(EagerConsistent): update 1d to 1d copy cost

* feat(EagerConsistent): try to get boxing from eager_consistent_boxing_mgr

* feat(EagerConsistent): update copy cost function

* remove useless code

* refine code

* fix merge bug

* refine code and fix copy cost function

* Revert "Fused kernel with broadcast (#6977)"

This reverts commit 80099aa.

* Add comment

* refine code

* fix JUST

* Revert "Revert "Fused kernel with broadcast (#6977)""

This reverts commit e7e2990.

* fix P->B copy cost

* fix error message error

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Co-authored-by: Houjiang Chen <chenhoujiangcug@gmail.com>

* fix split default arg (#7222)

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* fix no grad inplace clamp (#7220)

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* fix readthedocs auto update (#7223)

* fix docs (#7227)

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* allow_file_schema_in_mirror_third_party (#7231)

* support_symmetric_cyclic_nd_sbp_boxing (#7210)

* support_symmetric_cyclic_nd_sbp_boxing

* rename func

* minor fix

* solve comment

* minor fix

* fix typo

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Fix erfinv and swapaxes (#7217)

* Fix erfinv and swapaxes

* Fix

* Fix bug and add test

* Modify name

* Fix arg

* Modify pi

* Fix

Co-authored-by: ZZK <42901638+MARD1NO@users.noreply.github.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Support nd sbp dim reduce (#7230)

* support_symmetric_cyclic_nd_sbp_boxing

* rename func

* minor fix

* solve comment

* minor fix

* support_nd_sbp_dim_reduce

* fix_typo

* add test case

* fix bug

* fix bug

* refine

* fix dead loop error

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* fix comm test cases (#7021)

* fix comm test cases

* auto format by CI

* refine

* refine

* refine

Co-authored-by: ZZK <42901638+MARD1NO@users.noreply.github.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>

* fix_backward_bug_in_1d_to_2d_boxing (#7224)

* fix_backward_bug_in_1d_to_2d_boxing

* refine

* of_format

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Skip layernorm warp test (#7243)

* fix arange bug

* skip

* fix comment

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Print warning for non localhost proxy (#7228)

* print warning for non localhost proxy

Signed-off-by: daquexian <daquexian566@gmail.com>

* reformat

Signed-off-by: daquexian <daquexian566@gmail.com>

* add more check

Signed-off-by: daquexian <daquexian566@gmail.com>

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* add ddp return type (#7232)

* add ddp return type

* add comment

* auto format by CI

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Parameter support both inplace op and setter (#7249)

* feat(Parameter): Parameter support both inplace op and setter

* feat(Tensor): tensor support data's getter interface

* test(Parameter): add getter test

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* fix(*): fix sbp filter function bug (#7229)

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* refine (#7240)

* Eager boxing status (#7150)

* add eager boxing status

* refine MakeBoxingInterpreterStatus

* add blank line

* del EagerBoxingCall

* refine BoxingInterpreterStatus

* refine BoxingInterpreterStatus

* add eager boxing log

* minor fix

* minor fix

* revert removed file

* add indent arg

* rename indent to prefix

* solve comment

* refine eager_boxing_logger

* use Global<const EagerBoxingLogger>

* minor fix

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* fix empty bug (#7239)

* fix empty bug

* simplify empty

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* fix empty debug str of hob primitive (#7245)

* fix empty debug str of hob primitive

Signed-off-by: daquexian <daquexian566@gmail.com>

* fix 'OF_PP_STRINGIZE(op)'

Signed-off-by: daquexian <daquexian566@gmail.com>

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Co-authored-by: ZZK <42901638+MARD1NO@users.noreply.github.com>

* Add VSCode dev container (#7233)

* add dev container

* use oneflow/devcontainer

* add settings for new lines and trailing ws

* refine docs

* add eol setting to config

* Add '"--gpus", "all"' if running a CUDA image

* set BUILD_HWLOC off in fast cmake init cache

* Skip send and recv if dst and src are same. (#7255)

* Maxpool op nhwc (#7214)

* maxpool2d_support_nhwc

* refine

* add test case

* format

* refine

* refine

* fix comments

* Implement consistent tensor detach (#7265)

* Feat/zero optimization in nn.Graph (#7165)

* debug

* modify graph.py

* fix bug about graph debug interface

* Fix nn graph variable bind (#6895)

* fix(AutoParallel): nn.Graph support auto_parallel change sbp

* fix(AutoParallel): use tensor.set_data interface and add print sbp info

* add comment

* hack check

* add test

* refine test

* refine test

* refine code

* add and refine zero

* fix test

* refine code

* rm debug log

* refine min size set

* add note

* debug zero

* fix cudnn config

* refine test doc

* add comment of check

* eager mode in graph pass

* format

* rebuild parameter according to sbp in synced plan

* auto format by CI

* fix code check

* fix test

* try init session at graph init

* refine and revert session init

* rm useless code

* add back print of sys conf

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: grybd <52237830+grybd@users.noreply.github.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Co-authored-by: wyg1997 <wyg19970408@gmail.com>

* fix linspace limit bug (#7236)

* fix linspace limit bug

* auto format by CI

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Co-authored-by: Liang Depeng <liangdepeng@gmail.com>

* fix merge bugs

* fix(NNGraph): create tensor in jobpass after pulling plan

* fix code bug

Co-authored-by: Li Xinqi <lixinqi2010@gmail.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Co-authored-by: ouyangyu <xuanjiuye@gmail.com>
Co-authored-by: daquexian <daquexian566@gmail.com>
Co-authored-by: leaves-zwx <kunta0932@gmail.com>
Co-authored-by: Xiaoyu Xu <xiaoyulink@gmail.com>
Co-authored-by: Shijie <821898965@qq.com>
Co-authored-by: Li Xiang <54010254+lixiang007666@users.noreply.github.com>
Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>
Co-authored-by: binbinHan <han_binbin@163.com>
Co-authored-by: Luyang <flowingsun007@163.com>
Co-authored-by: cheng cheng <472491134@qq.com>
Co-authored-by: ZZK <42901638+MARD1NO@users.noreply.github.com>
Co-authored-by: Shenghang Tsai <jackalcooper@gmail.com>
Co-authored-by: PragmaTwice <i@twice.moe>
Co-authored-by: hjchen2 <chenhoujiangcug@gmail.com>
Co-authored-by: luqiang guo <702572275@qq.com>
Co-authored-by: Peihong Liu <mosout@qq.com>
Co-authored-by: Juncheng <liujuncheng1022@gmail.com>
Co-authored-by: ZeKai Zhou <30856589+zzk0@users.noreply.github.com>
Co-authored-by: VertexC <bob2420083992@gmail.com>
Co-authored-by: Yao Chi <later@usopp.net>
Co-authored-by: liufengwei0103 <2472937968@qq.com>
Co-authored-by: lichunyou <33850693+lcylcy@users.noreply.github.com>
Co-authored-by: guo-ran <360112263@qq.com>
Co-authored-by: lixinqi <lixinqi0703106@163.com>
Co-authored-by: wyushun <wyushun@foxmail.com>
Co-authored-by: fengdaozhuo <52237830+grybd@users.noreply.github.com>
Co-authored-by: Zhenhua <huangzhenhua@zhejianglab.com>
Co-authored-by: tangnana925 <85614052+tangnana925@users.noreply.github.com>
Co-authored-by: tangnana <tnn_personal@163.com>
Co-authored-by: Zhenhua <1209435+hengzi@users.noreply.github.com>
Co-authored-by: lixiang <88304454@qq.com>
Co-authored-by: MARD1NO <359521840@qq.com>
Co-authored-by: DangKai <dangkai4u@outlook.com>
Co-authored-by: Zhanghuihong <garfield.gzhh@gmail.com>
Co-authored-by: Tao Lei <96455870+taoteo@users.noreply.github.com>
Co-authored-by: Liang Depeng <liangdepeng@gmail.com>
wyg1997 added a commit that referenced this pull request Mar 7, 2022
* add ddp return type (#7232)

* add ddp return type

* add comment

* auto format by CI

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Parameter support both inplace op and setter (#7249)

* feat(Parameter): Parameter support both inplace op and setter

* feat(Tensor): tensor support data's getter interface

* test(Parameter): add getter test

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* fix(*): fix sbp filter function bug (#7229)

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* refine (#7240)

* Eager boxing status (#7150)

* add eager boxing status

* refine MakeBoxingInterpreterStatus

* add blank line

* del EagerBoxingCall

* refine BoxingInterpreterStatus

* refine BoxingInterpreterStatus

* add eager boxing log

* minor fix

* minor fix

* revert removed file

* add indent arg

* rename indent to prefix

* solve comment

* refine eager_boxing_logger

* use Global<const EagerBoxingLogger>

* minor fix

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* fix empty bug (#7239)

* fix empty bug

* simplify empty

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* fix empty debug str of hob primitive (#7245)

* fix empty debug str of hob primitive

Signed-off-by: daquexian <daquexian566@gmail.com>

* fix 'OF_PP_STRINGIZE(op)'

Signed-off-by: daquexian <daquexian566@gmail.com>

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Co-authored-by: ZZK <42901638+MARD1NO@users.noreply.github.com>

* Add VSCode dev container (#7233)

* add dev container

* use oneflow/devcontainer

* add settings for new lines and trailing ws

* refine docs

* add eol setting to config

* Add '"--gpus", "all"' if running a CUDA image

* set BUILD_HWLOC off in fast cmake init cache

* Skip send and recv if dst and src are same. (#7255)

* Maxpool op nhwc (#7214)

* maxpool2d_support_nhwc

* refine

* add test case

* format

* refine

* refine

* fix comments

* Implement consistent tensor detach (#7265)

* Feat/zero optimization in nn.Graph (#7165)

* debug

* modify graph.py

* fix bug about graph debug interface

* Fix nn graph variable bind (#6895)

* fix(AutoParallel): nn.Graph support auto_parallel change sbp

* fix(AutoParallel): use tensor.set_data interface and add print sbp info

* add comment

* hack check

* add test

* refine test

* refine test

* refine code

* add and refine zero

* fix test

* refine code

* rm debug log

* refine min size set

* add note

* debug zero

* fix cudnn config

* refine test doc

* add comment of check

* eager mode in graph pass

* format

* rebuild parameter according to sbp in synced plan

* auto format by CI

* fix code check

* fix test

* try init session at graph init

* refine and revert session init

* rm useless code

* add back print of sys conf

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: grybd <52237830+grybd@users.noreply.github.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Co-authored-by: wyg1997 <wyg19970408@gmail.com>

* fix linspace limit bug (#7236)

* fix linspace limit bug

* auto format by CI

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Co-authored-by: Liang Depeng <liangdepeng@gmail.com>

* replace throw by OF_UNIMPLEMENTED or UNIMPLEMENTED, refine error message, replace CHECK by CHECK_OR_RETURN (#7121)

* replace throw by OF_UNIMPLEMENTED in dim_scatter_ops.cpp

* replace throw by OF_UNIMPLEMENTED in scatter related kernels

* replace throw by OF_UNIMPLEMENTED in scatter related kernel

* replace glog CHECK by oneflow CHECK_OR_RETURN

* refine error message on modified UNIMPLEMENTED

* replace CHECK by CHECK_OR_RETURN in dim_scatter_ops.cpp

* refine error message on modified UNIMPLEMENTED

* refine error message on modified UNIMPLEMENTED

* refine error message on modified UNIMPLEMENTED

* remove std::endl, add period, remove redundant maybe.h including

* remove std::endl, add period

Co-authored-by: Yao Chi <later@usopp.net>

* Remove single client from CI (#7274)

* remove single client ci

* update get-oneflow

* rm changed_files

* refine workflow

* Revert "refine workflow"

This reverts commit f9cdcadf63f4634177471a06be5a2aa49e87df68.

* Update test.yml

* refine

* refine

* refine

* reorder

* rm changed_files

* refine

* add CHANGELOG.md

* refine

* Feat/eager tensor to graph out and inplace (#7254)

* feat(Parameter): Parameter support both inplace op and setter

* feat(Tensor): tensor support data's getter interface

* test(Parameter): add getter test

* debug

* add test

* open flatten graph test

* add validated false type

* refine

* format

Co-authored-by: wyg1997 <wyg19970408@gmail.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Optimize LayerNorm backward param grad (#6996)

* layer_norm forward

* test case

* rm useless

* layer_norm backward dx

* layer norm param grad

* int count to T count

* fix

* fix T mask to int mask, refine code

* refine

* refine

* test case

* refine

* format

* fix

* add dtype bfloat16

* refine

* refine

* refine

* refine

* sum_loss to sum_stats

* x_buf to normalized_buf

* refine

* refine

* address review

* refine

* add testcase

* double use uncached impl to reduce compile time

* Fix python apis and xla implementation (#7183)

* Support save/load for lr_scheduler (#6948)

* feat(LrScheduler): support save/load for lr_scheduler

* refine document

* auto format by CI

* Refine test

* auto format by CI

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>

* Fix eye_op attr (#6973)

* fix

* add graph test

* Update python/oneflow/test/graph/test_graph_eye.py

Co-authored-by: daquexian <daquexian566@gmail.com>

* refine

* Update python/oneflow/test/graph/test_graph_eye.py

Co-authored-by: daquexian <daquexian566@gmail.com>

* auto format by CI

Co-authored-by: daquexian <daquexian566@gmail.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>

* softmax double use uncached impl to accelerate compile (#6992)

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Add [[nodiscard]] for cpp api (#6997)

* add [[nodiscard]]

* refine

* reformat

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Support Arange delta to decide dtype (#6998)

* support delta dtype to decide output dtype

* add more unittest

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Add clang as CUDA FE compiler in CI (#6954)

* update action use

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* fix

* add 80 and 86

* refine

* refine

* add CUDA_NVCC_THREADS_NUMBER

* refine

* address review

* set CUDA_NVCC_THREADS_NUMBER 8

* fix

* fix clang in init cmake

* add script

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* add flags to skip zlib

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* Migrate chunk python layer to functor (#6983)

* Migrate chunk Python layer logic to functor

* fix runtime

* Fix splits bug and CI

* Modify push to emplace

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Reduce memory usage when compiling oneflow dialect ops (#7000)

* CudaAllocator device reset before OOM (#6976)

* CudaAllocator device reset before OOM

* Add NOTE

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Refactor vm stream desc (#6989)

* remove StreamDesc::num_machines

* Prepare one thread for one stream_type

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Add Diagonal Op (#6016)

* format complete

* python to cpp

* py2cpp error

* rm

* auto format by CI

* revise

* auto format by CI

* license

* docstring

* docstring

* tensor

* tensor attribute

* auto format by CI

* docstring

* revise

* test

* revise

* revise

* rename

* half

* docs

* doc,test

* test times

* revise

* format

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* add all to all op (#6283)

* add all to all op

* add barrier

* format

* add import

* fix test

* delete barrier

* delete barrier

* Revert "delete barrier"

This reverts commit aa397ea5ba815fe6df883b263b82735f126345c8.

* Revert "delete barrier"

This reverts commit 7ddf79afaa7ac072813e84ce9224440939a3f95c.

* check tensor meta between ranks

* add more assert

* all_reduce operate in place

* all_reduce operate in place

* fix bug

* assert tensor.is_local

* fix bug in scatter

* add more assert

* delete meta check

* add pytorch comparison test

* add pytorch comparison test

* refine

* add ONEFLOW_TEST_CPU_ONLY

* fix bug from torch gloo

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Dev ivalue for cpp api (#6890)

* add api tensor

* refine

* add nn.relu

* refine

* clean shape & refine relu test

* support void* for from_blob

* add multithreading relu test

* refine test

* refine

* refine

* add comment for __internal_tensor()

* convert to copy_util

* reformat

* refine

* add ivalue

* refine directory structure

* refine cpp api test

* refine test

* add ivalue

* refine ivalue

* refine ivalue

* refine

* refine

* refine

* refine

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* default use cpu generator (#7001)

* optimize reshape/slice/transpose functor (#6956)

* optimize reshape/slice/transpose functor

* update code according to reviewer's suggestion

* judge negative dimension number besides -1

* judge negative shape value in view::Reshape

* remove is_full_slice logic in SliceFunctor

* update code according to yinggang's advice

* move ordered permute judge to TransposeKernel

* remove print sentence

* abstract IsOrderedPermute func

* support negative permute value in TransposeFunctor

* delete tranpose_kernel optimization

* Revert "delete tranpose_kernel optimization"

This reverts commit e026434dc7c1ebad948c76bde475540e3bf4477a.

* not return original tensor when reshape do nothing

* simplify code

* correct spell error

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* fix IsContinuosSubspace error (#6968)

* fix IsContinuosSubspace error

* recover original IsContinuosSubspace code

* add test case

* auto format by CI

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* add cpu group deconv impl (#6980)

* add cpu group deconv impl

* remove useless lines

* remove useless lines

* add deconv2d import

* add groups test

* remove check_allclose=False

* add tf_prelu

* add cpu group deconv impl

* remove useless lines

* remove useless lines

* add deconv2d

* add groups test

* remove check_allclose=False

* add tf_prelu

* auto format by CI

* add deconv2d impl

* add deconv2d impl

* remove useless lines

* add deconv2d in functional api

* auto format by CI

* auto format by CI

* Add variable initial

* Add variable initial

* auto format by CI

* add conv2d impl

* add conv2d impl

* auto format by CI

* remove useless lines

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>

* Migrate the python layer logic of broadcastlike to functor (#7007)

* Migrate the python layer logic of broadcastlike to functor

* add var name

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Temporarily skip comm test cases (#7015)

* Temporarily skip comm test cases

* auto format by CI

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Fix nd_sbp attribute type and set nd_sbp in random functors (#7017)

* fix

* fix compile

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Save Job to IR and load Job from IR (#6885)

* save to ir

* test

* fix bugs

* impl load and test

* rm useless code

* fix conflict

* fix issues

* JobOp

* fix issues

* fix test_fuse_tril_scale

* fix test jit-outline-func

* fix test_mlir_opt.py

* save

* fix ods gen for max and avg pool

* rename oneflow to oneflow_foundation

* fix files checks

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* auto format by CI

* check in changes

* refine

* Update oneflow/ir/test/OneFlow/test_mlir_opt.py

* Update oneflow/ir/include/OneFlow/OneFlowOps.td

* refine includes

* printer & parser & verifier

* code tidy

* tidy include

* address review

* rm duplicated GetDataTypeType

* TensorSource trait

Co-authored-by: jackalcooper <jackalcooper@gmail.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>

* Fix Simple CI linkage (#6986)

* fix-simple-ci-linkage

* refine

* refine

* fix

* refine

* refine

* refine

* refine

* refine

* refine

* revert

* refine

* auto format by CI

* refine

* revert

* refine

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>

* fix sbp when weight is optional (#6984)

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Feat from numpy (#7013)

* feat(Tensor): support share memory with ndarray

* test(FromNumpy): add test

* enhancement test and add document

* Fix merge error

* fix bug in numpy c api

* Fix(doctest): fix doctest error

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Add custom ShapeAttr in ODS (#7023)

* add ShapeAttr

* refine

* fix doc

* refine

* fix (#7028)

* Add linspace op (#7006)

* add linspace op

* refine doc

* refine

* fix comments

* fix comment

* auto format by CI

* fix ci doc error

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* fix fasterrcnn infer (#7014)

* fix fasterrcnn infer

* roi_align 0shape

* refine

* refine

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* separate kernel state and cache (#6655)

* support eager state except lazy dynamic

Signed-off-by: daquexian <daquexian566@gmail.com>

* modularize kernel contexts

Signed-off-by: daquexian <daquexian566@gmail.com>

* fix warning

Signed-off-by: daquexian <daquexian566@gmail.com>

* reformat

Signed-off-by: daquexian <daquexian566@gmail.com>

* remove duplicated license

Signed-off-by: daquexian <daquexian566@gmail.com>

* fix static check error

Signed-off-by: daquexian <daquexian566@gmail.com>

* make test gpu only

Signed-off-by: daquexian <daquexian566@gmail.com>

* temp

Signed-off-by: daquexian <daquexian566@gmail.com>

* revert opkernel context changes, align with master

Signed-off-by: daquexian <daquexian566@gmail.com>

* reformat

Signed-off-by: daquexian <daquexian566@gmail.com>

* refine cachecontext

Signed-off-by: daquexian <daquexian566@gmail.com>

* add separate cache context interface, remove out-dated files

Signed-off-by: daquexian <daquexian566@gmail.com>

* add init and cache context aliases

Signed-off-by: daquexian <daquexian566@gmail.com>

* update eager kernel

Signed-off-by: daquexian <daquexian566@gmail.com>

* fix wrong AttrMayChanged value

Signed-off-by: daquexian <daquexian566@gmail.com>

* rename and add comment

Signed-off-by: daquexian <daquexian566@gmail.com>

* auto format by CI

* fix combined_margin_loss_kernel.cpp

Signed-off-by: daquexian <daquexian566@gmail.com>

* rename op_kernel_state_wrapper.h to op_kernel_wrapper.h

Signed-off-by: daquexian <daquexian566@gmail.com>

* rename more classes, fix old cache in stateful op kernel

Signed-off-by: daquexian <daquexian566@gmail.com>

* rename more classes

Signed-off-by: daquexian <daquexian566@gmail.com>

* may changed -> not changed

Signed-off-by: daquexian <daquexian566@gmail.com>

* optimize away genrepeatedbn

Signed-off-by: daquexian <daquexian566@gmail.com>

* reformat

Signed-off-by: daquexian <daquexian566@gmail.com>

* refine

Signed-off-by: daquexian <daquexian566@gmail.com>

* update stateful local opkernel, use Cache** if possible

Signed-off-by: daquexian <daquexian566@gmail.com>

* remove TensorDesc4ArgNameAndIndex base method

Signed-off-by: daquexian <daquexian566@gmail.com>

* auto format by CI

* fix clang-tidy error

Signed-off-by: daquexian <daquexian566@gmail.com>

* auto format by CI

* fix conv kernel bug

Signed-off-by: daquexian <daquexian566@gmail.com>

* auto format by CI

* fix group conv bug and fix warning

Signed-off-by: daquexian <daquexian566@gmail.com>

* fix avgpool error

Signed-off-by: daquexian <daquexian566@gmail.com>

* fix maxpool error

Signed-off-by: daquexian <daquexian566@gmail.com>

* auto format by CI

* respect flag in deconv cpu kernel, rename cache to cache_ptr

Signed-off-by: daquexian <daquexian566@gmail.com>

* fix compile error

Signed-off-by: daquexian <daquexian566@gmail.com>

* auto format by CI

* fix deconv cache bug

Signed-off-by: daquexian <daquexian566@gmail.com>

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: Li Xinqi <lixinqi2010@gmail.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Add fully support for all datatype (#7025)

* add fully support for all datatype

* Use max array size

* add clang-format off to maintain the matrix

* fix format

* remove redundant numpy dtype

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Migrate split python layer to functor (#7030)

* Migrate split python layer to functor

* modify dim

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Add add_sparse_optimizer for Graph (#6988)

* add_sparse_optimizer

* format

* fix bug

* refine new interface by discuss

* auto format by CI

* address review

* correct syntax

* correct error message

* rm debug print

* auto format by CI

* fix cpu-only test

Co-authored-by: XIE Xuan <xiexuanx2@gmail.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Refine RUN_CUDA_KERNEL (#7003)

* Refine RUN_CUDA_KERNEL

* Added LaunchConfig

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Support llvm in tree build (#6995)

* refine

* refine

* refine

* refine

* add61

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* rm

* revert

* refine

* refine

* refine

* refine

* return_self_in_to_consistent_if_necessary (#7004)

* return_self_in_to_consistent_if_necessary

* fix error and add test case

* skip cpu test

* fix error

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Decouple ep and global (#7027)

* Decouple ep and global

* NOLINT

* fix

* fix import

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* arange doc fix (#7035)

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* add_consistency_check_in_consistent_tensor_set_data (#7002)

* add_consistency_check_in_consistent_tensor_set_data

* auto format by CI

* minor fix

* add just wrap

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* [cmake] add liboneflow_cpp target (#7005)

* add cmake changes for liboneflow_cpp.so

Signed-off-by: daquexian <daquexian566@gmail.com>

* add separate target for cpp api test

Signed-off-by: daquexian <daquexian566@gmail.com>

* add cpp api test in ci

Signed-off-by: daquexian <daquexian566@gmail.com>

* reverse the order of cudnn and cuda library

Signed-off-by: daquexian <daquexian566@gmail.com>

* update logic of BUILD_MONOLITHIC_LIBONEFLOW

Signed-off-by: daquexian <daquexian566@gmail.com>

* rename BUILD_MONOLITHIC_LIBONEFLOW to BUILD_MONOLITHIC_LIBONEFLOW_CPP_SO

Signed-off-by: daquexian <daquexian566@gmail.com>

* share lib directory in test container

Signed-off-by: daquexian <daquexian566@gmail.com>

* add github actions debug

Signed-off-by: daquexian <daquexian566@gmail.com>

* Revert "add github actions debug"

This reverts commit 7d9aef684a479285c690f38d25525c9b97865e45.

* add upterm debug after exe test

Signed-off-by: daquexian <daquexian566@gmail.com>

* sleep after fail

Signed-off-by: daquexian <daquexian566@gmail.com>

* set LD_LIBRARY_PATH in yml for cpp api test exe

Signed-off-by: daquexian <daquexian566@gmail.com>

* sleep

Signed-off-by: daquexian <daquexian566@gmail.com>

* upload liboneflow_cpp.so

Signed-off-by: daquexian <daquexian566@gmail.com>

* modify cmake to trigger compilation

Signed-off-by: daquexian <daquexian566@gmail.com>

* remove sleep

Signed-off-by: daquexian <daquexian566@gmail.com>

* build cpp api in cpu mode

Signed-off-by: daquexian <daquexian566@gmail.com>

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Fix CUDA 52 and add it to CI (#7031)

* refine

* refine

* refine

* refine

* revert

* fix

* refine

* refine

* refine

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Add check of placement constructor (#6991)

* add_check_of_placement_constructor

* move CheckDeviceIdsIsValid to runtime

* handle comment

* fix error

* fix error

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Fix(FromNumpy): fix bug in stride (#7042)

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* add non virtual destructor back (#6999)

Signed-off-by: daquexian <daquexian566@gmail.com>

Co-authored-by: Houjiang Chen <chenhoujiangcug@gmail.com>

* move python code to cpp: eye (#7036)

* 80% Sbp signature left to finish

* refine functional_api.yaml

* 90% docstr left to update

* refine

* add sbp check

* refine docs

* auto format by CI

* refine

* refine docstr

* auto format by CI

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Fix l2norm block_size (#7044)

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* fix undefined symbol: cudaGetDeviceCount (#7052)

* fix_worker_orphan_process (#7048)

* fix_worker_orphan_process

* use SIGTERM instead

* broadcast elemwise binary (#6871)

* add

* broadcast elementwise binary

* fix

* refine

* fix

* refine

* refine

* for compile

* refine

* refine

* refine

* refine

* refine

* revert kernels

* revert kernel

* refine

* refine

* refine

* refine

* nvcc thread to 4

Co-authored-by: ZZK <42901638+MARD1NO@users.noreply.github.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Source op per critical section (#6472)

* backup code

* EventRecord

* auto format by CI

* backup code

* remove deprecated binary test cases

* refactor volatile to atomic

* add StreamType::InitInstructionStatusIf/StreamType::DeleteInstructionStatusIf

* merge from branch profiling_nn_graph

* address comments

* EventRecordProvider

* more comments for XXXStatusQuerier::SetLaunched

* more comments for SharedEventRecord::Init

* wait source op per critical section

* rename a task_node.cpp

* minor fix

* backup code

* fix compiler complaints

* 1) remove AddCtrlEdgeBetweenSrcDstTickAndInputOutputInSameRank; 2) create CriticalSectionInstance buffers

* fix compiler complaints

* more profiler code

* refactor vm preschedule

* TryMoveFromWaitingToReady

* revert flying_instruction_cnt

* revert to single position to call DispatchInstruction

* revert several code

* reset instruction watermark

* remove is_xxx_hook_empty

* build with profiler

* merge master

* insert device ticks before and after critical sections

* refactor register_num of cs_wait/cs_callback from 2 to 128

* fix static analysis complaints

* fix compiler complaints about JobBuilder::ParallelConf4OpName

* Update oneflow/core/operator/critical_section_wait_tick_op.cpp

Co-authored-by: daquexian <daquexian566@gmail.com>

* address pr comments

* add job example for InstructionsBuilder::LaunchLazyJob

* address pr comments

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Co-authored-by: ouyangyu <xuanjiuye@gmail.com>
Co-authored-by: daquexian <daquexian566@gmail.com>

* More details of error of getting op matched sbp signature (#7077)

* more details of error msg

* minor change

* address review comment

* avoid namesake iterator

* Module apply only once (#7055)

* add once apply of param

* apply once on buffer

* test reuse var on module to

* test reuse var

* rm useless test

* finish test

* refine test

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* distributed test bugfix (#7057)

* change spawn_shell to spawn_shell_and_check, sleep in script

Signed-off-by: daquexian <daquexian566@gmail.com>

* fix distributed test master addr

Signed-off-by: daquexian <daquexian566@gmail.com>

* remove sleep

Signed-off-by: daquexian <daquexian566@gmail.com>

* spawn_shell -> spawn_shell_ignoring_failure

Signed-off-by: daquexian <daquexian566@gmail.com>

* auto format by CI

* fix bug

Signed-off-by: daquexian <daquexian566@gmail.com>

* auto format by CI

* fix the reversed logic

Signed-off-by: daquexian <daquexian566@gmail.com>

* improve error msg

Signed-off-by: daquexian <daquexian566@gmail.com>

* resolve name conflict of MASTER_ADDR

Signed-off-by: daquexian <daquexian566@gmail.com>

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* fix promote_type matrix (#7066)

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* fix chunk op dim=-1 bug (#7073)

* fix chunk op dim=-1 bug

* Update oneflow/core/functional/impl/array_functor.cpp

Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>

* Update oneflow/core/functional/impl/array_functor.cpp

Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>

Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Fix resource desc dump cudnn conf bug (#7038)

* fix Resource::DumpCudnnConf

* fix typo and error msg

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* fix concat bug (#7075)

* fix

* support concat single input

* Clean TensorNameScope after graph build (#7076)

* Clear tensor name scope after graph build

* Add test case of 2 graph caught same free eager tensor

* auto format by CI

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* fix_abnormal_printing (#7099)

* Fix bias add dropout fuse (#7081)

* fix bias_add dropout fuse when p=0.0

* remove redundant op

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Support 1d to 2d eager boxing (#7083)

* fix Resource::DumpCudnnConf

* support_1d_to_2d_eager_boxing

* rename stack to unflatten

* add test case

* of format

* refine test case

* Revert "fix Resource::DumpCudnnConf"

This reverts commit f07278d71e3f344f435fc8f116a12cbd1c099b54.

* support nd to 1d

* add 2d to 1d test case

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Implement all User Ops with Op Schema (#7032)

* add oneflow-tblgen: generate op schema (OpInterpCtx) from ods

* cmake: add inja

* tblgen: add oneflow_datatype

* tblgen: use option cat

* tblgen: fix error

* tblgen: put impl in .cpp

* tblgen: fix null attrs

* tblgen: fix null ops

* refine

* refine

* refine

* Refine op schema template and compilation

* add base OpInterpCtx to finish compilation

* fix

* refine

* fix

* add custom infer code

* generate op registrants automatically

* refine

* fix

* update user op ods and fix shape attr

* refine

* refine

* add custom code in op base

* refine comments

* add same_output_regst_num and infer

* support declare hasxx

* update op schema emitter

* refine

* emit output regist num

* refine

* refine

* migrate acc op

* migrate onerec_reader, ones_like, send, pack and padding ops

* add has_sbp_signature_infer_fn

* refine

* migrate pad, parallel_cast, partial_fc and pooling ops

* rm redundant has_device_infer_fn

* migrate prelu, quantization, randperm, reduce and repeat ops

* migrate reshape, reshape_like, roi_align, same_pad, selu and scalar related ops

* back port

* backport

* migrate ops

* refine

* refine

* refine

* refine

* add new op

* fix llvm not found

* fix mlir headers

* fix mlir headers

* fix llvm not found

* refine

* mark override

* fix merge

* fix

* fix

* set op schema as obj lib to speed up

* rewrite ops

* add addn

* add grid

* refine

* add more def (#7051)

* affine grid

* refine

* refine

* refine

* refine

* fix

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* move more ops

* fix math_binary_broadcast/elementwise_ops

* fix hardtanh

* add norm

* rename file and add CpuOnly no_grad

* fix ir & fix norm op

* fix oneflow-tblgen

* fix math_unary_elementwise_op

* fix norm

* fix bn

* fix op schema

* refine

* fix

* refine physical_tensor_desc_infer_fn

* refine

* add ScalarLogicalNotEqualOp & RecvOp

* refine

* auto format by CI

* fix fmt

* add cuda only trait

* delete unused inja

* del inja_copy_headers_to_destination

* delete unused inja

* del inja_copy_headers_to_destination

* add cuda only to tblgen

* fix json inja url and md5 not used

* fix json inja url and md5 not used

* refine

* revert

* add with cuda

* refine

* delete GenUserOpODS

* remove cuda only

* revert cuda only after meeting

* fix

Co-authored-by: PragmaTwice <i@twice.moe>
Co-authored-by: hjchen2 <chenhoujiangcug@gmail.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>

* Feat/debug pass (#7054)

* add pass debug

* debug pass

* refine comment of fuse add pass

* auto format by CI

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Fix error message (#6930)

* fix error message

* fix dot doc

* fix dot elem cnt

* auto format by CI

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* fix simple ci: add of_op_schema target to tidy check (#7105)

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Rename AnyType in .td (#7109)

* AnyType => Tensor

* refine

* refine

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Feat graph reuse var (#7080)

* add once apply of param

* apply once on buffer

* test reuse var on module to

* test reuse var

* rm useless test

* finish test

* refine test

* Clear tensor name scope after graph build

* Add test case of 2 graph caught same free eager tensor

* auto format by CI

* refactor var build draft

* add full func; add check

* done

* add test of calling a parameter outside its module

* fix break test

Co-authored-by: chengtbf <472491134@qq.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Fix l2_normalize & add nn.functional.normalize (#6940)

* fix l2_normalize

* add normalize

* add test for normalize

* refine

* clean l2_normalize and refine normalize

* simplify normalize test

* Fix l2norm block_size

* refine

Co-authored-by: Juncheng <liujuncheng1022@gmail.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Align api in swin transformer (#7058)

* add linspace op

* fix align error in swintransformer

* add @ magic method

* fix conflict

* support tensor list

* fix meshgrid bug

* revert

Co-authored-by: hjchen2 <chenhoujiangcug@gmail.com>

* set CMAKE_LINK_DEPENDS_NO_SHARED to ON (#7063)

Signed-off-by: daquexian <daquexian566@gmail.com>

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Add other api graph autotest (#7091)

* Clear tensor name scope after graph build

* Add test case of 2 graph caught same free eager tensor

* auto format by CI

* add other api graph autotest

* add more samples

* fix comments

* refine

* refine

* refine

* refine

* refine

* fix error

* fix test error

* fix bug

* fix flip bug

* fix bug

* fix bug

* fix ci bug

* fix ci error

* fix bug

* fix ci error

Co-authored-by: chengtbf <472491134@qq.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Co-authored-by: Li Xiang <54010254+lixiang007666@users.noreply.github.com>

* [serving] dev graph run (#7008)

* add cmake changes for liboneflow_cpp.so

Signed-off-by: daquexian <daquexian566@gmail.com>

* add separate target for cpp api test

Signed-off-by: daquexian <daquexian566@gmail.com>

* add cpp api test in ci

Signed-off-by: daquexian <daquexian566@gmail.com>

* graph run

* reverse the order of cudnn and cuda library

Signed-off-by: daquexian <daquexian566@gmail.com>

* update logic of BUILD_MONOLITHIC_LIBONEFLOW

Signed-off-by: daquexian <daquexian566@gmail.com>

* rename BUILD_MONOLITHIC_LIBONEFLOW to BUILD_MONOLITHIC_LIBONEFLOW_CPP_SO

Signed-off-by: daquexian <daquexian566@gmail.com>

* refine

* [draft] implement graph parameter load and save (#7010)

* implement parameter save (python) and load (c++)

Signed-off-by: daquexian <daquexian566@gmail.com>

* revert accident changes

Signed-off-by: daquexian <daquexian566@gmail.com>

* fix circular reference

Signed-off-by: daquexian <daquexian566@gmail.com>

* pimpl

* batching

* share lib directory in test container

Signed-off-by: daquexian <daquexian566@gmail.com>

* fix typo;

* add github actions debug

Signed-off-by: daquexian <daquexian566@gmail.com>

* Revert "add github actions debug"

This reverts commit 7d9aef684a479285c690f38d25525c9b97865e45.

* add upterm debug after exe test

Signed-off-by: daquexian <daquexian566@gmail.com>

* sleep after fail

Signed-off-by: daquexian <daquexian566@gmail.com>

* set LD_LIBRARY_PATH in yml for cpp api test exe

Signed-off-by: daquexian <daquexian566@gmail.com>

* refine

* add test file && input order

* sleep

Signed-off-by: daquexian <daquexian566@gmail.com>

* upload liboneflow_cpp.so

Signed-off-by: daquexian <daquexian566@gmail.com>

* modify cmake to trigger compilation

Signed-off-by: daquexian <daquexian566@gmail.com>

* load job from ir && clean && add mlir model

* [remove useless python code]save to .pb

* add target of_common_obj to remove duplicate REGISTER_PASS  && run of_format

* remove openvino

* remove openvino test

* refine

* IValue

* Update oneflow/api/cpp/framework/graph.h

Co-authored-by: daquexian <daquexian566@gmail.com>

* refine

* refine

* refine

* refine

* refine

* refine

* rename in oneflow.cmake

* refine oneflow.cmake

* make of_api_common object library

* move device util function in api to core

* remove device check in New and ThreadLocalGetOrNew

* refine

* fix device test

* refine graph test

* refine GetExeDir()

* refine GetExeDir() again

* fix

* refine

* fix

Co-authored-by: daquexian <daquexian566@gmail.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Co-authored-by: mosout <mosout@qq.com>

* disable autograd in lazy mode (#7070)

* disable autograd in lazy mode

* refine

* Fix/rand source op in graph (#7092)

* add test

* fix rand consistent

* add test

* Fix powf (#7106)

* quick fix power

* add int scalar test case

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Dispatch stateful ops using functional api (#7046)

* Dispatch functional stateful ops

* fix

* fix cmake

* fix

* disable attr check since it may not be given when creating op expr.

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* refine

Co-authored-by: VertexC <bob2420083992@gmail.com>

* Fix HWLoc memory affinity (#7115)

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* add_env_api_docs (#7100)

* add_env_api_docs

* minor fix

* fix grammatical errors

Co-authored-by: Yao Chi <later@usopp.net>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* tmp skip s0 print because of slice (#7065)

* tmp skip s0 print because of slice

* tmp skip s0 print in test case

* fix

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* indexing first version (#7012)

* indexing first version

* complete

* test

* out loop

* test skip

* revise

* revise

* shape

* docs

* formatted

* conflict1

* conflict2

* conflict2

* conflict

* revise

* auto format by CI

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>

* fix maybe: add Maybe(T&&) to allow constructing from rvalue T (#7125)

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* autotest_add_graph_log (#7126)

* Meta info consistency check (#7085)

* meta_info_consistency_check

* refine check function

* Update consistent_cast.cpp

* move check to opinterpreter

* refine

* add note

* refactor MetaInfoConsistencyCheck

* of_format

* refine

* NonRecursiveMetaInfoConsistencyCheck

* fix func name

* add IsMetaInfoConsistencyCheckDisable()

* minor fix

* refine

* minor fix

* format

* minor fix

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* cmake: use interface target instead of include_directories in pybind11 (#7128)

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Import cmake dependence json and inja using FetchContent (#7124)

* import cmake dependence json and inja using FetchContent

* install-llvm: fix url hash

* fix inja config

* add cache var

* fix ninja build

* fix ninja build

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Add environment variable to set GRPC_ARG_MAX_MESSAGE_LENGTH (#7130)

* env ONEFLOW_GRPC_MAX_MESSAGE_BYTE_SIZE

* set default to -1

* Fea/nhwc (#6811)

* legacy maxpool2d module

* add legacy avgpool2d

* add graph cudnn conv alg config

* add conv2d nhwc

* lazy create cuda_stream in CudaCopyD2HDeviceCtx CudaStreamHandleDeviceCtx

* refine

* conv bn pool nhwc for resnet perf

* one hot with float

* use BiasAddRowGpu

* rm l2 with 0

* reformat

* add nhwc env var

* legacy pool merged into new

* refine

* fix style

* fix and refine

* address review

* fix and refine

* fix doc test

Co-authored-by: luyang <flowingsun007@163.com>
Co-authored-by: guo-ran <360112263@qq.com>
Co-authored-by: lixinqi <lixinqi0703106@163.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* reduce memory usage caused by slice grad (#7144)

* cmake: fix THIRD_PARTY build (#7146)

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* fix fold op (#7156)

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Support inplace for lazy consistent (#7112)

* Support inplace for lazy consistent

* fix single client sbp hint

* refine

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* fix prelu bug (#7118)

* support dtype and device in prelu

* optimize PreluFunctor

* fix prelu 1-dim error

* update

* update

* auto format by CI

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>

* use ibn2nd_sbp to get nd_sbp (#7155)

Co-authored-by: Houjiang Chen <chenhoujiangcug@gmail.com>

* fix copy bug (#7159)

* fix copy bug

* add to test case

* refine

* fix test case

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Fix layernorm backward bug (#7164)

* fix layernorm backward index bug

* add layernorm test case

* auto format by CI

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* [Fix] graph support 0-Size tensor (#6957)

* Add nn.functional.glu graph test

* add filter to modify functional autotest

* modify code

* add test example

* add test else

* add test judging condition for test_masked_fill.py, test_constant.py, test_tile.py, test_repeat.py, test_expand.py

* add test ok example

* Clear tensor name scope after graph build

* Add test case of 2 graph caught same free eager tensor

* auto format by CI

* Dev cc clean tensor name scope (#7082)

* Clear tensor name scope after graph build

* Add test case of 2 graph caught same free eager tensor

* auto format by CI

Co-authored-by: chengtbf <472491134@qq.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>

* submit test success example

* test success example

* submit test code

* fix a bug about relu module with 0 shape data

* fixed a bug about relu module with 0 shape data

* fix a bug about relu module with 0 shape data

* fix a bug about relu module with 0 shape data

* 0shape and 0d autotest

* fix a bug about relu module with 0 shape data

* 0shape changed to 0_size

* modify test_var.py

* modify test_eye.py

* modify test_reshape.py

* modify test_.py

* modify ReshapeFunctor

* modify some file

* Fixed graph autotest bug with reshape op test

* Fixed graph autotest bug with reshape op test

* fixed test_sub.py

* modify test_sub.py

* modify tensor_methods.cpp

* modify array_functor.cpp

* graph support 0-Size tensor

* rename 0shape to 0 size

* modified check_graph=True

* fix and refine

Co-authored-by: Zhenhua <huangzhenhua@zhejianglab.com>
Co-authored-by: tangnana925 <85614052+tangnana925@users.noreply.github.com>
Co-authored-by: tangnana <tnn_personal@163.com>
Co-authored-by: Zhenhua <1209435+hengzi@users.noreply.github.com>
Co-authored-by: chengtbf <472491134@qq.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: Xiaoyu Xu <xiaoyulink@gmail.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Cumsum op implementation (#7050)

* add cumsum op's forward definition

* add cumsum forward test case

* cumsum ver3

* remove calculating time

* add cumsum forward gpu implementation

* fix gpu forward error

* change var name

* remove annotation

* add cumsum cpu forward multi-thread support

* add multi-thread annotation

* add cumsum grad definition

* update

* add cumsum cpu backward

* add cumsum cpu backward functor

* add cumsum autograd

* update

* remove user interface

* use random method to test cumsum forward

* add cumsum gpu backward

* add cumsum gpu test

* fix gpu backward bug

* add a 3d cuda kernel try

* Revert "add cumsum gpu test"

This reverts commit 05c31556ba28ecb827b25e54c2f5fa38984e8096.

* Revert "Revert "add cumsum gpu test""

This reverts commit 918ee1569863b008c1d419c3528257416cffd840.

* change nele to ele_cnt

* add test_cumsum.py in oneflow/test/modules

* change original test_cumsum to autotest version

* optimize cumsum for special up_space and down_space

* add two special cu func

* add cumsum doc

* update doc

* update doc

* update code according to bbuf's review

* ditto

* change pin/pout to in_ptr/out_ptr

* remove multi-thread func

* update doc

* use tensor processor

* update by review

* update by review

* update

* update

* auto format by CI

* auto format by CI

* update doc

* update

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>

* Logical slice in tensor str (#7116)

* using logical slice in tensor str

* add tensor str util file

* refine

* refine

* refine

* refine

* add logical slice docs

* fix bug

* fix comment

* auto format by CI

* fix doc test bug

* delete TODO

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Add install for oneflow py (#7107)

* Add install for oneflow py

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* refine

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* fix bug: output key does not exist when SaveJobToIR (#7139)

* fix bug: output key does not exist when SaveJobToIR

* [test] makedirs when path not exists

* remove useless comment

Co-authored-by: Peihong Liu <mosout@qq.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Add linalg 2d norm op for clip_grad (#7160)

* add linalg_2d_norm op for clip_grad

* code format

* revert sqrt

* fix comment

* refine

* fix comment

* fix ci error

* fix ci error

* fix docs bug

* fix ci error

* fix ci error

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* refine nn.graph autotest (#7111)

* add linspace op

* refine graph autotest

* revert

* add graph error trace

* fix bug

* fix autotest bug

* auto format by CI

* fix set_printoptions error

* auto format by CI

* CI test bug

* auto format by CI

* For CI

* auto format by CI

* For CI test

* fix ci error

* revert for ci

* fix bug

* fix ci error

* fix bug

* fix bug

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: Li Xiang <54010254+lixiang007666@users.noreply.github.com>
Co-authored-by: lixiang <88304454@qq.com>

* add oneflow/pytorch cudnn.deterministic (#7172)

* add cudnn.deterministic

* fix bug

* auto format by CI

* fix bug

* fix generate fake program input bug

* auto format by CI

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>

* fix linalg vector norm scalar tensor print bug (#7178)

* fix linalg vector norm scalar tensor print bug

* auto format by CI

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>

* format

* refine

* format

Co-authored-by: Yinggang Wang <wyg19970408@gmail.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: liufengwei0103 <2472937968@qq.com>
Co-authored-by: daquexian <daquexian566@gmail.com>
Co-authored-by: guo ran <360112263@qq.com>
Co-authored-by: Peihong Liu <mosout@qq.com>
Co-authored-by: ZZK <42901638+MARD1NO@users.noreply.github.com>
Co-authored-by: Shenghang Tsai <jackalcooper@gmail.com>
Co-authored-by: Li Xiang <54010254+lixiang007666@users.noreply.github.com>
Co-authored-by: cheng cheng <472491134@qq.com>
Co-authored-by: Li Xinqi <lixinqi2010@gmail.com>
Co-authored-by: lichunyou <33850693+lcylcy@users.noreply.github.com>
Co-authored-by: Luyang <flowingsun007@163.com>
Co-authored-by: wyushun <wyushun@foxmail.com>
Co-authored-by: zhu wang <33675639+olojuwin@users.noreply.github.com>
Co-authored-by: leaves-zwx <kunta0932@gmail.com>
Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>
Co-authored-by: Shijie <821898965@qq.com>
Co-authored-by: XIE Xuan <xiexuanx2@gmail.com>
Co-authored-by: Juncheng <liujuncheng1022@gmail.com>
Co-authored-by: binbinHan <han_binbin@163.com>
Co-authored-by: CHI LIU <42956025+thinksoso@users.noreply.github.com>
Co-authored-by: Yao Chi <later@usopp.net>
Co-authored-by: ouyangyu <xuanjiuye@gmail.com>
Co-authored-by: Xiaoyu Xu <xiaoyulink@gmail.com>
Co-authored-by: PragmaTwice <i@twice.moe>
Co-authored-by: luqiang guo <702572275@qq.com>
Co-authored-by: ZeKai Zhou <30856589+zzk0@users.noreply.github.com>
Co-authored-by: VertexC <bob2420083992@gmail.com>
Co-authored-by: lixinqi <lixinqi0703106@163.com>
Co-authored-by: fengdaozhuo <52237830+grybd@users.noreply.github.com>
Co-authored-by: Zhenhua <huangzhenhua@zhejianglab.com>
Co-authored-by: tangnana925 <85614052+tangnana925@users.noreply.github.com>
Co-authored-by: tangnana <tnn_personal@163.com>
Co-authored-by: Zhenhua <1209435+hengzi@users.noreply.github.com>
Co-authored-by: lixiang <88304454@qq.com>

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Co-authored-by: Houjiang Chen <chenhoujiangcug@gmail.com>
Co-authored-by: Yinggang Wang <wyg19970408@gmail.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: liufengwei0103 <2472937968@qq.com>
Co-authored-by: daquexian <daquexian566@gmail.com>
Co-authored-by: Peihong Liu <mosout@qq.com>
Co-authored-by: ZZK <42901638+MARD1NO@users.noreply.github.com>
Co-authored-by: Shenghang Tsai <jackalcooper@gmail.com>
Co-authored-by: Li Xiang <54010254+lixiang007666@users.noreply.github.com>
Co-authored-by: cheng cheng <472491134@qq.com>
Co-authored-by: Li Xinqi <lixinqi2010@gmail.com>
Co-authored-by: lichunyou <33850693+lcylcy@users.noreply.github.com>
Co-authored-by: Luyang <flowingsun007@163.com>
Co-authored-by: wyushun <wyushun@foxmail.com>
Co-authored-by: zhu wang <33675639+olojuwin@users.noreply.github.com>
Co-authored-by: leaves-zwx <kunta0932@gmail.com>
Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>
Co-authored-by: Shijie <821898965@qq.com>
Co-authored-by: XIE Xuan <xiexuanx2@gmail.com>
Co-authored-by: Juncheng <liujuncheng1022@gmail.com>
Co-authored-by: binbinHan <han_binbin@163.com>
Co-authored-by: CHI LIU <42956025+thinksoso@users.noreply.github.com>
Co-authored-by: Yao Chi <later@usopp.net>
Co-authored-by: ouyangyu <xuanjiuye@gmail.com>
Co-authored-by: Xiaoyu Xu <xiaoyulink@gmail.com>
Co-authored-by: PragmaTwice <i@twice.moe>
Co-authored-by: luqiang guo <702572275@qq.com>
Co-authored-by: ZeKai Zhou <30856589+zzk0@users.noreply.github.com>
Co-authored-by: VertexC <bob2420083992@gmail.com>
Co-authored-by: lixinqi <lixinqi0703106@163.com>
Co-authored-by: fengdaozhuo <52237830+grybd@users.noreply.github.com>
Co-authored-by: Zhenhua <huangzhenhua@zhejianglab.com>
Co-authored-by: tangnana925 <85614052+tangnana925@users.noreply.github.com>
Co-authored-by: tangnana <tnn_personal@163.com>
Co-authored-by: Zhenhua <1209435+hengzi@users.noreply.github.com>
Co-authored-by: lixiang <88304454@qq.com>

* Use normalize instead of l2_normalize (#7113)

* use normalize instead of l2_normalize

* refine

* fix l2_norm

* reformat

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* add_eager_naive_s_to_p_boxing (#7203)

* add_eager_naive_s_to_p_boxing

* fix typo

* minor fix

* fix test case

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* add timeout for distributed run (#7286)

* add timeout

* strict timeout

* quick fix

* cmake: import gflags and glog using FetchContent (#7176)

* cmake: import gflags and glog using FetchContent

* cmake: use set_mirror_url_with_hash

* fix THIRD_PARTY build

* fix lib path

* fix gflags

* remove gflags

* format

* auto format by CI

* fix xrt gflags

* fix name

* remove oneflow_exe_third_party_libs

* remove PUBLIC

* revert some changes

* Update oneflow.cmake

* fix so

* fix so

* remove Custom op test and Single client dry run test

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Prune parallel dim with val eq one in parallel dim reduce (#7257)

* fix Resource::DumpCudnnConf

* prune_parallel_dim_with_val_eq_one_in_parallel_dim_reduce

* minor fix

* refine Prune

* refine

* refine

* minor fix

* fix bug

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* add sync (#7294)

* add polynomial scheduler (#7260)

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Fix autotest inplace bug, hardsigmoid (#7276)

* Fix autotest inplace bug, hardsigmoid

* Fix

* Format

* Fix

* Fix kwargs

* auto format by CI

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Use flowvision replace flow.utils.vision (#6612)

* use flowvision

* del flow.utils.vision

* add flow.utils.data

* refine

* update version

* refine

* align clip grad with torch in error_if_nonfinite (#7304)

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Use cmake install to copy cpp api related files (#7200)

* install cpp api

* install mlir related files

* clean

* handle third party dependences

* support cpack

* fix

* Update oneflow.cmake

* fix

* fix compiling error

* refine

* add exe test as deps

* install third party

* refine

* refine

* revert install dir

* install third party

* refine

Co-authored-by: Shenghang Tsai <jackalcooper@gmail.com>

* Add_Tensor.T_and_Tensor.t()_ops (#7269)

* Add_Tensor.T_and_Tensor.t()_ops

* Update single test and docs

* Update single test

* auto format by CI

* Update tensor.T

* recover requirements.txt

* auto format by CI

* Update check_graph=False

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Refactor release tensor (#7071)

* refactor ReleaseTensor instruction

* support CurrentDevVmDepObjectConsumeMode for ReleaseTensor

* rm useless Touch instruction

* reset speed test threshold

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Co-authored-by: Yu OuYang <xuanjiuye@gmail.com>

* Fix functional dropout and Docs (#7237)

* fix addend to kwargs

* fix to an extra

* fix test

* fix to use key word arguments

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Modify graph and 0-D Tensor (#7208)

* Fix 0 dim bug

* part1

* Fix

* Fix

* Add 0dim to 1dim function

* Fix

* Fix

* Fix

* Fix

* Fix

* Fix

* Delete test_logical_not_with_0dim_data

* Fix

* Format

* Fix

* Fix

* Fix

* Update test_movedim.py

* Update test_narrow.py

* Test bug

* Test bug

* Fix graph bug

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Refactor OneRecReader to stateful op and provide Module api (#7271)

* refactor read_onerec to nn.OneRecReader

* fix

* refine doc

* auto format by CI

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: Yao Chi <later@usopp.net>
Co-authored-by: Houjiang Chen <chenhoujiangcug@gmail.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* [bug] Adam align torch params (#7318)

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* add channel-last to resnet50 graph ci test (#7253)

* add channel-last to resnet50 graph ci test

* refine

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Add oneTBB (#7213)

* add tbb

* refine

* refine

* refine

* refine

* revert

* add tbb

* successfully add tbb

* tbb onednn ok

* fix ninja onednn

* component

* install tbb include file

* update tbb master zip

* fix md5

* refine

* refine

* fix

* cmake option

* modified clang 10 OMP

* add line

* fix add OMP flags

* fix

* fix

* fix OF_RUNTIME_TBB

* refine

* clean

* fix

Co-authored-by: jackalcooper <jackalcooper@gmail.com>
Co-authored-by: mosout <mosout@qq.com>

* Dev all op bool (#6962)

* fix typo

* dev all op bool type

* add bool testcase

* support bool for ops and kernels

* bool api

* functional bool

* ndarray bool

* fix conflict

* fix table gen

* fix

* refine

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* add broadcast_to_compatible bool kernel

* fix

* fix

* fix split api

* fix docstr

* fix setitem bool

* fix char

* fix

* fix

* fix

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Fix leaf tensor backward error (#7331)

* fix(Autograd): fix leaf tensor backward error

* test(Autograd): add scalar leaf tensor backward test

* Create tensor in jobpass after pulling plan (#7315)

* fix(NNGraph): create tensor in jobpass after pulling plan

* fix(NNGraph): remove useless sync and fix typo

* refine error message

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* [BUG] support 0D tensor in eager consistent (#7242)

* support 0D tensor in eager consistent

* fixed_0Size_0D_bug_with_eager_consistent

* fixed_0Size_0D_bug_with_eager_consistent

* fixed a bug with flatten op in graph autotest

* fixed a bug with flattenfunctor in graph autotest

* fixed a bug with flatten op in graph autotest

* add Notes

* modified some check_graph=False

* auto format by CI

* modified check_graph=True

* modified test_add.py

* modified test_index_select.py

* modified test_mul.py

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: Xiaoyu Xu <xiaoyulink@gmail.com>

* Use cudnn maxpool when possible (#7333)

* use_cudnn_maxpool_when_possible

* restructure

* refine

* Remove `register_tensor_op` decorator and use `add_docstr`[part] (#7306)

* Remove register_tensor_op decorator and use add_docstr

* Fix eq

* Fix ne

* Fix lt

* Fix le

* Fix to_local

* Fix

* test

* Resolve conflict

* disable nccl release_tensor sequential (#7341)

* Rm local dep object pool (#7131)

* refactor ReleaseTensor instruction

* remove LocalDepObject::logical_object

* remove LocalDepObjectPool

* refine code by profiling

* support CurrentDevVmDepObjectConsumeMode for ReleaseTensor

* rm useless Touch instruction

* set default value of EagerNcclBroadcastOp::async_launch to true

* flow._C.stream_touch (#7209)

* flow._C.stream_touch

* fix compiler complaints

* reset speed test threshold

* reserve more size for vector dep_objects

* fix static checker complaint

* stream_touch does nothing if inputs empty

* do not run stream_touch if inputs empty

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Co-authored-by: Yu OuYang <xuanjiuye@gmail.com>

* Autotest add graph backward (#7270)

* add graph backward run in autotest

* format

* revert

* fix ci

* auto format by CI

* fix bug

* fix bug

* auto format by CI

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Cumprod (#7278)

* add cumprod

* fix name

* rename

* fix when specified dim is 1

* add docstr

* add docstr

* refine

* add WITH_CUDA

* refine

* refine

* Update python/oneflow/framework/docstr/math_ops.py

fix docstr

Co-authored-by: Yao Chi <later@usopp.net>

* Update python/oneflow/framework/docstr/math_ops.py

fix docstr

Co-authored-by: Yao Chi <later@usopp.net>

* fix docstr

* refine

* refine

* fix include

* refine

* refine

* refine

Co-authored-by: Yao Chi <later@usopp.net>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Migrate tile python layer to functor (#7305)

* tile implement

* migrate repeat

* of format

* align document with pytorch

* change input args to *size

* change with review comments

* migrate tile python layer to functor

* fix tile document

* fix tile document

* add document's function signature

* auto format by CI

* add repeat document signature

* fix document

* fix document

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>

* fix (#7338)

* update_version_of_flowvision (#7346)

* Release tensor storage as soon as possible (#7287)

* Release tensor storage as soon as possible

* refine

Co-authored-by: Luyang <flowingsun007@163.com>

* Dev eager consistent autotest (#7204)

* add doc for pybind type

* eager consistent autotest

* align placement repr and api

* export necessary apis to pickle object

* broadcast rank 0 to other rank if consistent test

* update consistent add unittest

* refine

* cmake: import gtest using FetchContent (#7292)

* cmake: import gflags and glog using FetchContent

* cmake: use set_mirror_url_with_hash

* fix THIRD_PARTY build

* fix lib path

* fix gflags

* remove gflags

* format

* auto format by CI

* fix xrt gflags

* fix name

* remove oneflow_exe_third_party…