add some fused kernels #6635
Conversation
};

template<typename SRC, typename DST>
struct DropoutScore {
Score -> Store

Fixed. By the way, in functional_api.yaml the xxx_grad functions are also exported. Is that unnecessary? bind_python is usually set to False for them, and in most cases users will not call them directly.

They are used in autograd; see oneflow/core/autograd/gradient_funcs/.
const TensorTuple& out_grads, TensorTuple* in_grads) const {
  if (!ctx->input_requires_grad) { return Maybe<void>::Ok(); }

  CHECK_EQ_OR_RETURN(out_grads.size(), 2);  // softmax_y, dy
The description here may not be accurate. fused_scale_mask_softmax has two outputs, so two gradients come back: one is the gradient of softmax_y and the other is the gradient of y. Calling y's gradient dy is fine, but the comment for softmax_y's gradient needs rewriting, e.g. softmax_dy.

Does softmax_y get returned from the forward pass and receive a diff? My impression is that it is usually only passed to the backward op.

Two gradients come in either way, but we only need dy's. softmax_y is indeed passed to the backward op; I am just flagging that the comment here should be fixed.
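To make the two-gradients point concrete, here is a minimal NumPy sketch (hypothetical helper names and shapes, not the actual CUDA kernel) of a scale-mask-softmax forward and its backward: the saved `softmax_y` is the forward softmax output, and only `dy` enters the gradient computation.

```python
import numpy as np

def scale_mask_softmax_fwd(x, mask, fill_value, scale):
    # Positions where mask == 0 are replaced by fill_value before the softmax.
    z = np.where(mask.astype(bool), x * scale, fill_value)
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    softmax_y = e / e.sum(axis=-1, keepdims=True)
    return softmax_y

def scale_mask_softmax_bwd(softmax_y, dy, mask, scale):
    # Standard softmax backward, then undo the forward scale/mask:
    # masked positions receive zero gradient.
    ds = softmax_y * (dy - (dy * softmax_y).sum(axis=-1, keepdims=True))
    return np.where(mask.astype(bool), ds * scale, 0.0)
```

A finite-difference check against this pair confirms that masked positions get zero gradient and unmasked positions match the numeric derivative.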
.Output("softmax_y")
    .Build());
}
Maybe<Tensor> operator()(const std::shared_ptr<one::Tensor>& x, const std::shared_ptr<one::Tensor>& mask,
The op you ultimately call is fused_scale_mask_softmax_dropout_op_, which has two outputs, so the return type here should be a TensorTuple. Otherwise only one output comes back, softmax_y cannot be saved in the backward function, and the gradient cannot be computed.
const TensorTuple& out_grads, TensorTuple* in_grads) const {
  if (!ctx->input_requires_grad) { return Maybe<void>::Ok(); }

  CHECK_EQ_OR_RETURN(out_grads.size(), 2);  // softmax_y, dy
Two gradients come in either way, but we only need dy's. softmax_y is indeed passed to the backward op; I am just flagging that the comment here should be fixed.
404ffd8 to e7d07c3 (Compare)
- name: "fused_scale_tril_softmax_mask_scale"
  signature: "TensorTuple (Tensor a, *, Float p=0.5, Int64 diagonal, Float tril_scale_value, Generator generator=None) => FusedScaleTrilSoftmaxMaskScale"
  bind_python: True

- name: "fused_scale_tril_softmax_mask_scale_grad"
-  signature: "Tensor (Tensor softmax_y, Tensor dy, Tensor mask, Int64 diagonal, Float tril_scale_value, Float mask_scale_value) => FusedScaleTrilSoftmaxMaskScaleGrad"
+  signature: "Tensor (Tensor y, Tensor dy, Tensor mask, Int64 diagonal, Float tril_scale_value, Float mask_scale_value) => FusedScaleTrilSoftmaxMaskScaleGrad"
This one doesn't need changing, right? It does use softmax_y?

OK, changed it here.
ctx->scale = JUST(composed_attrs.GetAttr<float>("scale_value"));
ctx->dropout_scale = JUST(composed_attrs.GetAttr<float>("dropout_scale_value"));

ctx->SaveTensorForBackward(inputs.at(1));
Add a comment noting what is saved here.

This one needs it too.
This PR passed my local tests; please test it on your side, then merge.
})
.SetDataTypeInferFn([](user_op::InferContext* ctx) -> Maybe<void> {
  const user_op::TensorDesc& x_desc = ctx->InputTensorDesc("x", 0);
  *ctx->OutputDType("y", 0) = x_desc.data_type();
Add a check to ensure the mask dtype is int8.
.Attr<float>("mask_fill_value", 0.)
.Attr<float>("dropout_scale_value", 1.0)
.SetTensorDescInferFn([](user_op::InferContext* ctx) -> Maybe<void> {
  const user_op::TensorDesc& x_desc = ctx->InputTensorDesc("x", 0);
mask and x are now in an elementwise relationship, so check that mask's shape matches x's shape.
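The two requested checks (int8 mask dtype, and mask shape equal to x's shape) can be sketched host-side in NumPy; `check_mask` is a hypothetical helper for illustration, not oneflow's InferContext API.

```python
import numpy as np

def check_mask(x, mask):
    # Hypothetical pre-flight checks mirroring the review requests:
    # the mask dtype must be int8, and mask must be elementwise with x.
    if mask.dtype != np.int8:
        raise TypeError(f"mask dtype must be int8, got {mask.dtype}")
    if mask.shape != x.shape:
        raise ValueError(f"mask shape {mask.shape} != x shape {x.shape}")
```

In the actual op registration these would be `CHECK_..._OR_RETURN` calls inside the data-type and tensor-desc infer functions.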
REGISTER_USER_OP("fused_scale_mask_softmax_dropout_grad")
    .Input("softmax_y")
    .Input("dy")
    .Input("dropout_mask")
Earlier the order is mask first, then dropout_mask; keep the order consistent here, and remember to update the functional layer as well.
.Attr<float>("scale_value")
.Attr<float>("dropout_scale_value")
.SetTensorDescInferFn([](user_op::InferContext* ctx) -> Maybe<void> {
  const user_op::TensorDesc& softmax_y_desc = ctx->InputTensorDesc("softmax_y", 0);
As above, add the mask check here too.
}  // namespace

}  // namespace oneflow
Add a trailing newline here; the following files need the same.
x = np.random.randn(batch_size, num_heads, seq_length, seq_length)
mask = np.random.randint(0, 2, size=(batch_size, num_heads, seq_length, seq_length), dtype=np.uint8)

# x = np.array([[1., 2., 3.], [4., 5., 6.]])
Remove the commented-out line.
ComposedAttrMap composed_attrs(attrs, base_attrs_);
ctx->scale = JUST(composed_attrs.GetAttr<float>("scale_value"));

ctx->SaveTensorForBackward(inputs.at(1));
Add the comment like this:
ctx->SaveTensorForBackward(inputs.at(1));  // save mask
ctx->scale = JUST(composed_attrs.GetAttr<float>("scale_value"));

ctx->SaveTensorForBackward(inputs.at(1));
ctx->SaveTensorForBackward(outputs.at(0));  // save y, i.e. softmax result
Same as above.
ctx->scale = JUST(composed_attrs.GetAttr<float>("scale_value"));
ctx->dropout_scale = JUST(composed_attrs.GetAttr<float>("dropout_scale_value"));

ctx->SaveTensorForBackward(inputs.at(1));
This one needs the comment too.
op_ = CHECK_JUST(one::OpBuilder("fused_scale_mask_softmax_dropout_grad")
                     .Input("softmax_y")
                     .Input("dy")
                     .Input("dropout_mask")
Keep the order mask first, then dropout_mask, consistent with the forward op.
@@ -1487,6 +1487,22 @@
  signature: "Tensor (Tensor a, Tensor b, *, Float p=0.5, Int32 axis, Generator generator=None) => FusedBiasAddDropout"
  bind_python: True

- name: "fused_scale_mask_softmax"
  signature: "Tensor (Tensor x, Tensor mask, *, Float fill_value, Float scale=1.0) => FusedScaleMaskSoftmax"
Should fill_value get a default value? I think 0.0 would be reasonable.
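If the suggestion of defaulting fill_value to 0.0 were adopted, the YAML entry would read as follows (a sketch of the proposed change, not necessarily the merged version):

```yaml
- name: "fused_scale_mask_softmax"
  signature: "Tensor (Tensor x, Tensor mask, *, Float fill_value=0.0, Float scale=1.0) => FusedScaleMaskSoftmax"
  bind_python: True
```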
@flow.unittest.skip_unless_1n1d()
@unittest.skipIf(os.getenv("ONEFLOW_TEST_CPU_ONLY"), "only test gpu cases")
class TestFusedScaleMaskSoftmax(flow.unittest.TestCase):
    def test_gather(test_case):
Rename this def; it is not testing gather.
):

    x = np.random.randn(batch_size, num_heads, seq_length, seq_length)
    mask = np.random.randint(1, 2, size=(batch_size, num_heads, seq_length, seq_length), dtype=np.uint8)
Shouldn't this be randint(0, 1)?
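Since `np.random.randint(low, high)` samples from the half-open interval [low, high), the exact bounds matter here; a quick demonstration of the three variants under discussion:

```python
import numpy as np

# randint samples integers in [low, high), so:
a = np.random.randint(1, 2, size=(4, 4), dtype=np.uint8)   # always 1 (mask keeps everything)
b = np.random.randint(0, 1, size=(4, 4), dtype=np.uint8)   # always 0 (mask drops everything)
c = np.random.randint(0, 2, size=(1000,), dtype=np.uint8)  # mix of 0s and 1s
```

So a mixed 0/1 mask requires `randint(0, 2)`; both `randint(1, 2)` and `randint(0, 1)` produce a constant mask.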
@flow.unittest.skip_unless_1n1d()
@unittest.skipIf(os.getenv("ONEFLOW_TEST_CPU_ONLY"), "only test gpu cases")
class TestFusedScaleMaskSoftmaxDropout(flow.unittest.TestCase):
    def test_gather(test_case):
Rename this def as well.
CI failed, removing label automerge
Add some fused kernels used in Transformer: