[WIP] [CodeGen] Fix fp16 -> fp32 codegen on X86-64 #2925
@tqchen, @yidawang, @yzhliu, this may be of interest to you folks. One question: is it possible to add unit tests that only execute when the underlying machine supports particular architectural features? It would be convenient to add a unit test for this that runs only on machines with AVX or AVX-512, but I don't know how to express that in the current testing infrastructure.
@ajtulloch, cpuinfo can detect AVX2 or AVX-512: https://github.com/pytorch/cpuinfo/blob/40c5f3695b053e5c3d642d9bc34113f3baa71ef2/include/cpuinfo.h#L1009
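One way to express "run this test only on hosts with a given feature" is sketched below. It is an illustration, not part of this PR: it assumes a Linux host that exposes `/proc/cpuinfo`, and the helper names (`parse_cpu_flags`, `host_cpu_flags`, `requires_cpu_flags`) are hypothetical. It uses only the standard library instead of the cpuinfo project linked above.

```python
import unittest


def parse_cpu_flags(cpuinfo_text):
    """Return the flag names from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def host_cpu_flags(path="/proc/cpuinfo"):
    """Read the host's CPU flags; an empty set means 'unknown' (e.g. non-Linux)."""
    try:
        with open(path) as f:
            return parse_cpu_flags(f.read())
    except OSError:
        return set()


def requires_cpu_flags(*needed):
    """Decorator: skip the test unless every flag in `needed` is present."""
    missing = sorted(set(needed) - host_cpu_flags())
    reason = "host lacks CPU flags: %s" % ", ".join(missing)
    return unittest.skipIf(bool(missing), reason)


@requires_cpu_flags("avx2", "f16c")
def test_fp16_to_fp32_on_avx2_host():
    # Placeholder body: a real test would build and run the kernel here.
    pass
```

Hosts where the flags cannot be read are treated as lacking the feature, so the test is skipped there instead of failing.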
LGTM in general. Just have some questions, mostly for my own educational purpose :)
src/codegen/llvm/codegen_x86_64.cc
Outdated
@@ -0,0 +1,68 @@
/*!
 * Copyright (c) 2017 by Contributors
how time flies
src/codegen/llvm/codegen_x86_64.cc
Outdated
llvm::Value* CodeGenX86_64::VisitExpr_(const Cast* op) {
  // LLVM does not automatically generate the correct instruction sequences for
  // half -> float conversion (using AVX2/AVX512 variants of vcvtph2ps).
AVX-512
src/codegen/llvm/codegen_x86_64.cc
Outdated
target_machine_->getTargetFeatureString().find("avx512f") != llvm::StringRef::npos;

// TODO(tulloch): implement version generic over lanes.
if (from.lanes() == 8 && (has_f16c || has_avx512f)) {
I don't get the logic here. If lanes==8, why could has_avx512f be true?
lanes == 8 is a property of the Expr we are compiling; has_avx512f is a property of the TargetMachine we are generating code for. The two are independent.
src/codegen/llvm/codegen_x86_64.cc
Outdated
}

// TODO(tulloch): implement version generic over lanes.
if (from.lanes() == 16 && has_avx512f) {
Does has_avx512f==true imply has_f16c==true? And, is there a case that lanes==16 but has_avx512f==false? I am not familiar with the F16C instruction set.
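The two guarded branches quoted in this review can be restated as a small decision function. This is only a sketch for reading the conditions: the function name and the returned labels are hypothetical. The fall-through label is an assumption, since the snippets shown here do not include what the code does when neither branch matches.

```python
def pick_half_to_float_lowering(lanes, has_f16c, has_avx512f):
    """Mirror the two `if` conditions from the diff snippets above.

    `lanes` describes the expression being compiled; the two booleans
    describe the target machine, so all combinations can occur.
    """
    if lanes == 8 and (has_f16c or has_avx512f):
        return "vcvtph2ps, 8 lanes"
    if lanes == 16 and has_avx512f:
        return "vcvtph2ps, 16 lanes"
    # Assumed fall-through: whatever the generic code path does.
    return "generic lowering"


# A 16-lane cast compiled for a target without AVX-512 takes neither branch.
print(pick_half_to_float_lowering(16, has_f16c=True, has_avx512f=False))
```

Written this way, it is easy to see that a 16-lane expression can reach the second check with `has_avx512f` false. It then falls through to the generic path.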
import ctypes

def test_fp16_to_fp32_with_f16c():
    target = 'llvm -mcpu=core-avx2 -mattr=+f16c'
So the mattr flag needs to be set by the users?
I've fixed this to query the TargetMachine directly, so no changes are needed (assuming users are using the existing -mcpu=core-avx2, -mcpu=skylake-avx512, etc)
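For readers unfamiliar with the feature strings being queried: LLVM writes them as a comma-separated list in which each entry is a feature name prefixed by `+` (enabled) or `-` (disabled). The helper below is a hypothetical illustration of reading such a string entry by entry. It is not the check used in this PR, and it assumes that later entries override earlier ones.

```python
def enabled_features(feature_string):
    """Return the set of features left enabled by a '+a,-b,+c' style string."""
    enabled = set()
    for item in feature_string.split(","):
        item = item.strip()
        if item.startswith("+"):
            enabled.add(item[1:])
        elif item.startswith("-"):
            # Assumption: a later '-name' cancels an earlier '+name'.
            enabled.discard(item[1:])
    return enabled


features = enabled_features("+avx2,+f16c,-avx512f")
print("f16c" in features, "avx512f" in features)
```

A plain substring search behaves differently from this token-based lookup: the text `avx512f` also occurs inside a disabled entry such as `-avx512f`, which a substring search would count as a match.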
target = 'llvm'
elements = 64
n = tvm.convert(elements)
A = tvm.placeholder((n, 8), dtype="float16", name='A')
How is fp16 handled without F16C?
Without f16c (i.e. vector/scalar vcvtph2ps), we invoke compiler-rt's builtins i.e. https://github.com/llvm-mirror/compiler-rt/blob/7a739a0dfb6d408c6d587e5c7b52abd89fc3fdd3/lib/builtins/fp_extend_impl.inc#L40.
https://godbolt.org/z/23tsVK is an example.
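To give a feel for what a software fallback has to do, here is a minimal sketch of decoding an IEEE 754 half-precision bit pattern by hand. It is not compiler-rt's implementation, and the function name is hypothetical. It returns a Python float (a double), which can represent every half-precision value exactly.

```python
import struct


def half_bits_to_float(h):
    """Decode a 16-bit IEEE 754 binary16 bit pattern into a Python float."""
    sign = -1.0 if (h >> 15) & 1 else 1.0
    exponent = (h >> 10) & 0x1F
    fraction = h & 0x3FF
    if exponent == 0:
        # Zero or subnormal: no implicit leading 1, fixed exponent of -14.
        return sign * fraction * 2.0 ** -24
    if exponent == 0x1F:
        # All-ones exponent encodes infinity (fraction == 0) or NaN.
        return sign * float("inf") if fraction == 0 else float("nan")
    # Normal number: implicit leading 1, exponent bias of 15.
    return sign * (1.0 + fraction / 1024.0) * 2.0 ** (exponent - 15)


# Cross-check one value against the standard library's half-precision codec.
bits = 0x3555
reference = struct.unpack("<e", struct.pack("<H", bits))[0]
assert half_bits_to_float(bits) == reference
```

The branching on zero, subnormal, infinity and NaN cases is the per-element work that a single hardware conversion instruction such as vcvtph2ps avoids.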
50798e4
to
c5f5719
Compare
The problem with that is that this only works if we are generating code for the current host architecture, which isn't what we want. I think the right way here is to continue to pass
ce5bd60
to
e88c1c8
Compare
215ae63
to
a842f1b
Compare
Thanks @ajtulloch, @yidawang, @hlu1, this is now merged.
lint lint save save add more case save error lint lint commit do lint save fix lint wrap it back as func lint save remove dead comment fix style fix lint Update src/relay/pass/partial_eval.cc Co-Authored-By: MarisaKirisame <lolisa@marisa.moe> Update src/relay/pass/partial_eval.cc Co-Authored-By: MarisaKirisame <lolisa@marisa.moe> Update src/relay/pass/partial_eval.cc Co-Authored-By: MarisaKirisame <lolisa@marisa.moe> Update src/relay/pass/partial_eval.cc Co-Authored-By: MarisaKirisame <lolisa@marisa.moe> Update src/relay/pass/partial_eval.cc Co-Authored-By: MarisaKirisame <lolisa@marisa.moe> Update src/relay/pass/partial_eval.cc Co-Authored-By: MarisaKirisame <lolisa@marisa.moe> address review feedback pe now handle freevar. as a result preserving function is now trivial. test add basic test, implement pretty printing for generic function test lint fix segfault save save do test fix another error address comment commit save address review feedback add test for invalidate, fix error in lookup rename cont to boduy fix error and add regression test Update src/relay/pass/partial_eval.cc Co-Authored-By: MarisaKirisame <lolisa@marisa.moe> fix error, add test case fix lint remove extra line fix some error pe commit save save save save save (pe/dce broken) [DOCKER] Pin flatbuffers checkout to the last release tag (apache#2823). 
(apache#2879) [Relay][Text Format] Reverse CallNode Print Order (apache#2882) [NNPACK] Modernize test (apache#2868) [Relay] Add list update to prelude (apache#2866) Add missing sgx includes (apache#2878) Fix setting up hints for getaddrinfo (apache#2872) [ARITH] RewriteSimplifier: improved cmp simplification (apache#2851) do (apache#2883) [RELAY][Frontend][TF] decompile tf control flow (apache#2830) * decompile tf control flow * Add docs * remove import relay * move tests under tensorflow frontend * minor fix Enhance upsample operator to adapt onnx opset version 9 (apache#2840) Use version invariant rustfmt (apache#2886) [Relay][Op] Add group conv2d dispatch to topi function (apache#2870) * [Relay][Op] Add group conv2d dispatch to topi function * Rerun tests [Apps] [howto_deploy] fix cxx-flags order and build directory (apache#2888) fix prelu, now can use on 2d input and add one test (apache#2875) Add dense schedules to __init__ for cpu (apache#2855) * Add dense schedules to __init__ for cpu * Add documentation for topi::shape * Add additional imports to topi CPU __init__. [TESTS] Improve script robustness (apache#2893) A number of test scripts use the '|| exit 1' idiom. This has two issues, first process exit codes are defined to be in the range 0-255. Second, more importantly, the idiom is fragile because it requires that every possible failure point be explicitly coded. This patch removes the idiom in favour of "set -e" as used in the docker scripts as a more robust mechanism to ensure that script failures are always caught and propagated by default. [Relay] Fix name of bias in testing.mlp (apache#2892) winograd_nnpack (apache#2721) [Relay] Fix Relay ARM CPU depthwise spatial pack schedule alter op layout issue. (apache#2861) * Fix Relay ARM CPU spatial pack depthwise alter op layout issue. 
* Update tune_relay_arm.py [TESTS] Import script robustness (set -u) (apache#2896) Adopt the "set -u" idiom from the docker scripts as a mechanism to improve future robustness. [DOCKER] Upgrade ci-cpu to latest v0.50 (apache#2901) Allow linking against MKLML (apache#2902) [COMMUNITY] ASF mentors (apache#2906) [Relay] Allow converting keras.layers.Sequential (apache#2842) * Allow converting keras.layers.Sequential * Use existing new_var function * Only update expr when missing * Add test [Relay] clean up hd, change tl (apache#2917) Turn on USE_SORT by default (apache#2916) [TEST] Cache test data (apache#2921) Unified error handling in NNVM and Relay frontends (apache#2828) add support for mxnet smooth_l1 (apache#2905) [Relay] Add support for TupleGetItem in op fusion (apache#2914) [Relay, TOPI] Deformable conv2d (apache#2908) * [Relay, TOPI] Add deformable conv2d * Moved to op level2 * Fix lint * Moved to level2 & bug fix * Update comments * Disabled flaky test of conv2d TVM debugresult dump to Chrome Tracing (apache#2922) [Relay] add test for second order ad (apache#2754) * do second order * add comment * better name * use tvm assert all close * refire ci Revert "[Relay] add test for second order ad (apache#2754)" (apache#2926) This reverts commit f5ca991. [Tutorial] Cache the test data in tutorial (apache#2923) [AUTOTVM] Refactor measure build func (apache#2927) Fix intersect of modular set (apache#2904) Fix comment bugs and code style [Relay, OpFusion] Fix handling TupleGetItem for nested tuples (apache#2929) Consistent result of DetectLinearEquation() when an empy vars is passed (apache#2860) [FRONTEND][ONNX] Some bug fixes and Shape operator fixed for relay. (apache#2850) * [FRONTEND][ONNX] Some bug fixes and Shape operator fixed for relay. * * test cases * * ci error Outdated renaming for flatten in ONNX converter (apache#2843) [FRONTEND][TENSORFLOW] bug fix for tensorflow official slim models. 
(apache#2864) * [FRONTEND][TENSORFLOW] bug fix for tensorflow official slim models. * * review comments Fix vcvtph2ps codegen (apache#2925) Port changes More fixes save save Changes to schedules and mxnet importer
lint lint save save add more case save error lint lint commit do lint save fix lint wrap it back as func lint save remove dead comment fix style fix lint Update src/relay/pass/partial_eval.cc Co-Authored-By: MarisaKirisame <lolisa@marisa.moe> Update src/relay/pass/partial_eval.cc Co-Authored-By: MarisaKirisame <lolisa@marisa.moe> Update src/relay/pass/partial_eval.cc Co-Authored-By: MarisaKirisame <lolisa@marisa.moe> Update src/relay/pass/partial_eval.cc Co-Authored-By: MarisaKirisame <lolisa@marisa.moe> Update src/relay/pass/partial_eval.cc Co-Authored-By: MarisaKirisame <lolisa@marisa.moe> Update src/relay/pass/partial_eval.cc Co-Authored-By: MarisaKirisame <lolisa@marisa.moe> address review feedback pe now handle freevar. as a result preserving function is now trivial. test add basic test, implement pretty printing for generic function test lint fix segfault save save do test fix another error address comment commit save address review feedback add test for invalidate, fix error in lookup rename cont to boduy fix error and add regression test Update src/relay/pass/partial_eval.cc Co-Authored-By: MarisaKirisame <lolisa@marisa.moe> fix error, add test case fix lint remove extra line fix some error pe commit save save save save save (pe/dce broken) [DOCKER] Pin flatbuffers checkout to the last release tag (apache#2823). 
(apache#2879) [Relay][Text Format] Reverse CallNode Print Order (apache#2882) [NNPACK] Modernize test (apache#2868) [Relay] Add list update to prelude (apache#2866) Add missing sgx includes (apache#2878) Fix setting up hints for getaddrinfo (apache#2872) [ARITH] RewriteSimplifier: improved cmp simplification (apache#2851) do (apache#2883) [RELAY][Frontend][TF] decompile tf control flow (apache#2830) * decompile tf control flow * Add docs * remove import relay * move tests under tensorflow frontend * minor fix Enhance upsample operator to adapt onnx opset version 9 (apache#2840) Use version invariant rustfmt (apache#2886) [Relay][Op] Add group conv2d dispatch to topi function (apache#2870) * [Relay][Op] Add group conv2d dispatch to topi function * Rerun tests [Apps] [howto_deploy] fix cxx-flags order and build directory (apache#2888) fix prelu, now can use on 2d input and add one test (apache#2875) Add dense schedules to __init__ for cpu (apache#2855) * Add dense schedules to __init__ for cpu * Add documentation for topi::shape * Add additional imports to topi CPU __init__. [TESTS] Improve script robustness (apache#2893) A number of test scripts use the '|| exit 1' idiom. This has two issues, first process exit codes are defined to be in the range 0-255. Second, more importantly, the idiom is fragile because it requires that every possible failure point be explicitly coded. This patch removes the idiom in favour of "set -e" as used in the docker scripts as a more robust mechanism to ensure that script failures are always caught and propagated by default. [Relay] Fix name of bias in testing.mlp (apache#2892) winograd_nnpack (apache#2721) [Relay] Fix Relay ARM CPU depthwise spatial pack schedule alter op layout issue. (apache#2861) * Fix Relay ARM CPU spatial pack depthwise alter op layout issue. 
* Update tune_relay_arm.py [TESTS] Import script robustness (set -u) (apache#2896) Adopt the "set -u" idiom from the docker scripts as a mechanism to improve future robustness. [DOCKER] Upgrade ci-cpu to latest v0.50 (apache#2901) Allow linking against MKLML (apache#2902) [COMMUNITY] ASF mentors (apache#2906) [Relay] Allow converting keras.layers.Sequential (apache#2842) * Allow converting keras.layers.Sequential * Use existing new_var function * Only update expr when missing * Add test [Relay] clean up hd, change tl (apache#2917) Turn on USE_SORT by default (apache#2916) [TEST] Cache test data (apache#2921) Unified error handling in NNVM and Relay frontends (apache#2828) add support for mxnet smooth_l1 (apache#2905) [Relay] Add support for TupleGetItem in op fusion (apache#2914) [Relay, TOPI] Deformable conv2d (apache#2908) * [Relay, TOPI] Add deformable conv2d * Moved to op level2 * Fix lint * Moved to level2 & bug fix * Update comments * Disabled flaky test of conv2d TVM debugresult dump to Chrome Tracing (apache#2922) [Relay] add test for second order ad (apache#2754) * do second order * add comment * better name * use tvm assert all close * refire ci Revert "[Relay] add test for second order ad (apache#2754)" (apache#2926) This reverts commit f5ca991. [Tutorial] Cache the test data in tutorial (apache#2923) [AUTOTVM] Refactor measure build func (apache#2927) Fix intersect of modular set (apache#2904) Fix comment bugs and code style [Relay, OpFusion] Fix handling TupleGetItem for nested tuples (apache#2929) Consistent result of DetectLinearEquation() when an empy vars is passed (apache#2860) [FRONTEND][ONNX] Some bug fixes and Shape operator fixed for relay. (apache#2850) * [FRONTEND][ONNX] Some bug fixes and Shape operator fixed for relay. * * test cases * * ci error Outdated renaming for flatten in ONNX converter (apache#2843) [FRONTEND][TENSORFLOW] bug fix for tensorflow official slim models. 
(apache#2864) * [FRONTEND][TENSORFLOW] bug fix for tensorflow official slim models. * * review comments Fix vcvtph2ps codegen (apache#2925) Port changes More fixes save save Changes to schedules and mxnet importer
lint lint save save add more case save error lint lint commit do lint save fix lint wrap it back as func lint save remove dead comment fix style fix lint Update src/relay/pass/partial_eval.cc Co-Authored-By: MarisaKirisame <lolisa@marisa.moe> Update src/relay/pass/partial_eval.cc Co-Authored-By: MarisaKirisame <lolisa@marisa.moe> Update src/relay/pass/partial_eval.cc Co-Authored-By: MarisaKirisame <lolisa@marisa.moe> Update src/relay/pass/partial_eval.cc Co-Authored-By: MarisaKirisame <lolisa@marisa.moe> Update src/relay/pass/partial_eval.cc Co-Authored-By: MarisaKirisame <lolisa@marisa.moe> Update src/relay/pass/partial_eval.cc Co-Authored-By: MarisaKirisame <lolisa@marisa.moe> address review feedback pe now handle freevar. as a result preserving function is now trivial. test add basic test, implement pretty printing for generic function test lint fix segfault save save do test fix another error address comment commit save address review feedback add test for invalidate, fix error in lookup rename cont to boduy fix error and add regression test Update src/relay/pass/partial_eval.cc Co-Authored-By: MarisaKirisame <lolisa@marisa.moe> fix error, add test case fix lint remove extra line fix some error pe commit save save save save save (pe/dce broken) [DOCKER] Pin flatbuffers checkout to the last release tag (apache#2823). 
(apache#2879) [Relay][Text Format] Reverse CallNode Print Order (apache#2882) [NNPACK] Modernize test (apache#2868) [Relay] Add list update to prelude (apache#2866) Add missing sgx includes (apache#2878) Fix setting up hints for getaddrinfo (apache#2872) [ARITH] RewriteSimplifier: improved cmp simplification (apache#2851) do (apache#2883) [RELAY][Frontend][TF] decompile tf control flow (apache#2830) * decompile tf control flow * Add docs * remove import relay * move tests under tensorflow frontend * minor fix Enhance upsample operator to adapt onnx opset version 9 (apache#2840) Use version invariant rustfmt (apache#2886) [Relay][Op] Add group conv2d dispatch to topi function (apache#2870) * [Relay][Op] Add group conv2d dispatch to topi function * Rerun tests [Apps] [howto_deploy] fix cxx-flags order and build directory (apache#2888) fix prelu, now can use on 2d input and add one test (apache#2875) Add dense schedules to __init__ for cpu (apache#2855) * Add dense schedules to __init__ for cpu * Add documentation for topi::shape * Add additional imports to topi CPU __init__. [TESTS] Improve script robustness (apache#2893) A number of test scripts use the '|| exit 1' idiom. This has two issues, first process exit codes are defined to be in the range 0-255. Second, more importantly, the idiom is fragile because it requires that every possible failure point be explicitly coded. This patch removes the idiom in favour of "set -e" as used in the docker scripts as a more robust mechanism to ensure that script failures are always caught and propagated by default. [Relay] Fix name of bias in testing.mlp (apache#2892) winograd_nnpack (apache#2721) [Relay] Fix Relay ARM CPU depthwise spatial pack schedule alter op layout issue. (apache#2861) * Fix Relay ARM CPU spatial pack depthwise alter op layout issue. 
* Update tune_relay_arm.py [TESTS] Import script robustness (set -u) (apache#2896) Adopt the "set -u" idiom from the docker scripts as a mechanism to improve future robustness. [DOCKER] Upgrade ci-cpu to latest v0.50 (apache#2901) Allow linking against MKLML (apache#2902) [COMMUNITY] ASF mentors (apache#2906) [Relay] Allow converting keras.layers.Sequential (apache#2842) * Allow converting keras.layers.Sequential * Use existing new_var function * Only update expr when missing * Add test [Relay] clean up hd, change tl (apache#2917) Turn on USE_SORT by default (apache#2916) [TEST] Cache test data (apache#2921) Unified error handling in NNVM and Relay frontends (apache#2828) add support for mxnet smooth_l1 (apache#2905) [Relay] Add support for TupleGetItem in op fusion (apache#2914) [Relay, TOPI] Deformable conv2d (apache#2908) * [Relay, TOPI] Add deformable conv2d * Moved to op level2 * Fix lint * Moved to level2 & bug fix * Update comments * Disabled flaky test of conv2d TVM debugresult dump to Chrome Tracing (apache#2922) [Relay] add test for second order ad (apache#2754) * do second order * add comment * better name * use tvm assert all close * refire ci Revert "[Relay] add test for second order ad (apache#2754)" (apache#2926) This reverts commit f5ca991. [Tutorial] Cache the test data in tutorial (apache#2923) [AUTOTVM] Refactor measure build func (apache#2927) Fix intersect of modular set (apache#2904) Fix comment bugs and code style [Relay, OpFusion] Fix handling TupleGetItem for nested tuples (apache#2929) Consistent result of DetectLinearEquation() when an empy vars is passed (apache#2860) [FRONTEND][ONNX] Some bug fixes and Shape operator fixed for relay. (apache#2850) * [FRONTEND][ONNX] Some bug fixes and Shape operator fixed for relay. * * test cases * * ci error Outdated renaming for flatten in ONNX converter (apache#2843) [FRONTEND][TENSORFLOW] bug fix for tensorflow official slim models. 
(apache#2864) * [FRONTEND][TENSORFLOW] bug fix for tensorflow official slim models. * * review comments Fix vcvtph2ps codegen (apache#2925) Port changes More fixes save save Changes to schedules and mxnet importer save save save save save
lint lint save save add more case save error lint lint commit do lint save fix lint wrap it back as func lint save remove dead comment fix style fix lint Update src/relay/pass/partial_eval.cc Co-Authored-By: MarisaKirisame <lolisa@marisa.moe> Update src/relay/pass/partial_eval.cc Co-Authored-By: MarisaKirisame <lolisa@marisa.moe> Update src/relay/pass/partial_eval.cc Co-Authored-By: MarisaKirisame <lolisa@marisa.moe> Update src/relay/pass/partial_eval.cc Co-Authored-By: MarisaKirisame <lolisa@marisa.moe> Update src/relay/pass/partial_eval.cc Co-Authored-By: MarisaKirisame <lolisa@marisa.moe> Update src/relay/pass/partial_eval.cc Co-Authored-By: MarisaKirisame <lolisa@marisa.moe> address review feedback pe now handle freevar. as a result preserving function is now trivial. test add basic test, implement pretty printing for generic function test lint fix segfault save save do test fix another error address comment commit save address review feedback add test for invalidate, fix error in lookup rename cont to boduy fix error and add regression test Update src/relay/pass/partial_eval.cc Co-Authored-By: MarisaKirisame <lolisa@marisa.moe> fix error, add test case fix lint remove extra line fix some error pe commit save save save save save (pe/dce broken) [DOCKER] Pin flatbuffers checkout to the last release tag (apache#2823). 
(apache#2879) [Relay][Text Format] Reverse CallNode Print Order (apache#2882) [NNPACK] Modernize test (apache#2868) [Relay] Add list update to prelude (apache#2866) Add missing sgx includes (apache#2878) Fix setting up hints for getaddrinfo (apache#2872) [ARITH] RewriteSimplifier: improved cmp simplification (apache#2851) do (apache#2883) [RELAY][Frontend][TF] decompile tf control flow (apache#2830) * decompile tf control flow * Add docs * remove import relay * move tests under tensorflow frontend * minor fix Enhance upsample operator to adapt onnx opset version 9 (apache#2840) Use version invariant rustfmt (apache#2886) [Relay][Op] Add group conv2d dispatch to topi function (apache#2870) * [Relay][Op] Add group conv2d dispatch to topi function * Rerun tests [Apps] [howto_deploy] fix cxx-flags order and build directory (apache#2888) fix prelu, now can use on 2d input and add one test (apache#2875) Add dense schedules to __init__ for cpu (apache#2855) * Add dense schedules to __init__ for cpu * Add documentation for topi::shape * Add additional imports to topi CPU __init__. [TESTS] Improve script robustness (apache#2893) A number of test scripts use the '|| exit 1' idiom. This has two issues, first process exit codes are defined to be in the range 0-255. Second, more importantly, the idiom is fragile because it requires that every possible failure point be explicitly coded. This patch removes the idiom in favour of "set -e" as used in the docker scripts as a more robust mechanism to ensure that script failures are always caught and propagated by default. [Relay] Fix name of bias in testing.mlp (apache#2892) winograd_nnpack (apache#2721) [Relay] Fix Relay ARM CPU depthwise spatial pack schedule alter op layout issue. (apache#2861) * Fix Relay ARM CPU spatial pack depthwise alter op layout issue. 
* Update tune_relay_arm.py [TESTS] Import script robustness (set -u) (apache#2896) Adopt the "set -u" idiom from the docker scripts as a mechanism to improve future robustness. [DOCKER] Upgrade ci-cpu to latest v0.50 (apache#2901) Allow linking against MKLML (apache#2902) [COMMUNITY] ASF mentors (apache#2906) [Relay] Allow converting keras.layers.Sequential (apache#2842) * Allow converting keras.layers.Sequential * Use existing new_var function * Only update expr when missing * Add test [Relay] clean up hd, change tl (apache#2917) Turn on USE_SORT by default (apache#2916) [TEST] Cache test data (apache#2921) Unified error handling in NNVM and Relay frontends (apache#2828) add support for mxnet smooth_l1 (apache#2905) [Relay] Add support for TupleGetItem in op fusion (apache#2914) [Relay, TOPI] Deformable conv2d (apache#2908) * [Relay, TOPI] Add deformable conv2d * Moved to op level2 * Fix lint * Moved to level2 & bug fix * Update comments * Disabled flaky test of conv2d TVM debugresult dump to Chrome Tracing (apache#2922) [Relay] add test for second order ad (apache#2754) * do second order * add comment * better name * use tvm assert all close * refire ci Revert "[Relay] add test for second order ad (apache#2754)" (apache#2926) This reverts commit f5ca991. [Tutorial] Cache the test data in tutorial (apache#2923) [AUTOTVM] Refactor measure build func (apache#2927) Fix intersect of modular set (apache#2904) Fix comment bugs and code style [Relay, OpFusion] Fix handling TupleGetItem for nested tuples (apache#2929) Consistent result of DetectLinearEquation() when an empy vars is passed (apache#2860) [FRONTEND][ONNX] Some bug fixes and Shape operator fixed for relay. (apache#2850) * [FRONTEND][ONNX] Some bug fixes and Shape operator fixed for relay. * * test cases * * ci error Outdated renaming for flatten in ONNX converter (apache#2843) [FRONTEND][TENSORFLOW] bug fix for tensorflow official slim models. 
(apache#2864) * [FRONTEND][TENSORFLOW] bug fix for tensorflow official slim models. * * review comments Fix vcvtph2ps codegen (apache#2925) Port changes More fixes save save Changes to schedules and mxnet importer save save save save save remove remove
lint lint save save add more case save error lint lint commit do lint save fix lint wrap it back as func lint save remove dead comment fix style fix lint Update src/relay/pass/partial_eval.cc Co-Authored-By: MarisaKirisame <lolisa@marisa.moe> Update src/relay/pass/partial_eval.cc Co-Authored-By: MarisaKirisame <lolisa@marisa.moe> Update src/relay/pass/partial_eval.cc Co-Authored-By: MarisaKirisame <lolisa@marisa.moe> Update src/relay/pass/partial_eval.cc Co-Authored-By: MarisaKirisame <lolisa@marisa.moe> Update src/relay/pass/partial_eval.cc Co-Authored-By: MarisaKirisame <lolisa@marisa.moe> Update src/relay/pass/partial_eval.cc Co-Authored-By: MarisaKirisame <lolisa@marisa.moe> address review feedback pe now handle freevar. as a result preserving function is now trivial. test add basic test, implement pretty printing for generic function test lint fix segfault save save do test fix another error address comment commit save address review feedback add test for invalidate, fix error in lookup rename cont to boduy fix error and add regression test Update src/relay/pass/partial_eval.cc Co-Authored-By: MarisaKirisame <lolisa@marisa.moe> fix error, add test case fix lint remove extra line fix some error pe commit save save save save save (pe/dce broken) [DOCKER] Pin flatbuffers checkout to the last release tag (apache#2823). 
(apache#2879) [Relay][Text Format] Reverse CallNode Print Order (apache#2882) [NNPACK] Modernize test (apache#2868) [Relay] Add list update to prelude (apache#2866) Add missing sgx includes (apache#2878) Fix setting up hints for getaddrinfo (apache#2872) [ARITH] RewriteSimplifier: improved cmp simplification (apache#2851) do (apache#2883) [RELAY][Frontend][TF] decompile tf control flow (apache#2830) * decompile tf control flow * Add docs * remove import relay * move tests under tensorflow frontend * minor fix Enhance upsample operator to adapt onnx opset version 9 (apache#2840) Use version invariant rustfmt (apache#2886) [Relay][Op] Add group conv2d dispatch to topi function (apache#2870) * [Relay][Op] Add group conv2d dispatch to topi function * Rerun tests [Apps] [howto_deploy] fix cxx-flags order and build directory (apache#2888) fix prelu, now can use on 2d input and add one test (apache#2875) Add dense schedules to __init__ for cpu (apache#2855) * Add dense schedules to __init__ for cpu * Add documentation for topi::shape * Add additional imports to topi CPU __init__. [TESTS] Improve script robustness (apache#2893) A number of test scripts use the '|| exit 1' idiom. This has two issues, first process exit codes are defined to be in the range 0-255. Second, more importantly, the idiom is fragile because it requires that every possible failure point be explicitly coded. This patch removes the idiom in favour of "set -e" as used in the docker scripts as a more robust mechanism to ensure that script failures are always caught and propagated by default. [Relay] Fix name of bias in testing.mlp (apache#2892) winograd_nnpack (apache#2721) [Relay] Fix Relay ARM CPU depthwise spatial pack schedule alter op layout issue. (apache#2861) * Fix Relay ARM CPU spatial pack depthwise alter op layout issue. 
* Update tune_relay_arm.py [TESTS] Import script robustness (set -u) (apache#2896) Adopt the "set -u" idiom from the docker scripts as a mechanism to improve future robustness. [DOCKER] Upgrade ci-cpu to latest v0.50 (apache#2901) Allow linking against MKLML (apache#2902) [COMMUNITY] ASF mentors (apache#2906) [Relay] Allow converting keras.layers.Sequential (apache#2842) * Allow converting keras.layers.Sequential * Use existing new_var function * Only update expr when missing * Add test [Relay] clean up hd, change tl (apache#2917) Turn on USE_SORT by default (apache#2916) [TEST] Cache test data (apache#2921) Unified error handling in NNVM and Relay frontends (apache#2828) add support for mxnet smooth_l1 (apache#2905) [Relay] Add support for TupleGetItem in op fusion (apache#2914) [Relay, TOPI] Deformable conv2d (apache#2908) * [Relay, TOPI] Add deformable conv2d * Moved to op level2 * Fix lint * Moved to level2 & bug fix * Update comments * Disabled flaky test of conv2d TVM debugresult dump to Chrome Tracing (apache#2922) [Relay] add test for second order ad (apache#2754) * do second order * add comment * better name * use tvm assert all close * refire ci Revert "[Relay] add test for second order ad (apache#2754)" (apache#2926) This reverts commit f5ca991. [Tutorial] Cache the test data in tutorial (apache#2923) [AUTOTVM] Refactor measure build func (apache#2927) Fix intersect of modular set (apache#2904) Fix comment bugs and code style [Relay, OpFusion] Fix handling TupleGetItem for nested tuples (apache#2929) Consistent result of DetectLinearEquation() when an empy vars is passed (apache#2860) [FRONTEND][ONNX] Some bug fixes and Shape operator fixed for relay. (apache#2850) * [FRONTEND][ONNX] Some bug fixes and Shape operator fixed for relay. * * test cases * * ci error Outdated renaming for flatten in ONNX converter (apache#2843) [FRONTEND][TENSORFLOW] bug fix for tensorflow official slim models. 
(apache#2864) * [FRONTEND][TENSORFLOW] bug fix for tensorflow official slim models. * * review comments Fix vcvtph2ps codegen (apache#2925) Port changes More fixes save save Changes to schedules and mxnet importer save save save save save remove remove save
By default, LLVM fails to emit the AVX/AVX-512 variants of vcvtph2ps for half -> float conversion, which is inconvenient.
This adds a pattern match (which I'll generalize to arbitrary vector lengths) that dispatches to the AVX/AVX-512 intrinsic functions when they are available.
This speeds up this simple script by roughly 5x on my Haswell Core i7.
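For illustration only (this is not the benchmark script mentioned above), the dispatch rule can be sketched in Python roughly as follows. It is a simplified model, not the code in `codegen_x86_64.cc`: the helper name `select_cvtph2ps_intrinsic`, the feature strings `f16c` and `avx512f`, and the fixed lane counts of 8 and 16 are assumptions made for the sketch, and the intrinsic names are the ones LLVM exposed around the time of this PR.

```python
from typing import Optional, Set

# Assumed table: number of fp16 lanes -> (required CPU feature, LLVM intrinsic).
# Both the feature strings and the lane counts are assumptions of this sketch.
_CVTPH2PS_BY_LANES = {
    8: ("f16c", "llvm.x86.vcvtph2ps.256"),
    16: ("avx512f", "llvm.x86.avx512.mask.vcvtph2ps.512"),
}


def select_cvtph2ps_intrinsic(
    src_dtype: str, dst_dtype: str, lanes: int, cpu_features: Set[str]
) -> Optional[str]:
    """Return the intrinsic to call for a cast, or None to use the default lowering."""
    # Only fp16 -> fp32 casts are pattern matched.
    if (src_dtype, dst_dtype) != ("float16", "float32"):
        return None
    entry = _CVTPH2PS_BY_LANES.get(lanes)
    if entry is None:
        # Vector length not covered by the pattern match yet.
        return None
    required_feature, intrinsic = entry
    # Dispatch only when the target actually supports the instruction.
    return intrinsic if required_feature in cpu_features else None


if __name__ == "__main__":
    # A Haswell-like feature set: F16C present, AVX-512 absent.
    haswell = {"avx", "avx2", "f16c"}
    print(select_cvtph2ps_intrinsic("float16", "float32", 8, haswell))
    print(select_cvtph2ps_intrinsic("float16", "float32", 16, haswell))
```

Returning `None` stands for "leave the cast to LLVM's default lowering", which is what any unmatched dtype, lane count, or target would get in this model.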