[WIP] [CodeGen] Fix fp16 -> fp32 codegen on X86-64 #2925

ajtulloch · 2019-03-30T01:09:17Z

LLVM fails to generate calls to AVX/AVX512 variants of vcvtph2ps by default, which is inconvenient.

This adds a pattern matches (I'll generalize to arbitrary vector length) which dispatches to the AVX/AVX512 intrinsic functions when available.

This speeds this simple script up by approximately ~5x on my Haswell Core i7.

import tvm
import numpy as np
# Build a simple op that does 8-wide vectorized Float16 to Float32 conversion.

N = tvm.var("N")
V = 8
X = tvm.placeholder((N, V), name="X", dtype="float16")

def tvm_from_fp16(x):
    return x.astype("float32")

Y = tvm.compute(X.shape, lambda *i: tvm_from_fp16(X(*i)))

s = tvm.create_schedule(Y.op)
s[Y].vectorize(s[Y].op.axis[1])

target = tvm.target.create("llvm -mcpu=core-avx2")

with target:
    func = tvm.build(s, [X, Y])
    # print(func.get_source("asm"))
    # print(func.get_source("ll"))
    te = func.time_evaluator(func.entry_name, ctx=tvm.cpu(), min_repeat_ms=1000)
    x_np = tvm.nd.array(np.random.randn(10000, 8).astype(np.float16))
    y_np = tvm.nd.array(np.random.randn(10000, 8).astype(np.float32))
    print("Before: {:.2f}us".format(te(x_np, y_np).mean * 10 ** 6))
    np.testing.assert_equal(x_np.asnumpy().astype(np.float32), y_np.asnumpy())
    
target = tvm.target.create("llvm -mcpu=core-avx2 -mattr=+avx2,+f16c")

with target:
    func = tvm.build(s, [X, Y])
    # print(func.get_source("asm"))
    # print(func.get_source("ll"))
    te = func.time_evaluator(func.entry_name, ctx=tvm.cpu(), min_repeat_ms=1000)
    x_np = tvm.nd.array(np.random.randn(10000, 8).astype(np.float16))
    y_np = tvm.nd.array(np.random.randn(10000, 8).astype(np.float32))
    print("After: {:.2f}us".format(te(x_np, y_np).mean * 10 ** 6))
    np.testing.assert_equal(x_np.asnumpy().astype(np.float32), y_np.asnumpy())


# Before: 74.09us
# After: 14.90us

ajtulloch · 2019-03-30T01:11:05Z

@tqchen, @yidawang, @yzhliu, this may be of interest to you folks.

One question - is it possible to add unit tests that only execute when the underlying machine supports architectural features? It would be convenient to add a unit test for this that runs on machines with AVX or AVX-512, but I don't know how to express that in the current testing infrastructure.

hlu1 · 2019-03-30T03:23:37Z

@ajtulloch, cpu-info can detect AVX2 or AVX512: https://github.com/pytorch/cpuinfo/blob/40c5f3695b053e5c3d642d9bc34113f3baa71ef2/include/cpuinfo.h#L1009

yidawang

LGTM in general. Just have some questions, mostly for my own educational purpose :)

yidawang · 2019-03-30T06:26:17Z

src/codegen/llvm/codegen_x86_64.cc

@@ -0,0 +1,68 @@
+/*!
+ *  Copyright (c) 2017 by Contributors


how time flies

yidawang · 2019-03-30T06:26:27Z

src/codegen/llvm/codegen_x86_64.cc

+
+llvm::Value* CodeGenX86_64::VisitExpr_(const Cast* op) {
+  // LLVM does not automatically generate the correct instruction sequences for
+  // half -> float conversion (using AVX2/AVX512 variants of vcvtph2ps).


yidawang · 2019-03-30T06:29:12Z

src/codegen/llvm/codegen_x86_64.cc

+        target_machine_->getTargetFeatureString().find("avx512f") != llvm::StringRef::npos;
+
+    // TODO(tulloch): implement version generic over lanes.
+    if (from.lanes() == 8 && (has_f16c || has_avx512f)) {


I don't get the logic here. If lanes==8, why could has_avx512f be true?

lanes == 8 is a property of the Expr we are compiling, has_avx512f is a property of the TargetMachine we are generating code for.

yidawang · 2019-03-30T06:32:09Z

src/codegen/llvm/codegen_x86_64.cc

+    }
+
+    // TODO(tulloch): implement version generic over lanes.
+    if (from.lanes() == 16 && has_avx512f) {


Does has_avx512f==true imply has_f16c==true? And, is there a case that lanes==16 but has_avx512f==false? I am not familiar with the F16C instruction set.

yidawang · 2019-03-30T06:32:49Z

tests/python/unittest/test_codegen_x86.py

+import ctypes
+
+def test_fp16_to_fp32_with_f16c():
+    target = 'llvm -mcpu=core-avx2 -mattr=+f16c'


So the mattr flag needs to be set by the users?

I've fixed this to query the TargetMachine directly, so no changes are needed (assuming users are using the existing -mcpu=core-avx2, -mcpu=skylake-avx512, etc)

yidawang · 2019-03-30T06:37:36Z

tests/python/unittest/test_codegen_x86.py

+    target = 'llvm'
+    elements = 64
+    n = tvm.convert(elements)
+    A = tvm.placeholder((n, 8), dtype="float16", name='A')


How is fp16 handled without F16C?

Without f16c (i.e. vector/scalar vcvtph2ps), we invoke compiler-rt's builtins i.e. https://github.com/llvm-mirror/compiler-rt/blob/7a739a0dfb6d408c6d587e5c7b52abd89fc3fdd3/lib/builtins/fp_extend_impl.inc#L40.

https://godbolt.org/z/23tsVK is an example.

ajtulloch · 2019-03-31T01:40:02Z

@hlu1

@ajtulloch, cpu-info can detect AVX2 or AVX512: https://github.com/pytorch/cpuinfo/blob/40c5f3695b053e5c3d642d9bc34113f3baa71ef2/include/cpuinfo.h#L1009

The problem with that is that this only works if we are generating code for the current host architecture, which isn't what we want. I think the right way here is to continue to pass -mcpu (or -mattr) to LLVM when we want to generate architecture-specific code, and then we can query the constructed llvm::TargetMachine instance to see if it has the supplied feature (which is how LLVM chooses how to select instructions already, so we're guaranteed to be consistent). This is how Rust, etc implements this kind of thing (rust-lang/rust#31709). We can pass -mattr=-f16c,-avx512f, etc if we want to disable this optimization.

tqchen · 2019-03-31T16:31:05Z

Thanks @ajtulloch , @yidawang @hlu1 this is now merged

lint lint save save add more case save error lint lint commit do lint save fix lint wrap it back as func lint save remove dead comment fix style fix lint Update src/relay/pass/partial_eval.cc Co-Authored-By: MarisaKirisame <lolisa@marisa.moe> Update src/relay/pass/partial_eval.cc Co-Authored-By: MarisaKirisame <lolisa@marisa.moe> Update src/relay/pass/partial_eval.cc Co-Authored-By: MarisaKirisame <lolisa@marisa.moe> Update src/relay/pass/partial_eval.cc Co-Authored-By: MarisaKirisame <lolisa@marisa.moe> Update src/relay/pass/partial_eval.cc Co-Authored-By: MarisaKirisame <lolisa@marisa.moe> Update src/relay/pass/partial_eval.cc Co-Authored-By: MarisaKirisame <lolisa@marisa.moe> address review feedback pe now handle freevar. as a result preserving function is now trivial. test add basic test, implement pretty printing for generic function test lint fix segfault save save do test fix another error address comment commit save address review feedback add test for invalidate, fix error in lookup rename cont to boduy fix error and add regression test Update src/relay/pass/partial_eval.cc Co-Authored-By: MarisaKirisame <lolisa@marisa.moe> fix error, add test case fix lint remove extra line fix some error pe commit save save save save save (pe/dce broken) [DOCKER] Pin flatbuffers checkout to the last release tag (apache#2823). (apache#2879) [Relay][Text Format] Reverse CallNode Print Order (apache#2882) [NNPACK] Modernize test (apache#2868) [Relay] Add list update to prelude (apache#2866) Add missing sgx includes (apache#2878) Fix setting up hints for getaddrinfo (apache#2872) [ARITH] RewriteSimplifier: improved cmp simplification (apache#2851) do (apache#2883) [RELAY][Frontend][TF] decompile tf control flow (apache#2830) * decompile tf control flow * Add docs * remove import relay * move tests under tensorflow frontend * minor fix Enhance upsample operator to adapt onnx opset version 9 (apache#2840) Use version invariant rustfmt (apache#2886) [Relay][Op] Add group conv2d dispatch to topi function (apache#2870) * [Relay][Op] Add group conv2d dispatch to topi function * Rerun tests [Apps] [howto_deploy] fix cxx-flags order and build directory (apache#2888) fix prelu, now can use on 2d input and add one test (apache#2875) Add dense schedules to __init__ for cpu (apache#2855) * Add dense schedules to __init__ for cpu * Add documentation for topi::shape * Add additional imports to topi CPU __init__. [TESTS] Improve script robustness (apache#2893) A number of test scripts use the '|| exit 1' idiom. This has two issues, first process exit codes are defined to be in the range 0-255. Second, more importantly, the idiom is fragile because it requires that every possible failure point be explicitly coded. This patch removes the idiom in favour of "set -e" as used in the docker scripts as a more robust mechanism to ensure that script failures are always caught and propagated by default. [Relay] Fix name of bias in testing.mlp (apache#2892) winograd_nnpack (apache#2721) [Relay] Fix Relay ARM CPU depthwise spatial pack schedule alter op layout issue. (apache#2861) * Fix Relay ARM CPU spatial pack depthwise alter op layout issue. * Update tune_relay_arm.py [TESTS] Import script robustness (set -u) (apache#2896) Adopt the "set -u" idiom from the docker scripts as a mechanism to improve future robustness. [DOCKER] Upgrade ci-cpu to latest v0.50 (apache#2901) Allow linking against MKLML (apache#2902) [COMMUNITY] ASF mentors (apache#2906) [Relay] Allow converting keras.layers.Sequential (apache#2842) * Allow converting keras.layers.Sequential * Use existing new_var function * Only update expr when missing * Add test [Relay] clean up hd, change tl (apache#2917) Turn on USE_SORT by default (apache#2916) [TEST] Cache test data (apache#2921) Unified error handling in NNVM and Relay frontends (apache#2828) add support for mxnet smooth_l1 (apache#2905) [Relay] Add support for TupleGetItem in op fusion (apache#2914) [Relay, TOPI] Deformable conv2d (apache#2908) * [Relay, TOPI] Add deformable conv2d * Moved to op level2 * Fix lint * Moved to level2 & bug fix * Update comments * Disabled flaky test of conv2d TVM debugresult dump to Chrome Tracing (apache#2922) [Relay] add test for second order ad (apache#2754) * do second order * add comment * better name * use tvm assert all close * refire ci Revert "[Relay] add test for second order ad (apache#2754)" (apache#2926) This reverts commit f5ca991. [Tutorial] Cache the test data in tutorial (apache#2923) [AUTOTVM] Refactor measure build func (apache#2927) Fix intersect of modular set (apache#2904) Fix comment bugs and code style [Relay, OpFusion] Fix handling TupleGetItem for nested tuples (apache#2929) Consistent result of DetectLinearEquation() when an empy vars is passed (apache#2860) [FRONTEND][ONNX] Some bug fixes and Shape operator fixed for relay. (apache#2850) * [FRONTEND][ONNX] Some bug fixes and Shape operator fixed for relay. * * test cases * * ci error Outdated renaming for flatten in ONNX converter (apache#2843) [FRONTEND][TENSORFLOW] bug fix for tensorflow official slim models. (apache#2864) * [FRONTEND][TENSORFLOW] bug fix for tensorflow official slim models. * * review comments Fix vcvtph2ps codegen (apache#2925) Port changes More fixes save save Changes to schedules and mxnet importer

lint lint save save add more case save error lint lint commit do lint save fix lint wrap it back as func lint save remove dead comment fix style fix lint Update src/relay/pass/partial_eval.cc Co-Authored-By: MarisaKirisame <lolisa@marisa.moe> Update src/relay/pass/partial_eval.cc Co-Authored-By: MarisaKirisame <lolisa@marisa.moe> Update src/relay/pass/partial_eval.cc Co-Authored-By: MarisaKirisame <lolisa@marisa.moe> Update src/relay/pass/partial_eval.cc Co-Authored-By: MarisaKirisame <lolisa@marisa.moe> Update src/relay/pass/partial_eval.cc Co-Authored-By: MarisaKirisame <lolisa@marisa.moe> Update src/relay/pass/partial_eval.cc Co-Authored-By: MarisaKirisame <lolisa@marisa.moe> address review feedback pe now handle freevar. as a result preserving function is now trivial. test add basic test, implement pretty printing for generic function test lint fix segfault save save do test fix another error address comment commit save address review feedback add test for invalidate, fix error in lookup rename cont to boduy fix error and add regression test Update src/relay/pass/partial_eval.cc Co-Authored-By: MarisaKirisame <lolisa@marisa.moe> fix error, add test case fix lint remove extra line fix some error pe commit save save save save save (pe/dce broken) [DOCKER] Pin flatbuffers checkout to the last release tag (apache#2823). (apache#2879) [Relay][Text Format] Reverse CallNode Print Order (apache#2882) [NNPACK] Modernize test (apache#2868) [Relay] Add list update to prelude (apache#2866) Add missing sgx includes (apache#2878) Fix setting up hints for getaddrinfo (apache#2872) [ARITH] RewriteSimplifier: improved cmp simplification (apache#2851) do (apache#2883) [RELAY][Frontend][TF] decompile tf control flow (apache#2830) * decompile tf control flow * Add docs * remove import relay * move tests under tensorflow frontend * minor fix Enhance upsample operator to adapt onnx opset version 9 (apache#2840) Use version invariant rustfmt (apache#2886) [Relay][Op] Add group conv2d dispatch to topi function (apache#2870) * [Relay][Op] Add group conv2d dispatch to topi function * Rerun tests [Apps] [howto_deploy] fix cxx-flags order and build directory (apache#2888) fix prelu, now can use on 2d input and add one test (apache#2875) Add dense schedules to __init__ for cpu (apache#2855) * Add dense schedules to __init__ for cpu * Add documentation for topi::shape * Add additional imports to topi CPU __init__. [TESTS] Improve script robustness (apache#2893) A number of test scripts use the '|| exit 1' idiom. This has two issues, first process exit codes are defined to be in the range 0-255. Second, more importantly, the idiom is fragile because it requires that every possible failure point be explicitly coded. This patch removes the idiom in favour of "set -e" as used in the docker scripts as a more robust mechanism to ensure that script failures are always caught and propagated by default. [Relay] Fix name of bias in testing.mlp (apache#2892) winograd_nnpack (apache#2721) [Relay] Fix Relay ARM CPU depthwise spatial pack schedule alter op layout issue. (apache#2861) * Fix Relay ARM CPU spatial pack depthwise alter op layout issue. * Update tune_relay_arm.py [TESTS] Import script robustness (set -u) (apache#2896) Adopt the "set -u" idiom from the docker scripts as a mechanism to improve future robustness. [DOCKER] Upgrade ci-cpu to latest v0.50 (apache#2901) Allow linking against MKLML (apache#2902) [COMMUNITY] ASF mentors (apache#2906) [Relay] Allow converting keras.layers.Sequential (apache#2842) * Allow converting keras.layers.Sequential * Use existing new_var function * Only update expr when missing * Add test [Relay] clean up hd, change tl (apache#2917) Turn on USE_SORT by default (apache#2916) [TEST] Cache test data (apache#2921) Unified error handling in NNVM and Relay frontends (apache#2828) add support for mxnet smooth_l1 (apache#2905) [Relay] Add support for TupleGetItem in op fusion (apache#2914) [Relay, TOPI] Deformable conv2d (apache#2908) * [Relay, TOPI] Add deformable conv2d * Moved to op level2 * Fix lint * Moved to level2 & bug fix * Update comments * Disabled flaky test of conv2d TVM debugresult dump to Chrome Tracing (apache#2922) [Relay] add test for second order ad (apache#2754) * do second order * add comment * better name * use tvm assert all close * refire ci Revert "[Relay] add test for second order ad (apache#2754)" (apache#2926) This reverts commit f5ca991. [Tutorial] Cache the test data in tutorial (apache#2923) [AUTOTVM] Refactor measure build func (apache#2927) Fix intersect of modular set (apache#2904) Fix comment bugs and code style [Relay, OpFusion] Fix handling TupleGetItem for nested tuples (apache#2929) Consistent result of DetectLinearEquation() when an empy vars is passed (apache#2860) [FRONTEND][ONNX] Some bug fixes and Shape operator fixed for relay. (apache#2850) * [FRONTEND][ONNX] Some bug fixes and Shape operator fixed for relay. * * test cases * * ci error Outdated renaming for flatten in ONNX converter (apache#2843) [FRONTEND][TENSORFLOW] bug fix for tensorflow official slim models. (apache#2864) * [FRONTEND][TENSORFLOW] bug fix for tensorflow official slim models. * * review comments Fix vcvtph2ps codegen (apache#2925) Port changes More fixes save save Changes to schedules and mxnet importer save save save save save

lint lint save save add more case save error lint lint commit do lint save fix lint wrap it back as func lint save remove dead comment fix style fix lint Update src/relay/pass/partial_eval.cc Co-Authored-By: MarisaKirisame <lolisa@marisa.moe> Update src/relay/pass/partial_eval.cc Co-Authored-By: MarisaKirisame <lolisa@marisa.moe> Update src/relay/pass/partial_eval.cc Co-Authored-By: MarisaKirisame <lolisa@marisa.moe> Update src/relay/pass/partial_eval.cc Co-Authored-By: MarisaKirisame <lolisa@marisa.moe> Update src/relay/pass/partial_eval.cc Co-Authored-By: MarisaKirisame <lolisa@marisa.moe> Update src/relay/pass/partial_eval.cc Co-Authored-By: MarisaKirisame <lolisa@marisa.moe> address review feedback pe now handle freevar. as a result preserving function is now trivial. test add basic test, implement pretty printing for generic function test lint fix segfault save save do test fix another error address comment commit save address review feedback add test for invalidate, fix error in lookup rename cont to boduy fix error and add regression test Update src/relay/pass/partial_eval.cc Co-Authored-By: MarisaKirisame <lolisa@marisa.moe> fix error, add test case fix lint remove extra line fix some error pe commit save save save save save (pe/dce broken) [DOCKER] Pin flatbuffers checkout to the last release tag (apache#2823). (apache#2879) [Relay][Text Format] Reverse CallNode Print Order (apache#2882) [NNPACK] Modernize test (apache#2868) [Relay] Add list update to prelude (apache#2866) Add missing sgx includes (apache#2878) Fix setting up hints for getaddrinfo (apache#2872) [ARITH] RewriteSimplifier: improved cmp simplification (apache#2851) do (apache#2883) [RELAY][Frontend][TF] decompile tf control flow (apache#2830) * decompile tf control flow * Add docs * remove import relay * move tests under tensorflow frontend * minor fix Enhance upsample operator to adapt onnx opset version 9 (apache#2840) Use version invariant rustfmt (apache#2886) [Relay][Op] Add group conv2d dispatch to topi function (apache#2870) * [Relay][Op] Add group conv2d dispatch to topi function * Rerun tests [Apps] [howto_deploy] fix cxx-flags order and build directory (apache#2888) fix prelu, now can use on 2d input and add one test (apache#2875) Add dense schedules to __init__ for cpu (apache#2855) * Add dense schedules to __init__ for cpu * Add documentation for topi::shape * Add additional imports to topi CPU __init__. [TESTS] Improve script robustness (apache#2893) A number of test scripts use the '|| exit 1' idiom. This has two issues, first process exit codes are defined to be in the range 0-255. Second, more importantly, the idiom is fragile because it requires that every possible failure point be explicitly coded. This patch removes the idiom in favour of "set -e" as used in the docker scripts as a more robust mechanism to ensure that script failures are always caught and propagated by default. [Relay] Fix name of bias in testing.mlp (apache#2892) winograd_nnpack (apache#2721) [Relay] Fix Relay ARM CPU depthwise spatial pack schedule alter op layout issue. (apache#2861) * Fix Relay ARM CPU spatial pack depthwise alter op layout issue. * Update tune_relay_arm.py [TESTS] Import script robustness (set -u) (apache#2896) Adopt the "set -u" idiom from the docker scripts as a mechanism to improve future robustness. [DOCKER] Upgrade ci-cpu to latest v0.50 (apache#2901) Allow linking against MKLML (apache#2902) [COMMUNITY] ASF mentors (apache#2906) [Relay] Allow converting keras.layers.Sequential (apache#2842) * Allow converting keras.layers.Sequential * Use existing new_var function * Only update expr when missing * Add test [Relay] clean up hd, change tl (apache#2917) Turn on USE_SORT by default (apache#2916) [TEST] Cache test data (apache#2921) Unified error handling in NNVM and Relay frontends (apache#2828) add support for mxnet smooth_l1 (apache#2905) [Relay] Add support for TupleGetItem in op fusion (apache#2914) [Relay, TOPI] Deformable conv2d (apache#2908) * [Relay, TOPI] Add deformable conv2d * Moved to op level2 * Fix lint * Moved to level2 & bug fix * Update comments * Disabled flaky test of conv2d TVM debugresult dump to Chrome Tracing (apache#2922) [Relay] add test for second order ad (apache#2754) * do second order * add comment * better name * use tvm assert all close * refire ci Revert "[Relay] add test for second order ad (apache#2754)" (apache#2926) This reverts commit f5ca991. [Tutorial] Cache the test data in tutorial (apache#2923) [AUTOTVM] Refactor measure build func (apache#2927) Fix intersect of modular set (apache#2904) Fix comment bugs and code style [Relay, OpFusion] Fix handling TupleGetItem for nested tuples (apache#2929) Consistent result of DetectLinearEquation() when an empy vars is passed (apache#2860) [FRONTEND][ONNX] Some bug fixes and Shape operator fixed for relay. (apache#2850) * [FRONTEND][ONNX] Some bug fixes and Shape operator fixed for relay. * * test cases * * ci error Outdated renaming for flatten in ONNX converter (apache#2843) [FRONTEND][TENSORFLOW] bug fix for tensorflow official slim models. (apache#2864) * [FRONTEND][TENSORFLOW] bug fix for tensorflow official slim models. * * review comments Fix vcvtph2ps codegen (apache#2925) Port changes More fixes save save Changes to schedules and mxnet importer save save save save save remove remove

lint lint save save add more case save error lint lint commit do lint save fix lint wrap it back as func lint save remove dead comment fix style fix lint Update src/relay/pass/partial_eval.cc Co-Authored-By: MarisaKirisame <lolisa@marisa.moe> Update src/relay/pass/partial_eval.cc Co-Authored-By: MarisaKirisame <lolisa@marisa.moe> Update src/relay/pass/partial_eval.cc Co-Authored-By: MarisaKirisame <lolisa@marisa.moe> Update src/relay/pass/partial_eval.cc Co-Authored-By: MarisaKirisame <lolisa@marisa.moe> Update src/relay/pass/partial_eval.cc Co-Authored-By: MarisaKirisame <lolisa@marisa.moe> Update src/relay/pass/partial_eval.cc Co-Authored-By: MarisaKirisame <lolisa@marisa.moe> address review feedback pe now handle freevar. as a result preserving function is now trivial. test add basic test, implement pretty printing for generic function test lint fix segfault save save do test fix another error address comment commit save address review feedback add test for invalidate, fix error in lookup rename cont to boduy fix error and add regression test Update src/relay/pass/partial_eval.cc Co-Authored-By: MarisaKirisame <lolisa@marisa.moe> fix error, add test case fix lint remove extra line fix some error pe commit save save save save save (pe/dce broken) [DOCKER] Pin flatbuffers checkout to the last release tag (apache#2823). (apache#2879) [Relay][Text Format] Reverse CallNode Print Order (apache#2882) [NNPACK] Modernize test (apache#2868) [Relay] Add list update to prelude (apache#2866) Add missing sgx includes (apache#2878) Fix setting up hints for getaddrinfo (apache#2872) [ARITH] RewriteSimplifier: improved cmp simplification (apache#2851) do (apache#2883) [RELAY][Frontend][TF] decompile tf control flow (apache#2830) * decompile tf control flow * Add docs * remove import relay * move tests under tensorflow frontend * minor fix Enhance upsample operator to adapt onnx opset version 9 (apache#2840) Use version invariant rustfmt (apache#2886) [Relay][Op] Add group conv2d dispatch to topi function (apache#2870) * [Relay][Op] Add group conv2d dispatch to topi function * Rerun tests [Apps] [howto_deploy] fix cxx-flags order and build directory (apache#2888) fix prelu, now can use on 2d input and add one test (apache#2875) Add dense schedules to __init__ for cpu (apache#2855) * Add dense schedules to __init__ for cpu * Add documentation for topi::shape * Add additional imports to topi CPU __init__. [TESTS] Improve script robustness (apache#2893) A number of test scripts use the '|| exit 1' idiom. This has two issues, first process exit codes are defined to be in the range 0-255. Second, more importantly, the idiom is fragile because it requires that every possible failure point be explicitly coded. This patch removes the idiom in favour of "set -e" as used in the docker scripts as a more robust mechanism to ensure that script failures are always caught and propagated by default. [Relay] Fix name of bias in testing.mlp (apache#2892) winograd_nnpack (apache#2721) [Relay] Fix Relay ARM CPU depthwise spatial pack schedule alter op layout issue. (apache#2861) * Fix Relay ARM CPU spatial pack depthwise alter op layout issue. * Update tune_relay_arm.py [TESTS] Import script robustness (set -u) (apache#2896) Adopt the "set -u" idiom from the docker scripts as a mechanism to improve future robustness. [DOCKER] Upgrade ci-cpu to latest v0.50 (apache#2901) Allow linking against MKLML (apache#2902) [COMMUNITY] ASF mentors (apache#2906) [Relay] Allow converting keras.layers.Sequential (apache#2842) * Allow converting keras.layers.Sequential * Use existing new_var function * Only update expr when missing * Add test [Relay] clean up hd, change tl (apache#2917) Turn on USE_SORT by default (apache#2916) [TEST] Cache test data (apache#2921) Unified error handling in NNVM and Relay frontends (apache#2828) add support for mxnet smooth_l1 (apache#2905) [Relay] Add support for TupleGetItem in op fusion (apache#2914) [Relay, TOPI] Deformable conv2d (apache#2908) * [Relay, TOPI] Add deformable conv2d * Moved to op level2 * Fix lint * Moved to level2 & bug fix * Update comments * Disabled flaky test of conv2d TVM debugresult dump to Chrome Tracing (apache#2922) [Relay] add test for second order ad (apache#2754) * do second order * add comment * better name * use tvm assert all close * refire ci Revert "[Relay] add test for second order ad (apache#2754)" (apache#2926) This reverts commit f5ca991. [Tutorial] Cache the test data in tutorial (apache#2923) [AUTOTVM] Refactor measure build func (apache#2927) Fix intersect of modular set (apache#2904) Fix comment bugs and code style [Relay, OpFusion] Fix handling TupleGetItem for nested tuples (apache#2929) Consistent result of DetectLinearEquation() when an empy vars is passed (apache#2860) [FRONTEND][ONNX] Some bug fixes and Shape operator fixed for relay. (apache#2850) * [FRONTEND][ONNX] Some bug fixes and Shape operator fixed for relay. * * test cases * * ci error Outdated renaming for flatten in ONNX converter (apache#2843) [FRONTEND][TENSORFLOW] bug fix for tensorflow official slim models. (apache#2864) * [FRONTEND][TENSORFLOW] bug fix for tensorflow official slim models. * * review comments Fix vcvtph2ps codegen (apache#2925) Port changes More fixes save save Changes to schedules and mxnet importer save save save save save remove remove save

lint lint save save add more case save error lint lint commit do lint save fix lint wrap it back as func lint save remove dead comment fix style fix lint Update src/relay/pass/partial_eval.cc Co-Authored-By: MarisaKirisame <lolisa@marisa.moe> Update src/relay/pass/partial_eval.cc Co-Authored-By: MarisaKirisame <lolisa@marisa.moe> Update src/relay/pass/partial_eval.cc Co-Authored-By: MarisaKirisame <lolisa@marisa.moe> Update src/relay/pass/partial_eval.cc Co-Authored-By: MarisaKirisame <lolisa@marisa.moe> Update src/relay/pass/partial_eval.cc Co-Authored-By: MarisaKirisame <lolisa@marisa.moe> Update src/relay/pass/partial_eval.cc Co-Authored-By: MarisaKirisame <lolisa@marisa.moe> address review feedback pe now handle freevar. as a result preserving function is now trivial. test add basic test, implement pretty printing for generic function test lint fix segfault save save do test fix another error address comment commit save address review feedback add test for invalidate, fix error in lookup rename cont to boduy fix error and add regression test Update src/relay/pass/partial_eval.cc Co-Authored-By: MarisaKirisame <lolisa@marisa.moe> fix error, add test case fix lint remove extra line fix some error pe commit save save save save save (pe/dce broken) [DOCKER] Pin flatbuffers checkout to the last release tag (apache#2823). (apache#2879) [Relay][Text Format] Reverse CallNode Print Order (apache#2882) [NNPACK] Modernize test (apache#2868) [Relay] Add list update to prelude (apache#2866) Add missing sgx includes (apache#2878) Fix setting up hints for getaddrinfo (apache#2872) [ARITH] RewriteSimplifier: improved cmp simplification (apache#2851) do (apache#2883) [RELAY][Frontend][TF] decompile tf control flow (apache#2830) * decompile tf control flow * Add docs * remove import relay * move tests under tensorflow frontend * minor fix Enhance upsample operator to adapt onnx opset version 9 (apache#2840) Use version invariant rustfmt (apache#2886) [Relay][Op] Add group conv2d dispatch to topi function (apache#2870) * [Relay][Op] Add group conv2d dispatch to topi function * Rerun tests [Apps] [howto_deploy] fix cxx-flags order and build directory (apache#2888) fix prelu, now can use on 2d input and add one test (apache#2875) Add dense schedules to __init__ for cpu (apache#2855) * Add dense schedules to __init__ for cpu * Add documentation for topi::shape * Add additional imports to topi CPU __init__. [TESTS] Improve script robustness (apache#2893) A number of test scripts use the '|| exit 1' idiom. This has two issues, first process exit codes are defined to be in the range 0-255. Second, more importantly, the idiom is fragile because it requires that every possible failure point be explicitly coded. This patch removes the idiom in favour of "set -e" as used in the docker scripts as a more robust mechanism to ensure that script failures are always caught and propagated by default. [Relay] Fix name of bias in testing.mlp (apache#2892) winograd_nnpack (apache#2721) [Relay] Fix Relay ARM CPU depthwise spatial pack schedule alter op layout issue. (apache#2861) * Fix Relay ARM CPU spatial pack depthwise alter op layout issue. * Update tune_relay_arm.py [TESTS] Import script robustness (set -u) (apache#2896) Adopt the "set -u" idiom from the docker scripts as a mechanism to improve future robustness. [DOCKER] Upgrade ci-cpu to latest v0.50 (apache#2901) Allow linking against MKLML (apache#2902) [COMMUNITY] ASF mentors (apache#2906) [Relay] Allow converting keras.layers.Sequential (apache#2842) * Allow converting keras.layers.Sequential * Use existing new_var function * Only update expr when missing * Add test [Relay] clean up hd, change tl (apache#2917) Turn on USE_SORT by default (apache#2916) [TEST] Cache test data (apache#2921) Unified error handling in NNVM and Relay frontends (apache#2828) add support for mxnet smooth_l1 (apache#2905) [Relay] Add support for TupleGetItem in op fusion (apache#2914) [Relay, TOPI] Deformable conv2d (apache#2908) * [Relay, TOPI] Add deformable conv2d * Moved to op level2 * Fix lint * Moved to level2 & bug fix * Update comments * Disabled flaky test of conv2d TVM debugresult dump to Chrome Tracing (apache#2922) [Relay] add test for second order ad (apache#2754) * do second order * add comment * better name * use tvm assert all close * refire ci Revert "[Relay] add test for second order ad (apache#2754)" (apache#2926) This reverts commit f5ca991. [Tutorial] Cache the test data in tutorial (apache#2923) [AUTOTVM] Refactor measure build func (apache#2927) Fix intersect of modular set (apache#2904) Fix comment bugs and code style [Relay, OpFusion] Fix handling TupleGetItem for nested tuples (apache#2929) Consistent result of DetectLinearEquation() when an empy vars is passed (apache#2860) [FRONTEND][ONNX] Some bug fixes and Shape operator fixed for relay. (apache#2850) * [FRONTEND][ONNX] Some bug fixes and Shape operator fixed for relay. * * test cases * * ci error Outdated renaming for flatten in ONNX converter (apache#2843) [FRONTEND][TENSORFLOW] bug fix for tensorflow official slim models. (apache#2864) * [FRONTEND][TENSORFLOW] bug fix for tensorflow official slim models. * * review comments Fix vcvtph2ps codegen (apache#2925) Port changes More fixes save save Changes to schedules and mxnet importer save save save save save remove remove save save

lint lint save save add more case save error lint lint commit do lint save fix lint wrap it back as func lint save remove dead comment fix style fix lint Update src/relay/pass/partial_eval.cc Co-Authored-By: MarisaKirisame <lolisa@marisa.moe> Update src/relay/pass/partial_eval.cc Co-Authored-By: MarisaKirisame <lolisa@marisa.moe> Update src/relay/pass/partial_eval.cc Co-Authored-By: MarisaKirisame <lolisa@marisa.moe> Update src/relay/pass/partial_eval.cc Co-Authored-By: MarisaKirisame <lolisa@marisa.moe> Update src/relay/pass/partial_eval.cc Co-Authored-By: MarisaKirisame <lolisa@marisa.moe> Update src/relay/pass/partial_eval.cc Co-Authored-By: MarisaKirisame <lolisa@marisa.moe> address review feedback pe now handle freevar. as a result preserving function is now trivial. test add basic test, implement pretty printing for generic function test lint fix segfault save save do test fix another error address comment commit save address review feedback add test for invalidate, fix error in lookup rename cont to boduy fix error and add regression test Update src/relay/pass/partial_eval.cc Co-Authored-By: MarisaKirisame <lolisa@marisa.moe> fix error, add test case fix lint remove extra line fix some error pe commit save save save save save (pe/dce broken) [DOCKER] Pin flatbuffers checkout to the last release tag (apache#2823). (apache#2879) [Relay][Text Format] Reverse CallNode Print Order (apache#2882) [NNPACK] Modernize test (apache#2868) [Relay] Add list update to prelude (apache#2866) Add missing sgx includes (apache#2878) Fix setting up hints for getaddrinfo (apache#2872) [ARITH] RewriteSimplifier: improved cmp simplification (apache#2851) do (apache#2883) [RELAY][Frontend][TF] decompile tf control flow (apache#2830) * decompile tf control flow * Add docs * remove import relay * move tests under tensorflow frontend * minor fix Enhance upsample operator to adapt onnx opset version 9 (apache#2840) Use version invariant rustfmt (apache#2886) [Relay][Op] Add group conv2d dispatch to topi function (apache#2870) * [Relay][Op] Add group conv2d dispatch to topi function * Rerun tests [Apps] [howto_deploy] fix cxx-flags order and build directory (apache#2888) fix prelu, now can use on 2d input and add one test (apache#2875) Add dense schedules to __init__ for cpu (apache#2855) * Add dense schedules to __init__ for cpu * Add documentation for topi::shape * Add additional imports to topi CPU __init__. [TESTS] Improve script robustness (apache#2893) A number of test scripts use the '|| exit 1' idiom. This has two issues, first process exit codes are defined to be in the range 0-255. Second, more importantly, the idiom is fragile because it requires that every possible failure point be explicitly coded. This patch removes the idiom in favour of "set -e" as used in the docker scripts as a more robust mechanism to ensure that script failures are always caught and propagated by default. [Relay] Fix name of bias in testing.mlp (apache#2892) winograd_nnpack (apache#2721) [Relay] Fix Relay ARM CPU depthwise spatial pack schedule alter op layout issue. (apache#2861) * Fix Relay ARM CPU spatial pack depthwise alter op layout issue. * Update tune_relay_arm.py [TESTS] Import script robustness (set -u) (apache#2896) Adopt the "set -u" idiom from the docker scripts as a mechanism to improve future robustness. [DOCKER] Upgrade ci-cpu to latest v0.50 (apache#2901) Allow linking against MKLML (apache#2902) [COMMUNITY] ASF mentors (apache#2906) [Relay] Allow converting keras.layers.Sequential (apache#2842) * Allow converting keras.layers.Sequential * Use existing new_var function * Only update expr when missing * Add test [Relay] clean up hd, change tl (apache#2917) Turn on USE_SORT by default (apache#2916) [TEST] Cache test data (apache#2921) Unified error handling in NNVM and Relay frontends (apache#2828) add support for mxnet smooth_l1 (apache#2905) [Relay] Add support for TupleGetItem in op fusion (apache#2914) [Relay, TOPI] Deformable conv2d (apache#2908) * [Relay, TOPI] Add deformable conv2d * Moved to op level2 * Fix lint * Moved to level2 & bug fix * Update comments * Disabled flaky test of conv2d TVM debugresult dump to Chrome Tracing (apache#2922) [Relay] add test for second order ad (apache#2754) * do second order * add comment * better name * use tvm assert all close * refire ci Revert "[Relay] add test for second order ad (apache#2754)" (apache#2926) This reverts commit f5ca991. [Tutorial] Cache the test data in tutorial (apache#2923) [AUTOTVM] Refactor measure build func (apache#2927) Fix intersect of modular set (apache#2904) Fix comment bugs and code style [Relay, OpFusion] Fix handling TupleGetItem for nested tuples (apache#2929) Consistent result of DetectLinearEquation() when an empy vars is passed (apache#2860) [FRONTEND][ONNX] Some bug fixes and Shape operator fixed for relay. (apache#2850) * [FRONTEND][ONNX] Some bug fixes and Shape operator fixed for relay. * * test cases * * ci error Outdated renaming for flatten in ONNX converter (apache#2843) [FRONTEND][TENSORFLOW] bug fix for tensorflow official slim models. (apache#2864) * [FRONTEND][TENSORFLOW] bug fix for tensorflow official slim models. * * review comments Fix vcvtph2ps codegen (apache#2925) Port changes More fixes save save Changes to schedules and mxnet importer save save save save save remove remove save save revert

ajtulloch force-pushed the vcvtph2ps branch from 281c113 to d529109 Compare March 30, 2019 01:11

tqchen added the status: need review label Mar 30, 2019

yidawang reviewed Mar 30, 2019

View reviewed changes

ajtulloch force-pushed the vcvtph2ps branch 8 times, most recently from 50798e4 to c5f5719 Compare March 31, 2019 01:37

ajtulloch force-pushed the vcvtph2ps branch 5 times, most recently from ce5bd60 to e88c1c8 Compare March 31, 2019 02:52

yidawang approved these changes Mar 31, 2019

View reviewed changes

ajtulloch force-pushed the vcvtph2ps branch 3 times, most recently from 215ae63 to a842f1b Compare March 31, 2019 14:08

Fix vcvtph2ps codegen

337dc0a

ajtulloch force-pushed the vcvtph2ps branch from a842f1b to 337dc0a Compare March 31, 2019 14:14

tqchen approved these changes Mar 31, 2019

View reviewed changes

tqchen merged commit eb1ed11 into apache:master Mar 31, 2019

tqchen added status: accepted and removed status: need review labels Mar 31, 2019

wweic pushed a commit to wweic/tvm that referenced this pull request Apr 7, 2019

Fix vcvtph2ps codegen (apache#2925)

8b2237b

wweic pushed a commit to wweic/tvm that referenced this pull request Apr 7, 2019

Fix vcvtph2ps codegen (apache#2925)

290d5cf

wweic pushed a commit to wweic/tvm that referenced this pull request Apr 8, 2019

Fix vcvtph2ps codegen (apache#2925)

7cb30b8

MarisaKirisame pushed a commit to MarisaKirisame/tvm that referenced this pull request Apr 9, 2019

Fix vcvtph2ps codegen (apache#2925)

5f26896

MarisaKirisame mentioned this pull request Apr 9, 2019

Track MarisaKirisame/tvm#3

Closed

wweic pushed a commit to wweic/tvm that referenced this pull request Apr 10, 2019

Fix vcvtph2ps codegen (apache#2925)

189863f

wweic pushed a commit to neo-ai/tvm that referenced this pull request Apr 11, 2019

Fix vcvtph2ps codegen (apache#2925)

a102547

tqchen mentioned this pull request Nov 8, 2019

[RELEASE][DRAFT] TVM v0.6 Release candidate #4259

Closed

bertmaher mentioned this pull request Apr 14, 2021

Converting float16->bool causes an internal compiler error with llvm pytorch/pytorch#56090

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[WIP] [CodeGen] Fix fp16 -> fp32 codegen on X86-64 #2925

[WIP] [CodeGen] Fix fp16 -> fp32 codegen on X86-64 #2925

ajtulloch commented Mar 30, 2019 •

edited

Loading

ajtulloch commented Mar 30, 2019

hlu1 commented Mar 30, 2019

yidawang left a comment

yidawang Mar 30, 2019

yidawang Mar 30, 2019

yidawang Mar 30, 2019

ajtulloch Mar 30, 2019

yidawang Mar 30, 2019

yidawang Mar 30, 2019

ajtulloch Mar 30, 2019

yidawang Mar 30, 2019

ajtulloch Mar 30, 2019

ajtulloch commented Mar 31, 2019

tqchen commented Mar 31, 2019

[WIP] [CodeGen] Fix fp16 -> fp32 codegen on X86-64 #2925

[WIP] [CodeGen] Fix fp16 -> fp32 codegen on X86-64 #2925

Conversation

ajtulloch commented Mar 30, 2019 • edited Loading

ajtulloch commented Mar 30, 2019

hlu1 commented Mar 30, 2019

yidawang left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ajtulloch commented Mar 31, 2019

tqchen commented Mar 31, 2019

ajtulloch commented Mar 30, 2019 •

edited

Loading