[VTA][OpenCL] Cloud FPGA support #5842
Conversation
- add load acc_int8 in simulation
- remove copy op
- add vta schedule
- add always 32-bits
Thanks @zhanghaohit. I can generate the new VTA design on Pynq to verify that it still passes on the new ISA. I will let you know if it passes. Installation documentation would be highly welcome.
@zhanghaohit good news, the end-to-end tests ran successfully on VTA hardware with your changes. Given that this PR brings modifications to the ISA spec, we need to bump the HW_VER.
In addition, were you able to put a guide together? Which Intel FPGA have you tested this on?
Thanks @tmoreau89 for the help with the tests.
Sure. What guide are you referring to? The guide on HW_VER? We're using an Intel Arria 10.
@zhanghaohit I was referring to an install guide like the one shown here: https://tvm.apache.org/docs/vta/install.html The source file is found here: https://github.com/apache/incubator-tvm/blob/master/docs/vta/install.rst
Note that I've also uploaded the Pynq bitstream given the ISA changes here: https://github.com/uwsampl/vta-distro/tree/master/bitstreams/pynq/0.0.2 It would be great @liangfu if you had the bandwidth to synthesize the DE10 bitstream as well and submit a pull request. Let me know if you are tight on time and need some help. Finally, I will add an Ultra96 variant in there as well. It would be great @zhanghaohit if you could also pre-generate the Arria10 bitstream and add it to the vta-distro repository.
Thanks @tmoreau89 for the help. As for the Arria10 bitstream: because the FPGA device we are currently using is from another vendor (not Intel), it is not easy for others to use. We're trying to adapt to the FPGA devices on AWS, which are more standard. After that, we can upload the aocx/bitstream.
@tmoreau89 @vegaluisjose @huajsj @pasqoc
Thanks for the update @zhanghaohit; please rebase upon the latest master branch and resolve the conflicts.
@zhanghaohit thanks for the follow-up. I echo @liangfu's recommendation to rebase so the PR can be in a mergeable state. In addition, if the target is a custom FPGA board, I think we can leave it as a generic target for now. Having it ported to AWS F1 would be a great addition. I understand that having setup instructions at this time would not make the most sense.
@@ -0,0 +1,555 @@
// Copyright (C) 2013-2018 Altera Corporation, San Jose, California, USA. All rights reserved.
@tqchen will this copyright work with Apache licensing rules? This will end up in 3rd party.
@@ -186,6 +189,16 @@ def __init__(self,
             timeout=10, n_parallel=None,
             number=4, repeat=3, min_repeat_ms=0, cooldown_interval=0.1,
             check_correctness=False):
static_tune = os.getenv("TVM_STATIC_TUNE_EXPERIMENTAL") |
Can you elaborate on what TVM_STATIC_TUNE_EXPERIMENTAL is being used for? I'm a little wary of inserting additional execution modes in AutoTVM dictated by environment variables. I suggest that:
(1) we either add additional flags to the AutoTVM config (see the sketch below), or
(2) we remove this experimental mode if it's not really necessary for the OpenCL variant of VTA.
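For illustration only, a minimal sketch of option (1), assuming a hypothetical `static_tune` keyword argument. The other parameter names are taken from the diff hunk above; `static_tune` itself is not part of the actual AutoTVM API:

```python
class Runner:
    """Hypothetical stand-in for the AutoTVM runner class touched in the diff."""

    def __init__(self,
                 timeout=10, n_parallel=None,
                 number=4, repeat=3, min_repeat_ms=0, cooldown_interval=0.1,
                 check_correctness=False,
                 static_tune=False):
        # Option (1): expose the experimental mode as an explicit, documented
        # constructor flag; the diff instead reads it implicitly via
        # os.getenv("TVM_STATIC_TUNE_EXPERIMENTAL").
        self.timeout = timeout
        self.n_parallel = n_parallel
        self.check_correctness = check_correctness
        self.static_tune = static_tune
```

This keeps the mode visible in the API surface (and in documentation) instead of depending on hidden process state.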
TVM_STATIC_TUNE_EXPERIMENTAL is used for static auto-tuning. As you advised, I think we can split the PR into separate PRs. The interface for static auto-tuning is not nicely done yet, so maybe we should remove this part first.
@@ -323,8 +331,10 @@ def compute_conv2d_transpose(attrs, inputs, out_dtype):
        out = topi_compute(
            inputs[0], inputs[1], strides, padding, out_dtype)
        output_padding = get_const_tuple(attrs.output_padding)
        out = topi.nn.pad(out, [0, 0, 0, 0],
                          [0, 0, output_padding[0], output_padding[1]])
        if output_padding[0] != 0 or output_padding[1] != 0:
Which operators was this breaking on previously?
This will cause errors when running DCGAN on VTA. There are two issues here (see the sketch after this list):
- if output_padding is all zeros, there is no need to pad;
- if out.shape does not have 4 dimensions, the pad call will cause problems.
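For reference, a minimal sketch of the guarded pad described above. The zero-padding guard is reconstructed from the diff hunk; the `len(out.shape)` check illustrates the second issue and is an assumption, not code from the PR (the import path also assumes a recent TVM layout):

```python
from tvm import topi

def pad_transpose_output(out, output_padding):
    """Apply conv2d_transpose output padding only when actually needed."""
    # Skip the pad entirely when output_padding is all zeros (issue 1).
    # Guard on a 4-D (NCHW) tensor, since the unconditional pad below
    # breaks when out.shape does not have exactly 4 dimensions (issue 2).
    if len(out.shape) == 4 and (output_padding[0] != 0 or output_padding[1] != 0):
        # Pad only the two spatial dimensions (H, W), at the high end.
        out = topi.nn.pad(out, [0, 0, 0, 0],
                          [0, 0, output_padding[0], output_padding[1]])
    return out
```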
After the rebase, these changes are no longer needed.
@zhanghaohit I took a second look at the changes in this PR, and I believe it would be best to break the PR down into separate, smaller PRs: (1) modifications to VTA for OpenCL support (runtime, operator support, etc.). Let me know what you think. This will make the PRs easier to review and merge.
Please consider breaking the PR down into smaller PRs.
I'm thinking that the
Thanks @tmoreau89 for the suggestion. I've split it into 3 smaller PRs here: and removed the unnecessary code that is not closely related to OpenCL support. Thanks.
This PR, coupled with this one on the tvm-vta repo, is the basic implementation of RFC #5840.
Some notes:
this is just a basic version without much performance optimization. We've done some optimization and achieved significant improvement, but that part of the code is not ready; it has to be organized and cleaned up further.
there are some experimental features, which are also included in this PR, including
This code is not well styled (it relies on some environment variables), and we have to think about how to format/implement it more nicely. But we think these features are useful, so we commit them as well. We can discuss whether we should include this code in this PR, or leave it for other PRs.