-
Notifications
You must be signed in to change notification settings - Fork 72
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Hardware][OpenCL] Intelfocl support #9
Conversation
- add mul, load_int8 - some bugfix for bits width
Rename VTA_MEM_ID_ACC_8 to VTA_MEM_ID_ACC_8BIT
CC-ing @vegaluisjose @huajsj @pasqoc |
config/vta_cost.py
Outdated
factor = 1000000.0 | ||
def alu_imm_cost(iter_out, iter_in, uop_bgn, uop_end): | ||
x = (uop_end - uop_bgn) * iter_out * iter_in | ||
cycles = x + 46 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How are these costs generally derived? Are they FPGA device specific? If we change the compilation parameters in the OpenCL compiler, could it affect these cost models?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Those costs are FPGA and configuration specific, and we need to profile/measure the values from the actual hardware.
I believe the values are provided here as an example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I think it would be good to define constants here, and add a comment that states that these are set to arbitrarily (e.g. 46).
I'm working with a colleague on making easier to perform some of these cycle accurate co-simulation of VTA modules which should help us derive these costs more accurately on a per-VTA variant basis. In the meantime, can we use constant names, e.g. ALU_LATENCY_CYCLES = 46
Thanks @zhanghaohit for the PR! I left a couple questions, but will need to review in more depth. One question: have you tested it on the Pynq board to see if some changes may affect the existing design? Given that the ISA has changed slightly (some fields are now wider, like VTA_MEMOP_ID_BIT_WIDTH), it may affect some parameterizations of VTA. |
Thanks @tmoreau89 for the comments. Yes. We have tested on the ultra96 board (I only have ultra96 board on hand). It works. The only thing we have to change is to re-compile the bitstream with the new ISA. I think we have to update the pre-compiled bitstream here https://github.com/uwsampl/vta-distro/tree/master/bitstreams, right? |
That is correct, we'll need to bump the versioning of the bitstream |
@tmoreau89 @liangfu any other comments? Thanks. |
src/intelfocl/intelfocl_device.cc
Outdated
|
||
int IntelFOCLDevice::init(size_t mem_size, std::string aocx_file) | ||
{ | ||
cl_int status; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please ensure the source code pass cpplint rules. Specifically, we use 2 spaces for indentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. Thanks. We will update that.
config/intelfocl_sample.json
Outdated
@@ -0,0 +1,13 @@ | |||
{ | |||
"TARGET" : "intelfocl", | |||
"HW_VER" : "0.0.1", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might need to update this version number, since the HW ISA has changed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. Let's bump it up to 0.0.2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please apply the 0.0.2 bump thanks
@@ -0,0 +1,341 @@ | |||
#pragma OPENCL EXTENSION cl_intel_channels: enable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious, as I remember this PR is going to work for ultra96, does this extension work for Xilinx toolchain to compile the hardware design? I might misunderstood somewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code that work for ultra96 is still inside hardware/xilinx
Thanks for the changes. Please apply the 0.0.2 and rename the vta target to something more specific, e.g. "arria10". Also there are some CI errors related to linting that could be addressed. Thanks! |
Thank you very much! Sure. We will apply the 0.0.2 and address the linting errors. |
Thanks. The linting error here is due to filetype checking. I think we have to add the opencl filetype to the lint script. I've created a PR here: apache/tvm#6092 |
Just my humble opinion, given that both "Intel OpenCL for FPGA" and "VTA" requires a large amount of logic utilization, many Cyclone V chips that supports AOCL couldn't get this compiled because of the hardware utilization issue. As a side note, the reason we are taking these efforts in building open source projects, in part, we are hoping someone in the community could reproduce what we have done, and could easily start to build something that is even better. With the target being defined too board, an potential grad student could fail to reproduce the result, since not all the student could easily purchase a board with Stratix 10, and low cost Cyclone V boards couldn't get this running. In addition, they're wasting large amount of valuable logic resources, even they could afford a board with Stratix 10. Therefore, we should specify a precise target device for the vta_config. |
Thank you for your reply. However, could you explain more on the reason why the design shall not work on Cyclone V FPGAs that supports Intel OpenCL for FPGA? Precisely as you mentioned, the VTA design is versatile, the user could always change the settings (i.e. LOG_BLOCK and LOG_*_BUF_SIZE) to adjust the resource usage in order to fit their own FPGA boards. Surely a low cost Cyclone V device could not support 64x64 GEMV cores like large Stratix 10 FPGAs do. But the user could always try to use 16x16 or even 4x4 GEMV cores, by setting the LOG_BLOCK lower. Considering that, we used a relatively small default setting for LOG_BLOCK (4) and Buffers(15, 15, 18, 17). Thus the design should be able to fit into FPGAs comparable to the original Zynq/Zedboard platforms. We must admit that we don't have those Cyclone V boards on hand, nor has the design been tested on those platforms. However, if there is any issue on compiling the design for a AOCL-compatibale cyclone V board, we will be more than happy to investigate and try to solve the issue together. In terms of accessibility, we know that high-end could FPGA cards are very expensive. The good news is that nowadays there are many Cloud Service Providers available offering high-end FPGA instances! Those FPGA instances generally only cost few bucks for hour's usage. In addition, we are also working on porting the design over to Amazon EC2 F1 instances (Xilinx SDAccel). We will update again when we finish testing on the Amazon platforms. |
Hi @tmoreau89, We have completed the changes listed, and we have also included an README.rst file as a preliminary installation guide. Please let us know if more changes are required. Thank you very much! |
71d55c5
to
0d967d5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@remotego @zhanghaohit thank you for making the requested changes and adding the README.rst, it reads very well.
@liangfu please approve/request changes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LTGM
ci tests failed on tsim seems to be unrelated, @zhanghaohit do you mind retrigger ci to see if it passes? (It was successful previously.) |
@liangfu Thanks for the comments and sorry for my late reply. I tried to trigger the ci again, but it failed at the same place. I tried to run |
@zhanghaohit sorry for the late reply. I think that the CI issue is due to the changes in the ISA; we are relying on essentially tests that assume the older ISA, therefore breaking unit tests here. The ISA rarely changes, so we didn't set up the tests to account for changes in ISA. |
@zhanghaohit what I'd like to suggest is that we temporarily disable the unit tests that are failing and re-enable them once we've update TVM since there's a bit of a circular dependence to make the tests work Please comment out the test in |
Thanks @tmoreau89 for the suggestion. I cannot find the test Now the CI has passed. Could you help check? And thanks for the help to re-enable the tsim test after all are merged. |
Thank you @zhanghaohit, @remotego, @liangfu , the PR is merged! |
This PR relates to this RFC.
Main changes are:
This is a basic version as a POC, without much performance optimization. We're still optimizing some parts, and will be submitted as a new PR later.