[WIP][VTA] Support Intel FPGA in VTA #1694

Closed
wants to merge 87 commits into from


liangfu
Member

@liangfu liangfu commented Sep 7, 2018

This is an initial work-in-progress port of the HLS-based instruction design to Intel FPGAs.

Please refer to RFC #1656 for more details.

Contributor

@tmoreau89 tmoreau89 left a comment


Thank you @liangfu, this is a promising start. A couple of points: I think it would be sufficient to rename the path from intel_fpga to simply intel, since being under the hardware path already implies FPGA hardware. In addition, there are leftover copied files that you can safely remove, such as compile_designs.py (which should also be removed from the main branch) and vivado.tcl/hsi.tcl/hls.tcl, which are all Xilinx-specific.

@tmoreau89
Contributor

Also, I just ordered a couple of DE10-Nano boards so I can test out the Intel FPGA backend (I should have those by early next week).

@liangfu Our first step should be to generate a complete VTA design with all of the HLS modules, connected via FIFOs, BRAMs, and Avalon bus to the memory controller / ACP port of the ARM SoC. From there on we can build unit tests in C to test basic functionality such as data transfer, and single tensor ops.

@liangfu
Member Author

liangfu commented Sep 12, 2018

@tmoreau89 the latest commit removes those Xilinx-specific script files, and all the instruction components as well as the simulation program now compile successfully (though they are not functional yet). I'm now debugging the modules to make them functional in simulation mode. Would you kindly provide a full debug log from a functional run? It would help me track down errors in migrating the Xilinx HLS-based implementation to Intel HLS.

In the meantime, I agree on what our first step should be at this stage.

@tmoreau89
Contributor

@liangfu I see, what you would like are unit tests for each of the HLS modules to test the functions in isolation? I've been planning to do a simulation infrastructure revamp, but this could take a few days. In the meantime, can you reproduce the simulations using the xilinx toolchains?

There's some guidance on how to run the simulation test. You can turn on a DEBUG flag before compiling the design, or insert your own printf statements to obtain a more detailed trace. Let me know if you run into problems.
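
For the printf route, a compile-time switch keeps the extra trace out of normal builds. The following is only a minimal sketch: TRACE_ENABLED, format_store_trace, and trace_store are hypothetical names for illustration and are not the DEBUG flag or any helper from the VTA sources.

```cpp
#include <cstdio>
#include <string>

// Hypothetical flag for this sketch (not the project's DEBUG flag).
// Build with -DTRACE_ENABLED=1 to turn the trace on.
#ifndef TRACE_ENABLED
#define TRACE_ENABLED 0
#endif

// Format one trace line describing a store into a named buffer.
std::string format_store_trace(const char* mem, int idx, int lane, long value) {
  char buf[96];
  std::snprintf(buf, sizeof(buf), "%s[%d][%d] = %ld", mem, idx, lane, value);
  return std::string(buf);
}

// Print the trace line only when tracing is compiled in.
inline void trace_store(const char* mem, int idx, int lane, long value) {
#if TRACE_ENABLED
  std::printf("%s\n", format_store_trace(mem, idx, lane, value).c_str());
#else
  (void)mem; (void)idx; (void)lane; (void)value;  // silence unused warnings
#endif
}
```

Calling trace_store("acc_mem", dst_idx, i, value) next to a store would then give line-by-line output that can be diffed between the two toolchains' simulation runs.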

@liangfu
Member Author

liangfu commented Sep 13, 2018

@tmoreau89 I wasn't expecting isolated tests for the HLS modules; instead, I'm currently working with the existing simulation infrastructure. Thanks to your debugging guidance, I've installed the Xilinx toolchains and started comparing the output results, which helps in debugging the migrated version. The good news is that I have successfully migrated the ALU modules in simulation mode. However, as Intel HLS doesn't support volatile in the ac_int copy constructor, I've removed the volatile keyword everywhere in the code for now.
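
To illustrate the volatile issue in plain C++ terms: a copy constructor that takes a const reference cannot bind a volatile object, so copying from a volatile value does not compile. The sketch below uses a hypothetical Word type as a stand-in; it is an assumption about the shape of the problem, not the actual ac_int declaration.

```cpp
#include <type_traits>

// Hypothetical stand-in for an ac_int-like type (NOT the real ac_int).
struct Word {
  int bits;
  explicit Word(int b) : bits(b) {}
  // Copy constructor accepts only non-volatile sources.
  Word(const Word& other) : bits(other.bits) {}
};

// Copying from a const, non-volatile object is fine.
static_assert(std::is_constructible<Word, const Word&>::value,
              "non-volatile copy should be available");

// Copying from a volatile object is rejected: const Word& cannot
// bind to a volatile Word without dropping the qualifier.
static_assert(!std::is_constructible<Word, volatile Word&>::value,
              "volatile copy should be unavailable");

// Helper that copies its argument and reads the stored value.
int read_bits(const Word& w) {
  Word copy(w);
  return copy.bits;
}
```

Under that assumption, dropping the volatile qualifier from the operands is the simplest way to make such copies compile.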

@tmoreau89
Contributor

@liangfu thanks for the update, I'm glad you've been able to test the compute module. Volatile may not be necessary for the Intel toolchains - it was necessary for Vivado since simulation would not behave correctly if the volatile keyword wasn't specified. That being said, I don't think it affected the behavior of the synthesized hardware.

@liangfu
Member Author

liangfu commented Sep 18, 2018

@tmoreau89 I've recently run gemm successfully in simulation and cleaned up unused code. However, when I generate hardware from the same design, there is a small section that consistently causes hardware generation to fail:

// Store to accum memory/store buffer         
if (alu_opcode == VTA_ALU_OPCODE_MIN ||       
    alu_opcode == VTA_ALU_OPCODE_MAX) {       
  acc_mem[dst_idx][i] = cmp_res;              
  out_mem[dst_idx][i] = short_cmp_res;        
} else if (alu_opcode == VTA_ALU_OPCODE_ADD) {
  acc_mem[dst_idx][i] = add_res;              
  out_mem[dst_idx][i] = short_add_res;        
} else if (alu_opcode == VTA_ALU_OPCODE_SHR) {
  acc_mem[dst_idx][i] = shr_res;              
  out_mem[dst_idx][i] = short_shr_res;        
}                                             

The debug-level compilation output reports:

Optimizing component(s) and generating Verilog files
PHINode should have one entry for each predecessor of its parent basic block!
  %cmp_res.0.0.0.7 = phi i512 [ %cmp_res.0.0.0.11675, %if.else275 ], [ %cmp_res.0.0.0.11675, %if.else275 ], [ %cmp_res
.0.0.0.11675, %if.then123 ], [ %cmp_res.0.0.0.11675, %if.end431.loopexit ], [ %or.i.i220, %if.end431.loopexit35528 ], 
!dbg !12755
Broken module found, compilation aborted!
0  libLLVM-3.0.so  0x00007fe98ce8532f
1  libLLVM-3.0.so  0x00007fe98ce872a2
2  libpthread.so.0 0x00007fe98c38c330
3  libc.so.6       0x00007fe98b3a3c37 gsignal + 55
4  libc.so.6       0x00007fe98b3a7028 abort + 328
5  libLLVM-3.0.so  0x00007fe98dbd9446
6  libLLVM-3.0.so  0x00007fe98dbb75ef llvm::FPPassManager::runOnFunction(llvm::Function&) + 527
7  libLLVM-3.0.so  0x00007fe98dbb7750 llvm::FPPassManager::runOnModule(llvm::Module&) + 80
8  libLLVM-3.0.so  0x00007fe98dbb7111 llvm::MPPassManager::runOnModule(llvm::Module&) + 577
9  libLLVM-3.0.so  0x00007fe98dbb72bb llvm::PassManagerImpl::run(llvm::Module&) + 187
10 aocl-opt        0x00000000004194dd main + 4765
11 libc.so.6       0x00007fe98b38ef45 __libc_start_main + 245
12 aocl-opt        0x000000000040ccc9
Stack dump:
0.  Program arguments: /DATA2/liangfu/intelFPGA_lite/18.0/hls/linux64/bin/aocl-opt -HLS --grif --soft-elementary-math=
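
One restructuring that might be worth trying is to select the value first and then perform a single store, so each memory is written from one place instead of three branches. This is only a sketch with plain integer types: the opcode values, the Candidate struct, and store_lane are hypothetical stand-ins, and it is not confirmed that this avoids the PHINode failure above.

```cpp
#include <cstdint>

// Stand-in opcode values for this sketch; the real definitions
// come from the VTA headers.
enum AluOpcode { ALU_MIN = 0, ALU_MAX = 1, ALU_ADD = 2, ALU_SHR = 3 };

// Hypothetical pairing of the wide (accumulator) and narrow (output) results.
struct Candidate {
  int32_t wide;
  int8_t narrow;
};

// Pick the result for the opcode, then store once. Unknown opcodes
// leave both slots untouched, matching the original if/else chain.
void store_lane(int opcode, Candidate cmp, Candidate add, Candidate shr,
                int32_t* acc_slot, int8_t* out_slot) {
  Candidate chosen = {0, 0};
  bool valid = true;
  switch (opcode) {
    case ALU_MIN:
    case ALU_MAX: chosen = cmp; break;
    case ALU_ADD: chosen = add; break;
    case ALU_SHR: chosen = shr; break;
    default:      valid = false; break;
  }
  if (valid) {
    *acc_slot = chosen.wide;
    *out_slot = chosen.narrow;
  }
}
```

In the real module the two slots would be acc_mem[dst_idx][i] and out_mem[dst_idx][i].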

If we ignore this section temporarily, the generated hardware looks fine. Here is the estimated resource allocation for the current hardware design (targeting the DE10-Nano):

Component Name   ALUTs         FFs           RAMs       DSPs
  compute        50981         49720         285        56
  fetch          1326          1047          4          0
  load           36949         18013         85         0
  store          4101          5355          40         0
  Total          93357 (85%)   74135 (34%)   414 (81%)  56 (50%)
  Available      109572        219144        514        112
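
The totals and percentages in the table can be cross-checked with a few lines of arithmetic. The sketch below hard-codes the per-component figures from the report; the Usage struct and function names are illustrative only.

```cpp
#include <cmath>

// Resource counts for one component or for the whole design.
struct Usage {
  long aluts;
  long ffs;
  long rams;
  long dsps;
};

// Sum the per-component estimates (compute, fetch, load, store).
Usage total_usage() {
  const Usage components[] = {
      {50981, 49720, 285, 56},  // compute
      {1326, 1047, 4, 0},       // fetch
      {36949, 18013, 85, 0},    // load
      {4101, 5355, 40, 0},      // store
  };
  Usage total = {0, 0, 0, 0};
  for (const Usage& c : components) {
    total.aluts += c.aluts;
    total.ffs += c.ffs;
    total.rams += c.rams;
    total.dsps += c.dsps;
  }
  return total;
}

// Utilization as a percentage rounded to the nearest integer.
int percent(long used, long available) {
  return static_cast<int>(std::lround(100.0 * used / available));
}
```

With the available counts from the last row (109572 ALUTs, 219144 FFs, 514 RAMs, 112 DSPs), this reproduces the reported 85%, 34%, 81%, and 50%.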

@tmoreau89
Contributor

@liangfu thank you for the update, this is looking promising. Would you mind summarizing the commands needed to run the synthesis and simulation for your WIP HLS modules with the Intel toolchains?

@liangfu
Member Author

liangfu commented Sep 18, 2018

Just enable MODE=sim in the Makefile; it will then use i++ to compile the HLS modules. On the other hand, I'm a bit worried about how to drive the generated hardware from software. I'm not very familiar with this at the moment.

@nhynes nhynes self-requested a review April 9, 2019 16:11
@nhynes nhynes dismissed their stale review April 9, 2019 16:12

stale

@liangfu
Member Author

liangfu commented Apr 10, 2019

@nhynes This PR is still WIP. I will reply to your comments one by one, make the requested changes, and request another round of review when I think this is ready.

@liangfu
Member Author

liangfu commented May 30, 2019

There seems to be a concurrent effort at #3258, so I'm closing this PR for now.

@liangfu liangfu closed this May 30, 2019
@liangfu liangfu mentioned this pull request Jun 19, 2019

5 participants