Add example accelerator using HLS #2056

Merged
merged 9 commits into from
Sep 30, 2024
Changes from 3 commits
23 changes: 0 additions & 23 deletions .github/workflows/chipyard-run-tests.yml
abejgonzalez marked this conversation as resolved.
@@ -972,29 +972,6 @@ jobs:
group-key: "group-accels"
project-key: "chipyard-compressacc"

chipyard-hlsacc-run-tests:
name: chipyard-hlsacc-run-tests
needs: prepare-chipyard-accels
runs-on: as4
steps:
- name: Delete old checkout
run: |
ls -alh .
rm -rf ${{ github.workspace }}/* || true
rm -rf ${{ github.workspace }}/.* || true
ls -alh .
- name: Checkout
uses: actions/checkout@v4
- name: Git workaround
uses: ./.github/actions/git-workaround
- name: Create conda env
uses: ./.github/actions/create-conda-env
- name: Run tests
uses: ./.github/actions/run-tests
with:
group-key: "group-accels"
project-key: "chipyard-hlsacc"


tracegen-boomv3-run-tests:
name: tracegen-boomv3-run-tests
7 changes: 1 addition & 6 deletions build.sbt
@@ -158,7 +158,7 @@ lazy val chipyard = (project in file("generators/chipyard"))
dsptools, rocket_dsp_utils,
gemmini, icenet, tracegen, cva6, nvdla, sodor, ibex, fft_generator,
constellation, mempress, barf, shuttle, caliptra_aes, rerocc,
compressacc, saturn, ara, firrtl2_bridge, hls_accel)
compressacc, saturn, ara, firrtl2_bridge)
.settings(libraryDependencies ++= rocketLibDeps.value)
.settings(
libraryDependencies ++= Seq(
@@ -263,11 +263,6 @@ lazy val rocc_acc_utils = (project in file("generators/rocc-acc-utils"))
.settings(libraryDependencies ++= rocketLibDeps.value)
.settings(commonSettings)

lazy val hls_accel = (project in file("generators/hls-example"))
.dependsOn(rocketchip)
.settings(libraryDependencies ++= rocketLibDeps.value)
.settings(commonSettings)

lazy val tapeout = (project in file("./tools/tapeout/"))
.settings(chisel3Settings) // stuck on chisel3 and SFC
.settings(commonSettings)
37 changes: 26 additions & 11 deletions docs/Customization/Incorporating-HLS.rst
Contributor

Can you also mention the TODO of how to connect up to an AXI interface?

@@ -10,36 +10,50 @@ circuit to match a specification in a high level language like C.
Here, we will integrate an HLS-generated accelerator that computes
the Greatest Common Divisor (GCD) of two integers. This tutorial
builds on the sections :ref:`mmio-accelerators` and
:ref:`incorporating-verilog-blocks`. The code for this example can
be found in ``/generators/hls-example``
:ref:`incorporating-verilog-blocks`.

Adding an HLS project
---------------------------------------

In this tutorial, we use Vitis HLS, version 2023.2.
In this tutorial, we use Vitis HLS. The user guide for this tool
can be found at https://docs.amd.com/r/en-US/ug1399-vitis-hls.

Our project consists of 3 HLS files:
* C program of the GCD algorithm: ``accel/HLSAccel.cpp``
* Header file: ``accel/HLSAccel.hpp``
* TCL script to run Vitis HLS: ``run_hls.tcl``
* C program of the GCD algorithm: :gh-file-ref:`generators/chipyard/src/main/resources/hls/HLSAccel.cpp`
* TCL script to run Vitis HLS: :gh-file-ref:`generators/chipyard/src/main/resources/hls/run_hls.tcl`
* Makefile to run HLS and move verilog files: :gh-file-ref:`generators/chipyard/src/main/resources/hls/Makefile`

This example implements an iterative GCD algorithm, which is manually connected to
a TileLink register node in the ``HLSGCDAccel`` class in
:gh-file-ref:`generators/chipyard/src/main/scala/example/GCD.scala`.
HLS also supports adding AXI nodes to accelerators using compiler directives and
the HLS stream library. See the Vitis HLS user guide for AXI implementation information.

The HLS code is synthesized for a particular FPGA target, in this case,
an AMD Alveo U200. The target FPGA part is specified in ``run_hls.tcl`` using
the ``set_part`` command. The clock period, used for design optimization purposes,
is also set in ``run_hls.tcl`` using the ``create_clock`` command.

To generate the verilog files, as well as synthesis reports, run:

.. code-block:: none

vitis_hls run_hls.tcl

The files can be found in a generated folder named proj\_\<your\_project\_name>,
in our case, ``proj_gcd_example``.

In our case, we include a ``Makefile`` to script running HLS. To generate the
verilog files using the Makefile, run:
In our case, we include a ``Makefile`` to run HLS and to move files to
their intended locations. To generate the verilog files using the Makefile, run:

.. code-block:: none

make

To delete the generated files, run:

.. code-block:: none

make clean

Creating the Verilog black box
@@ -49,21 +63,22 @@ Creating the Verilog black box
Please consult :ref:`incorporating-verilog-blocks` for background information
on writing a Verilog black box.

We use Scala to run ``make``, which runs HLS and copies the files into ``hls-example/src/main/resources/vsrc``.
We use Scala to run ``make``, which runs HLS and copies the files into :gh-file-ref:`generators/chipyard/src/main/resources/vsrc`.
Then, we add the path to each file. This code will execute during Chisel elaboration, conveniently handling
file generation for the user.

.. literalinclude:: ../../generators/hls-example/src/main/scala/example/HLSExample.scala
.. literalinclude:: ../../generators/chipyard/src/main/scala/example/GCD.scala
:language: scala
:start-after: DOC include start: HLS blackbox
:end-before: DOC include end: HLS blackbox

Running the example
---------------------------------------

To test if the accelerator works, use the test program in ``tests/gcd.c``.
To test if the accelerator works, use the test program in :gh-file-ref:`tests/gcd.c`.
Compile the program with ``make``. Then, run:

.. code-block:: none

cd sims/vcs
make run-binary CONFIG=GCDHLSRocketConfig BINARY=../../tests/gcd.riscv
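The register sequence a test program would follow can be sketched on a host machine. This is a minimal sketch, not the contents of ``tests/gcd.c``: the offsets and status bits are read off the ``HLSGCDAccel`` regmap in this PR (status at 0x00 with bit 1 = ``ap_idle`` and bit 0 = ``result_valid``, x at 0x04, y at 0x08, gcd at 0x0C), while ``MockHlsGcd``, ``run_gcd``, the constant names, and the three-poll latency are hypothetical names and values invented for illustration. On the real SoC the reads and writes would be volatile accesses relative to the base address 0x4000.

```cpp
#include <cassert>
#include <cstdint>

// Register offsets and status bits, taken from the HLSGCDAccel regmap.
constexpr std::uint32_t kRegStatus = 0x00;  // bit 1 = ap_idle, bit 0 = result_valid
constexpr std::uint32_t kRegX      = 0x04;  // write-only operand x
constexpr std::uint32_t kRegY      = 0x08;  // write-only operand y; a write starts the run
constexpr std::uint32_t kRegGcd    = 0x0C;  // read-only result
constexpr std::uint32_t kStatusResultValid = 0x1;
constexpr std::uint32_t kStatusIdle        = 0x2;

// Hypothetical host-side stand-in for the device (an assumption, not Chipyard code).
// It pretends a computation takes three status polls to finish.
class MockHlsGcd {
public:
    void write(std::uint32_t offset, std::uint32_t value) {
        if (offset == kRegX) {
            x_ = value;
        } else if (offset == kRegY) {
            result_ = euclid(x_, value);
            result_valid_ = false;
            busy_polls_ = 3;  // arbitrary made-up latency
        }
    }

    std::uint32_t read(std::uint32_t offset) {
        if (offset == kRegStatus) {
            // Each status poll advances the pretend computation by one step.
            if (busy_polls_ > 0 && --busy_polls_ == 0) {
                result_valid_ = true;
            }
            const std::uint32_t idle  = (busy_polls_ == 0) ? kStatusIdle : 0;
            const std::uint32_t valid = result_valid_ ? kStatusResultValid : 0;
            return idle | valid;
        }
        return offset == kRegGcd ? result_ : 0;
    }

private:
    // Remainder-based Euclid, used only to give the mock a result to return.
    static std::uint32_t euclid(std::uint32_t a, std::uint32_t b) {
        while (b != 0) {
            const std::uint32_t r = a % b;
            a = b;
            b = r;
        }
        return a;
    }

    std::uint32_t x_ = 0;
    std::uint32_t result_ = 0;
    bool result_valid_ = false;
    int busy_polls_ = 0;
};

// Driver sequence: wait for idle, write x, write y, poll result_valid, read gcd.
template <typename Device>
std::uint32_t run_gcd(Device& dev, std::uint32_t x, std::uint32_t y) {
    while ((dev.read(kRegStatus) & kStatusIdle) == 0) {}
    dev.write(kRegX, x);
    dev.write(kRegY, y);
    while ((dev.read(kRegStatus) & kStatusResultValid) == 0) {}
    return dev.read(kRegGcd);
}
```

Templating ``run_gcd`` on the device type keeps the sequence identical whether it talks to the mock or to a thin wrapper around real MMIO loads and stores. Waiting for idle before writing mirrors the wrapper's ``y.ready := impl.io.ap_idle`` connection.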
28 changes: 28 additions & 0 deletions generators/chipyard/src/main/resources/hls/HLSAccel.cpp
@@ -0,0 +1,28 @@
#ifndef _GCD_EX_H_
#define _GCD_EX_H_

#include <ap_int.h>

#define DATA_WIDTH 32

typedef ap_uint<DATA_WIDTH> io_t;

io_t HLSGCDAccelBlackBox(io_t x, io_t y) {
io_t tmp;
io_t gcd;

tmp = y;
gcd = x;

while(tmp != 0) {
if (gcd > tmp) {
gcd = gcd - tmp;
} else {
tmp = tmp - gcd;
}
}

return gcd;
}

#endif
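For checking results on a host without Vitis HLS, a plain C++ reference model of the same subtraction loop may be useful. This is a minimal sketch under stated assumptions: ``std::uint32_t`` stands in for ``ap_uint<32>`` (valid only while ``DATA_WIDTH`` stays 32), and the name ``gcd_reference`` is hypothetical. It also adds one guard that the file above does not have: with ``x == 0`` and ``y != 0``, the loop computes ``tmp = tmp - 0`` on every iteration and never exits, so the model returns ``y`` for that input instead.

```cpp
#include <cassert>
#include <cstdint>

// Host-side reference model of the subtraction-based GCD loop.
// Assumption: uint32_t replaces ap_uint<32>; gcd_reference is a hypothetical name.
std::uint32_t gcd_reference(std::uint32_t x, std::uint32_t y) {
    std::uint32_t gcd = x;
    std::uint32_t tmp = y;

    // Deviation from the HLS source: when gcd == 0 and tmp != 0 the loop below
    // would leave tmp unchanged forever, so return y (gcd(0, y) = y by convention).
    if (gcd == 0) {
        return tmp;
    }

    while (tmp != 0) {
        if (gcd > tmp) {
            gcd = gcd - tmp;
        } else {
            tmp = tmp - gcd;
        }
    }
    return gcd;
}
```

Because each iteration subtracts only once, the iteration count grows with the ratio of the operands; an input pair such as a very large ``x`` with ``y == 1`` needs roughly ``x`` iterations, which is worth keeping in mind when choosing simulation test vectors.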
21 changes: 21 additions & 0 deletions generators/chipyard/src/main/resources/hls/Makefile
@@ -0,0 +1,21 @@
base_dir=$(abspath ../../../..)
hls_dir=$(abspath .)
hls_vlog_gendir=$(hls_dir)/proj_gcd_example/solution1/syn/verilog
vsrc_dir=$(base_dir)/src/main/resources/vsrc

.PHONY: default run-hls clean

HLS_CMD = vitis_hls
TCL_SCRIPT = run_hls.tcl
ACCEL_C = HLSAccel.cpp

default: run-hls

run-hls: $(ACCEL_C) $(TCL_SCRIPT)
$(HLS_CMD) $(TCL_SCRIPT)
cp -r $(hls_vlog_gendir)/. $(vsrc_dir)

clean:
rm -rf $(hls_dir)/proj_gcd_example
rm -f $(hls_dir)/vitis_hls.log
rm -f $(vsrc_dir)/HLSGCDAccelBlackBox*
11 changes: 11 additions & 0 deletions generators/chipyard/src/main/resources/hls/run_hls.tcl
@@ -0,0 +1,11 @@
open_project -reset proj_gcd_example
add_files HLSAccel.cpp
set_top HLSGCDAccelBlackBox
open_solution -reset "solution1"

# Specify FPGA board and clock frequency
set_part {xcu200-fsgd2104-2-e}
create_clock -period 10

csynth_design
exit
1 change: 0 additions & 1 deletion generators/chipyard/src/main/scala/DigitalTop.scala
Contributor

Can we change the naming to something like CanHavePeripheryHLSGCD? Ditto for the config fragment name.

@@ -40,7 +40,6 @@ class DigitalTop(implicit p: Parameters) extends ChipyardSystem
with fftgenerator.CanHavePeripheryFFT // Enables optionally having an MMIO-based FFT block
with constellation.soc.CanHaveGlobalNoC // Support instantiating a global NoC interconnect
with rerocc.CanHaveReRoCCTiles // Support tiles that instantiate rerocc-attached accelerators
with hlsaccel.CanHavePeripheryHLSAccel
{
override lazy val module = new DigitalTopModule(this)
}
@@ -28,6 +28,11 @@ class GCDAXI4BlackBoxRocketConfig extends Config(
new chipyard.config.AbstractConfig)
// DOC include end: GCDAXI4BlackBoxRocketConfig

class GCDHLSRocketConfig extends Config(
new chipyard.example.WithHLSGCD ++
new freechips.rocketchip.rocket.WithNHugeCores(1) ++
new chipyard.config.AbstractConfig)

// DOC include start: InitZeroRocketConfig
class InitZeroRocketConfig extends Config(
new chipyard.example.WithInitZero(0x88000000L, 0x1000L) ++ // add InitZero
@@ -65,8 +70,3 @@ class ManyMMIOAcceleratorRocketConfig extends Config(
new chipyard.example.WithStreamingFIR ++ // use top with tilelink-controlled streaming FIR
new freechips.rocketchip.rocket.WithNHugeCores(1) ++
new chipyard.config.AbstractConfig)

class HLSAcceleratorRocketConfig extends Config(
new hlsaccel.WithHLSAccel ++
new freechips.rocketchip.rocket.WithNHugeCores(1) ++
new chipyard.config.AbstractConfig)
106 changes: 105 additions & 1 deletion generators/chipyard/src/main/scala/example/GCD.scala
@@ -1,5 +1,7 @@
package chipyard.example

import sys.process._

import chisel3._
import chisel3.util._
import chisel3.experimental.{IntParam, BaseModule}
@@ -17,7 +19,8 @@ case class GCDParams(
address: BigInt = 0x4000,
width: Int = 32,
useAXI4: Boolean = false,
useBlackBox: Boolean = true)
useBlackBox: Boolean = true,
useHLS: Boolean = false)
// DOC include end: GCD params

// DOC include start: GCD key
@@ -37,6 +40,18 @@ class GCDIO(val w: Int) extends Bundle {
val busy = Output(Bool())
}

class HLSGCDAccelIO(val w: Int) extends Bundle {
val ap_clk = Input(Clock())
val ap_rst = Input(Reset())
val ap_start = Input(Bool())
val ap_done = Output(Bool())
val ap_idle = Output(Bool())
val ap_ready = Output(Bool())
val x = Input(UInt(w.W))
val y = Input(UInt(w.W))
val ap_return = Output(UInt(w.W))
}

class GCDTopIO extends Bundle {
val gcd_busy = Output(Bool())
}
@@ -88,6 +103,23 @@ class GCDMMIOChiselModule(val w: Int) extends Module {
}
// DOC include end: GCD chisel

// DOC include start: HLS blackbox
class HLSGCDAccelBlackBox(val w: Int) extends BlackBox with HasBlackBoxPath {
val io = IO(new HLSGCDAccelIO(w))

val chipyardDir = System.getProperty("user.dir")
val hlsDir = s"$chipyardDir/generators/chipyard"

// Run HLS command
val make = s"make -C ${hlsDir}/src/main/resources/hls default"
require (make.! == 0, "Failed to run HLS")

// Add each vlog file
addPath(s"$hlsDir/src/main/resources/vsrc/HLSGCDAccelBlackBox.v")
addPath(s"$hlsDir/src/main/resources/vsrc/HLSGCDAccelBlackBox_flow_control_loop_pipe.v")
}
// DOC include end: HLS blackbox

// DOC include start: GCD router
class GCDTL(params: GCDParams, beatBytes: Int)(implicit p: Parameters) extends ClockSinkDomain(ClockSinkParameters())(p) {
val device = new SimpleDevice("gcd", Seq("ucbbar,gcd"))
@@ -190,6 +222,64 @@ class GCDAXI4(params: GCDParams, beatBytes: Int)(implicit p: Parameters) extends
}
// DOC include end: GCD router

class HLSGCDAccel(params: GCDParams, beatBytes: Int)(implicit p: Parameters) extends ClockSinkDomain(ClockSinkParameters())(p) {
val device = new SimpleDevice("hlsgcdaccel", Seq("ucbbar,hlsgcdaccel"))
val node = TLRegisterNode(Seq(AddressSet(params.address, 4096-1)), device, "reg/control", beatBytes=beatBytes)

override lazy val module = new HLSGCDAccelImpl
class HLSGCDAccelImpl extends Impl with HasGCDTopIO {
val io = IO(new GCDTopIO)
withClockAndReset(clock, reset) {
val x = Reg(UInt(params.width.W))
val y = Wire(new DecoupledIO(UInt(params.width.W)))
val y_reg = Reg(UInt(params.width.W))
val gcd = Wire(new DecoupledIO(UInt(params.width.W)))
val gcd_reg = Reg(UInt(params.width.W))
val status = Wire(UInt(2.W))

val impl = Module(new HLSGCDAccelBlackBox(params.width))

impl.io.ap_clk := clock
impl.io.ap_rst := reset

val s_idle :: s_busy :: Nil = Enum(2)
val state = RegInit(s_idle)
val result_valid = RegInit(false.B)
when (state === s_idle && y.valid) {
state := s_busy
result_valid := false.B
y_reg := y.bits
} .elsewhen (state === s_busy && impl.io.ap_done) {
state := s_idle
result_valid := true.B
gcd_reg := impl.io.ap_return
}

impl.io.ap_start := state === s_busy

gcd.valid := result_valid
status := Cat(impl.io.ap_idle, result_valid)

impl.io.x := x
impl.io.y := y_reg
y.ready := impl.io.ap_idle
gcd.bits := gcd_reg

io.gcd_busy := !impl.io.ap_idle

node.regmap(
0x00 -> Seq(
RegField.r(2, status)), // a read-only register capturing current status
0x04 -> Seq(
RegField.w(params.width, x)), // a plain, write-only register
0x08 -> Seq(
RegField.w(params.width, y)), // write-only, y.valid is set on write
0x0C -> Seq(
RegField.r(params.width, gcd))) // read-only, gcd.ready is set on read
}
}
}
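The two-state controller inside ``HLSGCDAccelImpl`` can be modeled in software to reason about the handshake before running a full simulation. The following is a minimal sketch that transcribes the ``when``/``.elsewhen`` block above into C++; the struct name ``HlsGcdController`` and the ``step`` method are hypothetical, and the model zero-initializes ``y_reg`` and ``gcd_reg`` even though the Chisel registers have no reset value. Each call to ``step`` stands for one rising clock edge.

```cpp
#include <cassert>
#include <cstdint>

// Software model of the s_idle / s_busy controller in HLSGCDAccelImpl.
// Assumption: names are illustrative; registers start at zero in this model only.
struct HlsGcdController {
    enum class State { Idle, Busy };

    State state = State::Idle;
    bool result_valid = false;
    std::uint32_t y_reg = 0;
    std::uint32_t gcd_reg = 0;

    // Combinational output: ap_start stays high for the whole Busy state.
    bool ap_start() const { return state == State::Busy; }

    // One clock edge. y_valid/y_bits model a write to the y register;
    // ap_done/ap_return model the black box outputs sampled on this edge.
    void step(bool y_valid, std::uint32_t y_bits, bool ap_done, std::uint32_t ap_return) {
        if (state == State::Idle && y_valid) {
            state = State::Busy;
            result_valid = false;
            y_reg = y_bits;
        } else if (state == State::Busy && ap_done) {
            state = State::Idle;
            result_valid = true;
            gcd_reg = ap_return;
        }
    }
};
```

In this model a write to y while the controller is busy is simply ignored; in the Chisel wrapper the same situation is handled by back-pressure, since ``y.ready`` is driven from ``ap_idle``.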

// DOC include start: GCD lazy trait
trait CanHavePeripheryGCD { this: BaseSubsystem =>
private val portName = "gcd"
@@ -210,6 +300,15 @@ trait CanHavePeripheryGCD { this: BaseSubsystem =>
TLFragmenter(pbus.beatBytes, pbus.blockBytes, holdFirstDeny = true) := _
}
gcd
} else if (params.useHLS) {
// val gcd = LazyModule(
// if (params.useHLS) new HLSGCDAccel(params, pbus.beatBytes)(p)
// else new GCDTL(params, pbus.beatBytes)(p)
// )
val gcd = LazyModule(new HLSGCDAccel(params, pbus.beatBytes)(p))
gcd.clockNode := pbus.fixedClockNode
pbus.coupleTo(portName) { gcd.node := TLFragmenter(pbus.beatBytes, pbus.blockBytes) := _ }
gcd
} else {
val gcd = LazyModule(new GCDTL(params, pbus.beatBytes)(p))
gcd.clockNode := pbus.fixedClockNode
@@ -233,3 +332,8 @@ class WithGCD(useAXI4: Boolean = false, useBlackBox: Boolean = false) extends Co
case GCDKey => Some(GCDParams(useAXI4 = useAXI4, useBlackBox = useBlackBox))
})
// DOC include end: GCD config fragment

// useHLS cannot be used with useAXI4 and useBlackBox
class WithHLSGCD extends Config((site, here, up) => {
case GCDKey => Some(GCDParams(useAXI4 = false, useBlackBox = false, useHLS = true))
})
abejgonzalez marked this conversation as resolved.