If you use Rosetta in your research, please cite our FPGA'18 paper:
@article{zhou-rosetta-fpga2018,
title = "{Rosetta: A Realistic High-Level Synthesis Benchmark Suite for
Software-Programmable FPGAs}",
author = {Yuan Zhou and Udit Gupta and Steve Dai and Ritchie Zhao and
Nitish Srivastava and Hanchen Jin and Joseph Featherston and
Yi-Hsiang Lai and Gai Liu and Gustavo Angarita Velasquez and
Wenping Wang and Zhiru Zhang},
journal = {Int'l Symp. on Field-Programmable Gate Arrays (FPGA)},
month = {Feb},
year = {2018},
}
Rosetta is a set of realistic benchmarks for software-programmable FPGAs. It contains six fully developed applications from the machine learning and image/video processing domains, where each benchmark consists of multiple compute kernels that expose diverse sources of parallelism. These applications are developed under realistic design constraints and are optimized at both the kernel level and the application level with the advanced features of HLS tools to meet these constraints. As a result, Rosetta is not only a practical benchmark suite for the HLS community, but also a design tutorial on how to build application-specific FPGA accelerators with state-of-the-art HLS tools and optimizations. We will continue to include more applications and optimize existing benchmarks.
For each Rosetta benchmark, we provide an unoptimized software version which does not use any HLS-specific optimization, and optimized versions targeting cloud and embedded FPGA platforms. Rosetta currently supports Xilinx SDx 2017.1, which combines the previous Xilinx SDAccel and Xilinx SDSoC development environments. SDAccel is used for cloud FPGA platforms, and SDSoC is used for embedded FPGA platforms. Our designs have been tested on the AWS f1.2xlarge instance and a local ZC706 evaluation kit. Major results are as follows. For more results please refer to our FPGA'18 paper.
Benchmark | #LUTs | #FFs | #BRAMs | #DSPs | Runtime (ms) | Throughput |
---|---|---|---|---|---|---|
3D Rendering | 8893 | 12471 | 48 | 11 | 4.7 | 213 frames/s |
Digit Recognition<sup>1</sup> | 41238 | 26468 | 338 | 1 | 10.6 | 189k digits/s |
Spam Filtering<sup>2</sup> | 12678 | 22134 | 69 | 224 | 60.8 | 370k samples/s |
Optical Flow | 42878 | 61078 | 54 | 454 | 24.3 | 41.2 frames/s |
BNN<sup>3</sup> | 46899 | 46760 | 102 | 4 | 4995.2 | 200 images/s |
Face Detection | 62688 | 83804 | 121 | 79 | 33.0 | 30.3 frames/s |
1: K=3, `PAR_FACTOR`=40.
2: Five epochs, `PAR_FACTOR`=32, `VDWIDTH`=64.
3: Eight convolvers, 1000 test images.
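The throughput column appears to follow from the runtime column as items processed divided by runtime. The sketch below checks this in Python under two assumptions that this README does not state: the frame-based benchmarks process one frame per run, and the BNN run covers the 1000 test images of footnote 3. The helper name `throughput_per_second` is hypothetical.

```python
def throughput_per_second(items: int, runtime_ms: float) -> float:
    """Items processed per second, given a runtime in milliseconds."""
    return items / (runtime_ms / 1000.0)


# (items per run, runtime in ms) taken from the table above.
# The one-frame-per-run counts are an assumption, not stated in this README.
runs = {
    "3D Rendering": (1, 4.7),
    "Optical Flow": (1, 24.3),
    "BNN": (1000, 4995.2),
    "Face Detection": (1, 33.0),
}

for name, (items, runtime_ms) in runs.items():
    rate = throughput_per_second(items, runtime_ms)
    print(f"{name}: {rate:.1f} items/s")
```

Under these assumptions the computed rates agree with the table after rounding. Digit Recognition and Spam Filtering are left out because their test-set sizes are not given here.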
Benchmark | #LUTs | #FFs | #BRAMs | #DSPs | Runtime (ms) | Throughput | Performance-cost Ratio |
---|---|---|---|---|---|---|---|
3D Rendering | 6763 | 7916 | 36 | 11 | 4.4 | 227 frames/s | 496k frames/$ |
Digit Recognition<sup>1</sup> | 39971 | 33853 | 207 | 0 | 11.1 | 180k digits/s | 393M digits/$ |
Spam Filtering<sup>2</sup> | 7207 | 17434 | 90 | 224 | 25.1 | 728k samples/s | 1.6G samples/$ |
Optical Flow | 38094 | 63438 | 55 | 484 | 8.4 | 119 frames/s | 260k frames/$ |
Face Detection | 48217 | 54206 | 92 | 72 | 21.5 | 46.5 frames/s | 101k frames/$ |
1: K=3, `PAR_FACTOR`=40.
2: Five epochs, `PAR_FACTOR`=32, `VDWIDTH`=512.
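The performance-cost ratio can be read as throughput multiplied by the seconds in an hour and divided by an hourly instance price. The Python sketch below assumes a price of $1.65 per hour; this value is not given in this README and was chosen because it reproduces the ratios in the table. The names `ASSUMED_PRICE_PER_HOUR` and `items_per_dollar` are hypothetical.

```python
# Assumed hourly price in USD. This is NOT stated in this README; it is an
# assumption chosen because it reproduces the table's ratios.
ASSUMED_PRICE_PER_HOUR = 1.65


def items_per_dollar(throughput_per_s: float,
                     price_per_hour: float = ASSUMED_PRICE_PER_HOUR) -> float:
    """Items processed per dollar for a given throughput in items/s."""
    return throughput_per_s * 3600.0 / price_per_hour


# Throughputs from the table above. 3D Rendering uses the unrounded
# 1000 / 4.4 ms value (assuming one frame per run) instead of 227.
throughputs = {
    "3D Rendering": 1000.0 / 4.4,
    "Digit Recognition": 180e3,
    "Spam Filtering": 728e3,
    "Optical Flow": 119.0,
    "Face Detection": 46.5,
}

for name, rate in throughputs.items():
    print(f"{name}: {items_per_dollar(rate):,.0f} items/$")
```

With a different instance price, pass it as `price_per_hour` to rescale the ratios.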
- 3D rendering;
- Digit recognition;
- Spam filtering;
- Optical flow;
- Binarized neural network, adapted from our open-source BNN implementation;
- Face detection, adapted from our open-source Haar face detection implementation.
The `harness` directory contains the wrapper code for the OpenCL APIs, as well as the main makefile.
The `src` directory contains the source code for the CPU host function (`host`), the software implementation (`sw`), the SDSoC hardware function implementation (`sdsoc`), and the SDAccel hardware function implementation (`ocl`).
Each benchmark has its own makefile specifying the paths to the necessary source files.
The `BNN` folder is currently a copy of the original BNN repo by Zhao et al. For instructions on how to simulate and compile the design, please refer to the README file inside the folder.
- Figure out your target platform. SDAccel only supports a limited number of platforms. The code for your target platform can be found in the SDAccel user guide or in other materials provided by the platform vendor. SDAccel also supports custom platforms that are not integrated yet. A platform specification file (usually with the extension `.xpfm`) is needed to describe the target platform.
- Go into any benchmark folder.
- To compile for software emulation and get a quick latency estimate, run `make ocl OCL_TARGET=sw_emu`. The report `system_estimate.xtxt` shows the latency and resource estimates after high-level synthesis. If only a software model is needed, comment out `--report estimate` in the local makefile; compilation time will decrease significantly.
- To compile for hardware emulation, run `make ocl OCL_TARGET=hw_emu`.
- To compile a bitstream and execute it on the board, run `make ocl OCL_TARGET=hw`.
- The target platform can be specified with the `OCL_DEVICE` variable. The default is the Alpha Data 7v3 board. For example, to target the Alpha Data KU3 board and generate a bitstream, run `make ocl OCL_TARGET=hw OCL_DEVICE=xilinx:adm-pcie-ku3:2ddr-xpr:4.0`. To use a custom platform, specify its path with the `OCL_PLATFORM` variable. For example, to generate a bitstream for a custom platform, run `make ocl OCL_TARGET=hw OCL_PLATFORM=<path_to_custom_platform_xpfm_file>`. Also remember to change the target device string in `host/typedefs.h`.
- To run simulation, first run `make emu_setup OCL_PLATFORM=<path_to_custom_platform_xpfm_file>` to create the `.json` file used by the Xilinx OpenCL runtime. Then set the `XCL_EMULATION_MODE` environment variable to `sw_emu` for software simulation or `hw_emu` for hardware simulation. More details can be found in the Xilinx SDx Command and Utility Reference Guide (UG1279).
- For instructions on how to run the applications, please refer to the READMEs in the benchmark folders.
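The make invocations in the steps above can be composed programmatically, for example when scripting builds across several benchmarks. The Python sketch below only assembles the command line and the emulation environment; it does not run anything. The helper names `ocl_make_command` and `emulation_environment` are hypothetical, and only the make variables and values named in the steps above are used.

```python
import os

# OCL_TARGET values named in the steps above.
OCL_TARGETS = ("sw_emu", "hw_emu", "hw")


def ocl_make_command(target, device=None, platform=None):
    """Return the argument list for a `make ocl` invocation."""
    if target not in OCL_TARGETS:
        raise ValueError(f"unknown OCL_TARGET: {target!r}")
    command = ["make", "ocl", f"OCL_TARGET={target}"]
    if device is not None:
        command.append(f"OCL_DEVICE={device}")
    if platform is not None:
        command.append(f"OCL_PLATFORM={platform}")
    return command


def emulation_environment(mode, base=None):
    """Copy an environment and set XCL_EMULATION_MODE for emulation runs."""
    if mode not in ("sw_emu", "hw_emu"):
        raise ValueError(f"not an emulation mode: {mode!r}")
    env = dict(os.environ if base is None else base)
    env["XCL_EMULATION_MODE"] = mode
    return env


# Example: the Alpha Data KU3 bitstream build from the steps above.
print(" ".join(ocl_make_command("hw", device="xilinx:adm-pcie-ku3:2ddr-xpr:4.0")))
```

The returned list can be passed to `subprocess.run(command, cwd=<benchmark folder>, check=True)`, and the environment from `emulation_environment` can be passed as `env=` when launching the host program in emulation.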
After finishing the required setup steps on AWS, follow the steps above with the following differences:
- Use the option `OCL_PLATFORM=$AWS_PLATFORM`. The environment variable `AWS_PLATFORM` specifies the location of the AWS platform file.
- In `host/typedefs.h`, set `TARGET_DEVICE = "xilinx:aws-vu9p-f1:4ddr-xpr-2pr:4.0"`.
- When running the application, choose the `.awsxclbin` bitstream file instead of `.xclbin`.
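The AWS differences listed above can be sketched in Python as follows. The helpers `aws_make_command` and `aws_bitstream` are hypothetical, as are the example platform path and the file name `kernel.xclbin`. The sketch also assumes that the `.awsxclbin` file shares its base name with the `.xclbin` file, which this README does not confirm.

```python
from pathlib import Path


def aws_make_command(environ, target="hw"):
    """Build the `make ocl` command using the AWS platform file.

    `environ` is a mapping such as os.environ; a KeyError means the
    AWS setup steps have not exported AWS_PLATFORM yet.
    """
    platform = environ["AWS_PLATFORM"]
    return ["make", "ocl", f"OCL_TARGET={target}", f"OCL_PLATFORM={platform}"]


def aws_bitstream(xclbin_path):
    """Return the .awsxclbin path matching a .xclbin path.

    Assumes both files share the same base name (an assumption).
    """
    return Path(xclbin_path).with_suffix(".awsxclbin")


# Placeholder values for illustration only.
example_env = {"AWS_PLATFORM": "/path/to/aws_platform.xpfm"}
print(" ".join(aws_make_command(example_env)))
print(aws_bitstream("kernel.xclbin"))
```

In a real session, pass `os.environ` instead of `example_env` so that the value exported by the AWS setup is used.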
- Go into any benchmark folder.
- Run `make sdsoc`.
- The target platform is currently hard-coded in the makefiles. All benchmarks currently target the ZC706 platform.
- Go into any benchmark folder.
- Run `make sw`.
Please refer to the README files in the corresponding application folder for instructions.
Our repo now supports the latest version of the AWS FPGA AMI (version 1.7.0). Please try it out. Bug reports are welcome.