The Tridsolver-FPGA library provides high-throughput implementations of multi-dimensional tridiagonal system solvers on FPGAs. The library is based on the inexpensive Thomas algorithm, with batching of multiple systems, for solving small and medium-sized systems, and on hybrid Thomas-PCR and Thomas-Thomas algorithms for solving larger systems. The HLS techniques used to implement the library and the data path for 3D ADI applications can be found here. The library currently supports Xilinx and Intel FPGA devices and has been tested on Xilinx Alveo U280 and Alveo U50 cards and on the Intel PAC D5005. The library and performance results are currently under review for publication.
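As a rough illustration of the batched Thomas approach described above, here is a minimal plain C++ sketch. It is not the library's HLS code: the function name `thomas_batched`, the contiguous per-system storage layout, and the use of `double` are assumptions made for this example only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of the library): solves `batch` independent
// tridiagonal systems of size n with the Thomas algorithm.
// Assumed layout: system s occupies elements [s*n, (s+1)*n) of each array.
//   a: sub-diagonal (first entry of each system is ignored)
//   b: main diagonal
//   c: super-diagonal (last entry of each system is ignored)
//   d: right-hand side
// No pivoting is done, so the systems are assumed to be well conditioned
// (for example, diagonally dominant).
std::vector<double> thomas_batched(const std::vector<double>& a,
                                   const std::vector<double>& b,
                                   const std::vector<double>& c,
                                   const std::vector<double>& d,
                                   std::size_t n, std::size_t batch) {
    assert(n > 0);
    assert(a.size() == n * batch && b.size() == n * batch);
    assert(c.size() == n * batch && d.size() == n * batch);

    std::vector<double> x(n * batch);
    std::vector<double> cp(n), dp(n);  // modified coefficients, reused per system

    for (std::size_t s = 0; s < batch; ++s) {
        const std::size_t off = s * n;

        // Forward elimination: remove the sub-diagonal.
        cp[0] = c[off] / b[off];
        dp[0] = d[off] / b[off];
        for (std::size_t i = 1; i < n; ++i) {
            const double denom = b[off + i] - a[off + i] * cp[i - 1];
            cp[i] = c[off + i] / denom;
            dp[i] = (d[off + i] - a[off + i] * dp[i - 1]) / denom;
        }

        // Backward substitution.
        x[off + n - 1] = dp[n - 1];
        for (std::size_t i = n - 1; i-- > 0;) {
            x[off + i] = dp[i] - cp[i] * x[off + i + 1];
        }
    }
    return x;
}
```

Each system costs O(n) operations, and the systems in a batch are independent of one another, so they can be processed in any order.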
The library has been used to implement 2D and 3D heat diffusion applications using FP32 and FP64 arithmetic. The implementations support batched computation of systems. The library and applications are implemented in C++ for Vivado. The /FPGA/Xilinx directory contains the following variants of these applications, targeting Xilinx FPGAs.
ADI2D_F32 | 2D ADI application using FP32 |
ADI2D_F32 | 2D ADI application using FP64 |
ADI3D_F32 | 3D ADI application using FP32 |
ADI3D_F32 | 3D ADI application using FP64 |
ADI2D_TH_TH_F32 | 2D ADI application with Tiled Thomas-Thomas solver using FP32 |
ADI2D_THPCR_F32 | 2D ADI application with Tiled Thomas-PCR solver using FP32 |
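The Thomas-PCR variant listed above combines the Thomas algorithm with parallel cyclic reduction (PCR). The sketch below shows only a generic PCR solve for a single system in plain C++, not the tiled hybrid scheme used by the library; the function name `pcr_solve` and its argument layout are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of the library): solves one tridiagonal
// system with parallel cyclic reduction.
//   a: sub-diagonal (a[0] ignored), b: main diagonal,
//   c: super-diagonal (c[n-1] ignored), d: right-hand side.
// No pivoting is done; the system is assumed to be well conditioned.
std::vector<double> pcr_solve(std::vector<double> a, std::vector<double> b,
                              std::vector<double> c, std::vector<double> d) {
    const std::size_t n = b.size();
    assert(n > 0 && a.size() == n && c.size() == n && d.size() == n);
    a[0] = 0.0;
    c[n - 1] = 0.0;

    std::vector<double> na(n), nb(n), nc(n), nd(n);

    // At each step, equation i is combined with equations i-s and i+s so that
    // it couples x[i] only to x[i-2s] and x[i+2s]. Every i is independent
    // within a step, which is what makes the method parallel.
    for (std::size_t s = 1; s < n; s *= 2) {
        for (std::size_t i = 0; i < n; ++i) {
            const bool lo = i >= s;      // neighbour i-s exists
            const bool hi = i + s < n;   // neighbour i+s exists
            const double alpha = lo ? -a[i] / b[i - s] : 0.0;
            const double gamma = hi ? -c[i] / b[i + s] : 0.0;
            na[i] = lo ? alpha * a[i - s] : 0.0;
            nc[i] = hi ? gamma * c[i + s] : 0.0;
            nb[i] = b[i] + (lo ? alpha * c[i - s] : 0.0)
                         + (hi ? gamma * a[i + s] : 0.0);
            nd[i] = d[i] + (lo ? alpha * d[i - s] : 0.0)
                         + (hi ? gamma * d[i + s] : 0.0);
        }
        a.swap(na);
        b.swap(nb);
        c.swap(nc);
        d.swap(nd);
    }

    // All equations are now decoupled: b[i] * x[i] = d[i].
    std::vector<double> x(n);
    for (std::size_t i = 0; i < n; ++i) {
        x[i] = d[i] / b[i];
    }
    return x;
}
```

PCR takes about log2(n) steps, each performing O(n) independent updates, so it does more total work than the Thomas algorithm but exposes parallelism within a single system.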
The /FPGA/Intel directory contains the batched Thomas solver library, the data path library, and a 2D ADI application using FP32 arithmetic, targeting Intel FPGAs. DPC++ is used to implement the library and application.
A Makefile-based flow for implementing the FPGA applications is supported. Optionally, users can implement an application using the Vitis GUI to target Xilinx FPGAs. In that case, the user needs to point to the config file and set the number of kernels. Note that separate config files are provided for the U50 and U280 devices.
The steps for the Makefile-based flow for Xilinx FPGAs are as follows:
cd <application directory>
set the target config file (_u50.cfg or u280.cfg) in the Makefile
make build TARGET=<sw_emu/hw_emu/hw> PLATFORM=<FPGA platform>
make run TARGET=<sw_emu/hw_emu/hw> PLATFORM=<FPGA platform>
Please make sure the XRT setup.sh and Vitis settings64.sh scripts are sourced before using the Makefile commands, e.g.:
source /disk1/Xilinx/Vitis/2019.2/settings64.sh
source /opt/xilinx/xrt/setup.sh
Applications targeting Intel FPGAs can be compiled using the following make command. The target board is set to the Intel PAC D5005.
make report/run_emu/hw
This requires the Intel oneAPI toolkit as well as the FPGA add-on.
The performance of the Tridsolver-FPGA library on Xilinx FPGAs has been compared to the performance of the same applications on NVIDIA V100 GPUs (using the Tridsolver GPU library by László et al. and NVIDIA's cuSPARSE). The following results are for the 2D and 3D heat diffusion applications, implemented with the ADI technique, and a Stochastic Local Volatility (SLV) model application, implemented with a Hundsdorfer-Verwer (HV) method for time integration.
FP32, v = 8, fCU = 3, NCU = 2 | FP64, v = 8, fCU = 3, NCU = 2 |
FP32, v = 8, NCU = 4 | FP64, v = 8, NCU = 2 |
FP32, Thomas-Thomas solver, NCU = 4 | FP32, Thomas-PCR solver, NCU = 4 |
40x20 Mesh, v = 1, NCU = 2, FP64 | 100x50 Mesh, v = 1, NCU = 2, FP64 |
FP32, v = 8, fCU = 3, NCU = 3 | FP64, v = 8, fCU = 3, NCU = 3 |
FP32, v = 8, NCU = 6 | FP64, v = 8, NCU = 3 |
FP32, Thomas-Thomas solver, NCU = 4 | FP32, Thomas-PCR solver, NCU = 4 |
40x20 Mesh, v = 1, NCU = 3, FP64 | 100x50 Mesh, v = 1, NCU = 3, FP64 |