SGen is a generator capable of producing efficient hardware designs for a variety of signal processing transforms. These designs operate on streaming data, meaning that the dataset is divided into several chunks that are processed during several cycles, thus allowing a reduced use of resources. The size of these chunks is called the streaming width. As an example, the figures below represent three discrete Fourier transforms on 8 elements, with a streaming width of 8 (no streaming), 4 and 2.
The generator outputs a Verilog file that can be used for FPGAs.
- A technical overview and an interface to download various generated designs is available here.
- Feel free to report any bug, issue, or suggestion to François Serre.
The easiest way to use SGen is by using SBT. The following commands, in a Windows or Linux console, will generate a streaming Walsh-Hadamard transform on 8 points:
git clone https://github.com/fserre/sgen.git
cd sgen
sbt "run -n 3 wht"
The following section describes the different parameters that can be used.
A SGen command line consists of a list parameters followed by the desired transform.
The following parameters can be used:
-n
n
Logarithm of the size of the transform. As an example,-n 3
means that the transform operates on 2^3=8 points. This parameter is required.-k
k
Logarithm of the streaming width of the implementation.-k 2
means that the resulting implementation will have 2^2=4 input ports and 4 output ports, and will perform a transform every 2^(n-k) cycles. In case where this parameter is not specified, the implementation is not folded, i.e. the implementation will have one input port and one output port for each data point, and will perform one transform per cycle.-r
r
Logarithm of the radix (for DFTs and WHTs). This parameter specifies the size of the base transform used in the algorithm. r must divide n, and, for compact designs (dftcompact
andwhtcompact
), be smaller or equal to k. It is ignored by tranforms not requiring it (permutations). If this parameter is not specified, the highest possible radix is used.-o
filename
Name of the output file.-benchmark
Adds a benchmark module in the generated design.-rtlgraph
Produces a DOT graph representing the design.-dualramcontrol
Uses dual-control for memory (read and write addresses are computed independently). This option yields designs that use more resources than with single RAM control (default), but that have more flexible timing constraints (see the description in generated files). This option is automatically enabled for compact designs.-singleported
Uses single-ported memory (read and write addresses are the same). This option has the same constraints as single RAM control (default), but may have a higher latency.-zip
Creates a zip file containing the design and its dependencies (e.g. FloPoCo modules).-hw
repr
Hardware arithmetic representation of the input data.repr
can be one of the following:char
Signed integer of 8 bits. Equivalent ofsigned 8
.short
Signed integer of 16 bits. Equivalent ofsigned 16
.int
Signed integer of 32 bits. Equivalent ofsigned 32
.long
Signed integer of 64 bits. Equivalent ofsigned 64
.uchar
Unsigned integer of 8 bits. Equivalent ofunsigned 8
.ushort
Unsigned integer of 16 bits. Equivalent ofunsigned 16
.uint
Unsigned integer of 32 bits. Equivalent ofunsigned 32
.ulong
Unsigned integer of 64 bits. Equivalent ofunsigned 64
.float
Simple precision floating point (32 bits). Equivalent ofieee754 8 23
.double
Double precision floating point (64 bits). Equivalent ofieee754 11 52
.half
Half precision floating point (16 bits). Equivalent ofieee754 5 10
.minifloat
Minifloat of 8 bits. Equivalent ofieee754 4 3
.bfloat16
bfloat16 floating point . Equivalent ofieee754 8 7
.unsigned
size
Unsigned integer of size bits.signed
size
Signed integer of size bits. Equivalent offixedpoint
size
0
.fixedpoint
integer fractional
Signed fixed-point representation with an integer part of integer bits and a fractional part of fractional bits.flopoco
wE wF
FloPoCo floating point representation with an exponent size of wE bits, and a mantissa of wF bits. The resulting design will depend on the corresponding FloPoCo generated arithmetic operators, which must be placed in theflopoco
folder. In the case where the corresponding vhdl file is not present, SGen provides the command line to generate it. Custom options (e.g.frequency
ortarget
) can be used.ieee754
wE wF
IEEE754 floating point representation with an exponent size of wE bits, and a mantissa of wF bits. Arithmetic operations are performed using FloPoCo. Note that, unless otherwise specified when generating FloPoCo operators, denormal numbers are flushed to zero.complex
repr
Cartesian complex number. Represented by the concatenation of its coordinates, each represented using repr arithmetic representation.
Supported transforms are the following:
Linear permutations can be implemented using the lp
command:
# generates a bit-reversal permutation on 32 points, streamed on 2^2=4 ports.
sbt "run -n 5 -k 2 lp bitrev"
# generates a streaming datapath that performs a bit-reversal permutation on 8 points on the first dataset, and a "half-reversal" on the second dataset on 2 ports
sbt "run -n 3 -k 1 lp bitrev 100110111"
The command lp
takes as parameter the invertible bit-matrix representing the linear permutation (see this publication for details) in row-major order. Alternatively, the matrix can be replaced by the keyword bitrev
or identity
.
Several bit-matrices can be listed (seperated by a space) to generate a datapath performing several permutations. In this case, the first permutation will be applied to the first dataset entering, the second one on the second dataset, ...
The resulting implementation supports full throughput, meaning that no delay is required between two datasets.
Fourier transforms (with full-throughput, i.e. without delay between datasets) can be generated using the dft
command:
# generates a Fourier transform on 16 points, streamed on 4 ports, with fixed-point complex datatype with a mantissa of 8 bits and an exponent of 8 bits.
sbt "run -n 4 -k 2 -hw complex fixedpoint 8 8 dft"
Fourier transforms (with an architecture that reuses several times the same hardware) can be generated using the dftcompact
command:
# generates a Fourier transform on 1024 points, streamed on 8 ports, with fixed-point complex datatype with a mantissa of 8 bits and an exponent of 8 bits.
sbt "run -n 10 -k 3 -hw complex fixedpoint 8 8 dftcompact"
In the case of a streaming design (n > k), memory modules may need to be used. In this case, SGen allows to choose the control strategy used for this module:
- Dual RAM control: read and write addresses are computed independently. This offers the highest flexibility (a dataset can be input at any time after the previous one), but this uses more resources. It is automatically used for compact designs (
dftcompact
), but can be enabled for other designs using the-dualRAMcontrol
parameter. - Single RAM control: write address is the same as the read address, delayed by a constant time. This uses less resources, but it has less flexibility: a dataset must be input either immediately after the previous one, or wait that the previous dataset is completely out. This is the default mode (except for compact designs).
- Single-ported RAM: write and read addresses are the same. This has the same constraints as Single RAM control, but may have a higher latency.