docs: Document $macc #4316

Merged
merged 3 commits into from
Apr 8, 2024
44 changes: 43 additions & 1 deletion docs/source/yosys_internals/formats/cell_library.rst
@@ -619,6 +619,48 @@ Finite state machines

Add a brief description of the ``$fsm`` cell type.

Coarse arithmetics
~~~~~~~~~~~~~~~~~~~~~

The ``$macc`` cell type represents a generalized multiply and accumulate operation. The cell is purely combinational. It outputs the result of summing up a sequence of products and other injected summands.

.. code-block::

   Y = 0 +- a0factor1 * a0factor2 +- a1factor1 * a1factor2 +- ...
        + B[0] + B[1] + ...

The A port consists of concatenated pairs of multiplier inputs ("factors").
A zero length factor2 acts as a constant 1, turning factor1 into a simple summand.

In this pseudocode, ``u(foo)`` means an unsigned int that's foo bits long.

.. code-block::

   struct A {
      u(CONFIG.mul_info[0].factor1_len) a0factor1;
      u(CONFIG.mul_info[0].factor2_len) a0factor2;
      u(CONFIG.mul_info[1].factor1_len) a1factor1;
      u(CONFIG.mul_info[1].factor2_len) a1factor2;
      ...
   };
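
As a rough illustration of the layout above, the following Python sketch slices a packed ``A`` value into factor pairs. The function name ``split_a`` is hypothetical, and the sketch assumes that the first struct member occupies the least significant bits of ``A``; that bit order is an assumption, not something stated in this section.

```python
def split_a(a_value, factor_lens):
    """Slice the packed integer a_value into (factor1, factor2) pairs.

    factor_lens is a list of (factor1_len, factor2_len) tuples, one per
    pair of factors. Assumption: fields are packed starting at bit 0.
    """
    pairs = []
    cursor = 0  # bit offset of the next field inside A
    for len1, len2 in factor_lens:
        factor1 = (a_value >> cursor) & ((1 << len1) - 1)
        cursor += len1
        if len2 > 0:
            factor2 = (a_value >> cursor) & ((1 << len2) - 1)
        else:
            # A zero-length factor2 acts as a constant 1.
            factor2 = 1
        cursor += len2
        pairs.append((factor1, factor2))
    return pairs


# Pair 0: 3-bit factor1 = 5, 2-bit factor2 = 3. Pair 1: 4-bit factor1 = 9, no factor2.
packed = 5 | (3 << 3) | (9 << 5)
print(split_a(packed, [(3, 2), (4, 0)]))  # [(5, 3), (9, 1)]
```

The second pair shows how a zero-length factor2 turns factor1 into a plain summand.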

The cell's ``CONFIG`` parameter determines the layout of cell port ``A``.
The CONFIG parameter carries the following information:

.. code-block::

   struct CONFIG {
      u4 num_bits;
      struct mul_info {
         bool is_signed;
         bool is_subtract;
         u(num_bits) factor1_len;
         u(num_bits) factor2_len;
      }[num_ports];
   };
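
To make the variable-width fields more concrete, here is a hedged Python sketch that unpacks a ``CONFIG`` value. The name ``decode_config`` is hypothetical. The sketch assumes the fields are packed starting at bit 0 in the order the struct lists them; the ``num_bits`` clamp and the ``num_ports`` formula mirror the ``localparam`` definitions in ``simlib.v`` from this same change.

```python
def decode_config(config, config_width):
    """Unpack a $macc CONFIG value into (num_bits, list of per-port dicts).

    Assumption: fields are packed starting at bit 0, in struct order.
    """
    num_bits = config & 0xF
    if num_bits == 0:
        num_bits = 1  # simlib.v treats a zero num_bits as 1
    num_ports = (config_width - 4) // (2 + 2 * num_bits)
    mask = (1 << num_bits) - 1

    ports = []
    cursor = 4  # the first mul_info entry starts right after num_bits
    for _ in range(num_ports):
        ports.append({
            "is_signed": bool((config >> cursor) & 1),
            "is_subtract": bool((config >> (cursor + 1)) & 1),
            "factor1_len": (config >> (cursor + 2)) & mask,
            "factor2_len": (config >> (cursor + 2 + num_bits)) & mask,
        })
        cursor += 2 + 2 * num_bits
    return num_bits, ports


# Two ports with 3-bit length fields: 4 + 2 * (2 + 2 * 3) = 20 CONFIG bits.
port0 = 0 | (1 << 1) | (3 << 2) | (2 << 5)  # unsigned, subtract, lengths 3 and 2
port1 = 1 | (0 << 1) | (4 << 2) | (0 << 5)  # signed, add, lengths 4 and 0
cfg = 3 | (port0 << 4) | (port1 << 12)
print(decode_config(cfg, 20))
```

Because the width of each length field depends on ``num_bits``, the decoder has to read ``num_bits`` before it can locate anything else.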
@whitequark (Member), Apr 3, 2024

@widlarizer Thank you for documenting this!

@ everyone-else The semantics for this is completely insane, right? I'm not alone in wanting to see someone widlarize it? Why does it have dependently typed bitfields? The biggest factor[12]_len it can express is u16. Why not just use u16 at all times, so that you do not need dependently typed bitfields?

Collaborator Author

I consider it an extreme case of data-oriented design, probably to the point of premature optimization. The complexity probably makes it prone to bugs if, say, somebody wrote an HLS tool on top of Yosys that tries to generate $macc outside of Yosys itself. Special-casing single-bit arguments, special-casing zero-length second multiplication arguments instead of using constants: I'm interested in what motivated this.

Member

> The biggest factor[12]_len it can express is u16. Why not just use u16 at all times, so that you do not need dependently typed bitfields?

It's a bit worse: the maximum is u15. We could restrict the definition of $macc going forward and say that cells are only valid with num_bits set to 15.

It's more complex than it needs to be, but I am not convinced that alone justifies changing it right now. If we had other changes we would want to make in the cell's definition, then sure, let's go for $macc_v2 that's as simple as we can make it.
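
For reference, the limits mentioned here follow directly from the CONFIG pseudocode. The helper below is a hypothetical sketch (the name `macc_config_limits` is not from the Yosys code base) that derives them from the 4-bit width of `num_bits`.

```python
def macc_config_limits(num_bits_field_width=4):
    """Derive the extreme values implied by the CONFIG pseudocode."""
    # num_bits is a 4-bit field, so a length field is at most 15 bits wide.
    max_num_bits = (1 << num_bits_field_width) - 1
    # A 15-bit length field can describe factors of up to 2**15 - 1 bits.
    max_factor_len = (1 << max_num_bits) - 1
    # Each mul_info entry: two flag bits plus two length fields.
    max_bits_per_port = 2 + 2 * max_num_bits
    return max_num_bits, max_factor_len, max_bits_per_port


print(macc_config_limits())  # (15, 32767, 32)
```

Fixing num_bits at 15 would therefore make every mul_info entry a constant 32 bits wide.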

Member

I change my position. We can make this much simpler. We can express all of this through a sequence of products on signals, where those signals can be constant as need be. We can afford to make all of the operands the same width, though that may be going too far. I am in favor of defining macc_v2.

Collaborator Author

Are we externally bound to keep $macc unchanged? Replacing all uses of the current $macc with a simpler $macc seems easy.

Member

> Are we externally bound to keep macc unchanged?

I can't imagine anyone outside of YosysHQ is using this mess...

Member

I put "defining $macc_v2" as an agenda item for the next dev JF.


B is an array of concatenated 1-bit-wide unsigned integers to also be summed up.
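
Putting the pieces together, the following Python sketch is an informal reference model of the sum described in this section. The names ``macc_eval`` and ``to_signed`` are hypothetical. The sketch assumes that fields of ``A`` are packed starting at bit 0, that ``is_signed`` makes both factors of a pair two's complement values, and that the result is truncated to ``Y_WIDTH`` bits; none of these details are spelled out above, so treat them as assumptions.

```python
def to_signed(value, width):
    """Reinterpret a width-bit unsigned value as two's complement."""
    if width > 0 and (value >> (width - 1)) & 1:
        return value - (1 << width)
    return value


def macc_eval(a_value, b_value, b_width, ports, y_width):
    """Evaluate Y for ports given as (is_signed, is_subtract, len1, len2)."""
    total = 0
    cursor = 0  # bit offset of the next factor inside A
    for is_signed, is_subtract, len1, len2 in ports:
        factor1 = (a_value >> cursor) & ((1 << len1) - 1)
        cursor += len1
        factor2 = (a_value >> cursor) & ((1 << len2) - 1)
        cursor += len2
        if is_signed:
            factor1 = to_signed(factor1, len1)
            factor2 = to_signed(factor2, len2)
        # A zero-length factor2 acts as a constant 1.
        product = factor1 * factor2 if len2 > 0 else factor1
        total += -product if is_subtract else product
    # Every bit of B is a separate 1-bit unsigned summand.
    for i in range(b_width):
        total += (b_value >> i) & 1
    return total & ((1 << y_width) - 1)


# Y = 5 * 3 - 9 + B[0] + B[1] with B = 0b11
packed_a = 5 | (3 << 3) | (9 << 5)
ports = [(False, False, 3, 2), (False, True, 4, 0)]
print(macc_eval(packed_a, 0b11, 2, ports, 8))  # 8
```

The masking in the last line models a fixed-width output; a negative sum wraps around modulo ``2**Y_WIDTH``.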

Specify rules
~~~~~~~~~~~~~

@@ -1152,4 +1194,4 @@ file via ABC using the abc pass.

.. todo:: Add information about ``$lut`` and ``$sop`` cells.

-.. todo:: Add information about ``$alu``, ``$macc``, ``$fa``, and ``$lcu`` cells.
+.. todo:: Add information about ``$alu``, ``$fa``, and ``$lcu`` cells.
56 changes: 52 additions & 4 deletions techlibs/common/simlib.v
@@ -902,18 +902,34 @@ endgenerate
endmodule

// --------------------------------------------------------

// |---v---|---v---|---v---|---v---|---v---|---v---|---v---|---v---|---v---|---v---|
//-
//- $macc (A, B, Y)
//-
//- Multiply and accumulate.
//- A building block for summing any number of negated and unnegated signals
//- and arithmetic products of pairs of signals. Cell port A concatenates pairs
//- of signals to be multiplied together. When the second signal in a pair is zero
//- length, a constant 1 is used instead as the second factor. Cell port B
//- concatenates 1-bit-wide signals to also be summed, such as "carry in" in adders.
//- Typically created by the `alumacc` pass, which transforms $add and $mul
//- into $macc cells.
module \$macc (A, B, Y);

parameter A_WIDTH = 0;
parameter B_WIDTH = 0;
parameter Y_WIDTH = 0;
// CONFIG determines the layout of A, as explained below
parameter CONFIG = 4'b0000;
parameter CONFIG_WIDTH = 4;

-input [A_WIDTH-1:0] A;
-input [B_WIDTH-1:0] B;
-output reg [Y_WIDTH-1:0] Y;
// The term "port" has two different meanings in the description of this cell. To disambiguate:
// A cell port is for example the A input (it is constructed in C++ as cell->setPort(ID::A, ...))
// Multiplier ports are pairs of multiplier inputs ("factors").
// If the second signal in such a pair is zero length, no multiplication is necessary, and the first signal is just added to the sum.
input [A_WIDTH-1:0] A; // Cell port A is the concatenation of all multiplier ports
input [B_WIDTH-1:0] B; // Cell port B is the concatenation of single-bit unsigned signals to be also added to the sum
output reg [Y_WIDTH-1:0] Y; // Output sum

// Xilinx XSIM does not like $clog2() below..
function integer my_clog2;
@@ -929,10 +945,42 @@ function integer my_clog2;
end
endfunction

// Number of bits in each factor length field in CONFIG (one length field per factor in cell port A)
localparam integer num_bits = CONFIG[3:0] > 0 ? CONFIG[3:0] : 1;
// Number of multiplier ports
localparam integer num_ports = (CONFIG_WIDTH-4) / (2 + 2*num_bits);
// Minimum bit width of an induction variable to iterate over all bits of cell port A
localparam integer num_abits = my_clog2(A_WIDTH) > 0 ? my_clog2(A_WIDTH) : 1;

// In this pseudocode, u(foo) means an unsigned int that's foo bits long.
// The CONFIG parameter carries the following information:
// struct CONFIG {
// u4 num_bits;
// struct port_field {
// bool is_signed;
// bool is_subtract;
// u(num_bits) factor1_len;
// u(num_bits) factor2_len;
// }[num_ports];
// };

// The A cell port carries the following information:
// struct A {
// u(CONFIG.port_field[0].factor1_len) port0factor1;
// u(CONFIG.port_field[0].factor2_len) port0factor2;
// u(CONFIG.port_field[1].factor1_len) port1factor1;
// u(CONFIG.port_field[1].factor2_len) port1factor2;
// ...
// };
// and num_abits is my_clog2(A_WIDTH), or 1 if that would be 0.
// No factor1 may have a zero length.
// A factor2 having a zero length implies factor2 is replaced with a constant 1.

// Additionally, B is an array of 1-bit-wide unsigned integers to also be summed up.
// Finally, we have:
// Y = 0 +- port0factor1 * port0factor2 +- port1factor1 * port1factor2 +- ...
//     + B[0] + B[1] + ...

function [2*num_ports*num_abits-1:0] get_port_offsets;
input [CONFIG_WIDTH-1:0] cfg;
integer i, cursor;