Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mult_pipe not inferred correctly pipelined for DSP inference #1175

Open
rachitnigam opened this issue Sep 14, 2022 · 8 comments
Open

mult_pipe not inferred correctly pipelined for DSP inference #1175

rachitnigam opened this issue Sep 14, 2022 · 8 comments
Labels
AMC Needed for Andrew's memory compiler C: FPGA Changes for the FPGA backend Calyx 2.0 Things that move us towards Calyx 2.0 S: Needs Triage Issue needs some thinking

Comments

@rachitnigam
Copy link
Contributor

When running resource estimation, the main_drc_routed.rpt file provides the following warning:

DPOP-3#1 Warning
PREG Output pipelining
DSP mult_pipe0/comp/out_tmp0__0 output mult_pipe0/comp/out_tmp0__0/P[47:0] is not pipelined (PREG=0). Pipelining the DSP48 output will improve performance and often saves power so it is suggested whenever possible to fully pipeline this function. If this DSP48 function was inferred, it is suggested to describe an additional register stage after this function. If the DSP48 was instantiated in the design, it is suggested to set the PREG attribute to 1.
Related violations:

My guess is that registering the final output causes the implementation to be quite the same pipeline as a DSP48 resulting in the warning. A possible solution is adding another register in the pipeline so that the first three registers in the mult_pipe implementation are fully pipelined while the last one is responsible for sustaining the output after the done signal is set to high.

@rachitnigam rachitnigam added S: Async discussion needed C: FPGA Changes for the FPGA backend labels Sep 14, 2022
@sampsyo
Copy link
Contributor

sampsyo commented Sep 14, 2022

Interesting! Yeah, certainly worth seeing if tacking on that additional register just solves it. We could also consider an explicit (Xilinx-specific) instantiation, as the warning messages suggests.

@rachitnigam
Copy link
Contributor Author

Might be interested to @andrewb1999 too

@andrewb1999
Copy link
Collaborator

Yeah I'm planning on looking at this eventually, should be a straight forward fix.

@rachitnigam
Copy link
Contributor Author

Weirdly, even a Filament generated design seems to have this problem? Would be useful to figure out what behavioral RTL would generate the correct code or if we need to use something like Vivado's IP generator in the backend to make it work

@andrewb1999
Copy link
Collaborator

Out of curiosity, I checked what Vitis HLS does. Here is an example multiplier generated by HLS that infers a DSP:

module mv_mul_32s_32s_32_5_1(clk, ce, reset, din0, din1, dout);
parameter ID = 1;
parameter NUM_STAGE = 4;
parameter din0_WIDTH = 14;
parameter din1_WIDTH = 12;
parameter dout_WIDTH = 26;
input clk;
input ce;
input reset;
input[din0_WIDTH - 1 : 0] din0; 
input[din1_WIDTH - 1 : 0] din1; 
output[dout_WIDTH - 1 : 0] dout;

reg signed [din0_WIDTH - 1 : 0] a_reg0;
reg signed [din1_WIDTH - 1 : 0] b_reg0;
wire signed [dout_WIDTH - 1 : 0] tmp_product;
reg signed [dout_WIDTH - 1 : 0] buff0;
reg signed [dout_WIDTH - 1 : 0] buff1;
reg signed [dout_WIDTH - 1 : 0] buff2;

assign dout = buff2;
assign tmp_product = a_reg0 * b_reg0;
always @ (posedge clk) begin
    if (ce) begin
        a_reg0 <= din0;
        b_reg0 <= din1;
        buff0 <= tmp_product;
        buff1 <= buff0;
        buff2 <= buff1;
    end
end
endmodule

Surprisingly enough it took setting a pretty high frequency to get the tool to actually generate that. With a 100MHz target the tool only generates a single cycle multiplier (that still maps to a DSP), or even a combinational multiplier on a large ultrascale+ device with a low frequency target. The multiplier here hits nearly 500MHz on an ultrascale+ device.

@rachitnigam
Copy link
Contributor Author

Wow, that’s crazy. This would indicate we should switch our mult implementation to be 4 cycles.

Also, @andrewb1999 and I discussed the possibility to using the mult latency insensitive interface as a “virtual operation” and backing it up with different physical implementations depending on target device (in the style of #1151).

1 similar comment
@rachitnigam

This comment was marked as duplicate.

@sampsyo
Copy link
Contributor

sampsyo commented Nov 14, 2022

Nice! Indeed, some manner of abstraction over different implementations here would be super duper cool. But as @rachitnigam has said in the past, this is much easier to imagine if we stay in the realm of sequential components and don't try to abstract over the combinational/sequential divide.

@rachitnigam rachitnigam added S: Discussion needed Issues blocked on discussion and removed S: Async discussion needed labels Nov 15, 2022
@rachitnigam rachitnigam added S: Needs Triage Issue needs some thinking and removed S: Discussion needed Issues blocked on discussion labels Jan 1, 2023
@rachitnigam rachitnigam added the AMC Needed for Andrew's memory compiler label Feb 2, 2023
@rachitnigam rachitnigam added this to the Quality of Results milestone Mar 6, 2023
@rachitnigam rachitnigam added the Calyx 2.0 Things that move us towards Calyx 2.0 label Apr 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
AMC Needed for Andrew's memory compiler C: FPGA Changes for the FPGA backend Calyx 2.0 Things that move us towards Calyx 2.0 S: Needs Triage Issue needs some thinking
Projects
None yet
Development

No branches or pull requests

3 participants