-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Issues with the AXI implementation #1071
Comments
@nathanielnrn @sampsyo @rachitnigam I updated this issue with all the bugs I found and explanations. Let me know if you have any questions, I have a pretty good grasp on what's going on now so I should be able to help explain stuff. |
Thank you for the incredible debugging work @andrewb1999!! This will be really useful in getting a push button FPGA flows for calyx stuff! |
This is amazing! I'm implementing changes to code generation now. Sorry if this question has an obvious answer, I looked around but couldn't find one. Thought it might save time asking here. Where exactly does the kernel.xml interface with verilog code? In other words, would changes need to be made to any port widths in verilog with respect to
|
@nathanielnrn The kernel.xml doesn't really "interface" with the verilog per-say, it is just a simpler description of the underlying verilog that XRT uses to know how to interact with the kernel. What's important here is that the data width of the RDATA and WDATA channels match the datawidth that we specify in the kernel.xml. So for example, |
Maybe one way of putting it is that the One thing that's still unclear to me is where the width 512 comes from in the generator code. This is where the data width seems to stem from: …and there it's set to 32. So this may require some debugging! |
That's actually the Control_axi instantiation, so 32 is the correct value there. The issue is with the memory module here: There is a 512 there, but there's also a 64 and a 32 (for address width and the data width of the argument). I think there's just a mismatch somewhere of what value needs to go into the kernel.xml. |
Aha, thanks for the correction, @andrewb1999!! That would explain it. |
I spent some time today getting the dot-product design working with AXI on real FPGAs. I just manually edited the verilog for now, but I'll document the issues I found so they can be fixed in the generator at some point. Here's the repo with the working design: https://github.com/andrewb1999/calyx-axi-ex.git
Actual Calyx issues (dot-product.sv)
It's a bad idea to initialize logic (especially resets) that are used as wires rather than registers. I just commented out the entire initial block in the verilog which prevented issues with std_regs not being reset properly (currently none of the logic defined in the main module is being used as a register so this is fine to do).
XML issues (kernel.xml)
The datawidth of the m axi ports need to be 512 because this is the width we actually use in hardware.
Wrapper (wrapper.v)
Starting from the top of wrapper.v here are all the bugs I found and changes I made.
to
Two things are going on here. First, I removed the timeout as this can trigger ap_done being asserted before the data is ready. This could potentially be added back after verifying the rest of the design, but it just produces confusing behavior where the kernel thinks it finished when it actually didn't. Second, because of the previous change to memories_sent, it is now a single bit register so we don't have to do any comparison before setting it equal to ap_done. The previous implementation was also wrong even before changing the memories_sent semantics because ap_done wasn't asserted when all memories were sent, but rather when only A0 was sent (if any other memory was finished sending before A0, it would never assert ap_done).
4. Actually assigned the Done signal properly in SINGLE_PORT_BRAM (it was unconnected before).
5. Added an ap_idle signal to the control s axi lite. This signal is stored in rdata[2] and initialized to 1, becomes 0 when ap_start is asserted, and becomes 1 when ap_done is asserted. This allows the design to work properly with PYNQ and other implementations that rely on the ap_idle signal.
6. Changed all lines in the memory controllers that look like:
or
to
or
The parenthesis are required because verilog does the addition before the left shift to it previously would give the wrong ARADDR or AWADDR if the BASE_ADDRESS was anything other than 0.
7. Changed all assigns to WDATA from:
to
This isn't strictly necessary but it prevents writing multiple copies of the output data to the buffer if some of the other signals are wrong.
8. Changed all assigns to WSTRB from:
to
WSTRB is an axi signal where each bit corresponds to a single byte of the full axi data width. We are using a 512 bit data width on our axi signals, so we have a 64 bit WSTRB value (512/8). We are only sending one uint32 per axi transfer so only the bottom 4 bytes are valid in the axi transfer (therefore we only want to set the bottom four bits of the WSTRB to 1).
9. In the v0 memory controller (Memory_controller_axi_2), I changed:
and
to
and
This is the most important change and the primary reason we saw no outputs before. Because of how the module increments the bram address and the buffer address, the previous version would write the correct data on the first axi transfer and then write 0 to the same buffer location 31 times (which would obviously overwrite the value). Since this buffer only contains a single uint32, we only want to write values once anyways. Obviously, there is more work that needs to be done here to ensure that this works for all potential buffer sizes.
We should really also change these values from 32 to 8 in the other two memory controllers, although because of how the buffers are allocated and the memory controllers work, it still works with these values set to 32.
Other comment:
At some point we should update the generator to use a minimal version of bursting over AXI. Essentially all this does is pack multiple elements into one 512 bit data width, which reduces the number of axi transfers required significantly with minimal effort. I have an example for read bursting in the
opt
branch of thecalyx-axi-ex
repo, but obviously if we get to this point I can help out more.The text was updated successfully, but these errors were encountered: