-
Notifications
You must be signed in to change notification settings - Fork 24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ME reads the wrong stub in Vivado simulation #188
Comments
Thanks for revisiting this @meisonlikesicecream. I at least had lost track of this issue. I still wonder if this is related to the following warning that we get during C-synthesis of the ME:
which I think is related to this warning that we get during C/RTL cosimulation:
If these warning are relevant, maybe we will have to introduce some gap between events similar to what HLS does in the cosimulation. |
Hmm maybe? Although I get the same warnings from the VMR and that one seems to do alright? |
Do we know that though? The ME seems to fail when one event uses all processing iterations and the following event produces output from the first (few?) inputs. Have we confirmed that the other modules with these warnings, which I think might be all of them, succeed in these situations? |
Good question. I just tested a single VMR_L2PHIC in Vivado as it uses all processing iterations for some events (unlike the one used for the IRVMR tb), and something is definitely wrong: the 108th stub in the output is always missing. Will look into it. |
Update: Turns out the missing 108th stub were due to some bugs in the FileWriterFromRAM, which are unrelated to this. Although there are still inconsistencies for that specific VMR which might or might not have to do something with those HLS warnings... I'll keep investigating. |
Is there a reason why it takes 3 clock cycles to load a projection from memory? I thought that would take 1-2 clock cycles. Is this normal? |
I can't figure out what's wrong with the remaining VMR_L2PHIC inconsistencies... for some reason the VMR IP sends out the wrong write address for some output stubs. The actual output stub and nentries look good, so does the input stubs. It is more likely to happen when it processes 108 events, but it does happen when we have less than 108 as well. It also seems to happen relatively close to either the start of a new event or in the end. If this is somewhat related to the incorrect read addresses of the ME, I don't know. The weird 3 clock cycle warning disappears if one removes the HLS RESOURCE pragma (or change the latency from 2 to 1), but then the VMR fails cosim... |
Okay, I think I have made some progress regarding this. So it seems like the issue might be related to resetting the variables that keep track of the read address (for the ME) and write address (for the VMR) at the start of the main function. Currently, the reset is being performed before the main loop and Vivado simulation doesn't seem to like that? (I'm referring to line https://github.com/cms-L1TK/firmware-hls/blob/master/TrackletAlgorithm/MatchEngine.h#L208 and https://github.com/cms-L1TK/firmware-hls/blob/master/TrackletAlgorithm/VMRouter.h#L653-L664) Maybe this is related to the line in the HLS rewind documentation stating: If one instead resets the read/write address variables inside the loop it seems to work better... see: The changes in the ME reduced the number of errors in the PRMEMC chain from 3 to 1. I don't know what the last error is though (it's not a new one, it's one of the 3 errors that are already there). Will have to investigate that further. |
is this issue still open? @meisonlikesicecream @aehart or can we close it? |
I guess this is irrelevant now since we're using the combined modules :-) |
The remaining errors in the PRMEMC chain is caused by the ME. It seems like the ME reads the wrong VMSME stubs if the previous event continues reading from the VMSME memories after the current event has started (thanks to rewind?). We don’t see this issue in the HLS simulation as it adds a few clock cycles in between events for some reason.
I have attached a screenshot of the waveforms for one of the inconsistent events (event 27 in ME_L3PHIC19) to try and demonstrate this:
The event starts where the blue vertical marker is positioned.
The read address for the VMPROJ memories (red row) are correct and two clock cycles later, marked by the yellow vertical marker, it reads the correct VMPROJ stub (magenta row).
In that same clock cycle (yellow vertical marker) the ME then uses this VMPROJ stub to figure out the read address for the VMSME stub (yellow rows; I have split it up into BX, BIN, STUB NUMBER = which stub we read within a bin). This address is incorrect. The BX and BIN bits are correct, but it should start with the 0th stub and continue up to the number of entries within that bin. In the events when it fails, STUB NUMBER is non-zero (7 in this case) and we end up reading the wrong ME stub. Instead of resetting the STUB NUMBER to 0 when reading from a new bin, it increments the stub number from the previous event as it had more than 7 stubs in that bin.
This then leads to the wrong VMSME stub being read, which in turn leads to a missing stub in the output memory.
There is an ugly bodge to avoid this issue by not allowing the ME to read VMSME stubs after clock cycle 106 ( i.e. add
if (bufferNotEmpty && istep<kMaxProc-2) {
to this line: https://github.com/cms-L1TK/firmware-hls/blob/master/TrackletAlgorithm/MatchEngine.cc#L140), but then it would of course truncate some stubs further down the chain unless we change the testvectors. Don't know if that bodge only works for our specific PRMEMC chain or in general.I have added the ME VHDL testbench in my repo and it can be run using the same instructions as for the PRMEMC in the README for debugging purposes: https://github.com/meisonlikesicecream/firmware-hls/tree/me_vhdl_tb/IntegrationTests/ME
The text was updated successfully, but these errors were encountered: