
ME reads the wrong stub in Vivado simulation #188

Closed
meisonlikesicecream opened this issue Aug 6, 2021 · 10 comments
@meisonlikesicecream
Contributor

The remaining errors in the PRMEMC chain are caused by the ME. It seems the ME reads the wrong VMSME stubs when the previous event continues reading from the VMSME memories after the current event has started (thanks to rewind?). We don't see this issue in the HLS simulation, as it adds a few clock cycles between events for some reason.

I have attached a screenshot of the waveforms for one of the inconsistent events (event 27 in ME_L3PHIC19) to try and demonstrate this:
The event starts where the blue vertical marker is positioned.
The read address for the VMPROJ memories (red row) is correct, and two clock cycles later, marked by the yellow vertical marker, it reads the correct VMPROJ stub (magenta row).
In that same clock cycle (yellow vertical marker) the ME then uses this VMPROJ stub to determine the read address for the VMSME stub (yellow rows; I have split it into BX, BIN, and STUB NUMBER, i.e. which stub we read within a bin). This address is incorrect. The BX and BIN bits are correct, but the read should start with the 0th stub and continue up to the number of entries in that bin. In the events where it fails, STUB NUMBER is non-zero (7 in this case) and we end up reading the wrong ME stub. Instead of resetting STUB NUMBER to 0 when reading from a new bin, the ME keeps incrementing the stub number from the previous event, which had more than 7 stubs in that bin.
This then leads to the wrong VMSME stub being read, which in turn leads to a missing stub in the output memory.
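The failure mode described above can be sketched in plain C++ (an illustrative toy, not the firmware code; the function and parameter names are hypothetical):

```cpp
// Illustrative toy (not the firmware code): the within-bin stub counter,
// "STUB NUMBER", survives across events instead of being reset to 0 when a
// new bin is read, so the new event starts reading at a stale offset.
int firstStubNumberOfNewEvent(int stubsReadInPrevEvent, bool resetOnNewBin) {
    int stubNumber = 0;
    for (int i = 0; i < stubsReadInPrevEvent; ++i)
        ++stubNumber;           // previous event drains its VMSME bin
    if (resetOnNewBin)
        stubNumber = 0;         // correct behaviour: restart at the 0th stub
    return stubNumber;          // address of the first read in the new event
}
```

With 7 stubs read by the previous event, the correct first address for the new event is 0, but without the reset the stale value 7 is used, matching the waveform above.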

[Screenshot: waveforms for event 27 in ME_L3PHIC19]

There is an ugly bodge to avoid this issue by not allowing the ME to read VMSME stubs after clock cycle 106 (i.e. changing this line to `if (bufferNotEmpty && istep < kMaxProc - 2) {`: https://github.com/cms-L1TK/firmware-hls/blob/master/TrackletAlgorithm/MatchEngine.cc#L140), but then it would of course truncate some stubs further down the chain unless we change the test vectors. I don't know whether that bodge works only for our specific PRMEMC chain or in general.
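For reference, a minimal sketch of what that guard amounts to (using the `bufferNotEmpty`/`istep`/`kMaxProc` names from the linked MatchEngine.cc; the helper function itself is hypothetical):

```cpp
// Sketch of the bodge: suppress VMSME reads for the last two processing steps
// so a long event cannot spill reads into the next event. kMaxProc = 108 is
// assumed to be the number of processing iterations per event.
constexpr int kMaxProc = 108;

bool allowVMSMERead(bool bufferNotEmpty, int istep) {
    return bufferNotEmpty && istep < kMaxProc - 2;  // blocks reads from cycle 106 on
}
```

The cost, as noted above, is that any stub the buffer would have drained in those last two cycles is truncated.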

I have added the ME VHDL testbench to my repo; for debugging purposes, it can be run using the same instructions as for the PRMEMC in the README: https://github.com/meisonlikesicecream/firmware-hls/tree/me_vhdl_tb/IntegrationTests/ME

@aehart
Contributor

aehart commented Aug 6, 2021

Thanks for revisiting this @meisonlikesicecream. I at least had lost track of this issue.

I still wonder if this is related to the following warning that we get during C-synthesis of the ME:

WARNING: [SCHED 204-63] Unable to schedule the whole 3 cycles 'load' operation ('projectiondatatmp.data_.V', ../TrackletAlgorithm/MemoryTemplate.h:42->../TrackletAlgorithm/MatchEngine.cc:101->../TrackletAlgorithm/MatchEngine.cc:239) on array 'inputProjectionData_dataarray_data_V' within the first 0 cycles (II = 1).
Please consider 1) removing the 'rewind' option from the pipeline, or 2) increasing the target initiation interval of the pipeline.

which I think is related to this warning that we get during C/RTL cosimulation:

WARNING: [HLS 200-626] This design is unable to schedule all read ports in the first II cycle. The RTL testbench may treat the design as non-pipelined

If these warnings are relevant, maybe we will have to introduce some gap between events, similar to what HLS does in the cosimulation.
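For concreteness, the two remedies the SCHED warning suggests would look roughly like this on a placeholder loop (a sketch, not the actual MatchEngine code; only one pipeline pragma would be active at a time):

```cpp
// Placeholder event-processing loop illustrating the two suggested remedies.
void processEvent(const int inputProjections[108], int output[108]) {
    for (int istep = 0; istep < 108; ++istep) {
        // Current style (triggers the warning when combined with a 3-cycle load):
        //   #pragma HLS PIPELINE II=1 rewind
        // Remedy 1: drop 'rewind' so the pipeline flushes between events:
        #pragma HLS PIPELINE II=1
        // Remedy 2: keep 'rewind' but relax the initiation interval:
        //   #pragma HLS PIPELINE II=2 rewind
        output[istep] = inputProjections[istep];  // stand-in for the memory load
    }
}
```

Removing `rewind` would insert the flush cycles between events that the Vivado simulation currently lacks, while `II=2` would halve the throughput.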

@meisonlikesicecream
Contributor Author

> Thanks for revisiting this @meisonlikesicecream. I at least had lost track of this issue.
>
> I still wonder if this is related to the following warning that we get during C-synthesis of the ME:
>
> WARNING: [SCHED 204-63] Unable to schedule the whole 3 cycles 'load' operation ('projectiondatatmp.data_.V', ../TrackletAlgorithm/MemoryTemplate.h:42->../TrackletAlgorithm/MatchEngine.cc:101->../TrackletAlgorithm/MatchEngine.cc:239) on array 'inputProjectionData_dataarray_data_V' within the first 0 cycles (II = 1).
> Please consider 1) removing the 'rewind' option from the pipeline, or 2) increasing the target initiation interval of the pipeline.
>
> which I think is related to this warning that we get during C/RTL cosimulation:
>
> WARNING: [HLS 200-626] This design is unable to schedule all read ports in the first II cycle. The RTL testbench may treat the design as non-pipelined
>
> If these warnings are relevant, maybe we will have to introduce some gap between events, similar to what HLS does in the cosimulation.

Hmm maybe? Although I get the same warnings from the VMR and that one seems to do alright?

@aehart
Contributor

aehart commented Aug 9, 2021

> Hmm maybe? Although I get the same warnings from the VMR and that one seems to do alright?

Do we know that though? The ME seems to fail when one event uses all processing iterations and the following event produces output from the first (few?) inputs. Have we confirmed that the other modules with these warnings, which I think might be all of them, succeed in these situations?

@meisonlikesicecream
Contributor Author

meisonlikesicecream commented Aug 9, 2021

> > Hmm maybe? Although I get the same warnings from the VMR and that one seems to do alright?
>
> Do we know that though? The ME seems to fail when one event uses all processing iterations and the following event produces output from the first (few?) inputs. Have we confirmed that the other modules with these warnings, which I think might be all of them, succeed in these situations?

Good question. I just tested a single VMR_L2PHIC in Vivado as it uses all processing iterations for some events (unlike the one used for the IRVMR tb), and something is definitely wrong: the 108th stub in the output is always missing. Will look into it.

@meisonlikesicecream
Contributor Author

meisonlikesicecream commented Aug 11, 2021

> Good question. I just tested a single VMR_L2PHIC in Vivado as it uses all processing iterations for some events (unlike the one used for the IRVMR tb), and something is definitely wrong: the 108th stub in the output is always missing. Will look into it.

Update: Turns out the missing 108th stub was due to some bugs in the FileWriterFromRAM, which are unrelated to this issue. There are still inconsistencies for that specific VMR, though, which may or may not have something to do with those HLS warnings... I'll keep investigating.

@aperloff

Is there a reason why it takes 3 clock cycles to load a projection from memory? I thought that would take 1-2 clock cycles. Is this normal?

@meisonlikesicecream
Contributor Author

> Update: Turns out the missing 108th stub was due to some bugs in the FileWriterFromRAM, which are unrelated to this. Although there are still inconsistencies for that specific VMR which might or might not have to do something with those HLS warnings... I'll keep investigating.

I can't figure out what's wrong with the remaining VMR_L2PHIC inconsistencies... for some reason the VMR IP sends out the wrong write address for some output stubs. The actual output stub and nentries look good, as do the input stubs. It is more likely to happen when the VMR uses all 108 processing iterations, but it happens when we have fewer than 108 as well. It also seems to happen relatively close to either the start or the end of an event. Whether this is related to the incorrect read addresses in the ME, I don't know.

The weird 3-clock-cycle warning disappears if one removes the HLS RESOURCE pragma (or changes the latency from 2 to 1), but then the VMR fails cosim...
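For context, the kind of pragma being discussed looks roughly like this (a hedged sketch with illustrative names; the real pragma sits in the VMR source, and `RAM_2P_BRAM` is an assumption about the core in use):

```cpp
// Toy memory-read loop carrying a legacy Vivado HLS RESOURCE pragma.
// Changing 'latency=2' to 'latency=1' is the modification described above
// that silences the 3-cycle load warning but breaks cosimulation.
void readMemory(const int mem[8], int buf[8]) {
#pragma HLS RESOURCE variable=mem core=RAM_2P_BRAM latency=2
    for (int i = 0; i < 8; ++i) {
        buf[i] = mem[i];
    }
}
```

The pragma only affects how the tool schedules the memory access; the functional behavior of the loop is unchanged.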

@meisonlikesicecream
Contributor Author

meisonlikesicecream commented Oct 22, 2021

Okay, I think I have made some progress on this. It seems the issue might be related to resetting the variables that keep track of the read address (for the ME) and the write address (for the VMR) at the start of the main function. Currently, the reset is performed before the main loop, and Vivado simulation doesn't seem to like that? (I'm referring to https://github.com/cms-L1TK/firmware-hls/blob/master/TrackletAlgorithm/MatchEngine.h#L208 and https://github.com/cms-L1TK/firmware-hls/blob/master/TrackletAlgorithm/VMRouter.h#L653-L664.)

Maybe this is related to the passage in the HLS rewind documentation stating that the code segment before the loop:

> - Is considered as initialization.
> - Is executed only once in the pipeline.

I have no idea.

If one instead resets the read/write address variables inside the loop, it seems to work better; see:
https://github.com/cms-L1TK/firmware-hls/blob/vmr_vhdl_debug/TrackletAlgorithm/MatchEngine.h#L304-L306
https://github.com/cms-L1TK/firmware-hls/blob/vmr_vhdl_debug/TrackletAlgorithm/VMRouter.h#L662-L678
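The difference between the two reset placements can be mimicked in plain C++ (an illustrative toy, not the HLS code: with 'rewind' the segment before the pipelined loop runs only once, modeled here by resetting the address outside versus inside the per-event loop):

```cpp
// Toy model of the reset placement. With the reset outside the loop (as in
// the current firmware), only the first event starts from address 0; with the
// reset on the first iteration of each event, every event starts from 0.
int lastAddrAfterTwoEvents(int nStepsPerEvent, bool resetInsideLoop) {
    int writeAddr = 0;              // reset before the loop: happens only once
    for (int event = 0; event < 2; ++event) {
        for (int istep = 0; istep < nStepsPerEvent; ++istep) {
            if (resetInsideLoop && istep == 0)
                writeAddr = 0;      // reset on the first iteration of each event
            ++writeAddr;            // write one stub per step
        }
    }
    return writeAddr;
}
```

With 5 steps per event, the inside-the-loop reset leaves the address at 5 after the second event, while the outside-the-loop reset lets it run on to 10.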

The changes in the ME reduced the number of errors in the PRMEMC chain from 3 to 1. I don't know what the last error is, though (it's not a new one; it's one of the 3 errors that were already there). I will have to investigate that further.
The changes in the VMR seem to work as well; however, I had to sacrifice one loop iteration or it didn't pipeline properly. Maybe I can find some way of fixing this too, or the VMR will only be able to process 107 stubs...

@pwittich
Contributor

pwittich commented Apr 4, 2024

Is this issue still open, @meisonlikesicecream @aehart, or can we close it?

@meisonlikesicecream
Contributor Author

I guess this is irrelevant now since we're using the combined modules :-)

@pwittich pwittich closed this as completed Apr 4, 2024