Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Infinite RAW stall in floating-point code #304

Closed
jcarletta opened this issue Apr 5, 2020 · 21 comments · Fixed by #649
Closed

Infinite RAW stall in floating-point code #304

jcarletta opened this issue Apr 5, 2020 · 21 comments · Fixed by #649

Comments

@jcarletta
Copy link

The attached code exposes an apparent bug in EduMIPS; I tested in v1.2.6 (standalone) and v1.2.5. For this code, floating-point instructions reach the MEM stage at the same time that an integer instruction does. The integer instruction's MEM and WB stages are apparently never accounted for, and a consequent integer instruction infinitely RAW stalls.

My students found this bug in working on a project. (EduMIPS has been a great learning tool overall for my class.) I've pared out some code that wasn't need to expose the bug, and
added comments about what I see in the cycle window and register values that is not accurate.

Infinite RAW stall code.txt

@lupino3
Copy link
Member

lupino3 commented Apr 5, 2020

Hi and thanks for the bug report and for the easy-to-reproduce case.

I tried running the attached code and verified that indeed this seems to be a bug (actually, it seems like there are several bugs here). It'll take me some time to fix them :) I hope to have some time next week-end.

Thanks a lot and to your students for contributing to raising the quality of our simulator!

@lupino3 lupino3 self-assigned this Apr 5, 2020
@lupino3 lupino3 added this to the 1.2.7 milestone Apr 5, 2020
@lupino3
Copy link
Member

lupino3 commented Apr 5, 2020

I have one question to you: could you please allow me to use your code as a unit test for EduMIPS64, releasing it as GPL code? That will allow me to use it as a regression test once I fix the bug(s). Thanks!

@jcarletta
Copy link
Author

jcarletta commented Apr 5, 2020 via email

@lupino3
Copy link
Member

lupino3 commented Apr 5, 2020

Thank you very much! I found and fixed the bug where MEM/WB for one instruction were not executed (4e65f11), just need to add a regression test for it.

Unfortunately the bug where MEM/WB are not visible in the Cycle UI is harder to fix and will require a couple of hours of work, I hope to be able to do it next weekend and push out a new release.

I hope you can live with the temporary workaround that you found for the time being. Thanks again for the bug reports!

@jcarletta
Copy link
Author

jcarletta commented Apr 6, 2020 via email

@lupino3
Copy link
Member

lupino3 commented Apr 11, 2020

Note to self on things to do:

  • unit test for the infinite RAW stall bug (i.e., EX being overridden on RAW stalls generated by FPU)
  • unit tests for wrong terminating CycleElement state for some FP instructions
  • fix bug with wrong terminating CycleElement state (caused by short look-back in the state progression algorithm in CycleBuilder -- should be fixed by keeping in memory the exact working set of instructions)
  • see if it's possible to do CycleBuilder checks on wrong terminating element at run-time, and emit a SEVERE log statement
  • see if it's possible to have SEVERE log statements fail unit tests.

@lupino3
Copy link
Member

lupino3 commented Apr 11, 2020

Sorry for the delay, busy week. I hope to have some time this weekend or next week!

@lupino3
Copy link
Member

lupino3 commented Apr 11, 2020

@jcarletta the standalone JAR from the latest commit to master contains the fix for the infinite RAW stall: https://github.com/EduMIPS64/edumips64/actions/runs/76054445

Once I manage to fix the second bug as well I'll release a new version (1.2.7) with both fixes.

lupino3 added a commit that referenced this issue Apr 11, 2020
This fixes #304, but may not work in all cases.
@lupino3
Copy link
Member

lupino3 commented Apr 11, 2020

@jcarletta just released v1.2.7, which should contain a fix for both issues: https://github.com/EduMIPS64/edumips64/releases/tag/v1.2.7. Thanks a lot for the detailed bug report!

@hugmanrique
Copy link

hugmanrique commented Nov 12, 2021

It seems that this bug is appearing in some other place. The following program causes EduMIPS to stall endlessly due to a RAW dependency on F8:

 .data
v: .space 256
w: .space 256
x: .double 2.0
y: .double 1.5
z: .double 0.0
.text
daddi R1, R0, v
daddi R2, R0, w
daddi R3, R0, 256

ldc1 F2, x(R0)
ldc1 F4, y(R0)
daddi R4, R0, 0

loop1:
dmtc1 R4, F6
daddi R4, R4, 8
cvt.d.l F6, F6
mul.d F8, F2, F6
mul.d F10, F4, F6
dmtc1 R4, F12
daddi R4, R4, 8
cvt.d.l F12, F12
mul.d F14, F2, F12
daddi R1, R1, 16
mul.d F8, F8, F8
daddi R2, R2, 16

sdc1 F10, -16(R2)
mul.d F14, F14, F14
mul.d F16, F4, F12
sdc1 F8, -16(R1) ; here 
sdc1 F14, -8(R1)
sdc1 F16, -8(R2)
bne R4, R3, loop1
syscall 0

Weirdly, the daddi R1, R1, 16 instruction displays no memory or write-back stages, thus the value in R1 remains 0, which causes the simulator to try to access a negative memory address. Nonetheless, is it documented behavior that an invalid memory access causes an infinite stall?

I've tried running the program with release 1.2.9 and the latest release on the CI (1.2.9-alpha-bbb71d8). The code is part of a class assignment, so I cannot release it under an open source license for regression tests atm.

@lupino3
Copy link
Member

lupino3 commented Nov 17, 2021

Thanks for the report @hugmanrique, I'll have a look at this as soon as possible!

@lupino3
Copy link
Member

lupino3 commented Nov 19, 2021

I tried running your code and indeed I can see the stall when I enable forwarding.

@lupino3
Copy link
Member

lupino3 commented Nov 19, 2021

The RAW exception is caused by the lack of write-back execution for the DADDI instruction. That leaves the instruction "locked" (the internal write semaphore is cleaned in WB) and causes SDC1 to throw the RAW exception. Now to understand why the WB stage doesn't execute for DADDI...

@lupino3
Copy link
Member

lupino3 commented Nov 19, 2021

image

`mul.d` at address `0024` enters `MEM` instead of `DADDI` and overrides it at cycle 19.

@lupino3
Copy link
Member

lupino3 commented Nov 21, 2021

This is what should happen: a structural stall.
image

@lupino3
Copy link
Member

lupino3 commented Feb 22, 2022

I am going to reopen this issue to keep track of the problem.

@lupino3 lupino3 reopened this Feb 22, 2022
@jcarletta
Copy link
Author

I appreciate this! My class is using EduMIPs again this year; it's very helpful to them.

@lupino3 lupino3 modified the milestones: 1.2.7, 1.2.10 Mar 2, 2022
@lupino3
Copy link
Member

lupino3 commented Mar 2, 2022

I found the issue and should have fixed it. @hugmanrique, would you be ok now with me using your sample code as a unit test?

lupino3 added a commit that referenced this issue Mar 2, 2022
Similar to #305 (which originally fixed #304), add a check on whether
EX is empty before inserting a BUBBLE instruction.

Also add some more verbose logging to help reason about the flow
without having to use a debugger.
@lupino3
Copy link
Member

lupino3 commented Mar 2, 2022

I'd love to add the test into #649 before merging, if possible @hugmanrique.

@hugmanrique
Copy link

hugmanrique commented Mar 4, 2022

Sure, go ahead. I can now release the code to the public domain.

Thank you for the fix!

lupino3 added a commit that referenced this issue Mar 5, 2022
This test would have exposed the second reoccurrence of issue #304.

Thanks to @hugmanrique for this code.
@lupino3
Copy link
Member

lupino3 commented Mar 5, 2022

Added, thanks @hugmanrique !

lupino3 added a commit that referenced this issue Mar 5, 2022
Fixes #304.

* fix: do not wipe EX on FP-induced WAW stalls

Similar to #305 (which originally fixed #304), add a check on whether
EX is empty before inserting a BUBBLE instruction.

Also add some more verbose logging to help reason about the flow
without having to use a debugger.

* test: add a regression test for issue 304

This test would have exposed the second reoccurrence of issue #304.

Thanks to @hugmanrique for this code.

* fix: fix merge conflict resolution error
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants