Will VexiiRiscv be extended to support configurable multi-issue? #15

Open
franktaTian opened this issue May 27, 2024 · 7 comments

@franktaTian

Hi,
Will VexiiRiscv be extended to support configurable multi-issue? For example, 4-issue, not just 1 or 2.

@Dolu1990
Member

Hi,

Technically speaking, I think it "already" supports it; it's just that the ParamSimple class (the thing which provides an easy way to configure the CPU) doesn't support more than 2. That is the only place where anything related to this is hardcoded.
Note that I have never tested anything with more than 2 issue lanes.
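To illustrate the idea, here is a minimal hedged sketch (all names are illustrative, not VexiiRiscv's actual API) of what a ParamSimple-style configuration could look like if the issue width were a plain integer instead of hardcoded one- and two-lane cases:

```scala
// Hypothetical sketch: a config class exposing the issue width as a parameter.
// SimpleParam is NOT a real VexiiRiscv class; it only shows the shape of the idea.
case class SimpleParam(lanes: Int = 2, decoders: Int = 2) {
  require(lanes >= 1, "need at least one execute lane")
  require(decoders >= lanes, "decoders must be able to feed every lane")
}

object SimpleParam extends App {
  val quad = SimpleParam(lanes = 4, decoders = 4)
  println(s"${quad.lanes}-issue, ${quad.decoders} decoders")
}
```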

@franktaTian
Author

Hi,

Technically speaking, I think it "already" supports it; it's just that the ParamSimple class (the thing which provides an easy way to configure the CPU) doesn't support more than 2. That is the only place where anything related to this is hardcoded. Note that I have never tested anything with more than 2 issue lanes.

Cool!

@Jzjerry

Jzjerry commented Jun 24, 2024

I tried to get a 4-issue Vexii simply by copying and adding if(lanes >= 3) and if(lanes >= 4) blocks in Param.scala, just like:

if(lanes >= 2) {
  val lane1 = newExecuteLanePlugin("lane1")
  val early1 = new LaneLayer("early1", lane1, priority = 10)
  plugins += lane1
  plugins += new SrcPlugin(early1, executeAt = 0, relaxedRs = relaxedSrc)
  plugins += new IntAluPlugin(early1, formatAt = 0)
  plugins += shifter(early1, formatAt = relaxedShift.toInt)
  plugins += new IntFormatPlugin(lane1)
  plugins += new BranchPlugin(early1, aluAt = 0, jumpAt = relaxedBranch.toInt, wbAt = 0)
  if(withRvZb) plugins ++= ZbPlugin.make(early1, formatAt = 0)
  if(withLateAlu) {
    val late1 = new LaneLayer("late1", lane1, priority = -3)
    plugins += new SrcPlugin(late1, executeAt = lateAluAt, relaxedRs = relaxedSrc)
    plugins += new IntAluPlugin(late1, aluAt = lateAluAt, formatAt = lateAluAt)
    plugins += shifter(late1, shiftAt = lateAluAt, formatAt = lateAluAt)
    plugins += new BranchPlugin(late1, aluAt = lateAluAt, jumpAt = lateAluAt/*+relaxedBranch.toInt*/, wbAt = lateAluAt, withJalr = false)
    if(withRvZb) plugins ++= ZbPlugin.make(late1, executeAt = lateAluAt, formatAt = lateAluAt)
  }
  // if (withMul) {
  //   plugins += new MulPlugin(early1)
  // }
  plugins += new WriteBackPlugin(lane1, IntRegFile, writeAt = withLateAlu.mux(lateAluAt, intWritebackAt), allowBypassFrom = allowBypassFrom)
}

and also increased the number of decoders to 4. There were no problems with generation or simulation.
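Rather than duplicating an if(lanes >= n) block per lane, the per-lane setup could in principle be driven by a loop over the lane index. The snippet below is a self-contained toy with stubbed plugin types (the class names are placeholders, not VexiiRiscv's real classes), just to show the shape of such a loop:

```scala
import scala.collection.mutable.ArrayBuffer

// Stub types standing in for the real plugin classes (illustrative only).
trait Plugin { def name: String }
case class ExecuteLane(name: String) extends Plugin
case class LaneLayer(name: String, lane: ExecuteLane, priority: Int) extends Plugin

object MultiLaneSetup extends App {
  val lanes = 4
  val plugins = ArrayBuffer[Plugin]()
  // lane0 is set up separately in Param.scala; loop over the remaining lanes
  for (i <- 1 until lanes) {
    val lane  = ExecuteLane(s"lane$i")
    val early = LaneLayer(s"early$i", lane, priority = 10)
    plugins += lane
    plugins += early
    // ... SrcPlugin, IntAluPlugin, shifter, BranchPlugin, WriteBackPlugin per lane
  }
  println(plugins.map(_.name).mkString(", "))
}
```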

I benchmarked 2-, 3-, and 4-issue RV32IMC on Dhrystone and CoreMark:

  • 2-issue: 16149 Dhrystones/Second, 0.76 DMIPS/MHz, 1.53 CoreMark/MHz.
  • 3-issue: 16619 Dhrystones/Second, 0.78 DMIPS/MHz, 1.54 CoreMark/MHz.
  • 4-issue: 16711 Dhrystones/Second, 0.79 DMIPS/MHz, 1.55 CoreMark/MHz.

The performance difference will be larger if you toggle more performance options like the late ALU.
Anyway, I believe there is no big problem with multi-issue; you can modify Param.scala to get even more lanes XD
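As a sanity check of the Dhrystone numbers above, the only assumption needed is the standard VAX 11/780 baseline of 1757 Dhrystones/second = 1 DMIPS; the reported DMIPS/MHz figures then imply a clock of roughly 12 MHz (presumably the simulation clock):

```scala
// Convert the reported Dhrystones/second into DMIPS using the VAX baseline.
object DhrystoneCheck extends App {
  val vaxBaseline = 1757.0                         // Dhrystones/s per DMIPS
  val scores = Seq(16149.0, 16619.0, 16711.0)      // 2-, 3-, 4-issue
  scores.foreach { d =>
    val dmips = d / vaxBaseline
    // e.g. 16149 / 1757 ≈ 9.19 DMIPS; 9.19 / 0.76 DMIPS/MHz ≈ 12 MHz
    println(f"$dmips%.2f DMIPS")
  }
}
```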

@Dolu1990
Member

and also increased the number of decoders to 4. There were no problems with generation or simulation.

LOL
Nice :)

2-issue: 16149 Dhrystones/Second, 0.76 DMIPS/MHz, 1.53 CoreMark/MHz.

Hmm, that is weird; the performance is well below what it should be.

Did you enable the branch predictors as well? Did you have caches? One thing is that by default, most performance-oriented features are disabled.

The one place where I can see so many lanes scaling is for AES (for instance) and well-optimized code, as GCC will likely generate coupled code which does not take advantage of in-order execution across all those lanes.

@Jzjerry

Jzjerry commented Jun 24, 2024

Hmm, that is weird; the performance is well below what it should be.

Did you enable the branch predictors as well? Did you have caches? One thing is that by default, most performance-oriented features are disabled.

I didn't enable anything beyond the defaults LOL. If those performance features are enabled, we get a larger gap between 2-issue and 4-issue, like 4.16 CoreMark/MHz vs. 4.38 CoreMark/MHz (tested with late-alu, lsu-l1, fetch-l1 and the predictors).

@Dolu1990
Member

There are a few more:
withDispatcherBuffer = true // may make a big difference
withAlignerBuffer = true // will not make a big difference

lsu-l1, fetch-l1

Did you increase the number of ways to at least 4?

@Jzjerry

Jzjerry commented Jun 24, 2024

There are a few more:
withDispatcherBuffer = true // may make a big difference
withAlignerBuffer = true // will not make a big difference

lsu-l1, fetch-l1

Did you increase the number of ways to at least 4?

Yeah, they do amplify the advantage of multi-issue!
Got 4.32 CoreMark/MHz vs. 4.85 CoreMark/MHz after adding the dispatcher buffer, and 4.51 vs. 5.04 after enabling all of them and increasing the L1 to 4 ways.
Looks like the dispatcher buffer matters more 🤔.
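Computing the relative 4-issue gain over 2-issue for each configuration reported in this thread makes the point concrete (labels below are shorthand for the configurations described above):

```scala
// Relative 4-issue gain over 2-issue for each reported configuration.
object IssueGain extends App {
  val results = Seq(                             // (2-issue, 4-issue) CoreMark/MHz
    "predictors + L1 + late-alu" -> (4.16, 4.38),
    "+ dispatcher buffer"        -> (4.32, 4.85),
    "+ aligner buffer, 4-way L1" -> (4.51, 5.04))
  results.foreach { case (name, (two, four)) =>
    println(f"$name: ${(four / two - 1) * 100}%.1f%% gain")
  }
}
```

The gain jumps from about 5% to about 12% once the dispatcher buffer is on, which matches the observation that it matters most here.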
