Introduce preliminary macro operation fusion #132

qwe661234 · 2023-05-22T06:21:36Z

Through our observations, we have identified certain recurring patterns in instruction sequences. By converting these specific RISC-V instruction patterns into faster, equivalent code, we can improve execution efficiency (up to +6.8% on the benchmarks listed below).

In our current analysis, we focus on a commonly used benchmark and have found the following frequently occurring instruction patterns: auipc + addi, auipc + add, multiple sw, and multiple lw.

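| Metric | commit `fba5802` | macro fuse operation | Speedup |
|-----------|---------------------------|---------------------------|--------|
| CoreMark | 1351.065 (Iterations/Sec) | 1352.843 (Iterations/Sec) | +0.13% |
| dhrystone | 1073 DMIPS | 1146 DMIPS | +6.8% |
| nqueens | 8295 msec | 7824 msec | +6.0% |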
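To make the idea concrete, here is a minimal sketch of what fusing auipc + addi could look like in an interpreter. The `cpu_t` type and the `exec_*` helpers are hypothetical names for illustration only and are not the rv32emu data structures; the sketch also assumes the addi reads the register written by the auipc and that neither destination is x0.

```c
#include <stdint.h>

/* Hypothetical, simplified CPU state; not the rv32emu data structures. */
typedef struct {
    uint32_t pc;
    uint32_t x[32];
} cpu_t;

/* auipc rd, imm20: rd = pc + (imm20 << 12) */
static void exec_auipc(cpu_t *cpu, int rd, uint32_t imm20)
{
    cpu->x[rd] = cpu->pc + (imm20 << 12);
    cpu->pc += 4;
}

/* addi rd, rs1, imm12: rd = rs1 + sign-extended imm12 */
static void exec_addi(cpu_t *cpu, int rd, int rs1, int32_t imm12)
{
    cpu->x[rd] = cpu->x[rs1] + (uint32_t) imm12;
    cpu->pc += 4;
}

/* Fused form: a single dispatch performs the work of both instructions.
 * Assumption: the addi's rs1 is the auipc's rd (rd_auipc).
 */
static void exec_fused_auipc_addi(cpu_t *cpu,
                                  int rd_auipc,
                                  uint32_t imm20,
                                  int rd_addi,
                                  int32_t imm12)
{
    cpu->x[rd_auipc] = cpu->pc + (imm20 << 12);
    cpu->x[rd_addi] = cpu->x[rd_auipc] + (uint32_t) imm12;
    cpu->pc += 8; /* skip both original instructions */
}
```

Running `exec_auipc` followed by `exec_addi` should leave the same register and PC state as one call to `exec_fused_auipc_addi`; the saving comes from dispatching once instead of twice.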


jserv · 2023-05-22T07:35:21Z

To enhance execution efficiency, we employ instruction fusion by combining sequences that adhere to specific patterns into fused instructions. Currently, we have incorporated four fused instructions: auipc + addi, auipc + add, multiple sw, and multiple lw.

Please show some numbers to illustrate how we benefit from macro operation fusion.
In addition, why were these 4 patterns picked? Back them up with existing benchmark programs.


jserv · 2023-05-27T19:48:22Z

src/emulate.c

 })
 #endif

+/* auipc + addi */


Is it possible to manipulate the sequence lui + addi?
See #81 (comment)

Disassembly of CoreMark:

    10324: 000087b7 lui a5,0x8
    10328: b0578793 addi a5,a5,-1275 # 0x7b05

It is possible. However, importing this pattern causes problems when running qrcode.elf, so I skip it in this pull request.

Add a comment starting with "FIXME: lui + addi"
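For reference, a minimal sketch of why lui + addi is a fusion candidate: when the addi reads and writes the register set by the lui, the final value is a constant that can be computed once at decode time. The helper names `imm_u`, `imm_i`, and `fold_lui_addi` are hypothetical and not part of rv32emu; the sketch does not address the qrcode.elf problem mentioned above.

```c
#include <stdint.h>

/* U-type immediate: upper 20 bits of the instruction word. */
static uint32_t imm_u(uint32_t insn)
{
    return insn & 0xfffff000u;
}

/* I-type immediate: bits 31:20, sign-extended from 12 bits. */
static int32_t imm_i(uint32_t insn)
{
    uint32_t field = insn >> 20; /* 12-bit field, 0..4095 */
    return (int32_t) (field ^ 0x800u) - 0x800;
}

/* Value left in rd by `lui rd, hi` followed by `addi rd, rd, lo`.
 * Assumption: the addi uses the lui's rd as both source and destination.
 */
static uint32_t fold_lui_addi(uint32_t lui_insn, uint32_t addi_insn)
{
    return imm_u(lui_insn) + (uint32_t) imm_i(addi_insn);
}
```

Applied to the CoreMark words above, `fold_lui_addi(0x000087b7, 0xb0578793)` gives 0x8000 - 1275 = 0x7b05, matching the disassembler annotation.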

jserv · 2023-05-27T19:51:08Z

src/emulate.c

+    rv->PC += ir->insn_len * (ir->imm2 - 1);
+})
+
+/* multiple lw */


lw is the most frequent instruction (see #34), and we might dive into its use case more.

Can you handle the following case? (disassembly from CoreMark)

    10248: 03012603 lw a2,48(sp)
    1024c: 01c11583 lh a1,28(sp)
    10250: 03412503 lw a0,52(sp)

In addition, consider the following scenario:

    10a84: 01c12083 lw ra,28(sp)
    10a88: 07f47513 andi a0,s0,127
    10a8c: 01812403 lw s0,24(sp)
    10a90: 01412483 lw s1,20(sp)
    10a94: 01012903 lw s2,16(sp)
    10a98: 00c12983 lw s3,12(sp)

It can be regarded as 5 lw. Roughly speaking, if peephole optimization can be applied, we shall benefit from further optimizations.

Another case: (disassembly from CoreMark)

    10c08: 01162023 sw a7,0(a2)
    10c0c: 00052783 lw a5,0(a0)
    10c10: 00059883 lh a7,0(a1)
    10c14: 00259603 lh a2,2(a1)
    10c18: 00f82023 sw a5,0(a6)
    10c1c: 01052023 sw a6,0(a0)
    10c20: 00e82223 sw a4,4(a6)
    10c24: 0006a783 lw a5,0(a3)

Mixture of sw and lw.

> Another case: (disassembly from CoreMark)
>
>     10c08: 01162023 sw a7,0(a2)
>     10c0c: 00052783 lw a5,0(a0)
>     10c10: 00059883 lh a7,0(a1)
>     10c14: 00259603 lh a2,2(a1)
>     10c18: 00f82023 sw a5,0(a6)
>     10c1c: 01052023 sw a6,0(a0)
>     10c20: 00e82223 sw a4,4(a6)
>     10c24: 0006a783 lw a5,0(a3)
>
> Mixture of sw and lw.

In this case, the memory addresses are not contiguous. All we can do is pack these instructions together; we cannot save any operation, such as the misalignment check.
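A rough sketch of the condition under which a packed run of word accesses could share one misalignment check: every access uses the same base register and the offsets differ by multiples of 4, so if the first address is 4-byte aligned, all of them are. The `mem_op_t` record and the function name are hypothetical and not rv32emu code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified load/store record; not rv32emu's IR type. */
typedef struct {
    uint8_t rd;  /* destination register of a load (0 for a store) */
    uint8_t rs1; /* base register */
    int32_t imm; /* byte offset */
} mem_op_t;

/* Returns true when one alignment check on the first address covers the
 * whole run of word-sized accesses.
 */
static bool single_align_check_suffices(const mem_op_t *ops, size_t n)
{
    if (n == 0)
        return false;
    for (size_t i = 0; i < n; i++) {
        if (ops[i].rs1 != ops[0].rs1)
            return false; /* different base register */
        if ((ops[i].imm - ops[0].imm) % 4 != 0)
            return false; /* offsets not congruent modulo 4 */
        /* A load that overwrites the base changes the later addresses. */
        if (i + 1 < n && ops[i].rd != 0 && ops[i].rd == ops[i].rs1)
            return false;
    }
    return true;
}
```

The four sp-relative lw instructions from the earlier disassembly satisfy this condition, while the sw/lw mixture quoted above does not, because each access uses a different base register.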

> In addition, consider the following scenario:
>
>     10a84: 01c12083 lw ra,28(sp)
>     10a88: 07f47513 andi a0,s0,127
>     10a8c: 01812403 lw s0,24(sp)
>     10a90: 01412483 lw s1,20(sp)
>     10a94: 01012903 lw s2,16(sp)
>     10a98: 00c12983 lw s3,12(sp)
>
> It can be regarded as 5 lw. Roughly speaking, if peephole optimization can be applied, we shall benefit from further optimizations.

In this case, we can pack the last four lw instructions. To handle it as 5 lw, we need to reorder the instructions, for example by swapping the first and the second instruction.
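A minimal sketch of the register dependency test such a reordering would need before swapping two adjacent instructions. The `insn_t` record and the helper names are hypothetical; the check covers register hazards only and ignores memory ordering and the fact that a load may trap, which a real implementation would also have to consider.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified instruction record; not rv32emu's IR type. */
typedef struct {
    uint8_t rd;  /* destination register (0 if none) */
    uint8_t rs1; /* first source register */
    uint8_t rs2; /* second source register (0 if unused) */
} insn_t;

/* x0 is hardwired to zero, so it never carries a dependency. */
static bool reads(const insn_t *i, uint8_t reg)
{
    return reg != 0 && (i->rs1 == reg || i->rs2 == reg);
}

/* Adjacent instructions a (first) and b (second) can be swapped when no
 * register dependency links them.
 */
static bool can_swap(const insn_t *a, const insn_t *b)
{
    if (reads(b, a->rd))
        return false; /* read-after-write: b consumes a's result */
    if (reads(a, b->rd))
        return false; /* write-after-read: b overwrites a's source */
    if (a->rd != 0 && a->rd == b->rd)
        return false; /* write-after-write: final value would change */
    return true;
}
```

With ra = x1, sp = x2, s0 = x8, and a0 = x10, `lw ra,28(sp)` and `andi a0,s0,127` can be swapped, whereas `andi a0,s0,127` and `lw s0,24(sp)` cannot, because the load overwrites s0 that the andi reads.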

> Can you handle the following case? (disassembly from CoreMark)
>
>     10248: 03012603 lw a2,48(sp)
>     1024c: 01c11583 lh a1,28(sp)
>     10250: 03412503 lw a0,52(sp)

Ditto: to handle this case, we need some strategy to reorder the instructions.

In this pull request, let's concentrate on preliminary support of macro operation fusion. You shall add some comments for further efforts such as instruction reordering.

src/emulate.c

jserv

Add some FIXME/TODO comments noting further macro operation fusion candidates that deserve attention.

src/emulate.c

+        case rv_insn_lw:
+            COMBINE_MEM_OPS(1);
+            break;
+            /* FIXME: lui + addi */



qwe661234 · 2023-05-29T15:45:02Z

Check CI failure.

In debug mode, rv_step emulates only one instruction per step: it executes just the first instruction of a basic block and then translates the next basic block at PC + 4. Applying macro operation fusion in debug mode can therefore cause errors. For instance, suppose auipc and addi are fused and executed together. The instruction that follows is not turned into a nop, and because the emulator resumes at PC + 4, the next instruction executed is still the addi. The net effect is auipc + addi + addi, which is wrong.

Therefore, we cannot apply macro operation fusion in debug mode.
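The failure can be reproduced with a toy model. This is only a sketch of the behaviour described above, not the actual rv_step code; `cpu_t`, `single_step_twice`, and the two-instruction program (`auipc a0, 0` at address 0, `addi a0, a0, 16` at address 4) are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified CPU state; not the rv32emu data structures. */
typedef struct {
    uint32_t pc;
    uint32_t x[32];
} cpu_t;

enum { REG_A0 = 10, ADDI_IMM = 16 };

/* Single-step the two-instruction program the way a debugger would:
 * execute one translated entry, then resume at pc + 4.
 * Returns the final value of a0.
 */
static uint32_t single_step_twice(bool fuse)
{
    cpu_t cpu = {0};

    /* Step 1: entry at pc = 0 (auipc, possibly fused with the addi). */
    cpu.x[REG_A0] = cpu.pc; /* auipc a0, 0 */
    if (fuse)
        cpu.x[REG_A0] += ADDI_IMM; /* fused addi already applied here */
    cpu.pc += 4; /* debug mode advances by one instruction only */

    /* Step 2: the instruction at pc = 4 is still the original addi. */
    cpu.x[REG_A0] += ADDI_IMM;
    cpu.pc += 4;

    return cpu.x[REG_A0];
}
```

Without fusion a0 ends up as 16; with fusion the addi is applied twice and a0 becomes 32, which is the auipc + addi + addi effect described above.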


qwe661234 force-pushed the Add_fuse_operation branch from bb1b72f to 3ae3059 Compare May 22, 2023 06:22

jserv reviewed May 22, 2023 (src/decode.h, src/emulate.c)

jserv changed the title ~~Add fuse instruction~~ Introduce macro operation fusion May 22, 2023

jserv reviewed May 22, 2023 (src/emulate.c)

qwe661234 force-pushed the Add_fuse_operation branch from 3ae3059 to b66f4dc Compare May 27, 2023 08:23

qwe661234 requested a review from jserv May 27, 2023 08:27

jserv reviewed May 27, 2023 (src/decode.h)

qwe661234 force-pushed the Add_fuse_operation branch from b66f4dc to 3f84ce5 Compare May 27, 2023 11:26

qwe661234 requested a review from jserv May 27, 2023 11:26

qwe661234 force-pushed the Add_fuse_operation branch from 3f84ce5 to 1f9cbea Compare May 27, 2023 11:32

jserv reviewed May 27, 2023 (src/riscv.c, src/emulate.c, src/decode.h)

jserv changed the title ~~Introduce macro operation fusion~~ Introduce preliminary macro operation fusion May 28, 2023

qwe661234 force-pushed the Add_fuse_operation branch from a7b8455 to 8933804 Compare May 29, 2023 08:44

jserv reviewed May 29, 2023 (src/emulate.c)

qwe661234 force-pushed the Add_fuse_operation branch from 8933804 to fc9c3b8 Compare May 29, 2023 08:59

qwe661234 force-pushed the Add_fuse_operation branch from fc9c3b8 to 56b14b8 Compare May 29, 2023 08:59

github-advanced-security bot found potential problems May 29, 2023 (src/emulate.c, fixed)

qwe661234 requested a review from jserv May 29, 2023 09:01

jserv reviewed May 29, 2023 (src/emulate.c)

qwe661234 force-pushed the Add_fuse_operation branch from 56b14b8 to 743110f Compare May 29, 2023 09:17

github-advanced-security bot found potential problems May 29, 2023 (src/emulate.c, fixed)

jserv requested changes May 29, 2023

jserv reviewed May 29, 2023 (src/emulate.c)

qwe661234 force-pushed the Add_fuse_operation branch from 743110f to 9636542 Compare May 29, 2023 09:32

github-advanced-security bot found potential problems May 29, 2023

src/emulate.c

            case rv_insn_lw:
                COMBINE_MEM_OPS(1);
                break;
                /* FIXME: lui + addi */

Code scanning / CodeQL notice: FIXME comment: lui + addi

qwe661234 requested a review from jserv May 29, 2023 09:58


qwe661234 force-pushed the Add_fuse_operation branch from 9636542 to 18213bc Compare May 29, 2023 15:33

qwe661234 requested a review from jserv May 29, 2023 15:45

jserv merged commit 5fb9d8b into sysprog21:master May 29, 2023

vestata pushed a commit to vestata/rv32emu that referenced this pull request Jan 24, 2025

Merge pull request sysprog21#132 from qwe661234/Add_fuse_operation

0405750

Introduce preliminary macro operation fusion
