Introducing local register allocation for the tier-1 JIT compiler
Local register allocation reuses host register values within a basic
block, thereby reducing the number of load and store instructions.

Take three consecutive addi instructions as an example:

addi t0, t0, 1
addi t0, t0, 1
addi t0, t0, 1

* The generated machine code without register allocation

load t0, t0_addr
add t0, 1
sw t0, t0_addr
load t0, t0_addr
add t0, 1
sw t0, t0_addr
load t0, t0_addr
add t0, 1
sw t0, t0_addr

* The generated machine code with register allocation

load t0, t0_addr
add t0, 1
add t0, 1
add t0, 1
sw t0, t0_addr

As shown in the above example, register allocation keeps t0 in a host
register across all three instructions, reducing the loads and stores
from three each to one each.
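
The example above can be reproduced with a small simulation. This is a minimal sketch, not the T1C implementation: the function names are hypothetical, each instruction is modelled as `addi rd, rd, imm`, and the allocator is assumed to have enough host registers for every VM register in the block.

```python
def emit_without_ra(block):
    """Emit pseudo host code that loads and stores the VM register for every instruction."""
    code = []
    for rd, imm in block:  # each entry models `addi rd, rd, imm`
        code.append(f"load {rd}, {rd}_addr")
        code.append(f"add {rd}, {imm}")
        code.append(f"sw {rd}, {rd}_addr")
    return code


def emit_with_local_ra(block):
    """Emit pseudo host code that keeps VM registers in host registers for the whole block."""
    code = []
    cached = []  # VM registers already held in a host register, in first-use order
    for rd, imm in block:
        if rd not in cached:
            # First use in this basic block: load the value once.
            code.append(f"load {rd}, {rd}_addr")
            cached.append(rd)
        code.append(f"add {rd}, {imm}")
    # End of the basic block: write every cached register back once.
    for rd in cached:
        code.append(f"sw {rd}, {rd}_addr")
    return code


block = [("t0", 1), ("t0", 1), ("t0", 1)]
without_ra = emit_without_ra(block)
with_ra = emit_with_local_ra(block)
print(len(without_ra), len(with_ra))  # prints "9 5"
```

The nine-instruction and five-instruction sequences match the two listings above.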

* x86-64 (i7-11700)

| Benchmark|  W/O RA  |  W/ RA   | Speedup |
|----------+----------+----------+---------|
| dhrystone| 0.342 s  | 0.328 s  |  +4.27% |
| miniz    | 1.243 s  | 1.185 s  |  +4.89% |
| primes   | 1.716 s  | 1.689 s  |  +1.60% |
| sha512   | 2.063 s  | 1.880 s  |  +9.73% |
| stream   |11.619 s  |11.419 s  |  +1.75% |

* AArch64 (eMag)

| Benchmark|  W/O RA  |  W/ RA   | Speedup |
|----------+----------+----------+---------|
| dhrystone| 1.935 s  | 1.301 s  | +48.73% |
| miniz    | 7.706 s  | 4.362 s  | +76.66% |
| primes   |10.513 s  | 9.633 s  |  +9.14% |
| sha512   | 6.508 s  | 6.119 s  |  +6.36% |
| stream   |45.174 s  |38.037 s  | +18.76% |
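
The speedup figures in both tables are consistent with the ratio of the two run times minus one. The commit does not state the formula, so the helper below is an inference from the numbers, and its name is hypothetical.

```python
def speedup_percent(t_without_ra, t_with_ra):
    """Relative speedup in percent: how much faster the run with RA is."""
    return (t_without_ra / t_with_ra - 1.0) * 100.0


# dhrystone on x86-64, taken from the table above
print(f"{speedup_percent(0.342, 0.328):+.2f}%")  # prints "+4.27%"
```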

As the measurements show, register allocation improves the overall
performance of the machine code generated by T1C. Without RA, the
generated machine code needs to store the register value back at the
end of every instruction. With RA, the register value only needs to be
stored back at the end of a basic block or when the host registers are
fully occupied. The improvement is particularly pronounced on AArch64
because it has more registers available, which allows more VM registers
to be mapped to host registers at the same time.
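
The effect of the host register count can be illustrated with a toy model. This is a sketch under stated assumptions, not the allocator from this commit: the function name is hypothetical, every access is treated as a read-modify-write of one VM register, and least-recently-used eviction is assumed because the commit message does not say which register is spilled when the host registers are full.

```python
from collections import OrderedDict


def count_memory_ops(vm_regs_used, num_host_regs):
    """Count (loads, stores) for one basic block under a toy local register allocator."""
    mapping = OrderedDict()  # VM registers held in host registers, least recently used first
    loads = stores = 0
    for vm_reg in vm_regs_used:
        if vm_reg in mapping:
            # Already in a host register: reuse it, no memory access.
            mapping.move_to_end(vm_reg)
            continue
        if len(mapping) == num_host_regs:
            # Host registers are fully occupied: evict one and store it back.
            mapping.popitem(last=False)
            stores += 1
        mapping[vm_reg] = None
        loads += 1
    # End of the basic block: store every register that is still mapped.
    stores += len(mapping)
    return loads, stores


trace = ["t0", "t1", "t2", "t0", "t1", "t2"]
print(count_memory_ops(trace, 2))  # prints "(6, 6)"
print(count_memory_ops(trace, 3))  # prints "(3, 3)"
```

With only two host registers every access in this trace misses and forces a spill, while three host registers are enough to keep the whole working set mapped until the end of the block.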
qwe661234 committed Feb 13, 2024
1 parent b787dc2 commit 49ab1ac
Showing 4 changed files with 670 additions and 479 deletions.

1 comment on commit 49ab1ac

jserv commented on 49ab1ac on Feb 13, 2024

Benchmarks

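| Benchmark suite | Current: 49ab1ac                             | Previous: df43757                           | Ratio |
|-----------------+----------------------------------------------+---------------------------------------------+-------|
| Dhrystone       | 1745.5 Average DMIPS over 10 runs            | 1746.88 Average DMIPS over 10 runs          | 1.00  |
| Coremark        | 1531.116 Average iterations/sec over 10 runs | 1516.62 Average iterations/sec over 10 runs | 0.99  |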

This comment was automatically generated by workflow using github-action-benchmark.
