Introduce basic block #91
I merged commit 2aa7154; let's concentrate on the basic block part.
569c378 to 7fe0c6a
Instead of decoding and interpreting individual instructions in a loop, as an interpreting emulator does, an attempt is made to combine entire blocks, each of which usually ends with a branch/jump instruction. Using a JIT framework such as MIR, native instructions corresponding to the block are generated and executed the first time its memory address is jumped to. Jumps are not executed directly; instead, the jump target is saved and the generated function exits with RET. This allows the runtime environment to first compile the block at the jump target and to perform other parallel tasks, including interrupt, input, and peripheral emulation.

During the compilation of a program block, the number of RISC-V clock cycles required up to that point is calculated for each possible exit of the block, and this sum is added to an instruction counter during execution. With this counter, events that occur on the RISC-V at certain times, such as the system timer, can be timed precisely despite the higher speed of the host platform. Since emulated programs may contain routines that depend on a fixed number of instructions executed in a certain period of time, the timers of the host system cannot be used without compatibility problems. Because of the block-wise execution, however, the emulator presented here also has the problem that interrupts or timers are only executed or updated a few clock cycles late, after the next jump.

@Risheng1128, please confirm that queue-based block management meets the requirements outlined above. In particular, when an exception/interrupt occurs, the proposed emulator must be able to jump to the specified block.
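The cycle accounting described above can be illustrated with a small sketch. It is not code from this pull request or from MIR: `guest_t`, `block_t`, `run_block`, and `poll_timer` are hypothetical names, and a block is reduced to two exits with precomputed cycle costs. The point is only to show why timer delivery ends up a few cycles late when events are polled between blocks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical guest state; field names are illustrative only. */
typedef struct {
    uint32_t pc;         /* guest program counter */
    uint64_t cycles;     /* guest cycles accounted so far */
    uint64_t timer_cmp;  /* cycle count at which the timer is due */
    uint32_t timer_irqs; /* number of timer events delivered */
    uint64_t max_late;   /* worst observed delivery delay, in cycles */
} guest_t;

/* A translated block with two exits (0: fall-through, 1: branch taken).
 * The cost of reaching each exit is computed once, at translation time. */
typedef struct {
    uint32_t next_pc[2];
    uint32_t cost[2];
} block_t;

/* "Execute" a block: charge the precomputed cost of the exit taken and
 * record the jump target instead of jumping, then return to the runtime. */
void run_block(guest_t *g, const block_t *b, int exit_taken)
{
    assert(exit_taken == 0 || exit_taken == 1);
    g->cycles += b->cost[exit_taken];
    g->pc = b->next_pc[exit_taken];
}

/* Called by the runtime between blocks: deliver the timer if it is due.
 * The delay is bounded by the cost of the block that just finished. */
bool poll_timer(guest_t *g, uint64_t period)
{
    if (g->cycles < g->timer_cmp)
        return false;
    uint64_t late = g->cycles - g->timer_cmp;
    if (late > g->max_late)
        g->max_late = late;
    g->timer_irqs++;
    g->timer_cmp += period;
    return true;
}
```

With a loop block that costs 5 cycles per iteration and a timer period of 12 cycles, the timer is observed at cycles 15, 25, 40, and 50 rather than at 12, 24, 36, and 48, so the delay never reaches the cost of one block.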
7fe0c6a to b7f09eb
Use queue-based block management to manage instruction decode and execution. In the decode stage, allocate a new memory block, put the decoded instructions into the block, and record the order of instructions by the member In the execution stage, when the queue is not empty, the emulator executes the memory block whose index is In particular, when an exception/interrupt occurs, the emulator will do the following steps:
TODO: use |
b7f09eb to cae22ec
Let emulator executes
branch
Overhead Command Shared Object Symbol
52.99% rv32emu rv32emu [.] rv_step
9.80% rv32emu rv32emu [.] memory_ifetch
8.69% rv32emu rv32emu [.] on_mem_ifetch
3.88% rv32emu rv32emu [.] rv_userdata
2.98% rv32emu rv32emu [.] memory_write
1.98% rv32emu rv32emu [.] main
1.40% rv32emu rv32emu [.] memory_read_w
0.82% rv32emu rv32emu [.] memory_read_s
0.73% rv32emu rv32emu [.] on_mem_write_w
0.72% rv32emu rv32emu [.] rv_has_halted
0.55% rv32emu rv32emu [.] on_mem_read_w
0.48% rv32emu rv32emu [.] on_mem_read_s
...
branch
Overhead Command Shared Object Symbol
39.95% rv32emu rv32emu [.] rv_step
10.52% rv32emu rv32emu [.] rv_decode
6.59% rv32emu rv32emu [.] memory_ifetch
6.09% rv32emu rv32emu [.] op_op_imm
5.71% rv32emu rv32emu [.] on_mem_ifetch
3.85% rv32emu rv32emu [.] op_branch
2.35% rv32emu rv32emu [.] rv_userdata
2.29% rv32emu rv32emu [.] op_op
2.27% rv32emu rv32emu [.] op_load
1.99% rv32emu rv32emu [.] memory_write
0.99% rv32emu rv32emu [.] memory_read_w
0.92% rv32emu rv32emu [.] op_store
0.51% rv32emu rv32emu [.] memory_read_s
0.51% rv32emu rv32emu [.] on_mem_write_w
0.50% rv32emu rv32emu [.] on_mem_read_w
...
cae22ec to fcce814
If a block has been translated, we would not throw it away after emulating the instructions within; instead, we'll save it for future lookups.
Check |
I recommend using a hash table and block prediction. It may be more convenient to implement the control flow graph in addition to having quick access to the basic block.
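As a rough illustration of this suggestion, the sketch below caches translated blocks in a hash table keyed by start PC and lets each block remember which block ran after it last time. It is a minimal sketch under stated assumptions, not the code in this pull request: `block_map_t`, `block_get`, `hash_pc`, the `predict` field, the fixed table size, and linear probing are all hypothetical choices, and translation is reduced to allocating an empty block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MAP_CAPACITY 64 /* hypothetical fixed size, power of two */

typedef struct block {
    uint32_t pc_start;     /* guest address of the first instruction */
    struct block *predict; /* block that followed this one last time */
} block_t;

typedef struct {
    block_t *slots[MAP_CAPACITY];
    size_t count;
    unsigned translations; /* blocks created (cache misses) */
    unsigned predict_hits; /* lookups answered by prediction alone */
} block_map_t;

/* Guest PCs are at least 2-byte aligned, so drop the low bit. */
static uint32_t hash_pc(uint32_t pc)
{
    return (pc >> 1) & (MAP_CAPACITY - 1);
}

block_t *block_find(block_map_t *map, uint32_t pc)
{
    uint32_t i = hash_pc(pc);
    while (map->slots[i]) { /* linear probing */
        if (map->slots[i]->pc_start == pc)
            return map->slots[i];
        i = (i + 1) & (MAP_CAPACITY - 1);
    }
    return NULL;
}

void block_insert(block_map_t *map, block_t *b)
{
    /* Keep one slot empty so that probing always terminates. */
    assert(map->count < MAP_CAPACITY - 1);
    uint32_t i = hash_pc(b->pc_start);
    while (map->slots[i])
        i = (i + 1) & (MAP_CAPACITY - 1);
    map->slots[i] = b;
    map->count++;
}

/* Return the block starting at pc, given the block that just finished. */
block_t *block_get(block_map_t *map, block_t *prev, uint32_t pc)
{
    /* Fast path: the previous block's prediction matches the new PC. */
    if (prev && prev->predict && prev->predict->pc_start == pc) {
        map->predict_hits++;
        return prev->predict;
    }
    block_t *b = block_find(map, pc);
    if (!b) { /* not translated yet: create and cache it */
        b = calloc(1, sizeof(*b));
        assert(b);
        b->pc_start = pc;
        block_insert(map, b);
        map->translations++;
    }
    if (prev)
        prev->predict = b; /* remember the successor for next time */
    return b;
}
```

The `predict` pointers form edges between blocks, which is one way the control flow graph mentioned above could be built up incrementally.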
fcce814 to 177d998
This pull request introduces 1 alert and fixes 1 when merging 177d998 into 2aa7154 - view on LGTM.com new alerts:
fixed alerts:
Heads-up: LGTM.com's PR analysis will be disabled on the 5th of December, and LGTM.com will be shut down ⏻ completely on the 16th of December 2022. Please enable GitHub code scanning, which uses the same CodeQL engine ⚙️ that powers LGTM.com. For more information, please check out our post on the GitHub blog. |
Remove the queue in the block, because we only need to know the number of instructions it encompasses, and the first executed instruction in a block is located at the first position of the IR array.
Use a hash table and block prediction to manage the blocks. This avoids decoding duplicate blocks and finds the next block efficiently.
Finally, using
old:
coremark
---------------------------------------------------------------------
2K performance run parameters for coremark.
CoreMark Size : 666
Total ticks : 13939747
Total time (secs): 13.939747
Iterations/Sec : 430.423881
Iterations : 6000
Compiler version : GCC11.1.0
Compiler flags : -O2 -DPERFORMANCE_RUN=1
Memory location : Please put data memory location here
(e.g. code in flash, data on heap etc)
seedcrc : 0xe9f5
[0]crclist : 0xe714
[0]crcmatrix : 0x1fd7
[0]crcstate : 0x8e3a
[0]crcfinal : 0xa14c
Correct operation validated. See README.md for run and reporting rules.
CoreMark 1.0 : 430.423881 / GCC11.1.0 -O2 -DPERFORMANCE_RUN=1 / Heap
dhrystone
---------------------------------------------------------------------
Dhrystone(1.1-mc), 10000000 passes, 13554164 microseconds, 419 DMIPS
new:
coremark
---------------------------------------------------------------------
2K performance run parameters for coremark.
CoreMark Size : 666
Total ticks : 12069925
Total time (secs): 12.069925
Iterations/Sec : 911.356119
Iterations : 11000
Compiler version : GCC11.1.0
Compiler flags : -O2 -DPERFORMANCE_RUN=1
Memory location : Please put data memory location here
(e.g. code in flash, data on heap etc)
seedcrc : 0xe9f5
[0]crclist : 0xe714
[0]crcmatrix : 0x1fd7
[0]crcstate : 0x8e3a
[0]crcfinal : 0x33ff
dhrystone
---------------------------------------------------------------------
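Dhrystone(1.1-mc), 10000000 passes, 5904499 microseconds, 961 DMIPS
The new implementation is about 2.1 times faster on CoreMark (911 vs. 430 iterations/sec) and about 2.3 times faster on Dhrystone (961 vs. 419 DMIPS) than the old one.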
177d998 to da35767
This pull request introduces 1 alert and fixes 1 when merging da35767 into 2aa7154 - view on LGTM.com new alerts:
fixed alerts:
Retain |
da35767 to 76a028a
This commit introduces the basic block in the emulator, which lets the emulator decode and execute multiple instructions at a time. A hash table and block prediction are used to manage blocks efficiently.

In the decode stage, allocate a new block, which holds up to 1024 instructions by default, and decode instructions into it until it is full or the latest instruction is a branch instruction; then put the block into the block map.

In the execution stage, the emulator executes the instructions in the block. The number of instructions is given by the member insn_num in struct block.

In particular, when an exception/interrupt occurs, the emulator does the following:
1. Execute the exception/interrupt handler, which sets a new program counter from the register mtvec; the function emulate returns false.
2. Enter the decode stage again and create a new block based on the new program counter.
That is, the emulator stops executing the old block and creates a new one from the new program counter.

In addition, the file decode.c includes the header file riscv_private.h, which includes the gdbstub file. This makes the build fail because gdbstub is not cloned until emulate.c is compiled. To resolve this problem, swap the compile order of emulate.o and decode.o.
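The decode and execution stages described in this commit message can be sketched roughly as follows. Only `insn_num`, `struct block`, `mtvec`, the IR array, and the 1024-instruction default come from the discussion; everything else (`insn_t`, `cpu_t`, `block_translate`, `block_emulate`, the word-addressed `mem` array) is a hypothetical simplification. The sketch assumes 32-bit instructions only, treats BRANCH, JAL, and JALR opcodes as block terminators, uses ECALL as the only trap source, and omits real instruction semantics as well as CSR updates such as mepc and mcause.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BLOCK_CAPACITY 1024 /* default block size from the commit message */

typedef struct {
    uint32_t raw; /* undecoded instruction word (decoding omitted here) */
} insn_t;

struct block {
    uint32_t insn_num; /* number of instructions held in ir[] */
    uint32_t pc_start; /* guest address of ir[0] */
    insn_t ir[BLOCK_CAPACITY];
};

typedef struct {
    uint32_t pc;
    uint32_t mtvec; /* trap vector base address */
} cpu_t;

/* RV32I opcodes that end a block in this sketch: BRANCH, JALR, JAL. */
static bool ends_block(uint32_t raw)
{
    uint32_t opcode = raw & 0x7f;
    return opcode == 0x63 || opcode == 0x67 || opcode == 0x6f;
}

/* Decode stage: fill the block until it is full, memory ends, or the
 * latest instruction is a branch/jump. */
void block_translate(struct block *b, const uint32_t *mem,
                     uint32_t mem_words, uint32_t pc)
{
    assert(b && mem);
    b->pc_start = pc;
    b->insn_num = 0;
    while (b->insn_num < BLOCK_CAPACITY && pc / 4 < mem_words) {
        uint32_t raw = mem[pc / 4];
        b->ir[b->insn_num++].raw = raw;
        pc += 4;
        if (ends_block(raw))
            break;
    }
}

/* Execution stage: walk insn_num instructions. On a trap, load the PC
 * from mtvec and return false so the caller builds a new block there.
 * Instruction semantics are omitted; the PC simply advances by 4. */
bool block_emulate(cpu_t *cpu, const struct block *b)
{
    for (uint32_t i = 0; i < b->insn_num; i++) {
        if (b->ir[i].raw == 0x00000073) { /* ECALL */
            cpu->pc = cpu->mtvec;
            return false;
        }
        cpu->pc += 4;
    }
    return true;
}
```

When `block_emulate` returns false, the caller would abandon the current block and translate a new one starting at `cpu->pc`, which matches steps 1 and 2 above.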
76a028a to 856a3e9
Introduce basic block
This commit introduces the basic block in the emulator, which lets the emulator decode and execute multiple instructions at a time.
Complete the first requirement in #88.
TODO: