Adding support for ZCMT Extension for Code-Size Reduction in CVA6 #2659
❌ failed run, report available here.
Hi @JeanRochCoulon, how can we find out which line of the code this SpyGlass failure refers to?
@ASintzoff, do you know how to help?
5973264
to
645d24f
Compare
@JeanRochCoulon
Is the extension completely optional? All the RTL added for Zcmt should be removed when the extension is not set. Otherwise, it can increase the gate count.
);
end
if (CVA6Cfg.RVZCMP) begin
if (CVA6Cfg.RVZCMP || (CVA6Cfg.RVZCMT & ~CVA6Cfg.MmuPresent)) begin //MMU should be off when using ZCMT
I bet 2 cents that the gate count increase comes from this condition: decoder_macro and zcmt_decoder are both inferred by the same if condition. To me, decoder_macro depends on zcmp and zcmt_decoder depends on zcmt.
@JeanRochCoulon the gate count in issue_stage increased by roughly 1k, which I think is because we added one signal to the scoreboard struct.
I suggest we increase the expected_synth gate count. What do you think, @JeanRochCoulon?
@cathales Currently, it is not compatible with a superscalar architecture because it stalls the pipeline in the decode stage until the implicit fetch is completed. Compatibility can be addressed in a future update alongside the ZCMP extension.
core/branch_unit.sv
Outdated
resolved_branch_o.is_mispredict = 1'b1; // miss prediction for ZCMT
resolved_branch_o.cf_type = ariane_pkg::Jump;
We could send a JumpR instead of Jump. It would update the BTB in the frontend to correctly predict the destination next time. We could then calculate is_mispredict accordingly, to use this prediction when we later encounter the same cm.jt or cm.jalt.
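As a rough behavioral illustration of this suggestion (a Python model, not CVA6 RTL), the sketch below remembers the last resolved target per PC and flags a misprediction only when the remembered target differs from the actual one. The class `TinyBTB` and its methods are hypothetical names; the real BTB indexing, tagging, and update path in the CVA6 frontend are not modelled.

```python
class TinyBTB:
    """Toy branch target buffer: maps a jump's PC to its last resolved target."""

    def __init__(self):
        self.targets = {}

    def predict(self, pc):
        # Returns the remembered target, or None when the PC was never seen.
        return self.targets.get(pc)

    def resolve(self, pc, actual_target):
        # A table jump is mispredicted only if the prediction was absent or wrong.
        is_mispredict = self.predict(pc) != actual_target
        # Update the entry so the next encounter of this PC predicts correctly.
        self.targets[pc] = actual_target
        return is_mispredict
```

With this model, the first `cm.jt` at a given PC mispredicts, a repeat with the same table entry does not, and a change of the table entry mispredicts once more.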
@cathales
I have incorporated the changes
core/zcmt_decoder.sv
Outdated
end
TABLE_JUMP: begin
if (req_port_i.data_rvalid) begin
jump_addr = $unsigned($signed(req_port_i.data_rdata) - $signed(pc_i));
Because of this limitation, this implementation is not compliant with Zcmt, is it?
9b168ab
to
5bf4c64
Compare
69839bd
to
b1fe91a
Compare
Hello @farhan-108
Introduction
This PR implements the ZCMT extension in the CVA6 core, targeting 32-bit embedded-class platforms. ZCMT is a code-size reduction feature that uses compressed table jump instructions (cm.jt and cm.jalt) to reduce code size for embedded systems.
Note: Due to implementation complexity, the ZCMT extension is primarily targeted at embedded-class CPUs. Additionally, it is not compatible with architecture-class profiles (ref. Unprivileged spec, section 27.20).
Key additions
Added the zcmt_decoder module for the compressed table jump instructions cm.jt (jump table) and cm.jalt (jump-and-link table).
Implemented the Jump Vector Table (JVT) CSR in the csr_reg module to store the base address of the jump table.
Implemented a return address stack in the zcmt_decoder module: cm.jalt pushes its return address onto the stack, so it behaves equivalently to jal ra (jump-and-link with return address).
Implementation in CVA6
The implementation of the ZCMT extension involves the following major modifications:
compressed decoder
The compressed decoder identifies the cm.jt and cm.jalt instructions and generates signals indicating that the instruction is both compressed and a ZCMT instruction.
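The identification step can be sketched as a small Python behavioral model (not the RTL from this PR). It relies on the Zcmt encoding: bits [15:10] are 101000, bits [1:0] are 10, and bits [9:2] hold the table index, with index < 32 selecting cm.jt and index >= 32 selecting cm.jalt. The names `ZCMT_MASK`, `ZCMT_MATCH`, and `decode_zcmt` are hypothetical.

```python
ZCMT_MASK = 0xFC03   # selects bits [15:10] and [1:0]
ZCMT_MATCH = 0xA002  # bits [15:10] = 101000, bits [1:0] = 10


def decode_zcmt(instr):
    """Return (mnemonic, index) for a cm.jt / cm.jalt halfword, else None."""
    instr &= 0xFFFF
    if (instr & ZCMT_MASK) != ZCMT_MATCH:
        return None
    index = (instr >> 2) & 0xFF  # 8-bit jump table index
    mnemonic = "cm.jt" if index < 32 else "cm.jalt"
    return mnemonic, index
```

For instance, `decode_zcmt(0xA016)` yields `("cm.jt", 5)`, while a halfword outside this encoding, such as `0x0001` (c.nop), yields `None`.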
zcmt_decoder
A new zcmt_decoder module decodes the cm.jt and cm.jalt instructions, fetches the base address of the jump table from the JVT CSR, extracts the index, and constructs the corresponding jump instruction. Table 1 lists the IO port connections of the zcmt_decoder module, and Figure 1 shows a high-level block diagram of the ZCMT implementation in CVA6.
Table 1. IO port connections of the zcmt_decoder module
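To illustrate what constructing a jump instruction can look like, here is a Python sketch that builds a standard RV32 `jal` from the fetched table entry and the current PC. It assumes the constructed instruction is a plain `jal` with rd = x0 for cm.jt and rd = ra (x1) for cm.jalt; that is an assumption for illustration, and the function names are hypothetical. Under this assumption only targets within the `jal` reach of about ±1 MiB from the PC can be encoded.

```python
def jal_in_range(offset):
    """A jal immediate is a signed, even, 21-bit byte offset."""
    return offset % 2 == 0 and -(1 << 20) <= offset < (1 << 20)


def encode_jal(rd, offset):
    """Encode `jal rd, offset` as a 32-bit instruction word."""
    if not jal_in_range(offset):
        raise ValueError("offset not encodable in a jal immediate")
    imm = offset & 0x1FFFFF
    return (
        (((imm >> 20) & 0x1) << 31)      # imm[20]
        | (((imm >> 1) & 0x3FF) << 21)   # imm[10:1]
        | (((imm >> 11) & 0x1) << 20)    # imm[11]
        | (((imm >> 12) & 0xFF) << 12)   # imm[19:12]
        | (rd << 7)
        | 0x6F                           # JAL opcode
    )


def build_table_jump(target, pc, link):
    """Assumed mapping: cm.jt -> jal x0, cm.jalt -> jal ra, offset = target - pc."""
    return encode_jal(1 if link else 0, target - pc)
```

For example, a cm.jalt at pc 0x80000100 whose table entry is 0x80000108 becomes `jal ra, 8`, which encodes as 0x008000EF.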
branch unit condition
A condition is added to the branch unit so that ZCMT instructions always cause a misprediction, forcing the program to jump to the calculated address of the newly constructed jump instruction.
JVT CSR
A new JVT CSR is implemented in csr_reg; it holds the base address of the jump table. The base address is read from the JVT CSR and combined with the index value to calculate the effective address.
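The address calculation can be sketched in Python as follows. This is a behavioral model based on the Zcmt specification, not on the RTL of this PR: the jvt register holds the 64-byte-aligned base in bits [XLEN-1:6] and a mode field in bits [5:0] (mode 0 is jump table mode), each table entry is XLEN/8 bytes wide, bit 0 of the fetched target is cleared, and cm.jalt links to pc + 2. The function names and the dictionary used as memory are hypothetical.

```python
def jvt_entry_address(jvt, index, xlen=32):
    """Effective address of jump table entry `index`."""
    base = jvt & ~0x3F          # jvt.base, 64-byte aligned
    mode = jvt & 0x3F           # jvt.mode, only 0 (jump table mode) is defined
    if mode != 0:
        raise ValueError("reserved jvt.mode")
    return (base + index * (xlen // 8)) & ((1 << xlen) - 1)


def table_jump(mem, jvt, pc, index, xlen=32):
    """Return (next_pc, link), where link is None for cm.jt (index < 32)."""
    entry = mem[jvt_entry_address(jvt, index, xlen)]
    target = entry & ~1                      # clear bit 0 of the table entry
    link = pc + 2 if index >= 32 else None   # cm.jalt writes pc + 2 to ra
    return target, link
```

With a base of 0x80001000 on RV32, entry 5 is read from 0x80001014 and entry 32 from 0x80001080.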
No MMU
Embedded platforms do not use the MMU, so the zcmt_decoder is connected to the cache through port 0 of the Dcache module for implicit read accesses to memory.
Figure 1. High-level block diagram of the ZCMT extension implementation
Known Limitations
The implementation targets 32-bit embedded-class platforms without an MMU. Because the core does not use an MMU, port 0 of the Dcache, which would otherwise serve the MMU, is reused to connect the zcmt_decoder to the cache.
Testing and Verification
Test Plan
A test plan was developed to test the functionality of the ZCMT extension together with the JVT CSR. A directed assembly test is executed to check the functionality.
Table 2. Test plan
Note: The test is located under CVA6_REPO_DIR/verif/tests/custom/zcmt