Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding support for ZCMT Extension for Code-Size Reduction in CVA6 #2659

Open
wants to merge 12 commits into
base: master
Choose a base branch
from

Conversation

farhan-108
Copy link

@farhan-108 farhan-108 commented Dec 10, 2024

Introduction

This PR implements the ZCMT extension in the CVA6 core, targeting the 32-bit embedded-class platforms. ZCMT is a code-size reduction feature that utilizes compressed table jump instructions (cm.jt and cm.jalt) to reduce code size for embedded systems
Note: Due to implementation complexity, ZCMT extension is primarily targeted at embedded class CPUs. Additionally, it is not compatible with architecture class profiles.(Ref. Unprivilege spec 27.20)

Key additions

  • Added zcmt_decoder module for compressed table jump instructions: cm.jt (jump table) and cm.jalt (jump-and-link table)

  • Implemented the Jump Vector Table (JVT) CSR to store the base address of the jump table in csr_reg module

  • Implemented a return address stack, enabling cm.jalt to behave equivalently to jal ra (jump-and-link with return address), by pushing the return address onto the stack in zcmt_decoder module

Implementation in CVA6

The implementation of the ZCMT extension involves the following major modifications:

compressed decoder

The compressed decoder scans and identifies the cm.jt and cm.jalt instructions, and generates signals indicating that the instruction is both compressed and a ZCMT instruction.

zcmt_decoder

A new zcmt_decoder module was introduced to decode the cm.jt and cm.jalt instructions, fetch the base address of the JVT table from JVT CSR, extract the index and construct jump instructions to ensure efficient integration of the ZCMT extension in embedded platforms. Table.1 shows the IO port connection of zcmt_decoder module. High-level block diagram of zcmt implementation in CVA6 is shown in Figure 1.

Table. 1 IO port connection with zcmt_decoder module

Signals IO Description Connection Type
clk_i in Subsystem Clock SUBSYSTEM logic
rst_ni in Asynchronous reset active low SUBSYSTEM logic
instr_i in Instruction in compressed_decoder logic [31:0]
pc_i in Current PC PC from FRONTEND logic [CVA6Cfg.VLEN-1:0]
is_zcmt_instr_i in Is instruction a zcmt instruction compressed_decoder logic
illegal_instr_i in Is instruction a illegal instruction compressed_decoder logic
is_compressed_i in Is instruction a compressed instruction compressed_decoder logic
jvt_i in JVT struct from CSR CSR jvt_t
req_port_i in Handshake between CACHE and FRONTEND (fetch) Cache dcache_req_o_t
instr_o out Instruction out cvxif_compressed_if_driver logic [31:0]
illegal_instr_o out Is the instruction is illegal cvxif_compressed_if_driver logic
is_compressed_o out Is the instruction is compressed cvxif_compressed_if_driver logic
fetch_stall_o out Stall siganl cvxif_compressed_if_driver logic
req_port_o out Handshake between CACHE and FRONTEND (fetch) Cache dcache_req_i_t

branch unit condition

A condition is implemented in the branch unit to ensure that ZCMT instructions always cause a misprediction, forcing the program to jump to the calculated address of the newly constructed jump instruction.

JVT CSR

A new JVT csr is implemented in csr_reg which holds the base address of the JVT table. The base address is fetched from the JVT CSR, and combined with the index value to calculate the effective address.

No MMU

Embedded platform does not utilize the MMU, so zcmt_decoder is connected with cache through port 0 of the Dcache module for implicit read access from the memory.

zcmt_block drawio
Figure. 1 High level block diagram of ZCMT extension implementation

Known Limitations

The implementation targets 32-bit instructions for embedded-class platforms without an MMU. Since the core does not utilize an MMU, it is leveraged to connect the zcmt_decoder to the cache via port 0.

Testing and Verification

  • Developed directed test cases to validate cm.jt and cm.jalt instruction functionality
  • Verified correct initialization and updates of JVT CSR

Test Plan

A test plan is developed to test the functionality of ZCMT extension along with JVT CSR. Directed Assembly test executed to check the functionality.

Table. 2 Test plan

S.no Features Description Pass/Fail Criteria Test Type Test status
1 cm.jt Simple assembly test to validate the working of cm.jt instruction in  CV32A60x. Check against Spike's ref. model Directed Pass
2 cm.jalt Simple assembly test to validate the working of cm.jalt instruction in both CV32A60x. Check against Spike's ref. model Directed Pass
3 cm.jalt with return address stack Simple assembly test to validate the working of cm.jalt instruction with return address stack in both CV32A60x. It works as jump and link ( j ra, imm) Check against Spike's ref. model Directed Pass
4 JVT CSR Read and write base address of Jump table to JVT CSR Check against Spike's ref. model Directed Pass

Note: Please find the test under CVA6_REPO_DIR/verif/tests/custom/zcmt"

@JeanRochCoulon
Copy link
Contributor

@yanicasa

@JeanRochCoulon
Copy link
Contributor

@ASintzoff

Copy link
Contributor

❌ failed run, report available here.

@fatimasaleem
Copy link
Contributor

fatimasaleem commented Dec 10, 2024

Hi @JeanRochCoulon how can we know which line in the code this spyglass failure refers to? Some inputs to instance are not driven or unconnected

@JeanRochCoulon
Copy link
Contributor

@ASintzoff do you know how to help ?

Copy link
Contributor

❌ failed run, report available here.

2 similar comments
Copy link
Contributor

❌ failed run, report available here.

Copy link
Contributor

❌ failed run, report available here.

core/id_stage.sv Outdated Show resolved Hide resolved
core/id_stage.sv Outdated Show resolved Hide resolved
Copy link
Contributor

❌ failed run, report available here.

2 similar comments
Copy link
Contributor

❌ failed run, report available here.

Copy link
Contributor

❌ failed run, report available here.

core/compressed_decoder.sv Outdated Show resolved Hide resolved
core/id_stage.sv Outdated Show resolved Hide resolved
Copy link
Contributor

❌ failed run, report available here.

2 similar comments
Copy link
Contributor

❌ failed run, report available here.

Copy link
Contributor

❌ failed run, report available here.

Copy link
Contributor

❌ failed run, report available here.

@fatimasaleem
Copy link
Contributor

@ASintzoff do you know how to help ?

@JeanRochCoulon
@farhan-108 found the issue and resolved it. Now only failure is due to the gate count increase.

@ASintzoff
Copy link
Contributor

@ASintzoff do you know how to help ?

@JeanRochCoulon @farhan-108 found the issue and resolved it. Now only failure is due to the gate count increase.

is the extension completely optional? All the RTL added for Zcmt should be removed when the extension is not set. If not, it can increase the gate count.

core/csr_regfile.sv Outdated Show resolved Hide resolved
core/csr_regfile.sv Outdated Show resolved Hide resolved
core/csr_regfile.sv Outdated Show resolved Hide resolved
core/csr_regfile.sv Outdated Show resolved Hide resolved
core/csr_regfile.sv Outdated Show resolved Hide resolved
core/csr_regfile.sv Outdated Show resolved Hide resolved
core/csr_regfile.sv Outdated Show resolved Hide resolved
core/issue_read_operands.sv Outdated Show resolved Hide resolved
Copy link
Contributor

❌ failed run, report available here.

1 similar comment
Copy link
Contributor

❌ failed run, report available here.

core/include/cv32a6_ima_sv32_fpga_config_pkg.sv Outdated Show resolved Hide resolved
core/include/cv32a6_imac_sv0_config_pkg.sv Outdated Show resolved Hide resolved
core/include/cv32a6_imac_sv32_config_pkg.sv Outdated Show resolved Hide resolved
core/include/cv32a6_imafc_sv32_config_pkg.sv Outdated Show resolved Hide resolved
core/include/cv64a6_imafdc_sv39_config_pkg.sv Outdated Show resolved Hide resolved
core/include/cv64a6_imafdc_sv39_hpdcache_config_pkg.sv Outdated Show resolved Hide resolved
core/include/cv64a6_imafdc_sv39_openpiton_config_pkg.sv Outdated Show resolved Hide resolved
core/include/cv64a6_imafdc_sv39_wb_config_pkg.sv Outdated Show resolved Hide resolved
core/include/cv64a6_imafdch_sv39_config_pkg.sv Outdated Show resolved Hide resolved
core/include/cv64a6_imafdch_sv39_wb_config_pkg.sv Outdated Show resolved Hide resolved
);
end
if (CVA6Cfg.RVZCMP) begin
if (CVA6Cfg.RVZCMP || (CVA6Cfg.RVZCMT & ~CVA6Cfg.MmuPresent)) begin //MMU should be off when using ZCMT
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I bet 2 cents that the gate count increase comes from this condition: decoder_macro and zcmt_decoder are both inferred by the same if condition. To me, decoder_macro depends on zcmp and zcmt_decoder depends on zcmt.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JeanRochCoulon almost ~1k gate count is increased in issue_stage, which I think is because we added one signal in scoreboard struct.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest we increase the expected_synth gate count. What do you think, @JeanRochCoulon?

Copy link
Contributor

❌ failed run, report available here.

@farhan-108
Copy link
Author

farhan-108 commented Dec 17, 2024

Thank you for your contribution. Here are some suggestions.

By the way, do you think it would be feasible to make it compatible with superscalar?

Also, there is still the area increase which is really impacting even when it is disabled at elaboration time.

@cathales Currently, it is not compatible with a superscalar architecture because it stalls the pipeline in the decode stage until the implicit fetch is completed. Compatibility can be addressed in a future update alongside the ZCMP extension.

core/decoder.sv Outdated Show resolved Hide resolved
core/include/config_pkg.sv Outdated Show resolved Hide resolved
Comment on lines 93 to 94
resolved_branch_o.is_mispredict = 1'b1; // miss prediction for ZCMT
resolved_branch_o.cf_type = ariane_pkg::Jump;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could send a JumpR instead of Jump. It would update BTB in the frontend to correctly predict the destination next time.
We could then calculate is_mispredict accordingly, to use this prediction when we later encounter the same cm.jt or cm.jalt.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cathales
I have incorporated the changes

end
TABLE_JUMP: begin
if (req_port_i.data_rvalid) begin
jump_addr = $unsigned($signed(req_port_i.data_rdata) - $signed(pc_i));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because of this limitation, this implementation is not compliant with Zcmt, is it?

Copy link
Contributor

❌ failed run, report available here.

8 similar comments
Copy link
Contributor

❌ failed run, report available here.

Copy link
Contributor

❌ failed run, report available here.

Copy link
Contributor

❌ failed run, report available here.

Copy link
Contributor

❌ failed run, report available here.

Copy link
Contributor

❌ failed run, report available here.

Copy link
Contributor

❌ failed run, report available here.

Copy link
Contributor

❌ failed run, report available here.

Copy link
Contributor

❌ failed run, report available here.

core/zcmt_decoder.sv Outdated Show resolved Hide resolved
Copy link
Contributor

❌ failed run, report available here.

1 similar comment
Copy link
Contributor

❌ failed run, report available here.

Copy link
Contributor

❌ failed run, report available here.

@JeanRochCoulon
Copy link
Contributor

Hello @farhan-108
Happy New Year !
I reviewed the PR. Can you update the expected gate count value, rebase and status about comments?
Cheers

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants