[WIP] Vector interface for a Zve32x/Zvl4096b/baremetal vector implementation #3317

ZenithalHourlyRate · 2023-03-23T00:19:35Z

Related issue:

Type of change: feature

Impact: API modification

Development Phase: proposal

Intro
This interface is mainly intended for sequencer/vector, a Zve32x vector implementation (adaptable to Zve64x by changing parameters) targeting long vectors (e.g. Zvl1024b/Zvl4096b). By design it supports only bare-metal machines, i.e. the vector coprocessor cannot handle virtual memory. That vector implementation has not been frozen yet, and this WIP interface will evolve along with it.

That implementation is not a full-fledged implementation of the V extension, so the interface might be incomplete.

The implementation is intended to be upstreamed to chipsalliance once it is frozen and verified.

Design
The new vector interface is similar to the RoCC interface; the main differences are the CSR interface and the memory interface. Before describing the interface, it helps to describe the design of the vector implementation itself.

Instruction issue:

The main idea of the vector coprocessor is to use long vectors and to chain enough of them that the ramp-up/ramp-down time is amortized.

Vector instructions are therefore issued aggressively (the vector side has an instruction queue). Since only a few instructions write back to a GPR (e.g. vmv.x.s), the main pipeline waits only in those cases, using the scoreboard and the out-of-order writeback mechanism.
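The issue policy above can be illustrated with a small behavioral model in Python. This is a sketch under stated assumptions, not the RTL: the class and method names, the queue depth of 8, and the per-register pending count are all hypothetical. It shows only the idea that vector instructions enter a queue without waiting, and that a scalar instruction stalls only when it reads a GPR that a queued vector instruction (such as vmv.x.s) has yet to write back.

```python
from collections import Counter, deque
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class VecInst:
    name: str
    rd: Optional[int] = None  # GPR destination; set only for ops like vmv.x.s


class AggressiveIssueModel:
    """Hypothetical, cycle-free model of aggressive vector issue.

    Vector instructions go into a queue without waiting for the coprocessor.
    Only instructions that write a GPR create a scoreboard entry, and only
    scalar instructions that read such a register have to wait.
    """

    def __init__(self, queue_depth: int = 8):
        self.queue = deque()
        self.queue_depth = queue_depth  # assumed depth, not taken from the RTL
        self.pending = Counter()  # GPR index -> number of outstanding vector writes

    def issue_vector(self, inst: VecInst) -> bool:
        if len(self.queue) >= self.queue_depth:
            return False  # in this model, a full queue is the only thing that blocks vector issue
        self.queue.append(inst)
        if inst.rd is not None:
            self.pending[inst.rd] += 1
        return True

    def can_issue_scalar(self, srcs: Iterable[int]) -> bool:
        # Scoreboard check: stall only on a read-after-write hazard with a
        # pending vector writeback (Counter returns 0 for unknown registers).
        return all(self.pending[r] == 0 for r in srcs)

    def retire_vector(self) -> Optional[VecInst]:
        # The coprocessor finishes the oldest instruction; a GPR result is
        # written back through a separate writeback path and clears its entry.
        if not self.queue:
            return None
        inst = self.queue.popleft()
        if inst.rd is not None:
            self.pending[inst.rd] -= 1
        return inst
```

For example, after queuing vadd.vv and then vmv.x.s with rd=5, a scalar instruction reading x6 can issue immediately, while one reading x5 waits until the vmv.x.s has retired. Counting pending writes per register, rather than keeping a single bit, keeps the entry set when two queued instructions target the same GPR.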

Memory:

Historically, there have been sketches of a vector coprocessor that connects to the L1D (so that it can handle virtual memory).

This design instead connects directly to the SBus for higher bandwidth, since the L2 is banked. Coherence is maintained by the inclusive L2 cache.

To preserve memory ordering, a counter tracks whether there are memory instructions in flight in the coprocessor. If there are, the main pipeline waits until the coprocessor has finished executing them.
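The ordering counter can be sketched as follows. This is a hypothetical behavioral model, not the RTL: the names are invented, and it assumes that it is the scalar memory instructions that consult the counter, whereas the text only says that the main pipeline waits while memory instructions are in flight in the coprocessor.

```python
class MemOrderCounter:
    """Hypothetical model of the counter that preserves scalar/vector memory order.

    Assumption: only scalar loads/stores check the counter before issuing.
    """

    def __init__(self) -> None:
        self.inflight = 0  # vector memory instructions issued but not yet completed

    def on_vector_issue(self, is_mem: bool) -> None:
        # Only vector loads/stores are counted; arithmetic does not affect ordering.
        if is_mem:
            self.inflight += 1

    def on_vector_mem_complete(self) -> None:
        if self.inflight == 0:
            raise RuntimeError("memory completion without a matching issue")
        self.inflight -= 1

    def scalar_mem_must_wait(self) -> bool:
        # Conservative: any in-flight vector memory instruction blocks scalar
        # memory accesses, regardless of the addresses involved.
        return self.inflight > 0
```

A single counter is conservative because it ignores addresses, but it avoids comparing scalar addresses against every queued vector memory instruction.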

CSR:

Because instructions are issued aggressively, the state of the vector CSRs must be queued along with each instruction.
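One way to picture "queuing the CSR state with the instruction" is a frontend that keeps its own view of vl/vtype and tags each queued vector instruction with a snapshot of it. The sketch below is hypothetical: the names are invented, the initial state is a placeholder, and it assumes the scalar side applies vset* results itself (it takes the resulting vl directly instead of computing it from AVL and VLMAX).

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class VecCsrState:
    vl: int
    vtype: int  # raw vtype encoding: vlmul in bits [2:0], vsew in bits [5:3]


@dataclass(frozen=True)
class QueuedVecInst:
    name: str
    csr: VecCsrState  # snapshot captured when the instruction is issued


class VecIssueFrontend:
    """Hypothetical model: the scalar side tracks vl/vtype and tags every
    queued vector instruction with the state it must execute under."""

    def __init__(self) -> None:
        self.state = VecCsrState(vl=0, vtype=0)  # placeholder initial state
        self.queue = deque()

    def vset(self, new_vl: int, new_vtype: int) -> None:
        # Simplification: the caller passes the already-computed vl.
        # Later instructions see the new state without waiting for the
        # coprocessor to drain.
        self.state = VecCsrState(vl=new_vl, vtype=new_vtype)

    def issue(self, name: str) -> None:
        # VecCsrState is immutable, so storing a reference is a true snapshot.
        self.queue.append(QueuedVecInst(name, self.state))
```

For example, issuing vadd.vv after setting vl=16 (e32, m1, i.e. vtype 0b010_000) and then vmul.vv after setting vl=8 leaves two queue entries that carry vl=16 and vl=8 respectively, so the coprocessor never has to consult the current scalar-side value.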

When a csrr instruction reads a CSR whose state could be affected by earlier in-flight vector instructions, the main pipeline waits for those instructions to complete. This is tracked by a second counter.
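The second counter can be modeled in the same style. This sketch is hypothetical: which instructions count as CSR writers and which CSRs a csrr must wait for are assumptions chosen to illustrate the idea. It uses two real cases from the V specification, saturating fixed-point instructions that may set vxsat (also visible through vcsr) and fault-only-first loads that may reduce vl; the writer list is not exhaustive.

```python
# Hypothetical, non-exhaustive classification of instructions that may change
# vector CSR state after issue: saturating fixed-point ops may set vxsat, and
# fault-only-first loads may reduce vl.
CSR_WRITER_MNEMONICS = frozenset({
    "vsadd", "vsaddu", "vssub", "vssubu", "vsmul", "vnclip", "vnclipu",
    "vle8ff", "vle16ff", "vle32ff",
})

# CSRs whose value a csrr may observe changing because of in-flight instructions.
MUTABLE_VECTOR_CSRS = frozenset({"vxsat", "vcsr", "vl"})


class CsrHazardCounter:
    """Hypothetical model of the counter that stalls csrr behind CSR writers."""

    def __init__(self) -> None:
        self.inflight = 0  # in-flight instructions that may write vector CSRs

    @staticmethod
    def may_write_csr(mnemonic: str) -> bool:
        # Strip the operand-form suffix, e.g. "vsadd.vv" -> "vsadd".
        return mnemonic.split(".")[0] in CSR_WRITER_MNEMONICS

    def on_issue(self, mnemonic: str) -> None:
        if self.may_write_csr(mnemonic):
            self.inflight += 1

    def on_retire(self, mnemonic: str) -> None:
        if self.may_write_csr(mnemonic):
            self.inflight -= 1

    def csrr_must_wait(self, csr: str) -> bool:
        # A read of a constant CSR such as vlenb never needs to wait.
        return csr in MUTABLE_VECTOR_CSRS and self.inflight > 0
```

Keeping one shared counter rather than one per CSR matches the "another counter" in the text, at the cost of occasionally stalling a csrr vl behind a saturating add that can only affect vxsat.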

Further details will be documented once the RTL is frozen.

Release Notes

Add vector interface for a Zve32x/Zvl4096b/baremetal vector implementation

instead of rocc

ZenithalHourlyRate force-pushed the vector-coproc-clean branch from ea0c82d to 4dcb235 on March 31, 2023 17:13

ZenithalHourlyRate added 10 commits April 26, 2023 18:36

Add Non-FP Vector Instructions for RoCC dispatching

76cbc62

IDecode: scalar part for vector instructions

d63298c

Stage

7d2ce5b

Fix MVV decode

387a944

Directly hook rocc and core

acfb673

Stall main pipeline for mem/csr

662ac9d

Decode vector sub class

0554556

Implement vset

3bf9629

Vector coprocessor interface

d9be8ed

instead of rocc

Make vLen a CDE param

2a4526d

ZenithalHourlyRate force-pushed the vector-coproc-clean branch from 4dcb235 to 2a4526d on April 26, 2023 18:37
