[WIP] Vector interface for a Zve32x/Zvl4096b/baremetal vector implementation #3317
Related issue:
Type of change: feature
Impact: API modification
Development Phase: proposal
Intro
This is mainly for sequencer/vector, a Zve32x vector implementation (adaptable to Zve64x by changing parameters) aimed at long vectors (e.g. Zvl1024b/Zvl4096b). By design it only supports bare-metal machines (i.e. the vector coprocessor cannot handle virtual memory). That vector implementation has not been frozen, and this WIP interface will evolve with it.
That implementation is not a full-fledged implementation of the V extension, so the interface might be incomplete.
We are willing to upstream that implementation to chipsalliance once it is frozen and verified.
Design
The new vector interface is similar to the RoCC interface. The main differences are the CSR interface and the memory interface. Before describing the interface, the design of the vector implementation itself needs some explanation.
Instruction issue:
The main idea of the vector coprocessor is to use long vectors and chain enough of them that the ramp-up/ramp-down time can be saved.
So in this interface, vector instructions are issued aggressively (there is an instruction queue in the vector part). Since only some instructions write back to a GPR (e.g. vmv.x.s), the main pipeline only waits in these cases (via the scoreboard and the OoO writeback mechanism).
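To make the issue policy above concrete, here is a minimal behavioral sketch in Python. It is not the RTL and not the actual interface: the names `VectorInst`, `IssueModel`, `queue_depth` and `scalar_must_stall` are hypothetical, and the queue depth and the stall on a write-after-write hazard are assumptions made for illustration.

```python
from collections import deque
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class VectorInst:
    """Hypothetical record for one issued vector instruction."""
    name: str
    gpr_dest: Optional[int] = None  # GPR written back (e.g. vmv.x.s), else None


class IssueModel:
    """Behavioral model: aggressive issue plus a scoreboard for GPR writebacks."""

    def __init__(self, queue_depth: int = 8):  # depth is an assumed parameter
        self.queue = deque()
        self.queue_depth = queue_depth
        self.busy = set()  # GPRs with a pending vector writeback

    def issue(self, inst: VectorInst) -> bool:
        """Try to issue; return False if the main pipeline has to retry."""
        if len(self.queue) >= self.queue_depth:
            return False  # instruction queue is full
        if inst.gpr_dest is not None:
            if inst.gpr_dest in self.busy:
                return False  # assumed: stall on a pending write to the same GPR
            self.busy.add(inst.gpr_dest)
        self.queue.append(inst)
        return True

    def scalar_must_stall(self, src_regs: Iterable[int]) -> bool:
        """A scalar instruction waits only if it reads a pending GPR."""
        return any(r in self.busy for r in src_regs)

    def retire_one(self) -> Optional[VectorInst]:
        """The coprocessor finishes its oldest instruction and writes back."""
        if not self.queue:
            return None
        inst = self.queue.popleft()
        if inst.gpr_dest is not None:
            self.busy.discard(inst.gpr_dest)
        return inst
```

In this model a vadd.vv never blocks the main pipeline, while a vmv.x.s into x10 blocks only later readers of x10 until it retires.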
Memory:
Historically, there have been sketches for a vector coprocessor that connects to the L1D (so it can handle virtual memory).
This design instead connects directly to the SBus for higher bandwidth, as the L2 is banked. Coherence is maintained by the inclusive L2 cache.
To preserve memory ordering, a counter tracks whether there are memory commands in the coprocessor. If so, the main pipeline waits until the coprocessor finishes executing them.
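The counter-based ordering described above can be sketched as follows. This is a behavioral illustration only: the class and method names are hypothetical, and which main-pipeline instructions are actually gated by the counter is not specified here, so the sketch exposes just the stall condition.

```python
class MemoryOrderCounter:
    """Counts vector memory commands that are still in the coprocessor."""

    def __init__(self) -> None:
        self.outstanding = 0

    def vector_mem_issued(self) -> None:
        """Called when a vector load/store is sent to the coprocessor."""
        self.outstanding += 1

    def vector_mem_done(self) -> None:
        """Called when the coprocessor reports a memory command as finished."""
        if self.outstanding == 0:
            raise RuntimeError("completion without a matching issue")
        self.outstanding -= 1

    def pipeline_must_wait(self) -> bool:
        """The main pipeline waits while any vector memory command is pending."""
        return self.outstanding > 0
```

A plain counter is enough for this purpose because the pipeline only needs to know whether anything is pending, not which addresses are involved.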
CSR:
As instructions are issued aggressively, the state of the vector CSRs should be queued along with the instruction.
When there are csrr instructions, if previous vector instructions could affect the state of the CSR, the main pipeline will wait for them. This is maintained by another counter.
Further details will be documented once that RTL is frozen.
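As a rough illustration of the CSR handling described above, the following Python sketch queues a CSR snapshot with every issued instruction and counts in-flight instructions that may change CSR state. All names (`VCsrState`, `CsrTracker`, `may_write_csr`) are hypothetical, the snapshot is reduced to vl and vtype for brevity, and the vtype values used are arbitrary placeholders rather than real encodings.

```python
from collections import deque
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class VCsrState:
    """Assumed subset of the vector CSR state captured at issue time."""
    vl: int
    vtype: int


class CsrTracker:
    """Queues CSR snapshots and counts in-flight instructions that may write CSRs."""

    def __init__(self) -> None:
        self.queue = deque()   # entries: (name, snapshot, may_write_csr)
        self.csr_writers = 0   # the "another counter" from the description

    def issue(self, name: str, state: VCsrState, may_write_csr: bool = False) -> None:
        """Send an instruction together with the CSR state it must execute under."""
        self.queue.append((name, state, may_write_csr))
        if may_write_csr:
            self.csr_writers += 1

    def complete_oldest(self) -> Tuple[str, VCsrState]:
        """The coprocessor finishes its oldest instruction."""
        name, state, may_write_csr = self.queue.popleft()
        if may_write_csr:
            self.csr_writers -= 1
        return name, state

    def csrr_must_wait(self) -> bool:
        """A csrr waits while an earlier instruction could still change a CSR."""
        return self.csr_writers > 0
```

For example, a saturating add such as vsadd.vv can set vxsat, so it would be issued with `may_write_csr=True` and a later csrr would wait until it completes, while an instruction issued before a vsetvli still executes with the vl snapshot it was queued with.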
Release Notes
Add vector interface for a Zve32x/Zvl4096b/baremetal vector implementation