[WIP] Vector interface for a Zve32x/Zvl4096b/baremetal vector implementation #3317

Draft
wants to merge 10 commits into
base: dev

ZenithalHourlyRate
Contributor

Related issue:

Type of change: feature

Impact: API modification

Development Phase: proposal

Intro
This is mainly for sequencer/vector, a Zve32x vector implementation (adaptable to Zve64x by changing parameters) aimed at long vectors (e.g. Zvl1024b/Zvl4096b). By design it only supports bare-metal machines, i.e. the vector coprocessor cannot handle virtual memory. That vector implementation has not been frozen yet, and this WIP interface will evolve with it.

That implementation is not a full-fledged implementation of the V extension, so the interface might be incomplete.

The plan is to upstream that implementation to chipsalliance once it is frozen and verified.

Design
The new vector interface is similar to the RoCC interface. The main differences are the CSR interface and the memory interface. Before describing the interface, the design of the vector implementation itself needs some explanation.

Instruction issue:

The main idea of the vector coprocessor is to use long vectors and to chain enough vector instructions together that the ramp-up/ramp-down time is saved.

So in this interface, vector instructions are issued aggressively (there is an instruction queue in the vector part). Only some instructions write back to a GPR (e.g. vmv.x.s), so the main pipeline only waits in those cases, using the scoreboard and the OoO writeback mechanism.
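The issue policy above can be sketched as a small behavioral model. This is only an illustration in Python, not the RTL: the class and method names (`VecInst`, `IssueModel`, `core_must_stall`) are hypothetical, and the model assumes in-order retirement from the queue and that the core checks every register of its next instruction against the scoreboard.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class VecInst:
    name: str
    rd: Optional[int] = None  # GPR written back by the vector unit, if any


class IssueModel:
    """Hypothetical model: fire-and-forget issue plus a GPR scoreboard."""

    def __init__(self):
        self.queue = deque()    # instruction queue inside the vector part
        self.busy_gprs = set()  # GPRs still waiting for a vector writeback

    def issue(self, inst):
        # The core hands the instruction over without waiting for it to finish.
        if inst.rd is not None:
            self.busy_gprs.add(inst.rd)
        self.queue.append(inst)

    def core_must_stall(self, regs):
        # The core stalls only if its next instruction touches a pending GPR.
        return any(r in self.busy_gprs for r in regs)

    def retire_one(self):
        # Assumption: the vector unit retires in order from the queue head.
        inst = self.queue.popleft()
        if inst.rd is not None:
            self.busy_gprs.discard(inst.rd)
        return inst
```

In this sketch, issuing `vadd.vv` never stalls the core, while issuing `vmv.x.s` with destination `x10` makes any later use of `x10` wait until that instruction retires.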

Memory:

Historically, there have been sketches for a vector coprocessor that connects to the L1D (so it can handle virtual memory).

This design instead connects directly to the SBus for higher bandwidth, since the L2 is banked. Coherence is maintained by the inclusive L2 cache.

To preserve memory ordering, a counter tracks whether there are memory commands in flight in the coprocessor. If there are, the main pipeline waits until the coprocessor finishes executing them.
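The ordering counter can be pictured with the following behavioral sketch. It is a Python illustration under assumptions, not the actual logic: the names (`MemOrderCounter`, `core_must_wait`) are hypothetical, and the description does not say which scalar instructions observe the stall, so the model only exposes the condition itself.

```python
class MemOrderCounter:
    """Hypothetical model of the in-flight vector memory command counter."""

    def __init__(self):
        self.inflight = 0  # vector memory commands issued but not finished

    def vector_mem_issued(self):
        # Called when a vector load/store is handed to the coprocessor.
        self.inflight += 1

    def vector_mem_done(self):
        # Called when the coprocessor reports that a memory command finished.
        if self.inflight == 0:
            raise RuntimeError("completion without a matching issue")
        self.inflight -= 1

    def core_must_wait(self):
        # The main pipeline waits while any vector memory command is pending.
        return self.inflight > 0
```

A single counter is enough here because the core only needs to know whether anything is outstanding, not which command it is.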

CSR:

As instructions are issued aggressively, the state of the vector CSRs has to be queued along with each instruction.

When a csrr instruction is encountered and earlier vector instructions could still affect the state of that CSR, the main pipeline waits for them. This is tracked by another counter.

Further details will be documented once the RTL is frozen.
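The two CSR mechanisms above (a snapshot queued with each instruction, plus a counter guarding csrr) can be sketched together. This is a Python illustration only; `VecCsrState`, `CsrModel`, and the `may_write_csr` flag are hypothetical names, the snapshot is reduced to `vl` and `vtype`, and the model assumes the core itself holds the current `vl`/`vtype` values at issue time.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class VecCsrState:
    vl: int
    vtype: int


class CsrModel:
    """Hypothetical model: per-instruction CSR snapshots and a csrr guard."""

    def __init__(self):
        self.current = VecCsrState(vl=0, vtype=0)
        self.queue = deque()      # entries: (name, snapshot, may_write_csr)
        self.pending_writers = 0  # queued instructions that may change a CSR

    def set_vl(self, vl, vtype):
        # Assumption: the core updates its own copy immediately.
        self.current = VecCsrState(vl=vl, vtype=vtype)

    def issue(self, name, may_write_csr=False):
        # The snapshot travels with the instruction, so later CSR updates
        # cannot change how an already-queued instruction executes.
        self.queue.append((name, self.current, may_write_csr))
        if may_write_csr:
            self.pending_writers += 1

    def retire_one(self):
        name, snapshot, may_write_csr = self.queue.popleft()
        if may_write_csr:
            self.pending_writers -= 1
        return name, snapshot

    def csrr_must_wait(self):
        # csrr waits while a queued instruction could still change the CSR.
        return self.pending_writers > 0
```

For example, a saturating instruction such as `vsadd.vv` may set `vxsat`, so in this sketch it would be issued with `may_write_csr=True` and a following csrr would wait until it retires.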

Release Notes

Add vector interface for a Zve32x/Zvl4096b/baremetal vector implementation
