What I'd like to have is a separate BranchPredictionUnit (BPU) with its own pipeline that streams requests into the fetch unit. Each request essentially contains an address, the number of bytes to fetch, and the termination type (taken branch, return, none, etc.). Requests could be one per cacheline (i.e. if the predictor predicts across a cacheline boundary, it'll split the prediction into multiple requests). The fetch unit queues these requests, and flow control on this interface is managed with credits.
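To make the request format concrete, here is a minimal sketch of how a prediction could be split into one request per cacheline. All names (`FetchRequest`, `Termination`, `splitByCacheline`) and the 64-byte line size are assumptions for illustration, not existing code, and attaching the termination type only to the last piece is one possible choice.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical names throughout; none of this is taken from the codebase.
enum class Termination { None, TakenBranch, Return };

struct FetchRequest {
    uint64_t addr;      // start address of this request
    uint32_t bytes;     // bytes to fetch; never crosses a cacheline
    Termination term;   // how the fetch block ends
};

// Split one prediction [addr, addr + bytes) into one request per cacheline.
// Only the final piece carries the predicted termination type (an assumption).
std::vector<FetchRequest> splitByCacheline(uint64_t addr, uint32_t bytes,
                                           Termination term,
                                           uint32_t line_bytes = 64)
{
    // The line size must be a non-zero power of two for the mask below.
    assert(line_bytes != 0 && (line_bytes & (line_bytes - 1)) == 0);

    std::vector<FetchRequest> out;
    while (bytes > 0) {
        const uint32_t offset = static_cast<uint32_t>(addr & (line_bytes - 1));
        const uint32_t room = line_bytes - offset;          // bytes left in this line
        const uint32_t chunk = bytes < room ? bytes : room;
        const bool last = (chunk == bytes);
        out.push_back({addr, chunk, last ? term : Termination::None});
        addr += chunk;
        bytes -= chunk;
    }
    return out;
}
```

For example, a 40-byte prediction starting at 0x1030 would become a 16-byte request at 0x1030 followed by a 24-byte request at 0x1040.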
Once the fetch unit has a request queued, and enough credits downstream, it'll forward the address to the instruction cache.
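The credit handling on both sides of the fetch queue could look roughly like the sketch below. It is a plain C++ model and does not use the simulator's port classes; `FetchQueue` and its methods are hypothetical names, and returning a BPU credit as soon as an entry is popped is an assumed policy.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

// Hypothetical sketch: a fetch-side queue fed by the BPU under credit control.
class FetchQueue {
public:
    FetchQueue(uint32_t depth, uint32_t icache_credits)
        : bpu_credits_(depth), icache_credits_(icache_credits) {}

    // Credits the BPU currently holds; it may send only while this is > 0.
    uint32_t bpuCredits() const { return bpu_credits_; }

    // BPU -> Fetch: sending a request consumes one BPU credit.
    void receive(uint64_t addr) {
        assert(bpu_credits_ > 0 && "BPU sent a request without a credit");
        --bpu_credits_;
        queue_.push_back(addr);
    }

    // Fetch -> ICache: forward the oldest request if a downstream credit exists.
    // Popping the entry frees a slot, so one credit goes back to the BPU.
    bool forward(uint64_t& addr_out) {
        if (queue_.empty() || icache_credits_ == 0) {
            return false;
        }
        addr_out = queue_.front();
        queue_.pop_front();
        --icache_credits_;
        ++bpu_credits_;
        return true;
    }

    // ICache -> Fetch: the cache returns a credit once it can accept more.
    void icacheCreditReturn() { ++icache_credits_; }

private:
    std::deque<uint64_t> queue_;
    uint32_t bpu_credits_;
    uint32_t icache_credits_;
};
```

With this shape the BPU never needs to know the state of the instruction cache; it only sees its own credit count.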
Hit responses from the cache go back to the fetch unit, where they're broken into instructions and paired with the prediction metadata given in the original BPU request. Some BTB miss detection happens here; misses or misfetches redirect the BPU.
I'm not completely sure how to handle mapping the instructions from the trace onto the data read, handling alignment, unexpected change of flow, malformed traces etc. I've tried to illustrate an idea on how it could be done, but I'm open to other suggestions.
Some further details on the ICache
What I propose is that we change the existing MemoryAccessInfo class to be generic enough so that it can be used as a memory transaction.
Fetch creates the MemoryAccessInfo request and sets the address and size. It's sent to the ICache, which has a simple pipeline.
The cache lookup is performed, and the original request is updated with hit/miss info and returned to Fetch.
Misses allocate an MSHR entry, which then propagates the request to the L2.
Once the L2 responds, the MemoryAccessInfo is sent back to Fetch through the same ICacheResp port, triggering Fetch to continue.
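The flow above can be sketched as a toy model. `MemTxn` is a hypothetical stand-in for a generalised MemoryAccessInfo (the real class has a different interface), `ToyICache` has no timing, associativity, or replacement, and its pending list is only a placeholder for real MSHR entries.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a generalised MemoryAccessInfo.
enum class CacheState { NoAccess, Hit, Miss };

struct MemTxn {
    uint64_t addr = 0;
    uint32_t size = 0;
    CacheState state = CacheState::NoAccess;
};

class ToyICache {
public:
    explicit ToyICache(uint32_t line_bytes = 64) : line_bytes_(line_bytes) {
        assert(line_bytes_ != 0);
    }

    // Lookup: update the original request with hit/miss info in place.
    // A miss is also recorded as outstanding (this is where the request
    // would be propagated to the L2).
    void lookup(MemTxn& txn) {
        const uint64_t line = txn.addr / line_bytes_;
        if (lines_.count(line) != 0) {
            txn.state = CacheState::Hit;
        } else {
            txn.state = CacheState::Miss;
            pending_.push_back(txn);
        }
    }

    // L2 response: fill the line and return the waiting requests as hits,
    // as they would be sent back to Fetch on the same response port.
    std::vector<MemTxn> l2Response(uint64_t addr) {
        const uint64_t line = addr / line_bytes_;
        lines_.insert(line);
        std::vector<MemTxn> done;
        std::vector<MemTxn> still_waiting;
        for (MemTxn t : pending_) {
            if (t.addr / line_bytes_ == line) {
                t.state = CacheState::Hit;
                done.push_back(t);
            } else {
                still_waiting.push_back(t);
            }
        }
        pending_.swap(still_waiting);
        return done;
    }

private:
    uint32_t line_bytes_;
    std::unordered_set<uint64_t> lines_;   // lines currently present
    std::vector<MemTxn> pending_;          // outstanding misses
};
```

The point of the sketch is that Fetch sees the same transaction object twice on a miss: once marked as a miss, and again as a hit after the fill.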
There are a few technical details that still need to be hashed out:
Handling misaligned requests (page/cacheline/block crossing, i.e. how to split a MemoryAccessInfo into multiple downstream requests)
Miss on full MSHR
MSHR hits
Only filling on demand requests
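For the MSHR items in the list above, one possible policy is sketched here: a miss to a line that already has an entry merges with it (no second L2 request), and a miss when every entry is in use is rejected so the requester has to replay it. This is an assumption for discussion only, and `MshrFile` and `MshrResult` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical outcome of presenting a miss to the MSHR file.
enum class MshrResult { Allocated, Merged, Full };

class MshrFile {
public:
    explicit MshrFile(std::size_t entries, uint32_t line_bytes = 64)
        : capacity_(entries), line_bytes_(line_bytes) {
        assert(capacity_ > 0 && line_bytes_ != 0);
    }

    MshrResult onMiss(uint64_t addr) {
        const uint64_t line = addr / line_bytes_;
        // MSHR hit: the line is already being fetched, so merge.
        if (std::find(lines_.begin(), lines_.end(), line) != lines_.end()) {
            return MshrResult::Merged;
        }
        // Miss on a full MSHR file: reject, the requester must replay.
        if (lines_.size() == capacity_) {
            return MshrResult::Full;
        }
        lines_.push_back(line);
        return MshrResult::Allocated;
    }

    // The fill from the L2 frees the entry tracking that line.
    void onFill(uint64_t addr) {
        const uint64_t line = addr / line_bytes_;
        lines_.erase(std::remove(lines_.begin(), lines_.end(), line),
                     lines_.end());
    }

private:
    std::size_t capacity_;
    uint32_t line_bytes_;
    std::vector<uint64_t> lines_;   // line numbers with an outstanding miss
};
```

Only the `Allocated` case would send a request to the L2; the other two outcomes differ in whether the requester waits for the existing fill or retries later.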
I've identified a few changes needed to the current codebase before the ICache work can start.
Add address/size to MemoryAccessInfo
Update L2Cache to use MemoryAccessInfo class instead of InstPtr
This might also help the prefetch work
…hes and MSS (#144)
As part of #143 use MemoryAccessInfo as the standard transaction type
for making memory requests outside of the core, instead of instructions.
This paves the way for adding an instruction cache, as well as prefetching
and more complicated memory subsystems where requests do not always
correspond to a particular instruction.
Changes are minor, mostly just renaming.
Co-authored-by: Daniel Bone <daniel.bone@imgtec.com>
On the instruction cache side, I'd like to add functionality for:
bank interleaving and set interleaving
This feature should enable the fetch unit to send read requests that cross either a subword boundary, or a cacheline boundary.
My plan for implementing this is to introduce a new parameter in fetch that controls the number of read ports on the instruction cache, then loop the fetch instruction code to make multiple requests per cycle.
Within the instruction cache, new parameters will be added that enable bank interleaving and set interleaving. Lookup requests will be picked from the request queue such that multiple requests can be handled per clock if they target different banks/sets.
The fetch unit won't know about the interleaving configuration, and will just wait for the responses to come back.
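As a rough illustration of the per-cycle picking for bank interleaving, the sketch below selects at most one queued request per bank, oldest first. The bank mapping (cacheline index modulo the number of banks), the function name `pickForCycle`, and letting younger requests bypass an older one that is blocked on a bank conflict are all assumptions, not settled design.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <vector>

// Hypothetical arbitration: walk the request queue in age order and pick at
// most one request per bank for this cycle. Picked requests are removed from
// the queue; conflicting ones stay queued for a later cycle.
std::vector<uint64_t> pickForCycle(std::deque<uint64_t>& queue,
                                   uint32_t num_banks,
                                   uint32_t line_bytes = 64)
{
    assert(num_banks > 0 && line_bytes != 0);

    std::vector<bool> busy(num_banks, false);
    std::vector<uint64_t> picked;
    for (auto it = queue.begin(); it != queue.end();) {
        // Assumed mapping: consecutive cachelines go to consecutive banks.
        const uint32_t bank =
            static_cast<uint32_t>((*it / line_bytes) % num_banks);
        if (!busy[bank]) {
            busy[bank] = true;
            picked.push_back(*it);
            it = queue.erase(it);   // erase returns the next element
        } else {
            ++it;                   // bank conflict: leave it queued
        }
    }
    return picked;
}
```

Set interleaving could reuse the same loop with a different index function, and the read-port parameter in fetch would cap how many requests arrive in the queue per cycle.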
Add prefetching port
No official plans on how it would be used, but I imagine it'll be connected to the branch predictor.
Add kill port
Simple port that enables the fetch unit to squash scheduled replays on pending misses.
As the branch prediction API is coming to a close, I'd like to propose adding a decoupled frontend with an L1 instruction cache.
I did cover some of this with @arupc on Monday.