diff --git a/README.md b/README.md index f2af48d..fce4ad8 100644 --- a/README.md +++ b/README.md @@ -31,12 +31,10 @@ The HPDcache is an open-source High-Performance, Multi-requester, Out-of-Order L ## Documentation -The HPDcache specification document can be found in the *docs/hpdcache_spec_document* folder. -It is written in LaTeX. -You cand find pre-compiled PDF documents in *docs/hpdcache_spec_document/release*. +The HPDcache User Guide document can be found in the *docs* folder. +It is written in reStructuredText format. -If you need to recompile the specification document, a dedicated *Makefile* is in the specification folder. -This *Makefile* needs the *latexmk* command-line tool (included in most common LaTeX distributions) and the *inkscape* tool to convert SVG images into PDF. +If you need to compile the User Guide document, a dedicated *Makefile* is in the *docs* folder. ## Licensing diff --git a/docs/LICENSE b/docs/LICENSE new file mode 100644 index 0000000..279d97a --- /dev/null +++ b/docs/LICENSE @@ -0,0 +1,97 @@ +Solderpad Hardware License v2.1 + +This license operates as a wraparound license to the Apache License +Version 2.0 (the “Apache License”) and incorporates the terms and +conditions of the Apache License (which can be found here: +http://apache.org/licenses/LICENSE-2.0), with the following additions and +modifications. It must be read in conjunction with the Apache License. +Section 1 below modifies definitions and terminology in the Apache +License and Section 2 below replaces Section 2 of the Apache License. +The Appendix replaces the Appendix in the Apache License. You may, at +your option, choose to treat any Work released under this license as +released under the Apache License (thus ignoring all sections written +below entirely). + +1. Terminology in the Apache License is supplemented or modified as +follows: + +“Authorship”: any reference to ‘authorship’ shall be taken to read +“authorship or design”. 
+ +“Copyright owner”: any reference to ‘copyright owner’ shall be taken to +read “Rights owner”. + +“Copyright statement”: the reference to ‘copyright statement’ shall be +taken to read ‘copyright or other statement pertaining to Rights’. + +The following new definition shall be added to the Definitions section of +the Apache License: + +“Rights” means copyright and any similar right including design right +(whether registered or unregistered), rights in semiconductor +topographies (mask works) and database rights (but excluding Patents and +Trademarks). + +The following definitions shall replace the corresponding definitions in +the Apache License: + +“License” shall mean this Solderpad Hardware License version 2.1, being +the terms and conditions for use, manufacture, instantiation, adaptation, +reproduction, and distribution as defined by Sections 1 through 9 of this +document. + +“Licensor” shall mean the owner of the Rights or entity authorized by the +owner of the Rights that is granting the License. + +“Derivative Works” shall mean any work, whether in Source or Object form, +that is based on (or derived from) the Work and for which the editorial +revisions, annotations, elaborations, or other modifications represent, +as a whole, an original work of authorship or design. For the purposes of +this License, Derivative Works shall not include works that remain +reversibly separable from, or merely link (or bind by name) or physically +connect to or interoperate with the Work and Derivative Works thereof. 
+ +“Object” form shall mean any form resulting from mechanical +transformation or translation of a Source form or the application of a +Source form to physical material, including but not limited to compiled +object code, generated documentation, the instantiation of a hardware +design or physical object or material and conversions to other media +types, including intermediate forms such as bytecodes, FPGA bitstreams, +moulds, artwork and semiconductor topographies (mask works). + +“Source” form shall mean the preferred form for making modifications, +including but not limited to source code, net lists, board layouts, CAD +files, documentation source, and configuration files. + +“Work” shall mean the work of authorship or design, whether in Source or +Object form, made available under the License, as indicated by a notice +relating to Rights that is included in or attached to the work (an +example is provided in the Appendix below). + +2. Grant of License. Subject to the terms and conditions of this License, +each Contributor hereby grants to You a perpetual, worldwide, +non-exclusive, no-charge, royalty-free, irrevocable license under the +Rights to reproduce, prepare Derivative Works of, make, adapt, repair, +publicly display, publicly perform, sublicense, and distribute the Work +and such Derivative Works in Source or Object form and do anything in +relation to the Work as if the Rights did not exist. + +APPENDIX + +Copyright 2023 CEA* +*Commissariat a l'Energie Atomique et aux Energies Alternatives (CEA) + +SPDX-License-Identifier: Apache-2.0 WITH SHL-2.1 + +Licensed under the Solderpad Hardware License v 2.1 (the “License”); you +may not use this file except in compliance with the License, or, at your +option, the Apache License version 2.0. 
You may obtain a copy of the +License at + +https://solderpad.org/licenses/SHL-2.1/ + +Unless required by applicable law or agreed to in writing, any work +distributed under the License is distributed on an “AS IS” BASIS, WITHOUT +WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the +License for the specific language governing permissions and limitations +under the License. diff --git a/docs/Makefile b/docs/Makefile new file mode 100644 index 0000000..d0c3cbf --- /dev/null +++ b/docs/Makefile @@ -0,0 +1,20 @@ +# Minimal makefile for Sphinx documentation +# + +# You can set these variables from the command line, and also +# from the environment for the first two. +SPHINXOPTS ?= +SPHINXBUILD ?= sphinx-build +SOURCEDIR = source +BUILDDIR = build + +# Put it first so that "make" without argument is like "make help". +help: + @$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) + +.PHONY: help Makefile + +# Catch-all target: route all unknown targets to Sphinx using the new +# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS). +%: Makefile + @$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) diff --git a/docs/README.md b/docs/README.md new file mode 100644 index 0000000..82d581f --- /dev/null +++ b/docs/README.md @@ -0,0 +1,49 @@ +# Build instructions + +The documents in this directory are written in reStructuredText and compiled to HTML using Sphinx. For more information, check https://www.sphinx-doc.org/en/master/usage/restructuredtext/index.html. + +## Prerequisites + +This section outlines the necessary steps to build the document on Linux (tested on Debian-based distributions). + +Sphinx is based on Python and requires at least version 3.8. Additionally, `make` is required and can be installed through build-essential. 
+ +```bash +sudo apt update +sudo apt install python3 +sudo apt install build-essential +``` + +Please verify your Python version using + +```bash +python3 --version +``` + +Sphinx requires certain packages to build these documents. These are listed in `requirements.txt`. They can be installed using + +```bash +pip install -r requirements.txt +``` + +## Building the documents + +The build is invoked via the `make` command. Typically, the HTML output is built. + +```bash +make html +``` + +A secondary build target is PDF. To build the PDF, additional prerequisites need to be met. To install `pdflatex`, run + +```bash +sudo apt-get install texlive-latex-base +``` + +A PDF document can then be built using the command + +```bash +make latexpdf +``` + +Simply type `make` to view the other available targets. diff --git a/docs/hpdcache_spec_document/.gitignore b/docs/old/.gitignore similarity index 100% rename from docs/hpdcache_spec_document/.gitignore rename to docs/old/.gitignore diff --git a/docs/hpdcache_spec_document/Makefile b/docs/old/Makefile similarity index 100% rename from docs/hpdcache_spec_document/Makefile rename to docs/old/Makefile diff --git a/docs/hpdcache_spec_document/latexmkrc b/docs/old/latexmkrc similarity index 100% rename from docs/hpdcache_spec_document/latexmkrc rename to docs/old/latexmkrc diff --git a/docs/hpdcache_spec_document/release/hpdcache_spec-1.0.0-draft.pdf b/docs/old/release/hpdcache_spec-1.0.0-draft.pdf similarity index 100% rename from docs/hpdcache_spec_document/release/hpdcache_spec-1.0.0-draft.pdf rename to docs/old/release/hpdcache_spec-1.0.0-draft.pdf diff --git a/docs/hpdcache_spec_document/source/hpdcache_spec.bib b/docs/old/source/hpdcache_spec.bib similarity index 100% rename from docs/hpdcache_spec_document/source/hpdcache_spec.bib rename to docs/old/source/hpdcache_spec.bib diff --git a/docs/hpdcache_spec_document/source/hpdcache_spec.tex b/docs/old/source/hpdcache_spec.tex similarity index 100% rename from
docs/hpdcache_spec_document/source/hpdcache_spec.tex rename to docs/old/source/hpdcache_spec.tex diff --git a/docs/hpdcache_spec_document/source/hpdcache_spec_changelog.tex b/docs/old/source/hpdcache_spec_changelog.tex similarity index 100% rename from docs/hpdcache_spec_document/source/hpdcache_spec_changelog.tex rename to docs/old/source/hpdcache_spec_changelog.tex diff --git a/docs/hpdcache_spec_document/source/hpdcache_spec_preamble.tex b/docs/old/source/hpdcache_spec_preamble.tex similarity index 100% rename from docs/hpdcache_spec_document/source/hpdcache_spec_preamble.tex rename to docs/old/source/hpdcache_spec_preamble.tex diff --git a/docs/hpdcache_spec_document/source/images/exported/wave_back_to_back.svg b/docs/old/source/images/exported/wave_back_to_back.svg similarity index 100% rename from docs/hpdcache_spec_document/source/images/exported/wave_back_to_back.svg rename to docs/old/source/images/exported/wave_back_to_back.svg diff --git a/docs/hpdcache_spec_document/source/images/exported/wave_ready_before_valid.svg b/docs/old/source/images/exported/wave_ready_before_valid.svg similarity index 100% rename from docs/hpdcache_spec_document/source/images/exported/wave_ready_before_valid.svg rename to docs/old/source/images/exported/wave_ready_before_valid.svg diff --git a/docs/hpdcache_spec_document/source/images/exported/wave_ready_when_valid.svg b/docs/old/source/images/exported/wave_ready_when_valid.svg similarity index 100% rename from docs/hpdcache_spec_document/source/images/exported/wave_ready_when_valid.svg rename to docs/old/source/images/exported/wave_ready_when_valid.svg diff --git a/docs/hpdcache_spec_document/source/images/exported/wave_valid_before_ready.svg b/docs/old/source/images/exported/wave_valid_before_ready.svg similarity index 100% rename from docs/hpdcache_spec_document/source/images/exported/wave_valid_before_ready.svg rename to docs/old/source/images/exported/wave_valid_before_ready.svg diff --git 
a/docs/hpdcache_spec_document/source/images/hpdcache_core.svg b/docs/old/source/images/hpdcache_core.svg similarity index 100% rename from docs/hpdcache_spec_document/source/images/hpdcache_core.svg rename to docs/old/source/images/hpdcache_core.svg diff --git a/docs/hpdcache_spec_document/source/images/hpdcache_data_ram_organization.svg b/docs/old/source/images/hpdcache_data_ram_organization.svg similarity index 100% rename from docs/hpdcache_spec_document/source/images/hpdcache_data_ram_organization.svg rename to docs/old/source/images/hpdcache_data_ram_organization.svg diff --git a/docs/hpdcache_spec_document/source/images/hpdcache_request_address_data_alignment.svg b/docs/old/source/images/hpdcache_request_address_data_alignment.svg similarity index 100% rename from docs/hpdcache_spec_document/source/images/hpdcache_request_address_data_alignment.svg rename to docs/old/source/images/hpdcache_request_address_data_alignment.svg diff --git a/docs/hpdcache_spec_document/source/images/hpdcache_request_arbiter.svg b/docs/old/source/images/hpdcache_request_arbiter.svg similarity index 100% rename from docs/hpdcache_spec_document/source/images/hpdcache_request_arbiter.svg rename to docs/old/source/images/hpdcache_request_arbiter.svg diff --git a/docs/hpdcache_spec_document/source/images/wave_back_to_back.json b/docs/old/source/images/wave_back_to_back.json similarity index 100% rename from docs/hpdcache_spec_document/source/images/wave_back_to_back.json rename to docs/old/source/images/wave_back_to_back.json diff --git a/docs/hpdcache_spec_document/source/images/wave_ready_before_valid.json b/docs/old/source/images/wave_ready_before_valid.json similarity index 100% rename from docs/hpdcache_spec_document/source/images/wave_ready_before_valid.json rename to docs/old/source/images/wave_ready_before_valid.json diff --git a/docs/hpdcache_spec_document/source/images/wave_ready_when_valid.json b/docs/old/source/images/wave_ready_when_valid.json similarity index 100% rename 
from docs/hpdcache_spec_document/source/images/wave_ready_when_valid.json rename to docs/old/source/images/wave_ready_when_valid.json diff --git a/docs/hpdcache_spec_document/source/images/wave_valid_before_ready.json b/docs/old/source/images/wave_valid_before_ready.json similarity index 100% rename from docs/hpdcache_spec_document/source/images/wave_valid_before_ready.json rename to docs/old/source/images/wave_valid_before_ready.json diff --git a/docs/hpdcache_spec_document/supplement/download_wavedrom.sh b/docs/old/supplement/download_wavedrom.sh similarity index 100% rename from docs/hpdcache_spec_document/supplement/download_wavedrom.sh rename to docs/old/supplement/download_wavedrom.sh diff --git a/docs/hpdcache_spec_document/version b/docs/old/version similarity index 100% rename from docs/hpdcache_spec_document/version rename to docs/old/version diff --git a/docs/requirements.txt b/docs/requirements.txt new file mode 100644 index 0000000..cbf1e36 --- /dev/null +++ b/docs/requirements.txt @@ -0,0 +1,2 @@ +sphinx +sphinx-rtd-theme diff --git a/docs/source/_static/theme_overrides.css b/docs/source/_static/theme_overrides.css new file mode 100644 index 0000000..7c514af --- /dev/null +++ b/docs/source/_static/theme_overrides.css @@ -0,0 +1,9 @@ +/* override table width restrictions */ +.wy-table-responsive table td, .wy-table-responsive table th { + white-space: normal; +} +.wy-table-responsive { + margin-bottom: 24px; + max-width: 100%; + overflow: visible; +} diff --git a/docs/source/amo.rst b/docs/source/amo.rst new file mode 100644 index 0000000..21b3ec3 --- /dev/null +++ b/docs/source/amo.rst @@ -0,0 +1,202 @@ +.. + Copyright 2024 CEA* + *Commissariat a l'Energie Atomique et aux Energies Alternatives (CEA) + + SPDX-License-Identifier: Apache-2.0 WITH SHL-2.1 + + Licensed under the Solderpad Hardware License v 2.1 (the “License”); you + may not use this file except in compliance with the License, or, at your + option, the Apache License version 2.0. 
You may obtain a copy of the + License at + + https://solderpad.org/licenses/SHL-2.1/ + + Unless required by applicable law or agreed to in writing, any work + distributed under the License is distributed on an “AS IS” BASIS, WITHOUT + WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the + License for the specific language governing permissions and limitations + under the License. + + Authors : Cesar Fuguet + Description : HPDcache Atomic Memory Operations (AMOs) + +.. _sec_amo: + +Atomic Memory Operations (AMOs) +=============================== + +Background +---------- + +AMOs are special load/store accesses that implement a read-modify-write +semantic. A single instruction reads data from memory, performs an +arithmetic/logical operation on that data, and stores the result. All of this +is performed as a single, indivisible operation (no other operation can +interleave with the read-modify-write sequence). + +These operations are meant for synchronization in multi-core environments. To +enable this synchronization, AMOs need to be performed at the PoS +(Point-of-Serialization), the point where all accesses from the different cores +converge. This is usually a shared cache memory (when multiple levels of cache +are implemented) or the external RAM controllers. Thus, the HPDcache needs to +forward these operations to the PoS through the NoC interface. + +Supported AMOs +-------------- + +On the interface from requesters, the supported AMOs are those listed in +:numref:`Table %s `. They are the ones defined +in the atomic (A) extension of the RISC-V ISA specification [RISCVUP2019]_. + +Implementation +-------------- + +If :math:`\scriptsize\mathsf{CONF\_HPDCACHE\_SUPPORT\_AMO}` is set to 0, the +HPDcache does not support AMOs from requesters. +See :ref:`sec_uncacheable_handler` for more details. + +When a requester performs an AMO operation, the HPDcache needs to forward it to +the PoS.
It is only at the PoS that AMOs from different caches converge, and thus +where the correct result can be computed in case of multiple, simultaneous, +address-colliding AMOs. If a cache holds a replica of the target address, +that replica becomes obsolete because the modification is performed at the PoS +(the cache obsolescence problem). This issue shall be solved through a hardware +(or software) cache coherency protocol. + +To provide a consistent view of the data, the HPDcache updates the local replica +based on the result from the memory. The procedure is as follows: forward the +AMO to the PoS, and wait for the response carrying the old data. If the target +address is replicated in the HPDcache, the HPDcache computes the new value +locally based on the data from the AMO request and the old data from the memory +(not the data of the local replica). Then it updates the replica. This allows +the core to have a consistent view with regard to its own operations +(single-thread consistency). However, a cache coherency protocol (hardware or +software) is still required to ensure coherency in multi-core systems. + +The HPDcache handles AMOs as non-allocating operations. That is, AMOs never +fetch a replica of the target cacheline from the memory into the cache. If the +target cacheline IS NOT replicated in the cache, the AMO modifies ONLY the +memory. If the target cacheline IS replicated in the cache, the AMO modifies +BOTH the memory and the cache. + + +AMO ordering +------------ + +As specified in the RISC-V ISA specification [RISCVUP2019]_, the base RISC-V ISA +has a relaxed memory model. To provide additional ordering constraints, AMOs +(including LR/SC) specify two bits, *aq* and *rl*, for *acquire* and *release* +semantics. + +The HPDcache ignores the *aq* and *rl* bits and considers them always set. +Hence, the HPDcache handles AMOs as sequentially consistent memory operations.
The +HPDcache waits for all pending read and write operations to complete before +serving the AMO request. + +This behavior implies that when the HPDcache forwards an AMO to the NoC, it is +the only pending request from the HPDcache. In addition, no new requests from +the requesters are served until the AMO is completed. + +LR/SC support +------------- + +LR and SC are part of the Atomic (A) extension of the RISC-V ISA specification +[RISCVUP2019]_. These instructions allow *"complex atomic operations on a single +memory word or double-word"*. + +The HPDcache fully supports all the instructions of the A extension of the +RISC-V ISA, including LR and SC operations. + +In the specification of these instructions in the RISC-V ISA document, some +details are implementation-dependent: namely, the size of the reservation set +and the return code of an SC failure. + +LR/SC reservation set +~~~~~~~~~~~~~~~~~~~~~ + +When a requester executes an LR operation, it "reserves" a set of bytes in +memory. This set contains at least the bytes solicited in the request but may +contain more. The RISC-V ISA defines two sizes for LR operations: 4 bytes or 8 +bytes. **The HPDcache reserves the 8 bytes (double-word) containing the +addressed memory location, regardless of whether the LR size is 4 or 8 bytes**. +The start address of the reservation set is an 8-byte-aligned address. + +When the LR size is 8 bytes, the address is also aligned to 8 bytes. In this +case, the reservation set matches exactly the address interval defined in the +request. When the LR size is 4 bytes, there are two possibilities: + +#. the target address is not aligned to 8 bytes: the reservation set contains +an additional 4 bytes before the target address; + +#. the target address is aligned to 8 bytes: the reservation set starts at the +target address but contains an additional 4 bytes after the requested ones.
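The alignment rule above can be sketched in C. This is an illustrative model, not part of the HPDcache RTL, and the function name `lr_reservation_start` is hypothetical:

```c
#include <stdint.h>

/* Compute the start of the reservation set covering an LR target address.
 * The set is always the naturally aligned 8-byte double-word containing
 * the address, whatever the LR access size (4 or 8 bytes).
 * The (exclusive) end of the set is start + 8. */
uint64_t lr_reservation_start(uint64_t addr)
{
    return (addr / 8) * 8; /* round down to an 8-byte boundary */
}
```

For example, a 4-byte LR at address `0x1004` reserves the range `[0x1000, 0x1008)`: the 4 bytes below the target address are included because the target is not 8-byte aligned.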
+ +In summary, in the case of an LR operation, the reservation set address range +is computed as follows: + +.. math:: + + \small\mathbf{reservation\_set =} + \begin{cases} + \mathsf{(\lfloor{}HPDCACHE\_REQ\_ADDR / 8\rfloor{} \times 8)} & + (\text{start address}) \\ + \mathsf{(\lfloor{}HPDCACHE\_REQ\_ADDR / 8\rfloor{} \times 8) + 8} & + (\text{end address}) \\ + \end{cases} + +**When a requester executes an SC operation, the HPDcache forwards the operation +to the memory ONLY IF the bytes addressed by the SC are part of an active +reservation set**. If the SC accesses fewer bytes than those in the active +reservation set, but all of them fall within that set, the SC is still +forwarded to the memory. + +After an SC operation, the active reservation set, if any, is invalidated. This +happens regardless of whether the SC operation succeeds or not. + +.. admonition:: Caution + :class: caution + + The HPDcache keeps a single active reservation set. If multiple requesters + perform LR operations, the active reservation set is the one specified + by the last LR operation. + + +The HPDcache also invalidates the active reservation set when there is an +address-colliding STORE operation. If a STORE access from any requester writes +one or more bytes within the active reservation set, the latter is invalidated. + + +SC failure response code +~~~~~~~~~~~~~~~~~~~~~~~~ + +The RISC-V ISA [RISCVUP2019]_ specifies that when an SC operation succeeds, the +core shall write zero into the destination register of the operation. Otherwise, +in case of an SC failure, the core shall write a non-zero value into the +destination register. + +The HPDcache returns the status of an SC operation in the ``core_rsp_o.rdata`` +signal of the response interface to requesters. The following table specifies +the values returned by the HPDcache in the ``core_rsp_o.rdata`` signal for an +SC operation. + +.. list-table:: + :widths: 30 30 + :header-rows: 1 + :align: center + + * - **Case** + - **Return value** + * - SC Success + - :math:`\small\mathsf{0x0000\_0000}` + * - SC Failure + - :math:`\small\mathsf{0x0000\_0001}` + +Depending on the size specified in the request (``core_req_i.size``), the +returned value is zero-extended in the most significant bits. That is, if +the SC request size is 8 bytes, and the SC is a failure, then the returned value +is :math:`\small\mathsf{0x0000\_0000\_0000\_0001}`. + +In addition, if the :math:`\small\mathsf{CONF\_HPDCACHE\_REQ\_DATA\_WIDTH}` +width is wider than the size of the SC request, the return value is replicated +:math:`\small\mathsf{CONF\_HPDCACHE\_REQ\_WORDS}` times. + diff --git a/docs/source/architecture.rst b/docs/source/architecture.rst new file mode 100644 index 0000000..0db47b0 --- /dev/null +++ b/docs/source/architecture.rst @@ -0,0 +1,1295 @@ +.. + Copyright 2024 CEA* + *Commissariat a l'Energie Atomique et aux Energies Alternatives (CEA) + + SPDX-License-Identifier: Apache-2.0 WITH SHL-2.1 + + Licensed under the Solderpad Hardware License v 2.1 (the “License”); you + may not use this file except in compliance with the License, or, at your + option, the Apache License version 2.0. You may obtain a copy of the + License at + + https://solderpad.org/licenses/SHL-2.1/ + + Unless required by applicable law or agreed to in writing, any work + distributed under the License is distributed on an “AS IS” BASIS, WITHOUT + WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the + License for the specific language governing permissions and limitations + under the License. + + Authors : Cesar Fuguet + Description : HPDcache Architecture + +Architecture +============ + +:numref:`Figure %s ` depicts a global view of +the HPDcache (the core partition, which does not include the request +arbitration). On the upper part of the cache there is the interface from/to the +requesters.
On the +bottom part there is the interface from/to the memory. + +.. _fig_architecture_hpdcache_core: + +.. figure:: images/hpdcache_core.* + :alt: HPDcache core + :align: center + :width: 100% + + HPDcache core + +Cache Controller +---------------- + +The cache controller is responsible for decoding and issuing the requests to the +appropriate handler. The cache controller implements a 3-stage pipeline. This +pipeline is capable of accepting one request per cycle. However, there are some +scenarios where the pipeline may either stall or put a request on hold in a side +buffer called the Replay Table (RTAB). + +The first stage (stage 0) of the pipeline arbitrates between requests from the +miss handler (refill), the RTAB, and the requesters; the second stage (stage 1) +responds to loads (in case of hit) and to stores; the third stage (stage 2) is +only used by loads in case of miss. In this last stage, the cache triggers a +read miss transaction to the memory and allocates a new entry in the Miss Status +Holding Register (MSHR) to track the progress of the miss transaction. + +A request in stage 0 can either be consumed on that cycle (forwarded to +stage 1) or stalled. A request in stage 1 or stage 2 always progresses. In stage +1 the request is either acknowledged (load hit or write acknowledgement), +forwarded to stage 2 (load miss), or put into the RTAB. In stage 2, the request +leaves the pipeline and is written into the MSHR. + +The arbiter in stage 0 uses a fixed-priority policy: refills have the highest +priority, followed by the RTAB, and finally core requests (lowest priority). + +Pipeline Stalls in Stage 0 +'''''''''''''''''''''''''' + +Stalls in stage 0 are necessary in some specific scenarios, which are listed +below. When there is a stall in stage 0, a new request from a requester cannot +be accepted; that is, the corresponding :math:`\mathsf{ready}` signal is kept +low (set to 0).
Requests in the other stages (1 and 2) are processed normally +(even in case of a stall in stage 0). + +.. list-table:: Events that Stall the Pipeline + :widths: 5 50 40 + :header-rows: 1 + :align: center + + * - Event + - Description + - Stall Latency (Clock Cycles) + * - **1** + - The RTAB is full + - It depends on when an entry of the RTAB is freed + * - **2** + - A CMO invalidation or fence operation is being processed by the + corresponding handler + - It depends on the latency of the operation + * - **3** + - An uncacheable or atomic operation is being processed by the + corresponding handler + - It depends on the latency of the operation + * - **4** + - There is a load miss in stage 1 + - One cycle + * - **5** + - There is a store in stage 1 and the request in stage 0 is a load + (structural hazard on access to the internal cache data memory) + - One cycle + +.. _sec_onhold: + +On-Hold Requests +'''''''''''''''' + +In some scenarios, a request that has been accepted in the pipeline can be +later put on-hold by the cache controller. The cache controller puts a request +on-hold by removing it from the cache pipeline and writing it into the Replay +Table (RTAB). When a request is put on-hold, it is re-executed when all the +blocking conditions have been removed. The blocking conditions putting a +request on-hold are the following: + +.. _tab_onhold: + +.. list-table:: Conditions Putting a Request On-hold + :widths: 3 37 60 + :header-rows: 1 + + * - # + - Condition + - Description + * - **1** + - **Cacheable LOAD or PREFETCH, and there is a hit on a pending miss (hit + on the MSHR)** + - When there is a read miss on a given cacheline for which there is a + pending read miss, then the more recent one needs to wait for the + previous one to be served. This allows the latest one to read the data + from the cache after the refill operation completes. More importantly, + this frees the pipeline to accept the corresponding refill and prevent a + deadlock. 
+ * - **2** + - **Cacheable LOAD or PREFETCH, there is a miss on the cache, and there is + a hit (cacheline granularity) on an opened, pending or sent entry of the + WBUF** + - When there is a read miss on an address, the cache controller needs to + read the missing cacheline from the memory. As the NoC implements + different physical channels for read and write requests, there is a race + condition between the read miss and a pending write operation. If the + read miss arrives at the memory first, it would read the old data (which + violates :ref:`sec_mcrs`). This blocking condition causes the LOAD + or PREFETCH to incur a delay penalty of up to two transaction delays: + one for the write to complete, then one for the read. + * - **3** + - **Cacheable STORE, there is a miss on the cache, and there is a hit on a + pending miss (hit on the MSHR)** + - When writing, as the NoC implements different physical channels for read + and write requests, there is a race condition between the STORE and the + pending read miss. If the STORE arrives at the memory first, the earlier + read miss would read the new data (which violates :ref:`sec_mcrs`). + * - **4** + - **Cacheable LOAD/PREFETCH/STORE, and there is a hit on an entry of the + RTAB** + - Accesses to the same cacheline SHALL be processed in order (to respect + :ref:`sec_mcrs`). In case of a hit with a valid entry in the RTAB, the + new request shall wait for previous requests on the same cacheline to + finish. + * - **5** + - **Cacheable LOAD or PREFETCH, there is a miss on the cache, and the MSHR + has no available slots** + - When there is a read miss on an address, the cache controller needs to + allocate a new entry in the MSHR. If there is no available entry, the + read request needs to wait for an entry in the MSHR to be freed. Putting + the request on hold frees the pipeline to accept the corresponding + refill and prevents a deadlock.
+ * - **6** + - **Cacheable LOAD or PREFETCH, there is a miss on the cache, and the miss + handler FSM cannot send the read miss request** + - When there is a read miss on an address, the cache controller needs to + read the missing cacheline from memory. The read miss request is sent by + the miss handler FSM, but if there is congestion in the NoC, this read + request cannot be issued. Putting the request on hold frees the pipeline + and prevents a potential deadlock. + +The cache controller checks all these conditions in the second stage (stage 1) +of the pipeline. If one of the conditions is met, the cache controller puts the +request into the RTAB and holds it there until its blocking condition is +resolved. At that moment, the cache can replay the request from the RTAB. + +The RTAB can store multiple on-hold requests. The idea is to improve the +throughput of the cache by reducing the number of cases where there is +head-of-line blocking at the client interface. As mentioned in +:numref:`Table %s `, this also prevents deadlocks. To guarantee that +the cache controller can always retire a request from the pipeline, it does not +accept new requests when the RTAB is full. + +Requests from the RTAB may be executed in an order that is different from +the order in which they were accepted (if they target different cachelines). +Requests that target the same cacheline are replayed by the RTAB in the +order in which they were accepted. + +Requests within the RTAB that have their dependencies resolved may be replayed. +These have higher priority than the new requests from requesters. + + +.. _sec_mcrs: + +Memory Consistency Rules (MCRs) +''''''''''''''''''''''''''''''' + +The cache controller processes requests following a set of Memory Consistency +Rules (MCRs). These rules give the requesters a predictable behavior. + +The MCRs respected by the cache controller are those defined by the RISC-V +Weak Memory Ordering (RVWMO) memory consistency model.
[RISCVUP2019]_ specifies
+this model. The following statement summarizes these rules: **if one memory
+access (read or write), A, precedes another memory access (read or write), B,
+and they access overlapping addresses, then they MUST be executed in program
+order (A then B)**. It can be deduced from this statement that non-overlapping
+accesses can be executed in any order.
+
+The cache controller also needs to respect the progress axiom: **no memory
+operation may be preceded by an infinite number of memory operations**. That
+is, all memory operations need to be processed at some point in time. They
+cannot wait indefinitely.
+
+
+Cache Directory and Data
+------------------------
+
+Replacement Policy
+''''''''''''''''''
+
+The HPDcache supports two replacement policies: Pseudo Random and Pseudo Least
+Recently Used (PLRU). The user selects the actual policy at synthesis-time
+through the
+:math:`\scriptsize\mathsf{CONF\_HPDCACHE\_VICTIM\_SEL}`
+configuration parameter (:numref:`Table %s `).
+
+The cache uses the selected policy to choose the victim way where a new
+cacheline is written. At the arrival of the response for a read miss request,
+the miss handler starts the refilling operations. During refill, the miss
+handler applies the replacement policy to select the way where it writes the
+new cacheline.
+
+
+Pseudo Least Recently Used (PLRU)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+This replacement policy requires one state bit per cacheline in the cache. This
+bit is named the Least Recently Used (LRU) state bit. All LRU bits are set to 0
+on reset. They are then updated at each read, store, or atomic operation from
+the requesters. They are also updated by refill operations.
+
+The following code snippet shows the declaration of the array containing the LRU
+bits. As explained before, there are as many bits as cachelines in the cache.
+Therefore, the LRU bits are organized as a two-dimensional array of
+:math:`\mathsf{CONF\_HPDCACHE\_SETS}` by
+:math:`\mathsf{CONF\_HPDCACHE\_WAYS}` bits.
+
+.. code:: c
+
+   // Two-dimensional array containing the LRU state bits
+   bool lru[CONF_HPDCACHE_SETS][CONF_HPDCACHE_WAYS];
+
+
+The following code snippet illustrates the algorithm (``update_plru`` function)
+that the cache controller uses to update LRU bits. This function is used by
+read, write or atomic requests from requesters, and also by the refill operation
+from the miss handler. In the case of requests from requesters, the cache
+controller first checks for a hit in any way of the set designated by the
+request address. If there is a hit, the cache controller applies the
+``update_plru`` algorithm on the corresponding set and way. In the case of a
+refill operation, the miss handler first selects a victim way, then applies the
+``update_plru`` algorithm.
+
+.. code:: c
+
+   void update_plru(int set, int way)
+   {
+       // set the LRU bit of the target set and way
+       lru[set][way] = true;
+
+       // check if all LRU bits of the target "set" contain 1
+       for (int w = 0; w < CONF_HPDCACHE_WAYS; w++) {
+           // If there is at least one 0, the update is done
+           if (!lru[set][w]) return;
+       }
+
+       // If all LRU bits are set to 1, reset to 0 the LRU bits of all the ways
+       // except the one being accessed
+       for (int w = 0; w < CONF_HPDCACHE_WAYS; w++) {
+           if (w != way) lru[set][w] = false;
+       }
+   }
+
+
+The following code snippet illustrates the algorithm (``select_victim_way``
+function) that the miss handler uses to select a victim way during a refill
+operation. In summary, the victim way is either the first way (starting from
+way 0) whose valid bit is 0, or the first way whose LRU bit is unset.
+
+.. code:: c
+
+   int select_victim_way(int set)
+   {
+       // Return the first way (of the target set) whose valid bit is unset
+       for (int w = 0; w < CONF_HPDCACHE_WAYS; w++) {
+           if (!valid[set][w]) {
+               return w;
+           }
+       }
+
+       // If all ways are valid, return the first way (of the target set) whose
+       // LRU bit is unset
+       for (int w = 0; w < CONF_HPDCACHE_WAYS; w++) {
+           if (!lru[set][w]) {
+               return w;
+           }
+       }
+
+       // This return statement should not be reached as there is always, at
+       // least, one LRU bit unset (refer to update_plru)
+       return -1;
+   }
+
+
+Pseudo Random
+~~~~~~~~~~~~~
+
+This replacement policy requires only one 8-bit Linear Feedback Shift Register
+(LFSR).
+
+Each time there is a refill operation, the miss handler selects either a free
+way (valid bit is set to 0), or the way designated by the value in the LFSR.
+Each time the miss handler uses the pseudo random value, it shifts the LFSR.
+
+This pseudo random policy has a lower area footprint than the PLRU policy
+because it only uses an 8-bit LFSR. The PLRU policy requires one bit per
+cacheline in the cache. However, some applications may exhibit lower
+performance with the pseudo random replacement policy as locality is not
+considered while selecting the victim.
+
+
+RAM Organization
+''''''''''''''''
+
+The HPDcache uses SRAM macros for the directory and data parts of the
+cache. These RAM macros are synchronous, read/write, single-port RAMs.
+
+The organization of the RAMs, for the directory and the data, targets the
+following:
+
+#. **High memory bandwidth to/from the requesters**
+
+   The organization allows reading 1, 2, 4, 8, 16, 32 or 64 bytes per cycle.
+   The maximum number of bytes per cycle is a configuration parameter of the
+   cache. Read latency is one cycle.
+
+#. **Low energy-consumption**
+
+   To limit the energy consumption, the RAMs are organized in a way that the
+   cache enables only a limited number of RAM macros. This number depends on
+   the number of requested bytes, and it also depends on the target technology.
+   Depending on the target technology, the RAM macros have different trade-offs
+   between width, depth and timing (performance).
+
+#. **Small RAM footprint**
+
+   To limit the footprint of RAMs, the selected organization seeks to implement
+   a small number of RAM macros. The macros are selected in a way that they are
+   as deep and as wide as possible. The selected ratios (depth and width)
+   depend on the target technology node.
+
+.. _sec_cache_ram_organization:
+
+RAM Organization Parameters
+'''''''''''''''''''''''''''
+
+The HPDcache provides a set of parameters to tune the organization of the SRAM
+macros. These parameters allow adapting the HPDcache to the Performance, Power
+and Area (PPA) requirements of the system.
+
+The cache directory and data are both implemented using SRAM macros.
+
+Cache Directory Parameters
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The cache directory contains the metadata that identifies the cachelines
+present in the cache.
+
+Each entry contains the following information:
+
+.. list-table::
+   :widths: 5 15 80
+   :header-rows: 1
+
+   * - Field
+     - Description
+     - Width (in bits)
+   * - V
+     - Valid
+     - :math:`\mathsf{1}`
+   * - T
+     - Cache Tag
+     - :math:`\mathsf{HPDCACHE\_NLINE\_WIDTH - HPDCACHE\_SET\_WIDTH}`
+
+The depth of the macros is:
+
+   :math:`\mathsf{CONF\_HPDCACHE\_SETS}`
+
+The width (in bits) of the macros is:
+
+   :math:`\mathsf{1 + T}` bits.
+
+Finally, the total number of SRAM macros for the cache directory is:
+
+   :math:`\text{SRAM macros} = \mathsf{CONF\_HPDCACHE\_WAYS}`
+
+
+.. admonition:: Possible Improvement
+   :class: note
+
+   Allow splitting sets across different RAMs, as for the cache data
+   (:math:`\mathsf{CONF\_HPDCACHE\_DATA\_SETS\_PER\_RAM}`).
+
+.. admonition:: Possible Improvement
+   :class: note
+
+   Allow putting ways side-by-side, as for the cache data
+   (:math:`\mathsf{CONF\_HPDCACHE\_DATA\_WAYS\_PER\_RAM\_WORD}`).
+
+Cache Data Parameters
+~~~~~~~~~~~~~~~~~~~~~
+
+The depth of the macros is:
+
+   :math:`\mathsf{\lceil\frac{CONF\_HPDCACHE\_CL\_WORDS}{CONF\_HPDCACHE\_ACCESS\_WORDS}\rceil{}\times{}CONF\_HPDCACHE\_DATA\_SETS\_PER\_RAM}`
+
+Multiple ways, for the same set, can be put side-by-side in the same SRAM word:
+
+   :math:`\mathsf{CONF\_HPDCACHE\_DATA\_WAYS\_PER\_RAM\_WORD}`
+
+The width (in bits) of the macros is:
+
+.. math::
+
+   \mathsf{CONF\_HPDCACHE\_DATA\_WAYS\_PER\_RAM\_WORD}\\
+   \mathsf{\times{}CONF\_HPDCACHE\_WORD\_WIDTH}
+
+Finally, the total number of SRAM macros is:
+
+.. math::
+
+   \mathsf{W} &= \mathsf{CONF\_HPDCACHE\_DATA\_WAYS}\\
+   \mathsf{WR} &= \mathsf{CONF\_HPDCACHE\_DATA\_WAYS\_PER\_RAM\_WORD}\\
+   \mathsf{S} &= \mathsf{CONF\_HPDCACHE\_DATA\_SETS}\\
+   \mathsf{SR} &= \mathsf{CONF\_HPDCACHE\_DATA\_SETS\_PER\_RAM}\\
+   \mathsf{A} &= \mathsf{CONF\_HPDCACHE\_ACCESS\_WORDS}\\
+   \text{SRAM macros} &= \mathsf{A{}\times{}\lceil\frac{W}{WR}\rceil{}\times{}\lceil\frac{S}{SR}\rceil}
+
+The :math:`\mathsf{CONF\_HPDCACHE\_ACCESS\_WORDS}` parameter defines the
+maximum number of words that can be read or written in the same clock cycle.
+This parameter affects both the refill latency and the maximum throughput
+(bytes/cycle) between the HPDcache and the requesters.
+
+Here the refill latency is defined as the number of clock cycles that the miss
+handler takes to write the cacheline data into the cache once the response of a
+read miss request arrives. It does not consider the latency to receive the data
+from the memory because this latency is variable and depends on the system.
The following formula determines the refill latency (in clock cycles) in the
+HPDcache:
+
+   :math:`\mathsf{max(2, \frac{CONF\_HPDCACHE\_CL\_WORDS}{CONF\_HPDCACHE\_ACCESS\_WORDS})}`
+
+The following formula determines the maximum throughput (bytes/cycle) between
+the HPDcache and the requesters:
+
+   :math:`\mathsf{\frac{CONF\_HPDCACHE\_ACCESS\_WORDS{}\times{}CONF\_HPDCACHE\_WORD\_WIDTH}{8}}`
+
+.. admonition:: Caution
+   :class: caution
+
+   The choice of the :math:`\mathsf{CONF\_HPDCACHE\_ACCESS\_WORDS}` parameter
+   is important. It has an impact on performance because it determines the
+   refill latency and the request throughput. It also has an impact on the area
+   because the depth of SRAM macros depends on this parameter. Finally, it has
+   an impact on the timing (thus performance) because the number of inputs of
+   the data word selection multiplexor (and therefore the number of logic
+   levels) also depends on this parameter.
+
+   As a rule of thumb, timing and area improve with smaller values of this
+   parameter, while latency and throughput improve with bigger values. The
+   designer needs to choose the right trade-off depending on the target system.
+
+
+Example cache data/directory RAM organization
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+:numref:`Figure %s <fig_data_ram_organization_example>` illustrates a possible
+organization of the RAMs. The illustrated organization implements 32 KB of data
+cache (128 sets, 4 ways and 64-byte cachelines). The corresponding parameters
+of this organization are the following:
+
+.. list-table::
+   :widths: 40 60
+   :header-rows: 1
+
+   * - **Parameter**
+     - **Value**
+   * - :math:`\mathsf{CONF\_HPDCACHE\_SETS}`
+     - 128
+   * - :math:`\mathsf{CONF\_HPDCACHE\_WAYS}`
+     - 4
+   * - :math:`\mathsf{CONF\_HPDCACHE\_WORD\_WIDTH}`
+     - 64
+   * - :math:`\mathsf{CONF\_HPDCACHE\_CL\_WORDS}`
+     - 8
+   * - :math:`\mathsf{CONF\_HPDCACHE\_ACCESS\_WORDS}`
+     - 4
+   * - :math:`\mathsf{CONF\_HPDCACHE\_DATA\_SETS\_PER\_RAM}`
+     - 128
+   * - :math:`\mathsf{CONF\_HPDCACHE\_DATA\_WAYS\_PER\_RAM\_WORD}`
+     - 2
+
+
+.. _fig_data_ram_organization_example:
+
+.. figure:: images/hpdcache_data_ram_organization.*
+   :alt: HPDcache RAM Organization Example
+   :align: center
+
+   HPDcache RAM Organization Example
+
+This example organization has the following characteristics:
+
+.. list-table::
+   :widths: 25 75
+   :header-rows: 0
+
+   * - **Refill Latency**
+     - Two clock cycles. The cache needs to write two different entries in a
+       given memory cut.
+   * - **Maximum Request Throughput (bytes/cycle)**
+     - 32 bytes/cycle. Requesters can read 1, 2, 4, 8, 16 or 32 bytes of a
+       given cacheline per cycle.
+   * - **Energy Consumption**
+     - It is proportional to the number of requested bytes. Accesses requesting
+       1 to 8 bytes need to read two memory cuts (one containing ways 0 and 1,
+       and the other containing ways 2 and 3); accesses requesting 9 to 16
+       bytes need to read 4 memory cuts; accesses requesting 17 to 24 bytes
+       need to read 6 memory cuts; and accesses requesting 25 to 32 bytes need
+       to access all the cuts at the same time (8 cuts).
+
+
+Miss Handler
+------------
+
+This block is in charge of processing read miss requests to the memory. It has
+three parts:
+
+#. One in charge of forwarding read miss requests to the memory
+
+#. One in charge of tracking the status of in-flight read misses
+
+#. One in charge of selecting a victim cacheline and updating the cache with
+   the response data from the memory
+
+Multiple-entry MSHR
+'''''''''''''''''''
+
+The miss handler contains an essential component of the HPDcache: the
+set-associative multi-entry MSHR.
This component is responsible for
+tracking the status of in-flight read miss requests to the memory. Each entry
+contains the status of one in-flight read miss request. Therefore, the number
+of entries in the MSHR defines the maximum number of in-flight read miss
+requests.
+
+The number of entries in the MSHR depends on two configuration values:
+:math:`\mathsf{CONF\_HPDCACHE\_MSHR\_WAYS}`
+and
+:math:`\mathsf{CONF\_HPDCACHE\_MSHR\_SETS}`.
+The number of entries is computed as follows:
+
+.. math:: \mathsf{CONF\_HPDCACHE\_MSHR\_SETS~\times~CONF\_HPDCACHE\_MSHR\_WAYS}
+
+The MSHR accepts the following configurations:
+
+.. list-table:: MSHR Configurations
+   :widths: 20 20 60
+   :header-rows: 1
+
+   * - MSHR Ways
+     - MSHR Sets
+     - Configuration
+   * - :math:`\mathsf{= 1}`
+     - :math:`\mathsf{= 1}`
+     - Single-Entry
+   * - :math:`\mathsf{> 1}`
+     - :math:`\mathsf{= 1}`
+     - Fully-Associative Array
+   * - :math:`\mathsf{= 1}`
+     - :math:`\mathsf{> 1}`
+     - Direct Access Array
+   * - :math:`\mathsf{> 1}`
+     - :math:`\mathsf{> 1}`
+     - Set-Associative Access Array
+
+
+A high number of entries in the MSHR allows overlapping multiple accesses to
+the memory, hiding its latency. Of course, the more entries there are, the more
+silicon area the MSHR takes. Therefore, the system architect must choose the
+MSHR parameters depending on a combination of memory latency, memory
+throughput, area and performance. The system architect must also consider
+the capability of requesters to issue multiple read transactions.
+
+An entry in the MSHR contains the following information:
+
+.. list-table::
+   :widths: 5 15 80
+   :header-rows: 1
+
+   * - Field
+     - Description
+     - Width (in bits)
+   * - T
+     - MSHR Tag
+     - :math:`\mathsf{HPDCACHE\_NLINE\_WIDTH - log_2(CONF\_HPDCACHE\_MSHR\_SETS)}`
+   * - R
+     - Request ID
+     - :math:`\mathsf{CONF\_HPDCACHE\_REQ\_TRANS\_ID\_WIDTH}`
+   * - S
+     - Source ID
+     - :math:`\mathsf{CONF\_HPDCACHE\_REQ\_SRC\_ID\_WIDTH}`
+   * - W
+     - Word Index
+     - :math:`\mathsf{log_2(CONF\_HPDCACHE\_CL\_WORDS)}`
+   * - N
+     - Need Response
+     - 1
+
+MSHR Implementation
+'''''''''''''''''''
+
+In order to limit the area cost, the MSHR can be implemented using SRAM
+macros. SRAM macros have a higher bit density than flip-flops.
+
+The depth of the macros is:
+
+   :math:`\mathsf{CONF\_HPDCACHE\_MSHR\_SETS\_PER\_RAM}`
+
+Multiple ways, for the same set, can be put side-by-side in the same SRAM word:
+
+   :math:`\mathsf{CONF\_HPDCACHE\_MSHR\_WAYS\_PER\_RAM\_WORD}`
+
+The width (in bits) of the macros is:
+
+   :math:`\mathsf{CONF\_HPDCACHE\_MSHR\_WAYS\_PER\_RAM\_WORD{}\times{}(T + R + S + W + 1)}`
+
+Finally, the total number of SRAM macros is:
+
+.. math::
+
+   \mathsf{W} &= \mathsf{CONF\_HPDCACHE\_MSHR\_WAYS}\\
+   \mathsf{WR} &= \mathsf{CONF\_HPDCACHE\_MSHR\_WAYS\_PER\_RAM\_WORD}\\
+   \mathsf{S} &= \mathsf{CONF\_HPDCACHE\_MSHR\_SETS}\\
+   \mathsf{SR} &= \mathsf{CONF\_HPDCACHE\_MSHR\_SETS\_PER\_RAM}\\
+   \text{SRAM macros} &= \mathsf{\lceil\frac{W}{WR}\rceil{}\times{}\lceil\frac{S}{SR}\rceil}
+
+SRAM macros shall be selected depending on the required number of entries and
+the target technology node.
+
+When the number of entries is low
+(e.g. :math:`\mathsf{MSHR\_SETS \times MSHR\_WAYS \le 16}`),
+it is generally better to implement the MSHR using flip-flops. In such
+configurations, the designer may use a fully-associative configuration to
+remove associativity conflicts.
+
+
+MSHR Associativity Conflicts
+''''''''''''''''''''''''''''
+The MSHR implements a set-associative organization.
In such an organization,
+the target "set" is designated by some bits of the cacheline address.
+
+If multiple in-flight read miss requests address different cachelines that map
+to the same "set", there is an associativity conflict. When this happens, the
+cache places each read miss request in a different "way" of the MSHR. However,
+if there is no available way, the request is put on hold (case 5 in
+:numref:`Table %s <tab_onhold>`).
+
+
+.. _sec_uncacheable_handler:
+
+Uncacheable Handler
+-------------------
+
+This block is responsible for processing uncacheable load and store requests
+(see :ref:`sec_req_cacheability`), as well as atomic requests (regardless of
+whether they are cacheable or not). For more information about atomic requests
+see :ref:`sec_amo`.
+
+If :math:`\scriptsize\mathsf{CONF\_HPDCACHE\_SUPPORT\_AMO}` is set to 0, the
+HPDcache does not support AMOs from requesters. In this case, the AMO has no
+effect and the Uncacheable Handler responds with an error to the corresponding
+requester (when ``core_req_i.need_rsp`` is set to 1). If ``core_req_i.need_rsp``
+is set to 0, the Uncacheable Handler ignores the AMO.
+
+All requests handled by this block produce a request to the memory. These
+memory requests are issued through the CMI interfaces. Uncacheable read
+requests are forwarded to the memory through the CMI read interface.
+Uncacheable write requests and atomic requests are forwarded through the CMI
+write interface.
+
+
+.. _sec_cmo_handler:
+
+Cache Management Operation (CMO) Handler
+----------------------------------------
+
+This block is responsible for handling CMOs. CMOs are special requests from
+requesters that address the cache itself, and not the memory or a peripheral.
+CMOs make it possible to invalidate or prefetch designated cachelines, or to
+produce explicit memory read and write fences.
+
+The complete list of supported CMOs is detailed in :ref:`sec_cmo`.
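
The way the Uncacheable Handler acknowledges unsupported AMOs (described above) can be sketched as a small decision function. This is a simplified model, not the RTL interface; the type and function names are illustrative:

```c
#include <stdbool.h>

// Hypothetical, simplified model of the acknowledgement rule for an
// unsupported operation (names are illustrative, not the RTL interface).
typedef struct {
    bool need_rsp; // models core_req_i.need_rsp
} req_t;

typedef enum { RSP_NONE, RSP_OK, RSP_ERROR } rsp_t;

// When the operation is not supported, it has no effect on the cache or
// the memory: respond with an error if the requester expects a response,
// otherwise silently ignore the request.
rsp_t handle_unsupported(req_t r)
{
    return r.need_rsp ? RSP_ERROR : RSP_NONE;
}
```

The same pattern applies to any request type whose synthesis-time support parameter is set to 0.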
+
+If :math:`\scriptsize\mathsf{CONF\_HPDCACHE\_SUPPORT\_CMO}` is set to 0, the
+HPDcache does not support CMOs from requesters. In this case, the CMO has no
+effect and the CMO Handler responds with an error to the corresponding requester
+(when ``core_req_i.need_rsp`` is set to 1). If ``core_req_i.need_rsp`` is set to
+0, the CMO Handler ignores the CMO.
+
+
+.. _sec_rtab:
+
+Replay Table (RTAB)
+-------------------
+
+The RTAB is implemented as an array of linked lists. It is a fully-associative
+multi-entry buffer where each valid entry belongs to a linked list. It is
+implemented in flip-flops. Each linked list contains requests that target the
+same cacheline. There can be multiple linked lists but each shall target a
+different cacheline. The head of each linked list contains the oldest request
+while the tail contains the newest request. The requests are processed from the
+head to the tail to respect the rules explained in section :ref:`sec_mcrs`.
+
+Regarding the pop operation (extracting a ready request from the replay table),
+it is possible that once the request is replayed, some of the resources it needs
+are busy again. In that case, the request needs to be put on-hold again, and it
+needs to keep its position as head of the linked list to preserve the program
+order. This is the reason why the pop operation is implemented as a two-step
+operation: pop then commit, or pop then rollback. The commit operation actually
+retires the request, while the rollback undoes the pop.
+
+An entry of the RTAB has the following structure (LL means Linked List):
+
+.. list-table::
+   :widths: 10 40 50
+   :header-rows: 1
+
+   * - **Field**
+     - **Description**
+     - **Width (in bits)**
+   * - Request
+     - Contains the on-hold request from the core (including data)
+     - :math:`\mathsf{\approx{}200}`
+   * - LL tail
+     - Indicates if the entry is the tail of a linked list
+     - :math:`\mathsf{1}`
+   * - LL head
+     - Indicates if the entry is the head of a linked list
+     - :math:`\mathsf{1}`
+   * - LL next
+     - Designates the next (newer) request in the linked list
+     - :math:`\mathsf{log_2(CONF\_HPDCACHE\_RTAB\_ENTRIES)}`
+   * - Deps
+     - Indicates the kind of dependency that keeps the request on-hold.
+     - :math:`\mathsf{5}`
+   * - Valid
+     - Indicates if the entry is valid (if unset the entry is free).
+     - :math:`\mathsf{1}`
+
+
+The following table briefly describes the possible dependencies between
+memory requests. For each kind of dependency, there is a corresponding
+bit in the "deps bits" field of RTAB entries.
+
+.. list-table::
+   :widths: 25 75
+   :header-rows: 1
+
+   * - **Dependency**
+     - **Description**
+   * - MSHR Hit
+     - Read miss and there is an outstanding miss request on the target address
+   * - MSHR Full
+     - Read miss and the MSHR has no available way for the requested address
+   * - MSHR Not Ready
+     - Read miss and the MSHR is busy and cannot send the miss request
+   * - WBUF Hit
+     - Read miss and there is a match with an open, pending, or sent entry in
+       the write buffer
+   * - WBUF Not Ready
+     - Write miss and there is a match with a sent entry in the write buffer or
+       the write buffer is full
+
+
+RTAB operations
+'''''''''''''''
+
+The RTAB implements the following operations:
+
++--------------------------+----------------------------------+
+| **Operation**            | **Description**                  |
++==========================+==================================+
+| ``rtab_alloc``           | Allocate a new linked list       |
++--------------------------+----------------------------------+
+| ``rtab_alloc_and_link``  | Allocate a new entry and link it |
+|                          | to an existing linked list       |
++--------------------------+----------------------------------+
+| ``rtab_pop_try``         | Get a ready request from one of  |
+|                          | the linked lists (without        |
+|                          | actually removing it from the    |
+|                          | list)                            |
++--------------------------+----------------------------------+
+| ``rtab_pop_commit``      | Actually remove a popped request |
+|                          | from the list                    |
++--------------------------+----------------------------------+
+| ``rtab_pop_rollback``    | Rollback a previously popped     |
+|                          | request (with a possible update  |
+|                          | of its dependencies)             |
++--------------------------+----------------------------------+
+| ``rtab_find_ready``      | Find a ready request among the   |
+|                          | heads of valid linked lists      |
++--------------------------+----------------------------------+
+| ``rtab_update_deps``     | Update the dependency bits of    |
+|                          | valid requests                   |
++--------------------------+----------------------------------+
+| ``rtab_find_empty``      | Find an empty entry              |
++--------------------------+----------------------------------+
+| ``rtab_is_full``         | Is the RTAB full?                |
++--------------------------+----------------------------------+
+| ``rtab_is_empty``        | Is the RTAB empty?               |
++--------------------------+----------------------------------+
+| ``rtab_match_tail``      | Find a 'tail' entry matching a   |
+|                          | given nline                      |
++--------------------------+----------------------------------+
+| ``rtab_match``           | Find an entry matching a given   |
+|                          | nline                            |
++--------------------------+----------------------------------+
+
+The C-like functions below briefly describe the algorithms implemented in the
+RTL code of the RTAB.
+
+The following function is called by the cache controller when it detects that
+one or more of the conditions in :ref:`tab_onhold` is met, and the request shall
+be put on hold.
+
+.. code:: c
+
+   int rtab_alloc(req_t r, deps_t d)
+   {
+       int index = rtab_find_empty();
+       rtab[index] = {
+           valid   : 1,
+           deps    : d,
+           ll_head : 1,
+           ll_tail : 1,
+           ll_next : 0,
+           request : r
+       };
+       return index;
+   }
+
+The following function is called by the cache controller when it detects that
+the request in the pipeline targets the same cacheline as one or more on-hold
+requests. In this case, the request is linked at the tail of the corresponding
+list in the RTAB.
+
+.. code:: c
+
+   int rtab_alloc_and_link(req_t r)
+   {
+       int index = rtab_find_empty();
+       int match = rtab_match_tail(get_nline(r));
+
+       // replace the tail of the linked list
+       rtab[match].ll_tail = 0;
+
+       // make the next pointer of the old tail point to the new entry
+       rtab[match].ll_next = index;
+
+       // add the new request as the tail of the linked list
+       rtab[index] = {
+           valid   : 1,
+           deps    : 0,
+           ll_head : 0,
+           ll_tail : 1,
+           ll_next : 0,
+           request : r
+       };
+
+       return index;
+   }
+
+The following function is called by the cache controller to select a ready
+request (one whose dependencies have been resolved) from the RTAB.
+
+.. code:: c
+
+   req_t rtab_pop_try()
+   {
+       // These are global states (preserved between function calls)
+       static int pop_state = HEAD;
+       static int last = 0;
+       static int next = 0;
+
+       int index;
+
+       // Brief description of the following code:
+       // The rtab_pop_try function tries to retire all the requests of a given
+       // linked list. Then it passes to another one.
+       switch (pop_state) {
+       case HEAD:
+           // Find a list whose head request is ready
+           // (using a round-robin policy)
+           index = rtab_find_ready(last);
+           if (index == -1) return -1;
+
+           // Update the pointer to the last linked list served
+           last = index;
+
+           // If the list has more than one request, the next time this
+           // function is called, serve the next request of the list
+           if (!rtab[index].ll_tail) {
+               next = rtab[index].ll_next;
+               pop_state = NEXT;
+           }
+
+           // Temporarily unset the head bit. This is to prevent the request
+           // from being rescheduled.
+           rtab[index].ll_head = 0;
+           break;
+
+       case NEXT:
+           index = next;
+
+           // If the list has more than one request, the next time this
+           // function is called, serve the next request of the list
+           if (!rtab[next].ll_tail) {
+               next = rtab[index].ll_next;
+               pop_state = NEXT;
+           }
+           // If it is the last element of the list, return to the HEAD state
+           else {
+               pop_state = HEAD;
+           }
+
+           // Temporarily unset the head bit. This is to prevent the request
+           // from being rescheduled.
+           rtab[index].ll_head = 0;
+       }
+
+       // Pop the selected request
+       return rtab[index].request;
+   }
+
+The following function is called by the cache controller when the replayed
+request is retired (processed).
+
+.. code:: c
+
+   void rtab_pop_commit(int index)
+   {
+       rtab[index].valid = 0;
+   }
+
+The following function is called by the cache controller when the replayed
+request cannot be retired because, again, one or more of the conditions in
+:ref:`tab_onhold` is met. In this case, the request is restored into the RTAB
+with updated dependency bits. The restored request keeps the same position in
+its corresponding linked list to respect the program execution order.
+
+.. code:: c
+
+   void rtab_pop_rollback(int index, bitvector deps)
+   {
+       rtab[index].ll_head = 1;
+       rtab[index].deps = deps;
+   }
+
+
+The following function is used to find a linked list whose head request can be
+replayed (its dependencies have been resolved).
+
+.. code:: c
+
+   int rtab_find_ready(int last)
+   {
+       // choose a ready entry using a round-robin policy
+       int i = (last + 1) % RTAB_NENTRIES;
+       for (;;) {
+           // ready entry found
+           if (rtab[i].valid && rtab[i].ll_head && (rtab[i].deps == 0))
+               return i;
+
+           // there is no ready entry
+           if (i == last)
+               return -1;
+
+           i = (i + 1) % RTAB_NENTRIES;
+       }
+   }
+
+
+The following function is called by the miss handler and the write buffer on the
+completion of any pending transaction.
It updates
+the dependency bits of any matching request (with the same cacheline address)
+in the RTAB.
+
+.. code:: c
+
+   void rtab_update_deps(nline_t nline, bitvector deps)
+   {
+       int index = rtab_match(nline);
+       if (index != -1) {
+           rtab[index].deps = deps;
+       }
+   }
+
+
+The following utility functions are used by the functions above.
+
+.. code:: c
+
+   int rtab_find_empty()
+   {
+       for (int i = 0; i < RTAB_NENTRIES; i++)
+           if (!rtab[i].valid)
+               return i;
+
+       return -1;
+   }
+
+.. code:: c
+
+   bool rtab_is_full()
+   {
+       return (rtab_find_empty() == -1);
+   }
+
+
+.. code:: c
+
+   bool rtab_is_empty()
+   {
+       for (int i = 0; i < RTAB_NENTRIES; i++)
+           if (rtab[i].valid)
+               return false;
+
+       return true;
+   }
+
+
+.. code:: c
+
+   int rtab_match_tail(nline_t nline)
+   {
+       for (int i = 0; i < RTAB_NENTRIES; i++)
+           if (rtab[i].valid && get_nline(rtab[i].request) == nline &&
+               rtab[i].ll_tail)
+               return i;
+
+       return -1;
+   }
+
+
+.. code:: c
+
+   int rtab_match(nline_t nline)
+   {
+       for (int i = 0; i < RTAB_NENTRIES; i++)
+           if (rtab[i].valid && get_nline(rtab[i].request) == nline)
+               return i;
+
+       return -1;
+   }
+
+
+
+Policy for taking new requests in the data cache
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The cache has three possible sources of requests:
+
+- The core (new requests from requesters)
+
+- The RTAB (on-hold requests)
+
+- The miss handler (refill requests)
+
+The cache controller implements a fixed-priority policy between these sources.
+It accepts requests in the following order of priority:
+
+#. Refill request (highest priority)
+
+#. On-hold request
+
+#. New request (lowest priority)
+
+
+Write-buffer
+------------
+
+This cache implements a write-through policy. In this policy, write accesses
+from requesters are systematically transferred to the memory, regardless of
+whether the write access hits or misses in the HPDcache.
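
The effect of the write-through policy on a store can be sketched as follows. This is a simplified model under the assumption that the data RAM is updated only on a cache hit (write-through without write-allocate); the names are illustrative, not the RTL:

```c
#include <stdbool.h>

// Hypothetical, simplified model of a store under the write-through
// policy: the cache data is updated only on a hit, but the write is
// always forwarded to the write-buffer, hit or miss.
typedef struct {
    bool cache_updated;  // the data RAM was written
    bool pushed_to_wbuf; // the write was handed to the write-buffer
} wr_effect_t;

wr_effect_t write_through(bool cache_hit)
{
    wr_effect_t e;
    e.cache_updated  = cache_hit; // no allocation on a write miss
    e.pushed_to_wbuf = true;      // memory is always updated through the WBUF
    return e;
}
```

On a miss, only the write-buffer sees the data; the cacheline is not allocated.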
+
+To decouple the write acknowledgements from the memory to the HPDcache, and the
+write acknowledgements from the HPDcache to the requester, the HPDcache
+implements a write-buffer. The goal is to increase performance: the requester
+does not wait for the acknowledgement from the memory, which may suffer from a
+very high latency. In addition, the write-buffer implements coalescing of write
+data to improve the bandwidth utilization of data channels in the NoC.
+
+The write-buffer implements two different parts: directory and data. The
+directory enables tracking of active writes. The data buffers are used to
+coalesce writes from the requesters.
+
+Write-Buffer Organization
+'''''''''''''''''''''''''
+
+The write-buffer implements two flip-flop arrays: one for the directory and
+another for the data.
+
+
+Write-Buffer Directory
+~~~~~~~~~~~~~~~~~~~~~~
+
+The write-buffer directory allows tracking of pending write transactions.
+
+A given entry in the directory of the write-buffer may be in one of four
+states:
+
+.. list-table::
+   :widths: 15 85
+   :header-rows: 1
+
+   * - **State**
+     - **Description**
+   * - **FREE**
+     - The entry is available
+   * - **OPEN**
+     - The entry is valid and contains pending write data. The entry accepts
+       new writes (coalescing)
+   * - **PEND**
+     - The entry is waiting to be sent to the memory. In this state, the entry
+       continues to accept new writes (coalescing).
+   * - **SENT**
+     - The entry was forwarded to the memory, and is waiting for the
+       acknowledgement
+
+
+Each entry contains the following additional information:
+
+.. list-table::
+   :widths: 5 15 80
+   :header-rows: 1
+
+   * - Field
+     - Description
+     - Width (in bits)
+   * - S
+     - State
+     - 2
+   * - T
+     - Write-Buffer Tag
+     - :math:`\mathsf{HPDCACHE\_PA\_WIDTH - HPDCACHE\_WBUF\_OFFSET\_WIDTH}`
+   * - C
+     - Live counter
+     - :math:`\mathsf{HPDCACHE\_WBUF\_TIMECNT\_WIDTH}`
+   * - P
+     - Pointer to the associated data buffer
+     - :math:`\mathsf{log_2(HPDCACHE\_WBUF\_DATA\_ENTRIES)}`
+
+
+The number of entries in the directory array is:
+
+   :math:`\mathsf{CONF\_HPDCACHE\_WBUF\_DIR\_ENTRIES}`
+
+Write-Buffer Data
+~~~~~~~~~~~~~~~~~
+
+The number of entries (depth) of the data array is:
+
+   :math:`\mathsf{CONF\_HPDCACHE\_WBUF\_DATA\_ENTRIES}`
+
+The width (in bits) of data entries is:
+
+   :math:`\mathsf{CONF\_HPDCACHE\_WBUF\_WORDS{}\times{}HPDCACHE\_WORD\_WIDTH}`
+
+Data buffers may be as wide as or wider than the data interface of requesters:
+
+   :math:`\mathsf{CONF\_HPDCACHE\_WBUF\_WORDS \ge CONF\_HPDCACHE\_REQ\_WORDS}`
+
+Designers may choose data buffers wider than the requesters' data interface to
+improve the NoC bandwidth utilization. There is however a constraint: the data
+buffers' width cannot be wider than the NoC interface. Therefore:
+
+.. math::
+
+   \mathsf{CONF\_HPDCACHE\_WBUF\_WORDS{}\times{}HPDCACHE\_WORD\_WIDTH} \le\\
+   \mathsf{CONF\_HPDCACHE\_MEM\_DATA\_WIDTH}
+
+
+
+Memory Write Consistency Model
+''''''''''''''''''''''''''''''
+
+The HPDcache complies with the RVWMO memory consistency model. Regarding writes,
+in this consistency model, there are two important properties:
+
+#. The order in which write accesses on different addresses are forwarded to
+   memory MAY differ from the order they arrived from the requester (program
+   order);
+
+#. Writes to the same address MUST be visible in order. If data written by a
+   write A on address @x is followed by another write B on the same address,
+   the data of A cannot be visible after the processing of B.
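
The second property can be illustrated at the byte level with a per-byte merge into a buffered word, where the newest write overwrites overlapping bytes. This is an illustrative model only (the field names and the 64-bit word width are assumptions, not the RTL write-buffer):

```c
#include <stdint.h>

// Illustrative model of one buffered 64-bit word with per-byte enables.
// The newest write always overwrites overlapping bytes, so the last
// write persists (same-address ordering).
typedef struct {
    uint8_t data[8]; // buffered bytes (assumed 64-bit word)
    uint8_t be;      // byte-enable mask: bit i covers data[i]
} wbuf_word_t;

void wbuf_merge(wbuf_word_t *e, const uint8_t *wdata, uint8_t wbe)
{
    for (int i = 0; i < 8; i++) {
        if (wbe & (1u << i)) {
            e->data[i] = wdata[i]; // newest data wins on overlapping bytes
        }
    }
    e->be |= wbe; // remember which bytes are pending
}
```

With this invariant, two writes to the same word can be merged into a single memory transaction without violating the second property.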
+
+The second property allows write coalescing, provided that the hardware
+ensures that the last write persists.
+
+The write-buffer exploits the first property. Multiple “in-flight” writes are
+supported thanks to the multiple directory and data entries. These write
+transactions can be forwarded to the memory in an order different from the
+program order.
+
+To comply with the second property, the write-buffer does not accept a write
+access when there is an address conflict with a **SENT** entry. In that case,
+the write access is put on-hold as described in :ref:`sec_onhold`.
+
+The system may choose to relax the constraint of putting a write on-hold in
+case of an address conflict with a **SENT** entry. This constraint can be
+relaxed when the NoC guarantees in-order delivery. The runtime configuration
+bit ``cfig_wbuf.S`` (see :ref:`sec_csr`) shall be set to 0 to relax this
+dependency.
+
+
+Functional Description
+''''''''''''''''''''''
+
+When an entry of the write-buffer directory is in the **OPEN** or **PEND**
+state, it has an allocated data buffer, which contains data that has not yet
+been sent to the memory. When an entry of the write-buffer directory is in the
+**SENT** state, the corresponding data was transferred to the memory, and the
+corresponding data buffer was freed (and can be reused for another write). A
+given entry in the write-buffer directory goes from the **FREE** to the
+**OPEN** state when a new write is accepted and cannot be coalesced with
+another **OPEN** or **PEND** entry (i.e. it does not fall in the same address
+range).
+
+A directory entry passes from **OPEN** to **PEND** after a given number of
+clock cycles. This number of clock cycles depends on different runtime
+configuration values. Each directory entry contains a life-time counter. This
+counter starts at 0 when a new write is accepted (**FREE**
+:math:`\rightarrow` **OPEN**), and is incremented each cycle while in
+**OPEN**.
When the counter reaches
+``cfig_wbuf.T`` (see :ref:`sec_csr`), the write-buffer directory
+entry goes to **PEND**. Another runtime configuration bit,
+``cfig_wbuf.R`` (see :ref:`sec_csr`), defines the behavior of an entry when a
+new write is coalesced into an **OPEN** entry. If this configuration bit is
+set, the life-time counter is reset to 0 when a new write is coalesced.
+Otherwise, the counter continues to increment normally.
+
+The life-time of a given write-buffer directory entry is longer than the
+life-time of a data entry. A given directory entry is freed
+(**SENT** :math:`\rightarrow` **FREE**) when the write acknowledgement is
+received from the memory. The number of cycles needed to get an
+acknowledgement from the memory may be significant, and it is
+system-dependent. Thus, to improve the utilization of data buffers, the number
+of entries in the directory is generally greater than the number of data
+buffers. There is a trade-off between area and performance when choosing the
+depth of the data buffer. The area cost of data buffers is the most critical
+cost in the write-buffer. The synthesis-time parameters
+:math:`\mathsf{CONF\_HPDCACHE\_WBUF\_DIR\_ENTRIES}` and
+:math:`\mathsf{CONF\_HPDCACHE\_WBUF\_DATA\_ENTRIES}` define the number of
+entries in the write-buffer directory and write-buffer data, respectively.
+
+When the ``cfig_wbuf.I`` bit (see :ref:`sec_csr`) is set, the write buffer
+does not perform any write coalescing. This means that an entry passes from
+**FREE** to **PEND** (bypassing the **OPEN** state). While an entry is in the
+**PEND** state, and ``cfig_wbuf.I`` is set, that entry does not accept any new
+writes. It only waits for its data to be sent. The ``cfig_wbuf.T`` threshold
+is ignored by the write buffer when ``cfig_wbuf.I`` is set.
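The entry life-cycle described above can be summarized with a small behavioral model. The following Python sketch is illustrative only: the class and method names are invented for this example (they are not part of the RTL), and timing and arbitration details are omitted.

```python
# Behavioral sketch (not the RTL): life-cycle of one write-buffer
# directory entry, driven by the cfig_wbuf.T (threshold), cfig_wbuf.R
# (reset counter on coalesced write) and cfig_wbuf.I (inhibit
# coalescing) runtime configuration values.

FREE, OPEN, PEND, SENT = "FREE", "OPEN", "PEND", "SENT"

class WbufEntry:
    def __init__(self, threshold=3, reset_on_write=True, inhibit=False):
        self.state = FREE
        self.counter = 0                      # life-time counter
        self.threshold = threshold            # cfig_wbuf.T
        self.reset_on_write = reset_on_write  # cfig_wbuf.R
        self.inhibit = inhibit                # cfig_wbuf.I

    def accept_write(self):
        if self.state == FREE:
            self.counter = 0
            # With cfig_wbuf.I set, bypass OPEN and go straight to PEND
            self.state = PEND if self.inhibit else OPEN
        elif self.state in (OPEN, PEND) and not self.inhibit:
            # Coalesce into an existing entry; cfig_wbuf.R optionally
            # restarts the life-time counter of an OPEN entry
            if self.reset_on_write and self.state == OPEN:
                self.counter = 0
        else:
            raise RuntimeError("write must be put on-hold")

    def tick(self):
        # One clock cycle: an OPEN entry ages until it reaches cfig_wbuf.T
        if self.state == OPEN:
            self.counter += 1
            if self.counter >= self.threshold:
                self.state = PEND

    def send(self):
        # Entry forwarded to the memory; its data buffer can be reused
        assert self.state == PEND
        self.state = SENT

    def ack(self):
        # Write acknowledgement received from the memory
        assert self.state == SENT
        self.state = FREE
```

Note how, with ``inhibit`` set (``cfig_wbuf.I = 1``), a new write lands directly in **PEND** and the threshold never comes into play, matching the behavior described above.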
+
+Memory Fences
+'''''''''''''
+
+In multi-core systems, or more generally, in systems with multiple DMA-capable
+devices, when synchronization is needed, software must issue memory fences. In
+the case of RISC-V, there are specific instructions for this purpose (i.e.
+fence).
+
+Fence instructions shall be forwarded to the cache to ensure the ordering of
+writes. The fence forces the write-buffer to send all pending writes before
+accepting new ones. This cache implements two ways of signalling a fence:
+sending a specific CMO instruction from the core (described later in
+:ref:`sec_cmo`), or asserting the ``wbuf_flush_i`` pin (during one cycle).
diff --git a/docs/source/cmo.rst b/docs/source/cmo.rst
new file mode 100644
index 0000000..abaed9b
--- /dev/null
+++ b/docs/source/cmo.rst
@@ -0,0 +1,343 @@
+..
+   Copyright 2024 CEA*
+   *Commissariat a l'Energie Atomique et aux Energies Alternatives (CEA)
+
+   SPDX-License-Identifier: Apache-2.0 WITH SHL-2.1
+
+   Licensed under the Solderpad Hardware License v 2.1 (the “License”); you
+   may not use this file except in compliance with the License, or, at your
+   option, the Apache License version 2.0. You may obtain a copy of the
+   License at
+
+   https://solderpad.org/licenses/SHL-2.1/
+
+   Unless required by applicable law or agreed to in writing, any work
+   distributed under the License is distributed on an “AS IS” BASIS, WITHOUT
+   WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+   License for the specific language governing permissions and limitations
+   under the License.
+
+   Authors : Cesar Fuguet
+   Description : HPDcache Cache Management Operations (CMOs)
+
+..
_sec_cmo:
+
+Cache Management Operations (CMOs)
+==================================
+
+The HPDcache is capable of performing the following Cache Management
+Operations (CMOs):
+
+- Memory write fence
+
+- Invalidate a cacheline given its address
+
+- Invalidate all cachelines
+
+- Prefetch a cacheline given its physical address
+
+Any of the clients of the HPDcache can trigger one of these operations at any
+time by using specific opcodes in their request (see
+:numref:`Table %s <tab_cmo_optypes>`).
+
+.. _tab_cmo_optypes:
+
+.. list-table:: CMO Operation Types
+   :widths: 25 15 60
+   :align: center
+   :header-rows: 1
+
+   * - **Mnemonic**
+     - **Encoding**
+     - **Description**
+   * - :math:`\small\mathsf{HPDCACHE\_CMO\_FENCE}`
+     - :math:`\small\mathsf{0b000}`
+     - Memory Write Fence
+   * - :math:`\small\mathsf{HPDCACHE\_CMO\_INVAL\_NLINE}`
+     - :math:`\small\mathsf{0b010}`
+     - Invalidate a Cacheline given its Address
+   * - :math:`\small\mathsf{HPDCACHE\_CMO\_INVAL\_ALL}`
+     - :math:`\small\mathsf{0b100}`
+     - Invalidate All Cachelines
+   * - :math:`\small\mathsf{HPDCACHE\_CMO\_PREFETCH}`
+     - :math:`\small\mathsf{0b101}`
+     - Prefetch a Cacheline given its Address
+
+
+The ``core_req_i.op`` field must be set to ``HPDCACHE_REQ_CMO``
+(see :numref:`Table %s `). The CMO subtype
+(:numref:`Table %s <tab_cmo_optypes>`) is transferred in the
+``core_req_i.size`` field of the request.
+
+If :math:`\scriptsize\mathsf{CONF\_HPDCACHE\_SUPPORT\_CMO}` is set to 0, the
+HPDcache does not support CMOs from requesters.
+See :ref:`sec_cmo_handler` for more details.
+
+The following sections describe each of the CMO operations in detail,
+and how requests shall be encoded to trigger each of them.
+
+
+Memory Write Fence
+------------------
+
+To make sure that the HPDcache accepts new requests only when all previous
+writes have been sent to and acknowledged by the memory, a requester can issue
+a fence operation. This is useful to ensure memory consistency in multi-core
+systems.
It +may also be necessary to ensure such consistency between the core and other +peripherals with DMA capability. + +The default consistency model of RISC-V is a Weak Memory Ordering (WMO) model +(RVWMO). In this model, the system is able to reorder memory transactions with +respect to the program order. There are however some constraints detailed in +[RISCVUP2019]_ and briefly described in :ref:`sec_mcrs`. + +To issue a memory fence, the requester shall build the request as follows: + +.. list-table:: CMO Memory Write Fence Request Formatting + :widths: 30 50 + :align: center + :header-rows: 1 + + * - **Signal** + - **Value** + * - ``core_req_o.addr_offset`` + - *Don't Care* + * - ``core_req_o.op`` + - :math:`\small\mathsf{HPDCACHE\_REQ\_CMO}` + * - ``core_req_o.wdata`` + - *Don't Care* + * - ``core_req_o.be`` + - *Don't Care* + * - ``core_req_o.size`` + - :math:`\small\mathsf{HPDCACHE\_CMO\_FENCE}` + * - ``core_req_o.sid`` + - Corresponding source ID of the requester + * - ``core_req_o.tid`` + - Transaction ID of the request + * - ``core_req_o.need_rsp`` + - Indicates if the requester needs an acknowledgement when the operation is + completed. + * - ``core_req_o.phys_indexed`` + - *Don't Care* + * - ``core_req_o.addr_tag`` + - *Don't Care* + * - ``core_req_o.pma.uncacheable`` + - *Don't Care* + * - ``core_req_o.pma.io`` + - *Don't Care* + * - ``core_req_tag_i`` + - *Don't Care* + * - ``core_req_pma_i.uncacheable`` + - *Don't Care* + * - ``core_req_pma_i.io`` + - *Don't Care* + + +As for any regular request, the request shall follow the **VALID**/**READY** +handshake protocol described in :ref:`sec_ready_valid_handshake`. + +This memory fence operation has the following effects: + +- All open entries in the write buffer (write requests waiting to be sent to the + memory) are immediately closed; + +- No new requests from any requester are acknowledged until all pending write + requests in the cache have been acknowledged on the NoC interface. 
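As an illustration, the request fields from the table above can be assembled in software. The following Python sketch is hypothetical glue (the helper function and the dictionary representation are not part of the HPDcache interface); the field names and the ``HPDCACHE_CMO_FENCE`` encoding mirror the tables above.

```python
# Hypothetical helper (not part of the HPDcache interface): assemble the
# fields of a CMO memory-write-fence request as listed in the table above.

HPDCACHE_REQ_CMO   = "HPDCACHE_REQ_CMO"  # opcode mnemonic from the spec
HPDCACHE_CMO_FENCE = 0b000               # CMO subtype, carried in `size`

def make_fence_request(sid, tid, need_rsp=True):
    """Return the non-don't-care fields of a memory write fence request."""
    return {
        "op": HPDCACHE_REQ_CMO,      # CMO request opcode
        "size": HPDCACHE_CMO_FENCE,  # CMO subtype, not a transfer size
        "sid": sid,                  # source ID of the requester
        "tid": tid,                  # transaction ID of the request
        "need_rsp": need_rsp,        # ask for an acknowledgement
        # addr_offset, wdata, be, pma, ... are don't-care for a fence
    }
```

The point of the sketch is that a fence carries no address or data: only the opcode, the CMO subtype in the size field, and the identification fields matter.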
+
+When the memory fence transaction is completed, and the ``core_req_o.need_rsp``
+signal was set to 1, an acknowledgement is sent to the corresponding requester.
+
+
+Invalidate a Cacheline Given its Address
+----------------------------------------
+
+The program may need to invalidate cachelines to ensure cache coherency by
+software. This may be needed both in multi-core systems and in systems with
+DMA-capable peripherals.
+
+To invalidate a cacheline given its address, the requester shall build the
+request as follows:
+
+.. list-table:: CMO Cacheline Invalidation given its Address Request Formatting
+   :widths: 30 50
+   :align: center
+   :header-rows: 1
+
+   * - **Signal**
+     - **Value**
+   * - ``core_req_o.addr_offset``
+     - Least significant bits of the address
+   * - ``core_req_o.op``
+     - :math:`\small\mathsf{HPDCACHE\_REQ\_CMO}`
+   * - ``core_req_o.wdata``
+     - *Don't Care*
+   * - ``core_req_o.be``
+     - *Don't Care*
+   * - ``core_req_o.size``
+     - :math:`\small\mathsf{HPDCACHE\_CMO\_INVAL\_NLINE}`
+   * - ``core_req_o.sid``
+     - Corresponding source ID of the requester
+   * - ``core_req_o.tid``
+     - Transaction ID of the request
+   * - ``core_req_o.need_rsp``
+     - Indicates if the requester needs an acknowledgement when the operation
+       is completed.
+   * - ``core_req_o.phys_indexed``
+     - 1 if physical indexing, 0 if virtual indexing
+   * - ``core_req_o.addr_tag``
+     - Most significant bits of the address if ``core_req_o.phys_indexed = 1``,
+       *Don't Care* otherwise
+   * - ``core_req_o.pma.uncacheable``
+     - *Don't Care*
+   * - ``core_req_o.pma.io``
+     - *Don't Care*
+   * - ``core_req_tag_i``
+     - Most significant bits of the address if ``core_req_o.phys_indexed = 0``,
+       *Don't Care* otherwise
+   * - ``core_req_pma_i.uncacheable``
+     - *Don't Care*
+   * - ``core_req_pma_i.io``
+     - *Don't Care*
+
+
+As for any regular request, the request shall follow the **VALID**/**READY**
+handshake protocol (see :ref:`sec_ready_valid_handshake`).
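The address split implied by the table above can be sketched as follows. This is an illustration only: ``OFFSET_WIDTH`` is a placeholder (the actual split depends on the cache configuration), and the helper and dictionary are not part of the HPDcache interface.

```python
# Illustrative sketch: splitting an address into the addr_offset /
# addr_tag fields of a cacheline-invalidation request. OFFSET_WIDTH is
# an assumption for this example; the real width is configuration
# dependent.

OFFSET_WIDTH = 12  # placeholder value, for illustration only

def split_address(addr):
    addr_offset = addr & ((1 << OFFSET_WIDTH) - 1)  # least significant bits
    addr_tag = addr >> OFFSET_WIDTH                 # most significant bits
    return addr_offset, addr_tag

def make_inval_request(addr, sid, tid, phys_indexed=True):
    offset, tag = split_address(addr)
    return {
        "op": "HPDCACHE_REQ_CMO",
        "size": 0b010,  # HPDCACHE_CMO_INVAL_NLINE subtype
        "addr_offset": offset,
        "phys_indexed": phys_indexed,
        # With physical indexing the tag travels with the request;
        # with virtual indexing it arrives later on core_req_tag_i
        "addr_tag": tag if phys_indexed else None,
        "sid": sid, "tid": tid, "need_rsp": True,
    }
```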
+
+This CMO request supports both virtually and physically indexed requests (see
+:ref:`sec_vipt`).
+
+Regarding the latency of this operation, only one cycle is needed to
+invalidate the corresponding cacheline. However, if there is a pending read
+miss on the target address, the HPDcache waits for the response to the read
+miss and then invalidates the corresponding cacheline.
+
+If the target address is not cached, the operation has no effect.
+
+When the invalidation transaction is completed, and the ``core_req_o.need_rsp``
+signal was set to 1, an acknowledgement is sent to the corresponding requester.
+
+
+Invalidate All Cachelines
+-------------------------
+
+With this operation, all the cachelines in the HPDcache are invalidated.
+
+The requester shall build the request as follows to perform a complete
+invalidation of the HPDcache:
+
+.. list-table:: CMO All Cachelines Invalidation
+   :widths: 30 50
+   :align: center
+   :header-rows: 1
+
+   * - **Signal**
+     - **Value**
+   * - ``core_req_o.addr_offset``
+     - *Don't Care*
+   * - ``core_req_o.op``
+     - :math:`\small\mathsf{HPDCACHE\_REQ\_CMO}`
+   * - ``core_req_o.wdata``
+     - *Don't Care*
+   * - ``core_req_o.be``
+     - *Don't Care*
+   * - ``core_req_o.size``
+     - :math:`\small\mathsf{HPDCACHE\_CMO\_INVAL\_ALL}`
+   * - ``core_req_o.sid``
+     - Corresponding source ID of the requester
+   * - ``core_req_o.tid``
+     - Transaction ID of the request
+   * - ``core_req_o.need_rsp``
+     - Indicates if the requester needs an acknowledgement when the operation
+       is completed.
+   * - ``core_req_o.phys_indexed``
+     - *Don't Care*
+   * - ``core_req_o.addr_tag``
+     - *Don't Care*
+   * - ``core_req_o.pma.uncacheable``
+     - *Don't Care*
+   * - ``core_req_o.pma.io``
+     - *Don't Care*
+   * - ``core_req_tag_i``
+     - *Don't Care*
+   * - ``core_req_pma_i.uncacheable``
+     - *Don't Care*
+   * - ``core_req_pma_i.io``
+     - *Don't Care*
+
+
+As for any regular request, the request shall follow the **VALID**/**READY**
+handshake protocol (see :ref:`sec_ready_valid_handshake`).
+
+This operation works as a memory read fence. That is, before handling the
+operation, the HPDcache waits for all pending read misses to complete.
+
+The latency of this operation has two aggregated components:
+
+- The time to serve all pending reads.
+
+- One cycle per set implemented in the HPDcache (all ways of a given set are
+  invalidated simultaneously).
+
+When the invalidation transaction is completed, and the ``core_req_o.need_rsp``
+signal was set to 1, an acknowledgement is sent to the corresponding requester.
+
+
+Prefetch a Cacheline given its Address
+--------------------------------------
+
+With this operation, the cacheline corresponding to the indicated address is
+prefetched into the HPDcache.
+
+The requester shall build the request as follows to perform a prefetch:
+
+.. list-table:: CMO Cacheline Prefetch Request Formatting
+   :widths: 30 50
+   :align: center
+   :header-rows: 1
+
+   * - **Signal**
+     - **Value**
+   * - ``core_req_o.addr_offset``
+     - Least significant bits of the address
+   * - ``core_req_o.op``
+     - :math:`\small\mathsf{HPDCACHE\_REQ\_CMO}`
+   * - ``core_req_o.wdata``
+     - *Don't Care*
+   * - ``core_req_o.be``
+     - *Don't Care*
+   * - ``core_req_o.size``
+     - :math:`\small\mathsf{HPDCACHE\_CMO\_PREFETCH}`
+   * - ``core_req_o.sid``
+     - Corresponding source ID of the requester
+   * - ``core_req_o.tid``
+     - Transaction ID of the request
+   * - ``core_req_o.need_rsp``
+     - Indicates if the requester needs an acknowledgement when the operation
+       is completed.
+   * - ``core_req_o.phys_indexed``
+     - 1 if physical indexing, 0 if virtual indexing
+   * - ``core_req_o.addr_tag``
+     - Most significant bits of the address if ``core_req_o.phys_indexed = 1``,
+       *Don't Care* otherwise
+   * - ``core_req_o.pma.uncacheable``
+     - *Don't Care*
+   * - ``core_req_o.pma.io``
+     - *Don't Care*
+   * - ``core_req_tag_i``
+     - Most significant bits of the address if ``core_req_o.phys_indexed = 0``,
+       *Don't Care* otherwise
+   * - ``core_req_pma_i.uncacheable``
+     - *Don't Care*
+   * - ``core_req_pma_i.io``
+     - *Don't Care*
+
+
+As for any regular request, the request shall follow the **VALID**/**READY**
+handshake protocol (see :ref:`sec_ready_valid_handshake`). This CMO request
+supports both virtually and physically indexed requests (see :ref:`sec_vipt`).
+
+If the requested cacheline is already in the cache, this request has no
+effect. If the requested cacheline is not present, it is fetched from the
+memory and allocated in the cache.
+
+When the prefetch transaction is completed, and the ``core_req_o.need_rsp``
+signal was set to 1, an acknowledgement is sent to the corresponding requester.
diff --git a/docs/source/conf.py b/docs/source/conf.py
new file mode 100644
index 0000000..b6ba5f9
--- /dev/null
+++ b/docs/source/conf.py
@@ -0,0 +1,71 @@
+#
+# Copyright 2024 CEA*
+# *Commissariat a l'Energie Atomique et aux Energies Alternatives (CEA)
+#
+# SPDX-License-Identifier: Apache-2.0 WITH SHL-2.1
+#
+# Licensed under the Solderpad Hardware License v 2.1 (the “License”); you
+# may not use this file except in compliance with the License, or, at your
+# option, the Apache License version 2.0. You may obtain a copy of the
+# License at
+#
+# https://solderpad.org/licenses/SHL-2.1/
+#
+# Unless required by applicable law or agreed to in writing, any work
+# distributed under the License is distributed on an “AS IS” BASIS, WITHOUT
+# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the +# License for the specific language governing permissions and limitations +# under the License. +# +# Authors : Cesar Fuguet +# Description : Configuration file for the Sphinx documentation builder +# + +# -- Project information ----------------------------------------------------- + +project = 'HPDcache User Guide' +copyright = '2023-present Commissariat a l\'Energie Atomique et aux Energies Alternatives (CEA)' +author = 'César Fuguet' + +# The full version, including alpha/beta/rc tags +release = 'v1.0.1' + + +# -- General configuration --------------------------------------------------- + +# Add any Sphinx extension module names here, as strings. They can be +# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom +# ones. +extensions = [ + 'sphinx.ext.todo', + 'sphinx.ext.ifconfig', + 'sphinx.ext.graphviz', + 'sphinx.ext.autosectionlabel' +] + +# Add any paths that contain templates here, relative to this directory. +templates_path = ['_templates'] + +# List of patterns, relative to source directory, that match files and +# directories to ignore when looking for source files. +# This pattern also affects html_static_path and html_extra_path. +exclude_patterns = [] + +# Number figures, tables and code-blocks automatically +numfig = True + + +# -- Options for HTML output ------------------------------------------------- + +# The theme to use for HTML and HTML Help pages. See the documentation for +# a list of builtin themes. +# +html_theme = 'sphinx_rtd_theme' + +# Add any paths that contain custom static files (such as style sheets) here, +# relative to this directory. They are copied after the builtin static files, +# so a file named "default.css" will overwrite the builtin "default.css". 
+
+html_static_path = ['_static']
+
+def setup(app):
+    app.add_css_file("theme_overrides.css")
diff --git a/docs/source/csrs.rst b/docs/source/csrs.rst
new file mode 100644
index 0000000..12bd4f9
--- /dev/null
+++ b/docs/source/csrs.rst
@@ -0,0 +1,349 @@
+..
+   Copyright 2024 CEA*
+   *Commissariat a l'Energie Atomique et aux Energies Alternatives (CEA)
+
+   SPDX-License-Identifier: Apache-2.0 WITH SHL-2.1
+
+   Licensed under the Solderpad Hardware License v 2.1 (the “License”); you
+   may not use this file except in compliance with the License, or, at your
+   option, the Apache License version 2.0. You may obtain a copy of the
+   License at
+
+   https://solderpad.org/licenses/SHL-2.1/
+
+   Unless required by applicable law or agreed to in writing, any work
+   distributed under the License is distributed on an “AS IS” BASIS, WITHOUT
+   WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+   License for the specific language governing permissions and limitations
+   under the License.
+
+   Authors : Cesar Fuguet
+   Description : HPDcache Control-Status Registers (CSRs)
+
+.. _sec_csr:
+
+Control-Status Registers (CSRs)
+===============================
+
+Dedicated CSR address space
+---------------------------
+
+The HPDcache defines a dedicated memory address space for configuring the
+cache and checking its internal status. This memory space is shared among all
+the requesters connected to the same HPDcache. However, from a system-wide
+point of view, this space is private to those requesters. That is, this
+dedicated CSR address space is not visible to other requesters integrated in
+the system.
+
+The dedicated CSR address space is aligned to 4 KiB, and has this same size.
+The current version of the HPDcache uses a very small subset of this address
+space, but the 4 KiB alignment allows easier mapping into the virtual address
+space by the OS. The smallest virtual/physical page size defined in
+[RISCVP2019]_ is 4 KiB.
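The alignment rule stated above can be captured with a one-line check. This is a trivial illustrative sketch (the helper name is invented); it simply verifies that the 12 least significant bits of the base address are zero.

```python
# Sketch of the alignment rule above: a 4 KiB-aligned CFIG_BASE address
# has its 12 least significant bits equal to zero (4 KiB = 2**12 bytes).

def is_valid_cfig_base(cfig_base):
    return (cfig_base & ((1 << 12) - 1)) == 0
```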
:numref:`Figure %s <fig_csr_addr_space>` displays
+the layout of the dedicated CSR address space of the HPDcache.
+
+The :math:`\mathsf{CFIG\_BASE}` address is specified through an input port of
+the HPDcache. The name of this input pin is ``cfig_base_i``. It is a multi-bit
+signal whose width is :math:`\mathsf{CONF\_HPDCACHE\_PA\_WIDTH}` bits. As
+mentioned above, since this segment must be aligned to 4 KiB, its 12 least
+significant bits must all be 0.
+
+.. _fig_csr_addr_space:
+
+.. figure:: images/hpdcache_csr_addr_space.*
+   :alt: HPDcache CSR Dedicated Address Space
+   :align: center
+   :figwidth: 100%
+   :width: 100%
+
+   HPDcache CSR Dedicated Address Space
+
+
+Configuration registers
+-----------------------
+
+:numref:`Table %s <tab_csr_cfg>` lists the configuration registers implemented
+in the HPDcache.
+
+These are mapped on the :math:`\mathsf{CFIG}` memory address segment
+(:numref:`Figure %s <fig_csr_addr_space>`).
+
+
+.. _tab_csr_cfg:
+
+.. list-table:: Configuration Registers in the HPDcache
+   :widths: 5 60 35
+   :header-rows: 2
+
+   * - **CFIG Segment**
+     -
+     -
+   * - **Register**
+     - **Description**
+     - **Base address**
+   * - ``cfig_version``
+     - 64-bit register with the cache version
+     - ``cfig_base_i`` + 0x00
+   * - ``cfig_info``
+     - 64-bit register with cache information (part I)
+     - ``cfig_base_i`` + 0x08
+   * - ``cfig_info2``
+     - 64-bit register with cache information (part II)
+     - ``cfig_base_i`` + 0x10
+   * - ``cfig_cachectrl``
+     - 64-bit register to configure the cache controller
+     - ``cfig_base_i`` + 0x18
+   * - ``cfig_wbuf``
+     - 64-bit register to configure the write buffer
+     - ``cfig_base_i`` + 0x20
+
+
+cfig_version
+''''''''''''
+
+..
list-table:: Configuration - Version Register + :widths: 10 15 35 5 30 + :header-rows: 1 + + * - Bits + - Field + - Description + - Mode + - Reset Value + * - [15:0] + - MinorID + - Minor Version ID of the HPDcache + - RO + - :math:`\small\mathsf{0x0001}` + * - [31:16] + - MajorID + - Major Version ID of the HPDcache + - RO + - :math:`\small\mathsf{0x0001}` + * - [47:32] + - IpID + - IP ID of the HPDcache + - RO + - :math:`\small\mathsf{0x0001}` + * - [63:48] + - VendorID + - Vendor ID + - RO + - :math:`\small\mathsf{0x0001}` + + +cfig_info +''''''''' + +.. list-table:: Configuration - Info Register + :widths: 10 15 35 5 30 + :header-rows: 1 + + * - Bits + - Field + - Description + - Mode + - Reset Value + * - [15:0] + - Sets + - Number of sets in the cache (one-based) + - RO + - :math:`\scriptsize\mathsf{CONF\_HPDCACHE\_SETS - 1}` + * - [23:16] + - Ways + - Number of ways in the cache (one-based) + - RO + - :math:`\scriptsize\mathsf{CONF\_HPDCACHE\_WAYS - 1}` + * - [27:24] + - ClBytes + - Number of bytes per cacheline (power of 2) + - RO + - :math:`\scriptsize\mathsf{log_2(CONF\_HPDCACHE\_CL\_WIDTH/8)}` + * - [39:32] + - MSHRSets + - Number of sets in the MSHR (one-based) + - RO + - :math:`\scriptsize\mathsf{CONF\_HPDCACHE\_MSHR\_SETS - 1}` + * - [47:40] + - MSHRWays + - Number of ways in the MSHR (one-based) + - RO + - :math:`\scriptsize\mathsf{CONF\_HPDCACHE\_MSHR\_WAYS - 1}` + +cfig_info2 +'''''''''' + +.. 
list-table:: Configuration - Info 2 Register
+   :widths: 10 15 35 5 30
+   :header-rows: 1
+
+   * - Bits
+     - Field
+     - Description
+     - Mode
+     - Reset Value
+   * - [7:0]
+     - RTAB
+     - Number of entries in the RTAB (one-based)
+     - RO
+     - :math:`\scriptsize\mathsf{CONF\_HPDCACHE\_RTAB\_ENTRIES - 1}`
+   * - [23:16]
+     - WBufDir
+     - Number of entries in the directory of the Write Buffer (one-based)
+     - RO
+     - :math:`\scriptsize\mathsf{CONF\_HPDCACHE\_WBUF\_DIR\_ENTRIES - 1}`
+   * - [31:24]
+     - WBufData
+     - Number of entries in the data buffer of the Write Buffer (one-based)
+     - RO
+     - :math:`\scriptsize\mathsf{CONF\_HPDCACHE\_WBUF\_DATA\_ENTRIES - 1}`
+   * - [35:32]
+     - WBufBytes
+     - Number of bytes per Write-Buffer Data Entry (power of 2)
+     - RO
+     - :math:`\scriptsize\mathsf{log_2(CONF\_HPDCACHE\_WBUF\_WIDTH/8)}`
+
+
+.. _sec_cfig_cachectrl:
+
+cfig_cachectrl
+''''''''''''''
+
+.. list-table:: Configuration - Cache Controller Register
+   :widths: 10 15 35 5 30
+   :header-rows: 1
+
+   * - Bits
+     - Field
+     - Description
+     - Mode
+     - Reset Value
+   * - [0:0]
+     - E
+     - Cache Enable - When set to 0, all memory accesses are considered
+       uncacheable
+     - RW
+     - :math:`\small\mathsf{0}`
+   * - [8:8]
+     - P
+     - Performance Counters Enable - When set to 1, performance counters count
+       events
+     - RW
+     - :math:`\small\mathsf{1}`
+   * - [56:56]
+     - R
+     - Single-Entry RTAB (fallback mode) - When set to 1, the cache controller
+       only uses one entry of the Replay Table.
+     - RW
+     - :math:`\small\mathsf{0}`
+
+
+cfig_wbuf
+'''''''''
+
+..
list-table:: Configuration - Write Buffer Register
+   :widths: 10 15 35 5 30
+   :header-rows: 1
+
+   * - Bits
+     - Field
+     - Description
+     - Mode
+     - Reset Value
+   * - [0:0]
+     - R
+     - Reset time-counter on write - When set to 1, a write access resets the
+       time-counter of the used write-buffer entry to 0
+     - RW
+     - :math:`\small\mathsf{1}`
+   * - [1:1]
+     - S
+     - Sequential Write after Write - When set to 1, the write buffer stalls
+       write accesses that collide with an in-flight write transaction (SENT).
+     - RW
+     - :math:`\small\mathsf{0}`
+   * - [2:2]
+     - I
+     - Inhibit Write Coalescing - When set to 1, entries in the write-buffer go
+       from the FREE state to the PEND state directly (bypassing the OPEN
+       state). Moreover, no coalescing is accepted while the entry is in the
+       PEND state.
+     - RW
+     - :math:`\small\mathsf{0}`
+   * - [15:8]
+     - T
+     - Time-counter Threshold - This field defines the time-counter threshold
+       at which open write-buffer entries (OPEN) go to the pending state (PEND)
+     - RW
+     - :math:`\small\mathsf{3}`
+
+.. _sec_perf_counters:
+
+Performance counters
+--------------------
+
+The HPDcache provides a set of performance counters. These counters provide
+information that software developers can use, at the OS level or the user
+application level, for example, to debug performance issues.
+
+These counters are implemented in the HPDcache only if
+:math:`\scriptsize\mathsf{CONF\_HPDCACHE\_SUPPORT\_PERF}` is set to 1. If this
+configuration parameter is set to 0, any read or write to the performance
+counters is ignored, and a response with an error is sent to the corresponding
+requester when ``core_req_i.need_rsp`` is set to 1.
+
+Performance counters are incremented automatically by the hardware when the
+corresponding event is triggered and the ``cfig_cachectrl.P`` bit
+(see :ref:`sec_cfig_cachectrl`) is set to 1.
+
+:numref:`Table %s <tab_perf_counters>` lists the performance counters provided
+by the HPDcache.
These are mapped on the :math:`\mathsf{PERF}` memory address
+segment (:numref:`Figure %s <fig_csr_addr_space>`).
+
+.. _tab_perf_counters:
+
+.. list-table:: Performance Counters in the HPDcache
+   :widths: 20 50 30
+   :header-rows: 1
+
+   * - **Counter**
+     - **Description**
+     - **Base Address**
+   * - ``perf_write_cnt``
+     - 64-bit counter for processed write requests
+     - ``cfig_base_i`` + 0x400
+   * - ``perf_read_cnt``
+     - 64-bit counter for processed read requests
+     - ``cfig_base_i`` + 0x408
+   * - ``perf_prefetch_cnt``
+     - 64-bit counter for processed prefetch requests
+     - ``cfig_base_i`` + 0x410
+   * - ``perf_uncached_cnt``
+     - 64-bit counter for processed uncached requests
+     - ``cfig_base_i`` + 0x418
+   * - ``perf_cmo_cnt``
+     - 64-bit counter for processed CMO requests
+     - ``cfig_base_i`` + 0x420
+   * - ``perf_accepted_cnt``
+     - 64-bit counter for processed requests
+     - ``cfig_base_i`` + 0x428
+   * - ``perf_write_miss_cnt``
+     - 64-bit counter for write cache misses
+     - ``cfig_base_i`` + 0x430
+   * - ``perf_read_miss_cnt``
+     - 64-bit counter for read cache misses
+     - ``cfig_base_i`` + 0x438
+   * - ``perf_onhold_cnt``
+     - 64-bit counter for requests put on-hold
+     - ``cfig_base_i`` + 0x440
+   * - ``perf_onhold_mshr_cnt``
+     - 64-bit counter for requests put on-hold because of MSHR conflicts
+     - ``cfig_base_i`` + 0x448
+   * - ``perf_onhold_wbuf_cnt``
+     - 64-bit counter for requests put on-hold because of WBUF conflicts
+     - ``cfig_base_i`` + 0x450
+   * - ``perf_onhold_rollback_cnt``
+     - 64-bit counter for requests put on-hold (again) after a rollback
+     - ``cfig_base_i`` + 0x458
+   * - ``perf_stall_cnt``
+     - 64-bit counter for stall cycles (cache does not accept a request)
+     - ``cfig_base_i`` + 0x460
diff --git a/docs/source/images/hpdcache_core.pdf b/docs/source/images/hpdcache_core.pdf
new file mode 100644
index 0000000..e09292c
Binary files /dev/null and b/docs/source/images/hpdcache_core.pdf differ
diff --git a/docs/source/images/hpdcache_core.svg b/docs/source/images/hpdcache_core.svg
new file mode 100644
index 0000000..d5f03b1
--- /dev/null
+++ b/docs/source/images/hpdcache_core.svg
@@ -0,0 +1,3446 @@
+[SVG source: "HPDcache Core" block diagram (miss handler, cache directory and
+data, MSHR, uncacheable & AMO handler (UC), CMO handler, cache controller,
+write buffer (dir/data), replay table (RTAB), protocol engine stages 0-2,
+arbiters, memory read/write interfaces); full markup omitted]
diff --git a/docs/source/images/hpdcache_csr_addr_space.pdf b/docs/source/images/hpdcache_csr_addr_space.pdf
new file mode 100644
index 0000000..b8a2685
Binary files /dev/null and b/docs/source/images/hpdcache_csr_addr_space.pdf differ
diff --git a/docs/source/images/hpdcache_csr_addr_space.svg b/docs/source/images/hpdcache_csr_addr_space.svg
new file mode 100644
index 0000000..b1caa0f
--- /dev/null
+++ b/docs/source/images/hpdcache_csr_addr_space.svg
@@ -0,0 +1,300 @@
+[SVG source: "HPDcache CSR Dedicated Address Space" layout (CONFIG and PERF
+segments, reserved regions, CFIG_BASE offsets); full markup omitted]
diff --git a/docs/source/images/hpdcache_data_ram_organization.pdf b/docs/source/images/hpdcache_data_ram_organization.pdf
new file mode 100644
index 0000000..c37f24d
Binary files /dev/null and b/docs/source/images/hpdcache_data_ram_organization.pdf differ
diff --git a/docs/source/images/hpdcache_data_ram_organization.svg b/docs/source/images/hpdcache_data_ram_organization.svg
new file mode 100755
index 0000000..915dd28
--- /dev/null
+++ b/docs/source/images/hpdcache_data_ram_organization.svg
@@ -0,0 +1,2344 @@
+[SVG source: "HPDcache Data RAM Organization" diagram (sets/ways, tag
+comparison, way and word selection); full markup omitted]
diff --git a/docs/source/images/hpdcache_highlevel_integration.pdf b/docs/source/images/hpdcache_highlevel_integration.pdf
new file mode 100644
index 0000000..3f6f2a9
Binary files /dev/null and b/docs/source/images/hpdcache_highlevel_integration.pdf differ
diff --git a/docs/source/images/hpdcache_highlevel_integration.svg b/docs/source/images/hpdcache_highlevel_integration.svg
new file mode 100644
index 0000000..df808ba
--- /dev/null
+++ b/docs/source/images/hpdcache_highlevel_integration.svg
@@ -0,0 +1,501 @@
+[SVG source: HPDcache high-level integration diagram (RISC-V core, load/store
+units, accelerator, hardware memory prefetcher, arbiter, HPDcache core with
+CSRs, write buffer, MSHR, data/directory, RTAB, memory interface); full markup
+omitted]
diff --git a/docs/source/images/hpdcache_request_address_data_alignment.pdf b/docs/source/images/hpdcache_request_address_data_alignment.pdf
new file mode 100644
index 0000000..a30c92f
Binary files /dev/null and b/docs/source/images/hpdcache_request_address_data_alignment.pdf differ
diff --git a/docs/source/images/hpdcache_request_address_data_alignment.svg b/docs/source/images/hpdcache_request_address_data_alignment.svg
new file mode 100755
index 0000000..5d9119e
--- /dev/null
+++ b/docs/source/images/hpdcache_request_address_data_alignment.svg
@@ -0,0 +1,2016 @@
+[SVG source: "HPDcache Request Address Data Alignment" diagram (WDATA/BE
+alignment examples for SIZE=0 (1 byte), SIZE=2 (4 bytes) and SIZE=3
+(8 bytes)); full markup omitted]
diff --git a/docs/source/images/hpdcache_request_arbiter.pdf b/docs/source/images/hpdcache_request_arbiter.pdf
new file mode 100644
index 0000000..8c71d80
Binary files /dev/null and b/docs/source/images/hpdcache_request_arbiter.pdf differ
diff --git a/docs/source/images/hpdcache_request_arbiter.svg b/docs/source/images/hpdcache_request_arbiter.svg
new file mode
100644 index 0000000..9fe73bb --- /dev/null +++ b/docs/source/images/hpdcache_request_arbiter.svg @@ -0,0 +1,428 @@ + + + HPDcache Request Arbiter + + + + image/svg+xml + + HPDcache Request Arbiter + February, 2023 + + + Cesar Fuguet + + + + + Commissariat a l'Energie Atomique et aux Energies Alternatives + + + English + + + + + + + + + + + + + + + + + + + + + + + + + 0 + 1 + 2 + N-1 + + + N + HPDcache + + CSRs + + + HPDcacheCore + MemoryInterface + Requester0 + Requester1 + Requester2 + RequesterN-1 + ... + HardwareMemoryPrefetcher + 1 request/cycle + Fixed-Priority Arbiter + diff --git a/docs/source/images/hpdcache_vipt.pdf b/docs/source/images/hpdcache_vipt.pdf new file mode 100644 index 0000000..8788175 Binary files /dev/null and b/docs/source/images/hpdcache_vipt.pdf differ diff --git a/docs/source/images/hpdcache_vipt.svg b/docs/source/images/hpdcache_vipt.svg new file mode 100755 index 0000000..3c236b3 --- /dev/null +++ b/docs/source/images/hpdcache_vipt.svg @@ -0,0 +1,911 @@ + + + + + HPDcache: VIPT support + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + image/svg+xml + + HPDcache: VIPT support + October, 2023 + + + Cesar Fuguet + + + + + Commissariat a l'Energie Atomique et aux Energies Alternatives + + + + + + + + TLB + + + + + + + + + + + + + + + + CacheDir + CacheData + + LoadStoreUnit + + Arbiter + + + + + virtualindex + HPDcache + + + wayselection + + + + + + + + RSPdata + physicaltag + + + + Mux + + + + + + + + Cycle 0 + Cycle 1 + + + + + TranslationLookasideBuffer + + + = + + + + = + + + + = + + Tags + Data + + + + + + + + + + Sel + + diff --git a/docs/source/images/wave_back_to_back.json b/docs/source/images/wave_back_to_back.json new file mode 100755 index 0000000..c4fecb2 --- /dev/null +++ b/docs/source/images/wave_back_to_back.json @@ -0,0 +1,10 @@ +{signal: [ + {name: 'CLK', wave: 'p....'}, + {name: 'PAYLOAD', wave: 'xx22x', data: ['dat0', 'dat1']}, + {name: 'VALID', wave: '0.1.0'}, + {name: 'READY', 
wave: '01..0'} +], + head:{ + tick:0 + } +} diff --git a/docs/source/images/wave_back_to_back.pdf b/docs/source/images/wave_back_to_back.pdf new file mode 100644 index 0000000..7c3a2c3 Binary files /dev/null and b/docs/source/images/wave_back_to_back.pdf differ diff --git a/docs/source/images/wave_back_to_back.svg b/docs/source/images/wave_back_to_back.svg new file mode 100755 index 0000000..fea7140 --- /dev/null +++ b/docs/source/images/wave_back_to_back.svg @@ -0,0 +1,4 @@ + + + +012345CLKPAYLOADdat0dat1VALIDREADY \ No newline at end of file diff --git a/docs/source/images/wave_ready_before_valid.json b/docs/source/images/wave_ready_before_valid.json new file mode 100755 index 0000000..fe8a446 --- /dev/null +++ b/docs/source/images/wave_ready_before_valid.json @@ -0,0 +1,10 @@ +{signal: [ + {name: 'CLK', wave: 'p...'}, + {name: 'PAYLOAD', wave: 'x.2x', data: ['data']}, + {name: 'VALID', wave: '0.10'}, + {name: 'READY', wave: '01.0'} +], + head:{ + tick:0 + } +} diff --git a/docs/source/images/wave_ready_before_valid.pdf b/docs/source/images/wave_ready_before_valid.pdf new file mode 100644 index 0000000..cb1f95e Binary files /dev/null and b/docs/source/images/wave_ready_before_valid.pdf differ diff --git a/docs/source/images/wave_ready_before_valid.svg b/docs/source/images/wave_ready_before_valid.svg new file mode 100755 index 0000000..a7dd0fa --- /dev/null +++ b/docs/source/images/wave_ready_before_valid.svg @@ -0,0 +1,4 @@ + + + +01234CLKPAYLOADdataVALIDREADY \ No newline at end of file diff --git a/docs/source/images/wave_ready_when_valid.json b/docs/source/images/wave_ready_when_valid.json new file mode 100755 index 0000000..3507527 --- /dev/null +++ b/docs/source/images/wave_ready_when_valid.json @@ -0,0 +1,10 @@ +{signal: [ + {name: 'CLK', wave: 'p...'}, + {name: 'PAYLOAD', wave: 'x.2x', data: ['data']}, + {name: 'VALID', wave: '0.10'}, + {name: 'READY', wave: '0.10'} +], + head:{ + tick:0 + } +} diff --git a/docs/source/images/wave_ready_when_valid.pdf 
b/docs/source/images/wave_ready_when_valid.pdf new file mode 100644 index 0000000..0597a44 Binary files /dev/null and b/docs/source/images/wave_ready_when_valid.pdf differ diff --git a/docs/source/images/wave_ready_when_valid.svg b/docs/source/images/wave_ready_when_valid.svg new file mode 100755 index 0000000..587f606 --- /dev/null +++ b/docs/source/images/wave_ready_when_valid.svg @@ -0,0 +1,4 @@ + + + +01234CLKPAYLOADdataVALIDREADY \ No newline at end of file diff --git a/docs/source/images/wave_valid_before_ready.json b/docs/source/images/wave_valid_before_ready.json new file mode 100755 index 0000000..16c313e --- /dev/null +++ b/docs/source/images/wave_valid_before_ready.json @@ -0,0 +1,10 @@ +{signal: [ + {name: 'CLK', wave: 'p...'}, + {name: 'PAYLOAD', wave: 'x2.x', data: ['data']}, + {name: 'VALID', wave: '01.0'}, + {name: 'READY', wave: '0.10'} +], + head:{ + tick:0 + } +} diff --git a/docs/source/images/wave_valid_before_ready.pdf b/docs/source/images/wave_valid_before_ready.pdf new file mode 100644 index 0000000..48530cd Binary files /dev/null and b/docs/source/images/wave_valid_before_ready.pdf differ diff --git a/docs/source/images/wave_valid_before_ready.svg b/docs/source/images/wave_valid_before_ready.svg new file mode 100755 index 0000000..ccc7b4f --- /dev/null +++ b/docs/source/images/wave_valid_before_ready.svg @@ -0,0 +1,4 @@ + + + +01234CLKPAYLOADdataVALIDREADY \ No newline at end of file diff --git a/docs/source/index.rst b/docs/source/index.rst new file mode 100644 index 0000000..21794e7 --- /dev/null +++ b/docs/source/index.rst @@ -0,0 +1,37 @@ +.. + Copyright 2024 CEA* + *Commissariat a l'Energie Atomique et aux Energies Alternatives (CEA) + + SPDX-License-Identifier: Apache-2.0 WITH SHL-2.1 + + Licensed under the Solderpad Hardware License v 2.1 (the “License”); you + may not use this file except in compliance with the License, or, at your + option, the Apache License version 2.0. 
You may obtain a copy of the + License at + + https://solderpad.org/licenses/SHL-2.1/ + + Unless required by applicable law or agreed to in writing, any work + distributed under the License is distributed on an “AS IS” BASIS, WITHOUT + WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the + License for the specific language governing permissions and limitations + under the License. + + Authors : Cesar Fuguet + Description : HPDcache User Guide Master File + + +Welcome to the HPDcache User Guide +================================== + +.. toctree:: + :maxdepth: 2 + :caption: Contents: + + overview + interface + architecture + csrs + amo + cmo + references diff --git a/docs/source/interface.rst b/docs/source/interface.rst new file mode 100644 index 0000000..8694019 --- /dev/null +++ b/docs/source/interface.rst @@ -0,0 +1,1236 @@ +.. + Copyright 2024 CEA* + *Commissariat a l'Energie Atomique et aux Energies Alternatives (CEA) + + SPDX-License-Identifier: Apache-2.0 WITH SHL-2.1 + + Licensed under the Solderpad Hardware License v 2.1 (the “License”); you + may not use this file except in compliance with the License, or, at your + option, the Apache License version 2.0. You may obtain a copy of the + License at + + https://solderpad.org/licenses/SHL-2.1/ + + Unless required by applicable law or agreed to in writing, any work + distributed under the License is distributed on an “AS IS” BASIS, WITHOUT + WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the + License for the specific language governing permissions and limitations + under the License. + + Authors : Cesar Fuguet + Description : HPDcache Interface + +Parameters, Interfaces and Communication Protocols +================================================== + +Synthesis-time (Static) Configuration Parameters +------------------------------------------------ + +The HPDcache has several static configuration parameters. These parameters must +be defined at compilation/synthesis. 
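As a quick illustration of how some of these parameters combine, the following Python sketch derives the cacheline width and total cache capacity from a hypothetical configuration (the numeric values are illustrative examples, not HPDcache defaults):

```python
# Hypothetical example configuration (illustrative values only).
CONF_HPDCACHE_SETS = 128        # number of sets
CONF_HPDCACHE_WAYS = 4          # associativity
CONF_HPDCACHE_CL_WORDS = 8      # words per cacheline
CONF_HPDCACHE_WORD_WIDTH = 64   # bits per data word

# A cacheline is CL_WORDS x WORD_WIDTH bits wide (here: 512 bits = 64 bytes).
cacheline_bytes = (CONF_HPDCACHE_CL_WORDS * CONF_HPDCACHE_WORD_WIDTH) // 8

# Total capacity is sets x ways x cacheline size (here: 32 KiB).
capacity_bytes = CONF_HPDCACHE_SETS * CONF_HPDCACHE_WAYS * cacheline_bytes
```

Changing any of the four parameters at synthesis time changes the geometry accordingly; the tables below define each of them.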
+ +:numref:`tab_synthesis_parameters` summarizes the list of parameters +that can be set when integrating the HPDcache. + +.. _tab_synthesis_parameters: + +.. list-table:: Static Synthesis-Time Parameters + :widths: 45 55 + :header-rows: 1 + + * - Parameter + - Description + * - :math:`\scriptsize\mathsf{NREQUESTERS}` + - Number of requesters to the HPDcache + * - :math:`\scriptsize\mathsf{CONF\_HPDCACHE\_PA\_WIDTH}` + - Physical address width (in bits) + * - :math:`\scriptsize\mathsf{CONF\_HPDCACHE\_SETS}` + - Number of sets + * - :math:`\scriptsize\mathsf{CONF\_HPDCACHE\_WAYS}` + - Number of ways (associativity) + * - :math:`\scriptsize\mathsf{CONF\_HPDCACHE\_WORD\_WIDTH}` + - Width (in bits) of a data word + * - :math:`\scriptsize\mathsf{CONF\_HPDCACHE\_CL\_WORDS}` + - Number of words in a cacheline + * - :math:`\scriptsize\mathsf{CONF\_HPDCACHE\_REQ\_WORDS}` + - Number of words in the data channels from/to requesters + * - :math:`\scriptsize\mathsf{CONF\_HPDCACHE\_REQ\_TRANS\_ID\_WIDTH}` + - Width (in bits) of the transaction ID from requesters + * - :math:`\scriptsize\mathsf{CONF\_HPDCACHE\_REQ\_SRC\_ID\_WIDTH}` + - Width (in bits) of the source ID from requesters + * - :math:`\scriptsize\mathsf{CONF\_HPDCACHE\_VICTIM\_SEL}` + - Selects the victim (replacement) policy + * - :math:`\scriptsize\mathsf{CONF\_HPDCACHE\_MSHR\_SETS}` + - Number of sets in the MSHR + * - :math:`\scriptsize\mathsf{CONF\_HPDCACHE\_MSHR\_WAYS}` + - Number of ways (associativity) in the MSHR + * - :math:`\scriptsize\mathsf{CONF\_HPDCACHE\_WBUF\_DIR\_ENTRIES}` + - Number of entries in the directory of the write buffer + * - :math:`\scriptsize\mathsf{CONF\_HPDCACHE\_WBUF\_DATA\_ENTRIES}` + - Number of entries in the data part of the write buffer + * - :math:`\scriptsize\mathsf{CONF\_HPDCACHE\_WBUF\_WORDS}` + - Number of data words per entry in the write
buffer + * - :math:`\scriptsize\mathsf{CONF\_HPDCACHE\_WBUF\_TIMECNT\_WIDTH}` + - Width (in bits) of the time counter in write buffer entries + * - :math:`\scriptsize\mathsf{CONF\_HPDCACHE\_RTAB\_ENTRIES}` + - Number of entries in the replay table + * - :math:`\scriptsize\mathsf{CONF\_HPDCACHE\_MEM\_DATA\_WIDTH}` + - Width (in bits) of the data channels from/to the memory interface + * - :math:`\scriptsize\mathsf{CONF\_HPDCACHE\_MEM\_ID\_WIDTH}` + - Width (in bits) of the transaction ID from the memory interface + * - :math:`\scriptsize\mathsf{CONF\_HPDCACHE\_SUPPORT\_AMO}` + - When set to 1, the HPDcache supports Atomic Memory Operations (AMOs) + * - :math:`\scriptsize\mathsf{CONF\_HPDCACHE\_SUPPORT\_CMO}` + - When set to 1, the HPDcache supports Cache Management Operations (CMOs) + * - :math:`\scriptsize\mathsf{CONF\_HPDCACHE\_SUPPORT\_PERF}` + - When set to 1, the HPDcache integrates performance counters + +Some parameters are not directly related to functionality. Instead, they +allow adapting the HPDcache to physical constraints in the target technology +node. Typically, these control the geometry of SRAM macros. Depending on the +technology, some dimensions are more efficient than others (in terms of +performance, power and area). These also need to be provided by the user at +synthesis-time. :numref:`tab_synthesis_physical_parameters` lists the static +synthesis-time physical parameters of the HPDcache. The +:math:`\scriptsize\mathsf{CONF\_HPDCACHE\_ACCESS\_WORDS}` parameter has an impact on the refill +latency (see section :ref:`sec_cache_ram_organization`). + +.. _tab_synthesis_physical_parameters: + +..
list-table:: Static Synthesis-Time Physical Parameters + :widths: 50 50 + :header-rows: 1 + + * - Parameter + - Description + * - :math:`\scriptsize\mathsf{CONF\_HPDCACHE\_ACCESS\_WORDS}` + - Number of words that can be accessed simultaneously from the CACHE data + array + * - :math:`\scriptsize\mathsf{CONF\_HPDCACHE\_DATA\_WAYS\_PER\_RAM\_WORD}` + - Number of ways in the same CACHE data SRAM word + * - :math:`\scriptsize\mathsf{CONF\_HPDCACHE\_DATA\_SETS\_PER\_RAM}` + - Number of sets per RAM macro in the DATA array of the cache + * - :math:`\scriptsize\mathsf{CONF\_HPDCACHE\_DATA\_RAM\_WBYTEENABLE}` + - Use RAM macros with byte-enable instead of bit-mask for the CACHE data + array + * - :math:`\scriptsize\mathsf{CONF\_HPDCACHE\_MSHR\_WAYS\_PER\_RAM\_WORD}` + - Number of ways in the same MSHR SRAM word + * - :math:`\scriptsize\mathsf{CONF\_HPDCACHE\_MSHR\_SETS\_PER\_RAM}` + - Number of sets per RAM macro in the MSHR array of the cache + * - :math:`\scriptsize\mathsf{CONF\_HPDCACHE\_MSHR\_RAM\_WBYTEENABLE}` + - Use RAM macros with byte-enable instead of bit-mask for the MSHR + +Several internal configuration values are computed from the above ones. +:numref:`tab_internal_parameters` gives a non-exhaustive list of these +internal configuration values that may be mentioned in the remainder of this +document. + +.. _tab_internal_parameters: + +..
list-table:: Internal Parameters + :widths: 35 25 40 + :header-rows: 1 + + * - Parameter + - Description + - Value + * - :math:`\scriptsize\mathsf{HPDCACHE\_CL\_WIDTH}` + - Width (in bits) of a cacheline + - | :math:`\scriptsize\mathsf{CONF\_HPDCACHE\_CL\_WORDS \times}` + | :math:`\scriptsize\mathsf{CONF\_HPDCACHE\_WORD\_WIDTH}` + * - :math:`\scriptsize\mathsf{HPDCACHE\_REQ\_DATA\_WIDTH}` + - Width (in bits) of request data interfaces + - | :math:`\scriptsize\mathsf{CONF\_HPDCACHE\_REQ\_WORDS \times}` + | :math:`\scriptsize\mathsf{CONF\_HPDCACHE\_WORD\_WIDTH}` + * - :math:`\scriptsize\mathsf{HPDCACHE\_NLINE\_WIDTH}` + - Width (in bits) of the cacheline index part of the address + - | :math:`\scriptsize\mathsf{CONF\_HPDCACHE\_PA\_WIDTH -}` + | :math:`\scriptsize\mathsf{log_2(\frac{HPDCACHE\_CL\_WIDTH}{8})}` + * - :math:`\scriptsize\mathsf{HPDCACHE\_SET\_WIDTH}` + - Width (in bits) of the SET part of the address + - :math:`\scriptsize\mathsf{log_2(CONF\_HPDCACHE\_SETS)}` + * - :math:`\scriptsize\mathsf{HPDCACHE\_TAG\_WIDTH}` + - Width (in bits) of the TAG part of the address + - | :math:`\scriptsize\mathsf{HPDCACHE\_NLINE\_WIDTH -}` + | :math:`\scriptsize\mathsf{HPDCACHE\_SET\_WIDTH}` + * - :math:`\scriptsize\mathsf{HPDCACHE\_WBUF\_WIDTH}` + - Width (in bits) of an entry in the write-buffer + - | :math:`\scriptsize\mathsf{CONF\_HPDCACHE\_WBUF\_WORDS \times}` + | :math:`\scriptsize\mathsf{CONF\_HPDCACHE\_WORD\_WIDTH}` + + +Conventions +----------- + +The HPDcache uses the following conventions in the naming of its signals: + + - The ``_i`` suffix for input ports + - The ``_o`` suffix for output ports + - The ``_n`` suffix for active low ports + - The ``clk_`` prefix for clock ports + - The ``rst_`` prefix for reset ports + - Prefixes and suffixes may be combined. For example, ``_ni`` indicates an active-low + input port + + +Global Signals +-------------- + +.. _tab_global_signals: + +..
list-table:: Global Signals + :widths: 25 15 60 + :header-rows: 1 + + * - Signal + - Source + - Description + * - ``clk_i`` + - Clock source + - Global clock signal. + The HPDcache is synchronous to the rising-edge of the clock. + * - ``rst_ni`` + - Reset source + - Global reset signal. Asynchronous, active LOW, reset signal. + * - ``wbuf_flush_i`` + - System + - Force the write-buffer to send all pending writes. Active HIGH, + one-cycle, pulse signal. Synchronous to ``clk_i``. + * - ``wbuf_empty_o`` + - Cache + - Indicates if the write-buffer is empty (there are no pending write + transactions). When this signal is set to 1, the write-buffer is empty. + * - ``cfig_base_i`` + - System + - Base address of the CSR segment in the HPDcache (:ref:`sec_csr`) + + +Cache-Requesters Interface +-------------------------------------- + +This section describes the Cache-Requesters Interface (CRI) between requesters +and the HPDcache. It contains two channels: one for requests and one for +responses. There are as many CRIs as requesters from the core/accelerator to +the HPDcache. + +This interface is synchronous to the rising edge of the global +clock ``clk_i``. + +The address (``core_req_i.addr_offset``), size (``core_req_i.size``), +byte-enable (``core_req_i.be``), write data (``core_req_i.wdata``) and +read data (``core_rsp_o.rdata``) signals shall comply with the alignment +constraints defined in section +:ref:`Address, data, and byte enable alignment <sec_req_alignment>`. + +CRI Signal Description +~~~~~~~~~~~~~~~~~~~~~~ + +.. _tab_req_channel_signals: + +..
list-table:: CRI Request Channel Signals + :widths: 31 13 52 + :align: center + :header-rows: 1 + + * - Signal + - Source + - Description + * - ``core_req_valid_i`` + - Requester + - Indicates that the corresponding requester has a valid request + * - ``core_req_ready_o`` + - Cache + - Indicates that the cache is ready to accept a request from the + corresponding requester + * - ``core_req_i.addr_offset`` + - Requester + - Least significant bits of the target address of the request + * - ``core_req_i.wdata`` + - Requester + - Write data (little-endian) + * - ``core_req_i.op`` + - Requester + - Indicates the type of operation to be performed + * - ``core_req_i.be`` + - Requester + - Byte-enable for write data (little-endian) + * - ``core_req_i.size`` + - Requester + - Indicates the size of the access. The size is encoded as the power-of-two + of the number of bytes (e.g. + 0 is :math:`\scriptsize\mathsf{2^0~=~1}`, + 5 is :math:`\scriptsize\mathsf{2^5~=~32}`) + * - ``core_req_i.sid`` + - Requester + - The identification tag for the requester. It shall be identical to the + index of the request port bound to that requester + * - ``core_req_i.tid`` + - Requester + - The identification tag for the request. A requester can issue multiple + requests. The corresponding response from the cache will return this tid + * - ``core_req_i.need_rsp`` + - Requester + - Indicates if the request needs a response from the cache. When unset, + the cache will not issue a response for the corresponding request + * - ``core_req_i.phys_indexed`` + - Requester + - Indicates whether the access uses virtual (unset) or physical indexing + (set) + * - ``core_req_i.addr_tag`` + - Requester + - Most significant bits of the target address of the request. It is only + valid when using physical indexing (``core_req_i.phys_indexed = 1``) + * - ``core_req_i.pma.uncacheable`` + - Requester + - Indicates whether the access needs to be cached (unset) or not (set).
+ Uncacheable accesses are directly forwarded to the memory. It is only + valid when using physical indexing (``core_req_i.phys_indexed = 1``) + * - ``core_req_i.pma.io`` + - Requester + - Indicates whether the request targets input/output (IO) peripherals + (set) or not (unset). IO accesses are directly forwarded to the memory. + It is only valid when using physical indexing + (``core_req_i.phys_indexed = 1``) + * - ``core_req_tag_i`` + - Requester + - Most significant bits of the target address of the request. This signal + must be delayed by one cycle after + ``(core_req_valid_i & core_req_ready_o) = 1``. + It is only valid when using virtual indexing + (``core_req_i.phys_indexed = 0``) + * - ``core_req_pma_i.uncacheable`` + - Requester + - Indicates whether the access needs to be cached (unset) or not (set). + Uncacheable accesses are directly forwarded to the memory. This signal + must be delayed by one cycle after + ``(core_req_valid_i & core_req_ready_o) = 1``. + It is only valid when using virtual indexing + (``core_req_i.phys_indexed = 0``) + * - ``core_req_pma_i.io`` + - Requester + - Indicates whether the access targets input/output (IO) peripherals (set) + or not (unset). IO accesses are directly forwarded to the memory. This + signal must be delayed by one cycle after + ``(core_req_valid_i & core_req_ready_o) = 1``. + It is only valid when using virtual indexing + (``core_req_i.phys_indexed = 0``) + +.. _tab_resp_channel_signals: + +.. list-table:: CRI Response Channel Signals + :widths: 31 13 52 + :header-rows: 1 + + * - Signal + - Source + - Description + * - ``core_rsp_valid_o`` + - Cache + - Indicates that the HPDcache has a valid response for the corresponding + requester + * - ``core_rsp_o.rdata`` + - Cache + - Response read data + * - ``core_rsp_o.sid`` + - Cache + - The identification tag for the requester. It corresponds to the **sid** + transferred with the request + * - ``core_rsp_o.tid`` + - Cache + - The identification tag for the request. It corresponds to the **tid**
It corresponds to the **tid** + transferred with the request + * - ``core_rsp_o.error`` + - Cache + - Indicates whether there was an error condition while processing the + request + * - ``core_rsp_o.aborted`` + - Cache + - Indicates if the request issued in the previous cycle shall be aborted. + It is only considered if the previous request used virtual indexing + +Cache Memory Interfaces +---------------------------------- + +This section describes the Cache-Memory Interface (CMI) between the HPDcache +and the NoC/memory. It implements 5 different channels. + +This interface is synchronous to the rising edge of the global clock +``clk_i``. + +All CMI interfaces implements the ready-valid protocol described in section +:ref:`sec_ready_valid_handshake` for the handshake +between the HPDcache and the NoC/Memory. + +The address (``mem_req_addr``), size (``mem_req_size``), +write data (``mem_req_w_data``) and write byte-enable (``mem_req_w_be``) +signals shall comply with the alignment constraints defined in section +:ref:`Address, data, and byte enable alignment `. + + +.. _sec_mi_signal_descriptions: + +CMI Signal Descriptions +~~~~~~~~~~~~~~~~~~~~~~~ + +- **Memory Read Interfaces** + +.. _tab_read_req_channel_signals: + +.. list-table:: CMI Read Request Channel Signals + :widths: 31 13 52 + :header-rows: 1 + + * - Signal + - Source + - Description + * - ``mem_req_read_valid_o`` + - Cache + - Indicates that the channel is signaling a valid request + * - ``mem_req_read_ready_i`` + - NoC + - Indicates that the NoC is ready to accept a request + * - ``mem_req_read_o.mem_req_addr`` + - Cache + - Target physical address of the request. The address shall be aligned to + the ``mem_req_read_o.mem_req_size`` field. + * - ``mem_req_read_o.mem_req_len`` + - Cache + - Indicates the number of transfers in a burst minus one + * - ``mem_req_read_o.mem_req_size`` + - Cache + - Indicate the size of the access. 
The size is encoded as the power-of-two + of the number of bytes + * - ``mem_req_read_o.mem_req_id`` + - Cache + - The identification tag for the request. The HPDcache always uses unique + IDs on the memory interface (i.e. two or more in-flight requests cannot + share the same ID). + * - ``mem_req_read_o.mem_req_command`` + - Cache + - Indicates the type of operation to be performed + * - ``mem_req_read_o.mem_req_atomic`` + - Cache + - In case of atomic operations, it indicates its type + * - ``mem_req_read_o.mem_req_cacheable`` + - Cache + - This is a hint for the cache hierarchy in the system. It indicates if + the request can be allocated by the cache hierarchy. That is, data can + be prefetched from memory or can be reused for multiple read + transactions + + +.. _tab_read_miss_resp_channel_signals: + +.. list-table:: CMI Read Response Channel Signals + :widths: 31 13 52 + :header-rows: 1 + + * - Signal + - Source + - Description + * - ``mem_resp_read_valid_i`` + - NoC + - Indicates that the channel is signaling a valid response + * - ``mem_resp_read_ready_o`` + - Cache + - Indicates that the cache is ready to accept a response + * - ``mem_resp_read_i.mem_resp_r_error`` + - NoC + - Indicates whether there was an error condition while processing the + request + * - ``mem_resp_read_i.mem_resp_r_id`` + - NoC + - The identification tag for the request. It corresponds to the ID + transferred with the request + * - ``mem_resp_read_i.mem_resp_r_data`` + - NoC + - Response read data. It shall be naturally aligned to the request address + * - ``mem_resp_read_i.mem_resp_r_last`` + - NoC + - Indicates the last transfer in a read response burst + + +- **Memory Write Interfaces** + +.. _tab_write_req_channel_signals: + +..
list-table:: CMI Write Request Channel Signals + :widths: 31 13 52 + :header-rows: 1 + + * - Signal + - Source + - Description + * - ``mem_req_write_valid_o`` + - Cache + - Indicates that the channel is signaling a valid request + * - ``mem_req_write_ready_i`` + - NoC + - Indicates that the NoC is ready to accept a request + * - ``mem_req_write_o.mem_req_addr`` + - Cache + - Target physical address of the request + * - ``mem_req_write_o.mem_req_len`` + - Cache + - Indicates the number of transfers in a burst minus one + * - ``mem_req_write_o.mem_req_size`` + - Cache + - Indicates the size of the access. The size is encoded as the + power-of-two of the number of bytes + * - ``mem_req_write_o.mem_req_id`` + - Cache + - The identification tag for the request. The HPDcache always uses unique + IDs on the memory interface (i.e. two or more in-flight requests cannot + share the same ID). + * - ``mem_req_write_o.mem_req_command`` + - Cache + - Indicates the type of operation to be performed + * - ``mem_req_write_o.mem_req_atomic`` + - Cache + - In case of atomic operations, it indicates its type + * - ``mem_req_write_o.mem_req_cacheable`` + - Cache + - This is a hint for the cache hierarchy in the system. It indicates if + the write is bufferable by the cache hierarchy. This means that the + write must be visible in a timely manner at the final destination. + However, write responses can be obtained from an intermediate point + + +.. _tab_write_data_channel_signals: + +.. list-table:: CMI Write Data Channel Signals + :widths: 31 13 52 + :header-rows: 1 + + * - Signal + - Source + - Description + * - ``mem_req_write_data_valid_o`` + - Cache + - Indicates that the channel is transferring valid data + * - ``mem_req_write_data_ready_i`` + - NoC + - Indicates that the target is ready to accept the data + * - ``mem_req_write_data_o.mem_req_w_data`` + - Cache + - Request write data.
It shall be naturally aligned to the request + address + * - ``mem_req_write_data_o.mem_req_w_be`` + - Cache + - Request write byte-enable. It shall be naturally aligned to the request + address + * - ``mem_req_write_data_o.mem_req_w_last`` + - Cache + - Indicates the last transfer in a write request burst + + +.. _tab_write_resp_channel_signals: + +.. list-table:: CMI Write Response Channel Signals + :widths: 31 13 52 + :header-rows: 1 + + * - Signal + - Source + - Description + * - ``mem_resp_write_valid_i`` + - NoC + - Indicates that the channel is transferring a valid write acknowledgement + * - ``mem_resp_write_ready_o`` + - Cache + - Indicates that the cache is ready to accept the acknowledgement + * - ``mem_resp_write_i.mem_resp_w_is_atomic`` + - NoC + - Indicates whether the atomic operation was successfully processed + (atomically) + * - ``mem_resp_write_i.mem_resp_w_error`` + - NoC + - Indicates whether there was an error condition while processing the + request + * - ``mem_resp_write_i.mem_resp_w_id`` + - NoC + - The identification tag for the request. It corresponds to the ID + transferred with the request + + +Interfaces’ requirements +------------------------ + +This section describes the basic protocol transaction requirements for the +different interfaces in the HPDcache. + + +.. _sec_ready_valid_handshake: + +Valid/Ready handshake process +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +All interfaces in the HPDcache use a **valid**/**ready** handshake process to +transfer a payload between the source and the destination. The payload contains +the address, data and control information. + +As a reminder, the 7 interfaces in the HPDcache are the following: + +#. CRI request interface +#. CRI response interface +#. CMI read request interface +#. CMI read response interface +#. CMI write request interface +#. CMI write data interface +#. CMI write response interface + +The source sets the **valid** signal to 1 to indicate that a payload is +available.
The destination sets the **ready** signal to 1 to indicate that it +can accept that payload. The transfer occurs at the rising edge of the clock at which both the **valid** and +**ready** signals are set to 1. + +A source is not permitted to wait until **ready** is set to 1 before setting +**valid** to 1. + +A destination may or may not wait for **valid** to set the **ready** to 1 +(:numref:`cases (a) and (d) in Table %s <tab_ready_valid_scenarios>`). +In other words, a destination may set **ready** to 1 before an actual transfer +is available. + +When **valid** is set to 1, the source must keep it that way until the +handshake occurs, that is, until the next rising edge at which both **valid** and +**ready** (from the destination) are set to 1. In other words, a source cannot +retire a pending **valid** transfer +(:numref:`Case (b) in Table %s <tab_ready_valid_scenarios>`). + +After an effective transfer (**valid** and **ready** set to 1), the source may +keep **valid** set to 1 in the next cycle to signal a new transfer (with a new +payload). In the same manner, the destination may keep **ready** set to 1 if it +can accept a new transfer. This allows back-to-back transfers, with no idle +cycles, between a source and a destination +(:numref:`Case (d) in Table %s <tab_ready_valid_scenarios>`). + +All interfaces are synchronous to the rising edge of the same global +clock (``clk_i``). + +.. _tab_ready_valid_scenarios: + +.. list-table:: valid/ready scenarios + :class: borderless + :align: center + + * - **(a)** + - **(b)** + * - .. image:: images/wave_ready_before_valid.* + - .. image:: images/wave_valid_before_ready.* + * - **(c)** + - **(d)** + * - .. image:: images/wave_ready_when_valid.* + - .. image:: images/wave_back_to_back.* + + +CRI Response Interface +''''''''''''''''''''''''''''' + +The CRI response interfaces are a particular case. +For these interfaces, it is assumed that the **ready** signal is always set to +1. That is why the **ready** signal is not actually implemented on those +interfaces.
In other words, the requester unconditionally accepts any incoming
+response.
+
+.. _sec_req_alignment:
+
+Address, data and byte enable alignment
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Address alignment
+'''''''''''''''''
+
+The address transferred (**addr**) in all request interfaces (CRI and CMI)
+shall be naturally aligned to the access size: it shall be a multiple of
+:math:`\scriptsize\mathsf{2^{size}}` bytes, where **size** is the corresponding
+signal in that interface.
+
+Some examples are illustrated in
+:numref:`Figure %s <fig_request_data_alignment>`. In the first case, the
+**size** value is 2 (which corresponds to :math:`\scriptsize\mathsf{2^2=4}` bytes).
+Thus, the address must be a multiple of 4. In the second case, the **size**
+value is 3. Thus, the address must be a multiple of 8. Finally, in the third
+case, the **size** value is 0. Thus, there is no constraint on the address
+alignment.
+
+Data alignment
+''''''''''''''
+
+The data must be naturally aligned to the address (**addr**), and the maximum
+number of valid bytes in the transfer is :math:`\scriptsize\mathsf{2^{size}}`.
+This means that the first valid byte in the **data** signal must be at the
+offset indicated by the address. Here, the offset corresponds to the least
+significant bits of the address, which designate a byte within the
+**data** word. For example, if the **data** signal is 128 bits wide (16
+bytes), then the offset corresponds to the 4 least significant bits of the
+**addr** signal.
+
+Some examples are illustrated in
+:numref:`Figure %s <fig_request_data_alignment>`. As illustrated, within the
+data word, only bytes in the range from the offset indicated in the address to
+that offset plus :math:`\scriptsize\mathsf{2^{size}}` can contain valid data. Other bytes must
+be ignored by the destination.
+
+Additionally, within the range described above, the **be** signal indicates
+which bytes within that range are actually valid. Bytes in the **data**
+signal whose **be** bits are set to 0 must be ignored by the
+destination.
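The alignment rules above can be sketched in a small behavioral model (the function name and default data width are illustrative only, not part of the HPDcache sources). Assuming a 128-bit (16-byte) **data** signal, it returns the half-open range of byte lanes that may carry valid data:

```python
def valid_byte_range(addr: int, size: int, data_width_bytes: int = 16) -> tuple:
    """Return the half-open range of byte lanes that may carry valid data."""
    nbytes = 1 << size
    # The address shall be naturally aligned to 2**size bytes.
    assert addr % nbytes == 0, "address is not naturally aligned to the size"
    # The offset is given by the least significant bits of the address
    # that designate a byte within the data word.
    offset = addr % data_width_bytes
    return (offset, offset + nbytes)

# A 4-byte access (size = 2) at address 0x104 occupies byte lanes 4 to 7
# of a 16-byte data word.
assert valid_byte_range(0x104, 2) == (4, 8)
```

Within that range, only the bytes whose **be** bits are set to 1 are actually valid.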
+
+Byte Enable (BE) alignment
+''''''''''''''''''''''''''
+
+The **be** signal must be naturally aligned to the address (**addr**), and the
+number of bits set in this signal must be less than or equal to
+:math:`\scriptsize\mathsf{2^{size}}`. This means that the first valid bit in the
+**be** signal must be at the offset indicated by the address. The offset is
+the same as the one explained above in the "Data alignment" paragraph.
+
+Some examples are illustrated in
+:numref:`Figure %s <fig_request_data_alignment>`. As illustrated, within the
+**be** word, only bits in the range from the offset indicated in the address
+to that offset plus :math:`\scriptsize\mathsf{2^{size}}` can be set. Other bits
+outside that range must be set to 0.
+
+.. _fig_request_data_alignment:
+
+.. figure:: images/hpdcache_request_address_data_alignment.*
+ :align: center
+ :alt: Address, Data and Byte Enable Alignment in Requests
+
+ Address, Data and Byte Enable Alignment in Requests
+
+
+Cache-Requesters Interface (CRI) Attributes
+-------------------------------------------
+
+.. _sec_vipt:
+
+Physical or Virtual Indexing
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The HPDcache allows the address and physical memory attributes (PMA) to be sent
+by the requesters in two different (but consecutive) cycles.
+
+This is useful because it allows pipelining of the address translation
+mechanism (when the core has one). This is illustrated in
+:numref:`Figure %s <fig_vipt>`.
+Performing the translation and forwarding the translated address directly to
+the cache in the same cycle is usually too costly in terms of timing. Instead,
+the requesters can:
+
+.. list-table::
+ :widths: 15 85
+ :header-rows: 0
+
+ * - Cycle 0
+ - During the first cycle, forward the least significant bits of the
+ address (``addr_offset``), which usually do not need to be translated,
+ along with the other fields of the request (operation, identifiers,
+ etc).
Meanwhile, the core can perform the translation of the
+ address to compute the most significant bits (``addr_tag``)
+
+ * - Cycle 1
+ - During the second cycle, forward the previously translated most
+ significant bits of the address (``addr_tag``), and the corresponding
+ PMAs. PMAs are sent during this second cycle because they usually depend
+ on the target physical address. The requester can abort the request
+ during this cycle as explained in the next section (:ref:`sec_req_abort`).
+
+.. _fig_vipt:
+
+.. figure:: images/hpdcache_vipt.*
+ :align: center
+ :width: 80%
+ :alt: Pipelining of the Virtual and Physical Part of the Address
+
+ Pipelining of the Virtual and Physical Part of the Address
+
+This kind of indexing is named **Virtually-Indexed Physically-Tagged (VIPT)**.
+
+The requester shall send the tag and PMAs in the cycle after the
+``core_req_valid_i`` and ``core_req_ready_o`` signals were set to 1 and the
+``core_req_i.phys_indexed`` signal was set to 0. The number of bits of the
+address offset (``addr_offset``) depends on the number of cache sets
+(:math:`\scriptsize\mathsf{CONF\_HPDCACHE\_SETS}`) and the size of the cachelines
+(:math:`\scriptsize\mathsf{CONF\_HPDCACHE\_CL\_WIDTH/8}`).
+The address offset represents the concatenation of two fields of the
+address: the byte offset in the cacheline and the set index. Requests can be
+sent back-to-back with no idle cycle in-between.
+
+If requesters do not need virtual indexing, they can send the full address in
+the first cycle by setting the ``core_req_i.phys_indexed`` bit to 1. The address
+offset and the tag shall be sent through ``core_req_i.addr_offset`` and
+``core_req_i.addr_tag``, respectively. A given requester is free to alternate
+between virtual and physical indexing on different clock cycles. Different
+requesters can use different indexing schemes (virtual or physical).
+
+.. 
_sec_req_abort:
+
+Request Abortion
+~~~~~~~~~~~~~~~~~~~~~
+
+When using virtual indexing, the requester can abort the request during the
+second cycle of the addressing pipeline. In that case, the requester must
+set the ``req_abort`` signal to 1.
+
+When a request is aborted, and the ``core_req_i.need_rsp`` field was set to 1, the
+HPDcache responds to the corresponding requester with the bit
+``core_rsp_o.aborted`` set to 1.
+
+CRI Type of Operation
+~~~~~~~~~~~~~~~~~~~~~
+
+A requester indicates the required operation on the 4-bit ``HPDCACHE_REQ_OP``
+signal. The supported operations are detailed in :numref:`tab_req_op_types`.
+
+.. _tab_req_op_types:
+
+.. list-table:: Requesters Operation Types
+ :widths: 30 15 55
+ :header-rows: 1
+
+ * - Mnemonic
+ - Encoding
+ - Type
+ * - ``HPDCACHE_REQ_LOAD``
+ - 0b0000
+ - Read operation
+ * - ``HPDCACHE_REQ_STORE``
+ - 0b0001
+ - Write operation
+ * - ``HPDCACHE_REQ_AMO_LR``
+ - 0b0100
+ - Atomic Load-reserved operation
+ * - ``HPDCACHE_REQ_AMO_SC``
+ - 0b0101
+ - Atomic Store-conditional operation
+ * - ``HPDCACHE_REQ_AMO_SWAP``
+ - 0b0110
+ - Atomic SWAP operation
+ * - ``HPDCACHE_REQ_AMO_ADD``
+ - 0b0111
+ - Atomic integer ADD operation
+ * - ``HPDCACHE_REQ_AMO_AND``
+ - 0b1000
+ - Atomic bitwise AND operation
+ * - ``HPDCACHE_REQ_AMO_OR``
+ - 0b1001
+ - Atomic bitwise OR operation
+ * - ``HPDCACHE_REQ_AMO_XOR``
+ - 0b1010
+ - Atomic bitwise XOR operation
+ * - ``HPDCACHE_REQ_AMO_MAX``
+ - 0b1011
+ - Atomic integer signed MAX operation
+ * - ``HPDCACHE_REQ_AMO_MAXU``
+ - 0b1100
+ - Atomic integer unsigned MAX operation
+ * - ``HPDCACHE_REQ_AMO_MIN``
+ - 0b1101
+ - Atomic integer signed MIN operation
+ * - ``HPDCACHE_REQ_AMO_MINU``
+ - 0b1110
+ - Atomic integer unsigned MIN operation
+ * - ``HPDCACHE_REQ_CMO``
+ - 0b1111
+ - Cache Management Operation (CMO)
+
+Load and store operations are normal read and write operations from/to the
+specified address.
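For illustration, the encodings in the table above can be captured in a small model (a Python sketch; the enum and helper names are not part of the HPDcache sources):

```python
from enum import IntEnum

class HpdcacheReqOp(IntEnum):
    """4-bit HPDCACHE_REQ_OP encodings from the table above."""
    LOAD     = 0b0000
    STORE    = 0b0001
    AMO_LR   = 0b0100
    AMO_SC   = 0b0101
    AMO_SWAP = 0b0110
    AMO_ADD  = 0b0111
    AMO_AND  = 0b1000
    AMO_OR   = 0b1001
    AMO_XOR  = 0b1010
    AMO_MAX  = 0b1011
    AMO_MAXU = 0b1100
    AMO_MIN  = 0b1101
    AMO_MINU = 0b1110
    CMO      = 0b1111

def is_amo(op: HpdcacheReqOp) -> bool:
    """True for the atomic operations (encodings 0b0100 to 0b1110)."""
    return HpdcacheReqOp.AMO_LR <= op <= HpdcacheReqOp.AMO_MINU

assert is_amo(HpdcacheReqOp.AMO_SWAP) and not is_amo(HpdcacheReqOp.CMO)
```

Note that encodings 0b0010 and 0b0011 are not assigned in the table.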
+
+Atomic operations are the ones specified in the Atomic (A) extension of the
+RISC-V ISA [RISCVUP2019]_. More details on how the HPDcache implements AMOs
+are found in section :ref:`sec_amo`.
+
+CMOs are explained in :ref:`sec_cmo`.
+
+Source identifier
+~~~~~~~~~~~~~~~~~
+
+Each request identifies its source through the ``core_req_i.sid`` signal. The
+``core_req_i.sid`` signal shall be decoded when the ``core_req_valid_i`` signal
+is set to 1. The width of this signal is
+:math:`\scriptsize\mathsf{CONF\_HPDCACHE\_REQ\_SRC\_ID\_WIDTH}` bits.
+The HPDcache reflects the value of the **sid** of the request into the
+corresponding **sid** of the response.
+
+Each port must have a unique ID that corresponds to its number. Ports are
+numbered from 0 to N-1. This number shall be constant for a given port
+(requester). The HPDcache uses this information to route responses to the
+correct requester.
+
+Transaction identifier
+~~~~~~~~~~~~~~~~~~~~~~
+
+Each request identifies transactions through the ``core_req_i.tid`` signal.
+The ``core_req_i.tid`` signal shall be decoded when the
+``core_req_valid_i`` signal is set to 1. The width of this signal is
+:math:`\scriptsize\mathsf{CONF\_HPDCACHE\_REQ\_TRANS\_ID\_WIDTH}` bits.
+
+This signal can contain any value from 0 to
+:math:`\scriptsize\mathsf{2^{CONF\_HPDCACHE\_REQ\_TRANS\_ID\_WIDTH} - 1}`.
+The HPDcache forwards the value of the **tid** of the request into the **tid**
+of the corresponding response.
+
+A requester can issue multiple transactions without waiting for earlier
+transactions to complete. Because the HPDcache can respond to these transactions
+in a different order than that of the requests, the requester can use the
+**tid** to match responses to requests.
+
+Transaction IDs are not necessarily unique: a requester may reuse a given
+transaction ID for different transactions, even when some of these
+transactions are not yet completed.
However, when the requester starts multiple
+transactions with the same **tid**, it cannot match responses to requests,
+because responses may arrive in a different order than that of the requests.
+
+
+.. _sec_req_cacheability:
+
+Cacheability
+~~~~~~~~~~~~
+
+The HPDcache considers the memory space to be segmented. A segment corresponds
+to an address range: a **base address** and an **end address**. Some segments
+are cacheable and others are not. The HPDcache needs to know which segments are
+cacheable to determine whether, for a given read request, it needs to copy the
+read data into the cache.
+
+The request interface implements an uncacheable bit
+(``core_req_i.pma.uncacheable`` or ``core_req_pma_i.uncacheable``). When this
+bit is set, the access is considered uncacheable. The
+``core_req_i.pma.uncacheable`` signal shall be decoded when the
+``core_req_valid_i`` signal is set to 1. The ``core_req_pma_i.uncacheable``
+signal shall be decoded when the ``core_req_valid_i`` and ``core_req_ready_o``
+signals were set to 1 and the ``core_req_i.phys_indexed`` signal was set to 0
+in the previous cycle.
+
+.. admonition:: Caution
+ :class: caution
+
+ For a given address, the uncacheable attribute must be consistent between
+ accesses. The granularity is the cacheline. **In the event that the same
+ address is accessed with different values in the uncacheable attribute, the
+ behavior of the cache for that address is unpredictable**.
+
+Need response
+~~~~~~~~~~~~~
+
+For any given request, a requester can set the bit ``core_req_i.need_rsp`` to 0
+to indicate that it does not want a response for that request. The
+``core_req_i.need_rsp`` signal shall be decoded when the ``core_req_valid_i``
+signal is set to 1.
+
+When ``core_req_i.need_rsp`` is set to 0, the HPDcache processes the request
+but does not send an acknowledgement to the corresponding requester when the
+transaction is completed.
+
+Error response
+~~~~~~~~~~~~~~
+
+The response interface contains a single-bit ``core_rsp_o.error`` signal.
This signal
+is set to 1 by the HPDcache when an error condition occurred during the
+processing of the corresponding request. The ``core_rsp_o.error`` signal shall
+be decoded when the ``core_rsp_valid_o`` signal is set to 1.
+
+When the ``core_rsp_o.error`` signal is set to 1 in the response, the effect of
+the corresponding request is undefined. If this **error** signal is set in the
+case of **LOAD** or **AMO** operations, the **rdata** signal does not contain
+any valid data.
+
+Cache-Memory Interface (CMI) Attributes
+---------------------------------------
+
+.. _CMI_type-of-operation:
+
+CMI Type of operation
+~~~~~~~~~~~~~~~~~~~~~
+
+.. list-table:: Memory request operation types
+ :widths: 35 15 50
+ :header-rows: 1
+
+ * - Mnemonic
+ - Encoding
+ - Type
+ * - ``HPDCACHE_MEM_READ``
+ - 0b00
+ - Read operation
+ * - ``HPDCACHE_MEM_WRITE``
+ - 0b01
+ - Write operation
+ * - ``HPDCACHE_MEM_ATOMIC``
+ - 0b10
+ - Atomic operation
+
+``HPDCACHE_MEM_READ`` and ``HPDCACHE_MEM_WRITE`` are respectively normal read
+and write operations from/to the specified address.
+
+In the case of an atomic operation request (``HPDCACHE_MEM_ATOMIC``), the
+specific operation is indicated in the ``MEM_REQ_ATOMIC`` signal. These
+operations are listed in :numref:`tab_mem_req_atomics_types`. Note that these
+operations are compatible with the ones defined in the AMBA AXI protocol.
+
+.. _tab_mem_req_atomics_types:
+
+.. 
list-table:: Memory request atomic operation types + :widths: 35 15 50 + :header-rows: 1 + + * - Mnemonic + - Encoding + - Type + * - ``HPDCACHE_MEM_ATOMIC_ADD`` + - 0b0000 + - Atomic fetch-and-add operation + * - ``HPDCACHE_MEM_ATOMIC_CLR`` + - 0b0001 + - Atomic fetch-and-clear operation + * - ``HPDCACHE_MEM_ATOMIC_SET`` + - 0b0010 + - Atomic fetch-and-set operation + * - ``HPDCACHE_MEM_ATOMIC_EOR`` + - 0b0011 + - Atomic fetch-and-exclusive-or operation + * - ``HPDCACHE_MEM_ATOMIC_SMAX`` + - 0b0100 + - Atomic fetch-and-maximum (signed) operation + * - ``HPDCACHE_MEM_ATOMIC_SMIN`` + - 0b0101 + - Atomic fetch-and-minimum (signed) operation + * - ``HPDCACHE_MEM_ATOMIC_UMAX`` + - 0b0110 + - Atomic fetch-and-maximum (unsigned) operation + * - ``HPDCACHE_MEM_ATOMIC_UMIN`` + - 0b0111 + - Atomic fetch-and-minimum (unsigned) operation + * - ``HPDCACHE_MEM_ATOMIC_SWAP`` + - 0b1000 + - Atomic swap operation + * - ``HPDCACHE_MEM_ATOMIC_LDEX`` + - 0b1100 + - Load-exclusive operation + * - ``HPDCACHE_MEM_ATOMIC_STEX`` + - 0b1101 + - Store-exclusive operation + + +Type of operation per CMI request channel +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +As a reminder, the HPDcache implements two request channels to the memory: + +#. Memory read request channel +#. Memory write request channel + +:numref:`tab_optypes_by_cmi_req_channel` indicates the type of +operations that each of these two request channels can issue. + +.. _tab_optypes_by_cmi_req_channel: + +.. 
list-table:: Operation Types Supported by CMI Request Channels
+ :widths: 30 50
+ :header-rows: 1
+
+ * - Type
+ - Channels
+ * - ``HPDCACHE_MEM_READ``
+ - - CMI read request
+ * - ``HPDCACHE_MEM_WRITE``
+ - - CMI write request
+ * - ``HPDCACHE_MEM_ATOMIC``
+ - - CMI write request
+
+
+Read-Modify-Write Atomic Operations
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The following atomic operations behave as read-modify-write operations:
+
+- ``HPDCACHE_MEM_ATOMIC_ADD``
+- ``HPDCACHE_MEM_ATOMIC_CLR``
+- ``HPDCACHE_MEM_ATOMIC_SET``
+- ``HPDCACHE_MEM_ATOMIC_EOR``
+- ``HPDCACHE_MEM_ATOMIC_SMAX``
+- ``HPDCACHE_MEM_ATOMIC_SMIN``
+- ``HPDCACHE_MEM_ATOMIC_UMAX``
+- ``HPDCACHE_MEM_ATOMIC_UMIN``
+- ``HPDCACHE_MEM_ATOMIC_SWAP``
+
+These requests are forwarded to the memory through the CMI write request
+interface. A particularity of these requests is that they generate two
+responses from the memory:
+
+#. The old data value from memory is returned through the CMI read response
+ interface.
+
+#. A write acknowledgement is returned through the CMI write response
+ interface.
+
+Both responses may arrive at the initiating HPDcache in any order.
+
+Regarding errors, if either response has its **error** signal set to 1
+(``mem_resp_*_i.mem_resp_r_error`` or ``mem_resp_*_i.mem_resp_w_error``), the
+HPDcache considers that the operation was not completed. It waits for both
+responses and forwards an error response (``core_rsp_o.error = 1``) to the
+corresponding requester on the HPDcache requesters’ side.
+
+
+Exclusive Load/Store Atomic Operations
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Exclusive load and store operations are issued as normal load and store
+operations on the CMI read request interface and CMI write request interface,
+respectively.
+
+However, specific operation types are used for these exclusive requests:
+``HPDCACHE_MEM_ATOMIC_LDEX`` for loads, and
+``HPDCACHE_MEM_ATOMIC_STEX`` for stores.
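As a behavioral sketch (illustrative Python, not the RTL), the write acknowledgement of an exclusive store can be interpreted as follows; treating an acknowledgement with the error bit set as a global error response is an assumption based on the error-handling rules above:

```python
def sc_outcome(mem_resp_w_is_atomic: bool, mem_resp_w_error: bool) -> str:
    """Interpret the write acknowledgement of an HPDCACHE_MEM_ATOMIC_STEX."""
    if mem_resp_w_error:
        # Assumption: the error is forwarded to the requester (core_rsp_o.error).
        return "error"
    if mem_resp_w_is_atomic:
        return "sc_success"  # the store was "atomic": data was written in memory
    return "sc_failure"      # the store was "non-atomic": the write was abandoned

assert sc_outcome(mem_resp_w_is_atomic=True, mem_resp_w_error=False) == "sc_success"
```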
+
+These requests behave similarly to normal loads and stores to the memory but
+provide some additional properties described in :ref:`sec_amo`.
+
+In the case of the ``HPDCACHE_MEM_ATOMIC_STEX`` request, the write
+acknowledgement contains additional information in the
+``mem_resp_w_is_atomic`` signal.
+If this signal is set to 1, the exclusive store was "atomic"; hence, the data
+was written in memory.
+If this signal is set to 0, the exclusive store was "non-atomic"; hence, the
+write operation was abandoned.
+
+The HPDcache uses exclusive stores for SC operations from requesters.
+Depending on the ``mem_resp_w_is_atomic`` value, the HPDcache responds to the
+requester according to the rules explained in :ref:`sec_amo`. A "non-atomic"
+response is considered an **SC Failure**, and an "atomic" response is
+considered an **SC Success**.
+
+CMI Transaction identifier
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Each request identifies transactions through the ``mem_req_*_o.mem_req_id``
+signals. The ``mem_req_*_o.mem_req_id`` signal shall be decoded when the
+``mem_req_*_valid_o`` signal is set to 1. The width of these ID signals is
+:math:`\scriptsize\mathsf{CONF\_HPDCACHE\_MEM\_ID\_WIDTH}` bits.
+
+The target (memory or peripheral) shall respond to a request by setting the
+``mem_resp_*_i.mem_resp_*_id`` signal to the corresponding
+``mem_req_*_o.mem_req_id``.
+
+``mem_req_*_o.mem_req_id`` signals can contain any value from 0 to
+:math:`\scriptsize\mathsf{2^{CONF\_HPDCACHE\_MEM\_ID\_WIDTH} - 1}`.
+
+The HPDcache can issue multiple memory transactions without waiting for earlier
+transactions to complete. The HPDcache uses a unique ID for each request.
+Unique IDs mean that two or more in-flight requests never share the same ID.
+In-flight requests are those that have been issued by the HPDcache but have not
+yet received their respective response.
+
+The target (memory or peripheral) may respond to in-flight CMI requests in any
+order.
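The unique-ID rule can be illustrated with a small bookkeeping sketch (illustrative Python, not the RTL): every in-flight request is recorded under its ID, which is exactly what allows the responses to come back in any order:

```python
pending = {}  # in-flight requests, keyed by mem_req_id

def issue(mem_req_id: int, description: str) -> None:
    # Two or more in-flight requests never share the same ID.
    assert mem_req_id not in pending, "in-flight IDs must be unique"
    pending[mem_req_id] = description

def respond(mem_resp_id: int) -> str:
    # Responses may arrive in any order; the ID recovers the request.
    return pending.pop(mem_resp_id)

issue(3, "read miss @0x1000")
issue(5, "read miss @0x2000")
assert respond(5) == "read miss @0x2000"  # out-of-order completion
assert respond(3) == "read miss @0x1000"
```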
+
+
+- **Transaction IDs in the CMI read request channel**
+
+The HPDcache can have the following number of in-flight read miss transactions:
+
+ :math:`\scriptsize\mathsf{CONF\_HPDCACHE\_MSHR\_SETS{}\times{}CONF\_HPDCACHE\_MSHR\_WAYS}`
+
+Each in-flight transaction has a unique transaction ID. This ID is formatted as
+follows:
+
+ - For cacheable requests:
+
+ ``(mshr_way << log2(HPDCACHE_MSHR_SETS)) | mshr_set``
+
+ The ID is the concatenation of two indexes: the MSHR set and the MSHR way
+ occupied by the corresponding request.
+
+ - For uncacheable requests:
+
+ The HPDcache can issue at most one in-flight uncacheable read transaction.
+ Uncacheable transactions have a dedicated transaction ID with all bits set to 1.
+
+
+- **Transaction IDs in the CMI wbuf write request channel**
+
+The HPDcache can have the following number of in-flight write transactions:
+
+ :math:`\scriptsize\mathsf{CONF\_HPDCACHE\_WBUF\_DIR\_ENTRIES}`
+
+Each in-flight transaction has a unique transaction ID. This ID is formatted as
+follows:
+
+ - For cacheable requests:
+
+ The ID corresponds to the index of the entry in the write-buffer directory.
+
+ ``wbuf_dir_index``
+
+
+ - For uncacheable requests:
+
+ The HPDcache can issue at most one in-flight uncacheable write transaction.
+ Uncacheable transactions have a dedicated transaction ID with all bits set to 1.
+
+
+Event signals
+-------------
+
+In addition to the performance registers explained in :ref:`sec_perf_counters`,
+the HPDcache provides a set of one-shot signals that indicate when a given event
+is detected. These signals are set to 1 for one cycle each time the
+corresponding event is detected. If the same event is detected N cycles in a
+row, the corresponding event signal remains set to 1 for N cycles.
+:numref:`Table %s <tab_events>` lists these event signals.
+
+These event signals are output-only. They can either be left unconnected, if
+they are not used, or connected to the rest of the system.
The system can +use those signals, for example, for counting those events externally or for +triggering some specific actions. + +.. _tab_events: + +.. list-table:: Event Signals in the HPDcache + :widths: 31 13 52 + :header-rows: 1 + + * - **Signal** + - **Source** + - **Description** + * - ``evt_o.write_req`` + - Cache + - Write request accepted + * - ``evt_o.read_req`` + - Cache + - Read request accepted + * - ``evt_o.prefetch_req`` + - Cache + - Prefetch request accepted + * - ``evt_o.uncached_req`` + - Cache + - Uncached request accepted + * - ``evt_o.cmo_req`` + - Cache + - CMO request accepted + * - ``evt_o.accepted_req`` + - Cache + - One request accepted (any type) + * - ``evt_o.cache_write_miss`` + - Cache + - Write miss event + * - ``evt_o.cache_read_miss`` + - Cache + - Read miss event + * - ``evt_o.req_onhold`` + - Cache + - Request put on-hold in the RTAB + * - ``evt_o.req_onhold_mshr`` + - Cache + - Request put on-hold because of a MSHR conflict + * - ``evt_o.req_onhold_wbuf`` + - Cache + - Request put on-hold because of a WBUF conflict + * - ``evt_o.req_onhold_rollback`` + - Cache + - Request put on-hold (again) after a rollback + * - ``evt_o.stall`` + - Cache + - Cache stalls request event + diff --git a/docs/source/overview.rst b/docs/source/overview.rst new file mode 100644 index 0000000..83a06e3 --- /dev/null +++ b/docs/source/overview.rst @@ -0,0 +1,87 @@ +.. + Copyright 2024 CEA* + *Commissariat a l'Energie Atomique et aux Energies Alternatives (CEA) + + SPDX-License-Identifier: Apache-2.0 WITH SHL-2.1 + + Licensed under the Solderpad Hardware License v 2.1 (the “License”); you + may not use this file except in compliance with the License, or, at your + option, the Apache License version 2.0. 
You may obtain a copy of the
+ License at
+
+ https://solderpad.org/licenses/SHL-2.1/
+
+ Unless required by applicable law or agreed to in writing, any work
+ distributed under the License is distributed on an “AS IS” BASIS, WITHOUT
+ WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+ License for the specific language governing permissions and limitations
+ under the License.
+
+ Authors : Cesar Fuguet
+ Description : HPDcache Overview
+
+Overview
+========
+
+The HPDcache is responsible for serving data accesses issued by a RISC-V core, tightly-coupled accelerators and hardware memory prefetchers.
+All these "clients" are called requesters.
+
+The HPDcache implements a hardware pipeline capable of serving one request per cycle.
+An arbiter in the requesters’ interface of the HPDcache guarantees the correct behavior when there are multiple requesters.
+This is illustrated in :numref:`Figure %s <fig_request_arbiter>`.
+
+.. _fig_request_arbiter:
+
+.. figure:: images/hpdcache_highlevel_integration.*
+ :alt: High-Level View of the HPDcache Sub-System
+ :align: center
+ :width: 90%
+
+ High-Level View of the HPDcache Sub-System
+
+List of features
+----------------
+
+- Support for multiple outstanding requests per requester.
+
+- Support for multiple outstanding read misses and writes to memory.
+
+- Processes one request per cycle.
+
+- Any given requester can access 1 to 32 bytes of a cacheline per cycle.
+
+- Reduced energy consumption by limiting the number of RAMs consulted per
+ request.
+
+- Fixed priority arbiter between requesters: the requester port with the lowest
+ index has the highest priority.
+
+- Non-allocate, write-through policy.
+
+- Hardware write-buffer to mask the latency of write acknowledgements from
+ the memory system.
+
+- Compliance with RVWMO.
+
+- For address-overlapping transactions, the cache guarantees that these are
+ committed in the order in which they are consumed from the requesters.
+
+- For non-address-overlapping transactions, the cache may execute them in an
+ out-of-order fashion to improve performance.
+
+- Support for CMOs: cache invalidation operations and memory fences for
+ multi-core synchronisation. The supported cache invalidation operations are
+ those defined in the RISC-V CMO Standard.
+
+- Memory-mapped CSRs for runtime configuration of the cache, status and
+ performance monitoring.
+
+- A ready/valid interface to the memory with 5 channels (3 request, 2
+ response). This interface, the Cache-Memory Interface (CMI), can be easily
+ adapted to mainstream NoC interfaces like AMBA AXI [AXI2020]_.
+
+- An adapter for interfacing with AXI5 is provided.
+
+- An optional, external, configurable hardware memory prefetcher that supports
+ up to 4 simultaneous prefetching streams.
+
diff --git a/docs/source/references.rst b/docs/source/references.rst
new file mode 100644
index 0000000..cb24ece
--- /dev/null
+++ b/docs/source/references.rst
@@ -0,0 +1,37 @@
+..
+ Copyright 2024 CEA*
+ *Commissariat a l'Energie Atomique et aux Energies Alternatives (CEA)
+
+ SPDX-License-Identifier: Apache-2.0 WITH SHL-2.1
+
+ Licensed under the Solderpad Hardware License v 2.1 (the “License”); you
+ may not use this file except in compliance with the License, or, at your
+ option, the Apache License version 2.0. You may obtain a copy of the
+ License at
+
+ https://solderpad.org/licenses/SHL-2.1/
+
+ Unless required by applicable law or agreed to in writing, any work
+ distributed under the License is distributed on an “AS IS” BASIS, WITHOUT
+ WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+ License for the specific language governing permissions and limitations
+ under the License.
+
+ Authors : Cesar Fuguet
+ Description : HPDcache References
+
+
+References
+==========
+
+.. [RISCVUP2019] The RISC-V Instruction Set Manual, Volume I: Unprivileged ISA,
+ A. Waterman and K. 
Asanovic, 2019,
+ https://github.com/riscv/riscv-isa-manual/releases/download/Ratified-IMAFDQC/riscv-spec-20191213.pdf
+
+.. [RISCVP2019] The RISC-V Instruction Set Manual, Volume II: Privileged
+ Architecture,
+ A. Waterman, K. Asanovic, and J. Hauser, 2021,
+ https://github.com/riscv/riscv-isa-manual/releases/download/Priv-v1.12/riscv-privileged-20211203.pdf
+
+.. [AXI2020] AMBA AXI and ACE Protocol Specification, ARM, 2020,
+ https://developer.arm.com/documentation/ihi0022/hc/?lang=en