Add calyx-py AXI generator read channel (#1856)

* init commit of hardcoded axi wrapper for a 'main' kernel
* add axi-reads-calyx
* hook up inputs to channels in the wrapper. tbd if this works
* Working calyx version of AR and R. TBD if this actually implements AXI correctly. There are currently some hacks in place (marked with todos) to get this to compile, namely some splicing that doesn't consider what we actually want to splice (it just takes [31:0]) as opposed to dynamically considering the actual bits we want. A few other things should be cleaned up eventually. Need to create a cocotb testbench to test correctness
* Track output of compiled calyx read channel. Maybe this shouldn't be here, but for now (having deleted my working directory earlier) putting it here
* update gitignore to get rid of sim_build and other cocotb artifacts
* Working make files for running cocotb tests. Simply run make from the cocotb directory and axi-read-tests will be executed
* Add xID signals for cocotb compatibility. We tie ARID low in our manager
* Fix prefix issue on cocotb axi test bench. Prefixes should not contain trailing "_"
* commit to repro 'make WAVES=1' cocotb error from axi-reads-calyx.futil
* axi-reads patch
* sync debug
* Add txn_len initialization to 16 in calyx program
* AXI Read fixed to get to read channel start. Got rid of "assert_val" and "block_transfer" groups and instead perform these things inside "do_ar_transfer". This is required because we can't assert valid before we drive the data correctly, so both need to happen in parallel. Currently this seems to write 16 times to the same place, due to the hardcoding of 16 in the AR transfer. Not sure why the address doesn't increment; this is TBD (and the next TODO)
* Add integer byte conversion for tests on Calyx AXI test harness
* WIP get reads to work. Add incr_curr_addr group. This is part of read channel control sequence
* remove .fst from tracking
* Add more data to testbench to make waveform viewing easier
* Reads seem to be terminating correctly at RLAST
* AR transfers seem to work, valid is high for 1 cycle
* Unreduced axi-reads-calyx.futil. Also reduces data bus width to 32
* Cocotb testbench now passes
* Formatted and passing axi-read-tests
* Reduce and comment axi-reads-calyx.futil
* remove axi-reads.v from being tracked
* add a todo
* add required ARPROT signal. This is hardcoded to be privileged
* rename directories to yxi/axi-calyx
* initial commit of axi-writes-calyx, a copy of axi-reads-calyx
* WIP axi writes
* rename directories
* WIP implementing writes
* add testing for writes, note makefile is overwritten so now tests writes, not reads
* passing axi writes and testing
* Work on full AXI wrapper, reads and compute works
* cleaned up combined futil and tests
* delete axi-reads* which is subsumed by axi-combined
* add axi-combined-tests.py
* remove axi-writes as it is subsumed by axi-combined
* formatting
* Update yxi/axi-calyx/axi-combined-calyx.futil. Co-authored-by: Adrian Sampson <asampson@cs.cornell.edu>
* formatting
* add sim.sh which goes from calyx to running tests
* simplify valid.in signals
* WIP: replace groups with reg invokes
* add python file that enables waveform (vcd/fst) generation
* formatting
* simplify valid.in signals
* WIP: replace groups with reg invokes
* Replaces register-init groups with invokes
* Formatting of invokes
* Replace reg groups with invokes in main
* Modify tests to account for base address != 0
* Separate base-address calyx-mem-address dependency. This solution, made for our load->compute->store scheme, simply increments the base_addr and curr_addr differently. This should make it easy to have multiple transactions, which this hardcoded version does not support
* move incrs into par block
* initial axi-generator commit
* WIP get arread-channel working
* Finished ARREAD channel. TODO: Compare two, look at getting binary built. Look at improving *_use/modifying to fit needs better
* Create m_to_s_address_channel for {AR,AW} channels
* WIP: Add read channel
* Finished read_channel. Still need to fix #1850
* Finished read channels
* Remove read channel to break up into multiple PRs
* Add read channel back

---------

Co-authored-by: Rachit Nigam <rachit.nigam12@gmail.com>
Co-authored-by: Adrian Sampson <asampson@cs.cornell.edu>
calyxir · Feb 16, 2024 · commit d08d239
1 parent 8e2fe15
Showing 1 changed file with 122 additions and 3 deletions.
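The commit message notes that base_addr and curr_addr are incremented differently. As a rough illustration of that bookkeeping (a sketch only, not code from the generator), curr_addr can be read as a word index into the Calyx memory and base_addr as a byte address on the AXI side. The names `address_steps` and `example_mem` are hypothetical, and `clog2` here is a local stand-in written for this example, since the generator's own `clog2` body is not shown in this diff.

```python
from math import ceil


def clog2(x: int) -> int:
    """Ceiling of log2(x) for positive x (local stand-in, an assumption)."""
    if x <= 0:
        raise ValueError("x must be positive")
    return (x - 1).bit_length()


def address_steps(mem: dict, n_transfers: int, base_addr: int = 0):
    """Return (curr_addr, base_addr) as seen at the start of each transfer."""
    bytes_per_word = ceil(mem["width"] / 8)  # step applied to base_addr
    idx_size = clog2(mem["size"])  # bit width of the curr_addr register
    steps = []
    curr_addr = 0
    for _ in range(n_transfers):
        steps.append((curr_addr, base_addr))
        # Word index into the memory; wraps at idx_size bits like a register.
        curr_addr = (curr_addr + 1) % (1 << idx_size)
        # Byte address on the AXI side advances by one word's worth of bytes.
        base_addr += bytes_per_word
    return steps


if __name__ == "__main__":
    example_mem = {"width": 32, "size": 8}  # hypothetical memory description
    for curr, base in address_steps(example_mem, 3, base_addr=0x1000):
        print(curr, hex(base))
```

With a 32-bit, 8-entry memory, each transfer moves curr_addr by 1 and base_addr by 4, which is why the two registers cannot share one incrementer.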
diff --git a/yxi/axi-calyx/axi-generator.py b/yxi/axi-calyx/axi-generator.py
@@ -4,9 +4,10 @@
     invoke,
     while_with,
     par,
+    while_,
 )
 from typing import Literal
-from math import log2
+from math import log2, ceil
 import json
 
 # In general, ports to the wrapper are uppercase, internal registers are lower case.
@@ -177,6 +178,124 @@ def _add_m_to_s_address_channel(prog, mem, prefix: Literal["AW", "AR"]):
     return m_to_s_address_channel
 
 
+def add_read_channel(prog, mem):
+    # Inputs/Outputs
+    read_channel = prog.component("m_read_channel")
+    # TODO(nathanielnrn): We currently assume RDATA is the same width as the
+    # memory. This limits throughput, as many AXI data buses are much wider,
+    # e.g., 512 bits.
+    channel_inputs = [
+        ("ARESETn", 1),
+        ("RVALID", 1),
+        ("RLAST", 1),
+        ("RDATA", mem["width"]),
+        ("RRESP", 2),
+    ]
+    channel_outputs = [("RREADY", 1)]
+    add_comp_params(read_channel, channel_inputs, channel_outputs)
+
+    # Cells
+
+    # We assume idx_size is exactly clog2(len). See comment in #1751
+    # https://github.com/calyxir/calyx/issues/1751#issuecomment-1778360566
+    mem_ref = read_channel.seq_mem_d1(
+        name="mem_ref",
+        bitwidth=mem["width"],
+        len=mem["size"],
+        idx_size=clog2(mem["size"]),
+        is_external=False,
+        is_ref=True,
+    )
+
+    # according to zipcpu, rready should be registered
+    rready = read_channel.reg("rready", 1)
+    curr_addr = read_channel.reg("curr_addr", clog2(mem["size"]), is_ref=True)
+    base_addr = read_channel.reg("base_addr", 64, is_ref=True)
+    # Registered because RLAST is high with the last transfer, not after it.
+    # Before this we were terminating immediately with the
+    # last transfer and not servicing it
+    n_RLAST = read_channel.reg("n_RLAST", 1)
+    # Stores data we want to write to our memory at end of block_transfer group
+    read_data_reg = read_channel.reg("read_data_reg", mem["width"])
+
+    bt_reg = read_channel.reg("bt_reg", 1)
+
+    # Groups
+    with read_channel.continuous:
+        read_channel.this()["RREADY"] = rready.out
+        # Tie this low as we are only ever writing to seq_mem
+        mem_ref.read_en = 0
+
+    # Wait for handshake. Ensure that when this is done we are ready to write
+    # (i.e., read_data_reg.write_en = is_rdy.out)
+    # xVALID signals must be high until xREADY is high too, this works because
+    # if xREADY is high, then xVALID being high makes 1 flip and group
+    # is done by bt_reg.out
+    with read_channel.group("block_transfer") as block_transfer:
+        RVALID = read_channel.this()["RVALID"]
+        RDATA = read_channel.this()["RDATA"]
+        RLAST = read_channel.this()["RLAST"]
+        # TODO(nathanielnrn): We are allowed to have RREADY depend on RVALID.
+        # Can we simplify to just RVALID?
+
+        # rready.in = 1 does not work because it leaves RREADY high for 2 cycles.
+        # The way it is below leaves it high for only 1 cycle. See #1828
+        # https://github.com/calyxir/calyx/issues/1828
+
+        # TODO(nathanielnrn): Spec recommends defaulting xREADY high to get rid
+        # of extra cycles. Can we do this as opposed to waiting for RVALID?
+        rready.in_ = ~(rready.out & RVALID) @ 1
+        rready.in_ = (rready.out & RVALID) @ 0
+        rready.write_en = 1
+
+        # Store data we want to write
+        read_data_reg.in_ = RDATA
+        read_data_reg.write_en = (rready.out & RVALID) @ 1
+        read_data_reg.write_en = ~(rready.out & RVALID) @ 0
+
+        n_RLAST.in_ = RLAST @ 0
+        n_RLAST.in_ = ~RLAST @ 1
+        n_RLAST.write_en = 1
+
+        # We are done after handshake
+        bt_reg.in_ = (rready.out & RVALID) @ 1
+        bt_reg.in_ = ~(rready.out & RVALID) @ 0
+        bt_reg.write_en = 1
+        block_transfer.done = bt_reg.out
+
+    with read_channel.group("service_read_transfer") as service_read_transfer:
+        # not ready till done servicing
+        rready.in_ = 0
+        rready.write_en = 1
+
+        # write data we received to mem_ref
+        mem_ref.addr0 = curr_addr.out
+        mem_ref.write_data = read_data_reg.out
+        mem_ref.write_en = 1
+        service_read_transfer.done = mem_ref.done
+
+    # creates group that increments curr_addr by 1. Creates adder and wires up correctly
+    curr_addr_incr = read_channel.incr(curr_addr, 1)
+    # TODO(nathanielnrn): Currently we assume that width is a power of 2.
+    # In the future we should allow for non-power of 2 widths, will need some
+    # splicing for this.
+    # See https://cucapra.slack.com/archives/C05TRBNKY93/p1705587169286609?thread_ts=1705524171.974079&cid=C05TRBNKY93 # noqa: E501
+    base_addr_incr = read_channel.incr(base_addr, ceil(mem["width"] / 8))
+
+    # Control
+    invoke_n_RLAST = invoke(n_RLAST, in_in=1)
+    invoke_bt_reg = invoke(bt_reg, in_in=0)
+    while_body = [
+        invoke_bt_reg,
+        block_transfer,
+        service_read_transfer,
+        par(curr_addr_incr, base_addr_incr),
+    ]
+    while_n_RLAST = while_(n_RLAST.out, while_body)
+
+    read_channel.control += [invoke_n_RLAST, while_n_RLAST]
+
+
 # Helper functions
 def width_in_bytes(width: int):
     assert width % 8 == 0, "Width must be a multiple of 8."
@@ -198,9 +317,9 @@ def clog2(x):
 
 def build():
     prog = Builder()
-    # add_arread_channel(prog, mems[0])
+    add_arread_channel(prog, mems[0])
     add_awwrite_channel(prog, mems[0])
-    # add_read_channel(prog, mems[0])
+    add_read_channel(prog, mems[0])
     return prog.program
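
To make the control schedule of m_read_channel easier to follow, here is a behavioral sketch in plain Python. It is not cycle-accurate and does not use the calyx-py builder. It only mirrors the order of the control program in the diff: set n_RLAST, then repeat block_transfer, service_read_transfer, and the two increments until the registered n_RLAST goes low. The function name `simulate_read_channel` and the `beats` argument (one `(rdata, rlast)` pair per completed RVALID/RREADY handshake) are assumptions made for this illustration.

```python
from math import ceil


def simulate_read_channel(beats, mem_size, width_bits, base_addr=0):
    """Behavioral (not cycle-accurate) model of the read channel control loop.

    `beats` yields (rdata, rlast) pairs, one per completed handshake.
    Hypothetical helper for illustration; it is not part of axi-generator.py.
    Raises StopIteration if the beats run out before RLAST is seen.
    """
    mem = [0] * mem_size
    curr_addr = 0
    n_rlast = 1  # invoke_n_RLAST: start with "not last" set
    beat_iter = iter(beats)
    while n_rlast:  # while_(n_RLAST.out, ...)
        # block_transfer: on the handshake, latch RDATA and the inverse of RLAST.
        rdata, rlast = next(beat_iter)
        read_data_reg = rdata
        n_rlast = 0 if rlast else 1
        # service_read_transfer: write the latched word into the memory.
        mem[curr_addr] = read_data_reg
        # par(curr_addr_incr, base_addr_incr): word index and byte address.
        curr_addr += 1
        base_addr += ceil(width_bits / 8)
    return mem, curr_addr, base_addr


if __name__ == "__main__":
    mem, curr, base = simulate_read_channel(
        [(10, 0), (20, 0), (30, 1)], mem_size=4, width_bits=32
    )
    print(mem, curr, base)
```

Because the loop condition reads the registered n_RLAST only after the body has run, the beat that arrives with RLAST high is still written to memory, which matches the comment on the n_RLAST register.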