
[SCFToCalyx] Draft SCFToCalyx pass #1630

Closed

Conversation

mortbopet
Contributor

@mortbopet commented Aug 24, 2021

SCF to Calyx conversion

This pass is intended to be part of a high-level synthesis flow for lowering SCF-based code into a statically scheduled FSMD model, through the use of Calyx.

Preface: this is early work and as such is heavily subject to change; I'm mainly hoping for comments on the big picture and the general lowering flow. Various things, such as the lowering of comb and return operations, are based on assumptions and are not necessarily the current gospel of the Calyx dialect. The hope for this PR is also to spur discussion regarding what is missing in the current Calyx dialect to be able to write a complete front-end.

Design

  • The pass follows a partial lowering style, somewhat similar to what is done in StandardToHandshake. Throughout lowering, state is maintained in the ComponentLoweringState/ProgramLoweringState objects, which act as key-value stores. (see: 1)
  • Basic block arguments are passed through registers (see: 1, 2).
  • Registers are created for basic block arguments, while-loop iter args, and function return values.
  • The contents of basic blocks are executed sequentially in the control schedule.
  • The input program should be a combination of scf + std control flow operations and arithmetic operations.
  • Calyx uses if- and while-constructs for specifying control flow in the schedule. To ensure that only while loops reach the pass, I've proposed a pass for converting SCF for loops to while loops (see the sketch after this list).
  • Control flow backedges through branches are not supported (given the above). Instead, backedges are expected to be raised to while loops before running the conversion pass.
  • Emitting the control schedule is done in two steps:
    1. All possible control paths are traversed and written into the schedule, which simplifies emitting it. (see 1)
    2. A set of simplification patterns is then run to reduce the size of the schedule. (see 1)
  • Index types used in conjunction with memory accesses are truncated to integers of the same width as the address port they target. Index types used in arithmetic, which are not trivially linked to some memref, are assumed to be 32 bits. Proper bitwidth inference should be handled in a future pass.
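
For illustration, here is a minimal sketch of the for-to-while raising mentioned above (std-dialect ops; the body op is a placeholder):

    %c0 = constant 0 : index
    %c1 = constant 1 : index
    %c64 = constant 64 : index
    // Input: a counted loop.
    scf.for %i = %c0 to %c64 step %c1 {
      "test.body"(%i) : (index) -> ()
    }
    // Raised: the induction variable becomes an iter arg of scf.while, and
    // the backedge is implicit in the loop op rather than a branch.
    %last = scf.while (%i = %c0) : (index) -> index {
      %cond = cmpi ult, %i, %c64 : index
      scf.condition(%cond) %i : index
    } do {
    ^bb0(%iv: index):
      "test.body"(%iv) : (index) -> ()
      %next = addi %iv, %c1 : index
      scf.yield %next : index
    }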

@mortbopet changed the title from [SCFToCalyx] Initial commit for SCFToCalyx pass to [SCFToCalyx] Draft SCFToCalyx pass on Aug 24, 2021
@mortbopet added the Calyx and enhancement labels on Aug 24, 2021
@mikeurbach
Contributor

Thanks for sharing the draft @mortbopet. I have taken a quick look, and have some high-level discussion points below. I'll check out this branch and experiment with it myself.

Lowering standard operation primitives

I've been thinking about this problem as well. I think your approach of wrapping each Standard operation in a component that uses Comb operations makes sense.

The alternative of defining primitive operations in the Calyx dialect could also be a useful intermediate step, but wouldn't these ultimately lower into Comb operations later on (presumably in the CalyxToHW pass)? At some point, it would be great to lower to Comb, so that we can take advantage of all the optimizations in that dialect. Given that, I think what you have done so far to lower Standard operations to components using Comb operations is a good start.

Taking a step back, is the wrapping with components necessary? I was thinking a std.addi could lower directly to a comb.add placed inside the calyx.wires section. Could that work?
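
For concreteness, a rough sketch of what I mean, assuming a register %r and its ports already exist (the constant %true is hypothetical):

    calyx.wires {
      // The adder is pure combinational logic, so it could live here directly...
      %sum = comb.add %lhs, %rhs : i32
      calyx.group @write_sum {
        // ...and groups would simply read its result.
        calyx.assign %r.in = %sum : i32
        calyx.assign %r.write_en = %true : i1
        calyx.group_done %r.done : i1
      }
    }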

Let me propose the approach I've been thinking about, which could be used by this pass as well as others. This is about lowering Standard operations in general during HLS flows. I'm imagining something similar to how the Linalg structured op interface works. Specifically w.r.t. property 5. We could have a set of patterns that lower Standard/Math operations into their corresponding Comb operations (e.g. std.addi -> comb.add). And we could have a mechanism to register patterns that lower known operations to "library calls", which are implemented as external modules in hardware (e.g. std.addf -> Xilinx floating point IP instantiation). If a library call pattern matches, we could lower to the "library call", or else fall back to lowering to generic Comb logic when possible. These patterns could be used by this pass, but they could also be used in the Handshake lowering path (which currently lowers Standard operations to FIRRTL operations, but I'm hoping to slowly evolve off FIRRTL). We could also have a standalone StandardToComb pass using these patterns. Would it make sense to pull out the Standard to Comb/library lowering, and plug that into this as a library?
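
As a sketch of the two kinds of patterns (the library module name is made up):

    // Generic pattern: a std op with a direct comb counterpart.
    %0 = addi %a, %b : i32
    // ...becomes:
    %0 = comb.add %a, %b : i32

    // Library-call pattern: no comb equivalent, so instantiate an external
    // module instead (something along these lines):
    %1 = addf %x, %y : f64
    // ...becomes:
    %1 = hw.instance "fadd0" @XilinxFAdd(%x, %y) : (f64, f64) -> f64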

Representing loop kernels

You touched on having some sort of pipeline_for/while operation to represent this. I've been poking around with what that might look like. I was going to write a Discourse post about this today, but I can just share here.

One option I considered along the lines of pipeline_for would be to have a nest of scf.for loops wrapping a staticlogic.pipeline. Something like this:

    %c1 = constant 1 : index
    %c0 = constant 0 : index
    %c64 = constant 64 : index
    scf.for %arg3 = %c0 to %c64 step %c1 {
      scf.for %arg4 = %c0 to %c64 step %c1 {
        scf.for %arg5 = %c0 to %c64 step %c1 {
          "staticlogic.pipeline"(%arg3, %arg4, %arg5) ( {
          ^bb0(%arg6: index, %arg7: index, %arg8: index):  // no predecessors
            %0 = memref.load %arg0[%arg6, %arg8] : memref<?x64xf64>
            %1 = memref.load %arg1[%arg8, %arg7] : memref<?x64xf64>
            br ^bb1
          ^bb1:  // pred: ^bb0
            %2 = mulf %0, %1 : f64
            %3 = memref.load %arg2[%arg6, %arg7] : memref<?x64xf64>
            br ^bb2
          ^bb2:  // pred: ^bb1
            %4 = addf %3, %2 : f64
            br ^bb3
          ^bb3:  // pred: ^bb2
            memref.store %4, %arg2[%arg6, %arg7] : memref<?x64xf64>
            "staticlogic.return"() : () -> ()
          }) : (index, index, index) -> ()
        }
      }
    }

In my mind, the loop nest represents a set of address counters and an FSM controlling their increments, which are fed into the pipeline.

The staticlogic.pipeline is designed to be IsolatedFromAbove, and have the inputs to the pipeline passed as arguments to the first block. Another approach is to use scf.execute_region, which sort of takes the opposite approach: it is not IsolatedFromAbove, and instead relies on SSA dominance to make the inputs available. This composes well with both the SCF dialect and the Affine dialect. This approach was hinted at in this comment: https://llvm.discourse.group/t/better-modelling-of-pipeline-loops/3917/6. That might look something like this:

    %c1 = constant 1 : index
    %c0 = constant 0 : index
    %c64 = constant 64 : index
    scf.for %arg3 = %c0 to %c64 step %c1 {
      scf.for %arg4 = %c0 to %c64 step %c1 {
        scf.for %arg5 = %c0 to %c64 step %c1 {
          scf.execute_region {
          ^bb0:  // no predecessors
            %0 = memref.load %arg0[%arg3, %arg5] : memref<?x64xf64>
            %1 = memref.load %arg1[%arg5, %arg4] : memref<?x64xf64>
            br ^bb1
          ^bb1:  // pred: ^bb0
            %2 = mulf %0, %1 : f64
            %3 = memref.load %arg2[%arg3, %arg4] : memref<?x64xf64>
            br ^bb2
          ^bb2:  // pred: ^bb1
            %4 = addf %3, %2 : f64
            br ^bb3
          ^bb3:  // pred: ^bb2
            memref.store %4, %arg2[%arg3, %arg4] : memref<?x64xf64>
            "staticlogic.return"() : () -> ()
          }
        }
      }
    }

Is there a benefit to staticlogic.pipeline versus scf.execute_region? To me they are quite similar, so I'm wondering if the IsolatedFromAbove version makes any transformation/analysis easier.

Your design point mentions basic block arguments should be used to infer registers. The above examples don't explicitly use block arguments, but they could. Another approach might be to analyze the live-ins for the blocks and infer registers appropriately. Perhaps there are other approaches. Curious what will make it easiest for this pass. At the moment, I'm leaning towards "block arguments represent registers", which could let frontends move Values that should be registered into block arguments, and any Values passed into a block through SSA dominance could be combinational in the hardware.
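
To illustrate the convention I'm leaning towards, a hypothetical stage fragment (names invented):

    // %scale is captured through SSA dominance -> combinational fan-in.
    %scale = constant 3 : i32
    br ^stage1(%init : i32)
    ^stage1(%acc: i32):  // %acc arrives as a block argument -> registered
      %0 = muli %acc, %scale : i32
      br ^stage2(%0 : i32)
    ^stage2(%res: i32):  // %0 crosses a stage boundary as an argument -> registered
      memref.store %res, %mem[%idx] : memref<64xi32>
      "staticlogic.return"() : () -> ()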

Finally, these examples represent the loops using scf.for, but I think your approach to have all scf.fors turned into scf.whiles as a pre-processing step makes sense. Really, I'm curious about what the body of the innermost loop nest should look like.

Scheduling infrastructure

I've been playing around with the Scheduling infrastructure in CIRCT. I imagine we would want to apply that earlier in the pipeline, before this pass, to determine the schedule at a high level. This seems to align with your design around basic blocks: we could use the Scheduling infrastructure to determine pipeline stages, and create a block for each stage. That's actually how the above examples were generated. Does this align with your thinking?

Pass infrastructure

I've been thinking about how to structure the phased application of patterns in passes like these. Have you had a look at the Linalg CodegenStrategy, which is built on applyStagedPatterns?

It seems like we could build our own HLS "CodegenStrategy", using applyStagedPatterns. To summarize, it uses three stages:

The first stage has a list of RewritePatternSets, and applies each set of patterns in the list in order. This could manage the places you have sequential calls to runPartialPatternLowering.

The second stage applies a single RewritePatternSet, which is responsible for cleanups/canonicalizations after the first stage. This could handle the CleanupFuncOps and runControlFlowSimplifications step. It could also include the Comb canonicalization patterns, if that makes sense.

The final stage applies a callback function to do any last minute processing that might not have fit into the earlier stages. I'm not sure there is a need for this.

Anyway, curious if you think it is worth exploring. I think the CodegenStrategy has been a success, so perhaps following a similar structure in this pass would make sense.

Memories

I have a local patch to add a MemoryOp primitive to Calyx, similar to the RegisterOp. I can share a PR for that.

Cell interface

It's mentioned as a TODO, and something I was planning to work on after the MemoryOp is added. I can implement a CellOpInterface, and make all the CalyxPrimitive operations (RegisterOp/MemoryOp) implement it. I think that would be useful for this as well.

@mortbopet
Contributor Author

mortbopet commented Aug 24, 2021

@mikeurbach Thank you for the elaborate reply, it's great to get a discussion started on this.

Lowering standard operation primitives

The main motivator for doing it this way (inside a component) is to be able to share component instances. This is not so much a concern for this pass in its current state (since a new instance of an operator is instantiated for each operator in the circuit). However, if a significant Calyx infrastructure is built within CIRCT, there would be a need for the ability to share functional units during binding, and to have this expressible in the IR.
Benefiting from comb optimizations would, I imagine, happen after a CalyxToHW pass, where the (now bound) set of functional units has been inlined into some design.

On lowering std to comb, I think both of your proposals are great. A standalone pass would make sense, seeing that a partial lowering of std to comb arithmetic (excluding, for instance, float ops) is already implemented in StandardToHandshake, yet in reality has nothing to do with the handshake dialect in and of itself; there is an almost 1:1 correspondence between comb and a subset of std, so a standalone pass would in my view have merit.

Next, as you mention, there are arithmetic operations which cannot be lowered, but might map to hard logic units on a target architecture. This could probably tie in with the discussion on target triples, where information on hard IP available on the target is tabulated. This, however, I think should be distinct from a StdToComb pass, to separate concerns between lowering "simple" arithmetic to "simple" combinational logic, and the more complex process of binding operators to a limited set of resources on a target device.

Representing loop kernels

I must admit that I am not a big fan of the staticlogic.pipeline approach, mainly because it hijacks blocks to be something that they are not. Most critically, (as I understand it) we remove the option of having control flow within a pipelined loop kernel. A control-flow boundary should not imply a pipeline stage boundary; we might have plenty of combinational control flow (using muxes) within a pipeline stage. If pipeline stages are instead distinct (ordered) regions (like the scf.pipeline_for proposal), registers would be inferred from cross-stage dependencies rather than cross-basic-block dependencies. (Again, this contradicts the approach implemented in this PR, which was mainly chosen for its simplicity.)
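
As a sketch of what I mean (the syntax is entirely hypothetical):

    // Pipeline stages as distinct, ordered regions; control flow *within* a
    // stage stays combinational (muxes), and registers are inferred only for
    // values crossing stage boundaries.
    scf.pipeline_for %i = %c0 to %c64 step %c1 {
      stage {
        %0 = memref.load %arg0[%i] : memref<64xi32>
        %1 = select %cond, %0, %c0_i32 : i32  // combinational control flow
      }
      stage {
        // %1 is used across a stage boundary -> inferred register.
        memref.store %1, %arg1[%i] : memref<64xi32>
      }
    }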

Scheduling infrastructure

Yes, I see scheduling into pipeline stages as coming before SCFToCalyx. In essence, I think the SCFToCalyx pass should be kept as "dumb" as possible and not try to include too many circuit transformations. Such transformations should be kept to a (to-be) HLS area within CIRCT, containing e.g. the transformation of scf.for to scf.pipeline_for.
Lowering to scf.while is simple, and doing this lowering mindful of e.g. an scf.pipeline_for body should not complicate things too much, seeing as the while loop only adds a block before and after the loop body.

Pass infrastructure

These are some great pointers; the implementation in this PR was mainly guided by what I found when working on fixing InstanceOp lowering in StandardToHandshake, but if partial lowering is already a solved problem, then we should use that method. I'll look into it and come back with a more informed comment.

Memories

Cool! Does it follow the memories available in the native Calyx compiler?

Cell interface

Great! There are a lot of places where this interface is needed, and it would clean things up a lot.

/// ... -> }
/// }
/// }
struct NestedSeqPattern : mlir::OpRewritePattern<calyx::SeqOp> {
Contributor

This doesn't seem like it strictly needs to be part of the conversion. Did you consider making this a separate pass?

Member

Just passing through and saw this; the native compiler indeed has a pass for collapsing control.

  signalPassFailure();
  return;
}
if (failed(runPartialPatternLowering<InlineExecuteRegionOpPattern>())) {
Contributor

Seems like this could be made more compact by combining the results from runPartialPatternLowering:
if (succeeded(status)) status = runPartialPatternLowering(...)

Or pass the status by reference into runPartialPatternLowering?

PartiallyLowerFuncToComp(mlir::FuncOp funcOp,
                         PatternRewriter &rewriter) const override {
  funcOp.walk([&](mlir::Operation *op) {
    if (op->getDialect()->getNamespace() == "comb") {
Contributor

Seems like you'd want an interface or a trait here?

@stephenneuendorffer
Contributor

I must admit that I am not a big fan of the staticlogic.pipeline approach, mainly because it hijacks blocks to be something that they are not. Most critically, (as I understand it) we remove the option of having control flow within a pipelined loop kernel. A control-flow boundary should not imply a pipeline stage boundary.

FWIW, the intention of staticlogic.pipeline is that control flow within the loop would already have been reduced to a pure pipeline. This is what allows the use of blocks to represent something that is semantically distinct from control flow.

@mortbopet
Contributor Author

I must admit that I am not a big fan of the staticlogic.pipeline approach, mainly because it hijacks blocks to be something that they are not. Most critically, (as I understand it) we remove the option of having control flow within a pipelined loop kernel. A control-flow boundary should not imply a pipeline stage boundary.

FWIW, the intention of staticlogic.pipeline is that control flow within the loop would already have been reduced to a pure pipeline. This is what allows the use of blocks to represent something that is semantically distinct from control flow.

I see; as in, the design is expected to be fully lowered to comb with control flow as muxes? That makes sense; however, it makes me think about what level to lower to before targeting the Calyx flow.

@mikeurbach
Contributor

Thanks for the point-by-point reply @mortbopet. It sounds like we are more or less on the same page. To me, the most interesting part to discuss is the scf.pipeline_for level of abstraction we should be targeting from the higher levels. Looking forward to the ODM tomorrow.

@cgyurgyik
Member

cgyurgyik commented Aug 25, 2021

A better representation might be to add a new operation which represents a primitive cell instance:
%left, %right, %out = calyx.prim sub : i32, i32, i32

+1 for this!

The main motivator for doing it this way (inside a component) is to be able to share component instances. This is not so much a concern for this pass in its current state (since a new instance of an operator is instantiated for each operator in the circuit). However, if a significant Calyx infrastructure is built within CIRCT, there would be a need for the ability to share functional units during binding, and to have this expressible in the IR.

I'm not sure I'm following. You're saying we can't represent instance sharing of ops if we have the above representation? I'm not quite sure I understand why this is the case. To see if a group is using some primitive, we just need to see if any port of the given primitive is found within the group, right?

Let me propose the approach I've been thinking about, which could be used by this pass as well as others. This is about lowering Standard operations in general during HLS flows. I'm imagining something similar to how the Linalg structured op interface works. Specifically w.r.t. property 5. We could have a set of patterns that lower Standard/Math operations into their corresponding Comb operations (e.g. std.addi -> comb.add). And we could have a mechanism to register patterns that lower known operations to "library calls", which are implemented as external modules in hardware (e.g. std.addf -> Xilinx floating point IP instantiation).

This is cool.

@mortbopet
Contributor Author

mortbopet commented Aug 25, 2021

I'm not sure I'm following. You're saying we can't represent instance sharing of ops if we have the above representation? I'm not quite sure I understand why this is the case. To see if a group is using some primitive, we just need to see if any port of the given primitive is found within the group, right?

I should have been clearer; I was arguing for why I opted for using component-wrapped comb ops in this draft PR instead of what is currently done in the Calyx MLIR tests, i.e., using comb ops inside groups:

// CHECK: calyx.group_done %c2.done, %0 ? : i1
%guard = comb.and %c1_i1, %c2.out : i1
calyx.group_done %c2.done, %guard ? : i1

The representation used in the PR (wrapping comb ops in a component) was used as a placeholder, to be able to have the in/out port SSA values of the comb operator dominate anything in the calyx.wires/calyx.control ops (to allow for sharing).

The %left, %right, %out = calyx.prim sub : i32, i32, i32 style was then proposed as a way to replace these wrapped components.
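
E.g., a sketch of how the proposed style could express sharing (constants and register ports are hypothetical):

    %left, %right, %out = calyx.prim sub : i32, i32, i32
    calyx.wires {
      // Two groups bind the same subtractor instance at different times.
      calyx.group @sub_into_ra {
        calyx.assign %left = %a : i32
        calyx.assign %right = %b : i32
        calyx.assign %ra.in = %out : i32
        calyx.assign %ra.write_en = %true : i1
        calyx.group_done %ra.done : i1
      }
      calyx.group @sub_into_rb {
        calyx.assign %left = %c : i32
        calyx.assign %right = %d : i32
        calyx.assign %rb.in = %out : i32
        calyx.assign %rb.write_en = %true : i1
        calyx.group_done %rb.done : i1
      }
    }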

@jopperm
Contributor

jopperm commented Aug 25, 2021

Scheduling infrastructure

Yes, I see scheduling into pipeline stages as coming before SCFToCalyx.

I'm wondering if that's too early. Do you (@mortbopet, @mikeurbach) envision transformations of the pipeline, once it's formed?

My understanding of Calyx (mostly from their ASPLOS paper, no practical experience yet) is that one of its USPs is being a latency-insensitive target IR that takes the burden of emitting the FSMs off the frontend. Statically scheduling components then is an optimisation to simplify the FSM and omit the handshaking signals.

@rachitnigam, you said earlier that you are thinking about adding a data flow operator to the language -- is that still on the table?

If yes, then maybe SCFToCalyx could just map the SCF constructs to the ones in Calyx, and wrap their bodies in such a DF operator, which is scheduled "late" (i.e., really close to the construction of the controller FSM) by circt::scheduling?

@rachitnigam
Contributor

@mortbopet Thanks for the awesome writeup! I'm super excited to see this work! @mikeurbach @stephenneuendorffer thanks for the comments and context around the other passes in CIRCT. One top-level suggestion: could we set up a meeting with people interested in Calyx + CIRCT so that the Calyx team can update all of you on our current trajectory? @cgyurgyik mentioned that the 25th August CIRCT meeting is going to have some Calyx discussion already. Unfortunately, I'm currently in an inconvenient time zone (India) and cannot make the meeting. If all of you are interested, I'd be happy to coordinate a meeting.

Next, a summary of some new things in Calyx that may address some questions in this discussion:

  1. The invoke operator: We've added a new operator to Calyx called invoke which corresponds to function calls. It is used by all our frontends (Dahlia, TCAM, Systolic, TVM) and should probably be a part of the Calyx CIRCT. This test shows how invoke can be used to run a component.
  2. Combinational Groups: We've come to the realization that groups with a constant done condition (usually a 1'd1) are really just trying to encode a combinational circuit, and that these groups need to be treated fundamentally differently from normal groups. For example, enabling a combinational group from a seq is meaningless because it cannot affect any state in the program. To this end, we want to make combinational groups and components a first-class concept in Calyx (see the sketch after this list). The first step towards this has been a new pass (implemented in this PR) that transforms all combinational groups to "real groups". I think it would be cool/useful for the Calyx CIRCT dialect to start off with this concept instead of waiting for Calyx main to implement it first.
  3. Pipelines in Calyx: We have used the linked discussion thread a little bit to outline pipelines in Calyx. I initially thought that we would need a new first-class operator to implement pipelines in Calyx, but have recently figured out a way to encode pipelines using the existing control language. I think this might be a good place to start our discussion on pipelines in Calyx and how they relate to staticlogic.pipeline and other representations.
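
For instance, a first-class comb group in the dialect might look something like this (syntax speculative):

    calyx.wires {
      // A comb group has no done condition; it just names combinational logic.
      calyx.comb_group @lt_cond {
        calyx.assign %lt.left = %i.out : i32
        calyx.assign %lt.right = %c64 : i32
      }
    }
    calyx.control {
      // The while op activates the comb group to compute its condition.
      calyx.while %lt.out with @lt_cond {
        calyx.seq {
          calyx.enable @body
        }
      }
    }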

Since I'll be asleep when the CIRCT meeting happens, I'm preemptively putting up a when2meet link with my availability. I'll also add the link to the CIRCT meeting notes for the 25th. If we end up not needing the meeting, feel free to ignore.

CC @sampsyo @EclecticGriffin @sgpthomas @cgyurgyik

@mikeurbach
Contributor

Following the discussion today, I'll post on Discourse about how we might evolve the staticlogic.pipeline into the most suitable representation for this pass to consume.

@rachitnigam
Contributor

rachitnigam commented Aug 26, 2021

According to when2meet, ~~9.30am-10.30am EST~~ 10am-11am EST seems to work best. Made a discourse post about the meeting: https://llvm.discourse.group/t/calyx-circt-meeting/4174

@mortbopet
Contributor Author

@rachitnigam If/when you have time, would you like to sanity check/play around with this pass? There will have to be some changes when invoke/comb groups are supported in the Calyx dialect, but I think we're closing in on a good initial version for the pass.

The pass is currently tested with the tests in test/Conversion/SCFToCalyx, so those could be your entry points.

@rachitnigam
Contributor

Happy to try it out in the next few days! However, things are likely to break again once calyxir/calyx#635 lands and makes comb-group syntax required, i.e., if <port> with <comb-group> and while <port> with <comb-group>. I'm going to make an issue in the CIRCT repo to track the required implementation changes.

Collaborator

@lattner left a comment

I didn't review the pass itself in detail, but the integration into CIRCT looks good. Thank you for putting this in its own library!

@rachitnigam
Contributor

Ran into #1769 & #1770 when trying to push the code generated by this PR into the native Calyx compiler.

@mortbopet Looks like we've finished most of the tasks in #1690, and the next reasonable step is getting the code generated in this PR working with the native Calyx compiler. I think it'd be helpful if you could start running the native Calyx compiler on the generated code and tell us where things fail. Happy to help you set up the compiler.

@mortbopet
Contributor Author

@mortbopet Looks like we've finished most of the tasks in #1690, and the next reasonable step is getting the code generated in this PR working with the native Calyx compiler. I think it'd be helpful if you could start running the native Calyx compiler on the generated code and tell us where things fail. Happy to help you set up the compiler.

Will do! I already have the native compiler set up locally, but I'll ping you if anything comes up.

// CHECK-NEXT: }
// CHECK-NEXT: calyx.control {
// CHECK-NEXT: calyx.seq {
// CHECK-NEXT: calyx.enable @assign_while_0_init {compiledGroups = []}
Contributor

What is the compiledGroups attribute used for here?

Contributor Author

@mortbopet commented Sep 16, 2021

This would indicate that the Calyx printer for calyx.enable prints the compiledGroups attribute by default; it is picked up by attr-dict in the assembly format:

let arguments = (ins
  FlatSymbolRefAttr:$groupName,
  OptionalAttr<ArrayAttr>:$compiledGroups
);
let assemblyFormat = "$groupName attr-dict";
let verifier = "return ::verify$cppClass(*this);";
}

@rachitnigam
Copy link
Contributor

When lowering the following SCF program, the generated Calyx program has unused groups:

func @main(%a0 : i32, %a1 : i32) -> i32 {
  %b = cmpi uge, %a0, %a1 : i32
  cond_br %b, ^bb1(%a0: i32), ^bb2(%a1: i32)
^bb1(%aa: i32):
  %b11 = subi %aa, %a1 : i32
  %b12 = subi %aa, %a0 : i32
  br ^bb3(%b11, %b12: i32, i32)
^bb2(%ab: i32):
  %b21 = addi %ab, %a1 : i32
  %b22 = addi %ab, %a0 : i32
  br ^bb3(%b21, %b22: i32, i32)
^bb3(%r: i32, %r2: i32):
  %r3 = addi %r, %r2 : i32
  return %r3 : i32
}

Such groups can be eliminated using Calyx's -p dead-group-removal pass, but they usually indicate that the frontend forgot to use a group. If SCFToCalyx is expected to generate some dead groups, then this is not a problem.

@mortbopet
Contributor Author

When lowering the following SCF program, the generated Calyx program has unused groups:

This should be due to #1802. The output of SCFToCalyx has all groups referenced in the schedule (see checks in convert_controlflow.mlir).

@mikeurbach
Contributor

mikeurbach commented Sep 16, 2021

Let me try to summarize what this pass accepts as input:

  • Standard operations that have primitive counterparts in hardware
  • MemRef operations
  • CDFG with basic blocks and branches (i.e., unstructured control flow)
  • Structured control flow with execute region and while

I think we agree this pass will continue to grow, and that it should be landed so we can do more incremental work. But I want to make sure I'm on the same page w.r.t. the goal.

As the new pipeline representation emerges, and the scheduling infrastructure is applied, my hope is this pass can be greatly reduced in scope.

For one, I'm still doubting the role of CDFG at this level. It seems like the frontends we are coming from are able to generate Affine/SCF control flow, which we can schedule and represent as a pipeline without ever getting to the level of basic blocks. Is there a client that needs CDFG support? If there is, I'd still expect such programs to be scheduled and converted to a staticlogic pipeline before this. If we don't want to statically schedule a CDFG, then we already have the Handshake path.

Down the road, I'm imagining this pass need only accept the following:

  • Standard operations and MemRef operations, which can be lowered according to a component library
  • StaticLogic pipeline loops that have been scheduled

Does this target line up? I want to make sure if I do all the work laid out here, we can have a more simplified lowering to Calyx.

@mortbopet
Contributor Author

@mikeurbach With regard to the inputs: correct, those are what the pass currently accepts.

In terms of how this pass fits into the big picture long term, I hope to start looking, sometime within the next couple of months, at interactions between the Handshake and Calyx paths (mixing dynamically and statically scheduled HLS). I'm unsure about whether we want to completely eliminate the option of passing CDFGs to SCFToCalyx, but I do understand the need for separation of concerns. I'm thinking that there might be programs with setup code before e.g. a pipelined loop, which looks and acts like an FSM, and the sensible thing would be to just pass it directly to Calyx (through SCFToCalyx), instead of trying to make a pipeline out of something which shouldn't be one.

@jopperm
Contributor

jopperm commented Sep 17, 2021

I'm thinking that there might be programs with setup code before e.g. a pipelined loop, which looks and acts like an FSM, and the sensible thing would be to just pass it directly to Calyx (through SCFToCalyx), instead of trying to make a pipeline out of something which shouldn't be one.

I agree, this is a valid concern. Another use case: loops with unbalanced control paths (e.g., simple logic on one side of a branch, a multi-cycle divider on the other) usually also benefit from an FSM-of-basic-blocks execution model, because a pipeline's latency is equal to the latency of its longest control path. The datapaths corresponding to each basic block could be statically scheduled before going into this pass, similar to the new staticlogic.pipeline.while op. (Though I'm not sure whether that would gain much over just passing the code to Calyx, other than consistency.)
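
E.g., a sketch with std ops:

    // A pipelined version would run at the latency of the slow path; an
    // FSM-of-basic-blocks version only pays for the divider when taken.
    cond_br %b, ^fast, ^slow
    ^fast:
      %0 = addi %a0, %a1 : i32          // cheap, single-cycle
      br ^join(%0 : i32)
    ^slow:
      %1 = divi_signed %a0, %a1 : i32   // multi-cycle divider
      br ^join(%1 : i32)
    ^join(%r: i32):
      return %r : i32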

@rachitnigam
Contributor

I think there is also a simple, understated benefit to keeping a non-pipelined flow working, even in the presence of a pipelined flow: there is always a source of truth w.r.t. program execution.

Deciding when and what to pipeline feels like an optimization, and there should be another way to quickly get unoptimized but functional designs working. I add "quickly" because traditional SDC + SAT scheduling can get pretty slow with complex programs.

@mikeurbach
Contributor

Thanks for the ideas w.r.t. CDFGs and this pass. I can see how they have a place, but I still wonder if that belongs here in the long term, as they aren't really "SCF". Perhaps the pass just needs to be split up into ArithmeticToCalyx, CDFGToCalyx, SCFToCalyx, etc.

I'm also wondering about the interactions with Handshake, because when I think about a CDFG that doesn't make sense as a pipeline, it seems Handshake can fill the need. So far we've been talking about Calyx because it can represent a static schedule nicely, but taking advantage of its ability to do latency insensitive things would be interesting too. Perhaps we need to add HandshakeToCalyx to the mix.

there should be another way to quickly get unoptimized but functional designs working

This is already supported in the Handshake workflow, which goes all the way to Verilog that simulates. As far as I'm concerned, we are focused on the optimized part now. What we're doing here had better be an improvement on the dynamically scheduled dataflow (for the programs we can analyze and statically schedule).

Anyway, I'm rambling. My real concern is this pass has become quite monolithic. I think this is a reasonable place to start, and we can break it up as we go. This is what happened with the StandardToLLVM pass upstream, for example.

@rachitnigam
Contributor

This is already supported in the Handshake workflow, which goes all the way to Verilog that simulates.

Ah, got it. I guess in that case an interesting but tangential question would be whether it makes sense to lower Handshake to Calyx, since there are still some control/structural optimizations Calyx may be able to do that Handshake can't.

Anyway, I'm rambling. My real concern is this pass has become quite monolithic

Agreed on both points. Not having the pass be monolithic in the long term would be great, but hats off to @mortbopet for getting it working!

@stephenneuendorffer
Contributor

I'm also wondering about the interactions with Handshake, because when I think about a CDFG that doesn't make sense as a pipeline, it seems Handshake can fill the need. So far we've been talking about Calyx because it can represent a static schedule nicely, but taking advantage of its ability to do latency insensitive things would be interesting too. Perhaps we need to add HandshakeToCalyx to the mix.

I see a hierarchy of solutions here:

  • Statically scheduled and pipelined
  • Statically scheduled and not pipelined
  • Dynamically scheduled and pipelined
  • Dynamically scheduled and not pipelined

Generally, statically scheduled things (whether pipelined or not) can get lowered to an FSMD/Calyx style of representation, while the dynamically scheduled options can get lowered to Handshake. An important question is how we combine statically and dynamically scheduled regions, i.e., a Calyx module inside a Handshake region.

Member

@cgyurgyik left a comment

I haven't looked through all of the code yet, and this is mostly nits and style/code changes.

This is really cool!! I think it is some great work for the first frontend to the Calyx dialect. However, the code size is making it difficult to review :-)

There has to be some way this PR can be split up, e.g.

  1. boilerplate
  2. lowering a stupid simple function with no basic blocks.
  3. lowering a function that requires registers.
  4. MemoryOp uses (and optionally splitting off between single / multiple read).
    ... // index casting, chaining combinationals, multiple returns, ...?

/// inside other utility functions.
template <
    typename F,
    std::enable_if_t<!std::is_void<typename std::result_of<F()>::type>::value,
Member

Nit: Not entirely necessary, but it may be useful to describe why this std::enable_if_t is being used, e.g. // Expecting F to be a non-void function. This isn't easy to read for C++ beginners (can't wait for C++20 concepts support).

Member

I also question whether this is better than the more canonical approach of using IRRewriter::InsertionGuard.

Contributor Author

+1 on using IRRewriter::InsertionGuard; its intention seems to be exactly what I intended with persistInsertionPoint.

Comment on lines 493 to 494
auto assignRegInOp =
    rewriter.create<calyx::AssignOp>(loc, reg.in(), v, Value());
Member

We should use a builder that doesn't take in a value for the guard, if that's what you're trying to do here. Feel free to open an issue for this.

Contributor Author

This is already an issue: #1611

@mortbopet
Contributor Author

Thank you for the reviews @cgyurgyik; what I'll do is close this PR and come up with a PR series in line with what you suggested.

@mortbopet
Contributor Author

Closing this in favor of a PR series starting from #1812

@mortbopet closed this Sep 20, 2021
@mortbopet deleted the calyx/scftocalyx branch Sep 24, 2021