Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow specifying constraints on encoding fields/operands #96

Open
PhilippvK opened this issue May 22, 2023 · 8 comments
Open

Allow specifying constraints on encoding fields/operands #96

PhilippvK opened this issue May 22, 2023 · 8 comments
Labels
enhancement New feature or request
Milestone

Comments

@PhilippvK
Copy link

Currently, constraints on operands/fields in the encoding are evaluated in the behavior block by explicitly raising an Illegal Instruction. It would be great if we could find a way to define those conditions somewhere else.

Examples:

  • RV32E: Hardwire the MSB of register operands to zero. (See #?)
  • Zpfsoperand: 64bit register pair operands made up of an even and odd register (rd & rd+1 where rd%2==0 and (optionally )non-overlapping constraints between source and dest registers)
  • Excluding reserved values in encoding fields
  • RVP: Implement SUNPKD810,SUNPKD820,SUNPKD830,SUNPKD831,SUNPKD832 in a single instruction block (see Add Mnemonic field to CoreDSL Syntax #80)
  • RVV: implement Vector Load/Stores in a more compact fashion (See Add Mnemonic field to CoreDSL Syntax #80, Allowing us to cut down >150 instruction blocks to less than 20)
@eyck eyck added the enhancement New feature or request label May 22, 2023
@eyck eyck added this to the CoreDSL 2.1 milestone May 22, 2023
@eyck
Copy link
Contributor

eyck commented May 22, 2023

Any proposals on some syntax? One option I can think of would be to add constraints to an instruction definition like:

        ADD {
            encoding: 7'b0000000 :: rs2[4:0] :: rs1[4:0] :: 3'b000 :: rd[4:0] :: 7'b0110011;
            constraints: rs2<RFS,rs1<RFS,rd<RFS
            assembly:"{name(rd)}, {name(rs1)}, {name(rs2)}";
            behavior: if (rd != 0) X[rd] = X[rs1] + X[rs2];
        }

In relation to #94 this would be more universal but would create more effort on the generation of e.g. decoders from the grammar.

@PhilippvK
Copy link
Author

@eyck I had something like this in mind. However here are some thoughts about the approach. We still would need to add this line for every single instruction. It would be great to specify this only once for a complete InstructionSet to make use of inheritance.

This on the other hand would add more complexity and we would need to define how the constraints should be merged (combine or overwrite). Since we do not have something like super() this might not be a very good idea. BTW we have a similar situation with the [enable=…] Attribute right now. (Undefined behavior if specified at different levels in the hierarchy. Please correct me if I am wrong)

Should we limit the constraints to encoding fields only or also allow static architectural state variables? This would add an alternative way to deal with different/optional implementations for XLEN=32/64 without the need to define two separate Instruction sets:

ADD64 {
    constraints: XLEN==32;
    …
}

ADD32 {
    constraints: XLEN==64;
    ….
}

FOOBAR {
    constraints: XLEN==32;
    …
}

FOOBAR {
    constraints: XLEN==64;
    …
}

@wysiwyng
Copy link

this could possibly be handled quite elegantly with #69, the operands section could also support constraints.

@AtomCrafty
Copy link
Contributor

AtomCrafty commented May 23, 2023

@wysiwyng Could you elaborate on that? The operands section from #69 is a list of parameter declarations. I don't immediately see how constraints could work there syntactically.

@PhilippvK The main issue I see with pulling common constraints out of the individual instructions is that the operands in question would not be declared in the same lexical scope. That feels very unintuitive to me and also poses a bunch of non-trivial implementation issues. An identifier in this constraint block would suddenly no longer be referencing a single declaration, but refer to all operands with the same name declared in multiple instructions, which wouldn't even necessarily have the same type.

I'm not a user of CoreDSL, so I might be misjudging things here, but it feels to me like most of these issues could easily be solved by pulling out the common checks to a function, without the need for any new language features.

image

And as a final note: In case you decide on the separate section syntax, I'd suggest changing the keyword to requires, just for aesthetical reasons, as it has the same number of characters as encoding, assembly, behavior (and operands).

@eyck
Copy link
Contributor

eyck commented May 23, 2023

@AtomCrafty from a language perspective I agree that the name based mapping comes with consistency issues.

But your function proposal comes also with overhead since you need at least 3 versions of the rfs_check function. Moreover we mix in here exception behavior by using the raise() function to indicate that the instruction is illegal. For these things constraints are much more appropriate. I had this morning some discussions with @wysiwyng and we came up with an idea as follows:

InstructionSet RV32I extends RISCVBase {
    architectural_state {
        XLEN = 32;
    }
    default_operands {
	operand unsigned<5> rs1 {{rs1<RFS}};
	operand unsigned<5> rs2 {{rs2<RFS}};
	operand unsigned<5> rd {{rd<RFS}};
    }
    instructions {
        LUI { // an instruction having an additional operand
	    operands: unsigned<XLEN> imm;
            encoding: imm[31:12] :: rd[4:0] :: 7'b0110111;
            assembly: "{name(rd)}, {imm:#05x}";
            behavior: if(rd!=0) X[rd] = imm;
        }
        SW {
	    operands: signed<XLEN> imm;
            encoding: imm[11:5] :: rs2[4:0] :: rs1[4:0] :: 3'b010 :: imm[4:0] :: 7'b0100011;
            assembly:"{name(rs2)}, {imm}({name(rs1)})";
            behavior: {
                unsigned<XLEN> store_address = X[rs1] + imm;
                MEM[store_address] = X[rs2];
            }
        }
        ADD { // an instruction only using default operands 
            encoding: 7'b0000000 :: rs2[4:0] :: rs1[4:0] :: 3'b000 :: rd[4:0] :: 7'b0110011;
            assembly:"{name(rd)}, {name(rs1)}, {name(rs2)}";
            behavior: if (rd != 0) X[rd] = X[rs1] + X[rs2];
        }
    }
}

@AtomCrafty
Copy link
Contributor

@eyck I like your solution.
It sidesteps the scoping issue by pulling out the declarations themselves. And it also seems to address the other issues I noted in #69. I believe the syntax can be improved, but conceptually this should work out nicely.

The way you wrote it above, this would introduce three new keywords and a completely new syntax construct with the double curly braces. I think we can largely re-use existing syntax the users are already familiar with.

  • Instead of default_operands I would just call the section operands and re-use the same keyword as within the instruction.
  • The operand keyword is redundant since all declarations in this section are operands, so it can be removed without replacement.
  • And finally, the brace syntax doesn't make much sense to me. Where would we for example put constraints that use multiple operands in their expression?
    I would just pull out the constraint expressions to the top level. This is syntactically consistent with how both declarations and assignments are allowed in the architectural_state section.
operands {
    unsigned<5> rs1;
    unsigned<5> rs2;
    unsigned<5> rd;
    
    rs1 < RFS && rs2 < RFS && rd < RFS;
}

The implementor can of course still choose to format it in a way that keeps declarations and related constraints close together.

operands {
    unsigned<5> rs1;    rs1 < RFS;
    unsigned<5> rs2;    rs2 < RFS;
    unsigned<5> rd;     rd < RFS;
}

@PhilippvK
Copy link
Author

@AtomCrafty

The main issue I see with pulling common constraints out of the individual instructions is that the operands in question would not be declared in the same lexical scope. That feels very unintuitive to me and also poses a bunch of non-trivial implementation issues.

I totally agree that this approach is probably not a good idea.

I like @eyck’s proposal (plus your comments) a lot as I am also a fan of #69 and I think this would be a great addition to the CoreDSL syntax. But we would need to define what happens if such Operands+Constraints are defined on different hierarchy levels. Would the complete operands {…} block be replaced or merged (conjunction vs. disjunction of constraints)?

@AtomCrafty
Copy link
Contributor

@PhilippvK From a language design point of view my approach would be to conceptually concatenate the blocks, so that all constraints defined on any level in the hierarchy must hold. As an implementor of the more abstract instruction set I should be able to rely on the constraints I defined myself. An instruction set B deriving from A should not be able to break assumptions made in A (a similar notion to the Liskov substitution principle).

Then again, I'm not familiar with the subject matter, so please let me know if there are valid use cases for breaking these assumptions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

4 participants