Add Mnemonic field to CoreDSL Syntax #80

PhilippvK · 2023-03-02T15:41:02Z

Motivation

In CoreDSL 2, instruction names cannot include dots (.), yet many extensions use mnemonics containing a dot (see the first and second examples). While this might be negligible for HLS and ISS, it affects further integrations, such as disassembler generation.

Therefore I propose to add an optional mnemonic: field to the CoreDSL 2 syntax which can be used to provide the actual instruction name if the name used in CoreDSL does not match the real one.

In addition, I often run into situations where I would like to combine multiple instructions (with minor differences in encoding/behavior) into a single one (see the second and third examples below), which of course ends up with an invalid name for those instructions.

To deal with this sort of problem, I would like to be able to use formatting options similar to those already allowed for the assembly: field. For dealing with more complex types of instructions (having non-trivial mappings between encoding/operands and names), we would need to come up with a more powerful variant of this feature (see the third example).

Examples

Mnemonics with a dot: Custom Multiply-Accumulate (Pulp/CoreV)

Spec: https://github.com/openhwgroup/cv32e40p/blob/master/docs/source/instruction_set_extensions.rst#mac-encoding

Usage: cv.mac rd, rs1, rs2

Before:

CV_MAC {
  encoding: 7'b1001000 :: rs2[4:0] :: rs1[4:0] :: 3'b011 :: rd[4:0] :: 7'b0101011;
  assembly:"{name(rd)}, {name(rs1)}, {name(rs2)}";
  behavior: {
    signed<65> result = (signed)X[rs1] * (signed)X[rs2] + (signed)X[rd];
    if(rd != 0) X[rd] = result[31:0];
  }
}

Problems:

wrong mnemonic used in (dis)assembly (underscore instead of dot)

After:

CV_MAC {
  mnemonic: "cv.mac";
  encoding: 7'b1001000 :: rs2[4:0] :: rs1[4:0] :: 3'b011 :: rd[4:0] :: 7'b0101011;
  assembly:"{name(rd)}, {name(rs1)}, {name(rs2)}";
  behavior: {
    signed<65> result = (signed)X[rs1] * (signed)X[rs2] + (signed)X[rd];
    if(rd != 0) X[rd] = result[31:0];
  }
}

Potential problems:

None as long as the mnemonic field is optional
Upper case for instruction names vs. lower case for mnemonics?

Trivial mnemonic formatting: Vector Strided Segment Loads (RVV)

See: https://github.com/openhwgroup/cv32e40p/blob/master/docs/source/instruction_set_extensions.rst#mac-encoding

Usage: vlsseg<nfields>e<eew>.v vd, (rs1), rs2, vm

Details:

nfields=1...8
eew=8,16,32,64
Results in 32 Instructions
Define only 4 (or 1) times, with nfields (or nfields+eew) taken from the encoding?
Many similar examples in RVV!

For now let's only consider nfields. (eew has non-trivial encoding)

Before (combined):

VLSSEGE64_V {
    encoding: nf[2:0] :: 1'b0 :: 2'b10 :: vm[0:0] :: rs2[4:0] :: rs1[4:0] :: 3'b111 :: vd[4:0] :: 7'b0000111;
    assembly:"{name(vd)}, {name(rs1)}, {name(vm)}";
    behavior: {
        unsigned<4> nfields = nf + 1;
        ... // call to external softvector lib
    }
}

Problems:

Wrong mnemonic used in (dis)assembly
Can not distinguish between the 8 variants

Before (separate):

VLSSEG1E64_V {
    encoding: 3'b000 :: 1'b0 :: 2'b10 :: vm[0:0] :: rs2[4:0] :: rs1[4:0] :: 3'b111 :: vd[4:0] :: 7'b0000111;
    assembly:"{name(vd)}, {name(rs1)}, {name(vm)}";
    behavior: {
        unsigned<4> nfields = 1;  // nf + 1
        ... // call to external softvector lib
    }
}
VLSSEG2E64_V { ... }
VLSSEG3E64_V { ... }
VLSSEG4E64_V { ... }
VLSSEG5E64_V { ... }
VLSSEG6E64_V { ... }
VLSSEG7E64_V { ... }
VLSSEG8E64_V { ... }

Problems:

much redundant code
wrong mnemonic (underscore instead of dot)

After (combined):

VLSSEGE64_V {
    mnemonic: "vlsseg{nf+1}e64.v";
    encoding: nf[2:0] :: 1'b0 :: 2'b10 :: vm[0:0] :: rs2[4:0] :: rs1[4:0] :: 3'b111 :: vd[4:0] :: 7'b0000111;
    assembly:"{name(vd)}, {name(rs1)}, {name(vm)}";
    behavior: {
        unsigned<4> nfields = nf + 1;
        ... // call to external softvector lib
    }
}

Potential problems:

allow access to operands during formatting?
allow {imm:#08x} style formatting, similar to the assembly definition?
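
To make the first question concrete, here is a minimal Python sketch of how a disassembler backend might expand a template such as "vlsseg{nf+1}e64.v" from decoded encoding fields. It is not taken from any existing CoreDSL tool: the names expand_mnemonic and extract_nf are hypothetical, only integer literals, field names and the operators +, - and * are accepted, and format specifiers such as {imm:#08x} are deliberately not handled.

```python
import ast
import operator
import re

# Binary operators permitted inside a placeholder (illustrative subset).
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}


def _eval(node, fields):
    """Evaluate a tiny arithmetic expression over decoded encoding fields."""
    if isinstance(node, ast.Constant) and isinstance(node.value, int):
        return node.value
    if isinstance(node, ast.Name):
        return fields[node.id]
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        left = _eval(node.left, fields)
        right = _eval(node.right, fields)
        return _OPS[type(node.op)](left, right)
    raise ValueError("unsupported expression in mnemonic template")


def expand_mnemonic(template, fields):
    """Replace each {expr} placeholder with its value for this encoding."""
    def substitute(match):
        tree = ast.parse(match.group(1), mode="eval")
        return str(_eval(tree.body, fields))
    return re.sub(r"\{([^{}]+)\}", substitute, template)


def extract_nf(word):
    """nf occupies bits 31..29 of the 32-bit word in the encoding above."""
    return (word >> 29) & 0b111
```

With this sketch, expand_mnemonic("vlsseg{nf+1}e64.v", {"nf": extract_nf(word)}) yields one of the eight real mnemonics from a single combined definition, while a template without placeholders (such as "cv.mac") passes through unchanged.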

Non-trivial mnemonic formatting: Byte Unpacking (RVP)

Spec: https://github.com/riscv/riscv-p-spec/blob/master/P-ext-proposal.adoc#sunpkd810-sunpkd820-sunpkd830-sunpkd831-sunpkd832

Usage:

SUNPKD810 rd, rs1 (Signed Unpacking Bytes 1 & 0)
SUNPKD820 rd, rs1 (Signed Unpacking Bytes 2 & 0)
SUNPKD830 rd, rs1 (Signed Unpacking Bytes 3 & 0)
SUNPKD831 rd, rs1 (Signed Unpacking Bytes 3 & 1)
SUNPKD832 rd, rs1 (Signed Unpacking Bytes 3 & 2)

Details:

There are many further instructions in RVP
- e.g. with the suffixes BB (Bottom/Bottom), TT (Top/Top), BT (Bottom/Top), TB (Top/Bottom)

Before (combined):

SUNPKD8 { // or SUNPKD8XY
    encoding: 7'b1010110 :: code[4:0] :: rs1[4:0] :: 3'b000 :: rd[4:0] :: 7'b1110111;
    assembly:"{name(rs1)}, {name(rd)}";
    behavior: {
        if(rd != 0) {
            unsigned<32> rs1_val = X[rs1];
            // Declared before the branches so both stay in scope for the final assignment
            signed<8> rs1_val_hi;
            signed<8> rs1_val_lo;
            if(code == 5'b01000) {  // SUNPKD810
                rs1_val_hi = rs1_val[15:8];
                rs1_val_lo = rs1_val[7:0];
            } else if (code == 5'b01001) {  // SUNPKD820
                rs1_val_hi = rs1_val[23:16];
                rs1_val_lo = rs1_val[7:0];
            } else if (code == 5'b01010) {  // SUNPKD830
                rs1_val_hi = rs1_val[31:24];
                rs1_val_lo = rs1_val[7:0];
            } else if (code == 5'b01011) {  // SUNPKD831
                rs1_val_hi = rs1_val[31:24];
                rs1_val_lo = rs1_val[15:8];
            } else if (code == 5'b10011) {  // SUNPKD832
                rs1_val_hi = rs1_val[31:24];
                rs1_val_lo = rs1_val[23:16];
            } else {
                raise(0, 2);  // Invalid instruction
            }
            X[rd] = (signed<16>)rs1_val_hi :: (unsigned<16>)(signed<16>)rs1_val_lo;
        }
    }
}

Problems:

Wrong mnemonic used in (dis)assembly
can not distinguish between the 5 variants

Before (separate):

SUNPKD810 {
    encoding: 7'b1010110 :: 5'b01000 :: rs1[4:0] :: 3'b000 :: rd[4:0] :: 7'b1110111;
    assembly:"{name(rs1)}, {name(rd)}";
    behavior: {
        if(rd != 0) {
            unsigned<32> rs1_val = X[rs1];
            signed<8> rs1_val_hi = rs1_val[15:8];
            signed<8> rs1_val_lo = rs1_val[7:0];
            X[rd] = (signed<16>)rs1_val_hi :: (unsigned<16>)(signed<16>)rs1_val_lo;
        }
    }
}
SUNPKD820 { ... }
SUNPKD830 { ... }
SUNPKD831 { ... }
SUNPKD832 { ... }

Problems:

Too much redundant code

Before (separate + helper function):

unsigned<32> sunpkd8_helper(unsigned<32> data, unsigned<5> code) {
    // Declared before the branches so both stay in scope for the return
    signed<8> data_hi;
    signed<8> data_lo;
    if(code == 5'b01000) {  // SUNPKD810
        data_hi = data[15:8];
        data_lo = data[7:0];
    } else if (code == 5'b01001) {  // SUNPKD820
        data_hi = data[23:16];
        data_lo = data[7:0];
    } else if (code == 5'b01010) {  // SUNPKD830
        data_hi = data[31:24];
        data_lo = data[7:0];
    } else if (code == 5'b01011) {  // SUNPKD831
        data_hi = data[31:24];
        data_lo = data[15:8];
    } else if (code == 5'b10011) {  // SUNPKD832
        data_hi = data[31:24];
        data_lo = data[23:16];
    } else {
        raise(0, 2);  // Invalid instruction
    }
    return (signed<16>)data_hi :: (unsigned<16>)(signed<16>)data_lo;
}

SUNPKD810 {
    encoding: 7'b1010110 :: 5'b01000 :: rs1[4:0] :: 3'b000 :: rd[4:0] :: 7'b1110111;
    assembly:"{name(rs1)}, {name(rd)}";
    behavior: {
        if(rd != 0) {
            X[rd] = sunpkd8_helper(X[rs1], 5'b01000);
        }
    }
}
SUNPKD820 { ... }
SUNPKD830 { ... }
SUNPKD831 { ... }
SUNPKD832 { ... }

Problems:

Less intuitive
Encoding etc. still redundant

After (combined only):

string decode_xy(unsigned<5> code) {
    if(code == 5'b01000) {  // SUNPKD810
        return "10";
    } else if (code == 5'b01001) {  // SUNPKD820
        return "20";
    } else if (code == 5'b01010) {  // SUNPKD830
        return "30";
    } else if (code == 5'b01011) {  // SUNPKD831
        return "31";
    } else if (code == 5'b10011) {  // SUNPKD832
        return "32";
    } else {
        return "";
    }
}

SUNPKD8 { // or SUNPKD8XY
    mnemonic: "sunpkd8{decode_xy(code)}";
    encoding: 7'b1010110 :: code[4:0] :: rs1[4:0] :: 3'b000 :: rd[4:0] :: 7'b1110111;
    assembly:"{name(rs1)}, {name(rd)}";
    behavior: {
        if(rd != 0) {
            unsigned<32> rs1_val = X[rs1];
            // Declared before the branches so both stay in scope for the final assignment
            signed<8> rs1_val_hi;
            signed<8> rs1_val_lo;
            if(code == 5'b01000) {  // SUNPKD810
                rs1_val_hi = rs1_val[15:8];
                rs1_val_lo = rs1_val[7:0];
            } else if (code == 5'b01001) {  // SUNPKD820
                rs1_val_hi = rs1_val[23:16];
                rs1_val_lo = rs1_val[7:0];
            } else if (code == 5'b01010) {  // SUNPKD830
                rs1_val_hi = rs1_val[31:24];
                rs1_val_lo = rs1_val[7:0];
            } else if (code == 5'b01011) {  // SUNPKD831
                rs1_val_hi = rs1_val[31:24];
                rs1_val_lo = rs1_val[15:8];
            } else if (code == 5'b10011) {  // SUNPKD832
                rs1_val_hi = rs1_val[31:24];
                rs1_val_lo = rs1_val[23:16];
            } else {
                raise(0, 2);  // Invalid instruction
            }
            X[rd] = (signed<16>)rs1_val_hi :: (unsigned<16>)(signed<16>)rs1_val_lo;
        }
    }
}

Potential problems:

See previous example
needs string type?
allow calling helper functions during formatting?
- BTW: {name(...)} is also allowed and backend-implementation specific which is a bit unintuitive. With the proposed change, this could be implemented as (external) function instead.
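
As a rough illustration of the helper-function question, the following Python sketch shows one way a backend could resolve "sunpkd8{decode_xy(code)}" during disassembly. The registry HELPERS and the function expand_helper_calls are hypothetical names introduced only for this example, and the sketch supports nothing beyond a single helper call with one field argument per placeholder. The suffix table mirrors the decode_xy function proposed above.

```python
import re

# Suffix table taken from the decode_xy() helper proposed above.
XY_SUFFIX = {
    0b01000: "10",  # SUNPKD810
    0b01001: "20",  # SUNPKD820
    0b01010: "30",  # SUNPKD830
    0b01011: "31",  # SUNPKD831
    0b10011: "32",  # SUNPKD832
}


def decode_xy(code):
    """Return the mnemonic suffix for a 5-bit code, or "" if it is unknown."""
    return XY_SUFFIX.get(code, "")


# Hypothetical registry of helpers a backend exposes to mnemonic templates.
HELPERS = {"decode_xy": decode_xy}

# Matches placeholders of the form {helper(field)}.
_CALL = re.compile(r"\{(\w+)\((\w+)\)\}")


def expand_helper_calls(template, fields):
    """Expand {helper(field)} placeholders using decoded encoding fields."""
    def substitute(match):
        helper, field = match.group(1), match.group(2)
        return HELPERS[helper](fields[field])
    return _CALL.sub(substitute, template)
```

A registry like this would also be a natural home for name(...), which is the point made in the last item: the backend-specific register naming becomes just another (external) function rather than a special case of the format string.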

PhilippvK · 2023-03-02T15:42:32Z

CC @eyck @jopperm @wysiwyng @DanMueGri

jopperm · 2023-03-06T03:25:11Z

Neat! I agree that this would be important to support more real-world extensions. @PhilippvK can you present this proposal in the WG call on 13/3?

PhilippvK · 2023-03-06T08:03:25Z

I can present it next Monday!

jopperm · 2023-03-20T15:17:41Z

I talked with @AtomCrafty about allowing dots in the instruction name. Having two kinds of identifiers is actually not possible because the lexer cannot distinguish them when tokenizing the input. We could

a) allow strings for the name

instructions {
  ADDI {...}
  "cv.mac" {...}
}

or

b) parse ID-with-dots as expressions (which would require some processing in the downstream tools, though the effort should be manageable, as instruction names are not referenced anywhere else).

Thoughts?

PhilippvK · 2023-04-06T09:45:26Z

I figured out that the XCoreVMem ISA extension has some more good examples for the Mnemonic use-case: https://cv32e40p.readthedocs.io/en/latest/instruction_set_extensions.html#load-operations

They have register-register load/store instructions with and without post-increment which share the same mnemonic but use different assembly arguments for differentiation (besides the encoding, of course):

cv.lb rD, rs2(rs1) vs. cv.lb rD, rs2(rs1!)

Using the mnemonic as CoreDSL identifier leads to two instructions with the same name which can lead to issues. In my opinion these identifiers should be unique, even if only the encoding is used on the backend side. Hence I would prefer to use the following syntax:

CV_LB_rr {
    mnemonic: "cv.lb";
    assembly: "{name(rd)}, {name(rs2)}({name(rs1)})";
    ....
}
CV_LB_rr_inc {
    mnemonic: "cv.lb";
    assembly: "{name(rd)}, {name(rs2)}({name(rs1)}!)";
    ....
}

Would you now agree that the mnemonic: field would be a better solution than just allowing . in instruction identifiers?
@jopperm @wysiwyng @eyck

eyck · 2023-04-06T09:59:15Z

There is no objection with respect to the possibility of specifying a mnemonic. But I'm a bit hesitant to add keywords to the language itself.
Why not extend the assembly field to take either a string (as of now) or a list of strings enclosed in braces? The example would look like:

CV_LB_rr {
    assembly: {"cv.lb", "{name(rd)}, {name(rs2)}({name(rs1)})"};
    ....
}
CV_LB_rr_inc {
    assembly: {"cv.lb", "{name(rd)}, {name(rs2)}({name(rs1)}!)"};
    ....
}

Since the frontend is not validating the content of the string, this would be a minor change in the grammar and validation framework (@AtomCrafty correct me if I'm wrong).
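
For downstream tools, the two accepted forms could be normalized along the following lines. This is only a sketch in Python: split_assembly is a hypothetical name, the list form is assumed to contain exactly a mnemonic followed by the operand format, and falling back to the lower-cased instruction identifier for the single-string form is an assumption about backend behavior, not something the grammar prescribes.

```python
def split_assembly(instr_name, assembly):
    """Return (mnemonic, operand_format) for either form of the assembly field."""
    if isinstance(assembly, str):
        # Single string: no explicit mnemonic, so derive one from the
        # instruction identifier (assumed default, backends may differ).
        return instr_name.lower(), assembly
    if len(assembly) != 2:
        raise ValueError("expected a mnemonic and an operand format string")
    mnemonic, operands = assembly
    return mnemonic, operands


# Two uniquely named instructions that share one mnemonic.
INSTRUCTIONS = {
    "CV_LB_rr": ("cv.lb", "{name(rd)}, {name(rs2)}({name(rs1)})"),
    "CV_LB_rr_inc": ("cv.lb", "{name(rd)}, {name(rs2)}({name(rs1)}!)"),
}

MNEMONICS = {
    name: split_assembly(name, asm)[0] for name, asm in INSTRUCTIONS.items()
}
```

Both entries of MNEMONICS map to "cv.lb" while the CoreDSL identifiers stay unique, which is the property asked for in the XCoreVMem example.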

AtomCrafty · 2023-04-06T10:02:06Z

I believe that would be a pure grammar change. The frontend doesn't perform any validation on the assembly field.

PhilippvK · 2023-04-06T10:02:13Z

@eyck I am totally fine with that proposal as long as it is properly documented in the manual

jopperm · 2023-04-06T10:17:18Z

SGTM!

jopperm · 2023-04-07T14:49:20Z

Frontend support has been implemented. I'll leave this issue open to track the necessary changes to the spec.

jopperm · 2023-04-12T12:47:53Z

Diff

jopperm added the enhancement New feature or request label Mar 6, 2023

jopperm added this to the CoreDSL 2.1 (202x) milestone Apr 6, 2023

eyck modified the milestones: CoreDSL 2.1, CoreDSL 2.0 Apr 6, 2023

eyck assigned AtomCrafty and jopperm Apr 6, 2023

AtomCrafty mentioned this issue Apr 6, 2023

Milestone 2.0 #91

Merged

jopperm added the documentation Improvements or additions to documentation label Apr 7, 2023

jopperm unassigned AtomCrafty Apr 7, 2023

jopperm removed the enhancement New feature or request label Apr 7, 2023

jopperm closed this as completed Apr 12, 2023

PhilippvK mentioned this issue May 22, 2023

Allow specifying constraints on encoding fields/operands #96

Open

PhilippvK mentioned this issue Aug 21, 2024

Updates DLR-SE/riscv-coredsl-extensions#3

Merged
