LLVM verification #356

AFOliveira · 2024-12-13T13:48:23Z

Following up on #258.

I've been working on what @lenary proposed as a first approach, and I think this is an OK first mock-up. It's still a WIP since many instructions still have bugs, but if you have any comments or recommendations, please LMK :).

I did not add LLVM as a submodule yet because it may be easier, licensing-wise, to just point at it in the script? Usage is python3 riscv_parser.py <tablegen_json_file> <arch_inst_directory>. I've also used jq to improve the readability of the output of llvm-tblgen -I llvm/include -I llvm/lib/Target/RISCV llvm/lib/Target/RISCV/RISCV.td --dump-json -o <path-to-json-output>.
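For anyone reproducing this without jq, a minimal Python sketch of loading the --dump-json output and pulling out the instruction records might look like this. It assumes the dump's top-level "!instanceof" map (class name to list of record names); the function names are hypothetical and not what riscv_parser.py actually uses.

```python
import json
from typing import Any


def load_tablegen_json(path: str) -> dict[str, Any]:
    """Load the file written by `llvm-tblgen ... --dump-json -o <path>`."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def instruction_records(data: dict[str, Any]) -> dict[str, dict[str, Any]]:
    """Return every record whose class hierarchy includes `Instruction`.

    Assumes the dump's top-level "!instanceof" map, which lists the record
    names belonging to each class.
    """
    names = data.get("!instanceof", {}).get("Instruction", [])
    return {name: data[name] for name in names if name in data}


def write_pretty(data: dict[str, Any], out_path: str) -> None:
    """Re-indent the dump, roughly what `jq . in.json > out.json` does."""
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2)
```

Working from "!instanceof" avoids guessing which top-level keys are instructions, since the dump also contains registers, operands, scheduling records and so on.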

Signed-off-by: Afonso Oliveira <Afonso.Oliveira@synopsys.com>

lenary

I think this is a good start, and is heading in the right direction. Some immediate comments before I look again in the next few days.

Is there anything specific you want feedback on?

Which instructions are you running into issues with so far?

Another overall thought I have is: if this is a test/validation suite, could this be structured using pytest? For instance, you could parse the YAML and the JSON to match up instruction descriptions, and use that to create parameterized fixtures (one instance per matched description) - the advantage of this is that you get all of the niceness of pytest's asserts, and pytest's test suite reports, without having to reimplement all the testcase management. This also makes it easier to split the test code that checks the encoding matches from tests that check the assembly strings match (for example, but we can think of others).
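A rough sketch of the pairing step this suggests, with hypothetical helper names and plain dicts standing in for the parsed YAML and JSON (both keyed by a shared, already-normalised instruction name). pytest is only referenced in comments so the snippet runs on its own.

```python
from typing import Any, Iterator

Record = dict[str, Any]


def matched_pairs(yaml_insns: dict[str, Record],
                  llvm_insns: dict[str, Record]) -> Iterator[tuple[str, Record, Record]]:
    """Yield (name, yaml_description, llvm_record) for names present in both sources."""
    for name in sorted(yaml_insns.keys() & llvm_insns.keys()):
        yield name, yaml_insns[name], llvm_insns[name]


def unmatched(yaml_insns: dict[str, Record],
              llvm_insns: dict[str, Record]) -> tuple[list[str], list[str]]:
    """Names only in the YAML, and names only in LLVM (worth reporting separately)."""
    return (sorted(yaml_insns.keys() - llvm_insns.keys()),
            sorted(llvm_insns.keys() - yaml_insns.keys()))


# Under pytest, a check like this would be named test_encodable and decorated with
#   @pytest.mark.parametrize("name,yaml_insn,llvm_insn", list(matched_pairs(Y, L)))
# where Y and L are the (hypothetical) dicts loaded at collection time. pytest
# then reports one pass/fail per instruction, and a plain `assert` is enough.
def check_encodable(name: str, yaml_insn: Record, llvm_insn: Record) -> None:
    assert (llvm_insn.get("Size") or 0) > 0, f"{name}: LLVM record has no Size"
```

Separate check functions (encoding, assembly string, and so on) can all reuse the same matched_pairs list, which is the split the comment above is after.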

lenary · 2024-12-13T14:15:50Z

ext/auto-inst/parsing.py

+        output_stream.write("-" * 20 + "\n")
+        output_stream.write(f"Name:              {name}\n")
+        output_stream.write(f"Assembly Format:   {safe_get(data, 'AsmString', 'N/A')}\n")
+        output_stream.write(f"Size:              {safe_get(data, 'Size', 'N/A')} bytes\n")


If an instruction doesn't have a Size, it cannot be encoded, so it's likely not of interest to checking the encoding.
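One way to express that filter, as a sketch: a missing or zero Size means "not encodable". Also skipping records with isPseudo set is an extra assumption here, since some pseudo-instructions carry a Size for their expansion without having an encoding of their own.

```python
from typing import Any


def is_encodable(record: dict[str, Any]) -> bool:
    """Decide whether an LLVM instruction record is worth checking against the YAML encoding.

    A missing or zero Size means there is no encoding. Skipping isPseudo
    records is an assumption, not something the review comment states.
    """
    size = record.get("Size") or 0
    if not isinstance(size, int) or size <= 0:
        return False
    return not record.get("isPseudo")
```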

lenary · 2024-12-13T14:21:53Z

ext/auto-inst/parsing.py

+        output_stream.write(f"Commutable:        {'Yes' if safe_get(data, 'isCommutable', 0) else 'No'}\n")
+        output_stream.write(f"Memory Load:       {'Yes' if safe_get(data, 'mayLoad', 0) else 'No'}\n")
+        output_stream.write(f"Memory Store:      {'Yes' if safe_get(data, 'mayStore', 0) else 'No'}\n")
+        output_stream.write(f"Side Effects:      {'Yes' if safe_get(data, 'hasSideEffects', 0) else 'No'}\n")


At some point we probably need a good discussion about these and how we verify them.

Broadly:

isCommutable will only be set if we can teach LLVM how to commute the instruction, using a hook implemented in C++. We do this quite a lot, e.g. for inverting conditions. This is not something the assembler will do, though; it is done earlier, during code generation.

mayLoad and mayStore should be accurate enough, but I guess you'll need to statically analyse the pseudocode to work out if a load or store happens. We really only model loads/stores to conventional memory (and not, e.g., loads for page table walks or permission checks). I'm not sure there's any modelling of ordering. These are used to prevent code motion during code generation.

hasSideEffects is a catch all for "be very careful with this instruction during codegen", and usually points to other effects that LLVM doesn't model. Generic CSR accesses are part of this (except the floating point CSR, which I think we model correctly), but so are other effects I haven't thought hard about.
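When reporting these flags, it may help to keep "unset" distinct from "No", since a bit left as ? in TableGen is not the same as an explicit 0. A sketch, assuming an unset ? bit is written as JSON null in the dump (or the field is absent); the helper names are hypothetical.

```python
from typing import Any, Optional


def flag(record: dict[str, Any], field: str) -> Optional[bool]:
    """Read a TableGen `bit` field: True/False if set, None if left as `?`.

    Assumes an unset `?` bit is written as JSON null (or the field is absent).
    """
    value = record.get(field)
    if value is None:
        return None
    return bool(value)


def describe(value: Optional[bool]) -> str:
    """Render a flag for the report, keeping "unset" distinct from "No"."""
    if value is None:
        return "unset"
    return "Yes" if value else "No"
```

With this, safe_get(data, 'mayLoad', 0) style defaults no longer silently turn "unknown" into "No".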

Do you think those should be explained in the UDB as well?

I think this probably needs a longer discussion on how denormalised this information should be. That is, the information is there if you statically analyse the pseudocode (which is absolutely something we should be able to do), but we probably don't want that to be the only way to find out this sort of thing, as the pseudocode for an operation might be a large amount of code.

I don't think you should be exactly matching LLVM's internal representation, but I do think there is the opportunity to denormalise more information that might be generally useful for this kind of tool.
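As a very crude stand-in for real static analysis, a textual scan of an instruction's operation() pseudocode can flag obvious loads and stores. The IDL function names read_memory/write_memory in the patterns are assumptions, and the scan does not follow calls into helper functions, so its output is a hint rather than a verdict.

```python
import re

# Assumed IDL spellings for memory accesses, e.g. "read_memory<32>(...)".
# Adjust these patterns to whatever builtins the pseudocode actually uses.
LOAD_CALLS = re.compile(r"\bread_memory\w*\s*(<[^>]*>)?\s*\(")
STORE_CALLS = re.compile(r"\bwrite_memory\w*\s*(<[^>]*>)?\s*\(")


def infer_mem_effects(operation: str) -> tuple[bool, bool]:
    """Rough (may_load, may_store) guess from an instruction's pseudocode text.

    Purely textual: accesses hidden inside helper functions are missed.
    """
    return bool(LOAD_CALLS.search(operation)), bool(STORE_CALLS.search(operation))
```

Comparing this guess against LLVM's mayLoad/mayStore would at least surface the obvious disagreements while the proper IDL-level analysis is discussed.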

I agree that this needs a proper discussion. I think @dhower-qc has been working on instruction representations lately, so he has probably also been thinking about this.

I don't have time for this discussion before mid-January.

Let's try to get the things that are already in the YAML done first, i.e.:

- Instruction Encodings
- Extensions and Profiles
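For the encodings, a sketch of turning LLVM's Inst field into a string that can be compared directly with the YAML. Assumptions: the dump lists Inst bits LSB-first (index 0 is bit 0), fixed bits are 0/1, operand bits are objects such as {"kind": "varbit", ...}, and the YAML's encoding match string is MSB-first with '-' for operand bits. The function names are hypothetical.

```python
from typing import Any


def llvm_inst_to_match(inst_bits: list[Any]) -> str:
    """Convert LLVM's `Inst` bits (LSB-first) into an MSB-first match string."""
    chars = []
    for bit in reversed(inst_bits):  # walk from the MSB down to the LSB
        if isinstance(bit, int) and bit in (0, 1):
            chars.append("1" if bit else "0")
        else:
            chars.append("-")  # operand field ("var"/"varbit") or unset bit
    return "".join(chars)


def encodings_match(yaml_match: str, inst_bits: list[Any]) -> bool:
    """True if the YAML match string and LLVM's fixed bits agree exactly."""
    return yaml_match == llvm_inst_to_match(inst_bits)
```

Because the conversion works on whatever length Inst has, 16-bit compressed encodings go through the same path as 32-bit ones.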


lenary · 2024-12-13T14:24:43Z

ext/auto-inst/parsing.py

+    """
+    Attempt to find a matching key in json_data for instr_name, considering different
+    naming conventions: replacing '.' with '_', and trying various case transformations.
+    """


I really suggest using the name from AsmString rather than this.
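A sketch of that approach: take the mnemonic from AsmString and index the LLVM records by it. It assumes the mnemonic is everything before the first whitespace, which holds for plain RISC-V instructions; AsmStrings using variant syntax ({...|...}) would need extra handling. The helper names are hypothetical.

```python
from typing import Any


def asm_mnemonic(asm_string: str) -> str:
    """Return the mnemonic of an AsmString, e.g. "add" from "add\t$rd, $rs1, $rs2"."""
    parts = asm_string.split(None, 1)  # splits on tabs as well as spaces
    return parts[0].lower() if parts else ""


def index_by_mnemonic(records: dict[str, dict[str, Any]]) -> dict[str, list[str]]:
    """Map each mnemonic to the LLVM record names that print with it.

    Several records can share a mnemonic, so each key maps to a list.
    """
    by_mnemonic: dict[str, list[str]] = {}
    for rec_name, rec in records.items():
        mnemonic = asm_mnemonic(rec.get("AsmString") or "")
        if mnemonic:
            by_mnemonic.setdefault(mnemonic, []).append(rec_name)
    return by_mnemonic
```

Keeping a list per mnemonic makes the ambiguous cases visible instead of letting one record silently overwrite another.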

OK, I'll change that. Thanks for the suggestion!

AFOliveira · 2024-12-13T15:40:16Z

I think this is a good start, and is heading in the right direction. Some immediate comments before I look again in the next few days.

Thanks for your feedback!

Is there anything specific you want feedback on?

Not yet, the point was just some general considerations about the initial approach.

Which instructions are you running into issues with so far?

I haven't found a pattern yet, but I'll try to fix this soon. If I run into any issue I can't solve by myself, I'll bring it up. Thanks!

Another overall thought I have is: if this is a test/validation suite, could this be structured using pytest? For instance, you could parse the YAML and the JSON to match up instruction descriptions, and use that to create parameterized fixtures (one instance per matched description) - the advantage of this is that you get all of the niceness of pytest's asserts, and pytest's test suite reports, without having to reimplement all the testcase management. This also makes it easier to split the test code that checks the encoding matches from tests that check the assembly strings match (for example, but we can think of others).

I'll take a look into pytest and see how to port this, thanks for the suggestion!

AFOliveira requested a review from lenary December 13, 2024 13:48

Add simple Docker environment variable

d9b50b2

Signed-off-by: Afonso Oliveira <Afonso.Oliveira@synopsys.com>

AFOliveira force-pushed the AFOliveira/LLVM branch from a56b0b1 to d9b50b2 December 13, 2024 14:13

lenary reviewed Dec 13, 2024
