Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tool: tblgen-to-py script #3210

Merged
merged 28 commits into from
Oct 3, 2024
Merged

tool: tblgen-to-py script #3210

merged 28 commits into from
Oct 3, 2024

Conversation

alexarice
Copy link
Collaborator

This is the start of a "tblgen-to-py" script for reading the json output of llvm-tblgen --dump-json and creating an xdsl dialect. I would like some help/advice on how to best integrate this into the existing tooling. I think something similar to the "irdl-to-pyrdl" script would be good, where the irdl file is used to create a python file.

I've included some json files for reference, but as they are huge I expect we would want to cache the result of this script in practice.

Quite a few constraints are missing at the moment but this can be iterated on.

@alexarice alexarice added enhancement New feature or request help wanted Extra attention is needed question Further information is requested dialects Changes on the dialects tool labels Sep 23, 2024
@alexarice alexarice self-assigned this Sep 23, 2024
@alexarice alexarice marked this pull request as draft September 23, 2024 16:24
Copy link

codecov bot commented Sep 23, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Please upload report for BASE (main@cd2d585). Learn more about missing BASE report.
Report is 37 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #3210   +/-   ##
=======================================
  Coverage        ?   89.91%           
=======================================
  Files           ?      440           
  Lines           ?    55279           
  Branches        ?     8624           
=======================================
  Hits            ?    49705           
  Misses          ?     4149           
  Partials        ?     1425           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@alexarice
Copy link
Collaborator Author

Would it likely be better to go with "script that prints out some python code" than what I've done here? Programatically generating the classes seems to introduce some difficulties and makes caching harder

@tobiasgrosser
Copy link
Contributor

Would it likely be better to go with "script that prints out some python code" than what I've done here? Programatically generating the classes seems to introduce some difficulties and makes caching harder

I think that would make a lot of sense.

@alexarice
Copy link
Collaborator Author

alexarice commented Sep 25, 2024

Just pushed version 2 which just creates a python file. Think this is close to being ready, just need to work out how to test it

@alexarice
Copy link
Collaborator Author

alexarice commented Sep 26, 2024

Struggling to come up with a way to test this, as I'd ideally like a filecheck test similar to the one for irdl-to-pyrdl. The optional I can think of are:

  • Include a 17000 line json in the repository (surely not ok)
  • Manually cull test.json to the parts that are actually relevant
  • Make an automated way to cull the tblgen json files
  • Dynamically generate a test.json (needs access to llvm-tblgen and the mlir include files). This has the advantage of testing stuff a bit more end-to-end but I can't think of a good way to ensure access to the include files

Would love to hear any good ideas for this.

@tobiasgrosser
Copy link
Contributor

Manually cull test.json to the parts that are actually relevant

Either this or a manually crafted json file that tests the key features?

Copy link
Collaborator

@AntonLydike AntonLydike left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't get through all the code yet, but can you add some more comments, and maybe a readme for this tool? Oh, also some minimal tests would be super neat to see what it actually does!

index.json Outdated Show resolved Hide resolved
test.json Outdated Show resolved Hide resolved

@dataclass
class TblgenLoader:
js: Any
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a docstring to the class (and this attribute) to explain roughly what it does?

@AntonLydike
Copy link
Collaborator

@alexarice I think hand-crafting a json example would a good way to test this

@alexarice
Copy link
Collaborator Author

This is quite a dump of code, no? Is there no way to make this smaller/split the changes up?

I could remove all the argument constraints for a first pass, which would simplify it a little

Is there a reason why you go directly from TableGen, instead of doing the TableGen -> IRDL -> Python route? My guess is that IRDL cannot express what you currently want?

I agree with the end goal. At least short term my problems with it are:

  1. People aren't writing dialects in irdl, and probably won't be soon.
  2. There are a lot of constraints that can't be represented nicely in irdl at the moment (e.g. things like the shaped types)
  3. To add a new feature to IRDL, you need to update IRDL in mlir, update the tblgen-to-irdl script, and update the xdsl irdl importer. In contrast this only relies on the json tablegen dump which shouldn't need modification.
  4. It is still unclear to me how to nicely interface with builtin types in irdl. For example, in mlir the builtin Complex type is not parametrized. In this tool we can ignore the mlir representation and just use the xdsl one which is parametrized.

@math-fehr
Copy link
Collaborator

This is quite a dump of code, no? Is there no way to make this smaller/split the changes up?

I could remove all the argument constraints for a first pass, which would simplify it a little

Is there a reason why you go directly from TableGen, instead of doing the TableGen -> IRDL -> Python route? My guess is that IRDL cannot express what you currently want?

I agree with the end goal. At least short term my problems with it are:

1. People aren't writing dialects in irdl, and probably won't be soon.

2. There are a lot of constraints that can't be represented nicely in irdl at the moment (e.g. things like the shaped types)

3. To add a new feature to IRDL, you need to update IRDL in mlir, update the tblgen-to-irdl script, and update the xdsl irdl importer. In contrast this only relies on the json tablegen dump which shouldn't need modification.

4. It is still unclear to me how to nicely interface with builtin types in irdl. For example, in mlir the builtin `Complex` type is not parametrized. In this tool we can ignore the mlir representation and just use the xdsl one which _is_ parametrized.

Yeah I agree, let's do this right now, we'll see later if IRDL turns out to be usable enough to make it pass through it ;)

@alexarice
Copy link
Collaborator Author

Think I've fixed up all the comments, was there a consensus on trying to split this up into separate PRs?

Copy link
Member

@superlopuh superlopuh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The actual dialect definition is missing, do we want to include it? Also, the dialect name is missing from all the ops.

It would be great to have more documentation about this, like how the json was generated

Comment on lines 5 to 15
output = subprocess.run(
[
"xdsl-tblgen",
"-i",
"tests/tblgen_to_py/test.json",
],
capture_output=True,
text=True,
)

out_str = output.stdout
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to go via subprocess here now that we've separated the command-line parsing from the generation? This feels more like an integration test than a unit test to me...

Comment on lines 533 to 537
if output_file is not None:
with open(output_file, "w") as out_file:
print(json.dumps(culled), file=out_file)
else:
print(json.dumps(culled))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it feels like the caller should do this dance, and the out parameter here should be IO[str] | None


@irdl_attr_definition
class Test_SingletonAType(ParametrizedAttribute, TypeAttribute):
""""""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might as well also drop these empty docstrings

@alexarice
Copy link
Collaborator Author

Ready for review again

@alexarice alexarice merged commit 6a3446a into main Oct 3, 2024
14 checks passed
@alexarice alexarice deleted the alexarice/tblgen-to-py branch October 3, 2024 08:36
emmau678 pushed a commit that referenced this pull request Oct 8, 2024
This is the start of a "tblgen-to-py" script for reading the json output
of `llvm-tblgen --dump-json` and creating an xdsl dialect.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dialects Changes on the dialects enhancement New feature or request tool
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants