Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

core: Improved treatment of whitespace in assembly format #3044

Merged
merged 7 commits into from
Aug 16, 2024

Conversation

alexarice
Copy link
Collaborator

@alexarice alexarice commented Aug 16, 2024

This makes the following changes:

  • Makes assembly format recognise the empty whitespace directive ``
  • Fixes a bug in the assembly format parsing so that this is actually parsed correctly
  • Allows whitespace at the start of an optional group
  • Modifies existing stencil tests now that they print correctly (i.e. without a space in stencil.store
  • Adds tests for new functionality

These changes bring the assembly format closer to the assembly format of mlir

@alexarice alexarice added testing PRs that modify tests xdsl xdsl framework specific changes core xDSL core (ir, textual format, ...) labels Aug 16, 2024
@alexarice alexarice self-assigned this Aug 16, 2024
@@ -487,12 +487,21 @@ def parse_optional_group(self) -> FormatDirective:
anchor = then_elements[-1]
self.parse_punctuation("?")

if not then_elements:
# Pull whitespace element of front, as they are not parsed
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be keen to know if there's a more "pythonic" way of doing this.

Copy link

codecov bot commented Aug 16, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 89.86%. Comparing base (4a5e0df) to head (d46b201).
Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3044      +/-   ##
==========================================
- Coverage   89.86%   89.86%   -0.01%     
==========================================
  Files         415      415              
  Lines       52060    52078      +18     
  Branches     8045     8048       +3     
==========================================
+ Hits        46784    46799      +15     
- Misses       3983     3984       +1     
- Partials     1293     1295       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Collaborator

@PapyChacal PapyChacal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!! stencil dialect be pretty 👌

And I thought it was partially supported ahah, but it was actually accidentally accepted as a degenerate keyword, is that right?

Comment on lines 491 to 495
whitespace_counter = 0
while whitespace_counter < len(then_elements) and isinstance(
then_elements[whitespace_counter], WhitespaceDirective
):
whitespace_counter += 1
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's a maybe Pythonic way?

Suggested change
whitespace_counter = 0
while whitespace_counter < len(then_elements) and isinstance(
then_elements[whitespace_counter], WhitespaceDirective
):
whitespace_counter += 1
whitespace_counter = next(
(
i
for i, e in enumerate(then_elements)
if not isinstance(e, WhitespaceDirective)
),
len(then_elements),
)

I don't find it necessarily clearer myself, but answering as requested!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I personally find it less clear what this does, but happy to go either way

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For an attempt at pedagogy:

        #next: get the next thing of an iterable
        whitespace_counter = next(
            
            (
                # Just list the indices
                i
                # enumerate: list all elements and their indices ((0, l[0]), (1, l[1]), ..., (len(l)-1, ...))
                for i, e in enumerate(then_elements)
                # Here, filter those pairs to get only the non-whitespace ones
                if not isinstance(e, WhitespaceDirective)
            ),
            # Default value for next to return, if the iterable is empty/exhausted
            # (Here, if there are no elements or no non-whitespace elements)
            # Could use a None or what, I was just golfing for getting the exact same counting
            # with those here :) Do what suits you
            len(then_elements),
        )

So the above is literally "give me the index of the first non-whitespace element, or the number of elements if none"

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I personally find it less clear what this does, but happy to go either way

Yeah I don't mind myself; as long as it works and is understandable - I'd say both styles are both for the everyday xDSL dev!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if all the elements are whitespace then it should throw an error (as the next part of the code does)

Copy link
Collaborator

@PapyChacal PapyChacal Aug 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep this one kept it, you get the length of the list out so you would trigger the exception down

Otherwise, just remove the len(then_elements) from there and next() itself would throw an error, but not a nice hand-written one; You could thus try/catch but I'd find this way OTT here.

You could also just replace len(then_elements) in there by None and just

if whitespace_counter is None:
    # fancy_error

To make it less cognitively loaded!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would recommend something like this:

Suggested change
whitespace_counter = 0
while whitespace_counter < len(then_elements) and isinstance(
then_elements[whitespace_counter], WhitespaceDirective
):
whitespace_counter += 1
first_non_whitespace_index = 0
for first_non_whitespace_index, el in enumerate(then_elements):
if isinstance(el, WhitespaceDirective):
break

@alexarice
Copy link
Collaborator Author

alexarice commented Aug 16, 2024

The content parsed for any whitespace directive was whatever whitespace followed the directive. It just so happens that in nearly every case this is a single space for both cases.

This meant `` ... actually parsed as the single space whitespace directive, the opposite of what you wanted to convey

@alexarice
Copy link
Collaborator Author

Also I don't know much about the stencil dialect, but is the result of this pr correct for it?

@PapyChacal
Copy link
Collaborator

PapyChacal commented Aug 16, 2024

Also I don't know much about the stencil dialect, but is the result of this pr correct for it?

Yeah, the stencil dialect is a modified one anyway so there is no strict compatibility to have; thanks for asking!
(Plus I like the change so 😋)

"""The whitespace that should be printed."""

def parse(self, parser: Parser, state: ParsingState) -> None:
pass

def print(self, printer: Printer, state: PrintingState, op: IRDLOperation) -> None:
printer.print(self.whitespace)
state.last_was_punctuation = False
state.last_was_punctuation = len(self.whitespace) == 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Length of strings is O(n) :)

Suggested change
state.last_was_punctuation = len(self.whitespace) == 0
state.last_was_punctuation = not bool(self.whitespace)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This string always has length 0 or 1

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we could compromise on the simpler self.whitespace == ""?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good

@alexarice
Copy link
Collaborator Author

alexarice commented Aug 16, 2024

I feel that this is pretty clean now (an amalgamation of some of the suggestions):

# Pull whitespace element of front, as they are not parsed
first_non_whitespace_index = None
for i, x in enumerate(then_elements):
    if not isinstance(x, WhitespaceDirective):
        first_non_whitespace_index = i
        break

if first_non_whitespace_index is None:
    self.raise_error("An optional group must have a non-whitespace directive")

@alexarice
Copy link
Collaborator Author

If no-one objects to the above way of getting the index I think this is ready to merge?

@alexarice alexarice merged commit e70919f into main Aug 16, 2024
10 checks passed
@alexarice alexarice deleted the alexarice/assembly-format branch August 16, 2024 13:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core xDSL core (ir, textual format, ...) testing PRs that modify tests xdsl xdsl framework specific changes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants