Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure } and { are valid boundary characters #17001

Merged
merged 9 commits into from
Mar 6, 2025
Merged

Conversation

RobinMalfait
Copy link
Member

@RobinMalfait RobinMalfait commented Mar 6, 2025

In a lot of templating languages, {…} is used to escape the template language to write some code logic, this means that if you have input that looks like this:

<div class="flex items-center mx-4{% if session.isValid %}{% else %} h-4{% endif %}"></div>

That the if condition will be gone once it's compiled to HTML, but it also means that the } and { characters next to mx-4{ and h-4 should be valid boundary characters because Tailwind CSS typically operates on the source code, not the compiled code.

I didn't create a pre processor for this because unfortunately this syntax is often embedded in .html files, and I would love to not have to create a pre processor for .html files.

You might notice that there are other changes in this PR, they are all related to boundary characters, let me explain:

In the named utility machine, we repeated the allowed boundary characters. We will now re-use the shared function so that it's all in one location.
We still check for a few additional characters such as : because of JS keys, / because it could be a start of a modifier and ! because the named utility could be important.

This actually exposed a bug where the class attribute in <div class="…">…</div> was not extracted (which we typically want). However, in languages like Svelte, this is a real thing you can do: <div class:px-4="condition"></div>. The pre processor will make sure to convert this input to <div class px-4="condition"></div> but that still means that we need to extract the px-4.

if a utility ends in an number this worked as expected, but <div class:flex="condition"></div> did not work. This PR also fixes that by adding a dedicated test and it's also the reason why you see a bunch of class candidates in the tests.

Fixes: #16999

Test plan

  1. Added new test
  2. Existing tests pass

@RobinMalfait RobinMalfait requested a review from a team as a code owner March 6, 2025 16:38
@RobinMalfait RobinMalfait marked this pull request as draft March 6, 2025 16:44
We were duplicating the valid boundary characters in the
`named_utility_machine`, let's re-use the `is_valid_after_boundary` such
that the valid boundary characters are listed in a single spot.

However, in the `named_utility_machine` there are a few characters that
_are_ valid as boundary characters because this machine on it's own is
incomplete, there could be any of these characters as well:

- `:`, because of JS object keys
- `/`, because of modifiers
- `!`, because of important
A `.` is not a valid boundary character anymore so these tests start
failing. However it is safe to delete these because:
1. The Pug syntax tests live in the main extractor already
2. The tests in the main extractor will run through the pre processing step
Now that we re-use the boundary characters, it does mean that the
`class` in `<div class="…"></div>` is also extracted.

This wasn't the case before, however there are languages where we do
need this, e.g.:

In Svelte you can do this:

```svelte
<div class:px-4="bool"></div>
```

We pre-process this, such that it becomes:
```svelte
<div class px-4="bool"></div>
```

But we still need to extract the `px-4` here. Best part is, if this
`px-4` was just a `flex` instead, then it wouldn't have worked (due to
hitting this current code path instead of the one for numbers).

Added an additional test for:
```svelte
<div class:flex="bool"></div>
```
@RobinMalfait RobinMalfait marked this pull request as ready for review March 6, 2025 19:04
}

// Still valid characters
cursor.advance()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine for now but we'll want to perf test this. I wouldn't be surprised if this implementation causes a bit of a hit

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep will do some classification next, and I still have a WIP branch to get rid of the Cursor

} =>
{
return self.done(self.start_pos, cursor)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto here too

@RobinMalfait RobinMalfait merged commit 57e91a6 into main Mar 6, 2025
5 checks passed
@RobinMalfait RobinMalfait deleted the fix/issue-16999 branch March 6, 2025 19:40
RobinMalfait added a commit that referenced this pull request Mar 7, 2025
This PR cleans up the boundary character checking by using similar
classification techniques as we used for other classification problems.

For starters, this moves the boundary related items to its own file,
next we setup the classification enum.

Last but not least, we removed `}` as an _after_ boundary character, and
instead handle that situation in the Ruby pre processor where we need
it. This means the `%w{flex}` will still work in Ruby files.

---

This PR is a followup for
#17001, the main goal is
to clean up some of the boundary character checking code. The other big
improvement is performance. Changing the boundary character checking to
use a classification instead results in:

Took the best score of 10 runs each:
```diff
- CandidateMachine: Throughput: 311.96 MB/s
+ CandidateMachine: Throughput: 333.52 MB/s
```

So a ~20MB/s improvement.

# Test plan

1. Existing tests should pass. Due to the removal of `}` as an after
boundary character, some tests are updated.
2. Added new tests to ensure the Ruby pre processor still works as
expected.

---------

Co-authored-by: Jordan Pittman <jordan@cryptica.me>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Missing classes because of { boundary and new extractor
2 participants