-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ensure }
and {
are valid boundary characters
#17001
Conversation
We were duplicating the valid boundary characters in the `named_utility_machine`, let's re-use the `is_valid_after_boundary` such that the valid boundary characters are listed in a single spot. However, in the `named_utility_machine` there are a few characters that _are_ valid as boundary characters because this machine on it's own is incomplete, there could be any of these characters as well: - `:`, because of JS object keys - `/`, because of modifiers - `!`, because of important
A `.` is not a valid boundary character anymore so these tests start failing. However it is safe to delete these because: 1. The Pug syntax tests live in the main extractor already 2. The tests in the main extractor will run through the pre processing step
Now that we re-use the boundary characters, it does mean that the `class` in `<div class="…"></div>` is also extracted. This wasn't the case before, however there are languages where we do need this, e.g.: In Svelte you can do this: ```svelte <div class:px-4="bool"></div> ``` We pre-process this, such that it becomes: ```svelte <div class px-4="bool"></div> ``` But we still need to extract the `px-4` here. Best part is, if this `px-4` was just a `flex` instead, then it wouldn't have worked (due to hitting this current code path instead of the one for numbers). Added an additional test for: ```svelte <div class:flex="bool"></div> ```
} | ||
|
||
// Still valid characters | ||
cursor.advance() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fine for now but we'll want to perf test this. I wouldn't be surprised if this implementation causes a bit of a hit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep will do some classification next, and I still have a WIP branch to get rid of the Cursor
} => | ||
{ | ||
return self.done(self.start_pos, cursor) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto here too
This PR cleans up the boundary character checking by using similar classification techniques as we used for other classification problems. For starters, this moves the boundary related items to its own file, next we setup the classification enum. Last but not least, we removed `}` as an _after_ boundary character, and instead handle that situation in the Ruby pre processor where we need it. This means the `%w{flex}` will still work in Ruby files. --- This PR is a followup for #17001, the main goal is to clean up some of the boundary character checking code. The other big improvement is performance. Changing the boundary character checking to use a classification instead results in: Took the best score of 10 runs each: ```diff - CandidateMachine: Throughput: 311.96 MB/s + CandidateMachine: Throughput: 333.52 MB/s ``` So a ~20MB/s improvement. # Test plan 1. Existing tests should pass. Due to the removal of `}` as an after boundary character, some tests are updated. 2. Added new tests to ensure the Ruby pre processor still works as expected. --------- Co-authored-by: Jordan Pittman <jordan@cryptica.me>
In a lot of templating languages,
{…}
is used to escape the template language to write some code logic, this means that if you have input that looks like this:That the if condition will be gone once it's compiled to HTML, but it also means that the
}
and{
characters next tomx-4{
andh-4
should be valid boundary characters because Tailwind CSS typically operates on the source code, not the compiled code.I didn't create a pre processor for this because unfortunately this syntax is often embedded in
.html
files, and I would love to not have to create a pre processor for.html
files.You might notice that there are other changes in this PR, they are all related to boundary characters, let me explain:
In the named utility machine, we repeated the allowed boundary characters. We will now re-use the shared function so that it's all in one location.
We still check for a few additional characters such as
:
because of JS keys,/
because it could be a start of a modifier and!
because the named utility could be important.This actually exposed a bug where the
class
attribute in<div class="…">…</div>
was not extracted (which we typically want). However, in languages like Svelte, this is a real thing you can do:<div class:px-4="condition"></div>
. The pre processor will make sure to convert this input to<div class px-4="condition"></div>
but that still means that we need to extract thepx-4
.if a utility ends in an number this worked as expected, but
<div class:flex="condition"></div>
did not work. This PR also fixes that by adding a dedicated test and it's also the reason why you see a bunch ofclass
candidates in the tests.Fixes: #16999
Test plan