
Commit 3706292

Improve candidate extraction when candidates contain . characters (#17113)
This PR fixes an issue where some classes weren't properly extracted due to some incorrect assumptions in the pre processors.

Templating languages such as Haml, Slim and Pug allow you to write classes in a shorter way that are not properly contained inside of strings. E.g.:

```slim
p.flex.px-2
```

These candidates are not properly extracted because there are no bounding characters like quotes. To solve this, we pre-process it and replace `.` with ` ` characters. This results in something like:

```
p flex px-2
```

However, this has some challenges on its own. Candidates like `px-2.5` cannot be written in this shorthand form, instead they need to be in strings. Now we _cannot_ replace the `.` because otherwise we would change `px-2.5` to `px-2 5` which is wrong.

The next problem is that we need to know when they are in a "string". This has another set of problems because these templating languages allow you to write normal text that will eventually be the contents of the HTML tags.

```haml
.text-red-500.text-3xl
  | This text can't should be red
                 ^ Wait, is this the start of a string now???
```

In this example, if we consider the `'` the start of a string, when it's clearly not, how would we know it's for _sure_ not a string?

This ended up as a bit of a rabbit hole, but we came up with another approach entirely if we think about the original problem we want to solve which is when do we change `.` to ` ` characters.

One of the rules in our current extractor is that a `.` has to be between 2 numbers. Which works great in a scenario like: `px-2.5`. However, if you look at Haml or Slim syntax, this is also allowed:

```slim
p.bg-red-500.2xl:flex
           ^^^ Uh oh...
```

In this scenario, a `.` is surrounded by numbers so we shouldn't replace it with a space. But as you can see, we clearly do... so we need another heuristic in this case.

Luckily, one of the rules in Tailwind CSS is that a utility cannot start with a number, but a variant _can_. This means that if we see a scenario like `<digit>.<digit>` then we can just check if the value after the `.` is a valid variant or not.

In this case it is a valid variant so we _do_ want to replace the `.` with a ` ` even though we do have the `<digit>.<digit>` format. 🥴

# Test plan

1. Added additional tests.
2. Existing tests still pass

---------

Co-authored-by: Philipp Spiess <hello@philippspiess.com>
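
The `<digit>.<digit>` rule described above can be sketched in isolation. This is a minimal, self-contained illustration and not the code from this commit: `replace_dots` and `starts_with_variant` are hypothetical names, and `starts_with_variant` is only a rough stand-in for the crate's `VariantMachine` (it assumes a variant looks like a run of ASCII alphanumerics or dashes followed by `:`). Bracket handling is left out on purpose.

```rust
/// Hypothetical stand-in for the crate's `VariantMachine`: treats a run of
/// ASCII alphanumerics or dashes followed by `:` as a variant (e.g. `2xl:`).
fn starts_with_variant(rest: &[u8]) -> bool {
    let name_len = rest
        .iter()
        .take_while(|b| b.is_ascii_alphanumeric() || **b == b'-')
        .count();
    name_len > 0 && rest.get(name_len) == Some(&b':')
}

/// Replace shorthand `.` separators with spaces, keeping decimal dots.
fn replace_dots(input: &str) -> String {
    let bytes = input.as_bytes();
    let mut result = bytes.to_vec();

    for (i, &b) in bytes.iter().enumerate() {
        if b != b'.' {
            continue;
        }

        let prev_is_digit = i > 0 && bytes[i - 1].is_ascii_digit();
        let next_is_digit = bytes.get(i + 1).is_some_and(|n| n.is_ascii_digit());

        if prev_is_digit && next_is_digit {
            // `<digit>.<digit>`: only a separator when a variant follows the dot.
            if starts_with_variant(&bytes[i + 1..]) {
                result[i] = b' ';
            }
        } else {
            result[i] = b' ';
        }
    }

    // Only ASCII dots were swapped for ASCII spaces, so the bytes stay valid UTF-8.
    String::from_utf8(result).expect("ASCII-for-ASCII replacement keeps UTF-8 valid")
}
```

Under these assumptions, `replace_dots("px-2.5")` leaves the input untouched, while `replace_dots("p.bg-red-500.2xl:flex")` yields `p bg-red-500 2xl:flex` because `2xl:` is recognized as a variant.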
1 parent 22746e6 commit 3706292

6 files changed, +557 -93 lines changed

CHANGELOG.md

+2 -1
@@ -23,7 +23,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - Ensure classes between `>` and `<` are properly extracted ([#17094](https://github.com/tailwindlabs/tailwindcss/pull/17094))
 - Treat starting single quote as verbatim text in Slim ([#17085](https://github.com/tailwindlabs/tailwindcss/pull/17085))
 - Ensure `.node` and `.wasm` files are not scanned for utilities ([#17123](https://github.com/tailwindlabs/tailwindcss/pull/17123))
-- Improve performance when scanning `JSON` files ([#17125](https://github.com/tailwindlabs/tailwindcss/pull/17125))
+- Improve performance when scanning JSON files ([#17125](https://github.com/tailwindlabs/tailwindcss/pull/17125))
+- Fix extracting candidates containing dots in Haml, Pug, and Slim pre processors ([#17113](https://github.com/tailwindlabs/tailwindcss/pull/17113))
 - Don't create invalid CSS when encountering a link wrapped in square brackets ([#17129](https://github.com/tailwindlabs/tailwindcss/pull/17129))

 ## [4.0.12] - 2025-03-07

crates/oxide/src/extractor/pre_processors/haml.rs

+88 -19
@@ -1,6 +1,8 @@
 use crate::cursor;
 use crate::extractor::bracket_stack::BracketStack;
+use crate::extractor::machine::{Machine, MachineState};
 use crate::extractor::pre_processors::pre_processor::PreProcessor;
+use crate::extractor::variant_machine::VariantMachine;

 #[derive(Debug, Default)]
 pub struct Haml;
@@ -14,29 +16,48 @@ impl PreProcessor for Haml {

         while cursor.pos < len {
             match cursor.curr {
-                // Consume strings as-is
-                b'\'' | b'"' => {
-                    let len = cursor.input.len();
-                    let end_char = cursor.curr;
-
-                    cursor.advance();
-
-                    while cursor.pos < len {
-                        match cursor.curr {
-                            // Escaped character, skip ahead to the next character
-                            b'\\' => cursor.advance_twice(),
-
-                            // End of the string
-                            b'\'' | b'"' if cursor.curr == end_char => break,
+                // Only replace `.` with a space if it's not surrounded by numbers. E.g.:
+                //
+                // ```diff
+                // - .flex.items-center
+                // + flex items-center
+                // ```
+                //
+                // But with numbers, it's allowed:
+                //
+                // ```diff
+                // - px-2.5
+                // + px-2.5
+                // ```
+                b'.' => {
+                    // Don't replace dots with spaces when inside of any type of brackets, because
+                    // this could be part of arbitrary values. E.g.: `bg-[url(https://example.com)]`
+                    //                                                                       ^
+                    if !bracket_stack.is_empty() {
+                        cursor.advance();
+                        continue;
+                    }

-                            // Everything else is valid
-                            _ => cursor.advance(),
-                        };
+                    // If the dot is surrounded by digits, we want to keep it. E.g.: `px-2.5`
+                    // EXCEPT if it's followed by a valid variant that happens to start with a
+                    // digit.
+                    // E.g.: `bg-red-500.2xl:flex`
+                    //                 ^^^
+                    if cursor.prev.is_ascii_digit() && cursor.next.is_ascii_digit() {
+                        let mut next_cursor = cursor.clone();
+                        next_cursor.advance();
+
+                        let mut variant_machine = VariantMachine::default();
+                        if let MachineState::Done(_) = variant_machine.next(&mut next_cursor) {
+                            result[cursor.pos] = b' ';
+                        }
+                    } else {
+                        result[cursor.pos] = b' ';
                     }
                 }

                 // Replace following characters with spaces if they are not inside of brackets
-                b'.' | b'#' | b'=' if bracket_stack.is_empty() => {
+                b'#' | b'=' if bracket_stack.is_empty() => {
                     result[cursor.pos] = b' ';
                 }

@@ -48,7 +69,7 @@ impl PreProcessor for Haml {
                     bracket_stack.push(cursor.curr);
                 }

-                b')' | b']' | b'}' => {
+                b')' | b']' | b'}' if !bracket_stack.is_empty() => {
                     bracket_stack.pop(cursor.curr);

                     // Replace closing bracket with a space
@@ -116,8 +137,56 @@ mod tests {
                 ".text-lime-500.xl:text-emerald-500#root",
                 " text-lime-500 xl:text-emerald-500 root",
             ),
+            // Dots in strings in HTML attributes stay as-is
+            (r#"<div id="px-2.5"></div>"#, r#"<div id "px-2.5"></div>"#),
         ] {
             Haml::test(input, expected);
         }
     }
+
+    #[test]
+    fn test_strings_only_occur_when_nested() {
+        let input = r#"
+            %p.mt-2.text-xl
+              The quote in the next word, can't be the start of a string
+
+            %h3.mt-24.text-center.text-4xl.font-bold.italic
+              The classes above should be extracted
+        "#;
+
+        Haml::test_extract_contains(
+            input,
+            vec![
+                // First paragraph
+                "mt-2",
+                "text-xl",
+                // second paragraph
+                "mt-24",
+                "text-center",
+                "text-4xl",
+                "font-bold",
+                "italic",
+            ],
+        );
+    }
+
+    // https://github.com/tailwindlabs/tailwindcss/pull/17051#issuecomment-2711181352
+    #[test]
+    fn test_haml_full_file() {
+        let processed = Haml.process(include_bytes!("./test-fixtures/haml/src-1.haml"));
+        let actual = std::str::from_utf8(&processed).unwrap();
+        let expected = include_str!("./test-fixtures/haml/dst-1.haml");
+
+        assert_eq!(actual, expected);
+    }
+
+    #[test]
+    fn test_arbitrary_code_followed_by_classes() {
+        let input = r#"
+            %p
+              = i < 3
+            .flex.items-center
+        "#;
+        Haml::test_extract_contains(input, vec!["flex", "items-center"]);
+    }
 }
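
The bracket handling in this file (dots inside brackets are left alone, and a closing bracket is only handled when the stack is non-empty) can be illustrated with a small standalone sketch. It rests on assumptions: `replace_dots_outside_brackets` and `expected_closers` are hypothetical names, a plain `Vec<u8>` stands in for the crate's `BracketStack`, and the digit/variant check and the Haml-specific replacement of brackets with spaces are omitted.

```rust
/// Replace `.` with a space only when no bracket is currently open.
fn replace_dots_outside_brackets(input: &str) -> String {
    let bytes = input.as_bytes();
    let mut result = bytes.to_vec();
    // Closing brackets we still expect to see, innermost last.
    let mut expected_closers: Vec<u8> = Vec::new();

    for (i, &b) in bytes.iter().enumerate() {
        match b {
            b'(' => expected_closers.push(b')'),
            b'[' => expected_closers.push(b']'),
            b'{' => expected_closers.push(b'}'),

            // Only pop when this closes the innermost open bracket. A stray
            // closing bracket (e.g. in plain text) falls through and is ignored.
            b')' | b']' | b'}' if expected_closers.last() == Some(&b) => {
                expected_closers.pop();
            }

            // Dots inside brackets may belong to arbitrary values such as URLs.
            b'.' if expected_closers.is_empty() => result[i] = b' ',

            _ => {}
        }
    }

    // Only ASCII dots were swapped for ASCII spaces, so the bytes stay valid UTF-8.
    String::from_utf8(result).expect("ASCII-for-ASCII replacement keeps UTF-8 valid")
}
```

In this sketch a stray `)` in surrounding text does not disturb the stack, so shorthand classes that follow it are still split on `.`, while `bg-[url(https://example.com/?q=[1,2])]` passes through unchanged.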

crates/oxide/src/extractor/pre_processors/pug.rs

+74 -21
@@ -1,6 +1,8 @@
 use crate::cursor;
 use crate::extractor::bracket_stack::BracketStack;
+use crate::extractor::machine::{Machine, MachineState};
 use crate::extractor::pre_processors::pre_processor::PreProcessor;
+use crate::extractor::variant_machine::VariantMachine;

 #[derive(Debug, Default)]
 pub struct Pug;
@@ -14,32 +16,46 @@ impl PreProcessor for Pug {

         while cursor.pos < len {
             match cursor.curr {
-                // Consume strings as-is
-                b'\'' | b'"' => {
-                    let len = cursor.input.len();
-                    let end_char = cursor.curr;
-
-                    cursor.advance();
-
-                    while cursor.pos < len {
-                        match cursor.curr {
-                            // Escaped character, skip ahead to the next character
-                            b'\\' => cursor.advance_twice(),
+                // Only replace `.` with a space if it's not surrounded by numbers. E.g.:
+                //
+                // ```diff
+                // - .flex.items-center
+                // + flex items-center
+                // ```
+                //
+                // But with numbers, it's allowed:
+                //
+                // ```diff
+                // - px-2.5
+                // + px-2.5
+                // ```
+                b'.' => {
+                    // Don't replace dots with spaces when inside of any type of brackets, because
+                    // this could be part of arbitrary values. E.g.: `bg-[url(https://example.com)]`
+                    //                                                                       ^
+                    if !bracket_stack.is_empty() {
+                        cursor.advance();
+                        continue;
+                    }

-                            // End of the string
-                            b'\'' | b'"' if cursor.curr == end_char => break,
+                    // If the dot is surrounded by digits, we want to keep it. E.g.: `px-2.5`
+                    // EXCEPT if it's followed by a valid variant that happens to start with a
+                    // digit.
+                    // E.g.: `bg-red-500.2xl:flex`
+                    //                 ^^^
+                    if cursor.prev.is_ascii_digit() && cursor.next.is_ascii_digit() {
+                        let mut next_cursor = cursor.clone();
+                        next_cursor.advance();

-                            // Everything else is valid
-                            _ => cursor.advance(),
-                        };
+                        let mut variant_machine = VariantMachine::default();
+                        if let MachineState::Done(_) = variant_machine.next(&mut next_cursor) {
+                            result[cursor.pos] = b' ';
+                        }
+                    } else {
+                        result[cursor.pos] = b' ';
                     }
                 }

-                // Replace dots with spaces
-                b'.' if bracket_stack.is_empty() => {
-                    result[cursor.pos] = b' ';
-                }
-
                 b'(' | b'[' | b'{' => {
                     bracket_stack.push(cursor.curr);
                 }
@@ -77,8 +93,45 @@ mod tests {
                 "bg-[url(https://example.com/?q=[1,2])]",
                 "bg-[url(https://example.com/?q=[1,2])]",
             ),
+            // Classes in HTML attributes
+            (r#"<div id="px-2.5"></div>"#, r#"<div id="px-2.5"></div>"#),
         ] {
             Pug::test(input, expected);
         }
     }
+
+    #[test]
+    fn test_strings_only_occur_when_nested() {
+        let input = r#"
+            p.mt-2.text-xl
+              div The quote in the next word, can't be the start of a string
+
+            h3.mt-24.text-center.text-4xl.font-bold.italic
+              div The classes above should be extracted
+        "#;
+
+        Pug::test_extract_contains(
+            input,
+            vec![
+                // First paragraph
+                "mt-2",
+                "text-xl",
+                // second paragraph
+                "mt-24",
+                "text-center",
+                "text-4xl",
+                "font-bold",
+                "italic",
+            ],
+        );
+    }
+
+    #[test]
+    fn test_arbitrary_code_followed_by_classes() {
+        let input = r#"
+            - i < 3
+            .flex.items-center
+        "#;
+        Pug::test_extract_contains(input, vec!["flex", "items-center"]);
+    }
 }
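
To see why the "consume strings as-is" branch was removed, it may help to compare a simplified version of the old behavior with one that gives quotes no special treatment. Both functions below are hypothetical sketches rather than the crate's code: they ignore brackets and the digit/variant rule, and only show how an apostrophe in plain text can swallow the classes that follow it.

```rust
/// Sketch of the removed behavior: skip over quoted "strings" untouched,
/// otherwise turn `.` into a space.
fn replace_dots_skipping_strings(input: &str) -> String {
    let bytes = input.as_bytes();
    let mut result = bytes.to_vec();
    let mut i = 0;

    while i < bytes.len() {
        match bytes[i] {
            quote @ (b'\'' | b'"') => {
                i += 1;
                // Skip ahead to the matching quote, or to the end of the input
                // if there is none (which is what an apostrophe in text causes).
                while i < bytes.len() && bytes[i] != quote {
                    // An escaped character takes two bytes.
                    i += if bytes[i] == b'\\' { 2 } else { 1 };
                }
            }
            b'.' => result[i] = b' ',
            _ => {}
        }
        i += 1;
    }

    // Only ASCII dots were swapped for ASCII spaces, so the bytes stay valid UTF-8.
    String::from_utf8(result).expect("ASCII-for-ASCII replacement keeps UTF-8 valid")
}

/// Sketch of the new behavior for this case: quotes are not special at all.
fn replace_dots_ignoring_strings(input: &str) -> String {
    input.replace('.', " ")
}
```

With an input like `p.mt-2` followed by a text line containing `can't` and then `h3.mt-24.italic`, the first function stops replacing dots after the apostrophe, so `h3.mt-24.italic` is left joined. The second function splits it into `h3 mt-24 italic`, which is the outcome `test_strings_only_occur_when_nested` checks for.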
