[refurb] Count codepoints not bytes for slice-to-remove-prefix-or-suffix (FURB188)
#13631
Another subtlety worth testing is strings with surrogates. In Python, each surrogate counts as 1 code point, and surrogate pairs are not treated specially, so a pair counts as 2; for example,

TIL @dscorbett - neat! Added a test for this, and it appears to be handled correctly (I think this happens in the guts of the parser, so by the time I'm looking at

I think the reason it works is that Ruff’s representation of a Python string as a Rust string replaces surrogates with replacement characters. That is fine for counting the code points but could be a problem for other rules.
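A small sketch of why replacement would keep the count right, assuming (as the comment above guesses) that each lone surrogate is replaced by U+FFFD one-for-one. The Python string is modeled as a slice of code points, and `to_rust_string` is a hypothetical helper, not Ruff's actual conversion.

```rust
/// Hypothetical model of a Python `str`: a sequence of code points that,
/// unlike a Rust `String`, may contain surrogates in U+D800..=U+DFFF.
fn to_rust_string(code_points: &[u32]) -> String {
    code_points
        .iter()
        // `char::from_u32` returns `None` for surrogates, so each surrogate
        // maps to exactly one U+FFFD and the code-point count is unchanged.
        .map(|&cp| char::from_u32(cp).unwrap_or(char::REPLACEMENT_CHARACTER))
        .collect()
}
```

For the pair `[0xD83D, 0xDE00]` (Python `len` 2), the result has 2 `char`s but 6 bytes, since U+FFFD is 3 bytes in UTF-8. Note that decoding the same pair as UTF-16 with `char::decode_utf16` would instead combine it into a single `char`.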
Nice, thanks. I only have two nit comments.
```rust
.and_then(ast::Int::as_u32)
.and_then(|x| usize::try_from(x).ok())
```
I suggest converting to a `u64`, considering that you have to use `usize::try_from` anyway (for 32-bit platforms):
```diff
-.and_then(ast::Int::as_u32)
-.and_then(|x| usize::try_from(x).ok())
+.and_then(ast::Int::as_u64)
+.and_then(|x| usize::try_from(x).ok())
```
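Why the fallible conversion is needed either way: `usize` is 32 bits on 32-bit targets, so a `u64` (or even a `u32` on 16-bit targets) is not guaranteed to fit. A minimal sketch; `u64_to_usize` is a hypothetical name for illustration.

```rust
use std::convert::TryFrom;

/// Converts a literal's `u64` value to a `usize` without panicking or
/// truncating. On 64-bit targets every `u64` fits; on 32-bit targets values
/// above `u32::MAX` yield `None`, which simply means "no match".
fn u64_to_usize(value: u64) -> Option<usize> {
    usize::try_from(value).ok()
}
```

Using `value as usize` instead would silently truncate on 32-bit targets, which could turn a huge slice bound into a spurious match.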
Or you could consider adding an `as_usize` method to `ast::Int`.
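A sketch of what such a helper could look like. The `Int` enum below is a simplified, hypothetical stand-in, not Ruff's real `ast::Int` (whose internals may differ); it only assumes that an `as_u64`-style accessor exists to build on.

```rust
use std::convert::TryFrom;

/// Hypothetical stand-in for `ast::Int`: either a value that fits in a
/// `u64`, or a larger value whose digits are omitted in this sketch.
enum Int {
    Small(u64),
    Big,
}

impl Int {
    fn as_u64(&self) -> Option<u64> {
        match self {
            Int::Small(v) => Some(*v),
            Int::Big => None,
        }
    }

    /// Returns `None` if the value does not fit in a `usize` on the current
    /// target, so callers need no separate `try_from` step.
    fn as_usize(&self) -> Option<usize> {
        self.as_u64().and_then(|v| usize::try_from(v).ok())
    }
}
```

With a helper like this, the call site could collapse to a single `.and_then(ast::Int::as_usize)`.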
went with the latter
```rust
// Only support prefix removal for size at most `u32::MAX`
.and_then(ast::Int::as_u32)
.and_then(|x| usize::try_from(x).ok())
.is_some_and(|x| x == string_val.to_str().chars().count()),
```
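The check above boils down to comparing the slice bound in `text[bound:]` with Python's `len(prefix)`, that is, with the number of `char`s rather than bytes. A minimal sketch; `bound_matches_prefix` is a hypothetical name, and `bound` is assumed to be `None` when the literal is missing or too large.

```rust
/// `true` when the bound in `text[bound:]` equals Python's `len(prefix)`,
/// i.e. the code-point (`char`) count, not the UTF-8 byte length.
fn bound_matches_prefix(bound: Option<usize>, prefix: &str) -> bool {
    bound.is_some_and(|n| n == prefix.chars().count())
}
```

For the prefix `"\u{e9}!"` ("é!"), a bound of 2 matches, while the byte length 3 (which the old `text_len` comparison would have required) does not.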
```diff
-.is_some_and(|x| x == string_val.to_str().chars().count()),
+.is_some_and(|x| x == string_val.chars().count()),
```
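The suggestion implies `string_val` can yield `chars()` without the explicit `.to_str()`. One common way a Rust wrapper allows that is implementing `Deref<Target = str>`; the sketch below assumes that mechanism with a hypothetical `StringValue` type, and Ruff's actual type may expose `chars()` differently.

```rust
use std::ops::Deref;

/// Hypothetical wrapper standing in for the type of `string_val`.
struct StringValue(String);

impl Deref for StringValue {
    type Target = str;

    fn deref(&self) -> &str {
        &self.0
    }
}

/// Auto-deref resolves `value.chars()` to `str::chars`, so no explicit
/// conversion to `&str` is needed at the call site.
fn code_point_len(value: &StringValue) -> usize {
    value.chars().count()
}
```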
```diff
@@ -370,7 +372,8 @@ fn affix_matches_slice_bound(data: &RemoveAffixData, semantic: &SemanticModel) -
 value
 .as_int()
 .and_then(ast::Int::as_u32)
-.is_some_and(|x| x == string_val.to_str().text_len().to_u32())
+.and_then(|x| usize::try_from(x).ok())
+.is_some_and(|x| x == string_val.to_str().chars().count())
```
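The same counting matters on the suffix side, assuming it checks a slice such as `text[:-len(suffix)]`. The sketch below emulates that Python slice by code points; `drop_last_chars` is a hypothetical helper used only to show that the slice equals `removesuffix` when the bound is the suffix's `char` count, and diverges when it is the byte count.

```rust
/// Emulates Python's `text[:-n]`, counting code points (`char`s).
/// In Python, `text[:-0]` is `text[:0]` (empty), and any `n >= len(text)`
/// also yields the empty string.
fn drop_last_chars(text: &str, n: usize) -> &str {
    if n == 0 {
        return "";
    }
    let count = text.chars().count();
    if n >= count {
        return "";
    }
    // Byte offset of the first `char` to drop; it exists because 0 < n < count.
    let (idx, _) = text.char_indices().nth(count - n).unwrap();
    &text[..idx]
}
```

For `"x\u{e9}\u{e9}"` with suffix `"\u{e9}\u{e9}"`, a bound of 2 (code points) gives `"x"`, matching `strip_suffix`, while the byte length 4 gives `""`.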
```diff
-.is_some_and(|x| x == string_val.to_str().chars().count())
+.is_some_and(|x| x == string_val.chars().count())
```
This PR fixes the calculation of string length for the purposes of verifying when to suggest `removeprefix`/`removesuffix` (FURB188). Before, we used `text_len`, which was counting bytes rather than codepoints (`char`s) and therefore disagreed with Python's `len` for non-ASCII text.

Closes #13620
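The mismatch described above in miniature: a string's UTF-8 byte length (what `str::len` returns, and what a byte-based length like `text_len` measures) versus its `char` count, which is what Python's `len` reports for strings without surrogates. `lengths` is a hypothetical helper for illustration.

```rust
/// Returns (UTF-8 byte length, code-point count) for `s`.
/// Python's `len(s)` corresponds to the second value.
fn lengths(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}
```

For ASCII the two agree (`"abc"` gives `(3, 3)`), but `"\u{e9}t\u{e9}"` ("été") gives `(5, 3)` and a single emoji such as `"\u{1F600}"` gives `(4, 1)`, so a byte-based comparison would reject slices that Python treats as exact prefix or suffix removals.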