-
Notifications
You must be signed in to change notification settings - Fork 1.6k
[refurb] Preserve digit separators in Decimal constructor (FURB157)
#20588
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
ntBre
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! But I think we need to be more careful here. At least one of the new snapshots is incorrect.
crates/ruff_linter/src/rules/refurb/rules/verbose_decimal_constructor.rs
Outdated
Show resolved
Hide resolved
| let numeric_value = format!("{unary}{rest}"); | ||
| add_thousand_separators(&numeric_value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be better to just strip leading underscores, and otherwise preserve the placement and frequency, rather than assume that thousands are the only use of underscores in numeric literals?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leading underscores aren't the only problem. Duplicates in the middle cause a syntax error too, but not in the Decimal constructor:
>>> from decimal import Decimal
>>> 1__2
File "<python-input-2>", line 1
1__2
^
SyntaxError: invalid decimal literal
>>> Decimal("1__2")
Decimal('12')But I agree in principle, add_thousand_separators does not seem like the right approach to me.
I'm starting to think we should just mark the fix as unsafe if the value we're replacing the original with is different. The edge cases here seem pretty tricky.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for catching that; I agree that marking it unsafe is probably the way to go to not break too much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't stripping leading underscores and replacing duplicates with a single underscore be enough?
In my opinion keeping Decimal("10_000_000") -> Decimal(10_000_000) transition as safe would be very convenient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be enough to strip leading underscores, strip trailing underscores, strip leading zeros (unless all the digits are zeros), and collapse medial underscore sequences: Decimal(" _-_0_01__2_ ") would become Decimal(-1_2).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's the behavior I fight for ❤️ 🚀 :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds reasonable to me, I gave up a bit too soon :) @danparizher do you want to give this a shot? We can still revert to marking the fix unsafe if the code gets too complicated, but this sounds feasible with the transformations we're already doing.
amyreese
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
|
The current state of this PR is that underscores are normalized to be thousands separators, but cf. @amyreese’s earlier comment. If a user writes |
|
What's the status of this PR? Is there something left that needs addressing or is it good to go? |
|
I don't think we should normalize the digit separators to thousands separators, as @dscorbett pointed out. |
Update normalize_digit_separators to retain the original pattern of digit separators instead of forcing thousands separators. This change ensures that only syntax errors are fixed while the user's formatting is preserved.
| // Extract only digits from the trimmed string | ||
| let digits: String = trimmed.chars().filter(char::is_ascii_digit).collect(); | ||
|
|
||
| // If no digits found, return 0 | ||
| if digits.is_empty() { | ||
| return format!("{unary}0"); | ||
| } | ||
|
|
||
| let mut result = String::new(); | ||
| let chars: Vec<char> = trimmed.chars().collect(); | ||
| let mut i = 0; | ||
|
|
||
| while i < chars.len() { | ||
| if chars[i].is_ascii_digit() { | ||
| result.push(chars[i]); | ||
| } else if chars[i] == '_' { | ||
| // Only add underscore if the previous character was a digit | ||
| // and we haven't already added an underscore | ||
| if !result.is_empty() && !result.ends_with('_') { | ||
| result.push('_'); | ||
| } | ||
| } | ||
| i += 1; | ||
| } | ||
|
|
||
| // Remove trailing underscores | ||
| result = result.trim_end_matches('_').to_string(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this ends up being overly verbose and defensive, and lines 225-251 could be replaced with just:
let result = trimmed
.chars()
.dedup_by(|a, b| {*a == '_' && a == b})
.collect::<String>();
Leading/trailing underscores were already trimmed by line 218, so there shouldn't be a need to both check that there are some number of digits and the result after trimming/deduplication is not empty.
| Decimal("15_000_000") # Safe fix: normalizes separators, becomes Decimal(15_000_000) | ||
| Decimal("1_234_567") # Safe fix: normalizes separators, becomes Decimal(1_234_567) | ||
| Decimal("-5_000") # Safe fix: normalizes separators, becomes Decimal(-5_000) | ||
| Decimal("+9_999") # Safe fix: normalizes separators, becomes Decimal(+9_999) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be nice to include test cases showing that non-thousands separators are maintained accordingly, eg "12_34_56_78" and "1234_5678".
| if has_digit_separators { | ||
| diagnostic.set_fix(Fix::unsafe_edit(Edit::range_replacement( | ||
| replacement, | ||
| value.range(), | ||
| ))); | ||
| } else { | ||
| diagnostic.set_fix(Fix::safe_edit(Edit::range_replacement( | ||
| replacement, | ||
| value.range(), | ||
| ))); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should use Fix::applicable_edit here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or wait, if we're going to preserve the original formatting, should it just be safe again?
| /// - Stripping leading and trailing underscores | ||
| /// - Collapsing medial underscore sequences to single underscores | ||
| /// - Do not force thousands separators | ||
| fn normalize_digit_separators(original_str: &str, unary: &str, _numeric_part: &str) -> String { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should either remove _numeric_part if it's unused or rename it to numeric_part, if it's used.
| // If no digits found, return 0 | ||
| if digits.is_empty() { | ||
| return format!("{unary}0"); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These is_empty checks don't seem right to me at all. I don't think we should return an arbitrary value for invalid input, and I think we've already validated the input in verbose_decimal_constructor, but I could also be missing something.
| let without_unary = if original_str.starts_with('+') || original_str.starts_with('-') { | ||
| &original_str[1..] | ||
| } else { | ||
| original_str | ||
| }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already stripped the prefix once on line 99. Can we reuse more of the work from the parent function?
Updates the verbose_decimal_constructor rule to always mark fixes for digit separators in Decimal constructors as safe, removing the previous unsafe applicability. Adds test cases for non-thousands separators and updates related snapshots.
ntBre
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I pushed a few commits simplifying things slightly to reuse the earlier unary prefix removal, but this looks good to me.
|
After a bit more thought, I came up with some additional edge cases with separators and leading zeros. Hopefully everything is covered now. |
* origin/main: (21 commits) [ty] Update "constraint implication" relation to work on constraints between two typevars (#21068) [`flake8-type-checking`] Fix `TC003` false positive with `future-annotations` (#21125) [ty] Fix lookup of `__new__` on instances (#21147) Fix syntax error false positive on nested alternative patterns (#21104) [`pyupgrade`] Fix false positive for `TypeVar` with default on Python <3.13 (`UP046`,`UP047`) (#21045) [ty] Reachability and narrowing for enum methods (#21130) [ty] Use `range` instead of custom `IntIterable` (#21138) [`ruff`] Add support for additional eager conversion patterns (`RUF065`) (#20657) [`ruff-ecosystem`] Fix CLI crash on Python 3.14 (#21092) [ty] Infer type of `self` for decorated methods and properties (#21123) [`flake8-bandit`] Fix correct example for `S308` (#21128) [ty] Dont provide goto definition for definitions which are not reexported in builtins (#21127) [`airflow`] warning `airflow....DAG.create_dagrun` has been removed (`AIR301`) (#21093) [ty] follow the breaking API changes made in salsa-rs/salsa#1015 (#21117) [ty] Rename `Type::into_nominal_instance` (#21124) [ty] Filter out "unimported" from the current module [ty] Add evaluation test for auto-import including symbols in current module [ty] Refactor `ty_ide` completion tests [ty] Render `import <...>` in completions when "label details" isn't supported [`refurb`] Preserve digit separators in `Decimal` constructor (`FURB157`) (#20588) ...
Summary
Fixes #20572