Skip to content

Conversation

@danparizher
Copy link
Contributor

Summary

Fixes #20572

@github-actions
Copy link
Contributor

github-actions bot commented Sep 26, 2025

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

@amyreese amyreese requested a review from ntBre September 30, 2025 19:52
@amyreese amyreese added rule Implementing or modifying a lint rule fixes Related to suggested fixes for violations labels Sep 30, 2025
Copy link
Contributor

@ntBre ntBre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! But I think we need to be more careful here. At least one of the new snapshots is incorrect.

@danparizher danparizher requested a review from ntBre September 30, 2025 23:50
Comment on lines 137 to 138
let numeric_value = format!("{unary}{rest}");
add_thousand_separators(&numeric_value)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be better to just strip leading underscores, and otherwise preserve the placement and frequency, rather than assume that thousands are the only use of underscores in numeric literals?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leading underscores aren't the only problem. Duplicates in the middle cause a syntax error too, but not in the Decimal constructor:

>>> from decimal import Decimal
>>> 1__2
  File "<python-input-2>", line 1
    1__2
     ^
SyntaxError: invalid decimal literal
>>> Decimal("1__2")
Decimal('12')

But I agree in principle, add_thousand_separators does not seem like the right approach to me.

I'm starting to think we should just mark the fix as unsafe if the value we're replacing the original with is different. The edge cases here seem pretty tricky.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching that; I agree that marking it unsafe is probably the way to go to not break too much.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't stripping leading underscores and replacing duplicates with a single underscore be enough?

In my opinion keeping Decimal("10_000_000") -> Decimal(10_000_000) transition as safe would be very convenient.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be enough to strip leading underscores, strip trailing underscores, strip leading zeros (unless all the digits are zeros), and collapse medial underscore sequences: Decimal(" _-_0_01__2_ ") would become Decimal(-1_2).

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's the behavior I fight for ❤️ 🚀 :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds reasonable to me, I gave up a bit too soon :) @danparizher do you want to give this a shot? We can still revert to marking the fix unsafe if the code gets too complicated, but this sounds feasible with the transformations we're already doing.

Copy link
Contributor

@amyreese amyreese left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@dscorbett
Copy link

The current state of this PR is that underscores are normalized to be thousands separators, but cf. @amyreese’s earlier comment. If a user writes Decimal("1__23_45_000") that formatting was probably intentional so it should become Decimal(1_23_45_000). I think normalizing the underscores more than the minimum needed to avoid a syntax error should be left for a different rule as in #18221.

@MichaReiser
Copy link
Member

What's the status of this PR? Is there something left that needs addressing or is it good to go?

@ntBre
Copy link
Contributor

ntBre commented Oct 21, 2025

I don't think we should normalize the digit separators to thousands separators, as @dscorbett pointed out.

Update normalize_digit_separators to retain the original pattern of digit separators instead of forcing thousands separators. This change ensures that only syntax errors are fixed while the user's formatting is preserved.
Comment on lines 225 to 251
// Extract only digits from the trimmed string
let digits: String = trimmed.chars().filter(char::is_ascii_digit).collect();

// If no digits found, return 0
if digits.is_empty() {
return format!("{unary}0");
}

let mut result = String::new();
let chars: Vec<char> = trimmed.chars().collect();
let mut i = 0;

while i < chars.len() {
if chars[i].is_ascii_digit() {
result.push(chars[i]);
} else if chars[i] == '_' {
// Only add underscore if the previous character was a digit
// and we haven't already added an underscore
if !result.is_empty() && !result.ends_with('_') {
result.push('_');
}
}
i += 1;
}

// Remove trailing underscores
result = result.trim_end_matches('_').to_string();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this ends up being overly verbose and defensive, and lines 225-251 could be replaced with just:

let result = trimmed
    .chars()
    .dedup_by(|a, b| {*a == '_' && a == b})
    .collect::<String>();

Leading/trailing underscores were already trimmed by line 218, so there shouldn't be a need to both check that there are some number of digits and the result after trimming/deduplication is not empty.

Decimal("15_000_000") # Safe fix: normalizes separators, becomes Decimal(15_000_000)
Decimal("1_234_567") # Safe fix: normalizes separators, becomes Decimal(1_234_567)
Decimal("-5_000") # Safe fix: normalizes separators, becomes Decimal(-5_000)
Decimal("+9_999") # Safe fix: normalizes separators, becomes Decimal(+9_999)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice to include test cases showing that non-thousands separators are maintained accordingly, eg "12_34_56_78" and "1234_5678".

Comment on lines 149 to 159
if has_digit_separators {
diagnostic.set_fix(Fix::unsafe_edit(Edit::range_replacement(
replacement,
value.range(),
)));
} else {
diagnostic.set_fix(Fix::safe_edit(Edit::range_replacement(
replacement,
value.range(),
)));
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should use Fix::applicable_edit here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or wait, if we're going to preserve the original formatting, should it just be safe again?

/// - Stripping leading and trailing underscores
/// - Collapsing medial underscore sequences to single underscores
/// - Do not force thousands separators
fn normalize_digit_separators(original_str: &str, unary: &str, _numeric_part: &str) -> String {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should either remove _numeric_part if it's unused or rename it to numeric_part, if it's used.

Comment on lines 228 to 231
// If no digits found, return 0
if digits.is_empty() {
return format!("{unary}0");
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These is_empty checks don't seem right to me at all. I don't think we should return an arbitrary value for invalid input, and I think we've already validated the input in verbose_decimal_constructor, but I could also be missing something.

Comment on lines 211 to 215
let without_unary = if original_str.starts_with('+') || original_str.starts_with('-') {
&original_str[1..]
} else {
original_str
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already stripped the prefix once on line 99. Can we reuse more of the work from the parent function?

Updates the verbose_decimal_constructor rule to always mark fixes for digit separators in Decimal constructors as safe, removing the previous unsafe applicability. Adds test cases for non-thousands separators and updates related snapshots.
@danparizher danparizher requested a review from ntBre October 24, 2025 22:22
Copy link
Contributor

@ntBre ntBre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed a few commits simplifying things slightly to reuse the earlier unary prefix removal, but this looks good to me.

@ntBre
Copy link
Contributor

ntBre commented Oct 28, 2025

After a bit more thought, I came up with some additional edge cases with separators and leading zeros. Hopefully everything is covered now.

@ntBre ntBre merged commit 3490611 into astral-sh:main Oct 28, 2025
37 checks passed
@danparizher danparizher deleted the fix-20572 branch October 28, 2025 21:55
dcreager added a commit that referenced this pull request Oct 30, 2025
* origin/main: (21 commits)
  [ty] Update "constraint implication" relation to work on constraints between two typevars (#21068)
  [`flake8-type-checking`] Fix `TC003` false positive with `future-annotations` (#21125)
  [ty] Fix lookup of `__new__` on instances (#21147)
  Fix syntax error false positive on nested alternative patterns (#21104)
  [`pyupgrade`] Fix false positive for `TypeVar` with default on Python <3.13 (`UP046`,`UP047`) (#21045)
  [ty] Reachability and narrowing for enum methods (#21130)
  [ty] Use `range` instead of custom `IntIterable` (#21138)
  [`ruff`] Add support for additional eager conversion patterns (`RUF065`) (#20657)
  [`ruff-ecosystem`] Fix CLI crash on Python 3.14 (#21092)
  [ty] Infer type of `self` for decorated methods and properties (#21123)
  [`flake8-bandit`] Fix correct example for `S308` (#21128)
  [ty] Dont provide goto definition for definitions which are not reexported in builtins (#21127)
  [`airflow`] warning `airflow....DAG.create_dagrun` has been removed (`AIR301`) (#21093)
  [ty] follow the breaking API changes made in salsa-rs/salsa#1015 (#21117)
  [ty] Rename `Type::into_nominal_instance` (#21124)
  [ty] Filter out "unimported" from the current module
  [ty] Add evaluation test for auto-import including symbols in current module
  [ty] Refactor `ty_ide` completion tests
  [ty] Render `import <...>` in completions when "label details" isn't supported
  [`refurb`] Preserve digit separators in `Decimal` constructor (`FURB157`) (#20588)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

fixes Related to suggested fixes for violations rule Implementing or modifying a lint rule

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Fix for FURB157 removes digit separators

6 participants