Improve handling of builtin symbols in linter rules #10919

AlexWaygood · 2024-04-13T18:09:32Z

Summary

A pattern that shows up a lot in the ruff_linter crate currently is to do something like this to determine whether an ast::Expr node refers to a builtin symbol:

fn do_the_check(checker: &mut Checker, expr: &ast::Expr) {
    let ast::Expr::Name(ast::ExprName { id, .. }) = expr else {
        return;
    }
    if id != "zip" {
        return;
    }
    if !checker.semantic().is_builtin("zip") {
        return;
    }
    // and now we do the check that only applies to the builtin zip function...
}

This has two problems:

It's pretty verbose and repetitive, and lots of our checks have some interaction with a builtin function or class in some way
This pattern doesn't account for the fact that it's perfectly possible to explicitly import the builtins module and e.g. refer to zip by using builtins.zip rather than just zip

This PR adds a new method to the semantic model that means that this kind of logic becomes both more concise and more accurate. The above logic can now be replaced with this, which will also (unlike the old function) recognise builtins.zip as being a reference to the builtin zip function:

fn do_the_check(checker: &mut Checker, expr: &ast::Expr) {
    if !checker.semantic().match_builtin_expr(expr, "zip") {
        return;
    }
    // and now we do the check that only applies to the builtin zip function...
}

A new method is also added to crates/ruff_linter/src/importer/mod.rs to enable us to provide autofixes involving builtin functions/classes, even when they've been shadowed by a user-defined function or class in the current scope.

Test Plan

cargo test. Several fixtures have been extended with new examples to test the new functionality.

This PR should be easiest to review commit-by-commit: each commit passes the full test suite by itself.

crates/ruff_python_semantic/src/model.rs

github-actions · 2024-04-13T18:22:41Z

`ruff-ecosystem` results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

AlexWaygood · 2024-04-13T18:31:30Z

stdlib/asyncio/trsock.pyi:93:29: PYI036 The first argument in __exit__ should be annotated with object or type[BaseException] | None

Ah, that's a regression: something can be imported explicitly from builtins even if it's not an ast::Name node. My fast path doesn't work as it's currently written.

AlexWaygood · 2024-04-13T18:47:47Z

Seems like this leads to a 1% perf regression overall, with regressions of 2-3% on some linter benchmarks: https://codspeed.io/astral-sh/ruff/branches/AlexWaygood:improve-builtins-handling

I don't know whether that's worth it or not — it's less repetitive code for us and it's more principled in that we're handling the semantics of Python more accurately, but importing builtins explicitly also isn't that common.

crates/ruff_python_semantic/src/model.rs

charliermarsh · 2024-04-13T19:01:30Z

Would you have expected ~no change here for benchmarks that don't import builtins?

AlexWaygood · 2024-04-13T19:05:45Z

Would you have expected ~no change here for benchmarks that don't import builtins?

yeah. I would have hoped that the fast path from the if self.seen_modules(Modules::BUILTINS) check would mean that performance would be basically unchanged for those benchmarks.

Any idea what might be the cause of the slowdown? pyflakes::rules::unused_import seems to be showing up in a bunch of the codspeed flamegraphs, which is weird, because I haven't touched that function in this PR

crates/ruff_python_semantic/src/model.rs

charliermarsh · 2024-04-13T21:48:13Z

No, I'm not sure. I looked through the changes again and my expectation would've been no change. You can try just pushing again and see if the benchmarks move in the other direction, since it might just be noise. I wouldn't read into the flamegraph diff though -- I think CodSpeed is just confused.

charliermarsh · 2024-04-13T21:49:47Z

The other thing you might consider doing is benchmarking locally with Hyperfine. E.g., you could build a release build on main, stash it (mv ./target/release/ruff ./target/release/main), then build on your branch and do something like:

hyperfine --warmup 10 --runs 100 \
    "./target/release/main ./crates/ruff/resources/test/cpython/ --no-cache --silent -e" \ 
    "./target/release/ruff ./crates/ruff/resources/test/cpython/ --no-cache --silent -e"

I'd suggest running it a few times and switching the order of main and ruff around to reach a safe conclusion.

AlexWaygood · 2024-04-13T22:20:10Z

since it might just be noise.

I'm getting similar results from running cargo benchmark locally (tried it a few times earlier), so I don't think it's just noise :/ I'll try tomorrow with hyperfine to see if the macrobenchmark shows anything different to the microbenchmarks...

charliermarsh · 2024-04-13T22:51:25Z

If you run the micro-benchmarks locally, I suggest running them a few times consecutively for each PR. At least on my machine, the second run of the same binary is always significant faster.

carljm · 2024-04-14T11:26:36Z

It seems like this PR is doing two separable things: improving detection of what is a builtin, and allowing auto-fixes that want to use a builtin to import it via builtins module if the name is shadowed.

Have you benchmarked those changes in isolation, to clarify which one the regression is coming from? IIUC we always eagerly generate fixes, so it seems like a change that allows more fixes in more cases could easily cause a regression?

charliermarsh · 2024-04-14T11:43:02Z

Yeah, although we should only be generating more diagnostics (and therefore more fixes) for files that import builtins, and we don’t have any such files in our benchmarks (IIUC).

It could be that computing the fixes become more expensive (the use of get_or_import_symbol will be more expensive) even in the case that the file doesn’t import builtins, though I don’t have intuition on whether that could account for the delta here. You could try removing the changes that involve modifying the fixes, to see if the benchmarks return to normal as Carl suggests.

carljm · 2024-04-14T16:37:34Z

It looks to me like all the rules that are changed to use try_set_fix with the new get_or_import_builtin_symbol (e.g. unreliable_call_check is one example) will now generate more fixes than before in any file where the builtin they want to add a reference to is shadowed. E.g. in a module that contains callable = ..., unreliable_call_check will now generate a fix that imports builtins and uses builtins.callable, rather than bailing and not generating a fix at all. Similar for a number of other rules. So this would generate more fixes even in files that don't currently import builtins.

I haven't checked whether the benchmarks in question do this, but this change seems entirely separable from the better detection of imported builtins, so if there are any mysteries it seems like a good first step is to separate the changes and narrow down the cause. (And if it's still a mystery, perhaps further separate by only changing one rule at a time.)

charliermarsh · 2024-04-14T17:16:20Z

Oh, that's true! Good call. I would be surprised if we had any instances of those in our benchmark files (and especially not in all of them) but it's possible there are a few and it's definitely a good idea to isolate the changes.

AlexWaygood · 2024-04-14T18:07:58Z

Yet another possibility: the change to the fixes also involves changing a bunch of fixes to use Fix::safe_edits instead of Fix::safe_edit, even if a builtin symbol is not overridden in the relevant scope. It's possible safe_edits is slower than safe_edit, even if only a single edit is passed? Anyway, I didn't get to it today, but I'll do some benchmarking tomorrow without the changes to the fixes, as @carljm suggests :-)

crates/ruff_python_semantic/src/model.rs

MichaReiser · 2024-04-15T07:22:19Z

crates/ruff_linter/src/rules/ruff/rules/mutable_fromkeys_value.rs

-    if name_expr.id != "dict" {
-        return;
-    }
-    if !checker.semantic().is_builtin("dict") {


Is it intentional that is_builtin uses lookup_symbol but the new implementation now uses resolve_qualified_name (which also works for member expression which the old implementation didn't)?

I don't think there's a need for the new abstraction to work with member expressions. I just used resolve_qualified_name because it seemed like the simplest way to implement the new abstraction. Do you think using lookup_symbol would be more efficient here? (Goes to compare the two definitions...)

I'm not sure how we would be able to use lookup_symbol here, as lookup_symbol takes &str (representing the name of the symbol) rather than an AST node. And we don't know the name of the symbol -- even if it's a builtin, as you can do from builtins import open as o -- unless we resolve the qualified name

crates/ruff_linter/src/rules/ruff/rules/mutable_fromkeys_value.rs

crates/ruff_linter/src/rules/refurb/rules/verbose_decimal_constructor.rs

crates/ruff_linter/src/rules/refurb/rules/write_whole_file.rs

crates/ruff_linter/src/rules/pylint/rules/bad_open_mode.rs

MichaReiser · 2024-04-15T07:36:00Z

crates/ruff_linter/src/rules/pylint/rules/nested_min_max.rs

+        match qualified_name.segments() {
+            ["" | "builtins", "min"] => Some(MinMax::Min),
+            ["" | "builtins", "max"] => Some(MinMax::Max),
+            _ => None,
        }


Nit: Consider adding a method is_builtin to QualifiedName that tests the segments

if qualified_name.is_builtin("min") { Some(MinMax::Min) } else if qualified_name.is_builtin("max") { Some(MinMax::Max) } else { None }

QualifiedName already has an is_builtin method, which does something slightly different right now:

ruff/crates/ruff_python_ast/src/name.rs

Lines 50 to 54 in cbd5001

/// If the first segment is empty, the `CallPath` is that of a builtin.

/// Ex) `["", "bool"]` -> `"bool"`

pub fn is_builtin(&self) -> bool {

matches!(self.segments(), ["", ..])

}

MichaReiser · 2024-04-15T07:42:25Z

crates/ruff_python_semantic/src/model.rs

+    /// Return `true` if `member` is a reference to `builtins.$target`,
+    /// i.e. either `object` (where `object` is not overridden in the global scope),
+    /// or `builtins.object` (where `builtins` is imported as a module at the top level)
+    pub fn match_builtin_expr(&self, expr: &Expr, symbol: &str) -> bool {


I wonder if we could instead change the method to return a QualifiedName where constructing the QualifiedName short-circuits if Modules::BUILTINS isn't imported.

I'm bringing this up because QualifiedName already has methods for handling builtins (that might be incorrect?)

pub fn is_builtin(&self) -> bool { matches!(self.segments(), ["", ..]) }

So what I would have in mind is

model.resolve_builtin_name(expr).is_some_and(|builtin| builtin.is_builtin_name("open"))

We could still have a helper as you have today to avoid some of the repetition (is_some_and is somewhat annoying). But it would build up on the same concepts.

I think 4f62717 goes some way to addressing this comment, though I haven't implemented your suggestion in exactly the way you suggested. What do you think of that solution?

AlexWaygood · 2024-04-15T09:06:01Z

@MichaReiser's suggestions to move the symbol lookups lower down the function for each rule got rid of the performance regression 🥳🥳

charliermarsh · 2024-04-15T11:44:14Z

That's great! In my head those lookups didn't matter since they were only occurring when we saw the "right" name (e.g., open for a rule that cares about open), but of course that's not true -- we were doing them far more frequently on this branch.

charliermarsh · 2024-04-15T11:47:51Z

crates/ruff_linter/src/rules/flake8_bugbear/rules/getattr_with_constant.rs

@@ -75,7 +69,7 @@ pub(crate) fn getattr_with_constant(
    if is_mangled_private(value.to_str()) {
        return;
    }
-    if !checker.semantic().is_builtin("getattr") {
+    if !checker.semantic().match_builtin_expr(func, "getattr") {


The counterpoint is that we're now doing other work that isn't strictly necessary (for example, if this is a name, and builtins isn't imported, and the name isn't getattr, then the call to is_mangled_private wasn't necessary -- we could've exited much earlier; similarly, we risk calling is_identifier on many more function calls that won't ever match).

This is clearly still the right tradeoff, but that's the tradeoff.

The only way around this would be to, like, have some could_be_builtin("getattr") method you call at the top that returns false if builtins isn't imported, it's not a name, or the name isn't getattr, but IDK, that seems not great / worthwhile.

These things (like is_identifier) do show up in traces sometimes because we tend to be calling these rules for every function call, which is why being able to gate on something as cheap as "the name of the function" is useful.

I don't think we need to change anything here but wanted to raise visibility on it.

AlexWaygood · 2024-04-15T11:56:06Z

Benchmarks are now showing an... improvement? Not sure how that's possible, but I'll take it https://codspeed.io/astral-sh/ruff/branches/AlexWaygood:improve-builtins-handling

…overridden

…ltin_symbol` or `match_builtin_expr` (#10982) ## Summary This PR switches more callsites of `SemanticModel::is_builtin` to move over to the new methods I introduced in #10919, which are more concise and more accurate. I missed these calls in the first PR.

AlexWaygood changed the title ~~Add a new method to check whether an Expr refers to a builtin symbol~~ Improve handling of builtin symbols in linter rules Apr 13, 2024

AlexWaygood added the linter Related to the linter label Apr 13, 2024

charliermarsh reviewed Apr 13, 2024

View reviewed changes

crates/ruff_python_semantic/src/model.rs Outdated Show resolved Hide resolved

AlexWaygood requested a review from charliermarsh April 13, 2024 18:51

charliermarsh reviewed Apr 13, 2024

View reviewed changes

crates/ruff_python_semantic/src/model.rs Outdated Show resolved Hide resolved

AlexWaygood commented Apr 13, 2024

View reviewed changes

crates/ruff_python_semantic/src/model.rs Outdated Show resolved Hide resolved

diceroll123 mentioned this pull request Apr 14, 2024

[refurb] Implement fstring-number-format (FURB116) #10921

Merged

AlexWaygood force-pushed the improve-builtins-handling branch 2 times, most recently from 168f9b7 to 054f397 Compare April 14, 2024 10:06

MichaReiser reviewed Apr 15, 2024

View reviewed changes

AlexWaygood force-pushed the improve-builtins-handling branch 2 times, most recently from b8bfb9c to 4e3c7c0 Compare April 15, 2024 08:46

AlexWaygood force-pushed the improve-builtins-handling branch 3 times, most recently from 513de18 to 08c5e30 Compare April 15, 2024 11:39

charliermarsh reviewed Apr 15, 2024

View reviewed changes

AlexWaygood force-pushed the improve-builtins-handling branch from 3f4729b to 7cefed0 Compare April 15, 2024 11:47

charliermarsh approved these changes Apr 15, 2024

View reviewed changes

AlexWaygood requested a review from MichaReiser April 15, 2024 12:58

AlexWaygood added 2 commits April 15, 2024 14:03

Add a new method to check whether an Expr refers to a builtin symbol

41df399

Use the new method across our linter rules

096abdd

AlexWaygood force-pushed the improve-builtins-handling branch 2 times, most recently from 03dada4 to 4f62717 Compare April 15, 2024 13:22

AlexWaygood added 7 commits April 15, 2024 16:20

Use the new semantic-model method in refurb rules

87fca4e

Add a new method to improve our ability to autofix when builtins are …

7167702

…overridden

Use the new method for autofixes across various lint rules

7cbd0a6

Move match_builtin_expr calls lower down all the rules

04ad9d0

Improve other areas of the semantic model also

f6cb958

Equivalent fixes to ruff_python_stdlib also

4567626

Reimplement the high-level abstraction based on a lower-level one

e441e77

AlexWaygood force-pushed the improve-builtins-handling branch from 4f62717 to e441e77 Compare April 15, 2024 15:21

AlexWaygood merged commit f779bab into astral-sh:main Apr 16, 2024
17 checks passed

AlexWaygood deleted the improve-builtins-handling branch April 16, 2024 10:37

This was referenced Apr 16, 2024

Improve inference capabilities of the BuiltinTypeChecker #10976

Merged

Change more usages of SemanticModel::is_builtin to use resolve_builtin_symbol or match_builtin_expr #10982

Merged

BrewTestBot mentioned this pull request Apr 18, 2024

ruff 0.4.0 Homebrew/homebrew-core#169465

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve handling of builtin symbols in linter rules #10919

Improve handling of builtin symbols in linter rules #10919

AlexWaygood commented Apr 13, 2024

github-actions bot commented Apr 13, 2024 •

edited

Loading

AlexWaygood commented Apr 13, 2024 •

edited

Loading

AlexWaygood commented Apr 13, 2024 •

edited

Loading

charliermarsh commented Apr 13, 2024

AlexWaygood commented Apr 13, 2024

charliermarsh commented Apr 13, 2024

charliermarsh commented Apr 13, 2024

AlexWaygood commented Apr 13, 2024 •

edited

Loading

charliermarsh commented Apr 13, 2024

carljm commented Apr 14, 2024

charliermarsh commented Apr 14, 2024

carljm commented Apr 14, 2024 •

edited

Loading

charliermarsh commented Apr 14, 2024

AlexWaygood commented Apr 14, 2024 •

edited

Loading

MichaReiser Apr 15, 2024

AlexWaygood Apr 15, 2024

AlexWaygood Apr 15, 2024

MichaReiser Apr 15, 2024

AlexWaygood Apr 15, 2024

MichaReiser Apr 15, 2024

AlexWaygood Apr 15, 2024 •

edited

Loading

MichaReiser Apr 16, 2024

AlexWaygood commented Apr 15, 2024

charliermarsh commented Apr 15, 2024

charliermarsh Apr 15, 2024

charliermarsh Apr 15, 2024

AlexWaygood Apr 15, 2024

charliermarsh Apr 15, 2024

charliermarsh Apr 15, 2024

charliermarsh Apr 15, 2024

AlexWaygood commented Apr 15, 2024

	/// If the first segment is empty, the `CallPath` is that of a builtin.
	/// Ex) `["", "bool"]` -> `"bool"`
	pub fn is_builtin(&self) -> bool {
	matches!(self.segments(), ["", ..])
	}

Improve handling of builtin symbols in linter rules #10919

Improve handling of builtin symbols in linter rules #10919

Conversation

AlexWaygood commented Apr 13, 2024

Summary

Test Plan

github-actions bot commented Apr 13, 2024 • edited Loading

ruff-ecosystem results

Linter (stable)

Linter (preview)

Formatter (stable)

Formatter (preview)

AlexWaygood commented Apr 13, 2024 • edited Loading

AlexWaygood commented Apr 13, 2024 • edited Loading

charliermarsh commented Apr 13, 2024

AlexWaygood commented Apr 13, 2024

charliermarsh commented Apr 13, 2024

charliermarsh commented Apr 13, 2024

AlexWaygood commented Apr 13, 2024 • edited Loading

charliermarsh commented Apr 13, 2024

carljm commented Apr 14, 2024

charliermarsh commented Apr 14, 2024

carljm commented Apr 14, 2024 • edited Loading

charliermarsh commented Apr 14, 2024

AlexWaygood commented Apr 14, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

AlexWaygood Apr 15, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

AlexWaygood commented Apr 15, 2024

charliermarsh commented Apr 15, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

AlexWaygood commented Apr 15, 2024

github-actions bot commented Apr 13, 2024 •

edited

Loading

`ruff-ecosystem` results

AlexWaygood commented Apr 13, 2024 •

edited

Loading

AlexWaygood commented Apr 13, 2024 •

edited

Loading

AlexWaygood commented Apr 13, 2024 •

edited

Loading

carljm commented Apr 14, 2024 •

edited

Loading

AlexWaygood commented Apr 14, 2024 •

edited

Loading

AlexWaygood Apr 15, 2024 •

edited

Loading