Skip to content

Conversation

@ntBre
Copy link
Contributor

@ntBre ntBre commented May 12, 2025

Summary

This PR unifies the ruff Message enum variants for syntax errors and rule violations into a single Message struct consisting of a shared db::Diagnostic and some additional, optional fields used for some rule violations.

This version of Message is nearly a drop-in replacement for ruff_diagnostics::Diagnostic, which is the next step I have in mind for the refactor.

I think this is also a useful checkpoint because we could possibly add some of these optional fields to the new Diagnostic type. I think we've previously discussed wanting support for Fixes, but the other fields seem less relevant, so we may just need to preserve the Message wrapper for a bit longer.

Test plan

Existing tests

@ntBre ntBre added the diagnostics Related to reporting of diagnostics. label May 12, 2025
@github-actions
Copy link
Contributor

github-actions bot commented May 12, 2025

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

@codspeed-hq
Copy link

codspeed-hq bot commented May 12, 2025

CodSpeed Performance Report

Merging #18051 will not alter performance

Comparing brent/diagnostic-refactor-2 (ae0b0ad) with main (220137c)

Summary

✅ 34 untouched benchmarks

@ntBre
Copy link
Contributor Author

ntBre commented May 12, 2025

I made a couple of tweaks in 28ad260 to avoid constructing a Rule from a DiagnosticKind up front. This means we don't get a meaningful DiagnosticId anymore (we aren't really using it yet anyway), but the perf regression dropped from a worst case of 8% to 3%. If I'm reading the graph correctly, the remaining change is from allocating the Vecs inside the new Diagnostic type, which I think is unavoidable.

parent: Option<TextSize>,
fix: Option<Fix>,
noqa_offset: TextSize,
noqa_offset: Option<TextSize>,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change didn't actually end up helping at all (besides None being slightly shorter than TextSize::default()), so I'm happy to revert it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It probably even increases the size of the struct but I think it's more correct overall.

Comment on lines 244 to 257
let kind = DiagnosticKind {
name: self.name().to_string(),
body: String::new(),
suggestion: None,
};
Some(kind.rule())
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can add a newtype with a generated AsRule implementation here instead of reusing DiagnosticKind if we want to keep this approach.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you check where rule is used? Would it be sufficient to return a NoqaCode instead of the entire rule?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

message.rule().noqa_code() does look like the most common usage, but there are multiple places accessing the Rule name and URL too.

@ntBre ntBre force-pushed the brent/diagnostic-refactor-2 branch from 2080d9c to 83d0fdf Compare May 12, 2025 20:01
@ntBre ntBre closed this May 12, 2025
@ntBre ntBre reopened this May 12, 2025
@ntBre ntBre marked this pull request as ready for review May 12, 2025 21:06
@ntBre ntBre requested review from BurntSushi and removed request for AlexWaygood, carljm, dcreager and sharkdp May 12, 2025 21:06
Copy link
Member

@MichaReiser MichaReiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this.

I haven't done an in-depth review. The two things that stand out to me:

  • We'll need a way to store the noqa comment. I think the way you avoided this for now is to lookup the rule by code when necessary. I wonder if it would make sense to extend the diagnostic model to support a secondary code (where id = name, secondary code = noqa)
  • I don't think we should abuse primary_message for the id and fake an UnknownRule Diagnostic. I think it should be possible to change DiagnosticKindname to&'static str`
  • Regarding perf: It may be worth exploring if using small vecs in Diagnostic helps with perf (it comes at the downside of increasing the Diagnostic size overall). Maybe best to explore separately

Comment on lines 112 to 117
// TODO we'd prefer to use DiagnosticId::Lint(LintName::of(kind.rule().into())) since that
// will give the proper kebab-case lint name, but DiagnosticKind::rule caused a substantial
// perf regression
let mut diagnostic = db::Diagnostic::new(DiagnosticId::UnknownRule, Severity::Error, name);
let span = Span::from(file).with_range(range);
diagnostic.annotate(Annotation::primary(span).message(body));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should do this because it isn't a temporary hack.

Unlike ty, ruff has two codes for every rule: The name and the noqa code. I don't think we can avoid storing both those fields on Diagnostic.

One option is to defer this problem for now and store the NoqaCode as a separate field.

I'm also not sure if this is the right way to work around the performance regression. DiagnosticKind stores a name but that name would be unused once we use the DiagnosticId. Could we change DiagnosticKind::name to store a &'static str? to avoid the rule call or could we store the rule on DiagnosticKind?

I think it is worth doing some more exploration on how and if we can change the DiagnosticKind/Diagnostic representation

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense. I guess hand-wavily I was hoping it would be temporary and that a better solution would appear when replacing Diagnostic or something, but it definitely makes sense to use DiagnosticId correctly here.

I don't think we can change DiagnosticKind::name to be a &'static str, at least without calling Box::leak or similar. It's stored in CacheMessage and used in serialization.

I'll look more closely at storing the rule on DiagnosticKind. I'm not sure it would help perf-wise, but it still might make sense. I think this is how they're typically constructed from a Violation:

impl<T> From<T> for DiagnosticKind
where
T: Violation,
{
fn from(value: T) -> Self {
Self {
body: Violation::message(&value),
suggestion: Violation::fix_title(&value),
name: T::rule_name().to_string(),
}
}
}

Copy link
Member

@MichaReiser MichaReiser May 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we use a different representation for caching that then goes through the rules lookup to get the diagnostic kind?

Comment on lines 244 to 257
let kind = DiagnosticKind {
name: self.name().to_string(),
body: String::new(),
suggestion: None,
};
Some(kind.rule())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you check where rule is used? Would it be sufficient to return a NoqaCode instead of the entire rule?

@MichaReiser
Copy link
Member

MichaReiser commented May 13, 2025

To extend on my comment. I expect that one of the next steps is to eliminate DiagnosticKind. Because of that: Could we change DiagnosticKind in ways that make this refactor easier:

  • Change name to a LintName?
  • Store the noqa code in addition to the name (would this remove the need to call rule in code using DiagnosticKind and higher?
  • ...

If so, it could then make sense to implement some of those changes as a separate PR

Copy link
Member

@BurntSushi BurntSushi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a ton of context on the Ruff side of things here, but extending the diagnostic model for the noqa code seems fine to me.

And trying out smallvecs in Diagnostic also seems okay. Note that a Diagnostic is already using an Arc internally, so a Diagnostic itself should stay pointer-sized.

@ntBre ntBre marked this pull request as draft May 13, 2025 17:59
ntBre added a commit that referenced this pull request May 15, 2025
## Summary

This PR deletes the `DiagnosticKind` type by inlining its three fields
(`name`, `body`, and `suggestion`) into three other diagnostic types:
`Diagnostic`, `DiagnosticMessage`, and `CacheMessage`.

Instead of deferring to an internal `DiagnosticKind`, both `Diagnostic`
and `DiagnosticMessage` now have their own macro-generated `AsRule`
implementations.

This should make both #18051 and
another follow-up PR changing the type of `name` on `CacheMessage`
easier since its type will be able to change separately from
`Diagnostic` and `DiagnosticMessage`.

## Test Plan

Existing tests
@ntBre ntBre force-pushed the brent/diagnostic-refactor-2 branch 3 times, most recently from 719c18e to bfd37ac Compare May 15, 2025 19:27
@ntBre ntBre force-pushed the brent/diagnostic-refactor-2 branch from 089c838 to 3ff0881 Compare May 16, 2025 13:11
@ntBre ntBre added the internal An internal refactor or improvement label May 16, 2025
@ntBre ntBre marked this pull request as ready for review May 16, 2025 22:11
Copy link
Member

@MichaReiser MichaReiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great.

Could you run the hyperfine benchmarks documented in the contribution guideline. Just to make sure we don't regress caching (and test that caching works too because I don't think we have any tests for it). In general. I think it would be good to do some extensive testing of the CLI (statistics etc)

}

/// Create a [`Message`] from the given [`Diagnostic`] corresponding to a rule violation.
pub fn from_diagnostic(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: The naming here is a bit confusing. Maybe from_ruff_diagnostic? (But I'm also okay leaving it as is, if the type is going away anyway)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, this and the diagnostic method don't make sense without the history of the DiagnosticMessage variant. I think I will just leave them for now since the intention is to delete them both in my next PR after this and #18142.

Message::SyntaxError(_) => "SyntaxError",
pub fn name(&self) -> &'static str {
if self.is_syntax_error() {
"syntax-error"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, this is interesting. We need to think about how we want to handle the migration from syntax-error to invalid-syntax.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing this to

    pub fn name(&self) -> &'static str {
        self.diagnostic.id().as_str()
    }

currently only fails one test for the --statistics flag.

But I agree, I'm not sure where else this could pop up that might not be tested.

Comment on lines 214 to 218
self.diagnostic
.primary_annotation()
.expect("Expected a primary annotation for a ruff diagnostic")
.get_message()
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I think I'd lean towards being more forgiving here and allow missing primary annotations, given that the method returns an Option anyway.

Suggested change
self.diagnostic
.primary_annotation()
.expect("Expected a primary annotation for a ruff diagnostic")
.get_message()
}
self.diagnostic
.primary_annotation()?
.get_message()
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, there were a couple of places like that. I'll look for those too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, the only other one was the rule. I think we might still want to expect a valid rule name, otherwise I think to_rule is sometimes used to indicate a syntax error in the None case.

}

/// Returns the [`Rule`] corresponding to the diagnostic message.
pub fn rule(&self) -> Option<Rule> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be great if we can remove Rule from Message and only store the noqa code (alternate code). But I agree with you that we should leave this to a separate refactor.

.to_string(),
),
}
pub fn filename(&self) -> String {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, it's annoying that we need to return a String here. I'm leaning towards changing primary_span to return a &Span. We can leave this to a separate PR to also get @BurntSushi's approval but I think it would be great if we don't need to clone unnecessarily

Comment on lines +13 to 14
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct NoqaCode(&'static str, &'static str);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More an idea for a separate PR if you're interested in it. I wonder if we should store the code as a single &str, together with the byte-offset where the prefix ends and the suffix starts. This would have the advantage that it isn't necessary to call to_string to get the full noqa code (and we can still return a &str for both prefix and suffix in O(1))

Thinking about this more. It's probably not even necessary to store the offset and we can simply compute the split point on demand.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added these three (NoqaCode instead of rule, &Span, and this) to my todo list as follow-ups!

@ntBre
Copy link
Contributor Author

ntBre commented May 19, 2025

This is great.

Thank you! And thanks for the reviews! I think all of the inline comments should be resolved.

Could you run the hyperfine benchmarks documented in the contribution guideline. Just to make sure we don't regress caching (and test that caching works too because I don't think we have any tests for it). In general. I think it would be good to do some extensive testing of the CLI (statistics etc)

This is looking like a good suggestion, unlike the 6x speedup in the contributing guide, I'm seeing a 1.3x speedup:

Benchmark 1: ./target/release/ruff check ./crates/ruff_linter/resources/test/cpython/ --no-cache -e
  Time (mean ± σ):     740.2 ms ±   6.2 ms    [User: 3241.7 ms, System: 135.5 ms]
  Range (min … max):   730.9 ms … 751.4 ms    10 runs

Benchmark 2: ./target/release/ruff check ./crates/ruff_linter/resources/test/cpython/ -e
  Time (mean ± σ):     564.5 ms ±   5.6 ms    [User: 628.6 ms, System: 133.7 ms]
  Range (min … max):   556.8 ms … 575.6 ms    10 runs

Summary
  ./target/release/ruff check ./crates/ruff_linter/resources/test/cpython/ -e ran
    1.31 ± 0.02 times faster than ./target/release/ruff check ./crates/ruff_linter/resources/test/cpython/ --no-cache -e

However, that's almost identical on main:

Benchmark 1: ./target/release/ruff check ./crates/ruff_linter/resources/test/cpython/ --no-cache -e
  Time (mean ± σ):     710.9 ms ±   4.5 ms    [User: 3164.4 ms, System: 130.1 ms]
  Range (min … max):   701.2 ms … 718.4 ms    10 runs

Benchmark 2: ./target/release/ruff check ./crates/ruff_linter/resources/test/cpython/ -e
  Time (mean ± σ):     546.8 ms ±   5.1 ms    [User: 611.6 ms, System: 135.7 ms]
  Range (min … max):   539.1 ms … 554.2 ms    10 runs

Summary
  ./target/release/ruff check ./crates/ruff_linter/resources/test/cpython/ -e ran
    1.30 ± 0.01 times faster than ./target/release/ruff check ./crates/ruff_linter/resources/test/cpython/ --no-cache -e

Do you have anything in particular in mind for testing the CLI otherwise? Or should I just try the release build on a few projects? I guess the ecosystem check showing no changes is somewhat comforting too.

@MichaReiser
Copy link
Member

The caching speed is surprising. I wonder if this is something we broke recently? If you've time, it might be great to do a quick check that we aren't rerunning any lint rules (maybe just add a panic?). Definetely not blocking this PR, given that this already was like this before

Do you have anything in particular in mind for testing the CLI otherwise? Or should I just try the release build on a few projects? I guess the ecosystem check showing no changes is somewhat comforting too.

Not really. Maybe play with a few output formats?

@ntBre
Copy link
Contributor Author

ntBre commented May 19, 2025

The caching speed is surprising. I wonder if this is something we broke recently? If you've time, it might be great to do a quick check that we aren't rerunning any lint rules (maybe just add a panic?). Definetely not blocking this PR, given that this already was like this before

Will do, I might bisect a bit too.

Not really. Maybe play with a few output formats?

Will do!

@ntBre
Copy link
Contributor Author

ntBre commented May 19, 2025

It looks like the caching slowdown is not recent, it's been in a pretty steady decline since that 6x number was recorded. These tags aren't exact, just the tag before the commit I was bisecting:

Tag Speedup
0.11.10 1.3
0.9.2 1.4
0.7.3 1.7
0.5.7 1.7
0.5.0 1.7
0.4.5 2.6
0.3.0 3.4
0.0.290 3.6
0.0.240 6.1

Speedup is the relative speedup reading from cache, as reported in the hyperfine summary above.

I couldn't actually build 0.0.240 because of a git dep on libCST that doesn't exist anymore, so that number is from CONTRIBUTING.md. 0.0.290 is the earliest version I could build easily because it added the ruff_linter crate, which is where I had my CPython clone.

I added a panic after the caching early return in lint_path and can confirm that the cache is working too. (I also tried adding a panic before the early return, right at the top of lint_path and it definitely panicked)

@MichaReiser
Copy link
Member

Thanks for the thorough analysis. Let's create an issue to investigate if we can speed up caching. A 30% speed up is rather ridiculous (it should be at least 2x to justify its complexity)

@ntBre ntBre mentioned this pull request May 19, 2025
@ntBre
Copy link
Contributor Author

ntBre commented May 19, 2025

For the CLI testing, I wrote a little script to loop through and diff the various output formats. It looks like the main difference is actually the noqa_offset change, which is encoded as "noqa_row": null instead of "noqa_row": 1 in the JSON and JSON-lines output formats.

The gitlab fingerprints also differ compared to 0.11.10, but I think that's from the previous PR, and we expected that there.

I also diffed the --statistics output, and I think it is unchanged, except that the sort appears to be unstable within groups of rules with the same number of diagnostics.

Script:

formats=(
	concise
	full
	json
	json-lines
	junit
	grouped
	github
	gitlab
	pylint
	rdjson
	azure
	sarif
)

for format in ${formats[@]}; do
	echo "checking output format: $format"
	args="check --no-cache crates/ruff_linter/resources/test/cpython --output-format $format"
	diff <(ruff $args) <(target/release/ruff $args)
done

Statistics check:

diff  <(ruff check --no-cache crates/ruff_linter/resources/test/cpython --statistics) <(target/release/ruff check --no-cache crates/ruff_linter/resources/test/cpython --statistics)

@MichaReiser
Copy link
Member

MichaReiser commented May 19, 2025

For the CLI testing, I wrote a little script to loop through and diff the various output formats. It looks like the main difference is actually the noqa_offset change, which is encoded as "noqa_row": null instead of "noqa_row": 1 in the JSON and JSON-lines output formats.

Can you double check that this won't break ruff-lsp?

@ntBre
Copy link
Contributor Author

ntBre commented May 19, 2025

I tested in VS Code with the native server disabled on both files from CPython and from the openai-cookbook (I knew they had many notebooks from seeing it in the ecosystem check), and it seems to be working. I also confirmed that DiagnosticData.noqa_row is annotated as int | None in ruff-lsp, so I think null should be handled gracefully.

I also tested the playground locally, just in case.

Should we ping Dhruv for (or revert) the noqa_offset change? That seems like the biggest potential issue to me, but the things I've tried look okay so far.

@MichaReiser
Copy link
Member

I'm fine moving forward. Making it Option would be nice for the Diagnostic type and you did your share of testing.

@ntBre
Copy link
Contributor Author

ntBre commented May 19, 2025

Sounds good, thanks for the testing ideas! I'll plan to merge this soon then.

@ntBre ntBre merged commit d6009eb into main May 19, 2025
34 checks passed
@ntBre ntBre deleted the brent/diagnostic-refactor-2 branch May 19, 2025 17:34
dcreager added a commit that referenced this pull request May 19, 2025
…rals

* origin/main:
  [ty] Add hint that PEP 604 union syntax is only available in 3.10+ (#18192)
  Unify `Message` variants (#18051)
  [`airflow`] Update `AIR301` and `AIR311` with the latest Airflow implementations (#17985)
  [`airflow`] Move rules from `AIR312` to `AIR302` (#17940)
  [ty] Small LSP cleanups (#18201)
  [ty] Show related information in diagnostic (#17359)
  Default `src.root` to `['.', '<project_name>']` if the directory exists (#18141)
dcreager added a commit that referenced this pull request May 19, 2025
* main:
  [ty] Use first matching constructor overload when inferring specializations (#18204)
  [ty] Add hint that PEP 604 union syntax is only available in 3.10+ (#18192)
  Unify `Message` variants (#18051)
  [`airflow`] Update `AIR301` and `AIR311` with the latest Airflow implementations (#17985)
  [`airflow`] Move rules from `AIR312` to `AIR302` (#17940)
  [ty] Small LSP cleanups (#18201)
  [ty] Show related information in diagnostic (#17359)
  Default `src.root` to `['.', '<project_name>']` if the directory exists (#18141)
Glyphack pushed a commit to Glyphack/ruff that referenced this pull request May 21, 2025
## Summary

This PR deletes the `DiagnosticKind` type by inlining its three fields
(`name`, `body`, and `suggestion`) into three other diagnostic types:
`Diagnostic`, `DiagnosticMessage`, and `CacheMessage`.

Instead of deferring to an internal `DiagnosticKind`, both `Diagnostic`
and `DiagnosticMessage` now have their own macro-generated `AsRule`
implementations.

This should make both astral-sh#18051 and
another follow-up PR changing the type of `name` on `CacheMessage`
easier since its type will be able to change separately from
`Diagnostic` and `DiagnosticMessage`.

## Test Plan

Existing tests
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

diagnostics Related to reporting of diagnostics. internal An internal refactor or improvement

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants