Give all fatals, errors, and warnings unique diagnostic codes #12144

brson · 2014-02-10T02:05:32Z

Initial support for #2092.

There's a lot more that can be done to make this useful, but I'm hoping to get at least the error code conversion upstream.

Codes are a single letter followed by four digits, e.g. 'C0348'. The letter is just a simple namespace: currently rustc uses 'A', syntax::ext uses 'B', and the rest of syntax uses 'C'. For errors to be stable we have to live with this scheme forever, so think about it.
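For readers who want to see the shape of a code concretely, here is a minimal sketch in present-day Rust of checking the "one letter plus four digits" format. It is not code from this PR; `DiagCode` and `parse_code` are hypothetical names, and the uppercase-only rule is an assumption based on the examples given.

```rust
/// A diagnostic code such as `C0348`: a namespace letter plus a four-digit number.
/// (Hypothetical type for illustration; not part of the PR.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct DiagCode {
    pub namespace: char,
    pub number: u16,
}

/// Parses a code made of one uppercase ASCII letter followed by exactly four digits.
/// Returns `None` for anything else.
pub fn parse_code(s: &str) -> Option<DiagCode> {
    let bytes = s.as_bytes();
    if bytes.len() != 5 || !bytes[0].is_ascii_uppercase() {
        return None;
    }
    if !bytes[1..].iter().all(|b| b.is_ascii_digit()) {
        return None;
    }
    // Slicing at byte 1 is fine: all five bytes were just checked to be ASCII.
    let number = s[1..].parse::<u16>().ok()?;
    Some(DiagCode { namespace: bytes[0] as char, number })
}
```

Under these assumptions, `parse_code("C0348")` yields namespace `'C'` and number `348`, while strings such as `"C348"` or `"c0348"` are rejected.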

The procedure for introducing a new error is:

Add a new code to either libsyntax/diag_index.rs or librustc/diag_index.rs, depending on which crate emits the errors.
Use the code in the span_fatal!, span_err!, span_warn!, alert_fatal!, alert_err!, alert_warn! macros:

span_fatal!(cx, sp, C0348, "found a rotten {}", vegetable);

Then some time later to add a FAQ about it you modify the diag_db_data.rs file:

(C0348, "

Some markdown about the error.

")

When the user hits the error it says

../src/test/compile-fail/bad-bang-ann.rs:14:5: 14:36 error: found a rotten potato [C0348*]
../src/test/compile-fail/bad-bang-ann.rs:14     if i < 0u { } else { fail!(); }
                                                ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
error: aborting due to previous error
note: some of these errors have extended explanations (see `rustc --explain help`)

The error code is displayed in magenta [C0348*], with the asterisk indicating there is additional info. Then there's a note at the end that hints how to use it.

You then run rustc --explain help to learn about this feature.

Rust includes extended documentation about some compiler errors
that explain in greater depth what the errors means, present examples,
and suggestions for how to fix them.

Each Rust error message has a corresponding code. When emitted by
rustc the code will be included in square brackets, like `[A0001]`. If
the error has additional documentation the code will be appended with
an asterisk, as in `[A0002*]`.

To view the extended documentation, run `rustc --explain A0002`, replacing
'A0002' with your error code.

The extent and quality of extended error documentation depends on user
contributions. To learn how to improve Rust's error documentation visit
http://github.com/mozilla/rust/wiki/Note-extended-diagnostics.

You then run rustc --explain C0348 to get

# C0348: found a rotten {}

Some markdown about the error.
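The flow walked through above (register a code, optionally attach markdown, show `[C0348*]` when docs exist, print the docs for `--explain`) can be sketched roughly as follows. This is an illustrative model in present-day Rust, not the PR's macro-based implementation; `Registry`, `Entry`, and all method names are hypothetical.

```rust
use std::collections::HashMap;

/// One registered diagnostic: its format string plus optional extended docs.
struct Entry {
    format: &'static str,
    docs: Option<&'static str>,
}

/// Hypothetical in-memory stand-in for the diag_index.rs / diag_db_data.rs pair.
#[derive(Default)]
pub struct Registry {
    entries: HashMap<&'static str, Entry>,
}

impl Registry {
    /// Registers a code together with the format string of its message.
    pub fn register(&mut self, code: &'static str, format: &'static str) {
        self.entries.insert(code, Entry { format, docs: None });
    }

    /// Attaches markdown to an already registered code; returns false if the code is unknown.
    pub fn document(&mut self, code: &'static str, markdown: &'static str) -> bool {
        match self.entries.get_mut(code) {
            Some(entry) => {
                entry.docs = Some(markdown);
                true
            }
            None => false,
        }
    }

    /// Renders the bracketed tag, with an asterisk when extended docs exist.
    pub fn tag(&self, code: &str) -> String {
        let has_docs = self.entries.get(code).map_or(false, |e| e.docs.is_some());
        if has_docs {
            format!("[{}*]", code)
        } else {
            format!("[{}]", code)
        }
    }

    /// Produces the `--explain` text: the format string as a title, then the markdown.
    pub fn explain(&self, code: &str) -> Option<String> {
        let entry = self.entries.get(code)?;
        let docs = entry.docs?;
        Some(format!("# {}: {}\n\n{}", code, entry.format, docs))
    }
}
```

Using the error's format string as the title is exactly what leads to the "{} {}" title weakness noted in the list below.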

Weaknesses

There are a lot of very similar errors, particularly in the syntax crate, that all get different codes.
Lints all fall under a single code for now because they are reported in a way that is not compatible with the macros used in this scheme.
Splitting up the diag_index.rs files when refactoring crates could get ugly.
Editing docs means editing Rust macros in a .rs file.
The extended docs use the error's format string as a title, and some of our fmt strings are like "{} {}", which is not instructive. I think we may be able to substitute parameter names into the fmt strings to make them display better.
Could be documented better.
No actual extended docs yet.
This adds a __tt_map feature gate and two ugly syntax extensions behind it.

Future

Needs to export docs to markdown so examples can be tested
Need to create wiki page mentioned in --explain help

Behind the `__tt_map` feature gate. This feature name starts with a double underscore to emphasize it's a hack, but there's no precedent for naming features this way.

This provides a way to create stable diagnostic codes, while keeping the overall burden on compiler writers low, and introducing a way to incrementally provide extended documentation of diagnostics.

This pairs almost every error in libsyntax and librustc with an error code, emitting with the macros:

* alert_fatal!(cx, code, fmt, args);
* alert_err!(cx, code, fmt, args);
* alert_warn!(cx, code, fmt, args);
* span_fatal!(cx, span, code, fmt, args);
* span_err!(cx, span, code, fmt, args);
* span_warn!(cx, span, code, fmt, args);
* resolve_err!(cx, span, code, fmt, args);

These macros call the methods 'fatal_without_diagnostic_code' etc. on any given context. For the most part the old diagnostic methods on the various sessions and handles and contexts have been renamed in a way that is obnoxious to call (use the macros), but the macro ExtCtxt still contains the simple `fatal` methods, etc. for ease of use by out of tree procedural macros.

Lint errors are difficult to convert to this system because they don't use string literals for reporting errors, so they don't have their own codes yet.

When diagnostics with codes are emitted they appear in magenta brackets, like `[A0002]`. After failure a note is emitted indicating whether any errors had extended documentation and suggesting the `--explain help` flag.

brson · 2014-02-10T02:13:22Z

@larsbergstrom wondering what you think about codes for compiler errors.

larsbergstrom · 2014-02-10T02:41:53Z

@brson My memory is that they are useful for two things:

You have the freedom to change your warning & error messages. The two big scenarios where the numbers get used are in the source code entries that disable warnings for an individual block and when people write tools that scrape the output of the compiler (either for strange build systems or for custom IDE tools).
They're dramatically easier for users to get help on, particularly when you localize the error message strings to different languages (which I don't know if we're planning to do for Rust). Then, somebody who gets the Japanese error text can still search for the error number in the documentation or on stackoverflow and find a useful entry, where searching for the text would probably miss for any language other than English.

I'll ask around with some former colleagues if there were any other tradeoffs (since the practice long predated me).

chris-morgan · 2014-02-10T04:48:27Z

For comparison, PyLint uses the letter-number scheme also and letters refer to categories. F for fatal, E for error, W for warning, R for refactor, C for convention.

I believe that some other languages and surrounding tools operate in this way also; beyond that I know that Vim's error formatting expects one letter to indicate the category (E for error, W for warning), but it's common for that to be handled in the 'errorformat' with items like `%trror` and `%tarning` for "error" and "warning"; given that we have the words "error", "warning" and "note", I don't think we need concern ourselves in that way—I merely mention it as a data point for consideration.

As a counter-example, C♯ uses CS0000–CS9999 for both errors and warnings; I do not know whether there is logic in the assignation of numbers therein. Still, using the common prefix CS will make searching for the errors slightly easier.

I like the significant-letters scheme demonstrated in PyLint. Granted, the rustc/syntax::ext/syntax split is already a fair way along in that direction, but I'd prefer to see it more clear, especially with obvious letters like W for warning and E for error and L for lint. I can see something like number series allocation occurring, e.g. numbers starting with 1 being syntax (e.g. E1001, W1001), 2 being syntax::ext (e.g. E2001, W2001), and 3 being rustc. Certainly spending time figuring out nice series that should work in the longer term is a good plan; a second digit could be allocated to specific areas, such as having numbers starting with 31 being lifetime-related things in rustc.
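To make the proposed series allocation concrete, here is a small sketch in present-day Rust that decodes a code under that scheme: the letter gives the severity and the leading digit gives the component. The names `Severity`, `Component`, and `classify` are hypothetical, and the mapping only covers the examples mentioned in this comment.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum Severity {
    Error,
    Warning,
    Lint,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Component {
    Syntax,    // numbers starting with 1
    SyntaxExt, // numbers starting with 2
    Rustc,     // numbers starting with 3
}

/// Decodes codes such as `E1001` or `W3101` under the suggested scheme.
/// Returns `None` for letters or series the comment does not mention.
pub fn classify(code: &str) -> Option<(Severity, Component, u16)> {
    let mut chars = code.chars();
    let severity = match chars.next()? {
        'E' => Severity::Error,
        'W' => Severity::Warning,
        'L' => Severity::Lint,
        _ => return None,
    };
    // Everything after the first character must be exactly four ASCII digits.
    let digits = chars.as_str();
    if digits.len() != 4 || !digits.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    let component = match digits.as_bytes()[0] {
        b'1' => Component::Syntax,
        b'2' => Component::SyntaxExt,
        b'3' => Component::Rustc,
        _ => return None,
    };
    let number = digits.parse().ok()?;
    Some((severity, component, number))
}
```

A second-digit allocation (for instance 31xx for lifetime-related errors) would be one more `match` on `digits.as_bytes()[1]`; it is left out here to keep the sketch short.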

Thinking along these lines, it'd be kinda nice if we could produce some form of error hash for ICEs. The Python exception handling/reporting library Mongoose produces Mongoose Incident Identifiers which are just this. Of course, by the time we get to 1.0 we don't want to have any ICEs occurring, and the failure message is typically good enough for the rarity with which we desire these to occur. Still, it's a nice idea. But certainly out of scope for this issue.

alexcrichton · 2014-02-10T05:59:34Z

src/libsyntax/diag_macros.rs

+        reg_diag_msg!($name, $msg);
+        let f: |&str, &str| -> () = $f;
+        f(stringify!($name), $msg);
+    } }


This could cause subtle differences in error messages, perhaps these two cases should be combined? If you specify the error string "#" (no arguments) you can't add an argument without changing the error string to r"\# {}". This is more of a usability thing, but I think you can merge these cases like:

macro_rules! report_diag (
    ($f: expr, $name: tt, $msg:expr $($arg: tt)*) => { {
        reg_diag_msg!($name, $msg);
        let msg: &str = format!($msg $($arg)*);
        let f: |&str, &str| -> () = $f;
        f(stringify!($name), msg);
    } }
)

Note that using $($arg:tt)* you also allow for named arguments. With $($arg:expr),* you're requiring valid Rust expressions, and I'm not sure foo = bar, bar = baz would parse as such, but perhaps it may?

Ah, my second question was why you coerce f to returning (), because it seems unfortunate to fail!() explicitly below when you could rely on the return value of sess.fatal_with_diagnostic_code to return !.

Also if this changes to $($arg:tt)* then the changes need to be propagated below.
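As a rough illustration of the two review points above (a single rule that forwards raw token trees, and not coercing the callback's result to `()`), here is a sketch in present-day Rust. It is not the PR's macro: the `reg_diag_msg!` registration step is omitted, closure syntax has changed since 2014, and `report_diag!` and `demo` here are hypothetical.

```rust
/// One rule covers both the no-argument and the with-arguments case, because
/// everything after the code is forwarded to `format!` as raw token trees.
/// That also lets named arguments like `want = ...` through.
macro_rules! report_diag {
    ($f:expr, $name:ident, $($fmt:tt)+) => {{
        let msg: String = format!($($fmt)+);
        // The macro evaluates to whatever the callback returns, so a
        // diverging callback would need no explicit failure afterwards.
        ($f)(stringify!($name), msg.as_str())
    }};
}

/// Collects (code, message) pairs so the expansion can be inspected.
pub fn demo() -> Vec<(String, String)> {
    let mut seen = Vec::new();
    let mut record = |code: &str, msg: &str| seen.push((code.to_string(), msg.to_string()));
    report_diag!(record, C0348, "found a rotten {}", "potato");
    report_diag!(record, C0349, "expected {want}, found {got}", want = "`;`", got = "`}`");
    report_diag!(record, C0350, "unexpected `#`");
    seen
}
```

The codes `C0349` and `C0350` and their messages are made up for the example; only `C0348` appears in the PR description.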

alexcrichton · 2014-02-10T06:29:40Z

I personally like the preceding letter being an indicator for the component which is emitting the diagnostic, although A, B, and C may not be granular enough to be too useful. I thought we'd give typechecking a letter, borrow checking another letter, perhaps resolve + privacy a letter, etc.

It would be interesting to print out whether the diagnostic is fatal/warning/error as part of --explain, you should have all the info to go in the database anyway.

huonw · 2014-02-10T06:47:49Z

W for warning and E for error and L for lint

$ git grep '\.span_err' src/lib{syntax,rustc} | wc -l
334
$ git grep '\.span_fatal' src/lib{syntax,rustc} | wc -l
69
$ git grep '\.span_note' src/lib{syntax,rustc} | wc -l
61
$ git grep '\.span_warn' src/lib{syntax,rustc} | wc -l
10

I don't think it's worth distinguishing based on warning/errors... almost all compilation output is either an error, a note or a lint (which can be anything from nothing to an error). Almost everything that could be a warning (i.e. triggering on code that's not incorrect/nonsensical) is a lint.

As @alexcrichton suggests, something like T for typeck error, B for borrowck errors, R for resolution/privacy, P for parse, M ("macros") or E for syntax::ext seems more sensible.

huonw · 2014-02-10T06:58:50Z

src/librustc/front/feature_gate.rs

                        }
                        Some(&(_, Accepted)) => {
-                            sess.span_warn(mi.span, "feature has added to rust, \
+                            span_warn!(sess, mi.span, A0330, "feature has added to rust, \


Not directly relevant; but... either I'm very tired or "feature has added to rust" makes no sense.

I noticed that too.

brson · 2014-02-10T18:59:10Z

The reason I didn't make the letters mean anything is because I figured that would be hard to maintain in a project like Rust.

larsbergstrom · 2014-02-10T19:08:15Z

@brson One other thing to watch out with on the letters is not to distinguish them between warnings and errors if you want to be able to either switch them in the future or add a --warn-as-error flag (my C# contacts mentioned that's part of why they're undifferentiated there, as @chris-morgan pointed out).
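A minimal sketch of why keeping severity out of the code helps: if the level is a separate field, a hypothetical warn-as-error switch can promote a warning without touching its code. The types and the `effective_level` function below are illustrative assumptions in present-day Rust, not rustc's API.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Level {
    Warning,
    Error,
}

/// A diagnostic whose code says nothing about its severity.
pub struct Diagnostic {
    pub code: &'static str,
    pub level: Level,
    pub message: String,
}

/// Computes the level actually reported. With `warn_as_error` set, warnings
/// are promoted to errors; the code stays the same either way.
pub fn effective_level(diag: &Diagnostic, warn_as_error: bool) -> Level {
    match (diag.level, warn_as_error) {
        (Level::Warning, true) => Level::Error,
        (level, _) => level,
    }
}
```

Had the code been spelled with a `W` prefix, the same promotion would leave a "warning" code attached to something reported as an error.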

pnkfelix · 2014-02-12T14:23:51Z

We might also consider making the letters more fine-grained within rustc. In particular, considering the suggestion of #12166, it might make sense to have the letters be tied to stages. (Or maybe not, since the letters/code themselves may end up being longer lived than the particular staging architecture we have at any point in time.)

bill-myers · 2014-02-17T20:23:31Z

The major downside of such a scheme is that it makes all patches that add new errors or warnings (which includes those that modify an existing error enough that it needs a new number) conflict with each other.

If you document the errors, then there is the downside that error documentation must be maintained and be in sync with the compiler code.

Before web search engines, documented error codes were useful, but nowadays you can just paste the fixed part of the error string into Google and usually find a bug report or stackoverflow post, which makes both error codes and documentation of much less value.

It also has an extremely "enterprisey" feel, as generally only large enterprise teams bother with such bureaucratic things (in fact, I think Microsoft compilers are the only popular compilers with error codes).

I'd suggest to not do this; if you really want to document errors, add the documentation as an extra parameter to functions that emit errors, and add a compiler flag to print out the documentation, and either before the first error is emitted or at the end of compilations with errors, print "Use --foo to get verbose documentation about all errors".

larsbergstrom · 2014-02-17T20:31:38Z

@bill-myers The numbers were used in compilers at Microsoft because they also localize the error messages, and if the users want to be able to look them up, the strings they'll get out aren't searchable without the numbers.

But even if we're not localizing and if you'd prefer to rely on stack overflow for documentation of errors, if you don't have warning/error numbers, then if your users want to be able to disable them from some part of the code base, you're stuck with some hideous approach like GCC, where I believe every uniquely disableable warning has to have its own command line flag and then users write those flags in special GCC pragmas:
http://gcc.gnu.org/onlinedocs/gcc-4.3.3/gcc/Diagnostic-Pragmas.html#Diagnostic-Pragmas

Ick.

bill-myers · 2014-02-17T20:50:13Z

Localizing compiler text is counterproductive, because programmers need to know English anyway to use libraries (there's no way 3rd party libraries will have localized documentation, and identifiers aren't localized anyway), and it makes it impossible to communicate or search the web.

Warnings are intrinsically a bad design, since something should either be allowed or be an error, and while unsafe standardized languages with undefined behavior like C need them, a safe language with defined behavior like Rust should not need them.

Disabling them by number is even worse, as it effectively adds the warning numbers to the language (awkward for other implementations of the language) and makes it impossible to figure out what is being disabled by looking at the source without looking up the number.

In general, it's best to offer simple and intuitive ways to avoid warnings, such as GCC's `if((x = 0))` syntax to avoid the "did you mean '=='?" warnings.

larsbergstrom · 2014-02-17T21:10:33Z

Localizing compiler output makes sense if you are targeting certain markets (e.g., enterprise Japan) where they have sufficient monetary resources and, in practice, not only do third party libraries localize their documentation for them, but that market also has completely separate local library vendors that don't localize to English. But existence != necessity, and Rust should go the right way for its intended audience.

Certainly, I agree that it's nice to prefer errors to warnings. But with a systems language, if you don't have warnings then you just end up calling a bunch of things "diagnostics" or "lints" or "static analyses" as soon as you want to check things that are beyond your type system but are either undecidable or computationally expensive, such as integer overflow analysis or a possibly redundant clone call or whatever else we'll have in Rust. And now you've reinvented warnings, but they're either a separate tool or some separate set of flags that users think are unimportant, like "that valgrind thing."

All that said, I do agree with @bill-myers that having the --explain flag in the compiler is probably a bad idea. At the very least, we'd like to be able to make the explanations better asynchronously from users updating their version of rustc, and a wiki that explains the errors seems better for that. And improving the error text does not require going through the bors queue!

flaper87 · 2014-02-17T21:11:04Z

Have we thought about what happens if an error needs to be removed? Will that slot remain empty forever? I'd prefer keeping the slot free to avoid having different errors with the same code for different versions of Rust.

Although this might be quite obvious, I want to make sure we have thought about it and have it discussed somewhere.

pnkfelix · 2014-02-18T12:53:26Z

@flaper87 yes I believe the intention has always been that a code, once assigned, can never be reused. I think that is the meaning of @brson's use of the term "stable ids" as seen in e.g. #2092.

See e.g. the second paragraph of #8161 description. And of course, @brson's sentence "For errors to be stable we have to live with this scheme forever, so think about it" is a pretty strong hint that codes won't be reused.
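The "assigned once, never reused" rule can be sketched as a table that remembers retired slots. This is an illustration in present-day Rust under assumed names (`CodeTable`, `Slot`); nothing like it is claimed to exist in the PR.

```rust
use std::collections::BTreeMap;

/// State of a code number that has been handed out at some point.
#[derive(Debug, PartialEq, Eq)]
pub enum Slot {
    Active(&'static str),
    Retired,
}

/// Hypothetical table of every number ever assigned, including retired ones.
#[derive(Default)]
pub struct CodeTable {
    slots: BTreeMap<u16, Slot>,
}

impl CodeTable {
    /// Assigns a number to a message. Fails if the number was ever used,
    /// even if the error it belonged to has since been removed.
    pub fn assign(&mut self, number: u16, message: &'static str) -> Result<(), String> {
        if self.slots.contains_key(&number) {
            return Err(format!("code {:04} was already assigned and cannot be reused", number));
        }
        self.slots.insert(number, Slot::Active(message));
        Ok(())
    }

    /// Retires a number: the error is gone, but the slot stays occupied.
    pub fn retire(&mut self, number: u16) -> bool {
        match self.slots.get_mut(&number) {
            Some(slot) => {
                *slot = Slot::Retired;
                true
            }
            None => false,
        }
    }

    /// Next number to hand out: one past the highest ever used (1 if none).
    /// Sketch only: does not guard against overflow at u16::MAX.
    pub fn next_free(&self) -> u16 {
        self.slots.keys().next_back().map_or(1, |n| n + 1)
    }
}
```

With this model a removed error leaves a permanent gap, which is the behaviour @flaper87 asked about: the same code can never mean different things in different versions of Rust.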

flaper87 · 2014-02-18T13:36:11Z

@pnkfelix awesome, all that makes sense to me. I wanted to make sure we explicitly talked about it and that we agree on this.

alexcrichton · 2014-04-04T21:01:58Z

Closing due to inactivity.

I still think that this is a great idea to do, and I would love to see this implemented before 1.0. I think that the googleability of errors to find common solutions will benefit greatly from this.

pnkfelix · 2014-04-10T15:24:00Z

@bill-myers I emphatically object to your claim that web search engines make error codes have less value. (I am quoting that claim here: "Before web search engines, documented error codes were useful, but nowadays you can just paste the fixed part of the error string into Google and usually find a bug report or stackoverflow post, which makes both error codes and documentation of much less value.")

From my point of view, web search engines are a reason to put in stable error codes; such codes enable stable searches for the error in question, while allowing the compiler developers to be free to change the error message output (e.g. to improve presentation/phrasing).

(It also enables a common hook for conversations regarding that error to use.)

Without the codes in place, I do not doubt that people will follow exactly the protocol that you describe for finding answers to your questions, but I think providing error codes will only enrich that protocol, not detract from it.

ghost · 2014-07-01T16:36:18Z

Okay, I'll pick this up. Especially having code examples in --explain would be fantastic.

I'm not gonna make any radical changes to @brson's design (is everyone still happy with it?) apart from using a compiler plugin for the macros and also extracting what's possible into a libdiagnostics crate.

Another thing I've pondered is that the actual Markdown descriptions for errors could perhaps live in separate files somewhere in the source tree rather than all in one file.

To add to the discussion on codes, what should also be considered is machine readability of errors, which may be one of the requirements to making rustc tooling-friendly.

@brson

This is a continuation of @brson's work from #12144.

This implements the minimal scaffolding that allows mapping diagnostic messages to alpha-numeric codes, which could improve the searchability of errors. In addition, there's a new compiler option, `--explain {code}` which takes an error code and prints out a somewhat detailed explanation of the error. Example:

```rust
fn f(x: Option<bool>) {
    match x {
        Some(true) | Some(false) => (),
        None => (),
        Some(true) => ()
    }
}
```

```shell
[~/rust]$ ./build/x86_64-apple-darwin/stage2/bin/rustc ./diagnostics.rs --crate-type dylib
diagnostics.rs:5:3: 5:13 error: unreachable pattern [E0001] (pass `--explain E0001` to see a detailed explanation)
diagnostics.rs:5         Some(true) => ()
                         ^~~~~~~~~~
error: aborting due to previous error
[~/rust]$ ./build/x86_64-apple-darwin/stage2/bin/rustc --explain E0001
This error suggests that the expression arm corresponding to the noted pattern will never be reached as for all possible values of the expression being matched, one of the preceeding patterns will match. This means that perhaps some of the preceeding patterns are too general, this one is too specific or the ordering is incorrect.
```

I've refrained from migrating many errors to actually use the new macros as it can be done in an incremental fashion but if we're happy with the approach, it'd be good to do all of them sooner rather than later.

Originally, I was going to make libdiagnostics a separate crate but that's posing some interesting challenges with semi-circular dependencies. In particular, librustc would have a plugin-phase dependency on libdiagnostics, which itself depends on librustc. Per my conversation with @alexcrichton, it seems like the snapshotting process would also have to change. So for now the relevant modules from libdiagnostics are included using `#[path = ...] mod`.

brson added 6 commits February 9, 2014 17:40

rustc: Add the __tt_map_insert and __tt_map_get_expr macros

b4f483d

rustc: Introduce a new diagnostic code registry

dda818b

Introduce more diagnostic infrastructure

24cc6bd

rustc: Add UI for extended diagnostics

d9c7a8c

Slightly improve the format of the diagnostic database.

febb7f8

brson mentioned this pull request Feb 10, 2014

RFC: Diagnostic registry prototype #11460

Closed

alexcrichton reviewed Feb 10, 2014

huonw reviewed Feb 10, 2014

Address review feedback

e610e6e

adrientetar mentioned this pull request Feb 17, 2014

Better error message about [T] as a bare type (and in general) #12347

Closed

alexcrichton closed this Apr 4, 2014

ghost mentioned this pull request Jul 2, 2014

Give all fatals, errors, and warnings unique diagnostic codes #15336

Merged