Skip to content

refactor: add isActive to ErrorMessageID #14965

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
May 23, 2022

Conversation

ckipp01
Copy link
Member

@ckipp01 ckipp01 commented Apr 18, 2022

Currently there is no way to tell if a given ErrorMessageID is emitted
at all from the compiler. While working on an index of all the IDs I
keep needing to dig into the code to see if there is a message that uses
an ErrorMessageID and then see if that message is actually used
anywhere. From what I can gather so far, the ones that I've marked as
false are no longer emitted and therefore marked isActive = false.

refs: #14904

case TypeMismatchID // errorNumber: 7
case NotAMemberID // errorNumber: 8
case EarlyDefinitionsNotSupportedID // errorNumber: 9
case TopLevelImplicitClassID extends ErrorMessageID(isActive = false) // errorNumber: 10
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I know this is no longer an error, however there is a message that uses this and it's actually referenced in Desugar.scala

https://github.com/lampepfl/dotty/blob/53f5531e7e31eca02b3db158c8c384e713b4542a/compiler/src/dotty/tools/dotc/ast/Desugar.scala#L755

But I'm unable to ever get it to fall into that branch.

case DoesNotConformToSelfTypeID // errorNumber: 58
case DoesNotConformToSelfTypeCantBeInstantiatedID // errorNumber: 59
case AbstractMemberMayNotHaveModifierID // errorNumber: 60
case TopLevelCantBeImplicitID extends ErrorMessageID(isActive = false) // errorNumber: 61
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Related to the other top level implicit comment this is also still referenced in one place, but I think it's dead code as I'm unable to reproduce it or get anything to touch that branch in

https://github.com/lampepfl/dotty/blob/53f5531e7e31eca02b3db158c8c384e713b4542a/compiler/src/dotty/tools/dotc/typer/Checking.scala#L477

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI there's no longer reference to this Error #14935

Comment on lines +91 to +92
case OnlyCaseClassOrCaseObjectAllowedID extends ErrorMessageID(isActive = false) // errorNumber: 79
case ExpectedTopLevelDefID extends ErrorMessageID(isActive = false) // errorNumber: 80
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ended up added this in only for the sake of records. I don't think we should rename these, but rather mark them as active = false if they are no longer being used. This way they can still be mentioned in the index with their number in case anyone with an old version of the compiler for some reason hits on E079 and wonders what that is.

@@ -1066,37 +1066,6 @@ import transform.SymUtils._
|"""
}

class DanglingThisInPath()(using Context) extends SyntaxMsg(DanglingThisInPathID) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This wasn't used anywhere.

Currently there is no way to tell if a given `ErrorMessageID` is emitted
at all from the compiler. While working on an index of all the IDs I
keep needing to dig into the code to see if there is a message that uses
an `ErrorMessageID` and then see if that message is actually used
anywhere. From what I can gather so far, the ones that I've marked as
`false` are no longer emitted and therefore marked `isActive = false`.

refs: scala#14904
@ckipp01 ckipp01 force-pushed the activeErrorMessageID branch from 17041a0 to 73e6351 Compare May 11, 2022 12:42
@ckipp01
Copy link
Member Author

ckipp01 commented May 16, 2022

Any input on this?

@nicolasstucki nicolasstucki self-assigned this May 23, 2022
@odersky
Copy link
Contributor

odersky commented May 23, 2022

I think if an error message id is no longer used, we should just replace it with something having UNUSED in the name. That's what is currently done. I.e. we have UNUSED1, UNUSED2 in the message list.

So I am not sure anything needs to be done here.

@ckipp01
Copy link
Member Author

ckipp01 commented May 23, 2022

I think if an error message id is no longer used, we should just replace it with something having UNUSED in the name. That's what is currently done. I.e. we have UNUSED1, UNUSED2 in the message list.

So I am not sure anything needs to be done here.

I think for record purposes, using things like UNUSED1 is a really bad idea. I ended up doing it this way since this is what other language like rust do in their index, and so that they can still be documented in our own index.

@nicolasstucki
Copy link
Contributor

Actually, using the Unused is a mistake because we cannot use those IDs for new errors. We also need to be able to know the name of an old error that does not exist anymore for documentation of deprecated errors.

Taking about this reminded me that we are missing a documentation page that lists all errors by ID and gives a small description of each error.

@ckipp01
Copy link
Member Author

ckipp01 commented May 23, 2022

Taking about this reminded me that we are missing a documentation page that lists all errors by ID and gives a small description of each error.

Correct, that was the main reason behind me working on this. I plan to port it to the documentation when it's complete, which I mentioned in the contributors forum and the referenced issue. However, there does seem to be some so far that I can't reproduce or that I think are actually referenced by dead code.

@nicolasstucki
Copy link
Contributor

nicolasstucki commented May 23, 2022

We should generate the documentation @ckipp01 collected in index in an official part of the Scala 3 documentation. This list should be up to date with all the errors listed in ErrorMessageID.scala. A good way to make sure that the documentation is complete would be to make the IDs hold the explanations of the error messages. Not sure if those would be the same explanations that we have in Message.scala, if it is the case it would be simpler to maintain.

Also see rust error index.

@nicolasstucki nicolasstucki merged commit b769da1 into scala:main May 23, 2022
@ckipp01 ckipp01 deleted the activeErrorMessageID branch May 23, 2022 08:26
@odersky
Copy link
Contributor

odersky commented May 23, 2022

Why can't we use unused IDs for new errors?

I have the impressions we are generating a huge complication for nothing. I resent adding boilerplate where it is pointless, and I think this is exactly one of these cases.

I am also against investing now in error message IDs or publishing them. That's a sure way for ossification!
Error messages have rooms for LOTS of improvements and this should have high priority. Making the error message infrastructure more bureaucratic and harder to change is precisely the way we will avoid improvements, since anyone who would actually have the expertise to improve things for real will not want to get involved since dealing with all that boilerplate is just too high friction.

Let's invest our energy to get really great error messages instead of adding to bureaucracy!

@odersky
Copy link
Contributor

odersky commented May 23, 2022

I think this should be reverted and discussed again. Even if we accept the premise of error message ids (which I don't) I believe this concrete implementation is not the way to do it. If you need unchanged, documented numbers for your error messages, you should not rely on comments! Much better to own up to it and write

  inline val PkgDuplicateSymbolID = 33

and so on.

@ckipp01
Copy link
Member Author

ckipp01 commented May 23, 2022

I have the impressions we are generating a huge complication for nothing. I resent adding boilerplate where it is pointless, and I think this is exactly one of these cases.

I am also against investing now in error message IDs or publishing them.

Even if we accept the premise of error message ids (which I don't)

Don't you think it would have been a great idea to voice any of this when I created the initial discussion around this in March or maybe in April when I created the overall plan here, or any time over the last month while this PR sat here? I specifically stated in #14904:

I've tried to over-communicate on this in multiple places to give ample opportunity for feedback, and I'm interpreting silence as support. So it'd be nice to have some explicit 🆗's from the core team before pushing this even further.

I even outlined the specific points about UNUSED and asked that it was ok to start back on this path. I must say, it's quite frustrating and discouraging to read your input now.

@nicolasstucki
Copy link
Contributor

Why can't we use unused IDs for new errors?

The issue with those ids that were called unused is that they were used, this clearly leads to confusion. The only IDs that are really unused are those that we have never used in the past, hence IDs are larger than the current largest. For simplicity, we kept the ID range compact, only allowing the addition of the next consecutive ID.

I am also against investing now in error message IDs or publishing them.

It is also worth mentioning that these are user-facing IDs that the compiler emits with the message. These must be unambiguously googlable. This is why the IDs of older messages that are not emitted anymore cannot be reused.

By not publishing them ourselves, we are now in the nonideal situation where this user documentation is not centralized and up to date. Part of it is now in https://github.com/ckipp01/dotty-error-index and probably more in StackOverflow. If we published it ourselves we would be able to have a consistent and complete description of the errors.

inline val PkgDuplicateSymbolID = 33

This was the initial design we used. It failed miserably due to its un-maintainability: many contributors renumbered the IDs without us noticing the problem, there were many occurrences where we go duplicated IDs due to concurrent merges, and some IDs were removed without us noticing the problem. This pushed us to use a resident enumeration-based approach where we only append new elements. Since we added this 5 years ago, we have not seen any of these mistakes happen again, the closest we got was almost reusing the mislabeled UNUSED ID.

@odersky
Copy link
Contributor

odersky commented May 23, 2022

@ckipp01. I'm sorry. I was missing the larger context and I wrote too quickly. I was a bit irked of this being merged over my objections.

If it unblocks you I am OK with the proposed changed of exposing the ID. It's just as a caveat that our error messages are nowhere near a state where we want to keep them that way! So I believe effort needs to be spent on working on the content and structure of these messages, and conversely, we should make it as easy as possible to change and refactor messages in the future.

Regarding the PR, I still think it would be better to write

val PkgDuplicateSymbolID = 33

instead of

case PkgDuplicateSymbolID // errorNumber = 33

If we need to keep track of inactive messages we can use a set:

object ErrorMessageID:
  val inactive = Set(...)

@odersky
Copy link
Contributor

odersky commented May 23, 2022

This was the initial design we used. It failed miserably due to its un-maintainability: many contributors renumbered the IDs without us noticing the problem, there were many occurrences where we go duplicated IDs due to concurrent merges, and some IDs were removed without us noticing the problem. This pushed us to use a resident enumeration-based approach where we only append new elements. Since we added this 5 years ago, we have not seen any of these mistakes happen again, the closest we got was almost reusing the mislabeled UNUSED ID.

I think this was to a large degree a lack of communication what the rules about error message ids were. The enumeration-based designs also has a number of implicit rules, but now contributors know them better, it seems.

How important are the precise IDs? It seems if they are important we need to make them explicit. Can't we write a comment: "only add to the end" or something like that in the header?

I mean the only gain of using an enumeration rather than explicit integer vals is that I can add new values in the middle without changing the right hand sides of subsequent definitions. But that's precisely what is forbidden for error message ids. So, why use an enumeration at all?

@odersky
Copy link
Contributor

odersky commented May 23, 2022

If precise IDs are important and we want to guard against errors in modifications I would recommend a design like this:

enum ErrorMessageID(id: Int, active = true):
  assert(id == ordinal)
  case Msg1 extends ErrorMessageID(0)
  case Msg2 extends ErrorMessageID(1)
  case Msg3 extends ErrorMessageID(2, active = false)
  ...

That way we make sure that ids correspond to ordinals and are not duplicated. WDYT?

@ckipp01
Copy link
Member Author

ckipp01 commented May 23, 2022

That way we make sure that ids correspond to ordinals and are not duplicated. WDYT?

I think what you have outlined would be a nice minimal way to accomplish what we want, yes. As yea, precise IDs here I do think are important. With that assert we'd also have to probably just throw out LazyErrorId and NoExplanationID as they are negative now or add them to the end with active = false. I'm assuming those are negative since 5 years ago when the enum was created they weren't being used. Getting rid of those (or adding them to the end) would actually be nice because then we could also get rid of the extra logic of having to always make sure you subtract 2 from the ordinal to actually get the errorNumber. If that is fine, would you rather just remove them or move them to the end? I can then go ahead and make that change.

@nicolasstucki
Copy link
Contributor

How important are the precise IDs?

Users will see something like -- [E003] Syntax Deprecation Warning: ... in their error or warnings. E003 is the ID that they will google for if they cannot figure out how to fix the issue. This ID should not change in a new version to avoid contradictory information in the web about a specific error.

It seems if they are important we need to make them explicit. Can't we write a comment: "only add to the end" or something like that in the header?

We have this comment ErrorMessageID.scala#L8 in the header. Did you miss it? Should we move it to a better place?

So, why use an enumeration at all?

Apart from reducing the risk of adding the wrong IDs, I remember we wanted an enumeration to be able to enumerate them. Probably to be able to make the documentation tool.

@odersky
Copy link
Contributor

odersky commented May 23, 2022

I think LazyErrorID is not used, and in that case we can delete it. NoExplanationID is used. It can either stay what it is and we change the assert to

assert(id == ordinal + 1)

or we move it to the end. Both would work IMO.

@nicolasstucki I think we are converging on a design that uses an enumeration without the downsides.

@nicolasstucki
Copy link
Contributor

@nicolasstucki I think we are converging on a design that uses an enumeration without the downsides.

We should also see if we can make it a proper Scala enumeration. We did not do it at the time because we had not implemented them yet.

@nicolasstucki
Copy link
Contributor

Opened #15271 to remove LazyErrorID

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants