Integrate GHC's new diagnostic infrastructure

Hello folks,

This is a meta ticket, feel free to close it if you think this is not the appropriate place to discuss the subject.

_Summary: GHC changed how diagnostics are emitted, HLS can benefit from this, help needed to integrate the work and feedback welcome to improve GHC to best serve HLS._

Well-Typed has been working together with @goldfirere for the past few months on what used to be GHC proposal [306](https://github.com/ghc-proposals/ghc-proposals/pull/306), i.e. refactor GHC so that it would emit richer Haskell types instead of `SDoc`s for its diagnostic messages. Since the original GHC ticket (cfr.  [ghc#18516](https://gitlab.haskell.org/ghc/ghc/-/issues/18516)) and the inception of our work, we tried to do our best to keep in mind HLS and Haskell IDEs in general as the primary consumers/beneficiaries of this work. The [Wiki page](https://gitlab.haskell.org/ghc/ghc/-/wikis/Errors-as-(structured)-values) goes a bit deeper into the original design work, if people are interested.

I think we are now at a stage where it would be nice to start thinking how we can collectively integrate this into HLS and what we can learn from this exercise so that we can improve what we have already in GHC before 9.4 (that would be nice but not strictly required, I guess).

### 30 seconds overview

To give you a 30 seconds overview of GHC's `HEAD`, we now have a typeclass:

```hs
class Diagnostic a where
  diagnosticMessage :: a -> DecoratedSDoc
  diagnosticReason  :: a -> DiagnosticReason
  diagnosticHints   :: a -> [GhcHint]
```

And a bunch of types that implement instances for it, which forms a hierarchy:

```hs
data GhcMessage where
  -- | A message from the parsing phase.
  GhcPsMessage      :: PsMessage -> GhcMessage
  -- | A message from typecheck/renaming phase.
  GhcTcRnMessage    :: TcRnMessage -> GhcMessage
  -- | A message from the desugaring (HsToCore) phase.
  GhcDsMessage      :: DsMessage -> GhcMessage
  -- | A message from the driver.
  GhcDriverMessage  :: DriverMessage -> GhcMessage
  -- | An \"escape\" hatch which can be used when we don't know the source of
  -- the message or if the message is not one of the typed ones. 
  GhcUnknownMessage :: forall a. (Diagnostic a, Typeable a) => a -> GhcMessage
```

GHC now emits these types (each individually wrapped in a `MsgEnvelope` which gives you `Severity` and a `SrcSpan` amongst other things) which can be "queried" to get not just the pretty-printed message but also the reason why it arose (it will include a`WarningFlag` if this is a warning) and a list of `GhcHint`. Furthermore, both these types and `GhcHint` will include proper information about the diagnostic: for example, if this is a message coming from the type checker about unused pattern binds, GHC will emit a `TcRnUnusedPatternBinds :: HsBind GhcRn -> TcRnMessage` that will give the caller the actual binds, not just a list of `SDoc`. The `GhcHint` type follows a similar logic, for example GHC might emit `[SuggestExtension LangExt.BangPatterns]`. 

### Little disclaimer

Currently not all diagnostics have been ported within GHC (due to their sheer amount). For some of those, we still have the old `SDoc` wrapped into a constructor (specific to the type of message we want to emit), like, for example a `TcRnUnknownMessage :: SDoc -> TcRnMessage`. Similarly, not all diagnostics have a proper list of hints attached to them, for some not-yet-ported or not-fully-ported, we still have the hints bundled with the pretty-printed message (i.e. the `diagnosticMessage`). We are actively working on this.

### Where to go from here

The idea would be for HLS to use this information directly instead of parsing and using regexes on the raw output of the compiler. In a GHC meta-ticket about the "remaining work" (see "Little Disclaimer" section) @wz1000 and @bgamari [suggested](https://gitlab.haskell.org/ghc/ghc/-/issues/19905#note_363751) to look at the [error parsing logic](https://github.com/haskell/haskell-language-server/blob/master/ghcide/src/Development/IDE/Plugin/CodeAction.hs) in `ghcide` to see if we can get rid of all those "matchRegex" calls and just use the new infrastructure.

I wanted to have some fun taking a crack at it (as a proof-of-concept, mostly), but it revealed to be harder than I was expecting, mostly due to the perfect storm arising from the changes between GHC 9.2 and 9.3/HEAD (including Alan's work on exactprint) and the changes to `Data.List` and its import which broke virtually all the packages I have tried to build. Despite I have used the patched versions of @fendor I have found online, having a working version of `ghcide` built with `HEAD` was too hard and I have abandoned ship, but I am happy to be told I was doing it all wrong and there is a branch that magically compiles :)

Having said that, I've started taking a cursory look at the `CodeAction.hs` module, and the first thing that stood out was that currently `ghcide` is running regexes on the `_message` field on the `Diagnostic` type, which comes from `lsp`:

```hs
data Diagnostic =
  Diagnostic
    { _range              :: Range
    , _severity           :: Maybe DiagnosticSeverity
    , _code               :: Maybe (Int |? Text)
    , _source             :: Maybe DiagnosticSource
    , _message            :: Text
    , _tags               :: Maybe (List DiagnosticTag)
    , _relatedInformation :: Maybe (List DiagnosticRelatedInformation)
    } deriving (Show, Read, Eq, Ord, Generic)
```

Here `_message` is `Text` whereas we would need to be given something more structured. @wz1000 pointed out (in the same GHC 19905 issue) that we can [change this](https://gitlab.haskell.org/ghc/ghc/-/issues/19905#note_364312), but rather than keeping this conversation tucked into the GHC issue tracker, I wanted to share this on this repo to collect feedback and useful pointers on how we could push the work forward. Again, from a super quick look at `ghcide` codebase it seems like the `IdeResult` type could be changed to return not just a `Diagnostic` but a `GhcMessage`, which we would get (almost) from free from GHC's `tcRnModule` (in `HEAD`, that is):

```hs
tcRnModule :: HscEnv
           -> ModSummary
           -> Bool
           -> HsParsedModule
           -> IO (Messages TcRnMessage, Maybe TcGblEnv) -- We would need to `fmap` GhcTcRnMessage over this.
```

Therefore, concretely, I guess what I am asking is the following:

1. What would be the least painful way to try out whether GHC is solving HLS/ghcide's actual problems? Is
   there some magical fork targeting GHC `HEAD` somewhere?

2. What would be the required steps to feed `ghcide` with structured types instead of a textual `_message`?

3. Is there any other place apart form the `CodeAction.hs` we should focus or lens on to understand if the GHC work is going into the right direction?

Thanks in advance! 😉 

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Integrate GHC's new diagnostic infrastructure #2014

30 seconds overview

Little disclaimer

Where to go from here

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

Integrate GHC's new diagnostic infrastructure #2014

Description

30 seconds overview

Little disclaimer

Where to go from here

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions