Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integrate GHC's new diagnostic infrastructure #2014

Open
adinapoli opened this issue Jul 13, 2021 · 12 comments
Open

Integrate GHC's new diagnostic infrastructure #2014

adinapoli opened this issue Jul 13, 2021 · 12 comments
Labels
component: ghcide old_type: meta Planing and organizing other issues type: enhancement New feature or request

Comments

@adinapoli
Copy link
Contributor

Hello folks,

This is a meta ticket, feel free to close it if you think this is not the appropriate place to discuss the subject.

Summary: GHC changed how diagnostics are emitted, HLS can benefit from this, help needed to integrate the work and feedback welcome to improve GHC to best serve HLS.

Well-Typed has been working together with @goldfirere for the past few months on what used to be GHC proposal 306, i.e. refactor GHC so that it would emit richer Haskell types instead of SDocs for its diagnostic messages. Since the original GHC ticket (cfr. ghc#18516) and the inception of our work, we tried to do our best to keep in mind HLS and Haskell IDEs in general as the primary consumers/beneficiaries of this work. The Wiki page goes a bit deeper into the original design work, if people are interested.

I think we are now at a stage where it would be nice to start thinking how we can collectively integrate this into HLS and what we can learn from this exercise so that we can improve what we have already in GHC before 9.4 (that would be nice but not strictly required, I guess).

30 seconds overview

To give you a 30 seconds overview of GHC's HEAD, we now have a typeclass:

class Diagnostic a where
  diagnosticMessage :: a -> DecoratedSDoc
  diagnosticReason  :: a -> DiagnosticReason
  diagnosticHints   :: a -> [GhcHint]

And a bunch of types that implement instances for it, which forms a hierarchy:

data GhcMessage where
  -- | A message from the parsing phase.
  GhcPsMessage      :: PsMessage -> GhcMessage
  -- | A message from typecheck/renaming phase.
  GhcTcRnMessage    :: TcRnMessage -> GhcMessage
  -- | A message from the desugaring (HsToCore) phase.
  GhcDsMessage      :: DsMessage -> GhcMessage
  -- | A message from the driver.
  GhcDriverMessage  :: DriverMessage -> GhcMessage
  -- | An \"escape\" hatch which can be used when we don't know the source of
  -- the message or if the message is not one of the typed ones. 
  GhcUnknownMessage :: forall a. (Diagnostic a, Typeable a) => a -> GhcMessage

GHC now emits these types (each individually wrapped in a MsgEnvelope which gives you Severity and a SrcSpan amongst other things) which can be "queried" to get not just the pretty-printed message but also the reason why it arose (it will include aWarningFlag if this is a warning) and a list of GhcHint. Furthermore, both these types and GhcHint will include proper information about the diagnostic: for example, if this is a message coming from the type checker about unused pattern binds, GHC will emit a TcRnUnusedPatternBinds :: HsBind GhcRn -> TcRnMessage that will give the caller the actual binds, not just a list of SDoc. The GhcHint type follows a similar logic, for example GHC might emit [SuggestExtension LangExt.BangPatterns].

Little disclaimer

Currently not all diagnostics have been ported within GHC (due to their sheer amount). For some of those, we still have the old SDoc wrapped into a constructor (specific to the type of message we want to emit), like, for example a TcRnUnknownMessage :: SDoc -> TcRnMessage. Similarly, not all diagnostics have a proper list of hints attached to them, for some not-yet-ported or not-fully-ported, we still have the hints bundled with the pretty-printed message (i.e. the diagnosticMessage). We are actively working on this.

Where to go from here

The idea would be for HLS to use this information directly instead of parsing and using regexes on the raw output of the compiler. In a GHC meta-ticket about the "remaining work" (see "Little Disclaimer" section) @wz1000 and @bgamari suggested to look at the error parsing logic in ghcide to see if we can get rid of all those "matchRegex" calls and just use the new infrastructure.

I wanted to have some fun taking a crack at it (as a proof-of-concept, mostly), but it revealed to be harder than I was expecting, mostly due to the perfect storm arising from the changes between GHC 9.2 and 9.3/HEAD (including Alan's work on exactprint) and the changes to Data.List and its import which broke virtually all the packages I have tried to build. Despite I have used the patched versions of @fendor I have found online, having a working version of ghcide built with HEAD was too hard and I have abandoned ship, but I am happy to be told I was doing it all wrong and there is a branch that magically compiles :)

Having said that, I've started taking a cursory look at the CodeAction.hs module, and the first thing that stood out was that currently ghcide is running regexes on the _message field on the Diagnostic type, which comes from lsp:

data Diagnostic =
  Diagnostic
    { _range              :: Range
    , _severity           :: Maybe DiagnosticSeverity
    , _code               :: Maybe (Int |? Text)
    , _source             :: Maybe DiagnosticSource
    , _message            :: Text
    , _tags               :: Maybe (List DiagnosticTag)
    , _relatedInformation :: Maybe (List DiagnosticRelatedInformation)
    } deriving (Show, Read, Eq, Ord, Generic)

Here _message is Text whereas we would need to be given something more structured. @wz1000 pointed out (in the same GHC 19905 issue) that we can change this, but rather than keeping this conversation tucked into the GHC issue tracker, I wanted to share this on this repo to collect feedback and useful pointers on how we could push the work forward. Again, from a super quick look at ghcide codebase it seems like the IdeResult type could be changed to return not just a Diagnostic but a GhcMessage, which we would get (almost) from free from GHC's tcRnModule (in HEAD, that is):

tcRnModule :: HscEnv
           -> ModSummary
           -> Bool
           -> HsParsedModule
           -> IO (Messages TcRnMessage, Maybe TcGblEnv) -- We would need to `fmap` GhcTcRnMessage over this.

Therefore, concretely, I guess what I am asking is the following:

  1. What would be the least painful way to try out whether GHC is solving HLS/ghcide's actual problems? Is
    there some magical fork targeting GHC HEAD somewhere?

  2. What would be the required steps to feed ghcide with structured types instead of a textual _message?

  3. Is there any other place apart form the CodeAction.hs we should focus or lens on to understand if the GHC work is going into the right direction?

Thanks in advance! 😉

@jneira jneira added component: ghcide type: enhancement New feature or request old_type: meta Planing and organizing other issues status: needs attention labels Jul 13, 2021
@pepeiborra
Copy link
Collaborator

  1. There isn't any fork of HLS for working with GHC head, but @fendor is working towards 9.2 compatibility. In case it's useful to you or @fendor a while ago I put together a cabal.project file with allow-newer stanzas for 9.2 here
  2. Upgrading ghcide to work with structural diagnostics would be quite complex. I think it would be something like this:
    a. Parameterize the IdeResult type by diagnostic type, instead of the current fixed FileDiagnostic.
    b. Similarly, parameterise the DiagnosticStore that lives in ShakeExtras by the diagnostics type
    c. as @wz1000 said: use ranges to drive code actions instead of the textual _message
  3. I believe Wingman also parses diagnostics to discover holes, /cc @isovector.

A simpler path for 2 is possible if we assume that tcRnModule is the only function that produces structured diagnostics. In that case you could store the diagnostics in the RuleResult Typecheck type, and retrieve them in the code action handler itself.

@adinapoli
Copy link
Contributor Author

Hello @pepeiborra , thanks for chiming in!

There isn't any fork of HLS for working with GHC head, but @fendor is working towards 9.2 compatibility. In case it's useful to you or @fendor a while ago I put together a cabal.project file with allow-newer stanzas for 9.2 here

Thanks, I'll take a look, although I think I came across something similar (if not the same cabal.project) during my initial scouting. However, lots has changed in GHC between 9.2 and HEAD.I go by memory and I might be wrong, but I think the biggest difference was merging the exactprint work that Alan did, which means that the exactprint library itself is now broken (I had to temporarily jerry-rig it by commenting out the bits which didn't compile, essentially 😅 ) and event @wz1000 's HieDB will need further refactoring as GHC internals also changed. To name one thing, the NameCacheUpdater is gone, and lots of functions were renamed/modified. I guess the TL;DR is that my initial impression is right, no magic fork alas, but hopefully we can get there at some point in the (nearish) future.

Upgrading ghcide to work with structural diagnostics would be quite complex. I think it would be something like this:

Yes, that matches roughly my superficial understanding as well!

I believe Wingman also parses diagnostics to discover holes, /cc @isovector.

In theory, and this applies to 3 as much as it does to 2, we could explore some hybrid solutions where the porting towards the new GHC API happens gradually, where initially we patch, say, only ghcide to use the structured diagnostics but we still rely on the parsing of the raw GHC output for the other components. To say it differently, ghcide could be the "pilot" component where we test out the ergonomics of the new GHC API, we learn some lessons and apply them back to GHC and, once the dust settled, we could port the rest.

A simpler path for 2 is possible if we assume that tcRnModule is the only function that produces structured diagnostics.

Hehe no, structured diagnostics are now pervasive. You will get them even if GHC throws an exception (like it likes to do in parseModule, for example). If you call handleSourceError, the handler would give you a SourceError which is now just a newtype wrapper around Messages GhcMessage.

@kk-hainq
Copy link

Our team would love to work on this!
Perhaps from a GHC HEAD staging branch for HLS?
We can start as soon as one of the maintainers is happy with the direction.

@pepeiborra
Copy link
Collaborator

I am happy to provide feedback and code reviews.

Preferably work on HEAD as much as possible, a branch should be only a last resort. @fendor is already working on 9.2 compatibility

@kk-hainq
Copy link

I am happy to provide feedback and code reviews.

Preferably work on HEAD as much as possible, a branch should be only a last resort. @fendor is already working on 9.2 compatibility

Sorry, that was a mouthful expression. I was asking if I should create a new HLS branch to stage the latest GHC HEAD support into HLS. This branch should still rebase on HLS's master.

@pepeiborra
Copy link
Collaborator

I am happy to provide feedback and code reviews.
Preferably work on HEAD as much as possible, a branch should be only a last resort. @fendor is already working on 9.2 compatibility

Sorry, that was a mouthful expression. I was asking if I should create a new HLS branch to stage the latest GHC HEAD support into HLS. This branch should still rebase on HLS's master.

Sorry I wasn't clear - my recommendation is to use compatibility modules (Development.IDE.GHC.Compat) or packages (ghc-api-compat) to conduct as much work as possible on the HEAD branch, avoiding a long-lived parallel branch as much as possible.

@kk-hainq
Copy link

kk-hainq commented Aug 14, 2021

Sorry I wasn't clear - my recommendation is to use compatibility modules (Development.IDE.GHC.Compat) or packages (ghc-api-compat) to conduct as much work as possible on the HEAD branch, avoiding a long-lived parallel branch as much as possible.

Oh, I see, that was my plan too. To target only affected packages, piece by piece!

@isovector
Copy link
Collaborator

I believe Wingman also parses diagnostics to discover holes, /cc @isovector.

Nah, I took the bold route and am SYBing the AST to find holes. #yolo

@goldfirere
Copy link

@kk-hainq Thanks so much for stepping up here! If there's anything on the GHC end that would be helpful to you, please ask. (I sometimes drown in GitHub etc notifications, so if need be, just send a direct email.)

@kk-hainq
Copy link

@kk-hainq Thanks so much for stepping up here! If there's anything on the GHC end that would be helpful to you, please ask. (I sometimes drown in GitHub etc notifications, so if need be, just send a direct email.)

Thanks! We've just booked https://gitlab.haskell.org/ghc/ghc/-/issues/20118 too. This whole error-as-structured-type project is awesome. Hope we can help complete it all the way from GHC to HLS, HLint, etc.

@pepeiborra
Copy link
Collaborator

What is the status of this? Would it be suitable for a GSoC project? /cc @jneira @jaspervdj

@dylan-thinnes
Copy link

Working on this for ZuriHac

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component: ghcide old_type: meta Planing and organizing other issues type: enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

8 participants