Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enabling loose unmarshaling of requests #31

Closed
radeksimko opened this issue Nov 17, 2020 · 11 comments · Fixed by #32
Closed

Enabling loose unmarshaling of requests #31

radeksimko opened this issue Nov 17, 2020 · 11 comments · Fixed by #32
Assignees
Labels
enhancement New feature or request

Comments

@radeksimko
Copy link
Contributor

radeksimko commented Nov 17, 2020

I just wanted to reopen the discussion originally started in #5 as it is still a source of friction for us in hashicorp/terraform-ls and the only reason we keep using a fork of this library of go-lsp - it worked mostly well for us otherwise! 👍

Here is what happened since closure of that PR:

  • I tried generating custom unmarshalers for go-lsp Generate missing custom unmarshalers sourcegraph/go-lsp#8
  • That work made me realize this approach doesn't work very well with embedded structs (parent structs inherit unmarshalers of the embedded ones), so this would require some more significant overhaul of the library to avoid embedding or otherwise work around that
  • Maintainer of go-lsp suggested that it would be great for gopls folks to expose their structs (currently at internal/lsp/protocol)
  • (via Gopher Slack) gopls maintainers expressed their hesitation to externalizing internal/lsp/protocol at this point due to their experience with the generator based on TypeScript implementation of the spec. Breaking changes are very common there, unfortunately.

All of the above, but most importantly the conversation with gopls folks is what made me realize that LSP really needs to be treated as a moving target and something that is constantly changing. If this library intends to support LSP as one of the common use cases, then I think it should reflect this fact and strict unmarshalling perhaps should not be the default behaviour.

Based on the conversations in #5 I understand it's not easy to solve this problem in a backwards-compatible way, but I hope the context above helps in understanding why it should be solved.

@creachadair creachadair self-assigned this Nov 17, 2020
@creachadair creachadair added the enhancement New feature or request label Nov 17, 2020
@creachadair
Copy link
Owner

Your reasoning makes sense. I had another idea as to how we might tackle this problem, and would be interested in your thoughts.

As I recall, the crux of the problem previously (cf. #5) was how to plumb strictness information in through the handler package, which I suppose you are probably using. Obviously one can bypass that using a custom UnmarshalJSON method as you mentioned (the jrpc2.NonStrict helper is meant to make this a little easier to write). But it sounds like that approach is not serving you well.

It occurred to me that we could introduce a new interface to control strictness, without requiring a fully-custom decoder. For example, suppose we define:

package jrpc2

// A Stricter reports whether its receiver should be unmarshaled strictly (disallowing unknown fields).
// N.B. Name is provisional.
type Stricter interface {
  Strict() bool
}

and then modify the decoding logic to check for this, e.g.,

if s, ok := v.(Stricter); ok && s.Strict() {
        dec.DisallowUnknownFields()
}

With this formulation, a type that does not implement the interface gets default behaviour. We can discuss which the default should be, though I suspect you'd prefer non-strict as the default, which is why I wrote the example that way. In this construction, a caller that wants strict unmarshaling can simply define:

type T struct { /* … */ }

func (T) Strict() bool { return true }

Regardless of which direction is the default, the nice thing about this formulation is that it addresses the issue you raised with nested types: If the top-level type is non-strict, the whole tree will be; and vice versa.

@creachadair
Copy link
Owner

Omitted from my previous comment are various bikeshed issues, including:

  1. Maybe the interface method can be completely void; its presence alone could be enough.
  2. Maybe there should be one interface for each direction, so the default can change without breaking stuff.
  3. Names and defaults should be settled.

creachadair added a commit that referenced this issue Nov 17, 2020
A possible solution for #31.
@radeksimko
Copy link
Contributor Author

radeksimko commented Nov 18, 2020

As I recall, the crux of the problem previously (cf. #5) was how to plumb strictness information in through the handler package

Yes, that matches my recollection.

I think the interface approach would make it possible to avoid the problem with embedded structs, but I would still wish for non-strict (loose) mode to become the default.

If at some point in the future there is a package similar to go-lsp which becomes "official" (either maintained by MSFT or by Gopls folks or by anyone else), then I reckon it would need to have Strict() method on every struct returning false (or NotStrict() depending on whether we go with (2) from your comment). This isn't too hard to achieve, especially if the rest of the code is already generated. The question is however if there would be interest in adding something like that to account for what is perceived as "default behavior" from LSP's perspective. I just think it would be great if it worked out of the box.

I think the question of default boils down to:

  • Is LSP common use case for consumers of this library?
  • Would it hurt the rest of the existing/new consumers if this wasn't strict?

I am obviously biased as someone using it just in the context of LSP, but you may be able to answer these more objectively.

@creachadair
Copy link
Owner

I think the interface approach would make it possible to avoid the problem with embedded structs, but I would still wish for non-strict (loose) mode to become the default.

I don't have any strong objections to "loose mode" being the default. It's a breaking API change, but not a very difficult one, and "loose" is the default for existing code that uses json.Unmarshal anyway (which, similarly, has no switch).

I am less clear on the case for making a "flippable" default. The specification doesn't stake a clear position on the handling of unknown fields, but there are some hints that tolerance is presumed (e.g., "Servers receiving a ClientCapabilities object literal with unknown properties should ignore these properties").

  • Is LSP common use case for consumers of this library?
  • Would it hurt the rest of the existing/new consumers if this wasn't strict?

You folks are the only other consumers of the package who have made themselves known to me, so from my perspective I'm the other main customer of this library. LSP is the only protocol I've seen in widespread use that is based on JSON-RPC, so I consider it an important use case.

I do not think it would cause any great harm for the default to be loose. I foresee there are probably two mental models that a developer might take:

  1. "This is JSON, in which anything goes, like json.Unmarshal".
  2. "This is RPC, in which the parameters should match the schema, and I shouldn't have to check manually."

I lean toward (2), but am sympathetic to (1).

@creachadair
Copy link
Owner

I created #32 as a possible solution for this issue, I welcome your comments there as well as here.

@radeksimko
Copy link
Contributor Author

The specification doesn't stake a clear position on the handling of unknown fields, but there are some hints that tolerance is presumed (e.g., "Servers receiving a ClientCapabilities object literal with unknown properties should ignore these properties").

AFAIK most implementations which have reached a certain level of maturity ignore unknown fields, just because the spec changes and new fields are added and the protocol does not have any version negotiation capabilities, so you have to expect that the client or server can speak any version of the protocol and you don't have any way of knowing what version it is.

I'm guessing one reason this might not seem like a topic for LSP maintainers is because the spec and the canonical implementation is written in TypeScript, where you'd probably more often just decode the data into an arbitrary object (similar to decoding to map[string]interface{} in Go) and attempt to access whatever is available? I admit I'm not a TypeScript developer, so that's just a guess. 🤷‍♂️

However I will raise this on the LSP GitHub repo to see if this can be documented more explicitly. I doubt they will say strict is preferred as that's practically impossible given the context I explained above.

@creachadair
Copy link
Owner

AFAIK most implementations which have reached a certain level of maturity ignore unknown fields, just because the spec changes and new fields are added and the protocol does not have any version negotiation capabilities, so you have to expect that the client or server can speak any version of the protocol and you don't have any way of knowing what version it is.

Yes, that seems consistent with what I've seen. So I think we're in agreement, that ignoring unknown fields makes sense as a default for LSP for the foreseeable future.

However I will raise this on the LSP GitHub repo to see if this can be documented more explicitly. I doubt they will say strict is preferred as that's practically impossible given the context I explained above.

I agree, that seems unlikely.

@radeksimko
Copy link
Contributor Author

As promised I opened an issue about this upstream: microsoft/language-server-protocol#1144

creachadair added a commit that referenced this issue Nov 18, 2020
Commit 78545e2 disallowed unknown struct fields when unmarshaling
parameters. The check could be bypassed by implementing the json.Unmarshaler
interface, but the need to do so causes friction for implementations of LSP,
which has a loose and rapidly-changing schema (see #5, #31).

This changes the default back to ignoring unknown fields, and adds a new
optional interface to re-enable strict checking without a custom unmarshaler.

N.B.: This is a breaking API change.

Relevant changes:

- Update UnmarshalParams and UnmarshalResult to check for the target having a
  DisallowUnknownFields method, and to enable the check for such targets.

- Rework the decoding to make the default path do less allocation.

- Update documentation and tests.
@creachadair
Copy link
Owner

I merged #32 and have tagged it as v0.11.0. Please let me know if it addresses your concerns!

@radeksimko
Copy link
Contributor Author

Just to follow up on this microsoft/language-server-protocol#1144 (comment)

I still believe we made the right pragmatic decision here. It's possible that a feature-based "stricter" unmarshaling can be implemented, but that would require somehow injecting the context with enabled/disabled features into custom unmarshalers, which I think would be very difficult with the current model with reflection where data is generally unmarshaled out of context. We'd probably have to avoid all the reflection and explicitly unmarshal each request inside the handler dynamically, which also means we would loose most of the useful compile-time checks.

Also even if we went down that route it would require modelling all these relationships in an LSP library somehow, presumably by hand, because I don't see how this could be expressed in the spec in a machine-readable way.

@creachadair
Copy link
Owner

I agree with your synopsis. The capability system advertises what fields are available, but I did not see any language that constrains what fields are sent. The capabilities allow the client or server to choose not to send certain fields, but the expectation of dynamism is baked fairly deeply into the protocol. That isn't too surprising—the protocol wasn't designed in a vacuum, but fell out of a JS/TS implementation that was generalized post hoc.

In that ecosystem, breaking API changes are commonplace and expected, and are worked around with dynamic poly-fills and other head-patching mechanisms. As you pointed out, that is harder to adapt to a statically-typed language without doing some heavyweight decoding shenanigans, or just decoding everything into unconstrained maps and validating separately.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants