
Multiple sources of LSP messages on frontend and backend #184

Open
bollwyvl opened this issue Jan 28, 2020 · 9 comments

@bollwyvl
Collaborator

Elevator Pitch

Enable chains other than...

Feature ⇄ LSPConnection ⇄ WebSocket ⇄ Handler ⇄ Session ⇄ Stdio ⇄ Language Server 

...to complete LSP requests.

Motivation

We've done a lot of work to make the first roundtrip work. However, there are a number of sources of messages that would provide more flexibility and robustness, and in some cases end-user simplicity.

Design Ideas

the frontend language server

Feature ⇄ FrontendLSPConnection
  • what: many of the node-based language servers can actually be webpacked (with appropriate shims)
  • why: would let us minimize the install pain for a large swath of servers/users
  • how: create a FrontendLSPConnection extends ILSPConnection
    • each server would then be a packages/jupyterlab-language-server-xyz, with enough boilerplate to wrap/patch its guts (e.g. fs over contents REST API)

the tcp language server

Feature ⇄ LSPConnection ⇄ WebSocket ⇄ Handler ⇄ Session ⇄ Tcp ⇄ Language Server 
  • what: in playing with live development of a language server in-loop with pygls, i really wanted to just pop open a tcp port and add it to my language servers... but we only support stdio
  • why: some things are better over tcp. will probably need this eventually anyway for sourcegraph, docker-hosted things, etc.
  • how: create a TcpReader and Writer, add mode to the spec, allow a {{ port }} pass-through in the args
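A hypothetical spec entry under that scheme might look like the following. Note the `mode` key and the `{{ port }}` placeholder are proposals from this issue, not part of the current schema, and the server name and `-tcp` argument are purely illustrative (loosely modeled on the Wolfram server discussed below):

```json
{
  "wolfram-language-server": {
    "argv": ["wolfram-lsp", "-tcp={{ port }}"],
    "languages": ["wolfram"],
    "mode": "tcp",
    "port": null
  }
}
```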

the kernel language server

Feature ⇄ KernelLSPConnection ⇄ WebSocket ⇄ Kernel 
  • what: similar to the above, and as demonstrated over on lintotype, we can define a comm and let kernels stuff LSP into it
  • why: the kernels know a lot about... things. like crazy % syntax, and where real files are (except for the notebook you're working on 🤣 )
  • how: add a KernelLSPConnection implements ILSPConnection

the static language server

Feature ⇄ LSIFConnection ← DocRegistry ← REST ← ContentsManager
  • what: LSIF defines a spec for a JSON-based dump of a language server in a workspace
  • why: something like jyve or a lab-forward nbconvert would let you keep all the LSP features (hover, symbols, etc) in a static published setting
  • how: add an LSIFConnection implements ILSPConnection. add a button for "save workspace", and be able to open an LSIF as a new language server
@bollwyvl
Collaborator Author

Another thing here: a motivating case for this would be including (at least one) webpack-compiled language server by default, for the language that we all have to deal with: markdown. It should be wired up for markdown cells... and perhaps, optionally, in code block comments (though python folk are still rewarded for rst, but that looks like a bear to package).

Markdown will really stress the nested language (#191) work, as well... for example, de-facto Jupyter markdown includes support for LaTeX math. It would be interesting to see if there is a reasonable-to-embed language server for that sub-dialect... again, many of the options look hard to package, but perhaps one could theoretically compile digestif and lua to WASM (buh).

@The3DWizard

tcp language server: I tried to get jupyterlab-lsp running with the wolfram language server from @kenkangxgwe. Right now this wolfram language server only supports tcp connections and no stdio connection, although the author seems to be trying to support stdio connections in future releases. So +1 for the tcp language server support.

@bollwyvl
Collaborator Author

bollwyvl commented May 8, 2020 via email

Thanks for the heads-up on that. The big thing I want tcp for is being able to launch a pygls (kind of a DIY language server toolkit) server inside a running ipython kernel, to interactively develop new language servers. It's really not that big of a change; we basically need to:
  • subclass the Reader and Writer classes
  • pick the appropriate client (presumably there's already one)
  • handle any asynchrony
  • add a config flag
  • add some tests
  • release!
I don't imagine we'll test with the Wolfram one, but a number of open source servers support tcp, for sure. If we did get it working, we could at least add the detection logic for it!

@FlyingSamson

FlyingSamson commented Feb 11, 2021

Hey there, I am currently looking into this (for the Wolfram LS, actually). It might even be as easy as using socket.makefile() and reusing the existing stdio reader and writer.

However, could you be a little bit more specific about this part:

allow a {{ port }} pass-through in the args

I already added the mode=stdio/tcp flag in the spec, but I'm not sure what would be the correct way of specifying the port (and potentially the host). I mean for instance the Wolfram LS requires the argument -tcp=, so one could pick any free port for this and hard-code it into the args in the specs file. I presume you mean something like a placeholder which is then set by the extension during process execution/initialization? So for instance we would reserve the tag {{port}} and in the LanguageServerSession.init_process() we would use re.sub() to replace it with the appropriate port?

Of course we also need to pass the port into the extension such that it can open the TCP connection when creating the reader and writer (and potentially fill in the placeholder in the args), and this is where I'm stuck. Where would I put this and how would I access it? Also in the specs? Or should there be some magic which will search for a free port and use that one instead? What if the LS does not allow changing the port (don't know if that would be the case for any LS)?

@bollwyvl
Copy link
Collaborator Author

For my specific use case of interactively developing a language server locally

I'd want this in-loop with pygls on ipython. The process would already be running, and would already have selected a port in my spec. This is fine, but probably only a very specific testing case would have such a hard-coded port, or the port would be selected and hard-coded at test start time.

For the case of a local TCP LS which has its entire lifecycle managed by the server extension

  • if it does have a hard-coded port, you'd just put that in argv and pray... but we're unlikely to autodetect or test anything that works like that
  • if it can accept a port (either as an arg or environment variable), it would need a placeholder
    • picking a random open port is not a huge deal (we already have it in our acceptance tests someplace) and is more reliable than hardcoding something
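That random-open-port pick can be sketched with the standard bind-to-port-0 trick; the helper name here is illustrative, not the project's actual test utility:

```python
# Illustrative port-picking helper: bind to port 0 so the OS assigns an
# unused ephemeral port, then release it for the language server to claim.
import socket

def get_unused_port() -> int:
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(("127.0.0.1", 0))
        return sock.getsockname()[1]
```

There is a small race between releasing the port and the server binding it, but in practice this is far more reliable than hard-coding a port in the spec.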

For a remote connection... well, that's a whole other kettle of wax. I don't know how you'd get away with anything other than a hard-coded host/port.

mode=stdio/tcp flag in the spec

as in: changed the schema? we'd need to be very careful here, as that's the contract between the frontend and backend. Further, it should be possible to add this to the v2 schema, as it's only adding some properties, so they can't be required. I'm imagining it would be something like:

properties:
  mode:
    description: the connection technique. if absent, assumes stdio
    type: string
    enum:
      - stdio
      - tcp
  port:
    description: the port for tcp mode connections. a null value will select a random, unused port
    default: null
    oneOf:
      - type: "null"
      - type: integer
  host:
    description: the host for tcp mode connections. a null value will assume 127.0.0.1
    oneOf:
      - type: "null"
      - type: string

in some other places in the stack, unix sockets are also supported, but i don't know if any language servers support that.

socket.makefile

Perhaps be wary of going down this road, as the docs say "The socket must be in blocking mode." Figuring out how to do it with a non-blocking socket would be the high road, and more likely to work on more platforms. The anyio upstream, now a dependency of jupyter_server, would probably be the right place to look. As long as we're in-loop with the rest of the server, and the socket might take... basically any time at all, there is a very real danger of blocking everything (kernels, contents, etc.), which is a very unsatisfying experience.

re.sub() to replace it with the appropriate port?

Probably use either .format, or jinja2 (looks like what i was thinking), since it's lying around. But yes, an individual shell token would have some content replaced. For prior art, see jupyter-server-proxy, which we used to rely on... but it has some other characteristics we didn't want.
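As a sketch of the str.format route (the helper is illustrative, not an existing API; note .format understands single-brace `{port}` placeholders, while the `{{ port }}` spelling above would instead suit jinja2):

```python
# Illustrative helper: expand placeholders in a spec's argv with str.format.
# Each shell token has its {token} placeholders replaced before launch.
def expand_argv(argv, **tokens):
    return [token.format(**tokens) for token in argv]

# expand_argv(["wolfram-lsp", "-tcp={port}"], port=8888)
# -> ["wolfram-lsp", "-tcp=8888"]
```

One caveat of .format: any literal braces in a real argv entry would need escaping (`{{`/`}}`), and a placeholder with no supplied value raises KeyError.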

@FlyingSamson

FlyingSamson commented Feb 12, 2021

Thank you very much for your pointers, they are greatly appreciated!

as in: changed the schema? we'd need to be very careful here, as that's the contract between the frontend and backend. Further, it should be possible to add this to the v2 schema, as it's only adding some properties, so they can't be required. I'm imagining it would be something like:

Sorry, I don't really understand what schema v2 vs. v1 is. Currently I added

"mode": {
  "description": "connection mode used, e.g. stdio, tcp",
  "title": "Mode",
  "type": "string"
},

inside `partial-language-server-spec`'s `properties` within `python_packages/jupyter_lsp/jupyter_lsp/schema/schema.json` on the master branch. Should I use a different branch as a base, containing such a .yaml-based schema file?

Perhaps be wary of going down this road, as the docs say "The socket must be in blocking mode." Figuring out how to do it with a non-blocking socket would be the high road, and more likely to work on more platforms. The anyio upstream, now a dependency of jupyter_server, would probably be the right place to look. As long as we're in-loop with the rest of the server, and the socket might take... basically any time at all, there is a very real danger of blocking everything (kernels, contents, etc.), which is a very unsatisfying experience.

I see. Too bad, it would have been too easy this way ;-) I already had a look at the TCP connection methods within asyncio before I found the makefile routine. I will see whether I can get a connection up and running with that, then.
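A minimal sketch of what that asyncio route might look like, assuming stdlib asyncio streams and standard LSP Content-Length framing (the function names are illustrative, not the extension's actual Reader/Writer classes):

```python
# Illustrative only: LSP Content-Length framing over non-blocking
# asyncio streams, avoiding the blocking-socket caveat of socket.makefile().
import asyncio
import json

def frame(payload: dict) -> bytes:
    """Encode a JSON-RPC payload with the LSP Content-Length header."""
    body = json.dumps(payload).encode("utf-8")
    return b"Content-Length: %d\r\n\r\n" % len(body) + body

async def read_message(reader: asyncio.StreamReader) -> dict:
    """Read one Content-Length-framed message from the stream."""
    length = 0
    while True:
        line = await reader.readline()
        if line in (b"", b"\r\n"):  # blank line ends the headers
            break
        name, _, value = line.decode("ascii").partition(":")
        if name.strip().lower() == "content-length":
            length = int(value.strip())
    return json.loads(await reader.readexactly(length))

async def connect_tcp(host: str, port: int):
    """Open a non-blocking (reader, writer) pair to a TCP language server."""
    return await asyncio.open_connection(host, port)
```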

Probably use either .format, or jinja2 (looks like what i was thinking), since it's lying around. But yes, an individual shell token would have some content replaced. For prior art, see jupyter-server-proxy, which we used to rely on... but it has some other characteristics we didn't want.

Thanks for clarifying. I will have a look at jinja2 and the way free port detection was done in the acceptance test.

@bollwyvl
Collaborator Author

containing such a .yaml-based schema file?

That .json is the correct file; it's just easier for me to freehand valid yaml... but we don't want to wade into the yaml dependency mess for one file. Maybe we'll all use TOML some day or something, if that makes it into every standard library (and grows some better tooling support)... Julia, Python and Rust can't all be wrong, surely? Until then, JSON it is, for all its warts.

The version property somewhere in there lets us ensure that if we change something drastic about the spec then the spec will not load, as starting a language server incorrectly is probably worse than it just not starting (but generating copious logs). We bumped to version 2 when we gave up on language being the main key to identify language servers, and instead do everything based on the implementation name.

As long as all of our specs, and any open source ones we know of in the wild (countable on one hand after a trip to the ER on an exciting insert national holiday here 🎆), still validate, it can stay 2. If mode, port, host, etc. change the game a lot, then it will need to be a 3, i guess. If proprietary forks/languages we don't know about can't update that one file, then their users will have to pin, but we really try to keep that part backwards compatible.

TCP connection methods within asyncio

Ah, anyio is a different beast than asyncio... asyncio, while in the standard lib (and very close to the tornado code one sees in many places across the jupyter landscape), is only one of several loop implementations. anyio is a loop-independent wrapper and has its own socket wrapper. More importantly, and the reason it's in jupyter_server, is that it has robust async file I/O... indeed, it could well be that we should move our existing subprocess stuff to their wrapper, as that has proven notoriously tricky to make work, especially on windows... language servers are not the most stable/well-behaved/spec-conforming things around.

look at jinja2

well, i guess jinja has the danger of folk getting too fancy... if what needs to be done can be done with .format, that's probably sufficient (and i believe what jupyter-server-proxy does).

port detection was done in the acceptance test.

sorry, I was a bit strapped for time earlier. It's implemented here and is pretty straightforward. It would just live in some util file or something, as I don't think it's part of the non-test portion of jupyter_server.

@FlyingSamson

If mode, port, host, etc. change the game a lot, then it will need to be a 3, i guess. If proprietary forks/languages we don't know about can't update that one file, then their users will have to pin, but we really try to keep that part backwards compatible.

All right. I guess this shouldn't be too difficult: just make stdio the default value for mode, as suggested in your yaml snippet above, and ignore port and host if mode != tcp.

indeed, it could well be we should move our existing subprocess stuff to their wrapper, as that has proven notoriously tricky to make work, especially on windows... language servers are not the most stable/well-behaved/spec-conforming things around.

Ok, then I will see whether I can move the process creation stuff to anyio first. After a first glance at that method, it seems there is no env parameter as there is for Popen, so I guess we would have to manipulate os.environ directly before calling open_process(). On unix the opened process seems to inherit the new environment; I hope this will also be true for other OSes.
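A sketch of that os.environ manipulation, assuming (per the above) that a child spawned inside the block inherits the interpreter's environment; the context manager is illustrative, not an existing helper:

```python
# Illustrative sketch: apply environment overrides around a process launch
# (e.g. around anyio.open_process, said above to lack Popen's env=),
# restoring the original values afterwards so the server process is unaffected.
import os
from contextlib import contextmanager

@contextmanager
def temporary_env(**updates):
    saved = {key: os.environ.get(key) for key in updates}
    os.environ.update(updates)
    try:
        yield  # spawn the language server here; it inherits os.environ
    finally:
        for key, old in saved.items():
            if old is None:
                os.environ.pop(key, None)
            else:
                os.environ[key] = old
```

Note this mutates process-global state, so it would need care (e.g. a lock) if two language servers could be launched concurrently.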

@bollwyvl
Collaborator Author

Wanted to point out that https://github.com/jtpio/jupyterlite is looking like the most compelling way forward for static hosting of a lab experience. Having LSP features in a static (set of) document(s) would be killer.

There's an issue (https://github.com/jtpio/jupyterlite/issues/40) for offering bespoke routes on the "server", comms and files (e.g. LSIF).

I'm kind of inclined to favor targeting (ha!) the comms (rather than faking our home-rolled websockets), as then the browser kernels could offer both the jupyter messages as well as LSP, which would get around them having to coordinate their language versions, filesystems, etc.
