Slow startup on a cloud-hosted Hub #851

Open
fperez opened this issue Sep 10, 2022 · 7 comments
Labels
  • duplicate (This issue or pull request already exists)
  • env:jupyterhub (issues related to jupyterhub deployments)
  • help wanted (Extra attention is needed)
  • package:jupyter-lsp
  • schema (changes to contract between server, client, and extensions)

Comments

fperez commented Sep 10, 2022

Description

We (well, @yuvipanda and team did the hard work) installed the LSP tools on a cloud-hosted hub I use to teach a ~1,200-student course this term. Unfortunately, Yuvi and the team ran into performance issues; this commit describes the slow startup times.

In regular usage, I also noticed tab-completion being slower than usual, to the point where it interferes with the typing workflow (to actually save typing, tab completion needs to keep up with ~70 wpm, which means its latency budget is very, very tight).

Eventually we disabled it, since smooth performance of the hub is critical for us. Yuvi may have more details.

I'm sorry not to have more specifics, but I wanted to at least flag it here for the team. In the long run I hope we'll be able to deploy this in production everywhere, as I'd love to teach students with these enhanced capabilities.

Thanks for the great work, btw!

cc @ellisonbg

@ellisonbg

Thanks for opening this. Do you have any reason to believe that the performance issues are related to the number of users in the deployment, or do you think that detail is unrelated to the root cause?

krassowski (Member) commented Sep 10, 2022

This previous issue seems to be describing the same problem: #796.

I think that the commit description is right: we are checking for installed language servers, which takes time. It might take even longer on Windows due to the expansion of the search path introduced in #731 (not yet released).

@yuvipanda, did you try disabling auto-detection of language servers with the autodetect traitlet (docs) and specifying only the language servers relevant to your course via config files (I would guess a Python LSP server plus maybe Markdown, but probably not all three Python language servers and a dozen others)? The logic is in:

def _collect_language_servers(
    self, only_installed: bool
) -> KeyedLanguageServerSpecs:
    language_servers: KeyedLanguageServerSpecs = {}

    language_servers_from_config = dict(self._language_servers_from_config)
    language_servers_from_config.update(self.conf_d_language_servers)

    if self.autodetect:
        language_servers.update(
            self._autodetect_language_servers(only_installed=only_installed)
        )

    # restore config
    language_servers.update(language_servers_from_config)

    # coalesce the servers, allowing a user to opt-out by specifying `[]`
    return {
        key: spec for key, spec in language_servers.items() if spec.get("argv")
    }
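To illustrate the config-file route suggested above, a rough sketch of what a jupyter_server_config.py could contain; the pylsp entry is just an example, so adjust the argv, mime types, and server keys to whatever is actually installed in your image:

# in jupyter_server_config.py (sketch only; server entry is an example)
c.LanguageServerManager.autodetect = False  # skip the costly auto-detection
c.LanguageServerManager.language_servers = {
    # only the server(s) the course actually needs
    "pylsp": {
        "argv": ["pylsp"],  # assumes python-lsp-server is installed in the image
        "languages": ["python"],
        "version": 2,
        "mime_types": ["text/python", "text/x-ipython"],
    },
}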

You might want to open a PR changing the autodetect traitlet to Union([Bool, Set]); if a set is provided, only the servers listed in it would be autodetected, which would save you from adding custom configuration for them.
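Roughly, that traitlet change could look like the sketch below (illustrative only, not the actual jupyter-lsp implementation; the helper method is hypothetical and just shows how the autodetection loop might consult the new trait):

from traitlets import Bool, Set, Union
from traitlets.config import LoggingConfigurable


class LanguageServerManagerSketch(LoggingConfigurable):
    # True/False keeps today's behaviour; a set such as {"pylsp", "texlab"}
    # would restrict autodetection to the named spec finders.
    autodetect = Union(
        [Bool(), Set()],
        default_value=True,
        help="autodetect all servers (bool) or only the named ones (set)",
    ).tag(config=True)

    def _should_autodetect(self, ep_name: str) -> bool:
        # hypothetical helper the autodetection loop could call per entry point
        if isinstance(self.autodetect, bool):
            return self.autodetect
        return ep_name in self.autodetect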

The autodetection logic is in:

def _autodetect_language_servers(self, only_installed: bool):
    entry_points = {}

    try:
        entry_points = entrypoints.get_group_named(EP_SPEC_V1)
    except Exception:  # pragma: no cover
        self.log.exception("Failed to load entry_points")

    skipped_servers = []

    for ep_name, ep in entry_points.items():
        try:
            spec_finder = ep.load()  # type: SpecMaker
        except Exception as err:  # pragma: no cover
            self.log.warning(
                _("Failed to load language server spec finder `{}`: \n{}").format(
                    ep_name, err
                )
            )
            continue

        try:
            if only_installed:
                if hasattr(spec_finder, "is_installed"):
                    spec_finder_from_base = cast(SpecBase, spec_finder)
                    if not spec_finder_from_base.is_installed(self):
                        skipped_servers.append(ep.name)
                        continue
            specs = spec_finder(self) or {}
        except Exception as err:  # pragma: no cover
            self.log.warning(
                _(
                    "Failed to fetch commands from language server spec finder"
                    " `{}`:\n{}"
                ).format(ep.name, err)
            )
            traceback.print_exc()
            continue

        errors = list(LANGUAGE_SERVER_SPEC_MAP.iter_errors(specs))

        if errors:  # pragma: no cover
            self.log.warning(
                _(
                    "Failed to validate commands from language server spec finder"
                    " `{}`:\n{}"
                ).format(ep.name, errors)
            )
            continue

        for key, spec in specs.items():
            yield key, spec

    if skipped_servers:
        self.log.info(
            _("Skipped non-installed server(s): {}").format(
                ", ".join(skipped_servers)
            )
        )

While profiling would be needed, my educated guess is that:

  • a few percent is lost on entry points
  • much more is lost on server schema validation; good caching would largely solve that, though it may not help if you are booting from a fresh image (see the caching sketch after this list); maybe we could allow disabling the validation, but that should be flagged as dangerous, since it can lead to runtime errors if your specification gets modified later on
  • as much or more time is spent invoking the is_installed checks; edit: one could review the is_installed scripts to see if we can speed those up
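For illustration, the caching from the list above could be as simple as persisting a per-spec marker keyed by a hash of its contents; this is a sketch only, and the cache location, keying, and integration point are arbitrary assumptions:

import hashlib
import json
from pathlib import Path

# hypothetical on-disk record of "this spec already passed schema validation"
CACHE_DIR = Path.home() / ".cache" / "jupyter-lsp-spec-validation"


def spec_digest(spec: dict) -> str:
    # stable key: a modified spec gets a new digest and is re-validated
    return hashlib.sha256(json.dumps(spec, sort_keys=True).encode()).hexdigest()


def already_validated(spec: dict) -> bool:
    return (CACHE_DIR / spec_digest(spec)).exists()


def mark_validated(spec: dict) -> None:
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    (CACHE_DIR / spec_digest(spec)).touch()

Keying on the spec contents means a modified spec is re-validated, which addresses the runtime-error concern above.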

Further, reworking these two lines could reduce the startup time penalty by up to 50% (probably less):

self.language_servers = self._collect_language_servers(only_installed=True)
self.all_language_servers = self._collect_language_servers(only_installed=False)

We could first find all servers and then filter that list down to the installed ones. Pull requests welcome :)
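A rough sketch of that rework (not project code; _spec_is_installed is a hypothetical helper that would wrap the per-server is_installed check shown above):

# sketch of the body of LanguageServerManager.initialize
def initialize(self) -> None:
    # one discovery pass instead of two
    all_servers = self._collect_language_servers(only_installed=False)
    self.all_language_servers = all_servers
    # derive the installed subset from the same result
    self.language_servers = {
        key: spec
        for key, spec in all_servers.items()
        if self._spec_is_installed(spec)  # hypothetical helper
    }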

I answered on the completer topic in #566.

@krassowski added the duplicate, help wanted, and package:jupyter-lsp labels Sep 10, 2022
@yuvipanda

@ellisonbg I don't think it's related to the number of users; they're fairly well isolated.

@krassowski we did not turn it off; I think that would have helped with startup time for sure! I'm not sure it would have helped with the user-facing lag, though. When we re-enable it, finding a way to constrain the servers it looks for seems the way to go. Alternatively, it would be nice if looking for servers didn't block jupyter_server startup!

@krassowski (Member)

> Alternatively, it would be nice if looking for servers didn't block jupyter_server startup!

Oh yes, that would be ideal. We should make it async, but from a quick skim it would be a potentially breaking change for the next major version, since frontends do not know that they need to wait for initialisation, and we only retry the relevant endpoint twice, 10 seconds apart (so it should work on most machines, but no guarantees):

this._retries = options.retries || 2;
this._retriesInterval = options.retriesInterval || 10000;

And we would need to communicate spec-collection errors on the client side too.

@bollwyvl (Collaborator)

Thanks for the feedback 😍!

Thanks for reporting this rather than forking, or just badmouthing it elsewhere. As you note, a lot of folks have a lot of opinions on how this stuff should work for their specific use case, and for the most part we've had to optimize for the single-user, multiple-launch experience. I've found "just" and "need" come up very frequently; they have become some of my least favorite words in working on FOSS.

There are a ton of things at a managed hub level that would make this experience more fun, especially #184. I'd really love more sourcegraph-like features to work out of the box... but more about that later.

At any rate, lazy-loading servers would move this cost from server startup to (probably) the first opening of any document, since we wouldn't know in advance which mimetypes have servers.

Some thoughts:

  • all of the discovery and loading currently happens synchronously in LanguageServerManager.initialize
    • this could be deferred until the event loop has started (but before it is first requested), and then be pushed to a thread, as most of those steps are blocking (see the sketch after this list)
    • then LanguageServersHandler.get could return a new version: 3 response which adds is_complete and logs
      • on the happy path, a client could "unblock" if the server it needs is ready when first requested
      • the "sad path" of not knowing would wait until every discovery technique had failed
  • slow autocomplete is almost entirely driven by the language server in question
    • for some language servers there are tricks that can be moved to the image build or a postBuild equivalent
      • for jedi-based language servers, this script prepopulates the underlying cache (see the warm-up sketch at the end of this comment)
        • this cache is not really in a documented format, and is not appropriate for caching/packaging anyway
    • longer term, another fix is to build/package offline responses (Multiple sources of LSP messages on frontend and backend #184)
      • this would be a whole separate set of executable configurations, as these almost never seem to be integrated directly into language servers
      • sadly, for Python, the sourcegraph one has been deprecated in favor of... their new "standard", the protobuf-based SCIP (cue XKCD link)
        • I cannot recommend adding this dependency to the Jupyter stack, unless we move the whole shooting match there 😝
        • however, this kind of mirrors the whole experience I've had with the "community-based" LSP experience
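A minimal sketch of that deferral (illustrative only, not a finished design; the class, the thread hand-off, and the version: 3 payload shape are all assumptions): discovery runs off the event loop, and a handler could report an is_complete flag.

import asyncio


class DeferredDiscovery:
    """Illustrative only: defer blocking server discovery off the event loop."""

    def __init__(self, manager):
        self.manager = manager  # a LanguageServerManager-like object (see code above)
        self.is_complete = False

    async def run(self) -> None:
        loop = asyncio.get_running_loop()
        # push blocking entry-point loading and is_installed checks to a thread
        servers = await loop.run_in_executor(
            None, self.manager._collect_language_servers, True
        )
        self.manager.language_servers = servers
        self.is_complete = True

    def status_payload(self) -> dict:
        # shape a hypothetical `version: 3` response could take
        return {"version": 3, "is_complete": self.is_complete}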

At any rate, I'll take a look...
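And the jedi warm-up mentioned in the list above could be a postBuild-style script as small as this; jedi.preload_module is jedi's documented eager-parsing hook, but the module list is just an example for a data-science image:

import jedi

# parse and cache commonly used libraries at image-build time rather than on
# the first completion request
jedi.preload_module("numpy", "pandas", "matplotlib")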

@bollwyvl added the env:jupyterhub label Sep 11, 2022
@krassowski added the schema label Sep 11, 2022
bollwyvl added a commit to bollwyvl/jupyterlab-lsp that referenced this issue Dec 11, 2022
@krassowski (Member)

@fperez @yuvipanda we released jupyterlab-lsp 4.0 which includes #882. Would you like to give it another try and see if the startup time is satisfactory now?

fperez (Author) commented Mar 21, 2023

I just did so last night, @krassowski! So far it's looking good, many thanks; we'll keep you posted. It feels a lot better in my lightweight testing (and I just added your suggested TeX support with texlab/tectonic, which I'm falling in love with!).
