Slow startup on a cloud-hosted Hub #851

Open
fperez opened this issue Sep 10, 2022 · 7 comments
Labels
  • duplicate (This issue or pull request already exists)
  • env:jupyterhub (issues related to jupyterhub deployments)
  • help wanted (Extra attention is needed)
  • package:jupyter-lsp
  • schema (changes to contract between server, client, and extensions)

Comments

fperez commented Sep 10, 2022

Description

We (well, @yuvipanda and team did the hard work) installed the LSP tools on a cloud-hosted hub I use to teach a ~1,200-student course this term. Unfortunately, Yuvi and the team ran into performance issues; this commit describes the slow startup times.

In regular usage, I also noticed tab-completion being slower than usual, to the point where it interferes with the typing workflow (to actually save typing, tab completion needs to keep up with ~70 wpm, which means its latency budget is very, very tight).

Eventually we disabled it, since smooth performance of the hub is critical for us. Yuvi may have more details.

I'm sorry not to have more specifics, but I wanted to at least flag it here for the team. In the long run I hope we'll be able to deploy this in production everywhere, as I'd love to teach students with these enhanced capabilities.

Thanks for the great work, btw!

cc @ellisonbg

@ellisonbg

Thanks for opening this. Do you have any reason to believe that the performance issues are related to the number of users in the deployment, or do you think that detail is unrelated to the root cause?

krassowski (Member) commented Sep 10, 2022

This previous issue seems to be describing the same problem: #796.

I think that the commit description is right: we are checking for installed language servers, which takes time. It might take even longer on Windows due to the expansion of the search path introduced in #731 (not yet released).

@yuvipanda, did you try disabling auto-detection of language servers with the autodetect traitlet (docs) and specifying only the language servers relevant to your course via config files (I would guess a Python LSP server plus maybe Markdown, but probably not all three Python language servers and a dozen others)? The logic is in:

def _collect_language_servers(
    self, only_installed: bool
) -> KeyedLanguageServerSpecs:
    language_servers: KeyedLanguageServerSpecs = {}

    language_servers_from_config = dict(self._language_servers_from_config)
    language_servers_from_config.update(self.conf_d_language_servers)

    if self.autodetect:
        language_servers.update(
            self._autodetect_language_servers(only_installed=only_installed)
        )

    # restore config
    language_servers.update(language_servers_from_config)

    # coalesce the servers, allowing a user to opt-out by specifying `[]`
    return {
        key: spec for key, spec in language_servers.items() if spec.get("argv")
    }
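To illustrate the config-file route suggested above, a rough sketch of what a jupyter_server_config.py could contain; the pylsp entry is just an example, so adjust the argv, mime types, and server keys to whatever is actually installed in your image:

# in jupyter_server_config.py (sketch only; server entry is an example)
c.LanguageServerManager.autodetect = False  # skip the costly auto-detection
c.LanguageServerManager.language_servers = {
    # only the server(s) the course actually needs
    "pylsp": {
        "argv": ["pylsp"],  # assumes python-lsp-server is installed in the image
        "languages": ["python"],
        "version": 2,
        "mime_types": ["text/python", "text/x-ipython"],
    },
}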

You might want to open a PR changing the autodetect traitlet to Union([Bool, Set]); if a set is provided, only the servers listed in it would be autodetected, which would save you from adding custom configuration for them.
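Roughly, that traitlet change could look like the sketch below (illustrative only, not the actual jupyter-lsp implementation; the helper method is hypothetical and just shows how the autodetection loop might consult the new trait):

from traitlets import Bool, Set, Union
from traitlets.config import LoggingConfigurable


class LanguageServerManagerSketch(LoggingConfigurable):
    # True/False keeps today's behaviour; a set such as {"pylsp", "texlab"}
    # would restrict autodetection to the named spec finders.
    autodetect = Union(
        [Bool(), Set()],
        default_value=True,
        help="autodetect all servers (bool) or only the named ones (set)",
    ).tag(config=True)

    def _should_autodetect(self, ep_name: str) -> bool:
        # hypothetical helper the autodetection loop could call per entry point
        if isinstance(self.autodetect, bool):
            return self.autodetect
        return ep_name in self.autodetect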

The autodetection logic is in:

def _autodetect_language_servers(self, only_installed: bool):
    entry_points = {}

    try:
        entry_points = entrypoints.get_group_named(EP_SPEC_V1)
    except Exception:  # pragma: no cover
        self.log.exception("Failed to load entry_points")

    skipped_servers = []

    for ep_name, ep in entry_points.items():
        try:
            spec_finder = ep.load()  # type: SpecMaker
        except Exception as err:  # pragma: no cover
            self.log.warning(
                _("Failed to load language server spec finder `{}`: \n{}").format(
                    ep_name, err
                )
            )
            continue

        try:
            if only_installed:
                if hasattr(spec_finder, "is_installed"):
                    spec_finder_from_base = cast(SpecBase, spec_finder)
                    if not spec_finder_from_base.is_installed(self):
                        skipped_servers.append(ep.name)
                        continue
            specs = spec_finder(self) or {}
        except Exception as err:  # pragma: no cover
            self.log.warning(
                _(
                    "Failed to fetch commands from language server spec finder"
                    " `{}`:\n{}"
                ).format(ep.name, err)
            )
            traceback.print_exc()
            continue

        errors = list(LANGUAGE_SERVER_SPEC_MAP.iter_errors(specs))

        if errors:  # pragma: no cover
            self.log.warning(
                _(
                    "Failed to validate commands from language server spec finder"
                    " `{}`:\n{}"
                ).format(ep.name, errors)
            )
            continue

        for key, spec in specs.items():
            yield key, spec

    if skipped_servers:
        self.log.info(
            _("Skipped non-installed server(s): {}").format(
                ", ".join(skipped_servers)
            )
        )

While profiling would be needed, my educated guess is that:

  • a few percent is lost on entry points
  • much more is lost on server schema validation; good caching would largely solve that, though it may not help if you are booting from a fresh image (see the caching sketch after this list); maybe we could allow disabling the validation, but that should be flagged as dangerous, since it can lead to runtime errors if your specification gets modified later on
  • as much or more time is spent invoking the is_installed checks; edit: one could review the is_installed scripts to see if we can speed those up
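For illustration, the caching from the list above could be as simple as persisting a per-spec marker keyed by a hash of its contents; this is a sketch only, and the cache location, keying, and integration point are arbitrary assumptions:

import hashlib
import json
from pathlib import Path

# hypothetical on-disk record of "this spec already passed schema validation"
CACHE_DIR = Path.home() / ".cache" / "jupyter-lsp-spec-validation"


def spec_digest(spec: dict) -> str:
    # stable key: a modified spec gets a new digest and is re-validated
    return hashlib.sha256(json.dumps(spec, sort_keys=True).encode()).hexdigest()


def already_validated(spec: dict) -> bool:
    return (CACHE_DIR / spec_digest(spec)).exists()


def mark_validated(spec: dict) -> None:
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    (CACHE_DIR / spec_digest(spec)).touch()

Keying on the spec contents means a modified spec is re-validated, which addresses the runtime-error concern above.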

Further, reworking these two lines could reduce the startup time penalty by up to 50% (probably less):

self.language_servers = self._collect_language_servers(only_installed=True)
self.all_language_servers = self._collect_language_servers(only_installed=False)

We could first find all servers and then filter that list down to the installed ones. Pull requests welcome :)
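A rough sketch of that rework (not project code; _spec_is_installed is a hypothetical helper that would wrap the per-server is_installed check shown above):

# sketch of the body of LanguageServerManager.initialize
def initialize(self) -> None:
    # one discovery pass instead of two
    all_servers = self._collect_language_servers(only_installed=False)
    self.all_language_servers = all_servers
    # derive the installed subset from the same result
    self.language_servers = {
        key: spec
        for key, spec in all_servers.items()
        if self._spec_is_installed(spec)  # hypothetical helper
    }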

I answered on the completer topic in #566.

@krassowski added the duplicate, help wanted, and package:jupyter-lsp labels Sep 10, 2022
@yuvipanda

@ellisonbg I don't think it's related to the number of users; they're fairly well isolated.

@krassowski we did not turn it off; I think that would have helped with startup time for sure! I'm not sure it would have helped with the user-facing lag, though. When we re-enable it, finding a way to constrain the servers it looks for seems the way to go. Alternatively, it would be nice if looking for servers didn't block jupyter_server startup!

@krassowski (Member)

> Alternatively, it would be nice if looking for servers didn't block jupyter_server startup!

Oh yes, that would be ideal. We should make it async, but from a quick skim it would be a potentially breaking change for the next major version, since frontends do not know that they need to wait for initialisation, and we only retry the relevant endpoint twice, 10 seconds apart (so it should work on most machines, but no guarantees):

this._retries = options.retries || 2;
this._retriesInterval = options.retriesInterval || 10000;

And we would need to communicate spec-collection errors on the client side too.

@bollwyvl (Collaborator)

Thanks for the feedback 😍!

Thanks for reporting this rather than forking, or just badmouthing it elsewhere. As you note, a lot of folks have a lot of opinions on how this stuff should work for their specific use case, and for the most part we've had to optimize for the single-user, multiple-launch experience. I've found "just" and "need" come up very frequently; they have become some of my least favorite words in working on FOSS.

There are a ton of things at a managed hub level that would make this experience more fun, especially #184. I'd really love more sourcegraph-like features to work out of the box... but more about that later.

At any rate, lazy-loading servers would move this cost from server startup to (probably) the first opening of any document, since we wouldn't know in advance which mimetypes have servers.

Some thoughts:

  • all of the discovery and loading currently happens synchronously in LanguageServerManager.initialize
    • this could be deferred until the event loop has started (but before it is first requested), and then be pushed to a thread, as most of those steps are blocking (see the sketch after this list)
    • then LanguageServersHandler.get could return a new version: 3 response which adds is_complete and logs
      • on the happy path, a client could "unblock" if the server it needs is ready when first requested
      • the "sad path" of not knowing would wait until every discovery technique had failed
  • slow autocomplete is almost entirely driven by the language server in question
    • for some language servers there are tricks that can be moved to the image build or a postBuild equivalent
      • for jedi-based language servers, this script prepopulates the underlying cache (see the warm-up sketch at the end of this comment)
        • this cache is not really in a documented format, and is not appropriate for caching/packaging anyway
    • longer term, another fix is to build/package offline responses (Multiple sources of LSP messages on frontend and backend #184)
      • this would be a whole separate set of executable configurations, as these almost never seem to be integrated directly into language servers
      • sadly, for Python, the sourcegraph one has been deprecated in favor of... their new "standard", the protobuf-based SCIP (cue XKCD link)
        • I cannot recommend adding this dependency to the Jupyter stack, unless we move the whole shooting match there 😝
        • however, this kind of mirrors the whole experience I've had with the "community-based" LSP experience
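A minimal sketch of that deferral (illustrative only, not a finished design; the class, the thread hand-off, and the version: 3 payload shape are all assumptions): discovery runs off the event loop, and a handler could report an is_complete flag.

import asyncio


class DeferredDiscovery:
    """Illustrative only: defer blocking server discovery off the event loop."""

    def __init__(self, manager):
        self.manager = manager  # a LanguageServerManager-like object (see code above)
        self.is_complete = False

    async def run(self) -> None:
        loop = asyncio.get_running_loop()
        # push blocking entry-point loading and is_installed checks to a thread
        servers = await loop.run_in_executor(
            None, self.manager._collect_language_servers, True
        )
        self.manager.language_servers = servers
        self.is_complete = True

    def status_payload(self) -> dict:
        # shape a hypothetical `version: 3` response could take
        return {"version": 3, "is_complete": self.is_complete}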

At any rate, I'll take a look...
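And the jedi warm-up mentioned in the list above could be a postBuild-style script as small as this; jedi.preload_module is jedi's documented eager-parsing hook, but the module list is just an example for a data-science image:

import jedi

# parse and cache commonly used libraries at image-build time rather than on
# the first completion request
jedi.preload_module("numpy", "pandas", "matplotlib")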

@bollwyvl added the env:jupyterhub label Sep 11, 2022
@krassowski added the schema label Sep 11, 2022
bollwyvl added a commit to bollwyvl/jupyterlab-lsp that referenced this issue Dec 11, 2022
@krassowski (Member)

@fperez @yuvipanda we released jupyterlab-lsp 4.0 which includes #882. Would you like to give it another try and see if the startup time is satisfactory now?

fperez (Author) commented Mar 21, 2023

I just did so last night, @krassowski! So far it's looking good, many thanks; we'll keep you posted. It feels a lot better in my lightweight testing (and I just added your suggested TeX support with texlab/tectonic, which I'm falling in love with!).
