100% CPU usage #2446

Open
tsvallender opened this issue Aug 15, 2024 · 39 comments

@tsvallender
Contributor

Description

I am fairly regularly (probably about once a day at a full-time job) having ruby-lsp hit 100% CPU and stay there until I restart the process. I’m using the latest ruby-lsp with the Rails extension, in Neovim. I’ve seen this behaviour in multiple Ruby versions but am mostly in 3.3.4 currently.

I’m more than happy to do some work trying to figure out what’s causing this, but I'm unsure where to start; any pointers appreciated.

@tsvallender tsvallender added the bug Something isn't working label Aug 15, 2024
@andyw8
Contributor

andyw8 commented Aug 15, 2024

@tsvallender if you are able to share the LSP logs from just before it causes the CPU to peak, then that may help us to diagnose the problem.

Also, are you on the latest release? (v0.17.13)

@tsvallender
Contributor Author

@tsvallender if you are able to share the LSP logs from just before it causes the CPU to peak, then that may help us to diagnose the problem.

Also, are you on the latest release? (v0.17.13)

There hasn't been anything extra in the logs, and yes latest release. A colleague suggested stracing the process next time it happens so will see if that suggests anything. Always the way it doesn't happen when you're waiting for it...

@Earlopain
Contributor

I've had a hang somewhat recently (like maybe 1 month ago) as well but strace didn't show a thing. Process just kept doing things even after vscode was gone. Seems like this happens relatively often for you. How about you try adding the following to the ruby-lsp script:

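# On SIGUSR1, dump the backtrace of every live thread to a log file.
Signal.trap("SIGUSR1") do
  output = +""
  Thread.list.each do |thr|
    output << ("-" * 40) << "\n"
    output << thr.inspect << "\n"
    output << thr.backtrace.join("\n") << "\n"
  end
  # Written relative to the server's working directory (the project root).
  File.write("ruby-lsp-trace.log", output)
end if Signal.list.include?("USR1") # USR1 is not available on Windows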

You can then do kill -USR1 pid where pid is somewhere in the output from ps -aux | grep ruby-lsp. I'm not sure how useful the output would actually be, it's just something I used elsewhere. This should create a file with backtraces in the root of the project as long as you are not on Windows.

This is all a bit awkward with the server auto-updating and all that, I wonder if this would be a generally useful addition for debugging things like this. There have been a few other issues with CPU pegging as their topic.

@tsvallender
Contributor Author

So it’s happened again, strace output is:

strace: Process 96120 attached with 5 threads
[pid 96259] wait4(96258,  <unfinished ...>
[pid 96256] futex(0x2d26bd40, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, FUTEX_BITSET_MATCH_ANY <unfinished ...>
[pid 96239] epoll_wait(4,  <unfinished ...>
[pid 96120] read(0,

which is nigh-on ungrokkable to me I’m afraid, is it of any use to anyone else?

I've had a hang somewhat recently (like maybe 1 month ago) as well but strace didn't show a thing. Process just kept doing things even after vscode was gone. Seems like this happens relatively often for you. How about you try adding the following to the ruby-lsp script:

Great idea, I’ve just added that and verified it’s working so will try it next time…

@tsvallender
Contributor Author

Okay, so with that bit of code added in, I get the below, which to me seems to be pointing at this loop becoming infinite?

----------------------------------------
#<Thread:0x00007f2b0aa2a9e0 run>
/home/tsv/.rbenv/versions/3.3.4/lib64/ruby/gems/3.3.0/gems/ruby-lsp-0.17.13/lib/ruby_lsp/base_server.rb:28:in `backtrace'
/home/tsv/.rbenv/versions/3.3.4/lib64/ruby/gems/3.3.0/gems/ruby-lsp-0.17.13/lib/ruby_lsp/base_server.rb:28:in `block (2 levels) in initialize'
/home/tsv/.rbenv/versions/3.3.4/lib64/ruby/gems/3.3.0/gems/ruby-lsp-0.17.13/lib/ruby_lsp/base_server.rb:25:in `each'
/home/tsv/.rbenv/versions/3.3.4/lib64/ruby/gems/3.3.0/gems/ruby-lsp-0.17.13/lib/ruby_lsp/base_server.rb:25:in `block in initialize'
/home/tsv/.rbenv/versions/3.3.4/lib64/ruby/gems/3.3.0/gems/language_server-protocol-3.17.0.3/lib/language_server/protocol/transport/io/reader.rb:16:in `gets'
/home/tsv/.rbenv/versions/3.3.4/lib64/ruby/gems/3.3.0/gems/language_server-protocol-3.17.0.3/lib/language_server/protocol/transport/io/reader.rb:16:in `read'
/home/tsv/.rbenv/versions/3.3.4/lib64/ruby/gems/3.3.0/gems/ruby-lsp-0.17.13/lib/ruby_lsp/base_server.rb:48:in `start'
/home/tsv/.rbenv/versions/3.3.4/lib64/ruby/gems/3.3.0/gems/sorbet-runtime-0.5.11525/lib/types/private/methods/_methods.rb:279:in `bind_call'
/home/tsv/.rbenv/versions/3.3.4/lib64/ruby/gems/3.3.0/gems/sorbet-runtime-0.5.11525/lib/types/private/methods/_methods.rb:279:in `block in _on_method_added'
/home/tsv/.rbenv/versions/3.3.4/lib64/ruby/gems/3.3.0/gems/ruby-lsp-0.17.13/exe/ruby-lsp:135:in `<top (required)>'
/home/tsv/.rbenv/versions/3.3.4/lib64/ruby/gems/3.3.0/bin/ruby-lsp:25:in `load'
/home/tsv/.rbenv/versions/3.3.4/lib64/ruby/gems/3.3.0/bin/ruby-lsp:25:in `<top (required)>'
/home/tsv/.rbenv/versions/3.3.4/lib64/ruby/gems/3.3.0/gems/bundler-2.5.5/lib/bundler/cli/exec.rb:58:in `load'
/home/tsv/.rbenv/versions/3.3.4/lib64/ruby/gems/3.3.0/gems/bundler-2.5.5/lib/bundler/cli/exec.rb:58:in `kernel_load'
/home/tsv/.rbenv/versions/3.3.4/lib64/ruby/gems/3.3.0/gems/bundler-2.5.5/lib/bundler/cli/exec.rb:23:in `run'
/home/tsv/.rbenv/versions/3.3.4/lib64/ruby/gems/3.3.0/gems/bundler-2.5.5/lib/bundler/cli.rb:451:in `exec'
/home/tsv/.rbenv/versions/3.3.4/lib64/ruby/gems/3.3.0/gems/bundler-2.5.5/lib/bundler/vendor/thor/lib/thor/command.rb:28:in `run'
/home/tsv/.rbenv/versions/3.3.4/lib64/ruby/gems/3.3.0/gems/bundler-2.5.5/lib/bundler/vendor/thor/lib/thor/invocation.rb:127:in `invoke_command'
/home/tsv/.rbenv/versions/3.3.4/lib64/ruby/gems/3.3.0/gems/bundler-2.5.5/lib/bundler/vendor/thor/lib/thor.rb:527:in `dispatch'
/home/tsv/.rbenv/versions/3.3.4/lib64/ruby/gems/3.3.0/gems/bundler-2.5.5/lib/bundler/cli.rb:34:in `dispatch'
/home/tsv/.rbenv/versions/3.3.4/lib64/ruby/gems/3.3.0/gems/bundler-2.5.5/lib/bundler/vendor/thor/lib/thor/base.rb:584:in `start'
/home/tsv/.rbenv/versions/3.3.4/lib64/ruby/gems/3.3.0/gems/bundler-2.5.5/lib/bundler/cli.rb:28:in `start'
/home/tsv/.rbenv/versions/3.3.4/lib64/ruby/gems/3.3.0/gems/bundler-2.5.5/exe/bundle:28:in `block in <top (required)>'
/home/tsv/.rbenv/versions/3.3.4/lib64/ruby/gems/3.3.0/gems/bundler-2.5.5/lib/bundler/friendly_errors.rb:117:in `with_friendly_errors'
/home/tsv/.rbenv/versions/3.3.4/lib64/ruby/gems/3.3.0/gems/bundler-2.5.5/exe/bundle:20:in `<top (required)>'
/home/tsv/.rbenv/versions/3.3.4/bin/bundle:25:in `load'
/home/tsv/.rbenv/versions/3.3.4/bin/bundle:25:in `<main>'
----------------------------------------
#<Thread:0x00007f2aee8307c8 /home/tsv/.rbenv/versions/3.3.4/lib64/ruby/gems/3.3.0/gems/ruby-lsp-0.17.13/lib/ruby_lsp/base_server.rb:124 run>
/home/tsv/.rbenv/versions/3.3.4/lib64/ruby/gems/3.3.0/gems/ruby-lsp-0.17.13/lib/ruby_lsp/document.rb:236:in `=='
/home/tsv/.rbenv/versions/3.3.4/lib64/ruby/gems/3.3.0/gems/ruby-lsp-0.17.13/lib/ruby_lsp/document.rb:236:in `=='
/home/tsv/.rbenv/versions/3.3.4/lib64/ruby/gems/3.3.0/gems/ruby-lsp-0.17.13/lib/ruby_lsp/document.rb:236:in `find_char_position'
/home/tsv/.rbenv/versions/3.3.4/lib64/ruby/gems/3.3.0/gems/ruby-lsp-0.17.13/lib/ruby_lsp/document.rb:114:in `locate_node'
/home/tsv/.rbenv/versions/3.3.4/lib64/ruby/gems/3.3.0/gems/ruby-lsp-0.17.13/lib/ruby_lsp/requests/signature_help.rb:54:in `initialize'
/home/tsv/.rbenv/versions/3.3.4/lib64/ruby/gems/3.3.0/gems/sorbet-runtime-0.5.11525/lib/types/private/abstract/declare.rb:37:in `new'
/home/tsv/.rbenv/versions/3.3.4/lib64/ruby/gems/3.3.0/gems/sorbet-runtime-0.5.11525/lib/types/private/abstract/declare.rb:37:in `block in declare_abstract'
/home/tsv/.rbenv/versions/3.3.4/lib64/ruby/gems/3.3.0/gems/ruby-lsp-0.17.13/lib/ruby_lsp/server.rb:638:in `text_document_signature_help'
/home/tsv/.rbenv/versions/3.3.4/lib64/ruby/gems/3.3.0/gems/ruby-lsp-0.17.13/lib/ruby_lsp/server.rb:68:in `process_message'
/home/tsv/.rbenv/versions/3.3.4/lib64/ruby/gems/3.3.0/gems/ruby-lsp-0.17.13/lib/ruby_lsp/base_server.rb:137:in `block in new_worker'
----------------------------------------
#<Thread:0x00007f2aee8afd20 /home/tsv/.rbenv/versions/3.3.4/lib64/ruby/gems/3.3.0/gems/ruby-lsp-0.17.13/lib/ruby_lsp/base_server.rb:33 sleep_forever>
<internal:thread_sync>:18:in `pop'
/home/tsv/.rbenv/versions/3.3.4/lib64/ruby/gems/3.3.0/gems/ruby-lsp-0.17.13/lib/ruby_lsp/base_server.rb:35:in `block in initialize'
----------------------------------------
#<Process::Waiter:0x00007f2aeeba6600 sleep>

@Earlopain
Contributor

Certainly looks like something that could happen there, though the only way I can think of is when edits are received out-of-order.

I coincidentally had another case today myself, though I failed to capture a stacktrace (oops). I'm going to add some logging to that method should we be running into the same thing.

@vinistock
Member

We received other reports of this and the thread is quite helpful. The referenced loop essentially turns a line + character position into the index that represents that position in the string we use to store the document's source code.
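
As a rough illustration of that conversion, here is a simplified sketch. It is not the gem's actual source: the method name `char_index` is hypothetical, and it assumes the character offset counts codepoints (no UTF-16 handling).

```ruby
# Simplified sketch, not ruby-lsp's actual implementation.
LINE_BREAK = 0x0A # codepoint of "\n"

# Hypothetical helper: returns the index into source.codepoints for a
# zero-based line/character pair.
def char_index(source, line:, character:)
  codepoints = source.codepoints
  pos = 0
  current_line = 0

  until current_line == line
    # Advance to the next line break, then step past it.
    pos += 1 until codepoints[pos] == LINE_BREAK
    pos += 1
    current_line += 1
  end

  pos + character
end

source = "class Foo\n  def bar\n  end\nend\n"
index = char_index(source, line: 1, character: 2)
puts source[index, 3] # prints "def"
```

Note that this sketch has no bounds check: if `line` is beyond the last line of the document, `codepoints[pos]` returns nil forever and the inner loop never finishes.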

I'm not sure how we wouldn't be able to find the requested line and end up in an infinite loop though. We process text edits under a mutex lock, exactly to avoid having a feature request like signature help landing in between a thread switch (and thus trying to execute in a partially updated document).

If anyone can reproduce this reliably, we need to understand what piece of code triggers the problem.

Alternatively, we can also raise from that method if we reached the end of the document without finding anything and then include the state of the document and the position we were looking for. That might help diagnose it.
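
A minimal sketch of what raising could look like, assuming a simplified scanner. The error class `PositionNotFoundError` and the method `char_index!` are hypothetical names for illustration, not part of ruby-lsp.

```ruby
LINE_BREAK = 0x0A # codepoint of "\n"

# Hypothetical error class for this sketch.
class PositionNotFoundError < StandardError; end

# Like a plain line/character scanner, but fails loudly instead of spinning
# when the requested line does not exist in the document.
def char_index!(source, line:, character:)
  codepoints = source.codepoints
  pos = 0
  current_line = 0

  until current_line == line
    until codepoints[pos] == LINE_BREAK
      if pos >= codepoints.length
        # Include the document state and the requested position in the error.
        raise PositionNotFoundError,
          "could not find line #{line}; reached end of document at index #{pos}: #{source.inspect}"
      end
      pos += 1
    end
    pos += 1
    current_line += 1
  end

  pos + character
end

begin
  char_index!("a\nb\n", line: 5, character: 0)
rescue PositionNotFoundError => e
  warn e.message
end
```

The request fails, but the error message carries enough state to tell whether the document or the requested position was wrong.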

@adam12
Contributor

adam12 commented Aug 28, 2024

I've been chasing this one for a few months now as well. An alternative method I've used to capture the stack trace reliably is the rbspy tool.

I've observed two things when this has triggered:

  1. Many times the ruby-lsp process has no ppid (parent). I am not sure this is connected tho.
  2. Frequently, the task eventually completes and CPU usage returns to normal, so it's not likely to be an infinite loop.

I theorized that we were thrashing the GC somehow but wasn't able to get any deeper. It's near impossible to trigger by hand or via ruby-lsp doctor. It would be nice to be able to get Vernier enabled in a reproduction to see what the GC activity is like in order to rule it out.

@denvermullets

if it's helpful, this triggers on my machine almost always now - to the point i've had to uninstall and i reinstall when i see an update is pushed - but if there are commands or anything that you would like me to run to help diagnose i am more than willing to help.

i'm on an intel mac still, if that matters. i uninstalled it because i just got tired of it melting my battery down so quickly with bash processes that run 100%.

@vinistock
Member

I think adding an explicit error message if the scanner fails to find the right position will help diagnose this. I'll put something up for this.

@denvermullets meanwhile, a few questions to try to understand what's going on:

  1. What editor are you using?
  2. What's the editor's encoding? UTF-8, UTF-16 or UTF-32?
  3. Which line breaks do you use? \n or \r\n?
  4. Do you notice this happening when doing something specific in your code? Like for example, does it always happen when you use multibyte characters like emojis or japanese characters?

@denvermullets

@denvermullets meanwhile, a few questions to try to understand what's going on:

  1. What editor are you using?
  2. What's the editor's encoding? UTF-8, UTF-16 or UTF-32?
  3. Which line breaks do you use? \n or \r\n?
  4. Do you notice this happening when doing something specific in your code? Like for example, does it always happen when you use multibyte characters like emojis or japanese characters?

sorry, missed this.

  1. VS Code
  2. UTF-8
  3. none, the only line breaks in the small repos i see it happening only have the default \n in the rails setup files
  4. i'm doing neither of those options

@Earlopain
Contributor

Earlopain commented Sep 17, 2024

I've just had this happen to me again and the window I had open at the time was a git diff, and that file is just boring utf8. I noticed this while benchmarking and the average went way up so I'm confident it happened then.

I unfortunately had no way to debug this since the server updated and I didn't put my things back in. strace only showed this:
futex(0x563c18caca04, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, FUTEX_BITSET_MATCH_ANY, same as above.

@Aesthetikx

This one happens to me essentially every day, neovim, M2 MacOS. I can help diagnose if someone points me in the right direction, no strace on MacOS so I can take a look with dtrace or equivalent perhaps.

@adam12
Contributor

adam12 commented Oct 3, 2024

I think it would be handy if anyone sees it again to capture a recent stacktrace. Looking at the stacktrace above, it seems a bit out of date.

@Aesthetikx

I think it would be handy if anyone sees it again to capture a recent stacktrace. Looking at the stacktrace above, it seems a bit out of date.

I can reliably reproduce this, my only issue is I only run MacOS and BSD at the moment, I'll see if I can get this going on a Linux box with strace. Maybe I'll see you in IRC later @adam12

@adam12
Contributor

adam12 commented Oct 3, 2024

my only issue is I only run MacOS and BSD at the moment

Yeah, Mac ruined strace/dtruss didn't they :( You can install rbspy via Homebrew and run it against a running process.

sudo rbspy record -p <pid>

I've found this mechanism the most reliable.

Maybe I'll see you in IRC later

I'm around 😎

@Aesthetikx

I'll see if I can get this with a more up to date ruby-lsp, although I'm having trouble tricking neovim into using the latest version for some reason.

Time since start: 19s. Press Ctrl+C to stop.
Summary of profiling data so far:
% self  % total  name
 71.23    71.23  == [c function] - (unknown)
 28.77   100.00  find_char_position - /Users/john/.rbenv/versions/3.1.6/lib/ruby/gems/3.1.0/gems/ruby-lsp-0.17.2/lib/ruby_lsp/document.rb:240
  0.00   100.00  text_document_completion - /Users/john/.rbenv/versions/3.1.6/lib/ruby/gems/3.1.0/gems/ruby-lsp-0.17.2/lib/ruby_lsp/server.rb:560
  0.00   100.00  process_message - /Users/john/.rbenv/versions/3.1.6/lib/ruby/gems/3.1.0/gems/ruby-lsp-0.17.2/lib/ruby_lsp/server.rb:89
  0.00   100.00  new [c function] - (unknown)
  0.00   100.00  initialize - /Users/john/.rbenv/versions/3.1.6/lib/ruby/gems/3.1.0/gems/ruby-lsp-0.17.2/lib/ruby_lsp/requests/completion.rb:105
  0.00   100.00  block in new_worker - /Users/john/.rbenv/versions/3.1.6/lib/ruby/gems/3.1.0/gems/ruby-lsp-0.17.2/lib/ruby_lsp/base_server.rb:130
  0.00   100.00  block in declare_abstract - /Users/john/.rbenv/versions/3.1.6/lib/ruby/gems/3.1.0/gems/sorbet-runtime-0.5.11592/lib/types/private/abstract/declare.rb:42

@adam12
Contributor

adam12 commented Oct 3, 2024

Another idea might be running with --debug and see if you can get rdbg to break into the process once it triggers the 100% CPU scenario. Then perhaps do a bit of poking around (like @source, @pos, and @encoding might be of value).

I have been running with --debug for a few weeks now, but I either haven't seen the 100% CPU scenario with any frequency or, the one time I did, rdbg didn't see the socket.

Here's my Lua config.

local lspconfig = require('lspconfig')
lspconfig.ruby_lsp.setup({
  cmd = { 'ruby-lsp', '--debug', },
})

You'll want rdbg -A to break into the process.

@vinistock
Member

I'm still puzzled by this, but if people are willing to help diagnose this, PR #2664 has a bit more machinery to help.

Essentially, if the scanner fails to find the requested position, then we raise and print to the output tab the document state and the requested position that caused the failure. This should hopefully help us understand if the document state got corrupted.

To test on that branch, I believe it would be enough to put the ruby-lsp gem in the Gemfile and point to it.

@tsvallender
Contributor Author

Possibly of note is that I changed machine recently, and haven't had this issue since, despite having the same Neovim setup. So one fix is to buy a new PC 😅

@Aesthetikx

I'll post another update if I see it, but I actually think updating to the latest ruby-lsp fixed this. I wasn't aware that my vim distribution vendored its own version of the gem, hence I was not on the latest version that was installed globally or in my Gemfiles.

@andyw8
Contributor

andyw8 commented Oct 8, 2024

@Aesthetikx which vim distribution is that?

@Aesthetikx

@Aesthetikx which vim distribution is that?

@andyw8

In this particular case LazyVim, which I do update regularly, I think however the issue is that it is not part of the LazyVim update screen, it is :Mason and then you can update the underlying LSP dependencies from there, which is why I was out of date. As for why it doesn't use a local version of ruby-lsp, who's to say?

@andyw8
Contributor

andyw8 commented Oct 11, 2024

I'm not familiar with LazyVim or Mason, but there is an example config here that might help:

https://shopify.github.io/ruby-lsp/editors.html#lazyvim-lsp

@miguno

miguno commented Oct 30, 2024

I have been sporadically running into this problem, too. I have not yet been able to reproduce it, however.

  • macOS 15.0.1 Sequoia on a 2021 MBP Pro
  • ruby-lsp 0.20.1
  • Ruby 3.3.5
  • Rails v8 (pre-release, directly from main)
$ nvim --version
NVIM v0.10.0
Build type: Release
LuaJIT 2.1.1713484068
Run "nvim -V1 -v" for more info

@denvermullets

denvermullets commented Oct 30, 2024

i spent a few hours this morning trying to resolve this and noticed that it's stuck on starting:

Image

i'm using asdf w/vs code and have been trying different settings to see if there's a connection there. i also tried updating my current project to ruby 3.3.5 (was on 3.3.0) and just updated my mac last night to the latest os.

i do not have this problem on my work computer which is an M1 vs my personal laptop which is an intel. unsure how that'd be related but just something i've noticed.

edit: i'm dumb i didn't realize i had to select the tab and then the dropdown in output to get the error so maybe my error isn't fit for this issue:

2024-10-30 11:27:36.183 [info] (myproject) Checking if chruby is available on the path with command: /bin/zsh -i -c 'chruby --version'
2024-10-30 11:27:39.990 [info] (myproject) Checking if rbenv is available on the path with command: /bin/zsh -i -c 'rbenv --version'
2024-10-30 11:27:43.711 [info] (myproject) Checking if rvm is available on the path with command: /bin/zsh -i -c 'rvm --version'
2024-10-30 11:27:47.378 [info] (myproject) Checking if asdf is available on the path with command: /bin/zsh -i -c 'asdf --version'
2024-10-30 11:27:51.247 [info] (myproject) Discovered version manager none
2024-10-30 11:27:51.247 [info] (myproject) Running command: `ruby -W0 -rjson -e 'STDERR.print("RUBY_LSP_ACTIVATION_SEPARATOR" + { env: ENV.to_h, yjit: !!defined?(RubyVM:: YJIT), version: RUBY_VERSION, gemPath: Gem.path }.to_json + "RUBY_LSP_ACTIVATION_SEPARATOR")'` in /Users/denvermullets/Development/pers/myproject using shell: /usr/local/bin/zsh

it never returns from this unless i force quit the bash process that's 100%'ing

@soda92

soda92 commented Nov 14, 2024

I'm getting a high CPU problem too, though I don't know if it's the same one.

On Windows 10 with Neovim + ruby-lsp.

When I fire up two Neovim instances editing Ruby code (so there are two ruby-lsp processes running), Ruby processes start consuming high CPU.

These processes keep running even after I exit Neovim.

They are also hard to kill; it seems they spawn each other.

vinistock added a commit that referenced this issue Nov 29, 2024
### Motivation

This is an attempt to handle and better understand #2446

We're seeing the server get stuck sometimes and I suspect that what's happening is the following:

1. A position request gets triggered, like completion, looking for a certain spot in the code
2. The code changes right as we start locating the position
3. Ruby switches threads and we process the text edit **before** finishing finding the specified position
4. Since now the document is in a different state, that position is no longer valid and it may lead to infinite loops

I propose we start raising when we fail to locate a position in the document and then we fail the request. This will hopefully help us better understand what's happening.

### Implementation

Started raising if the scanner position is already past the document range, which may happen if we modify the document exactly as we're searching for the position.

### Automated Tests

Added tests.
@nanafox

nanafox commented Dec 5, 2024

I'm not familiar with LazyVim or Mason, but there is an example config here that might help:

https://shopify.github.io/ruby-lsp/editors.html#lazyvim-lsp
Thanks for this @andyw8

I followed the recommendation and added the config. I'll keep monitoring the behavior now to see how this helps. But so far, it's not behaving as it used to (at least, for now). The only time I saw a spike was during the initial startup, after which CPU utilization went back to normal.

Image

@sathishmanohar
Contributor

sathishmanohar commented Dec 9, 2024

I ran across this issue today. I can replicate it like this. I have the below rails model file open

class Some < ApplicationRecord
  include SecureTokenId

  validates :content, presence: true
end

and then I paste in

class Some < ApplicationRecord
  include SecureTokenId
  
  STATES = %w[requested responded approved].freeze
  
  validates :state, inclusion: { in: STATES }
  validates :content, presence: true, if: :content_required?

  private

  def content_required?
    %w[responded approved].include?(state)
  end
end

Nothing happens just yet (I checked the CPU usage at this step and it is normal). As soon as I run :Neoformat to format the buffer using rubocop, ruby_lsp takes up 100% of one CPU thread.

More info

nvim --version
NVIM v0.10.2
Build type: RelWithDebInfo
LuaJIT 2.1.1731601260
Run "nvim -V1 -v" for more info

@andyw8
Contributor

andyw8 commented Dec 9, 2024

@sathishmanohar I'm not familiar with Neoformat, but it doesn't seem like it supports LSP tooling: sbdchd/neoformat#400

@jhawthorn

jhawthorn commented Dec 20, 2024

I've been seeing this occasionally and managed to catch one in gdb.

(gdb) bt 15
#0  rb_ident_hash (n=21) at hash.c:366
#1  0x00005a475d8cb9a0 in do_hash (key=0, tab=0x715f92fd28c0) at st.c:320
#2  rb_st_update (tab=tab@entry=0x715f92fd28c0, key=<optimized out>, key@entry=21, func=func@entry=0x5a475d78d6e0 <tbl_update_modify>, arg=arg@entry=140724160002272) at st.c:1447
#3  0x00005a475d78e84a in rb_hash_stlike_update (hash=124655301896360, key=21, func=0x5a475d78d6e0 <tbl_update_modify>, arg=140724160002272) at hash.c:1673
#4  rb_hash_stlike_update (hash=124655301896360, key=21, func=0x5a475d78d6e0 <tbl_update_modify>, arg=140724160002272) at hash.c:1661
#5  tbl_update (hash=hash@entry=124655301896360, key=21, func=func@entry=0x5a475d787590 <hash_aset_insert>, optional_arg=optional_arg@entry=9) at hash.c:1714
#6  0x00005a475d78e8f8 in rb_hash_aset (hash=124655301896360, key=<optimized out>, val=9) at hash.c:2946
#7  0x00005a475d900293 in exec_recursive (func=func@entry=0x5a475d7f04e0 <num_funcall_op_1>, obj=21, pairid=9, arg=arg@entry=140724160002720, outer=outer@entry=0, mid=<optimized out>) at thread.c:5238
#8  0x00005a475d910173 in rb_exec_recursive_paired (func=func@entry=0x5a475d7f04e0 <num_funcall_op_1>, obj=<optimized out>, paired_obj=<optimized out>, arg=arg@entry=140724160002720) at thread.c:5280
#9  0x00005a475d7f11e3 in num_funcall1 (x=<optimized out>, func=140, y=<optimized out>) at numeric.c:392
#10 num_equal (x=<optimized out>, y=<optimized out>) at numeric.c:1592
#11 fix_equal (x=<optimized out>, y=<optimized out>) at numeric.c:4617
#12 rb_int_equal (x=<optimized out>, y=<optimized out>) at numeric.c:4637
#13 0x00005a475d956682 in vm_call_cfunc_with_frame_ (ec=0x5a47808ae440, reg_cfp=0x715fac5fe948, calling=<optimized out>, argc=1, argv=0x715fac4ff620, stack_bottom=<optimized out>) at /home/jhawthorn/src/ruby-3.3.6/vm_insnhelper.c:3502
#14 0x00005a475d969600 in vm_sendish (method_explorer=<optimized out>, ec=<optimized out>, reg_cfp=<optimized out>, cd=<optimized out>, block_handler=<optimized out>) at /home/jhawthorn/src/ruby-3.3.6/vm_callinfo.h:403
(More stack frames follow...)

It's a bit hard to follow what it's running because this is an optimized build, but the obj=21, pairid=9 gives us a hint. This is extra odd because I think obj is an object (the tagged pointer representing the number 10) and pairid=9 is the object_id (technically rb_memory_id) of nil, then represented as a tagged pointer in C, so it's double tagged (nil -> 4 -> 9). So we're comparing 10 and nil 🤷‍♂

EDIT: Oh! and 10 is LINE_BREAK, ie. the ascii value of \n! So this all tracks.

The function we're running is ==.

I set a breakpoint in rb_aref_internal to try to validate the earlier theory that the nil comes from being out of bounds.

rb_ary_entry_internal (ary=<optimized out>, offset=7999447755) at internal/array.h:56
56          long len = RARRAY_LEN(ary);
(gdb) rp offset
FIXNUM: 3999723877

Seems like it!
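
To make the arithmetic above easier to check, here is a small sketch of the tagging scheme being described, assuming 64-bit CRuby where a small Integer n is stored as the C value (n << 1) | 1. The helper name `untag_fixnum` is hypothetical.

```ruby
# Hypothetical helper: reverse CRuby's fixnum tagging, (n << 1) | 1.
def untag_fixnum(value)
  raise ArgumentError, "not a tagged fixnum: #{value}" unless value.odd?

  value >> 1
end

puts untag_fixnum(21)            # obj=21 in the backtrace    => 10
puts untag_fixnum(9)             # pairid=9 in the backtrace  => 4
puts untag_fixnum(7_999_447_755) # offset in the array lookup => 3999723877
puts "\n".ord                    # the line break codepoint   => 10
```

So the comparison is between 10 (the line break codepoint) and the value read at index 3999723877, far past the end of any realistic document.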

@nanafox

nanafox commented Dec 25, 2024

I'm not familiar with LazyVim or Mason, but there is an example config here that might help:
https://shopify.github.io/ruby-lsp/editors.html#lazyvim-lsp
Thanks for this @andyw8

I followed the recommendation and added the config. I'll keep monitoring the behavior now to see how this helps. But so far, it's not behaving as it used to (at least, for now). The only time I saw a spike was during the initial startup, after which CPU utilization went back to normal.

Image

For those who use conform, here's what I did and saw a good change after my last comment. I am using the locally installed version of ruby_lsp and haven't installed the one from Mason. I realized that after I saved a file that triggered conform to format the ruby file, the ruby_lsp process increased in CPU utilization and stayed there.

After setting the config lsp_format = never on the ruby-specific files, I've seen no issues with the CPU spikes I used to see. I left multiple files that should've normally reproduced the issue but that didn't happen.

LSP suggestions and completions work as expected with the upside of my CPU not being bogged down as it used to. You could replicate a similar config in your setup and share the feedback. I will keep monitoring this and share whatever updates I get from using this setup over time but so far, it's been doing great.

Image

@sathishmanohar
Contributor

Installing ruby-lsp via Mason seems to be the issue for me. I did these things and now I don't seem to run into the issue.

  1. Uninstalled the ruby-lsp installed with Mason (Just type X when cursor is on the LSP)

  2. Added gem "ruby-lsp-rails" to development block of Gemfile and installed via bundle install

  3. Changed my Neovim LSP config as below

-- Enable mason.nvim
require("mason").setup()

-- Ruby LSP specific configuration
require('lspconfig').ruby_lsp.setup({
  mason = false,
  on_attach = on_attach,
  capabilities = require('cmp_nvim_lsp').default_capabilities(vim.lsp.protocol.make_client_capabilities()),
  cmd = { "bundle", "exec", "ruby-lsp" },

  flags = {
    debounce_text_changes = 150,
  }
})

@kaka-ruto

Interesting, thanks for sharing @sathishmanohar

vinistock added a commit that referenced this issue Jan 6, 2025
### Motivation

I'm hoping this is the solution for #2446

My hypothesis is that we're seeing a concurrency issue where threads switch in the middle of locating targets and then the document gets mutated. That would render the location we're currently searching for invalid, which can then lead to infinite loops.

### Implementation

My idea is to move our mutex into the global state so that we can use it in more places. Then we lock the mutex while locating targets to avoid having any document edits be applied in the middle.

**Note**: I experimented with passing the mutex down to the document instances, but it was a bit messier.
@vinistock
Member

Since all of the stack traces point to locating positions in documents, my suspicion is that this is a concurrency issue where the following scenario happens:

  1. We start locating a target position in the document for any feature (completion, document highlight, ...)
  2. The document gets modified (imagine someone typing fast, a bunch of sequential keypresses)
  3. The worker thread switches in the middle of locating the target and we pick up the document edit, which gets applied under a mutex
  4. We switch back to the thread that was locating the target, but the document is no longer in the same state as when we started, resulting in an infinite loop

If my understanding is correct, then I hope #2976 will do the trick.
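
The general idea of sharing one lock between edits and position lookups can be sketched as follows. This is an illustration under stated assumptions, not the code in #2976: `GlobalState`, `Document`, `replace` and `line_start` here are simplified stand-ins with hypothetical names and signatures.

```ruby
# Hypothetical stand-in: owns the single mutex shared by edits and lookups.
class GlobalState
  def initialize
    @mutex = Mutex.new
  end

  def synchronize(&block)
    @mutex.synchronize(&block)
  end
end

# Hypothetical stand-in for a document whose contents are mutated in place.
class Document
  LINE_BREAK = 0x0A

  def initialize(source, global_state)
    @codepoints = source.codepoints
    @global_state = global_state
  end

  # Edits take the shared lock, so they cannot land in the middle of a lookup.
  def replace(new_source)
    @global_state.synchronize { @codepoints.replace(new_source.codepoints) }
  end

  # Returns the index where the zero-based line starts, or nil if it is absent.
  def line_start(line)
    @global_state.synchronize do
      pos = 0
      line.times do
        pos += 1 while pos < @codepoints.length && @codepoints[pos] != LINE_BREAK
        return nil if pos >= @codepoints.length
        pos += 1
      end
      pos
    end
  end
end

state = GlobalState.new
doc = Document.new("a\nb\nc\n", state)

# One thread keeps shrinking and restoring the document while another looks up
# a line; each lookup sees either the short or the long version, never a mix.
editor = Thread.new { 100.times { |i| doc.replace(i.even? ? "x\n" : "a\nb\nc\n") } }
reader = Thread.new { 100.times { doc.line_start(2) } }
[editor, reader].each(&:join)
```

Ruby's Mutex is not reentrant, so in a design like this the locked sections must not call each other.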

@sathishmanohar
Contributor

In terms of documents getting modified, I ran across this issue more often when I pasted a bunch of text into the file (sometimes replacing the whole file structure). Just sharing so that this could also be considered as a possible scenario.

@vinistock
Member

@sathishmanohar thanks for sharing. From the perspective of the language server, insert, replace or delete are all considered edits, regardless of whether they were typed key by key or pasted.

If you're seeing this happen when pasting large parts of a document, then it's another piece of evidence in favour of my theory. If you manage to paste exactly as the language server is searching for a target position and the new version of the document is shorter than the previous state, the loop will never end.
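
A tiny demonstration of why the loop cannot end once the scan position is past the end of a shortened document. This is a sketch with assumed values, and the `max_steps` cap exists only so the demo terminates.

```ruby
LINE_BREAK = 0x0A # codepoint of "\n"

codepoints = "a\nb\n".codepoints
# Assume the scan position is already past the last line break, as it could be
# after the document shrank underneath the search.
pos = codepoints.length

steps = 0
max_steps = 1_000 # cap added only so this demo terminates
until codepoints[pos] == LINE_BREAK || steps == max_steps
  pos += 1
  steps += 1
end

# Indexing past the end of an Array returns nil, which never equals LINE_BREAK.
puts "gave up after #{steps} steps; codepoints[#{pos}] is #{codepoints[pos].inspect}"
```

Without the cap, the position keeps growing while every comparison is against nil, which is consistent with the huge array offset seen in the gdb session earlier in the thread.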

I'll ping this thread once the new release is out so that folks can report if they are still seeing the issue.

@miguno

miguno commented Jan 8, 2025

Does https://github.com/Shopify/ruby-lsp/releases/tag/v0.23.1 include the potential fix?

@vinistock
Member

I just cut v0.23.2 with what I hope is the fix. Please try it for some time and report back if you experience any issues.
