x/tools/gopls: poor performance and crash on google-cloud-go #60089
Editor and computer became pretty unusable post-crash. Downgrading for now to make things better again. We recently added a go.work to this repository, which I think may make gopls try to do too much extra work. On the previous version (0.11.x) I noticed things were better since I did not need to open the editor at sub-module roots -- but the perf is starting to make me think it is not worth the tradeoff of having a go.work file. I think this is a great repo to test out gopls/go.work in because it has so many modules and so much code :) |
Thanks Cody. It looks like the critical top portion of your crash is cut off. Do you still have it? |
Apologies, I assumed select-all would have selected all of it 😿 I no longer have it. I will see if I can reproduce it later today if I have a couple of minutes. |
Tried for a couple of minutes to make it crash again but was unable to. The experience on the prerelease version does seem noticeably slower than 0.11.0 for this repo though.
If there is nothing actionable from the report, feel free to close; I don't think I can keep the version installed to try to reproduce, though, as it affects my daily dev too much. More info on the type of machine I am on: |
@codyoss if it's too slow for your daily dev, that's actionable. I'll take a look. |
Well, it's not that bad for me (I suspect due to the Linux filesystem). Using coc.nvim, it performs about the same as v0.11.0 and uses ~1/6th of the memory. I am sure that there is a problem here; it just doesn't yet reproduce in my environment. However, I grabbed a profile while editing this repo interactively, and the results are astonishing: So, the good news is that there is some very low-hanging fruit here. The bad news is that I think this fruit exists in v0.11.0 as well, and I don't yet understand the regression. I think making gopls work well on this repository could be a good exercise before the second prerelease, as it has so many workspace modules. Moving this issue to the Go issue tracker, since it is (probably) a gopls issue. CC @adonovan |
Change https://go.dev/cl/494096 mentions this issue: |
Change https://go.dev/cl/494097 mentions this issue: |
The CLs above set up some benchmarks for this repo, and then fix the low-hanging fruit mentioned above, reducing the time it takes to diagnose the workspace from ~1.5s to ~300ms. I'll note that the initial workspace load on my machine went from 2 minutes @ v0.11.0 (!!) to 17s @ v0.12.0-pre.1, and the in-use memory went from 6-7GB to 1GB. Gopls should perform significantly better overall on v0.12.0-pre.1, and in fact this is what I observe on my Linux laptop. I am curious to test more on Mac + VS Code. I am very suspicious of system calls: I observed that even on Linux, 20%+ of CPU was spent in system calls related to the file cache. I expect this could be much worse on darwin. As an experiment, I put a 100MB in-memory LRU cache in front of our file-based cache, in https://go.dev/cl/494100, which significantly reduces the system call overhead. I'll test this on darwin tomorrow. |
That does seem like an awful lot, for Linux. We shouldn't be making that many filecache system calls in total, and there should be very little contention in flock (and whatever contention there is should be minimally CPU-bound: all it does is walk a very short linked list before going to sleep). It would be interesting to run gopls under perf in system mode with kernel symbols and see what shows up. (We could also compare against a rename(2)-based implementation of the cache, since it's atomic on Linux, but it really shouldn't be necessary to maintain a second implementation.) For macOS, I could totally believe it because of #13841, a kernel bug which causes a frustrating profile skew towards system calls that do minimal CPU work (like open(2)). |
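As an illustration of the rename-based alternative mentioned above (not gopls's actual implementation), an atomic cache write on Linux could look like this sketch: write the value to a temporary file in the same directory, then rename it into place, so no flock-style lock file is needed.

```go
package filecache

import (
	"os"
	"path/filepath"
)

// writeEntry atomically installs data at path by writing to a temporary file
// in the same directory and renaming it into place. rename(2) is atomic on
// Linux, so readers observe either the old contents or the new contents,
// never a partial write.
func writeEntry(path string, data []byte) error {
	tmp, err := os.CreateTemp(filepath.Dir(path), "entry-*.tmp")
	if err != nil {
		return err
	}
	defer os.Remove(tmp.Name()) // best-effort cleanup if the rename never happens
	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), path)
}
```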
This repo is large, and exercises use-cases with many workspace modules. For golang/go#60089 Change-Id: I93b9c99ce52bd7ae25f65fdb9bf0568c25375411 Reviewed-on: https://go-review.googlesource.com/c/tools/+/494096 Reviewed-by: Alan Donovan <adonovan@google.com> gopls-CI: kokoro <noreply+kokoro@google.com> Run-TryBot: Robert Findley <rfindley@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>
Optimize checking for ignored files to avoid unnecessary checks, and only build prefixes once. Along the way, fix a bug where path segments were not handled correctly in the ignore check. Encapsulate the check to make this easy to test. As a result, the DiagnoseChange/google-cloud-go benchmark improved ~5x from ~1.5s to 300ms. Also remove span.Dir, which tended to lead to unnecessary filepath->span->filepath conversions. Inline it in the one place where it was correct. For golang/go#60089 Change-Id: Id24d05b504b43e6a6d9b77b5b578583e1351de31 Reviewed-on: https://go-review.googlesource.com/c/tools/+/494097 Reviewed-by: Alan Donovan <adonovan@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Robert Findley <rfindley@google.com> gopls-CI: kokoro <noreply+kokoro@google.com>
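A minimal sketch of the idea behind this CL's ignore-check optimization: build the ignored-directory prefixes once, and make the match respect path-segment boundaries so that "/repo/foobar" is not treated as being under "/repo/foo". The type and function names here are illustrative, not the actual gopls code.

```go
package workspace

import (
	"path/filepath"
	"strings"
)

// ignoreFilter reports whether a path lies under one of a fixed set of
// ignored directories. The prefixes are computed once at construction,
// rather than on every query.
type ignoreFilter struct {
	prefixes []string // cleaned directory paths, no trailing separator
}

func newIgnoreFilter(dirs []string) *ignoreFilter {
	f := &ignoreFilter{prefixes: make([]string, 0, len(dirs))}
	for _, d := range dirs {
		f.prefixes = append(f.prefixes, filepath.Clean(d))
	}
	return f
}

func (f *ignoreFilter) ignored(path string) bool {
	path = filepath.Clean(path)
	for _, prefix := range f.prefixes {
		if strings.HasPrefix(path, prefix) {
			// Only count a match that ends on a path-segment boundary.
			if len(path) == len(prefix) || path[len(prefix)] == filepath.Separator {
				return true
			}
		}
	}
	return false
}
```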
@codyoss I have a few more thoughts about this, after reflecting on your description a bit more:
With the optimizations above, it would be great if you could try again using gopls@master, being sure not to leave the gopls logs open in the output window. It really should be faster than v0.11.0, and if it's not I'd love to jump on a VC to pair debug. To install gopls@master, just check out go.googlesource.com/tools, |
Building from HEAD I saw little to no issue. I did have a minute or two where my editor, VS Code, would not jump to refs or show things again, but after a period of time it started working again. gopls CPU usage does seem down quite a bit as well. In the past I consistently noticed it as the top process, but no longer. I will keep this version installed as it seems to be working well for my use cases. Thanks |
Thanks for testing again! I'm going to close this issue as done for now, but please comment if you encounter problems again, and I'll reopen.
gopls will use a lot of CPU when building its indexes, particularly at startup but also after changes. However, such periods should be brief, and overall CPU usage should be down. It may be the case that gopls so successfully saturates CPUs that your computer freezes. In that case, please let us know. We may need to tweak gopls to use less parallelism. |
@findleyr I did have another crash from my build of HEAD. The issue is that the stack trace is too long for VS Code to capture in the output, so the important beginning bit is lost -- my terminal captured about 315k lines and that was not enough. Any suggestions about how to get more info? Does gopls write a log anywhere or anything that would be useful to share? |
Actually, on a subsequent startup I got another that was small enough: https://gist.github.com/codyoss/d21403933fe6c1f5802d3bf831424bd3 |
Thanks Cody. Huh, this looks like a runtime issue. Could someone from @golang/runtime take a look at Cody's crash? Cody: could you confirm that you're still on Go 1.20.3? |
Perhaps I spoke too soon: as @adonovan points out, there are ~4000 threads in our file cache blocked on opening a locked file, which is exhausting the file limit. So this does look like a gopls bug. We have ideas for how to fix this. |
Change https://go.dev/cl/495055 mentions this issue: |
This change causes Get to be subject to the same concurrency-limiting semaphore just added to Set, as a user (predictably) encountered thread exhaustion in this operation too. Oddly, the Get benchmark doesn't seem to be limited by the value of 128, or even smaller values. Updates golang/go#60089 Change-Id: Ie4632cbc4cdd6536558b2067a3d115cf4fa17fae Reviewed-on: https://go-review.googlesource.com/c/tools/+/495055 Run-TryBot: Alan Donovan <adonovan@google.com> gopls-CI: kokoro <noreply+kokoro@google.com> Reviewed-by: Robert Findley <rfindley@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>
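The concurrency-limiting idea in this CL can be sketched with a buffered channel acting as a counting semaphore; the limit of 128 and the filename helper below are illustrative placeholders, not the actual filecache code.

```go
package filecache

import (
	"fmt"
	"os"
)

// ioLimit bounds the number of goroutines performing cache I/O at once.
// Without it, thousands of goroutines blocked on locked or slow files can
// each pin an OS thread and exhaust the process's thread limit.
var ioLimit = make(chan struct{}, 128)

// get reads the cached value for key, holding an ioLimit slot for the
// duration of the file I/O.
func get(kind string, key [32]byte) ([]byte, error) {
	ioLimit <- struct{}{}        // acquire a slot
	defer func() { <-ioLimit }() // release it when the I/O completes
	return os.ReadFile(filename(kind, key))
}

// filename is a stand-in for the real cache-path computation.
func filename(kind string, key [32]byte) string {
	return fmt.Sprintf("%s/%x", kind, key)
}
```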
Change https://go.dev/cl/494100 mentions this issue: |
Put a 100MB in-memory LRU cache in front of the filecache, to reduce I/O for repeated cache access, such as we observe with workspace diagnostics or implements queries. Updates golang/go#60089 For golang/go#57987 Change-Id: I01a1fcca7dcf26416d4cdb578a7a2674765c9f08 Reviewed-on: https://go-review.googlesource.com/c/tools/+/494100 Run-TryBot: Robert Findley <rfindley@google.com> Reviewed-by: Alan Donovan <adonovan@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> gopls-CI: kokoro <noreply+kokoro@google.com>
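A minimal, illustrative sketch of fronting a file-based cache with a byte-bounded in-memory LRU, as this CL describes; the real gopls cache is more involved, and all names here are assumptions.

```go
package filecache

import (
	"container/list"
	"sync"
)

// memLRU caches recently used values in memory so that repeated reads of
// the same entry avoid open/read/close system calls against the slower
// file-based cache behind it.
type memLRU struct {
	mu    sync.Mutex
	max   int        // budget for total value bytes
	size  int        // current total value bytes
	order *list.List // most recently used entries at the front
	items map[[32]byte]*list.Element
}

type lruEntry struct {
	key [32]byte
	val []byte
}

func newMemLRU(maxBytes int) *memLRU {
	return &memLRU{max: maxBytes, order: list.New(), items: make(map[[32]byte]*list.Element)}
}

func (c *memLRU) get(key [32]byte) ([]byte, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if el, ok := c.items[key]; ok {
		c.order.MoveToFront(el)
		return el.Value.(*lruEntry).val, true
	}
	return nil, false
}

func (c *memLRU) set(key [32]byte, val []byte) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if el, ok := c.items[key]; ok {
		e := el.Value.(*lruEntry)
		c.size += len(val) - len(e.val)
		e.val = val
		c.order.MoveToFront(el)
	} else {
		c.items[key] = c.order.PushFront(&lruEntry{key, val})
		c.size += len(val)
	}
	// Evict least recently used entries until within budget.
	for c.size > c.max && c.order.Len() > 1 {
		el := c.order.Back()
		e := el.Value.(*lruEntry)
		c.order.Remove(el)
		delete(c.items, e.key)
		c.size -= len(e.val)
	}
}
```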
I still do believe there are perf issues with HEAD gopls + google-cloud-go's large codebase + using go.work -- at least on my machine. I reverted today to v0.11.0; gopls does idle at much higher memory usage (~12GB), but after a period of time everything seems to stabilize perf-wise in my editor. On the latest gopls I still have consistent freeze-ups where basic operations like searching the whole repo take a considerable amount of time to perform; it feels like gopls is indexing symbols or something and that is causing other operations not to complete. I think for now the solution for us is going to have to be removing our workspace files, as they seem to tax gopls too much. I would like to eventually restore these files because I do see a benefit of having them for other operations like syncing deps between all of our various modules, but I don't think it is currently worth the productivity hit in VS Code. I would be curious if other large workspace repos have experienced the same issues, but I don't personally know of other projects as large as google-cloud-go that are trying to use this feature. |
That's really concerning to hear. You should absolutely not need to downgrade to v0.11.0, nor should you need to remove the go.work file. We'll keep digging into this.
Can you clarify what you mean by "searching the whole repo"? Do you mean symbol search, or something not related to gopls, like textual search? |
Discussing with @codyoss, we believe this is related to our handling of go mod tidy. Historically, gopls has conservatively re-run go mod tidy when files are saved (because, before overlay support, saved files could theoretically change the tidy result). This runs for each go.mod file in the workspace, and eagerly parses Go files in the module to analyze the tidy result. Mix in the VS Code setting that auto-saves files after a delay, gopls@v0.12.0's more aggressive eviction of parsed files, and the new google-cloud-go go.work file with 145 modules, and you have a recipe for disaster. Fortunately, this is yet again easy to fix... |
Change https://go.dev/cl/496186 mentions this issue: |
Change https://go.dev/cl/496435 mentions this issue: |
Change https://go.dev/cl/496439 mentions this issue: |
Change https://go.dev/cl/496442 mentions this issue: |
As we discovered while investigating golang/go#60089, mod tidy operations can significantly affect the amount of time it takes gopls to diagnose a saved file. Add a benchmark for this operation. For reference, this new benchmark takes 8s+ on google-cloud-go, vs 300ms for DiagnoseChange (without the save). Updates golang/go#60089 Change-Id: Ie88bd63dd7d205b8629173e7f84aa1aa9858016b Reviewed-on: https://go-review.googlesource.com/c/tools/+/496435 TryBot-Result: Gopher Robot <gobot@golang.org> gopls-CI: kokoro <noreply+kokoro@google.com> Run-TryBot: Robert Findley <rfindley@google.com> Reviewed-by: Alan Donovan <adonovan@google.com>
Change https://go.dev/cl/496595 mentions this issue: |
Run mod tidy diagnostics in parallel, and don't parse files to determine missing imports if there are no missing requires. BenchmarkDiagnoseSave: 8s->1.8s For golang/go#60089 Change-Id: I5d41827914e4eb9264b16ed14af323c017eb327c Reviewed-on: https://go-review.googlesource.com/c/tools/+/496439 TryBot-Result: Gopher Robot <gobot@golang.org> gopls-CI: kokoro <noreply+kokoro@google.com> Run-TryBot: Robert Findley <rfindley@google.com> Reviewed-by: Alan Donovan <adonovan@google.com>
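A rough sketch of running per-module tidy checks concurrently with errgroup, as this CL describes; the tidyDiagnostics helper and Diagnostic type are placeholders, not the gopls API.

```go
package modtidy

import (
	"context"
	"sync"

	"golang.org/x/sync/errgroup"
)

// Diagnostic is a placeholder for a diagnostic about a go.mod file.
type Diagnostic struct{ Message string }

// tidyDiagnostics is a placeholder for the per-module "go mod tidy" analysis.
func tidyDiagnostics(ctx context.Context, modFile string) ([]Diagnostic, error) {
	return nil, nil
}

// diagnoseAllModFiles analyzes every go.mod in the workspace concurrently
// instead of serially, which matters when a go.work file lists 145 modules.
func diagnoseAllModFiles(ctx context.Context, modFiles []string) (map[string][]Diagnostic, error) {
	var (
		mu      sync.Mutex
		results = make(map[string][]Diagnostic, len(modFiles))
	)
	g, ctx := errgroup.WithContext(ctx)
	for _, modFile := range modFiles {
		modFile := modFile // capture the loop variable (pre-Go 1.22)
		g.Go(func() error {
			diags, err := tidyDiagnostics(ctx, modFile)
			if err != nil {
				return err
			}
			mu.Lock()
			results[modFile] = diags
			mu.Unlock()
			return nil
		})
	}
	if err := g.Wait(); err != nil {
		return nil, err
	}
	return results, nil
}
```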
The overlays method showed up as a hot spot in the google-cloud-go repo (via reloadOrphanedFiles), because it walks all files to find the small number of overlays. We already have a filesMap abstraction; use it to keep track of overlays in a separate map. For golang/go#60089 Change-Id: I62c6c688d012beaa4b0f255225993da961cb9dad Reviewed-on: https://go-review.googlesource.com/c/tools/+/496442 Reviewed-by: Alan Donovan <adonovan@google.com> Run-TryBot: Robert Findley <rfindley@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> gopls-CI: kokoro <noreply+kokoro@google.com>
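A sketch of the idea in this CL, assuming a simplified file-store type (gopls's real filesMap is more involved): track overlays in their own map so that fetching them does not require walking every known file.

```go
package files

import "sync"

// store tracks all known files plus, separately, the overlays (unsaved
// editor buffers), so overlay lookups are O(number of overlays) rather
// than O(all files in the workspace).
type store struct {
	mu       sync.Mutex
	files    map[string][]byte // URI -> content for every known file
	overlays map[string][]byte // URI -> content for unsaved buffers only
}

func newStore() *store {
	return &store{
		files:    make(map[string][]byte),
		overlays: make(map[string][]byte),
	}
}

// set records the content for uri, marking it as an overlay if it has
// unsaved edits.
func (s *store) set(uri string, content []byte, isOverlay bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.files[uri] = content
	if isOverlay {
		s.overlays[uri] = content
	} else {
		delete(s.overlays, uri)
	}
}

// overlayURIs returns the URIs of all unsaved buffers without scanning
// the full file set.
func (s *store) overlayURIs() []string {
	s.mu.Lock()
	defer s.mu.Unlock()
	uris := make([]string, 0, len(s.overlays))
	for uri := range s.overlays {
		uris = append(uris, uri)
	}
	return uris
}
```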
For workspaces with a lot of modules, deleting every mod tidy handle on every save is too expensive. Approximate the correct behavior by deleting only the most relevant mod file. See the comments in the code for an explanation of why this is an approximation, and why it is probably acceptable. This decreases the DiagnoseSave benchmark for google-cloud-go to 550ms (from 1.8s). For golang/go#60089 Change-Id: I94bea0b00b13468f73f921db789292cfa2b8d3e9 Reviewed-on: https://go-review.googlesource.com/c/tools/+/496595 TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Robert Findley <rfindley@google.com> Reviewed-by: Alan Donovan <adonovan@google.com>
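The "only the most relevant mod file" approximation can be sketched as picking the go.mod whose directory most deeply encloses the saved file; this is illustrative, not the exact gopls heuristic.

```go
package modtidy

import (
	"path/filepath"
	"strings"
)

// nearestModFile returns the go.mod whose directory most closely encloses
// savedFile; on save, only that module's tidy result is invalidated instead
// of every module's in the workspace.
func nearestModFile(savedFile string, modFiles []string) string {
	var best string
	bestDepth := -1
	for _, mod := range modFiles {
		dir := filepath.Dir(mod)
		rel, err := filepath.Rel(dir, savedFile)
		if err != nil || strings.HasPrefix(rel, "..") {
			continue // savedFile is not inside this module
		}
		// Prefer the most deeply nested enclosing module directory.
		if depth := strings.Count(dir, string(filepath.Separator)); depth > bestDepth {
			best, bestDepth = mod, depth
		}
	}
	return best
}
```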
Change https://go.dev/cl/496835 mentions this issue: |
As discovered in golang/go#60089, file watching patterns behave very differently in different clients. We avoided a bad client-side bug in VS Code by splitting our subdirectory watch pattern, but this appears to be very expensive in other clients (notably coc.nvim, or any vim client that uses watchman). The subdirectory watch patterns were only known to be necessary for VS Code, due to microsoft/vscode#109754. Other clients work as expected when we watch e.g. **/*.go. For that reason, let's revert all other clients to just use simple watch patterns, and only specialize to have subdirectory watch patterns for VS Code. It's truly unfortunate to have to specialize in this way. To paper over this hole in the wall, add an internal setting that allows clients to configure this behavior explicitly. The new "subdirWatchPatterns" setting accepts the following values: - "on": request watch patterns for each subdirectory (as before) - "off": do not request subdirectory watch patterns - "auto": same as "on" for VS Code, "off" for all others, based on the provided initializeParams.clientInfo.Name. Includes some minor cleanup for the fake editor, and fixes some stale comments. Updates golang/go#60089 Fixes golang/go#59635 Change-Id: I1eab5c08790bd86a5910657169edcb20511c0280 Reviewed-on: https://go-review.googlesource.com/c/tools/+/496835 TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Alan Donovan <adonovan@google.com> gopls-CI: kokoro <noreply+kokoro@google.com> Run-TryBot: Robert Findley <rfindley@google.com> |
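A sketch of how a client-conditional watch-pattern choice like the one described in this CL might look; the setting values come from the description above, but the glob strings and the helper itself are assumptions rather than gopls's exact registration code.

```go
package lsp

import "strings"

// watchPatterns returns the file-watching globs to register with the client.
// Most clients get simple recursive patterns; VS Code additionally gets a
// per-subdirectory pattern to work around microsoft/vscode#109754.
func watchPatterns(subdirWatchPatterns, clientName string, subdirs []string) []string {
	patterns := []string{"**/*.go", "**/go.mod", "**/go.sum", "**/go.work"}

	useSubdirs := false
	switch subdirWatchPatterns {
	case "on":
		useSubdirs = true
	case "off":
		useSubdirs = false
	case "auto":
		useSubdirs = strings.Contains(clientName, "Visual Studio Code")
	}
	if useSubdirs {
		for _, dir := range subdirs {
			patterns = append(patterns, dir+"/**")
		}
	}
	return patterns
}
```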
gopls version: v0.12.0-pre.1 (go1.20.3)
gopls flags:
update flags: proxy
extension version: 0.38.0
go version: 1.20.3
environment: Visual Studio Code darwin
initialization error: undefined
issue timestamp: Tue, 09 May 2023 14:45:46 GMT
restart history:
Tue, 09 May 2023 14:43:43 GMT: activation (enabled: true)
ATTENTION: PLEASE PROVIDE THE DETAILS REQUESTED BELOW.
Describe what you observed.
gopls crashed while in google-cloud-go. I believe I was searching to open a file before this happened.
Please attach the stack trace from the crash.
A window with the error message should have popped up in the lower half of your screen.
Please copy the stack trace and error messages from that window and paste it in this issue.
https://gist.github.com/codyoss/f19310e0db7499861e7f3d92ad056625
Failed to auto-collect gopls trace: unrecognized crash pattern.
Also got the following message post-crash:
extension-output-golang.go-#3-gopls (server): tokenization, wrapping and folding have been turned off for this large file in order to reduce memory usage and avoid freezing or crashing.
and Connection to server got closed. Server will not be restarted.