OOM and Process Hang on Windows 10 in LanguageServerWrapperTest #1103
Comments
I finally managed to reproduce the OOM exception on my Windows 10 machine, but I had to change the test to make it run longer. I guess my Windows 10 machine is much better at counting to 10_000_000 than yours :->. As you mentioned, it is hard to track what is going on because of the async nature of things, but I'll see what I can find.
I switched to a Ryzen CPU some time ago. Maybe it behaves somewhat differently, making the underlying issue more visible.
This issue now also affects most of the Jenkins builds; see https://ci.eclipse.org/lsp4e/job/lsp4e-github/job/main/75/console
My analysis shows that LanguageServerWrapperTest.testStopAndActivate() leaks threads executing ConcurrentMessageProcessor.run() because the thread started in MockConnectionProviderMultiRootFolders.start() is never shut down. This was already the case before #1044. The only thing that apparently changed with that PR is that the balance between the two loops in the test shifted slightly, so that the thread leak now causes actual problems.
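To illustrate the kind of leak described above, here is a minimal sketch. It does not use the LSP4E or LSP4J classes: `Provider` is a hypothetical stand-in whose `start()` spawns a listener thread that blocks waiting for messages, and whose `stop()` is the cleanup step the mock apparently lacked. All names are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.LinkedBlockingQueue;

class LeakSketch {

    /** Hypothetical stand-in for a connection provider; not the real mock. */
    static class Provider {
        private Thread listener;

        void start() {
            // The listener blocks forever on an empty queue, much like a
            // message-processing loop waiting for input that never arrives.
            listener = new Thread(() -> {
                try {
                    new LinkedBlockingQueue<String>().take();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, "mock-listener");
            listener.setDaemon(true); // keep the demo from blocking JVM exit
            listener.start();
        }

        void stop() {
            if (listener == null) {
                return;
            }
            listener.interrupt(); // unblocks take()
            try {
                listener.join();  // wait until the thread has really ended
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }

        boolean isRunning() {
            return listener != null && listener.isAlive();
        }
    }

    /** Runs start (and optionally stop) repeatedly; returns how many listeners are still alive. */
    static long aliveAfter(int cycles, boolean stopEachTime) {
        List<Provider> providers = new ArrayList<>();
        for (int i = 0; i < cycles; i++) {
            Provider p = new Provider();
            p.start();
            if (stopEachTime) {
                p.stop();
            }
            providers.add(p);
        }
        return providers.stream().filter(Provider::isRunning).count();
    }

    public static void main(String[] args) {
        System.out.println("alive without stop(): " + aliveAfter(20, false));
        System.out.println("alive with stop():    " + aliveAfter(20, true));
    }
}
```

Without the `stop()` call, every cycle leaves one more blocked thread behind. In a tight start/stop loop that count can plausibly grow into the thousands, which would match the thread dump attached to this issue.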
Hi @ava-fred, I see that the test was added as part of #691. It is true that it can be seen by simple code inspection that Do I understand correctly that the problem is that wrapper.stop(); does not wait for the process in the background to be finished, and then by calling I wonder if we can fix the test by waiting after |
Hi @rubenporras I'm afraid I did not express myself clearly. The primary issue is that the |
I think we can go the route of waiting a bit. It will not stress test race conditions as much as the test does now, but I think it would be a good compromise.
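One way to "wait a bit" without a fixed sleep is to poll a condition with an upper time bound. The helper below is only a sketch under that assumption: `waitFor` and `WaitSketch` are hypothetical names, not taken from the LSP4E test utilities.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

class WaitSketch {

    /**
     * Polls the condition until it holds or the timeout elapses.
     * Returns true if the condition became true in time, false otherwise.
     */
    static boolean waitFor(BooleanSupplier condition, long timeoutMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // give up: the loop is always bounded
            }
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }
}
```

A test could call something like `waitFor(() -> !provider.isRunning(), 5_000)` after stopping the wrapper, where `provider.isRunning()` is again a placeholder for whatever state the test can actually observe.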
I think I know why the test is still crashing even when I fix the |
In eclipse#1103, it was noted that LanguageServerWrapperTest#testStopAndActivate causes OOM errors. On investigation, it was discovered that there are multiple causes for this:
1) The MockConnectionProviderMultiRootFolder used in the test created a message processor thread in LSP4J on each call to start() but did not shut them down.
2) The termination of the start/stop loop in the test was wrong: the loop ran for as long as the VM running tests was alive.
3) The "already stopping" logic in LanguageServerWrapper#shutdown was wrong: while the shutdown of a LanguageServerWrapper was processed, further calls to shutdown were ignored.
4) There was a race between the initializationFuture of the LanguageServerWrapper and the future used for shutdown.
With this commit, the test is rewritten to avoid crashes and to properly test that calling stop / start repeatedly on LanguageServerWrapper does not leave any connection providers running. The LanguageServerWrapper class is refactored to make the test pass.
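Cause 3 in the commit message can be pictured with a small sketch. This is not the actual LanguageServerWrapper code, and the real fix may look quite different. It shows one common pattern: a caller that arrives while a shutdown is in progress gets the in-flight future back and can wait on it, instead of being silently ignored. `ShutdownSketch`, `pending`, and the injected `work` runnable are all assumptions made for the example.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

class ShutdownSketch {
    // Holds the future of the shutdown currently in progress, if any.
    private final AtomicReference<CompletableFuture<Void>> pending = new AtomicReference<>();
    private final Runnable work; // stands in for the real shutdown steps

    ShutdownSketch(Runnable work) {
        this.work = work;
    }

    CompletableFuture<Void> shutdown() {
        CompletableFuture<Void> mine = new CompletableFuture<>();
        if (!pending.compareAndSet(null, mine)) {
            // A shutdown is already running: hand out its future so the
            // caller can wait for it rather than returning early.
            CompletableFuture<Void> existing = pending.get();
            return existing != null ? existing : CompletableFuture.completedFuture(null);
        }
        CompletableFuture.runAsync(work).whenComplete((ignored, error) -> {
            pending.compareAndSet(mine, null); // allow a later start/stop cycle
            if (error != null) {
                mine.completeExceptionally(error);
            } else {
                mine.complete(null);
            }
        });
        return mine;
    }

    /** Demo: two overlapping calls share one future and the work runs once. */
    static boolean concurrentCallsShareOneShutdown() {
        CountDownLatch release = new CountDownLatch(1);
        AtomicInteger runs = new AtomicInteger();
        ShutdownSketch wrapper = new ShutdownSketch(() -> {
            runs.incrementAndGet();
            try {
                release.await(); // keep the shutdown "in progress"
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        CompletableFuture<Void> first = wrapper.shutdown();
        CompletableFuture<Void> second = wrapper.shutdown();
        boolean shared = first == second;
        release.countDown();
        first.join();
        return shared && runs.get() == 1;
    }
}
```

With this shape, a test can call `shutdown().join()` from any thread and know that the provider has really stopped before it starts the next cycle.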
The test should now run without problems.
I can confirm that the test works now. Thanks!
As of PR #1044 (@ava-fred), running the integration tests reproducibly results in OOMs and process hangs on at least Windows 10. I tried different JDKs, with the same effect. Reverting the commit from #1044 solves the issue.
The issue appears when LanguageServerWrapperTest.testStopAndActivate() is executed.
Running
mvn verify -Dtest=LanguageServerWrapperTest#testStopAndActivate
results in an OOM within a few seconds, and the process hangs. I am attaching a thread dump in which one can see that thousands of threads are created.
threaddump-1725956087712.txt
Because of the asynchronous nature of LSP, I am having difficulty tracking down the root cause.
Can anyone else reproduce that issue? @rubenporras @mickaelistria
Even though this shows up so prominently only in that one integration test, it probably has an underlying cause that may affect end users during normal usage.