-
Notifications
You must be signed in to change notification settings - Fork 2.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
LGTM tests are stalling (or are too slow for the timeout, I don't know) #45774
Comments
/cc @alesj @holly-cummins |
The Develocity history (https://ge.quarkus.io/scans/tests?search.timeZoneId=Europe%2FLondon&tests.container=io.quarkus.observability.test.LgtmResourcesIT&tests.test=testTracing#) is now showing this: The first build it fails in is the one which merges #45689. That strongly suggests my #45689 is to blame ... except that on #45689 I looked up the history of the failing test, and had this: That screencap shows it starts failing on the 17th. I didn't capture the previous URL to see why the two histories are different. |
We're actually waiting on Imho, the problem is we've added a few more tests - reload, Prometheus,
which we always sort of suspected it might be a problem -- startup time wise. Dunno what we can do? I do have a PR somewhere :-), which breaks this things in separate pieces, |
@holly-cummins I don't see how your PR could be the problem -- it just fixed the missing properties on the re-use. Imho, the 34eb48e fix -- to have proper container wait strategy -- is to blame. :-) Before we somehow got lucky for the tests to pass, even though we weren't properly waiting for LGTM to be fully operational. |
That's my feeling too. The re-use path isn't even used on the normal test execution (which is why we didn't notice the properties were missing). My change did touch the code on the non-re-use path just as part of the refactoring, so it's possible it could have accidentally changed the behaviour. But the only thing I can think of is if the refactored code was much faster or much slower and that somehow affected how much time there was for the container to start. But it's pretty long shot.
:)
That makes a lot of sense. It might be that just increasing the timeout for container start is enough to get things reliable again. |
Let's increment the container timeouts and see what happens, no? |
Damned if I do, damned if I don't :) |
Fixes quarkusio#45774 (maybe...)
Fixes quarkusio#45774 (maybe...)
Fixes quarkusio#45774 (maybe...)
Fixes quarkusio#45774 (maybe...) (cherry picked from commit b3cdc89)
Describe the bug
We are seeing some recent issues in the LGTM tests.
I am not sure if it's a problem with the new startup condition of the container or if starting the container is just too slow and this test needs a higher timeout but it needs to be addressed as it has been failing on CI quite a lot.
You can get the whole output by getting the raw logs for this build: https://github.com/quarkusio/quarkus/actions/runs/12897381888/job/35963408203 .
Then you wait for them to load entirely and you search for
2025-01-22T01:14:31.6075066Z 2025-01-22 01:14:31,520 INFO [io.qua.obs.tes.LgtmContainer] (docker-java-stream--1572734623) [LGTM] STDOUT: Tempo is up and running.
The text was updated successfully, but these errors were encountered: