Observability LGTM dev service filling up logs with services starting #45690

Merged

Conversation

edeandrea
Contributor

At startup the Observability LGTM devservice was filling the console with lots of logs:

[two screenshots: console output before the change]

Furthermore, the wait strategy was only waiting for the Grafana HTTP port to be available, which happens very early in the container startup lifecycle. Instead, it should wait for all the services in the container to start up.
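
To illustrate the difference, here is a minimal stdlib-only sketch of log-based readiness: the container counts as ready only once a final "everything is up" line appears in its output, not as soon as one port opens. The class name `LogReadinessSketch` and the marker text are hypothetical and are not taken from the actual change or from the LGTM image. In Testcontainers the closest built-in analogue would be `Wait.forLogMessage(regex, times)`.

```java
import java.util.List;
import java.util.regex.Pattern;

/**
 * Sketch only: decides readiness from log output instead of an open port.
 * The marker text used in main() is an assumption, not real LGTM output.
 */
final class LogReadinessSketch {
    private final Pattern readyMarker;
    private boolean ready;

    LogReadinessSketch(String readyRegex) {
        this.readyMarker = Pattern.compile(readyRegex);
    }

    /** Feeds one log line; returns true once the marker has been seen. */
    boolean accept(String line) {
        if (!ready && readyMarker.matcher(line).find()) {
            ready = true;
        }
        return ready;
    }

    boolean isReady() {
        return ready;
    }

    public static void main(String[] args) {
        LogReadinessSketch gate = new LogReadinessSketch("all services are up");
        List<String> lines = List.of(
                "grafana: HTTP server listening", // port is open, stack is not ready yet
                "tempo: starting",
                "stack: all services are up");
        for (String line : lines) {
            System.out.println(gate.accept(line) + " <- " + line);
        }
    }
}
```

The point of the sketch is that an early "port is listening" line does not flip the gate; only the final marker does.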

After this change the console now looks like this:

[two screenshots: console output after the change]

And the Lgtm Dev Services Starting: message stays at the bottom of the console until it's fully up and running.

@edeandrea edeandrea force-pushed the fix-lgtm-devservice-verbose-logging branch from de89284 to 6957299 Compare January 17, 2025 21:54
@edeandrea
Contributor Author

@brunobat / @alesj FYI

@alesj
Contributor

alesj commented Jan 17, 2025

@edeandrea nice catch and fix. Lgtm!

@edeandrea
Contributor Author

edeandrea commented Jan 17, 2025

I was looking for a way to tie the container logs into the StartupLogCompressor but I couldn't figure out how to bridge them, so this is what I ended up with.

StartupLogCompressor would be the ideal solution I think.

@alesj
Contributor

alesj commented Jan 17, 2025

I was looking for a way to tie the container logs into the StartupLogCompressor but I couldn't figure out how to bridge them, so this is what I ended up with.

StartupLogCompressor would be the ideal solution I think.

Yeah, I was playing with that idea once ... but ended up with the same problem as you.
At least I managed to get some decent output and filtering from them.

Or in my case it was at first that there was no useful info / log, and then I managed to fix this.
But I guess I introduced too much logging then.

@alesj
Contributor

alesj commented Jan 17, 2025

StartupLogCompressor would be the ideal solution I think.

Nah, I think your stuff is the way to go.

There should be more than just LGTM stuff in Observability Dev Services / Resources -- I need to find the time to merge back 2nd (and probably 3rd) part of my initial (huge) Observability Dev Services PR ...
So each Dev Resource (and its Container) should be able to know what and how to filter things.

@edeandrea
Contributor Author

edeandrea commented Jan 17, 2025

So each Dev Resource (and its Container) should be able to know what and how to filter things.

That was what I was thinking too, so that's the route I took with introducing the Predicate into the log consumer.
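
Roughly, the idea looks like the sketch below: a log consumer that forwards a line only when a `Predicate` accepts it, so each container can supply its own filter. This is a stdlib-only illustration, not the code from this PR; the class name `FilteringLogConsumer` and the sample log lines are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

/** Sketch only: forwards a log line to the delegate if the predicate accepts it. */
final class FilteringLogConsumer implements Consumer<String> {
    private final Predicate<String> keep;
    private final Consumer<String> delegate;

    FilteringLogConsumer(Predicate<String> keep, Consumer<String> delegate) {
        this.keep = keep;
        this.delegate = delegate;
    }

    @Override
    public void accept(String line) {
        if (keep.test(line)) {
            delegate.accept(line);
        }
    }

    public static void main(String[] args) {
        // Hypothetical filter: only surface warnings and errors on the console.
        Predicate<String> onlyProblems =
                line -> line.contains("WARN") || line.contains("ERROR");
        List<String> shown = new ArrayList<>();
        Consumer<String> consumer = new FilteringLogConsumer(onlyProblems, shown::add);
        consumer.accept("INFO  starting tempo");
        consumer.accept("ERROR loki failed to bind");
        System.out.println(shown);
    }
}
```

Keeping the filter as a plain `Predicate<String>` means a different Dev Resource could pass a different rule without touching the consumer itself.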

quarkus-bot bot commented Jan 18, 2025

Status for workflow Quarkus CI

This is the status report for running Quarkus CI on commit 6957299.

✅ The latest workflow run for the pull request has completed successfully.

It should be safe to merge provided you have a look at the other checks in the summary.

You can consult the Develocity build scans.


Flaky tests - Develocity

⚙️ JVM Tests - JDK 17

📦 integration-tests/observability-lgtm

io.quarkus.observability.test.LgtmResourcesTest.testTracing - History

  • java.lang.RuntimeException: Failed to start quarkus - java.lang.RuntimeException
java.lang.RuntimeException: java.lang.RuntimeException: Failed to start quarkus
	at io.quarkus.test.junit.QuarkusTestExtension.throwBootFailureException(QuarkusTestExtension.java:611)
	at io.quarkus.test.junit.QuarkusTestExtension.interceptTestClassConstructor(QuarkusTestExtension.java:695)
	at java.base/java.util.Optional.orElseGet(Optional.java:364)
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
Caused by: java.lang.RuntimeException: Failed to start quarkus
	at io.quarkus.runner.ApplicationImpl.doStart(Unknown Source)

⚙️ JVM Tests - JDK 21

📦 integration-tests/observability-lgtm

io.quarkus.observability.test.LgtmResourcesTest.testTracing - History

  • java.lang.RuntimeException: Failed to start quarkus - java.lang.RuntimeException
java.lang.RuntimeException: java.lang.RuntimeException: Failed to start quarkus
	at io.quarkus.test.junit.QuarkusTestExtension.throwBootFailureException(QuarkusTestExtension.java:611)
	at io.quarkus.test.junit.QuarkusTestExtension.interceptTestClassConstructor(QuarkusTestExtension.java:695)
	at java.base/java.util.Optional.orElseGet(Optional.java:364)
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1596)
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1596)
Caused by: java.lang.RuntimeException: Failed to start quarkus
	at io.quarkus.runner.ApplicationImpl.doStart(Unknown Source)

io.quarkus.observability.test.LgtmServicesTest.testTracing - History

  • java.lang.RuntimeException: io.quarkus.builder.BuildException: Build failure: Build failed due to errors [error]: Build step io.quarkus.observability.deployment.ObservabilityDevServiceProcessor#startContainers threw an exception: java.lang.RuntimeException: org.testcontainers.containers.ContainerLaunchException: Container startup failed for image docker.io/grafana/otel-lgtm:0.8.2 at io.quarkus.observability.deployment.ObservabilityDevServiceProcessor.lambda$startContainers$3(ObservabilityDevServiceProcessor.java:170) at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1708) at java.base/java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:762) - java.lang.RuntimeException
java.lang.RuntimeException: 
java.lang.RuntimeException: io.quarkus.builder.BuildException: Build failure: Build failed due to errors
	[error]: Build step io.quarkus.observability.deployment.ObservabilityDevServiceProcessor#startContainers threw an exception: java.lang.RuntimeException: org.testcontainers.containers.ContainerLaunchException: Container startup failed for image docker.io/grafana/otel-lgtm:0.8.2
	at io.quarkus.observability.deployment.ObservabilityDevServiceProcessor.lambda$startContainers$3(ObservabilityDevServiceProcessor.java:170)
	at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1708)
	at java.base/java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:762)
	at io.quarkus.observability.deployment.ObservabilityDevServiceProcessor.startContainers(ObservabilityDevServiceProcessor.java:113)
	at java.base/java.lang.invoke.MethodHandle.invokeWithArguments(MethodHandle.java:733)

@edeandrea
Contributor Author

@alesj / @brunobat I can't reproduce this test failure on my local machine. I tried to re-run it several times and it runs successfully...

@alesj
Contributor

alesj commented Jan 18, 2025

Could it be something with #45689 ?

@edeandrea
Contributor Author

edeandrea commented Jan 18, 2025

I honestly don't know if it's related. It doesn't seem to be, but like @holly-cummins mentioned I don't know this codebase very well. This is the first time I've dug into it at all. But that PR hasn't yet been merged? Or are you saying there's something pre-existing that PR is intending to fix.

Like I said, on my machine I ran all those integration tests successfully. I ran the entire project, not just that one test.

@alesj
Contributor

alesj commented Jan 18, 2025

But that PR hasn't yet been merged?

No.
Waiting for some minor feedback from @holly-cummins , but otherwise lgtm.

Or are you saying there's something pre-existing that PR is intending to fix.

Yes, or at least, that's how I understood @holly-cummins comment(s).

@edeandrea
Contributor Author

Either way though wouldn't that mean that nothing I did in this PR is triggering this failure?

@alesj
Contributor

alesj commented Jan 19, 2025

Either way though wouldn't that mean that nothing I did in this PR is triggering this failure?

Yeah, I don't see how your changes could produce this error.

Contributor

@brunobat brunobat left a comment

The build error is due to a timeout in tests.
See: https://github.com/quarkusio/quarkus/actions/runs/12837447370/job/35801488131?pr=45690#step:17:49213
It's likely a transient failure.

@brunobat brunobat merged commit 34eb48e into quarkusio:main Jan 20, 2025
46 checks passed
@quarkus-bot quarkus-bot bot added this to the 3.19 - main milestone Jan 20, 2025
@holly-cummins
Contributor

Or are you saying there's something pre-existing that PR is intending to fix.

Yes, or at least, that's how I understood @holly-cummins comment(s).

It sounds like @brunobat has diagnosed the failure in this build. The problem my change was intending to fix is that if the lgtm container isn't shut down properly (either because of a test problem, or because a person left it up deliberately), that container will be re-used as a dev service, but clients won't be able to connect to all services. (They'll try to use default ports for some services instead of container-specific ones from config.)

@brunobat
Contributor

The re-use will need to provide the ports to use somehow... Otherwise the services connection will fail, @holly-cummins

@holly-cummins
Contributor

The re-use will need to provide the ports to use somehow... Otherwise the services connection will fail, @holly-cummins

That's what my changes on #45689 do. Well, sort of. Most of it was already there; @alesj already had code which worked backwards from the ports exposed on the container to the correct config for that container, but it missed setting quarkus.otel.exporter.otlp.endpoint, quarkus.micrometer.export.otlp.url, and quarkus.otel.exporter.otlp.protocol. My changes fill in the gap.
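
As a rough illustration of "working backwards" from a re-used container, the sketch below derives those three properties from the host and the mapped OTLP port. Only the property names come from the discussion above. The class name, the method, the `/v1/metrics` path and the `http/protobuf` value are assumptions made for the example, not the code in #45689.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch only: rebuilds client config from a re-used container's mapped port. */
final class LgtmReuseConfigSketch {

    /**
     * @param host               host the container is reachable on
     * @param mappedOtlpHttpPort host port mapped to the container's OTLP HTTP port
     */
    static Map<String, String> otlpConfig(String host, int mappedOtlpHttpPort) {
        String base = "http://" + host + ":" + mappedOtlpHttpPort;
        Map<String, String> config = new LinkedHashMap<>();
        config.put("quarkus.otel.exporter.otlp.endpoint", base);
        // Assumed path for the metrics endpoint; check it against the real setup.
        config.put("quarkus.micrometer.export.otlp.url", base + "/v1/metrics");
        // Assumed protocol value matching an HTTP endpoint.
        config.put("quarkus.otel.exporter.otlp.protocol", "http/protobuf");
        return config;
    }

    public static void main(String[] args) {
        // 32768 is an arbitrary example of a randomly mapped host port.
        otlpConfig("localhost", 32768)
                .forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

Without these overrides, a client would fall back to the default ports, which is the connection failure described earlier in the thread.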

@brunobat
Contributor

Excellent! @holly-cummins
Will review that soon.

@gsmet
Member

gsmet commented Jan 20, 2025

Sorry but could you clarify why this ends up behaving differently than all the other Dev Services?

@holly-cummins
Contributor

@gsmet which aspect of the behaviour do you mean (are you asking about the logging, or the "work backwards to allow re-use" model, or the "fails to shut down with Holly's classloading changes" part?)

@edeandrea edeandrea deleted the fix-lgtm-devservice-verbose-logging branch January 20, 2025 13:45
@gsmet gsmet removed this from the 3.19 - main milestone Jan 21, 2025
@gsmet gsmet added this to the 3.18.0 milestone Jan 21, 2025