Documentation / Default for Logging flushlevel can be considered harmful #2906
Comments
We have run into performance and availability issues in production due to this default flushing setting, so we are now indeed setting this to |
Thanks for reporting this. I'm not too familiar with logging and just trying to understand the issue more.
Would you be able to elaborate on this a bit more? Is this due to a buildup of buffered logs until an Also, if possible, would you have a simple repro with Spring that I can try? |
I believe the issue I linked in the original post explains the issue better than I can, but I'll try to do so again :) With the current setup, if an application uses Spring Cloud GCP and has a method that does e.g.
The interesting part of this stacktrace is (1) the fact that it's in TIMED_WAITING and (2) how it got there:
The What's wrong with blocking the thread, you may wonder? Well, that really depends! But when threads are limited (very common) and new threads are required due to the workload (e.g. handling REST calls), you will run out of threads or heap if the pressure is high enough. And actually, even if there is little pressure on the application, if the API (logging.googleapis.com) has an issue and we're unfortunate enough to perform a

Asynchronous logging is a requirement for high-performance applications. The fact that a commonly accessed code path blocks the thread that is logging (from the user's code) is a trap that can easily bring down an application (our experience). Instead, the "flush" should have happened asynchronously (when the Appender has already been set up for asynchronous logging), but the library developers said this is not possible at the moment, so they decided to turn off the flushing by default. I believe that decision was the right one, and that Spring Cloud GCP should not deviate from it by having a different flushLevel, effectively enabling a dangerous feature that was turned off at some point. |
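To make the blocking behaviour described above concrete, here is a minimal, self-contained sketch. It does not use the real logging library: `BufferingWriter`, `Severity`, and `timeErrorLogMillis` are hypothetical names, and the slow backend is simulated with `Thread.sleep`. It only illustrates that when a log entry at or above the flush level triggers a synchronous flush, the calling thread pays for the backend latency, whereas with severity-based flushing off the call returns right away. How buffered entries are eventually sent in the background is not modeled here.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical simulation of severity-triggered flushing; not the real library API. */
public class FlushLevelSketch {

    enum Severity { INFO, WARNING, ERROR }

    /** Buffers log messages; flush() waits on a simulated slow remote API. */
    static class BufferingWriter {
        private final List<String> buffer = new ArrayList<>();
        private final Severity flushLevel;      // null means severity-based flushing is off
        private final long backendDelayMillis;  // simulated latency of the remote logging API

        BufferingWriter(Severity flushLevel, long backendDelayMillis) {
            this.flushLevel = flushLevel;
            this.backendDelayMillis = backendDelayMillis;
        }

        /** Runs on the application thread that issued the log statement. */
        synchronized void write(Severity severity, String message) {
            buffer.add(message);
            if (flushLevel != null && severity.compareTo(flushLevel) >= 0) {
                flush(); // the caller is blocked until the simulated backend responds
            }
        }

        private void flush() {
            try {
                Thread.sleep(backendDelayMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            buffer.clear();
        }
    }

    /** Returns how long a single ERROR log call holds up the calling thread, in milliseconds. */
    static long timeErrorLogMillis(Severity flushLevel, long backendDelayMillis) {
        BufferingWriter writer = new BufferingWriter(flushLevel, backendDelayMillis);
        long start = System.nanoTime();
        writer.write(Severity.ERROR, "something failed");
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        System.out.println("flushLevel=ERROR: " + timeErrorLogMillis(Severity.ERROR, 200) + " ms");
        System.out.println("flushLevel off:   " + timeErrorLogMillis(null, 200) + " ms");
    }
}
```

In this sketch the first call takes roughly as long as the simulated backend delay, while the second returns almost immediately. If the simulated delay stood in for a degraded API, every request thread that logs at the flush level would be held up for that long.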
@wleese Nice explanation! The idea of the The safest default for |
Thanks for the added context and explanations. I do not know the history of the deviation between the logging/flushing levels. Having the flush operation be blocking for X seconds does seem quite problematic. I believe I saw the older example had someone wait for 2 hours and your comment above mentioned ~50 seconds. CC: @meltsufin @burkedavison For additional opinions on this. Given that java-logging has changed the flush level, perhaps this is something we should consider for spring-cloud-gcp? |
OP's linked logging client disabled-by-default PR happened after this module was developed, which at least explains how we got here. There doesn't appear to be any intent to have different defaults in Spring Cloud GCP or the java-logging-logback appender.
I'd like to hear thoughts about the backward compatibility impact for customers in making the change, but generally agree the default should be 'disabled'. |
I think it makes sense to match the change to the default in googleapis/google-cloud-java#4254. |
The documentation found at https://googlecloudplatform.github.io/spring-cloud-gcp/5.3.0/reference/html/index.html#log-via-api says:
I believe the default that Spring Cloud GCP uses is actually ERROR when the user has not explicitly set a flushLevel.

However, looking at googleapis/google-cloud-java#4254, the default the library uses is null (OFF), due to severe performance issues that can occur when a log statement at the flushing log level is performed. It effectively blocks the thread that is performing the logging statement.

Unless I'm misunderstanding the situation, I believe it would be correct to adhere to the library's default of not using log severity level flushing, to avoid performance issues, and of course to correct the documentation, whatever is decided.