Large memory usage by gcloud-logging API. #1449
Comments
Thanks @gregw for a detailed analysis.
Unfortunately, google-cloud-java gets its version of Netty indirectly through grpc, so there isn't a super-clean way to update that. Regardless, I would be very interested in seeing what your analysis shows if you update the Netty dependency on your end to 4.1.5 using the exclusions feature of Maven described in #1433.
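For anyone wanting to reproduce this, a sketch of the Maven exclusion approach might look like the following. The coordinates below are illustrative only; the actual artifacts to exclude depend on your dependency tree, so check it first with `mvn dependency:tree`.

```xml
<!-- Illustrative sketch, not a verified recipe: exclude the Netty artifact
     pulled in transitively, then pin the version you want to test. -->
<dependency>
  <groupId>com.google.cloud</groupId>
  <artifactId>google-cloud-logging</artifactId>
  <version>${google-cloud.version}</version>
  <exclusions>
    <exclusion>
      <groupId>io.netty</groupId>
      <artifactId>netty-codec-http2</artifactId>
    </exclusion>
  </exclusions>
</dependency>
<dependency>
  <groupId>io.netty</groupId>
  <artifactId>netty-codec-http2</artifactId>
  <version>4.1.5.Final</version>
</dependency>
```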
@garrettjonesgoogle I ran the same test with 4.1.5.Final Netty. It improves the base memory usage, but not the dynamic usage per request. After the 10K log messages, total memory usage is now only 45.5MB, and when idle it reduces to 8.7MB. So there is an approx 5MB saving, but the per-message cost is about the same, dominated by the 22.8 MB used by
As per the discussion in grpc/grpc-java#2554, the issue appears to be caused (or at least aggravated) by the flush size of 1, which results in only a single log entry being sent per RPC and many log entry lists being created as garbage. The simple flush level will be replaced by a better mechanism called "bundling", to provide timely yet aggregated log delivery.
@garrettjonesgoogle I see two ways to resolve this:
Do you have a preference? Since this class allows configuring various options on a per-handler basis, I think the 2nd option is probably easier to do. Due to the work we're doing on Pub/Sub, the bundling surface feels somewhat unstable, so either way we decide to do it, we might have to rewrite pieces of it. I'll be happy to do the work though.
Bundling was already added to Logging and the GAPIC client was regenerated in d4d494d.
@garrettjonesgoogle I somehow missed this. Was the config not in the YAML? I see no reference to bundling in https://github.com/googleapis/googleapis/blob/622cee91552e025d8f2f5733308ee2f633356f98/google/logging/v2/logging_gapic.yaml
Ahh, you're right. I haven't submitted it there yet.
@gregw Can you please take a look?
@meltsufin @garrettjonesgoogle I have rerun the test I did above. The good news is that the network usage appears to be a lot better! The bad news is that the memory usage appears to be a lot worse:
I'll see if I can attach some detailed reports. The total image size is a bit bigger at 62MB, of which 54MB is consumed by
Here is a report. Tell me if you need more info.
@gregw Thanks for the detailed analysis! It sounds like there is a memory leak somewhere in the use of
@pongad will be the one to take a look.
Ping, any updates?
Apologies for the delay; I have been pulled a few different ways the past couple of weeks. I looked deeper into this issue. Previously, d4d494d bundled the log messages so that we make fewer gRPC calls. This is good. The problem is that flushing doesn't interact correctly with bundling, so bundled entries may never actually be sent. This would explain why Greg's benchmark reports so little time spent on the RPC: the RPCs might not have happened at all. Getting flush to work properly with bundling has proved more difficult than I thought. I plan to write and share a doc on this subject.
@pongad I'm not sure we really need the flush mechanism, at least as it is currently written. See the discussion on grpc/grpc-java#2554. Currently, as soon as the number of queued messages exceeds a configured limit (default 1), flush is called. So even if a write from a previous flush is outstanding, another flush is done and another async write is queued. I think the semantic that is needed is that if a previous RPC is outstanding, the next RPC is not initiated until the previous one completes. This allows logs to accumulate so that many can be written at once. Perhaps this can be approximately achieved by having both a high flush limit (say 100) and a minimum time to wait before flushing (say 1000ms): the log queue would be flushed and an RPC created only if more than 100 entries are queued OR an entry has been waiting more than 1000ms. @garrettjonesgoogle implied that this is exactly what the bundling system is meant to achieve. So perhaps you should be looking at removing the flush paradigm, as it may be incompatible with bundling?
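A minimal sketch of that semantic, with hypothetical names and thresholds (this is not the library's API): entries accumulate while an RPC is in flight, and a batch is sent only when either the size threshold or the age threshold is crossed and no previous write is outstanding.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the proposed flush policy: at most one write
// outstanding at a time, batches triggered by size OR age of oldest entry.
class LogBundler {
    private final int maxEntries;   // e.g. 100
    private final long maxDelayMs;  // e.g. 1000
    private final ArrayDeque<String> queue = new ArrayDeque<>();
    private long oldestEnqueuedAt = -1;
    private boolean rpcInFlight = false;
    final List<List<String>> sentBatches = new ArrayList<>(); // stand-in for the async writes

    LogBundler(int maxEntries, long maxDelayMs) {
        this.maxEntries = maxEntries;
        this.maxDelayMs = maxDelayMs;
    }

    void add(String entry, long nowMs) {
        if (queue.isEmpty()) oldestEnqueuedAt = nowMs;
        queue.add(entry);
        maybeFlush(nowMs);
    }

    // Invoked when the previous write completes; may immediately send the
    // entries that accumulated while the RPC was in flight.
    void onRpcComplete(long nowMs) {
        rpcInFlight = false;
        maybeFlush(nowMs);
    }

    private void maybeFlush(long nowMs) {
        if (rpcInFlight || queue.isEmpty()) return;
        boolean bigEnough = queue.size() >= maxEntries;
        boolean oldEnough = nowMs - oldestEnqueuedAt >= maxDelayMs;
        if (bigEnough || oldEnough) {
            rpcInFlight = true;
            sentBatches.add(new ArrayList<>(queue)); // one RPC per batch
            queue.clear();
        }
    }
}
```

With this shape, a burst of 10K entries produces roughly 100 RPCs of 100 entries each rather than 10K RPCs of one entry, and a trickle of entries is still delivered within the age threshold.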
@gregw That makes sense. "Proper" flushing is probably still desirable, but we don't need it for this. I think that might be referring to the
@michaelbausor Have you had a chance to take a look? This is our only blocker for integrating the library into the Java runtime.
@meltsufin I have had an initial look. I will start working on a fix tomorrow.
@michaelbausor Thanks for the update! :)
@meltsufin @garrettjonesgoogle A further update about the outstanding issues:
1. Unbounded queue of bundles
2. Flush implementation is broken
3. Bundles are too small
4. Bundling consumes a lot of memory
@michaelbausor Flushing is indeed a big problem. I don't think it's immediately relevant to this issue though. Once we deal with items 1, 3, and 4, the memory problem should be resolved. We can deal with item 2 afterwards. I think I conflated the two issues because they were introduced in the same commit.
@garrettjonesgoogle It looks like we've now identified the underlying issues. How are you prioritizing the fixes? Do you have a rough ETA? Thanks!
@meltsufin I will follow what @pongad said, and prioritize the other fixes above the flushing implementation. Hopefully I can provide a fix for the bundling size and implement using
Status update: I have one PR in flight for adding
@michaelbausor I see that the
I should have an open PR by the end of today; I will link to it here. cc @garrettjonesgoogle for when we will do a release, but hopefully as soon as the changes are merged.
The PR is open here: googleapis/gax-java#214
Update: The above PR was replaced by googleapis/gax-java#226, which is now merged. There is one outstanding PR with the equivalent changes in toolkit: googleapis/gapic-generator#1095
@gregw @meltsufin The fix for this issue is now in master. We will try to create a release as soon as possible, but we also have other outstanding breaking changes to logging which we would like to include. It would be great to get verification that these changes have fixed the issue, and if not, to figure out what more needs to be done. Are you able to test using the master branch? If not, let me know and we can try to use a snapshot release or some other alternative.
@gregw Would you be able to run the test again with a build from master?
@meltsufin @michaelbausor I'll rerun today... standby ...
@meltsufin @michaelbausor I've run the same tests and the memory usage is much, much better:
So there is still a moderate penalty for queued logs, but it is not leaked and is quickly cleared. Few applications will log 10,000 messages in a tight loop.
@gregw @meltsufin Great, thanks! That matches what I would expect. I will close this issue once we create a new release. Also note that @pongad has a PR in progress that will reduce the memory consumption of queued logs further in some cases: #1721
Thanks @michaelbausor! Also, #1721 sounds like a great idea for compressing the log batches!
This is now released in https://github.com/GoogleCloudPlatform/google-cloud-java/releases/tag/v0.10.0
The logging API has a large memory footprint, with approximately 4KB of memory consumed for each log entry generated.
This was measured with the following program:
Eclipse MAT was used to analyse a heap dump taken immediately after 10K log entries were generated, then again once the program had returned to idle and the network monitor indicated that all data had been flushed.
After 10K log messages, the heap was 51MB, which reduced to 13.4MB once the logs had been flushed (but with no forced GC, so the difference may be larger). Thus at least 38MB appears to be consumed by logging 10K messages, which is 3.9KB per message. The attached reports (10K Top Consumers.pdf & Idle Top Consumers.pdf) indicate that 22.9 MB is used by io.grpc.netty.NettyClientTransport$2 and 12.1MB alone is consumed by io.netty.handler.codec.http2.StreamBufferingEncoder.
Ideally, the memory usage per log line could be greatly reduced. Either way, it would also be highly desirable to be able to put some limits on the memory consumed by the logging subsystem so that excessively verbose logs are either discarded, summarized, blocked or excepted.
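The limiting behaviour requested above could take a shape like this sketch (entirely hypothetical, not part of the library): a bounded queue that discards and counts entries once a budget is exceeded, so a logging burst degrades to a single summary line instead of unbounded heap growth.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a bounded log queue: once the entry budget is
// exhausted, further entries are dropped and counted, and one summary line
// stands in for the dropped entries on the next drain.
class BoundedLogQueue {
    private final int maxEntries;
    private final ArrayDeque<String> queue = new ArrayDeque<>();
    private long dropped = 0;

    BoundedLogQueue(int maxEntries) {
        this.maxEntries = maxEntries;
    }

    void add(String entry) {
        if (queue.size() >= maxEntries) {
            dropped++; // discard rather than grow the heap
        } else {
            queue.add(entry);
        }
    }

    // Drain everything queued, plus one summary line for anything discarded.
    List<String> drain() {
        List<String> out = new ArrayList<>(queue);
        queue.clear();
        if (dropped > 0) {
            out.add("[" + dropped + " log entries discarded]");
            dropped = 0;
        }
        return out;
    }
}
```

The same idea could be extended to a byte budget rather than an entry count, which would bound memory more precisely for variable-sized payloads.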