
@google-cloud/PubSub@1.x has increased memory usage compared to 0.29.1 #788

Closed
bcoe opened this issue Oct 21, 2019 · 20 comments
Labels
  • api: pubsub (Issues related to the googleapis/nodejs-pubsub API.)
  • needs more info (This issue needs more information from the customer to proceed.)
  • priority: p2 (Moderately-important priority. Fix may not be included in next release.)
  • 🚨 This issue needs some love.
  • type: bug (Error or flaw in code with unintended results or allowing sub-optimal usage patterns.)

Comments

@bcoe
Contributor

bcoe commented Oct 21, 2019

@pmcnr-hx this memory issue sounds a bit different from what folks were initially seeing in #770 (which was a seizing up of message processing), so I've created this new tracking issue.

Running In Debug Mode

If it's not too much of a bother, could I get you to run your system with debugging enabled:

// Set these before requiring the client so grpc-js picks them up at load time.
process.env.GRPC_TRACE = 'all';
process.env.GRPC_VERBOSITY = 'DEBUG';
const {PubSub} = require('@google-cloud/pubsub');

☝️ I'm curious to see if there is any odd behavior immediately before the message fails to process.

Memory Usage

Do you have any graphs you could share of the memory-spiking behavior with @grpc/grpc-js vs. grpc?

Environment details

@pmcnr-hx would you mind providing this information? With regard to OS/Runtime, are you on Google App Engine or Kubernetes Engine?

  • OS/Runtime: ?
  • Node.js version: ?
  • @google-cloud/pubsub version: 1.1.3

Steps to reproduce

@pmcnr-hx it sounds like the memory issue you run into happens after a few hours of processing. What does the workload look like? e.g., messages per minute, and what type of work happens as the result of a message?

  1. ?

see #770

@bcoe bcoe added priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. labels Oct 21, 2019
@googleapis googleapis deleted a comment from ericpearson Oct 21, 2019
@bcoe
Contributor Author

bcoe commented Oct 29, 2019

A memory issue has been found in the upstream @grpc/grpc-js library; this is most likely the root cause. See:

grpc/grpc-node#1085

@Legogris I will update you and close this thread as soon as the fix is rolled out.

@bcoe bcoe added priority: p2 Moderately-important priority. Fix may not be included in next release. needs more info This issue needs more information from the customer to proceed. and removed priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release. labels Oct 29, 2019
@bcoe
Contributor Author

bcoe commented Oct 29, 2019

@Legogris could you share a snippet of code demonstrating where you're initializing your PubSub client? @alexander-fenster points out to me that one way memory can leak is if the client is created multiple times in an application, rather than once at the entry point of the application.

@pmcnr-hx, likewise, I'm wondering if you might be bumping into a similar issue.
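For illustration, here's a minimal sketch of the pattern @alexander-fenster describes, with the client constructed once at the entry point and re-used everywhere (file, topic, and function names are just examples, not taken from anyone's code):

// pubsub-client.js: construct the client exactly once, when this module is
// first required; every later require() returns the cached instance.
const {PubSub} = require('@google-cloud/pubsub');
const pubsub = new PubSub();
module.exports = {pubsub};

// elsewhere.js: re-use the shared client instead of calling new PubSub()
// inside request handlers or message callbacks.
const {pubsub} = require('./pubsub-client');
async function forward(data) {
  await pubsub.topic('example-topic').publish(Buffer.from(JSON.stringify(data)));
}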

@Legogris
Contributor

@bcoe It's definitely only instantiated once (there are two per Node process though, each initialized at the entry point).

Just to clarify, our symptom is that outbound network requests made from inside Node, but unrelated to PubSub/gRPC, start to fail. So if it's related to a memory leak, it would be a result of socket usage. I have to leave now, but I'll try to provide more details tomorrow morning.

I posted in #789 (comment) that "@grpc/grpc-js": "0.5.4" with "@google-cloud/pubsub": "0.32.1" was working fine, but I now see that sockets to local services are also behaving inconsistently. I'm not certain I can ascribe this to grpc/pubsub yet; I'll have to dig deeper.

I can say with certainty that the most current version of pubsub + grpc-js breaks completely, though, that the older versions have fewer HTTP requests failing, and that there are differences in how connections fail between native grpc and 0.5.4.

@bcoe
Contributor Author

bcoe commented Oct 29, 2019

@Legogris if you're able to, running the process with:

process.env.GRPC_TRACE = 'all';
process.env.GRPC_VERBOSITY = 'DEBUG';

could provide some valuable information to share with the grpc folks.

@bsato212

I am seeing the same increase in memory usage after upgrading a Node.js 8 Cloud Function that forwards POST data to a PubSub queue. The function is running on us-central1 with 512 MB RAM. After upgrading from @google-cloud/pubsub 0.16.2 to 1.1.5 I get a memory limit exceeded error roughly every two hours. Only one PubSub instance is lazily initialized and re-used. I will now try rolling back to 0.28.0.

(Screenshot: 2019-10-29, 21:26:51)
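For reference, a minimal sketch of the lazy, re-used client pattern described above, for an HTTP Cloud Function that forwards POST bodies to a topic (function and topic names are illustrative, not taken from this report):

const {PubSub} = require('@google-cloud/pubsub');

let pubsub; // created on the first invocation, re-used by warm instances

exports.forwardPost = async (req, res) => {
  if (!pubsub) {
    pubsub = new PubSub();
  }
  // Forward the POST body to the topic; publish() shown as in the 1.x docs.
  await pubsub.topic('example-topic').publish(Buffer.from(JSON.stringify(req.body)));
  res.status(204).end();
};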

@bcoe
Contributor Author

bcoe commented Oct 30, 2019

@bsato212 what are you running the worker on that is consuming from the PubSub queue? (is it the worker that is running out of memory periodically?).

@Legogris
Contributor

@Legogris if you're able to, running the process with:

process.env.GRPC_TRACE = 'all';
process.env.GRPC_VERBOSITY = 'DEBUG';

could provide some valuable information to share with the grpc folks.

@bcoe I emailed you some logs

@Legogris
Contributor

Legogris commented Oct 30, 2019

I think I have a satisfactory explanation: the increased number of sockets due to the socket leak causes the connection pool to fill up, and since we have global http(s) Agents with a finite maxSockets, requests end up being blocked.

I will try to separate out the http agents and see if that performs differently.

EDIT: Since maxSockets is per origin, this does not fully explain it after all (it does explain failing requests to PubSub, though). Grasping at straws here, but maybe it has to do with the maxFreeSockets setting of the http agents.

For reference, we're using https://www.npmjs.com/package/agentkeepalive with the following settings:

  maxSockets: 1000,
  maxFreeSockets: 10,
  timeout: 10000,
  freeSocketTimeout: 30000,

Is there a way to set the http agent explicitly for pubsub? I can't find anything when looking through the docs or source.
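For context, a minimal sketch of the global agent setup described above, using the agentkeepalive package with the settings quoted; installing them as the process-wide default agents is shown only as an illustration, not necessarily how the reporter wires it up:

const http = require('http');
const https = require('https');
const Agent = require('agentkeepalive');
const {HttpsAgent} = require('agentkeepalive');

const options = {
  maxSockets: 1000,         // per-origin cap on concurrent sockets
  maxFreeSockets: 10,       // idle sockets kept per origin
  timeout: 10000,           // working-socket inactivity timeout (ms)
  freeSocketTimeout: 30000, // how long an idle socket stays pooled (ms)
};

// Make these the defaults so every outbound http/https request shares the pools
// (one common approach; an app may instead pass agents per request).
http.globalAgent = new Agent(options);
https.globalAgent = new HttpsAgent(options);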

@vigandhi

Starting yesterday in our staging environment, between two deployments (so we're not convinced yet that it's a code change), we started seeing a flood of these errors:

Error: 3 INVALID_ARGUMENT: Request contains an invalid argument.
at Object.callErrorFromStatus (/app/node_modules/@grpc/grpc-js/build/src/call.js:30)
at Http2CallStream.<anonymous> (/app/node_modules/@grpc/grpc-js/build/src/client.js:96)
at Http2CallStream.emit (events.js:215)
at Http2CallStream.EventEmitter.emit (domain.js:476)
at (/app/node_modules/@grpc/grpc-js/build/src/call-stream.js:75)
at processTicksAndRejections (internal/process/task_queues.js:75)

We're seeing it in production now as well and are currently debugging; flagging it here in case anyone has seen something similar and has suggestions.

@bcoe
Contributor Author

bcoe commented Oct 30, 2019

@vigandhi are you seeing these errors after an extended period of time, accompanied by an increase in memory consumption?

If you're seeing these errors immediately, and it doesn't seem to be tied to a gradual memory leak, I think it would be worth opening a new issue (as what you're seeing seems like different symptoms). We can try to get to the bottom of what's happening to you in that thread.

@vigandhi

Thanks @bcoe - @ibarsi from our team is going to follow up with a more detailed explanation, but essentially we found that using @google-cloud/logging-bunyan's WARN severity method was the culprit here.

@ibarsi

ibarsi commented Oct 30, 2019

@bcoe It seems to be due to an increase in memory consumption. As @vigandhi stated, we noticed the flood after shipping changes that added extensive usage of @google-cloud/logging-bunyan's WARN severity. As a test, we modified all calls to log INFO instead and noticed that the errors persisted. This makes us believe that it's not actually the logging of WARN itself but perhaps another change in the way we are using @google-cloud/logging-bunyan.

There are a few more tests we'd like to do on our end in an attempt to narrow down the change. Will report back once we have more info.
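For anyone following along, the severity in question is just the per-call bunyan level; a minimal sketch of the usual @google-cloud/logging-bunyan wiring (logger name and messages are illustrative) looks roughly like this:

const bunyan = require('bunyan');
const {LoggingBunyan} = require('@google-cloud/logging-bunyan');

// Stream that ships bunyan records to Cloud Logging.
const loggingBunyan = new LoggingBunyan();

const logger = bunyan.createLogger({
  name: 'example-service',
  streams: [
    {stream: process.stdout, level: 'info'},
    loggingBunyan.stream('info'),
  ],
});

// The test described above amounts to swapping calls like this:
logger.warn('something looks off');  // WARN severity
// ...for this:
logger.info('something looks off');  // INFO severity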

@bcoe
Contributor Author

bcoe commented Nov 19, 2019

@Legogris @vigandhi @ibarsi @bsato212 if anyone has the cycles to test it out, could I bother you to try the following dependency:

npm i @google-cloud/pubsub@next

☝️ I've released version 1.1.6-beta.0 of PubSub, which hopefully addresses some lingering memory issues that folks were seeing.


@ibarsi @vigandhi I have not floated a similar patch for @google-cloud/logging-bunyan; if that seems to be where you're seeing the issue, I can do so, but let's move the conversation to that repository.

@ericpearson

@bcoe Cool, we will try out the patch; thanks for the information. Can you elaborate on what you mean by not floating a patch for the same issue to bunyan? Is there a common piece of code that affects both products in relation to the memory issues that were being observed?

I was also curious whether using the native C++ grpc client mitigates this issue and, if so, whether other libraries can be made to use the native bindings as well.

Thank you.

@bcoe
Contributor Author

bcoe commented Nov 22, 2019

Can you elaborate on what you mean by not floating a patch for the same issue to bunyan

@ericpearson we will soon have the updated version of @grpc/grpc-js released to all libraries; we are testing it out first on @google-cloud/pubsub because this is the library the bug was discovered in.

I also was curious if using the native c++ grpc client mitigates this issue

The C++ grpc dependency works with any of our grpc libraries, and I don't believe it has been exhibiting this specific bug (so it might be worth a shot). Within the next couple of weeks we'll also release the updated @grpc/grpc-js more widely, hopefully resulting in a lower memory footprint regardless of which dependency a user opts for.
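On the native-bindings question: if I remember the gax-based clients correctly, the C++ grpc module can be passed into the client constructor; treat the option below as an assumption to verify against the library docs rather than a confirmed API:

// Assumes the native bindings are installed: npm install grpc
const grpc = require('grpc');
const {PubSub} = require('@google-cloud/pubsub');

// gax-based clients accept a grpc implementation via constructor options
// (assumption; verify for your library version).
const pubsub = new PubSub({grpc});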

@google-cloud-label-sync google-cloud-label-sync bot added the api: pubsub Issues related to the googleapis/nodejs-pubsub API. label Jan 30, 2020
@Sandhyakripalani

Sandhyakripalani commented Feb 26, 2020

@bcoe Is this fixed?
We face a similar issue with @google-cloud/pubsub@1.4.0.
What used to consume <256MiB with pubsub <1.x is consuming 1GiB with 1.4.0.
Does 1.5.0 have the fix for this?

@bcoe
Contributor Author

bcoe commented Feb 26, 2020

@Sandhyakripalani there have been some fixes to the @google-cloud/pubsub library over the last couple of months. I recommend trying the latest version and seeing if it helps (it should always be safe to update minor versions, e.g., from 1.3.0 to 1.4.0).

@Sandhyakripalani

(Screenshot)
We did a deployment at around the time indicated in the screenshot.
1.5.0 does not seem to help.

@yoshi-automation yoshi-automation added the 🚨 This issue needs some love. label Apr 18, 2020
@feywind
Collaborator

feywind commented May 5, 2020

Is this still a problem for everyone with the latest library release (1.7.2)? We've been held up a bit by some dependency timing issues, but I'm hoping we can get 2.0 out the door soonish. There are quite a few dependency updates and generated code changes that other libraries have already received.

@feywind
Collaborator

feywind commented May 14, 2020

I'm going to go ahead and close this for now since it's been idle, but please do re-open/comment if needed.

@feywind feywind closed this as completed May 14, 2020
feywind pushed a commit to feywind/nodejs-pubsub that referenced this issue Nov 12, 2024