
bigquery/storage: ManagedStream leaks goroutines & channels #8272

Closed
Rudiksz opened this issue Jul 17, 2023 · 3 comments · Fixed by #8275
Labels
api: bigquery Issues related to the BigQuery API. triage me I really want to be triaged.

Comments


Rudiksz commented Jul 17, 2023

Client

BigQuery/Storage

Environment

Any

Go Environment

Any

Code

This is related to the memory leak issue that was fixed in v1.43 (#6766).

We have long-running consumers that use 3 managed streams and receive a large batch of messages every 30 minutes.
We noticed a memory leak, and I was able to trace the goroutine leak to the managedwriter.connRecvProcessor and grpc.newClientStreamWithParams functions, which create 3 additional goroutines every time a new batch of messages is produced.

After updating to v1.52, managedwriter.connRecvProcessor stopped leaking, but we still see grpc.newClientStreamWithParams creating 3 new goroutines every 30 minutes.

The fix used arc.CloseSend, but the grpc package documentation says this (note point 3):

// To ensure resources are not leaked due to the stream returned, one of the following
// actions must be performed:
//
//  1. Call Close on the ClientConn.
//  2. Cancel the context provided.
//  3. Call RecvMsg until a non-nil error is returned. A protobuf-generated
//     client-streaming RPC, for instance, might use the helper function
//     CloseAndRecv (note that CloseSend does not Recv, therefore is not
//     guaranteed to release all resources).
//  4. Receive a non-nil, non-io.EOF error from Header or SendMsg.
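
For illustration, a minimal sketch (my own helper, not anything from the managedwriter package) of what fully draining the stream per point 3 could look like:

package consumer

import (
	"io"
	"log"

	"cloud.google.com/go/bigquery/storage/apiv1/storagepb"
)

// drainAppendRows is a hypothetical helper: after CloseSend it keeps calling
// Recv until a non-nil error is returned, which is what the contract above
// requires before the stream's resources (including its receive goroutine)
// are guaranteed to be released.
func drainAppendRows(arc storagepb.BigQueryWrite_AppendRowsClient) {
	if err := arc.CloseSend(); err != nil {
		log.Printf("CloseSend: %v", err)
	}
	for {
		if _, err := arc.Recv(); err != nil {
			if err != io.EOF {
				log.Printf("Recv ended with: %v", err)
			}
			return
		}
	}
}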

Expected behavior

Previous connections are closed.

Actual behavior

Previous connections are not closed and goroutines are leaked.

Screenshots

Goroutines in v1.42
[screenshot]

Goroutines in v1.52
[screenshot]

@Rudiksz Rudiksz added the triage me I really want to be triaged. label Jul 17, 2023
@Rudiksz Rudiksz changed the title from "packagename: short description of bug" to "bigquery/storage: ManagedStream leaks goroutines & channels" on Jul 17, 2023
@product-auto-label product-auto-label bot added the api: bigquery Issues related to the BigQuery API. label Jul 18, 2023
@shollyman
Contributor

Thanks for the report.

The veneer library doesn't really interact at the grpc clientconn level, so more fundamental connection closing is not going to make sense. I'm refactoring context propagation as part of #8232 so that's going to be the best avenue for dealing with connection lifecycle. I'll take another look at that work in light of this report.

@shollyman
Contributor

Also, if you can share any more repro details, that would be lovely. The 30-minute interval suggests to me that you're dealing with idle connection recycling (a connection that's idle for ~10 minutes will be closed on the server side), but it's not clear from the existing details.

@Rudiksz
Author

Rudiksz commented Jul 19, 2023

Hi, I don't have code I can show you, but what we're doing is very simple: create a client with managedwriter.NewClient, get three streams with managedClient.NewManagedStream, and start calling AppendRows on them.
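
To make that concrete, a rough sketch of the pattern (placeholder project/table names; the schema descriptor setup and row serialization are elided):

package main

import (
	"context"
	"log"

	"cloud.google.com/go/bigquery/storage/managedwriter"
)

func main() {
	ctx := context.Background()

	// Placeholder project; in our case the client lives for the whole consumer process.
	client, err := managedwriter.NewClient(ctx, "my-project")
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	// In real usage a schema descriptor is also supplied via
	// managedwriter.WithSchemaDescriptor; omitted here for brevity.
	ms, err := client.NewManagedStream(ctx,
		managedwriter.WithDestinationTable("projects/my-project/datasets/my_dataset/tables/my_table"),
		managedwriter.WithType(managedwriter.DefaultStream),
	)
	if err != nil {
		log.Fatal(err)
	}
	defer ms.Close()

	// rows would be the serialized protobuf messages for one batch.
	var rows [][]byte
	result, err := ms.AppendRows(ctx, rows)
	if err != nil {
		log.Fatal(err)
	}
	if _, err := result.GetResult(ctx); err != nil {
		log.Fatal(err)
	}
}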

The 10-minute connection timeout appears to be consistent with the reconnection behaviour we observe.
These are long-running consumers that process 15–20k logs for about 10–15 minutes, then stay idle until the next batch of messages arrives in the queue.

This is our traffic pattern during the whole day:
[screenshot]

Over the course of many hours, with many consumers running in parallel, this shows up as the following memory consumption on our servers. We excluded every other possibility except these hanging goroutines and possible quirks in the Go runtime's memory management.
[screenshot]

After we updated this package from v1.42 to v1.52 we started seeing "can't append rows to stream: EOF", which was expected, and it appears consistent with the theory that the connection goes idle and gets closed, with the error surfacing the first time we try to append the next batch.

[screenshot]

We tried passing keepalive options to the managedwriter client. The server seemed to react when we used a short interval for the pings, but it did not seem to have any effect when we used 1 minute: the streams would still get closed, and the goroutines created by newClientStreamWithParams would still accumulate.
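
Roughly, that looked like the sketch below (the 1-minute ping interval is the one mentioned above; the other values are illustrative):

package consumer

import (
	"context"
	"time"

	"cloud.google.com/go/bigquery/storage/managedwriter"
	"google.golang.org/api/option"
	"google.golang.org/grpc"
	"google.golang.org/grpc/keepalive"
)

// newClientWithKeepalive passes gRPC keepalive settings to the managedwriter
// client through a dial option.
func newClientWithKeepalive(ctx context.Context, projectID string) (*managedwriter.Client, error) {
	return managedwriter.NewClient(ctx, projectID,
		option.WithGRPCDialOption(grpc.WithKeepaliveParams(keepalive.ClientParameters{
			Time:                time.Minute,      // ping after 1 minute of inactivity
			Timeout:             20 * time.Second, // wait this long for a ping ack
			PermitWithoutStream: true,             // also ping when no streams are active
		})),
	)
}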

We are now trying to manually close the streams after 5 minutes of inactivity in our consumers, but it would be cleaner if the managedwriter handled the entire close/recreate logic itself.
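
For reference, the workaround is shaped roughly like this (the wrapper type and the idle threshold are entirely our own, not part of the library):

package consumer

import (
	"sync"
	"time"

	"cloud.google.com/go/bigquery/storage/managedwriter"
)

// idleStream is our own wrapper around a ManagedStream that closes it after a
// period of inactivity so the next batch starts from a fresh stream.
type idleStream struct {
	mu       sync.Mutex
	ms       *managedwriter.ManagedStream
	lastUsed time.Time
}

// markUsed is called after every successful AppendRows.
func (s *idleStream) markUsed() {
	s.mu.Lock()
	s.lastUsed = time.Now()
	s.mu.Unlock()
}

// closeIfIdle is called from a periodic ticker in the consumer.
func (s *idleStream) closeIfIdle(maxIdle time.Duration) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.ms != nil && time.Since(s.lastUsed) > maxIdle {
		_ = s.ms.Close() // the consumer recreates the stream before the next AppendRows
		s.ms = nil
	}
}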
