bigquery/storage/managedwriter: contexts are shared across regions in the pool causing future writes to be unusable #8232

jameshartig · 2023-07-07T16:04:09Z

Client

bigquery/storage/managedwriter

Environment

N/A

Go Environment

N/A

Code

Here's a contrived example. Essentially whenever a managed stream's context is cancelled all future managed streams are unusable because the connectionPool's context was cancelled.
e.g.

package main

import (
	"context"
	"fmt"

	"cloud.google.com/go/bigquery/storage/managedwriter"
)

var projectID = "test"
var dataset = "test"
var table = "test"

func main() {
	ctx := context.Background()
	client, err := managedwriter.NewClient(ctx, projectID)
	if err != nil {
		panic(err)
	}
	func() {
		innerCtx, cancel := context.WithCancel(ctx)
		defer cancel()
		ms, err := client.NewManagedStream(innerCtx,
			managedwriter.WithDestinationTable(fmt.Sprintf("projects/%s/datasets/%s/tables/%s", projectID, dataset, table)),
			// there isn't really a way to have duplicates within the same stream given
			// how this is architected so there's no reason not to use the default stream
			managedwriter.WithType(managedwriter.DefaultStream),
			managedwriter.EnableWriteRetries(true),
		)
		if err != nil {
			panic(err)
		}
		defer ms.Close()
		// do stuff with the stream like append, etc
	}()

	// later...

	ms, err := client.NewManagedStream(ctx,
		managedwriter.WithDestinationTable(fmt.Sprintf("projects/%s/datasets/%s/tables/%s", projectID, dataset, table)),
		// there isn't really a way to have duplicates within the same stream given
		// how this is architected so there's no reason not to use the default stream
		managedwriter.WithType(managedwriter.DefaultStream),
		managedwriter.EnableWriteRetries(true),
	)
	if err != nil {
		panic(err)
	}
	defer ms.Close()

	// anything you do with the stream will fail with context.Cancelled
}

Expected behavior

NewManagedStream should create a new stream with its own context separate from other contexts. Notably I do not have multiplexing enabled but even if I did, I don't think that should cause the contexts to be shared.

Actual behavior

After closing the context for a managed stream, all future managed streams are unusable.

Screenshots

e.g. A chart showing how messages are slow. Delete if not necessary.

Additional context

Commenting out this line fixes the issue for us:

google-cloud-go/bigquery/storage/managedwriter/client.go

Line 225 in c992115

c.pools[loc] = pool

shollyman · 2023-07-07T16:32:22Z

Thanks for the report, I'll take a look.

shollyman · 2023-07-07T18:23:40Z

This is a case of deriving child contexts from the wrong parents(s) when building the connection pool(s), which manage both multiplex and non-multiplex connections. We should build the pool(s) from the client's context, whereas we're deriving from the first writer we create in a given location/region at present.

We can potentially derive a specific connection's context from the writer context for the explicit connection case, or scope the context from NewManagedStream solely to the writer (ManagedStream) and not it's connection at the potential cost of more context checks. I'll need to explore this a bit.

jameshartig · 2023-07-10T14:58:40Z

@shollyman I appreciate the quick reply. I agree with your assessment. I think using the client's context would be a quick fix for this issue.

Longer-term it would be nice if the pool could close the connection when all writes are gone, which I think was the goal of using the writer's context. Switching to the client's context would cause it to potentially outlast the writers, right?

patrickmeiring · 2023-07-11T03:03:31Z

Our team also ran into this, rolling the package back to v1.49.0 fixed it for us [1].

Our setup is similar: a singleton BigQuery client created on a long-lived server context, with managed stream writers being created on short-lived per-request contexts, to take advantage of connection pooling.

[1] https://chromium-review.googlesource.com/c/infra/luci/luci-go/+/4676773

jameshartig added the triage me I really want to be triaged. label Jul 7, 2023

product-auto-label bot added the api: bigquery Issues related to the BigQuery API. label Jul 7, 2023

blunderbuss-gcf bot assigned shollyman Jul 7, 2023

shollyman added type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release. and removed triage me I really want to be triaged. labels Jul 7, 2023

This was referenced Jul 17, 2023

fix(bigquery/storage/managedwriter): context refactoring #8275

Merged

bigquery/storage: ManagedStream leaks goroutines & channels #8272

Closed

shollyman closed this as completed in #8275 Jul 21, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

bigquery/storage/managedwriter: contexts are shared across regions in the pool causing future writes to be unusable #8232

bigquery/storage/managedwriter: contexts are shared across regions in the pool causing future writes to be unusable #8232

jameshartig commented Jul 7, 2023

shollyman commented Jul 7, 2023

shollyman commented Jul 7, 2023

jameshartig commented Jul 10, 2023

patrickmeiring commented Jul 11, 2023

bigquery/storage/managedwriter: contexts are shared across regions in the pool causing future writes to be unusable #8232

bigquery/storage/managedwriter: contexts are shared across regions in the pool causing future writes to be unusable #8232

Comments

jameshartig commented Jul 7, 2023

shollyman commented Jul 7, 2023

shollyman commented Jul 7, 2023

jameshartig commented Jul 10, 2023

patrickmeiring commented Jul 11, 2023