Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bigquery: RowIterator resets token after first page #2694

Closed
vic3lord opened this issue Aug 6, 2020 · 6 comments
Closed

bigquery: RowIterator resets token after first page #2694

vic3lord opened this issue Aug 6, 2020 · 6 comments
Assignees
Labels
api: bigquery Issues related to the BigQuery API. type: question Request for information or clarification. Not an issue.

Comments

@vic3lord
Copy link

vic3lord commented Aug 6, 2020

Client

BigQuery

Environment

GKE

Go Environment

$ go version

go version go1.14.6 darwin/amd64

$ go env

GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/<user>/Library/Caches/go-build"
GOENV="/Users/<user>/Library/Application Support/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOINSECURE=""
GOOS="darwin"
GOPATH="/Users/ore/.go"
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/Cellar/go/1.14.6/libexec"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/Cellar/go/1.14.6/libexec/pkg/tool/darwin_amd64"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD="/Users/<user>/src/<service>/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/rb/0fwsdvnj62585x6dbh0_ts680000gn/T/go-build072005646=/tmp/go-build -gno-record-gcc-switches -fno-common"

Code

e.g.

package main

func (d *DB) ListEvents(ctx context.Context, token string) ([]*pb.Event, string, error) {
	q := d.client.Query(query)
	it, err := q.Read(ctx)
	if err != nil {
		return nil, "", err
	}

	it.PageInfo().Token = token
	it.PageInfo().MaxSize = 30
	var rows []*pb.Event
	for {
		var row Event
		err := it.Next(&row)
		if errors.Is(err, iterator.Done) {
			break
		}
		if err != nil {
			return nil, "", err
		}
		event, err := row.Proto()
		if err != nil {
			return nil, "", err
		}
		rows = append(rows, event)
		if it.PageInfo().Remaining() == 0 {
			break
		}
	}
	return rows, it.PageInfo().Token, nil
}

Expected behavior

When calling this function multiple times when the token returned from previous call it should return the next page.

Actual behavior

It fails on the Next(&row) call as iterator.Done error

Screenshots

image

@vic3lord vic3lord added the triage me I really want to be triaged. label Aug 6, 2020
@product-auto-label product-auto-label bot added the api: bigquery Issues related to the BigQuery API. label Aug 6, 2020
@shollyman shollyman added type: question Request for information or clarification. Not an issue. and removed triage me I really want to be triaged. labels Aug 6, 2020
@shollyman
Copy link
Contributor

Thanks for the report.

This is somewhat unusual idiom, as you're specifying a token defined by the query result, but you're executing a new query each time. There's no guarantee that the pagination tokens for one query are valid for another. So if your data is changing at all this is going to yield incorrect results, but for results that yield cached results you're probably getting stable tokens from query to query.

If you reach iterator.Done I expect the token returned from this function to be set to the empty string. That's the expected signal that you've reached the end of the iteration on the backend. By passing the empty token back in on a subsequent call to the function you've basically just requested iteration start from the beginning.

If you want to consume results by page rather than by element, https://pkg.go.dev/google.golang.org/api/iterator?tab=doc has a Pager abstraction that allows you to consume elements by page rather than by element, which is similar to what you've built here.

If I've misunderstood, please provide more details.

@vic3lord
Copy link
Author

vic3lord commented Aug 7, 2020

Thanks @shollyman for the swift answer.

What I'm trying to achieve is 20 items sized pages the frontend will be asking via gRPC so subsequent request would go with a token from the previous one.

Maybe I'm doing it all wrong but that's not obvious from the docs.

EDIT:
Just tested out the iterator.NewPager you suggested, it has the same behaviour

@shollyman
Copy link
Contributor

It's not a particularly important detail, but the bigquery library is still based on REST and JSON, not gRPC.

Maybe this example will help:

func TestIntegration_PagedQueryIterator(t *testing.T) {
	if client == nil {
		t.Skip("Integration tests skipped")
	}
	ctx := context.Background()
	// This uses the shortened invocation of asking for an iterator given a query config
	// which elides some of the query execution details.
	//
	// You may want to run the query via q.Run() and retain a reference to the Job ID (string),
	// so you can reconstruct an iterator and pager from the JobID each time.
	q := client.Query("SELECT * from bigquery-public-data.samples.shakespeare LIMIT 99")
	it, err := q.Read(ctx)
	if err != nil {
		t.Fatalf("couldn't query: %v", err)
	}

	pageSize := 12
	currentToken := ""
	totalPages := 0
	tokenSet := make(map[string]struct{})

	for {
		// You could continue to ask for pages from a single pager instance.
		// This example builds the pager from an iterator for each page to
		// demonstrate managing tokens directly.
		pager := iterator.NewPager(it, pageSize, currentToken)

		var page [][]Value

		nextToken, err := pager.NextPage(&page)
		if err != nil {
			t.Fatalf("err getting page: %v", err)
		}
		totalPages = totalPages + 1
		tokenSet[nextToken] = struct{}{}
		if len(page) != pageSize && nextToken != "" {
			// This is a brittle constraint.  The backend can choose to limit
			// page size due to response size, but we're asking for trivial
			// volumes of data. Be aware of this for larger pages.
			t.Errorf("got an unexpected smaller page size of %d for token %s", len(page), currentToken)
		}

		if nextToken == "" {
			break
		}
		currentToken = nextToken
	}

	wanted := 9

	if totalPages != wanted {
		t.Errorf("wanted %d pages, got %d", wanted, totalPages)
	}
	if len(tokenSet) != wanted {
		t.Errorf("wanted %d distinct tokens, got %d", wanted, len(tokenSet))
	}
}

@vic3lord
Copy link
Author

vic3lord commented Aug 7, 2020

Thanks 🙏!
going to try this example to understand better.

Just to clarify, I meant my frontend talks to my backend using gRPC protocol. I did not mean that bigquery does that.
I mentioned gRPC as a protocol because it's a request/response type of thing that I want to achieve so each time I request a new page, it will spawn a new goroutine (same as net/http mux) which will prevent me from doing the for loop.

I want the end user to request a new page on each subsequent request.

@shollyman
Copy link
Contributor

I'll go ahead and close this issue out now, as there's nothing here indicating a problem with the row iterator mechanism.

I hope the example was sufficient for you to build a page-based mechanism. If I've missed something, please comment here or open a followup issue.

@vic3lord
Copy link
Author

@shollyman The example does not work between rpc calls
To solve this (not in an elegant way) I had to use a job instead of directly querying the table
Then pass the job-id+token to the client to request the next page from the job results.

This was not trivial at all. hope it helps to anyone.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: bigquery Issues related to the BigQuery API. type: question Request for information or clarification. Not an issue.
Projects
None yet
Development

No branches or pull requests

2 participants