
Yet another PR for async execution and batching. Defer calling thunks until as late as possible during execution. #367

Closed
jjeffery wants to merge 5 commits into graphql-go:master from jjeffery:deferred-thunks

Conversation

jjeffery
Contributor

@jjeffery jjeffery commented Jul 24, 2018

Here is yet another PR that provides support for batching queries and asynchronous execution. It proposes an alternative approach to other async PRs including #357 and #132.

It's a difficult problem, and there are clearly a number of different approaches to consider. Hopefully this PR will help. The features of the approach in this PR are:

  • No change to the public API.
  • Bare minimum change to the existing code.
  • No additional goroutines in the graphql code.

Thunks

My thinking is that staying with thunks is the way to go, both for batching and for asynchronous execution (as opposed to introducing the concept of a promise, or using channels in the API).

The approach taken in this PR is to delay calling any thunks until as late as possible in the execution sequence.

Batching

Batching of queries (for example, to an SQL database) can be implemented using the dataloader pattern and does not require additional goroutines. The important thing for batching is that the resolver functions are all called first; only once all available thunks have been returned are any of them actually called.
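
As an illustration, a thunk-based loader might look like the following sketch. (The loader type, User, and fetchUsersByID are hypothetical stand-ins, not code from this PR.)

type User struct {
    ID   string
    Name string
}

// userLoader collects keys while the resolvers run. The first thunk that
// gets called triggers a single batched fetch for every key collected.
type userLoader struct {
    keys    []string
    results map[string]*User
}

func (l *userLoader) Load(id string) func() interface{} {
    l.keys = append(l.keys, id)
    return func() interface{} {
        if l.results == nil {
            l.results = fetchUsersByID(l.keys) // one batched query for all keys
        }
        return l.results[id]
    }
}

// fetchUsersByID stands in for a single batched SQL query.
func fetchUsersByID(ids []string) map[string]*User {
    out := make(map[string]*User, len(ids))
    for _, id := range ids {
        out[id] = &User{ID: id}
    }
    return out
}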

Async Execution

My thinking is that if the resolver function wants to operate asynchronously, it should be in control of any goroutines, not the graphql package. So if a resolver function were to run asynchronously, it might look something like the following:

func resolve(params graphql.ResolveParams) (interface{}, error) {
    type result struct {
        data interface{}
        err  error
    }
    ch := make(chan result)
    go func() {
        // ... perform async operation here ...
        ch <- resultFromAsyncOperation()
    }()

    // The returned thunk blocks on the channel; execution does not call it
    // until all available thunks have been collected.
    f := func() interface{} {
        r := <-ch
        if r.err != nil {
            panic(r.err)
        }
        return r.data
    }

    return f, nil
}

This allows for batching of queries (using the dataloader pattern), and provides support for concurrent operations.
@coveralls

coveralls commented Jul 24, 2018


Coverage increased (+0.03%) to 91.828% when pulling 6131ddf on jjeffery:deferred-thunks into ef7caf8 on graphql-go:master.

@jjeffery
Contributor Author

Fixes #235

@edpark11

edpark11 commented Aug 5, 2018

@jjeffery thanks for this

@chris-ramon -- do you have a perspective on criteria for merging one of these async patches? #213 #367 #357

Right now I slightly prefer #213 but that's because I already have an implementation that works with Nick Randall's dataloader and because some of the folks on that thread say that they've been using it for a while in production with no issues. If we liked the approach here (#367) better, I'd be happy to port to this (which also seems straightforward).

Anyways thank you for maintaining graphql-go! I mostly just want to make sure that this was somewhere on the near-term radar for merging into master-- I don't think anyone really wants to use a forked version of graphql-go :)

@jjeffery
Contributor Author

jjeffery commented Aug 6, 2018

@edpark11 thanks for your interest in this PR.

@chris-ramon I'd be interested in your views on this topic as well, if you have the time.

Thanks also to the authors of PRs #213 and #357.

When I created this PR I did not have any real expectation that it would be merged as-is, but I did want to show that it is possible to come up with an implementation that supports async processing without creating a goroutine per graphql field. (I recognize that #213 has an opt-in for concurrent processing, but that is a topic for another discussion.)

I should probably explain why creating one or more goroutines in the graphql execution could present a problem. The main use I have for this package is building a GraphQL server in front of a relational database (PostgreSQL in my case). In my implementation, each HTTP request creates a database transaction (*sql.Tx from the database/sql package) and this same transaction is used for all database access in the request.

This works fairly well for me at the moment, because I know that all of my resolver functions and all of my thunks are called from the same goroutine. If this were not the case, then I would have to apply a mutex around every database call (on account of *sql.Tx not being thread-safe). This, in my opinion, would make my code worse, not better.
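
For illustration, a hypothetical wrapper shows what every call site would need if thunks ran on other goroutines (a sketch; not code from my server):

import (
    "database/sql"
    "sync"
)

// lockedTx serializes access to a *sql.Tx, which is not safe for
// concurrent use; every query helper would have to go through it.
type lockedTx struct {
    mu sync.Mutex
    tx *sql.Tx
}

func (l *lockedTx) QueryRow(query string, args ...interface{}) *sql.Row {
    l.mu.Lock()
    defer l.mu.Unlock()
    return l.tx.QueryRow(query, args...)
}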

I understand that implementations that use other storage backends (eg MongoDB) might not have the same issue, but I do think that a relational database backend is an important use case.

I seem to remember @chris-ramon mentioning that this package uses the GraphQL reference implementation as a guide, and that implementation makes use of promises. In that way, I think that #357 has some merit, but I would be reluctant to introduce a promise-based API into the public API of a Go package. That does not mean that the implementation could not have something that looks a bit like promises -- I just think that it would be preferable to keep this out of the public API. (Just my opinion, of course).

I thought my PR was interesting in that it deferred resolution of any thunks until as late as possible in the execution sequence. The idea here is that a resolver that returns a thunk is returning an indication that there is some work to be done before the thunk should be called. Calling the thunk straight away (as happens on the current master branch) kind of defeats the purpose of having a thunk.

Delaying thunk resolution for as long as possible works with the dataloader pattern when using an SQL database. If a resolver returns many rows, each of which returns some thunks from the dataloader, then none of these thunks will get resolved until all the rows have been returned. This goes a long way towards solving the classic "select N+1" problem encountered with relational databases.
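
For example, the keys collected by the thunks can then be satisfied with a single query. Here is a sketch, assuming github.com/lib/pq and a made-up customer table:

import (
    "database/sql"

    "github.com/lib/pq"
)

// loadCustomers runs one query for every ID collected by the thunks,
// instead of one query per row.
func loadCustomers(tx *sql.Tx, ids []string) (map[string]string, error) {
    rows, err := tx.Query(`SELECT id, name FROM customer WHERE id = ANY($1)`, pq.Array(ids))
    if err != nil {
        return nil, err
    }
    defer rows.Close()
    out := make(map[string]string, len(ids))
    for rows.Next() {
        var id, name string
        if err := rows.Scan(&id, &name); err != nil {
            return nil, err
        }
        out[id] = name
    }
    return out, rows.Err()
}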

Delaying thunk resolution for as long as possible would also work for truly asynchronous processing, because each resolver function that wants to perform asynchronous processing creates its own goroutine. The goroutines are all started before thunk resolution blocks on any of them to obtain a result. The key to this is that the resolver functions are responsible for creating goroutines if that is appropriate.

Obviously there are many perspectives from which to view this problem. I look forward to an implementation on master that is acceptable to everyone, and would like to help if I can.

@ccbrown
Contributor

ccbrown commented Aug 7, 2018

I 100% agree that the graphql package should not be in charge of the goroutines. So I'm definitely a big fan of this over #132 and #213.

In addition to reference-implementation parity, #357's channel-based approach does have one more advantage over this: The user of the library can choose which resolutions get performed first. For example, the dataloader could choose to send off a batch request for objects of type A, then wait and see if continued resolution allows it to batch up more objects of type B before sending off a request for those. In this PR, your functions have to return values in the order that they're called.

That said, if #357 isn't agreeable, I really hope this is the approach that gets merged. Except I would want the thunk to return (interface{}, error) rather than relying on panics to convey errors. 😛

@jjeffery
Contributor Author

jjeffery commented Aug 7, 2018

@ccbrown -- thanks for your comments.

Regarding the thunks returning (interface{}, error): I totally agree with you. I would like to move away from using panics for error recovery as much as possible. I would like to go even further and allow thunks to return (AnyType, error), but I originally felt that would be best left as a topic for another PR.
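
Concretely, the thunk from the PR description's example would then become something like this sketch (the proposed shape, not what this PR currently implements):

f := func() (interface{}, error) {
    r := <-ch
    return r.data, r.err // no panic; the error travels with the result
}
return f, nil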

It's probably worth talking about here, though, in the context of PR #213. That PR provides an opt-in for asynchronous thunks: a func() (interface{}, error) gets resolved asynchronously, while a func() interface{} gets resolved synchronously. While I appreciate the ingenuity behind this approach, I would prefer not to see it merged into master. I think it is not a clear way of distinguishing async from sync processing, and could easily result in hard-to-diagnose problems. (My opinion, of course).

@edpark11

edpark11 commented Aug 7, 2018

@jjeffery -- I agree re: the signatures of #213. In fact, the first time I implemented it, I assumed that func() interface{} was the async version and couldn't figure out why things were going wrong until I traced much more deeply into the code. So I think that thunks for the initial branch should return func() (interface{}, error) and not allow the other sync signature. Alternatively, you could introduce a new type called Thunk (as Nick Randall's dataloader does), as sketched below.
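
Something like this, perhaps (hypothetical, in the spirit of Nick Randall's dataloader):

// A named thunk type makes the intended signature explicit at a glance.
type Thunk func() (interface{}, error)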

@jjeffery -- hadn't considered the threading issues with sqlx. A corner case but an important one.

@ccbrown @jjeffery -- do you guys have any estimates of the overhead of "waiting as long as possible" vs just thunking in parallel? I sort of like the "wait as long as possible" approach from a repeatability perspective... waiting a certain number of milliseconds seems less desirable (and we don't have something like the JavaScript event loop here to rely on).

@chris-ramon -- I'm willing to try either/both of @jjeffery and @ccbrown 's approaches if you could give us a hint of the direction you might be willing to eventually accept a patch from :)

Anyways, thanks guys for pushing this forward!

@edpark11

Awesome, @chris-ramon just put together a gist with a summary here-- thanks Chris! https://gist.github.com/chris-ramon/0e64fcb63de5a68107b38cbab4d7a138

@edpark11

edpark11 commented Sep 1, 2018

@jjeffery -- I was testing this in my situation and it mostly works, but I ran into an issue. Here is an illustrative situation: imagine you have a many-to-many relationship between Customers and Groups. In Postgres, this is modeled as a Customer table, a Group table, and a Customer <=> Group join table. Further, imagine you have a table, Visit, which models customer visits (a visit has_a customer). In our graphql-go implementation, we would model a query that allows us to pull down every visit, the customer associated with that visit, and the groups associated with that customer, something like this:

query ListCustomerVisits {
  CustomerVisit {
    items {
      customer {
        id
        first_name
        last_name
        affiliations {
          items {
            group {
              id
              name
            }
          }
        }
      }
    }
  }
} 

There are then 3 batching functions in play, which are called from the resolvers for customer, affiliations, and group respectively... the basic code looks something like this (using Nick Randall's dataloader, which I realize might be a little overkill for this situation, but I was porting from #213 and it should still work):

Resolver for customer:
thunk := dataloaders["GetCustomer"].Load(p.Context, key)
returnVal := func() interface{} {
    ret, _ := thunk()
    return ret
}
return returnVal, nil

Resolver for affiliations:
thunk := dataloaders["GetCustomerAffiliations"].Load(p.Context, key)
returnVal := func() interface{} {
    ret, _ := thunk()
    return ret
}
return returnVal, nil

Resolver for group:
thunk := dataloaders["GetGroup"].Load(p.Context, key)
returnVal := func() interface{} {
    ret, _ := thunk()
    return ret
}
return returnVal, nil

And of course the batch function is a straightforward transformation from a Get to a List and back-- they all look like this:

func GetCustomerDataloaderBatchFunc(ctx context.Context, keys dataloader.Keys) []*dataloader.Result {
  if len(keys) == 0 {
    return []*dataloader.Result{}
  }

  var ids []string
  for _, key := range keys {
    rkey, _ := key.(*ResolverKey)
    ids = append(ids, rkey.id)
  }
  args := &ListCustomersRequest{
    FilterBy: &CustomerFilter{
      Id: &CustomerIdFilter{
        Op:     CategoricalOperator_EQ,
        Values: ids,
      },
    },
  }
  resp, err := keys[0].(*ResolverKey).client().ListCustomers(ctx, args)

  // Guarantee stable ordering by the incoming Id.
  // First make a map of Id to result rows, then create the final result.
  idRowMap := map[string]*Customer{}
  if resp != nil {
    for _, row := range resp.Items {
      idRowMap[row.Id] = row
    }
  }
  results := make([]*dataloader.Result, 0, len(ids))
  for _, id := range ids {
    results = append(results, &dataloader.Result{Data: idRowMap[id], Error: err})
  }
  return results
}

The current issue is that with the breadth-first execution pattern in this PR, I successfully batch all of the GetCustomer calls into one call to ListCustomers (yay!!). But once it descends to the next level, the descent pattern does not know that customers will be trying to resolve affiliations, and that affiliations will be trying to resolve groups. The net result is that batching works correctly for the first level of the tree, but subsequent levels of the tree do not appear to be batched.

So question-- is this right? What am I missing? I am sure I am being stupid somewhere. :) Is there a good solution for this situation? Thanks everyone for all of your thoughtful comments!

cc @chris-ramon @ccbrown

@ccbrown
Contributor

ccbrown commented Sep 1, 2018

@edpark11 It's been a few weeks since I looked at this PR, but I don't think you're missing anything. I think that's a problem inherent to this approach. But I can say that #357 handles this kind of case really well (I believe it's capable of batching any query with the theoretical maximum efficiency).

@edpark11

edpark11 commented Sep 2, 2018

@ccbrown -- thanks for the thoughtful comments. I looked pretty deeply into #367 and found that it isn't really doing what @jjeffery intended (a breadth-first traversal) -- it does it for the first level but not subsequent levels. I fixed that with the new PR #388, and now it behaves correctly (and remains very efficient, I think).

@chris-ramon
Member

Thanks a lot guys! 👍 — Closing this one in favor of #388

chris-ramon added a commit that referenced this pull request Sep 10, 2018
Breadth-first-traversal for PR #367 (allows for thunk-based dataloader batching)