
GraphQL::Dataloader, built-in batching system #2483

Closed · wants to merge 42 commits
Conversation

@rmosolgo (Owner) commented Sep 18, 2019

Projects like https://github.com/Shopify/graphql-batch and https://github.com/exAspArk/batch-loader have proven the value of batch loading in GraphQL. In fact, you really can't run a production GraphQL system without batching.

For this reason, I want to include a batching system in GraphQL-Ruby (without breaking compatibility with existing systems, of course!). Here are some goals for this system:

  • Feature parity with GraphQL-Batch
  • First-class support for pushing IO to a background thread
  • Traceable -- include graphql context with loads, so that a developer can see what GraphQL fields used which loaders.
  • Good built-in defaults (eg, ActiveRecord, Redis, HTTP resources ... others?)
  • Well-documented, low-friction custom loaders

If anyone has other suggestions for a built-in dataloader, please share them!
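For anyone new to the pattern, here is a minimal, dependency-free sketch of the core idea behind every batch loader: individual `load` calls are queued, then resolved together in one `perform`-style call. All names here are illustrative, not this PR's API.

```ruby
# Minimal batch-loading sketch: `load` queues a key and returns a thunk;
# forcing any thunk resolves the whole pending batch in one call.
class TinyLoader
  def initialize(&perform)
    @perform = perform # block that fetches many keys at once
    @pending = []      # keys waiting to be fetched
    @results = {}      # key => fetched value
  end

  # Queue a key; returns a thunk that forces the batch when called.
  def load(key)
    @pending << key
    -> { resolve; @results[key] }
  end

  private

  def resolve
    return if @pending.empty?
    @results.merge!(@perform.call(@pending.uniq))
    @pending.clear
  end
end

db = { 1 => "Product 1", 2 => "Product 2" }
loader = TinyLoader.new { |ids| ids.to_h { |id| [id, db[id]] } }
a = loader.load(1)
b = loader.load(2)
# Both keys are fetched by a single perform block when first forced:
a.call # => "Product 1"
b.call # => "Product 2"
```

Real implementations add promise chaining, per-query caching, and error handling on top of this queue-then-flush core.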


TODO

  • Test caching in queries vs. no cache in mutation root
  • Add a good API
  • Test Concurrent::Future-powered background loaders
  • Make sure @panthomakos's considerations regarding thread safety are addressed
  • Make sure traceability matches graphql-metrics
  • Add some built-in loaders & test them
  • Sort out Execution::Lazy vs Dataloader::PendingLoad vs ::Promise
  • Make sure graphql-batch's tests pass with GraphQL::Dataloader (see "Allow running the test suite with GraphQL::Dataloader", graphql-batch#1)
  • Performance audit (GitHub uses a branch of Promise.rb for lower memory use)
  • Any reason not to install Dataloader by default? (dependency?)
  • Update docs for concurrent-ruby dependency in Dataloader main (not only background thread)
  • Support Lazy.sync as a public api? Or support instrumentation (see graphql-batch tests)
  • Docs:
    • Concepts
    • Installation
    • Built-in loaders
    • Custom loaders
    • Standardize language of batch keys / fetch parameters: what does graphql-batch do here?

@rmosolgo rmosolgo added this to the 1.10.0 milestone Sep 18, 2019
@rmosolgo rmosolgo self-assigned this Sep 18, 2019
@rmosolgo (Owner, Author)

@panthomakos, I'd love to get your feedback on the goals discussed here. This is where I want to take inspiration from your work in #1981 😊, so please let me know if I've overlooked any of your goals and accomplishments from that branch.

@eapache (Contributor) commented Sep 18, 2019

One of our requirements at Shopify is that we make very heavy use of https://github.com/Shopify/graphql-metrics/blob/master/lib/graphql_metrics/timed_batch_executor.rb, so we'll need some sort of similar hook where we can collect performance data.

@rmosolgo (Owner, Author)

👌 Thanks for the reference there, @eapache 👀 I'll keep it in mind!

@chrisbutcher (Contributor) commented Sep 19, 2019

Traceable -- include graphql context with loads, so that a developer can see what GraphQL fields used which loaders.

This would be amazing. With the existing graphql-ruby + graphql-batch, I couldn't find an obvious or clean way to, for example, attribute time spent batch loading a given field to a given ast_node.

With the existing executor / lazy loading implementation, I suppose this is true of any field resolvers that return promises, but I wonder if batch loader perform start/end hook methods that could read/write context would help?

@rmosolgo (Owner, Author)

read/write context

Yeah, I think that's part of the ticket: right now, loaders (ours, anyway) throw out graphql context (including current field, path, etc) and operate independently of it. I'd love to find a way that keeps the low-overhead api of loaders but adds context-awareness. Maybe we could tack on the context info after the application's resolver (eg, if it returns a promise, tack context onto the promise) to make it easy.
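The "tack context onto the promise" idea could look roughly like this. Everything below is a hypothetical sketch under made-up names (`PendingLoad`, `annotate_field_result`), not this PR's API: after the application's resolver returns, the executor checks for a known pending-load type and annotates it with the current GraphQL context so tracing can attribute the load to a field.

```ruby
# A promise-like placeholder standing in for a pending batch load.
class PendingLoad
  attr_reader :context

  def initialize(&thunk)
    @thunk = thunk
  end

  # Called by the executor after the field resolver returns:
  def with_context(ctx)
    @context = ctx
    self
  end

  def value
    @thunk.call
  end
end

# The executor-side hook: annotate pending loads, pass plain values through.
def annotate_field_result(result, graphql_context)
  result.is_a?(PendingLoad) ? result.with_context(graphql_context) : result
end

pending = annotate_field_result(PendingLoad.new { 42 }, { path: ["product", "images"] })
pending.context # => { path: ["product", "images"] }
pending.value   # => 42
annotate_field_result("plain", {}) # => "plain"
```

The point of doing this in the executor rather than in the loader is that the application's resolver code stays untouched while every load still carries field/path information for tracing.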

@sfcgeorge

I found the current ones a bit verbose so implemented a nice little DSL that looks like this:

def self.authorized?(timeslot, context)
  return false unless super
  return false unless context.current_instructor

  chain_load(timeslot).offer.outlet.instructors.then do |_offer, _outlet, instructors|
    instructors.include?(context.current_instructor)
  end
end

So you can load a chain of one-to-one relationships and optionally a one-to-many on the end, and access any of the models along the way in the block.

https://gist.github.com/sfcgeorge/e067822f174d42175fec0f2264fe399e

@rmosolgo (Owner, Author)

Nice, thanks for sharing! We have some similar shortcuts in github/github. I like the approach of adding methods like chain_load, and that might be an option for sneaking in context without making the user-facing API too burdensome.

@panthomakos left a comment

This is exciting work. Sorry it has taken me so long to respond. I have a few questions about the concurrent implementation.

end

def load(value)
@promises[value] ||= GraphQL::Execution::Lazy.new do


Is it possible that you will end up with two lazy executions for the same value based on how threads are scheduled? You might consider using a https://ruby-concurrency.github.io/concurrent-ruby/1.1.4/Concurrent/Map.html.
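The race being described is that `@promises[value] ||= ...` is a check-then-set, so two threads can each build a lazy for the same value. A dependency-free illustration of the fix: guard the hash with a Mutex (concurrent-ruby's `Concurrent::Map#compute_if_absent` gives the same per-key guarantee without a global lock).

```ruby
# Thread-safe memoizing cache: at most one value is ever built per key,
# no matter how threads interleave.
class ThreadSafeCache
  def initialize
    @mutex = Mutex.new
    @store = {}
  end

  def fetch_or_create(key)
    # ||= is atomic here because it runs inside the lock.
    @mutex.synchronize { @store[key] ||= yield }
  end
end

cache = ThreadSafeCache.new
threads = 10.times.map do
  Thread.new { cache.fetch_or_create(:a) { Object.new } }
end
distinct = threads.map(&:value).uniq.size
distinct # => 1 (every thread got the same object)
```

With a plain `Hash` and no lock, two threads can both see a missing key and both run the build block, which is exactly the duplicate-lazy scenario above.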

lib/graphql/dataloader/loader.rb (outdated review thread, resolved)
lib/graphql/dataloader/background_loader.rb (outdated review thread, resolved)
def load(value)
@promises[value] ||= GraphQL::Execution::Lazy.new do
if !@loaded_values.key?(value)
sync


In the concurrent example, sync will run in a separate thread. Correct?

If so, then how do you guarantee that @loaded_values[value] will be present on line 26 below?

@daemonsy (Contributor) left a comment

Hi @rmosolgo, I was looking into batch loading for my company, hoping to contribute something, and thankfully saw this ❤️. So instead, I'm hoping to contribute by adopting Dataloader for a nascent GraphQL API at my company that doesn't do batch loading yet.

Specifically,

  • Using GraphQL::Dataloader::Loader as a consumer
  • Contributing to the usage guide on this PR as we do more testing


def self.load(context, key, value)
dl = context[:dataloader]
loader = dl.loaders[self][key]
@daemonsy (Contributor)

Took me a while to get it; the way dl.loaders[self][key] instantiates the loader is really clever 👍.

So far in initial testing, I've already forgotten to use context as the first argument twice 😺. That's actually the main mental impedance so far: I'm not thinking about the context object while trying to write a batch loading statement.

@rmosolgo (Owner, Author)

Yes, I think there's got to be some better API for this. But I'd really like to keep dataloading context-aware so that we can trace it as part of the GraphQL request.


def initialize(context, key)
@context = context
@key = key
@daemonsy (Contributor)

For the typical use case, where @key is the model, #perform looks a little weird:

class MyLoader < GraphQL::Dataloader::Loader
  def perform(ids)
    @key.where(id: ids)
  end
end

Also, what are the thoughts around supporting additional arguments?

In graphql-batch, it was common to have:

RecordLoader.for(Product, :other_id).load(object.other_id)

which gets passed into the initializer of the loader. We used it for setting simple where conditions or for using a different key, as in the loader above.

@rmosolgo (Owner, Author)

Yes, I suppose it could be more readable like:

class MyLoader < GraphQL::Dataloader::Loader
  def initialize(context, model)
    @model = model 
    super 
  end 

  def perform(ids)
    @model.where(id: ids)
  end
end

This was required for graphql-batch (IIRC) because it didn't store state otherwise. It also didn't require the super call. I wonder how I can remove that boilerplate in this implementation 🤔
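One possible way to drop that boilerplate is to have the base class's initialize capture any extra batch arguments itself, so subclasses only ever override `perform`. This is a sketch under assumed names (`BaseLoader`, `batch_args`), not this PR's implementation:

```ruby
# Base class owns all state; subclasses never define initialize or call super.
class BaseLoader
  attr_reader :context, :batch_args

  def initialize(context, *batch_args)
    @context = context
    @batch_args = batch_args
  end
end

class ModelLoader < BaseLoader
  def perform(ids)
    model = batch_args.first
    # A real loader would run `model.where(id: ids)`; stubbed here so the
    # sketch runs without ActiveRecord:
    ids.map { |id| "#{model}##{id}" }
  end
end

loader = ModelLoader.new({ current_user: nil }, "Product")
loader.perform([1, 2]) # => ["Product#1", "Product#2"]
```

The trade-off is that batch arguments become positional and anonymous (`batch_args.first`), which is less readable than a named `@model`, so there's still a design choice between zero boilerplate and self-documenting subclasses.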

@rmosolgo rmosolgo mentioned this pull request Sep 27, 2019
14 tasks
@rmosolgo rmosolgo mentioned this pull request Oct 23, 2019
@rmosolgo rmosolgo removed this from the 1.10.0 milestone Jan 7, 2020
@rmosolgo (Owner, Author) commented Jan 7, 2020

I'm going to release 1.10 without this in the interest of time. That branch already has other big changes on it, there's still a lot of work to do here, and I haven't gotten to it. It isn't essential, either: graphql-batch works great, and you can build backgrounded IO on top of it, AFAIK.

@rmosolgo rmosolgo mentioned this pull request Aug 1, 2020
33 tasks
def self.resolve(results)
# First, kick off any loaders that will resolve in background threads
Dataloader.current && Dataloader.current.process_async_loader_queue
@rmosolgo (Owner, Author)

This isn't exactly subtle, but after a lot of attempts, I couldn't find a better way to work a "kick off" step into the existing execution flow. This is very similar to the original suggestion, but at the loader level instead of the promise level.

Comment on lines 113 to 119
def current
Thread.current[:graphql_dataloader]
end

def current=(dataloader)
Thread.current[:graphql_dataloader] = dataloader
end
@rmosolgo (Owner, Author)

I think this adds the requirement that GraphQL queries be executed within a single thread.

@rmosolgo (Owner, Author)

(The alternative would be to use context[:dataloader], which earlier iterations used. But then you're stuck with the question of how to get that dataloader into each loader, so that the loader can register itself with the dataloader's cache.)
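A sketch of the Thread.current approach above, with one addition worth considering: restore the previous dataloader in an `ensure` block so nested executions on the same thread stay correct. The `with_dataloader` wrapper is a hypothetical name, not this PR's API.

```ruby
class Dataloader
  def self.current
    Thread.current[:graphql_dataloader]
  end

  # Set the current dataloader for the duration of a block, then always
  # restore the previous one (handles nesting and exceptions).
  def self.with_dataloader(dataloader)
    previous = Thread.current[:graphql_dataloader]
    Thread.current[:graphql_dataloader] = dataloader
    yield
  ensure
    Thread.current[:graphql_dataloader] = previous
  end
end

outer = Object.new
Dataloader.with_dataloader(outer) do
  Dataloader.current.equal?(outer) # => true
end
Dataloader.current # => nil (restored after the block)
```

Because `Thread.current[...]` is per-thread state, this does pin a given query's execution to a single thread, which is exactly the requirement noted above.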

@swalkinshaw (Collaborator)

One small issue we've had with batch loaders is dealing with keys (or fetch parameters in your terminology?) that don't have data to fulfill. I think right now it leads to opaque Promise::BrokenError exceptions. Here's an example:

class ProductImageLoader < GraphQL::Batch::Loader
  def initialize(shop_id)
    @shop_id = shop_id
  end

  def perform(product_ids)
    Images
      .where(shop_id: @shop_id, product_id: product_ids)
      .group_by(&:product_id)
      .each { |product_id, images| fulfill(product_id, images) }
  end
end

If no images exist for a product id, it won't be fulfilled leading to that error. We have two common solutions:

  1. manually fulfill all unfulfilled keys
class ProductImageLoader < GraphQL::Batch::Loader
  def initialize(shop_id)
    @shop_id = shop_id
  end

  def perform(product_ids)
    Images
      .where(shop_id: @shop_id, product_id: product_ids)
      .group_by(&:product_id)
      .each { |product_id, images| fulfill(product_id, images) }

    product_ids.each { |id| fulfill(id, nil) unless fulfilled?(id) } # nil here, but any "default" value works
  end
end
  2. iterate over the keys/fetch parameters instead of the data
class ProductImageLoader < GraphQL::Batch::Loader
  def initialize(shop_id)
    @shop_id = shop_id
  end

  def perform(product_ids)
    images = Images
      .where(shop_id: @shop_id, product_id: product_ids)
      .group_by(&:product_id)

    product_ids.each do |id|
      fulfill(id, images[id])
    end
  end
end

I've had the idea before that we could have a better interface to enforce/prevent this situation. A dataloader could declare a default value explicitly. If set, the loader could automatically fulfill all missing keys with it?

But thinking more about this: perform has a fairly strict requirement that fulfill gets called for each of its fetch parameters, yet we have to do that work manually, which is error prone (as seen above). I wonder if there's a better interface we could give people instead 🤔 I'll give it more thought.

@rmosolgo (Owner, Author)

in your terminology?

I started updating those docs and realized I didn't have a good word for those different kinds of keys. Now I see that Batch::Loader uses group_args and keys, which seems good too. I just want to pick something that makes their usage clear.

Also, I'm torn between doubling down on the terms from graphql-batch and batch-loader, or picking new words to make it more googleable. Oh, and avoiding "-er" classes (http://wiki.c2.com/?DontCreateVerbClasses).

declare a default value explicitly

Yeah, I could see that, something like

unfulfilled_default nil 

Then the library could basically do

if self.class.set_unfulfilled_default?
  keys_to_load.each { |key| fulfill(key, self.class.unfulfilled_default) unless fulfilled?(key) }
end 

But probably only if unfulfilled_default was explicitly set, otherwise we'd raise an error of some kind. (Because I don't think we want it to silently ignore unfulfilled keys, that's important feedback to the application behavior.)
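A runnable sketch of that behavior under hypothetical names: a declared default fills in unfulfilled keys, and when no default is declared, an unfulfilled key raises instead of silently disappearing. Here the subclass's `perform` returns a Hash rather than calling `fulfill`, which is one way to address the "perform must fulfill every key" concern above.

```ruby
class DefaultingLoader
  class << self
    attr_reader :default_value

    # The hypothetical `unfulfilled_default` DSL from the discussion above.
    def unfulfilled_default(value)
      @default_set = true
      @default_value = value
    end

    def default_set?
      defined?(@default_set) ? @default_set : false
    end
  end

  def resolve_batch(keys)
    fulfilled = perform(keys) # subclass returns { key => value }
    keys.each do |key|
      next if fulfilled.key?(key)
      # No silent gaps: missing keys need an explicit default, or we raise.
      raise "Unfulfilled key: #{key.inspect}" unless self.class.default_set?
      fulfilled[key] = self.class.default_value
    end
    fulfilled
  end
end

class ImagesLoader < DefaultingLoader
  unfulfilled_default nil

  def perform(ids)
    # Stand-in for a DB query that only finds images for odd ids:
    ids.select(&:odd?).to_h { |id| [id, ["img-#{id}.png"]] }
  end
end

ImagesLoader.new.resolve_batch([1, 2])
# => { 1 => ["img-1.png"], 2 => nil }
```

The raise-by-default behavior preserves the "important feedback to the application" mentioned above, while the declared default captures the common "no rows means nil/empty" case without boilerplate in every perform.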

@gaffney commented Sep 27, 2020

Hey @rmosolgo thanks for the awesome library and hard work; we are currently using this in production without any issues.

If anyone has other suggestions for a built-in dataloader, please share them!

Our team decided to go with exAspArk/batch-loader due to its generic nature / the fact that it is not tied to GraphQL. We have plenty of external REST API calls we wanted to batch in addition to GraphQL fields.

batch-loader works great for us, but it is a little painful integrating with graphql-ruby, as illustrated by the graphql-ruby example in the README:

To avoid this problem, all we have to do is to change the resolver to return BatchLoader::GraphQL (#32 explains why not just BatchLoader):

I found that there were several issues around this and very long threads in 2018, but unfortunately this is where we landed:

I suggested a few other potentially more flexible solutions for graphql-ruby to detect lazy objects such as duck typing or using explicit arguments. But it looks like it won't be implemented. To fix the issue BatchLoader started wrapping BatchLoader objects with PORO (plain old ruby objects) by using graphql-ruby instrumentation.

Since you are hard at work at a major refactor I was wondering if you could consider revisiting some of the solutions proposed by exAspArk to make for a more seamless integration... or any alternative to avoid BatchLoader::GraphQL.for. The current batching workaround makes testing painful and is inherently inextensible.

@rmosolgo rmosolgo added this to the 1.12.0 milestone Sep 28, 2020
@rmosolgo (Owner, Author)

Thanks for sharing that discussion, @gaffney. Unfortunately, this refactor doesn't touch the underlying behavior where GraphQL-Ruby detects lazy values by calling value.class. Last I checked, Batch-Loader objects implement #class by delegating to the batch-loaded object (instead of returning BatchLoader), so it just doesn't work.

Interestingly, the original suggestion in those issues was to add a lazy: true configuration to field(...). That would be possible now, something like:

class BaseField < GraphQL::Schema::Field 
  # When `lazy: true` is given, add a field extension to wrap the returned value of this field
  def initialize(*args, lazy: false, **kwargs, &block)
    if lazy 
      extensions = kwargs[:extensions] ||= [] 
      extensions << BatchLoaderExtension 
    end 
    super
  end 
end 

# When this extension is added, the field was configured with `lazy: true`, so apply a wrapper so 
# GraphQL-Ruby can identify the lazy object. 
class BatchLoaderExtension < GraphQL::Schema::FieldExtension
  def resolve(object:, arguments:, **_rest)
    # call normal field execution 
    return_value = yield(object, arguments)
    # apply a wrapper and return it, TODO is `.wrap` the correct method here? Not exactly sure. 
    BatchLoader::GraphQL.wrap(return_value)
  end 
end 

Anyways, just a thought after reviewing that code for the first time in a while. Interestingly, a recent Ruby version added Object#then, which I could imagine using as the basis for a duck-typing approach to batching and lazy evaluation. But that's a separate matter from what's in the works here 🍻.
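To make the `#class`-based detection above concrete, here is a dependency-free illustration of the lookup (a simplification of what `lazy_resolve` registers, not GraphQL-Ruby's actual internals): lazy values are mapped to a sync method by their class, which is exactly why a proxy that delegates `#class` to the wrapped object never matches the registry.

```ruby
LAZY_METHODS = {} # stand-in for the schema's lazy_resolve registry

class MyPromise
  def initialize(&block)
    @block = block
  end

  def sync
    @block.call
  end
end
LAZY_METHODS[MyPromise] = :sync

# The executor asks "is this value's class registered as lazy?" and, if so,
# calls the registered method to force it; otherwise the value passes through.
def resolve_lazily(value)
  method_name = LAZY_METHODS[value.class]
  method_name ? value.public_send(method_name) : value
end

resolve_lazily(MyPromise.new { 42 }) # => 42
resolve_lazily("plain value")        # => "plain value"
```

If `MyPromise#class` instead returned the class of its eventual value (as Batch-Loader's proxy does), the `LAZY_METHODS[value.class]` lookup would return nil and the value would be treated as already resolved.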

@rmosolgo rmosolgo changed the base branch from master to 1.12-dev December 22, 2020 21:26
@rmosolgo rmosolgo mentioned this pull request Dec 27, 2020
18 tasks
@rmosolgo (Owner, Author) commented Jan 6, 2021

I ended up going a very different direction on this: #3264

@rmosolgo rmosolgo closed this Jan 6, 2021
@rmosolgo rmosolgo deleted the dataloader branch January 6, 2021 22:15
9 participants