GraphQL::Dataloader, built-in batching system #2483
Conversation
@panthomakos, I'd love to get your feedback on the goals discussed here. This is where I want to take inspiration from your work in #1981 😊, so please let me know if I've overlooked any of your goals and accomplishments from that branch.
One of our requirements at Shopify is that we make very heavy use of https://github.com/Shopify/graphql-metrics/blob/master/lib/graphql_metrics/timed_batch_executor.rb, so we'll need some sort of similar hook where we can collect performance data.
👌 Thanks for the reference there, @eapache 👀 I'll keep it in mind!
This would be amazing. With the existing graphql-ruby + graphql-batch, I couldn't find an obvious or clean way to, for example, attribute time spent batch loading to a given field. With the existing executor / lazy loading implementation, I suppose this is true of any field resolvers that return promises, but I wonder if batch loader perform start/end hook methods that can read/write context would help?
Yeah, I think that's part of the ticket: right now, loaders (ours, anyway) throw out GraphQL context (including current field, path, etc.) and operate independently of it. I'd love to find a way that keeps the low-overhead API of loaders but adds context-awareness. Maybe we could tack on the context info after the application's resolver (e.g., if it returns a promise, tack context onto the promise) to make it easy.
I found the current ones a bit verbose, so I implemented a nice little DSL that looks like this:

    def self.authorized?(timeslot, context)
      return false unless super
      return false unless context.current_instructor
      chain_load(timeslot).offer.outlet.instructors.then do |_offer, _outlet, instructors|
        instructors.include?(context.current_instructor)
      end
    end

So you can load a chain of one-to-one relationships and optionally a one-to-many on the end, and access any of the models along the way in the block. https://gist.github.com/sfcgeorge/e067822f174d42175fec0f2264fe399e
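As a rough sketch of how such a chaining DSL can work (this is illustrative only, not the gist's actual implementation, and all class and model names below are hypothetical), `method_missing` can record each association name and replay the whole chain when a block is attached:

```ruby
# Illustrative-only sketch of a chain-loading DSL: each method call is
# recorded, then replayed when `#then` receives a block. In a real batching
# setup, each step would enqueue a batched load instead of walking eagerly.
class ChainLoad
  def initialize(record)
    @record = record
    @steps = []
  end

  # Record the association name and keep chaining.
  def method_missing(name, *_args)
    @steps << name
    self
  end

  def respond_to_missing?(_name, _include_private = false)
    true
  end

  # Walk the recorded associations, then yield every intermediate value.
  def then
    values = []
    current = @record
    @steps.each do |step|
      current = current.public_send(step)
      values << current
    end
    yield(*values)
  end
end

def chain_load(record)
  ChainLoad.new(record)
end

# Toy models standing in for Timeslot -> Offer -> Outlet.
Outlet = Struct.new(:instructors)
Offer = Struct.new(:outlet)
Timeslot = Struct.new(:offer)

timeslot = Timeslot.new(Offer.new(Outlet.new([:alice, :bob])))
result = chain_load(timeslot).offer.outlet.instructors.then do |_offer, _outlet, instructors|
  instructors.include?(:alice)
end
# result is true
```

The key design point is that the chain itself is inert data until `#then` runs, which is what lets a real implementation batch the underlying queries.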
Nice, thanks for sharing! We have some similar shortcuts in github/github. I like the approach of adding methods like that.
This is exciting work. Sorry it has taken me so long to respond. I have a few questions about the concurrent implementation.
lib/graphql/dataloader/loader.rb

    def load(value)
      @promises[value] ||= GraphQL::Execution::Lazy.new do
Is it possible that you will end up with two lazy executions for the same value based on how threads are scheduled? You might consider using a `Concurrent::Map`: https://ruby-concurrency.github.io/concurrent-ruby/1.1.4/Concurrent/Map.html.
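To illustrate the race being described (class and method names here are hypothetical, not the PR's code): a plain `||=` on a Hash is a check-then-act sequence, so two threads can each observe the key missing and each build a lazy execution. Guarding the memoization with a Mutex (or using `Concurrent::Map#compute_if_absent`) makes it atomic:

```ruby
# Hypothetical sketch: memoize one object per key, safely, across threads.
# Without the Mutex, two threads could both run the block for the same key.
class PromiseCache
  def initialize
    @mutex = Mutex.new
    @promises = {}
  end

  def fetch(value)
    @mutex.synchronize do
      # Check-then-act is now atomic: only the first caller runs the block.
      @promises[value] ||= yield
    end
  end
end

cache = PromiseCache.new
created = 0
threads = 10.times.map do
  Thread.new { cache.fetch(:key) { created += 1; Object.new } }
end
results = threads.map(&:value)
# The block ran exactly once, and every thread got the same object.
```

`Concurrent::Map#compute_if_absent` gives the same guarantee without a hand-rolled lock, which is presumably why it was suggested.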
lib/graphql/dataloader/loader.rb

    def load(value)
      @promises[value] ||= GraphQL::Execution::Lazy.new do
        if !@loaded_values.key?(value)
          sync
In the concurrent example, `sync` will run in a separate thread. Correct? If so, then how do you guarantee that `@loaded_values[value]` will be present on line 26 below?
Hi @rmosolgo, I was looking into batch loading for my company, hoping to contribute something, and thankfully saw this ❤️. So instead, I'm hoping to contribute by trying to adopt Dataloader for a nascent GraphQL API in my company that doesn't do batch loading yet.
Specifically:
- Using `GraphQL::Dataloader::Loader` as a consumer
- Contributing to the usage guide on this PR as we did more testing
lib/graphql/dataloader/loader.rb

    def self.load(context, key, value)
      dl = context[:dataloader]
      loader = dl.loaders[self][key]
Took me a while to get it; the way `dl.loaders[self][key]` instantiates the loader is really clever 👍. So far in initial testing, I've already forgotten to use `context` as the first argument twice 😺. That's actually the mental impedance so far, as I'm not thinking about the `context` object while trying to write a batch loading statement.
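For readers puzzling over the same line: the lazy instantiation can be sketched with nested Hashes whose default blocks build entries on first access (class and method names below are illustrative, not the PR's actual implementation):

```ruby
# Illustrative sketch of the `dl.loaders[self][key]` pattern: the outer Hash
# is keyed by loader class, the inner Hash by the loader's key, and the inner
# default block constructs and caches a loader instance on first access.
class Dataloader
  def initialize
    @loaders = Hash.new do |by_class, loader_class|
      by_class[loader_class] = Hash.new do |by_key, key|
        # `self` here is the Dataloader instance, captured by the block.
        by_key[key] = loader_class.new(self, key)
      end
    end
  end

  attr_reader :loaders
end

class RecordLoader
  def initialize(dataloader, key)
    @dataloader = dataloader
    @key = key
  end

  attr_reader :key
end

dl = Dataloader.new
loader = dl.loaders[RecordLoader][:name]
# A second lookup returns the same cached instance, not a new one.
same = dl.loaders[RecordLoader][:name]
```

The default blocks mean callers never explicitly construct loaders, which is what makes the call site so terse.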
Yes, I think there's got to be some better API for this. But I'd really like to keep dataloading `context`-aware so that we can trace it as part of the GraphQL request.
lib/graphql/dataloader/loader.rb

    def initialize(context, key)
      @context = context
      @key = key
For the typical use case, where `@key` is the model, `#perform` looks a little weird:

    class MyLoader < GraphQL::Dataloader::Loader
      def perform(ids)
        @key.where(id: ids)
      end
    end
Also, what are the thoughts around supporting additional arguments? In graphql-batch, it was common to have:

    RecordLoader.for(Product, :other_id).load(object.other_id)

which gets passed into the initializer of the loader. We used it for setting simple `where` conditions or for having a different key, as in the loader above.
Yes, I suppose it could be more readable like:

    class MyLoader < GraphQL::Dataloader::Loader
      def initialize(context, model)
        @model = model
        super
      end

      def perform(ids)
        @model.where(id: ids)
      end
    end

This was required for graphql-batch (IIRC) because it didn't store state otherwise. It also didn't require the `super` call. I wonder how I can remove that boilerplate in this implementation 🤔
I'm going to release 1.10 without this in the interest of time. That branch has other big changes on it already, there's still a lot of work to do here, and I haven't gotten to it. And it isn't essential -- graphql-batch works great, and you can build backgrounded IO on top of it AFAIK.
    def self.resolve(results)
      # First, kick off any loaders that will resolve in background threads
      Dataloader.current && Dataloader.current.process_async_loader_queue
This isn't exactly subtle, but after a lot of attempts, I couldn't find a better way to work a "kick off" step into the existing execution flow. This is very similar to the original suggestion, but at the loader level instead of the promise level.
lib/graphql/dataloader.rb

    def current
      Thread.current[:graphql_dataloader]
    end

    def current=(dataloader)
      Thread.current[:graphql_dataloader] = dataloader
    end
I think this adds the requirement that GraphQL queries be executed within a single thread.
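A tiny demonstration of why that requirement arises: values stored via `Thread.current[...]` are per-thread, so a dataloader registered on one thread is invisible to work scheduled on another (the `:graphql_dataloader` key matches the snippet above; the value is a stand-in):

```ruby
# Thread-locals set on one thread are not inherited by child threads,
# so any code that needs `Thread.current[:graphql_dataloader]` must run
# on the same thread that set it.
Thread.current[:graphql_dataloader] = :query_dataloader

# A child thread sees its own (empty) thread-local table.
other = Thread.new { Thread.current[:graphql_dataloader] }.value
# other is nil; the original thread still sees :query_dataloader.
```

This is the trade-off behind the `context[:dataloader]` alternative discussed below: explicit context passing survives thread hops, while `Thread.current` does not.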
(The alternative would be to use `context[:dataloader]`, which earlier iterations used. But then you're stuck with: how do you get that dataloader into each loader, so that the loader can register itself with the dataloader's cache?)
One small issue we've had with batch loaders is dealing with keys (or fetch parameters, in your terminology?) that don't have data to fulfill. I think right now it leads to opaque errors:

    class ProductImageLoader < GraphQL::Batch::Loader
      def initialize(shop_id)
        @shop_id = shop_id
      end

      def perform(product_ids)
        Images
          .where(shop_id: @shop_id, product_id: product_ids)
          .group_by(&:product_id)
          .each { |product_id, images| fulfill(product_id, images) }
      end
    end

If no images exist for a product id, it won't be fulfilled, leading to that error. We have two common solutions:
    class ProductImageLoader < GraphQL::Batch::Loader
      def initialize(shop_id)
        @shop_id = shop_id
      end

      def perform(product_ids)
        Images
          .where(shop_id: @shop_id, product_id: product_ids)
          .group_by(&:product_id)
          .each { |product_id, images| fulfill(product_id, images) }
        product_ids.each { |id| fulfill(id, nil) unless fulfilled?(id) } # nil here, but any "default" value works
      end
    end

    class ProductImageLoader < GraphQL::Batch::Loader
      def initialize(shop_id)
        @shop_id = shop_id
      end

      def perform(product_ids)
        images = Images
          .where(shop_id: @shop_id, product_id: product_ids)
          .group_by(&:product_id)
        product_ids.each do |id|
          fulfill(id, images[id])
        end
      end
    end

I've had the idea before that we could have a better interface to enforce/prevent this situation. A dataloader could declare a default value explicitly. If set, the loader could automatically fulfill all missing keys with it? But thinking more about this,
I started updating those docs and realized I didn't have a good word for those different kinds of keys. Now I see the term that Batch::Loader uses. Also, I'm torn between doubling down on the terms from graphql-batch and batch-loader, or picking new words to make it more googleable. Oh, and avoiding "-er" classes (http://wiki.c2.com/?DontCreateVerbClasses).
Yeah, I could see that, something like:

    unfulfilled_default nil

Then the library could basically do:

    if self.class.set_unfulfilled_default?
      keys_to_load.each { |key| fulfill(key, self.class.unfulfilled_default) unless fulfilled?(key) }
    end

But probably only if
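Putting the two comments above together, here is a minimal self-contained sketch of the declared-default idea. All class and method names are hypothetical (this is neither graphql-batch's nor this PR's API):

```ruby
# Hypothetical sketch: a loader subclass declares `unfulfilled_default`, and
# the base class fills in any keys that `perform` left unfulfilled.
class TinyLoader
  class << self
    def unfulfilled_default(value = nil)
      @default_set = true
      @default_value = value
    end

    def set_unfulfilled_default?
      !!@default_set
    end

    attr_reader :default_value
  end

  def initialize
    @results = {}
  end

  def fulfill(key, value)
    @results[key] = value
  end

  def fulfilled?(key)
    @results.key?(key)
  end

  def load_all(keys)
    perform(keys)
    if self.class.set_unfulfilled_default?
      keys.each { |k| fulfill(k, self.class.default_value) unless fulfilled?(k) }
    end
    @results
  end
end

class EvenSquares < TinyLoader
  unfulfilled_default :missing

  # Only fulfills even keys; odd keys fall back to the declared default.
  def perform(keys)
    keys.each { |k| fulfill(k, k * k) if k.even? }
  end
end

results = EvenSquares.new.load_all([1, 2, 3, 4])
# results => {2=>4, 4=>16, 1=>:missing, 3=>:missing}
```

This keeps `perform` free of fulfillment bookkeeping, at the cost of making "no data" indistinguishable from an explicitly fulfilled default value.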
Hey @rmosolgo thanks for the awesome library and hard work; we are currently using this in production without any issues.
Our team decided to go with exAspArk/batch-loader due to its generic nature / the fact that it is not tied to GraphQL. We have plenty of external REST API calls we wanted to batch in addition to GraphQL fields.
I found that there were several issues around this and very long threads in 2018, but unfortunately this is where we landed:
Since you are hard at work on a major refactor, I was wondering if you could consider revisiting some of the solutions proposed by exAspArk to make for a more seamless integration... or any alternative to avoid
Thanks for sharing that discussion, @gaffney. Unfortunately, this refactor doesn't touch the underlying behavior where GraphQL-Ruby detects lazy values.

Interestingly, the original suggestion in those issues was to add a field-level `lazy:` option, something like:

    class BaseField < GraphQL::Schema::Field
      # When `lazy: true` is given, add a field extension to wrap the returned value of this field
      def initialize(*args, lazy: false, **kwargs, &block)
        if lazy
          extensions = kwargs[:extensions] ||= []
          extensions << BatchLoaderExtension
        end
        super(*args, **kwargs, &block)
      end
    end

    # When this extension is added, the field was configured with `lazy: true`, so apply a wrapper so
    # GraphQL-Ruby can identify the lazy object.
    class BatchLoaderExtension < GraphQL::Schema::FieldExtension
      def resolve(object:, arguments:, **_rest)
        # call normal field execution
        return_value = yield(object, arguments)
        # apply a wrapper and return it, TODO is `.wrap` the correct method here? Not exactly sure.
        BatchLoader::GraphQL.wrap(return_value)
      end
    end

Anyways, just a thought after reviewing that code for the first time in a while. Interestingly, a recent Ruby version added
I ended up going a very different direction on this: #3264
Projects like https://github.com/Shopify/graphql-batch and https://github.com/exaspark/batch-loader have proven the value of batch loading in GraphQL. In fact, you really can't run a production GraphQL system without batching.
For this reason, I want to include a batching system in GraphQL-Ruby (without breaking compat with existing systems, of course!). Here are some goals for this system:
If anyone has other suggestions for a built-in dataloader, please share them!
TODO

- `Execution::Lazy` vs `Dataloader::PendingLoad` vs `::Promise`
- `Dataloader` by default? (dependency?)