Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use GraphQL::Dataloader breaks RequestStore usage #3449

Closed
gopeter opened this issue Apr 27, 2021 · 13 comments
Closed

use GraphQL::Dataloader breaks RequestStore usage #3449

gopeter opened this issue Apr 27, 2021 · 13 comments

Comments

@gopeter
Copy link
Contributor

gopeter commented Apr 27, 2021

Describe the bug

I'm using the gem acts_as_tenant which uses RequestStore to store global values (like the current tenant). But adding use GraphQL::Dataloader to the schema breaks this, because ActsAsTenant.current_tenant is not defined then:

module Types
  class Query < BaseObject
    field :catalog, CatalogType, null: true do
      argument :id, ID, required: true
    end

    def catalog(id:)
      puts "###########"
      puts ActsAsTenant.current_tenant
      puts "###########"

      Catalog.find!(id)
    end
  end
end

returns just two rows of ### 😄

I don't know if this is something that has to be done in the RequestStore gem? Or a bug in graphql-ruby itself? Or are both libraries just incompatible and there is no way to resolve this?

I am not sure how Dataloader works, but since it's somewhat coupled to Fiber (?) it maybe have to do with how RequestStore stores its data: in Thread.current objects.

Versions

graphql version: 1.12.8
graphql-pro version: 1.17.14
rails (or other framework): 6.0.3

@rmosolgo
Copy link
Owner

Hey, thanks for the detailed report.

it maybe have to do with how RequestStore stores its data: in Thread.current objects.

Yes, I bet that's exactly it. When Dataloader is enabled, all GraphQL execution takes place inside a Fiber. As you noticed, Thread.current is empty inside a new Fiber. (This caveat is also mentioned in the docs)

My first recommendation is to use context for those values during GraphQL execution. For example, add

context = {
  current_tenant: ActsAsTenant.current_tenant
  # ...
}

MySchema.execute(query_str, variables: variables, context: context)

To your GraphQL controller. Then, inside fields that require a current_tenant, you could use ActsAsTenant's block support:

    def catalog(id:)
      ActsAsTenant.with_tenant(context[:current_tenant]) do 
        Catalog.find!(id)
       end 
    end

If a lot of your fields require a current_tenant, you could build a field extension for that purpose. You could even install that extension by default in your base field class. (Take care though -- you might not want to install it on fields that return enums and scalars, otherwise, the overhead might really add up.)

Let me know if the .with_tenant { ... } suggestion above works at all. If it does, then we can be sure that you were right about the Thread.current[...] issue. Then, if you want to investigate an extension-based setup, I'd be happy to look into it with you!

@gopeter
Copy link
Contributor Author

gopeter commented Apr 27, 2021

Thanks for the fast response! Yes, indeed all models are tied to an account, so all fields need to now about the current tenant (Ok, a catalog with a specific id does not need the account id, but connections like catalogs should return just catalogs from the current account – not to mention all of my mutations).

ActsAsTenant.with_tenant(context[:current_tenant]) works fine, but I wonder if there could also be something like ...

class MySchema < GraphQL::Schema
  use GraphQL::Subscriptions::ActionCableSubscriptions
  use GraphQL::Dataloader

  query Types::Query
  mutation Types::Mutation
  subscription Types::Subscription

  self.before_execution
    ActsAsTenant.current_tenant = context[:current_tenant]
  end
end

Would something like this makes sense?

@rmosolgo
Copy link
Owner

Would something like this makes sense?

Something like that might work for now (see "query instrumenters"), but the problem is, with Dataloader, GraphQL-Ruby will spin up new Fibers as it needs them. Those newly-created Fibers won't have the context that they need to use current_tenant properly.

Here's what I recommend:

  1. Make a field extension to give this context to each field:
class CurrentTenantExtension < GraphQL::Schema::FieldExtension 
  # Make sure that `current_tenant` is available for field resolution, then continue with resolution as usual.
  def resolve(object:, arguments:, context:, **_rest)
    ActsAsTenant.current_tenant = context[:current_tenant]
    yield(object, arguments)
  end 
end 

Then, you could add this extension whenever you want, eg:

field :catalog, Types::Catalog, extensions: [CurrentTenantExtension]
  1. That might get a bit tedious, so you can also extend the base field class to install that extension by default:
class BaseField < GraphQL::Schema::Field 
  def initialize(*args, **kwargs, &block)
    super 
    if type.unwrap.kind.composite? # This is true for Object, Interface, and Union types 
      extension(CurrentTenantExtension)
    end 
  end
end 

If there are scalar or enum fields that also require that context, you could add the extension as-needed to those fields, or remove the if ... condition I added above.

  1. Pass the tenant to your Dataloader calls. Dataloaders run in different Fibers than execution (so that they can load data while execution is paused), so they'll need the tenant explicitly provided. For example:
class ActiveRecordSource < GraphQL::Dataloader::Source
  def initialize(tenant, model_class)
    @tenant = tenant 
    @model_class = model_class
  end 
  
  def fetch(ids)
    ActsAsTenant.with_tenant(@tenant) do 
      @model_class.where(id: ids)
    end 
  end 
end 

Then, in fields, you'd pass the tenant when loading objects with dataloader:

def catalog(id:)
  dataloader.with(ActsAsTenant.current_tenant, Catalog).load(id)
end 

(You could probably extract the tenant handling as you see fit -- I think you could even assign @tenant = ActsAsTenant.current_tenant in def initializeabove, since that'd be run in the same Fiber as def catalog.)

What do you think of an approach like that? Let me know how it goes if you give it a try!

@gopeter
Copy link
Contributor Author

gopeter commented Apr 27, 2021

Holy moly, thanks again! I'll have a look at this later, since I've just hacked a bit around at the same time and was happy that my misuse of a tracer worked 😄

class CurrentTenant
  def self.use(schema_defn, options = {})
    tracer = new(**options)
    schema_defn.instrument(:field, tracer) unless schema_defn.is_a?(Class)
    schema_defn.tracer(tracer)
  end

  def trace(key, data)
    ActsAsTenant.current_tenant = data[:context][:current_tenant] if data[:context] && data[:context][:current_tenant]

    yield
  end
end

Something like that might work for now (see "query instrumenters"), but the problem is, with Dataloader, GraphQL-Ruby will spin up new Fibers as it needs them. Those newly-created Fibers won't have the context that they need to use current_tenant properly.

Hmmm, is there a way to reproduce this? Am I able to spin up more Fibers by ... creating a very large in a test case, for example?

@gopeter
Copy link
Contributor Author

gopeter commented Apr 27, 2021

Ok, great, I've just adopted your solution and it works fine!

Thanks for your fast support!

@gopeter gopeter closed this as completed Apr 27, 2021
@gopeter
Copy link
Contributor Author

gopeter commented Apr 27, 2021

@rmosolgo I just wanted to add that using Dataloader breaks everything that relies on the current_user. I've added PaperTrail.request.whodunnit = context[:current_user].id to the CurrentTenantExtension too to make paper_trail work again. Maybe this should be added to the docs as well? 😄

@rmosolgo
Copy link
Owner

What's current_user? Is that also a Thread.current-backed method?

@gopeter
Copy link
Contributor Author

gopeter commented Apr 28, 2021

current_user is something that devise defines (I'm not sure where it stores this information) and which is used by many other Gems too.

@rmosolgo
Copy link
Owner

The only definition I could find for current_user in devise is this Controller helper: https://github.com/heartcombo/devise/blob/5d5636f03ac19e8188d99c044d4b5e90124313af/lib/devise/controllers/helpers.rb#L98-L139

Could you check where your current_user method comes from by something like:

p method(:current_user).source_location

?

(I can't figure out how a controller helper would have ever worked inside GraphQL!)

@gopeter
Copy link
Contributor Author

gopeter commented Apr 28, 2021

Hmmm this one does indeed not work inside any of my GraphQL code. But this flow worked before adding the use Dataloader 🤔

  • Install and setup devise and paper_trail
  • Set before_action :set_paper_trail_whodunnit in the GraphQL controller (which uses the current_user internally)
  • Add some mutations where models can be created or updated
  • Et voilá: the newly added model has the right "whodunit" information in paper_trail's versions table

@ianks
Copy link
Contributor

ianks commented Apr 30, 2021

This also breaks Rack::MiniRacer for us. Unfortunately, this seems like a deal breaker for use. Do you think it would make sense to copy over thread locals to the fiber in this case? it seems like that would be the least disruptive policy, although not ideal

@rmosolgo
Copy link
Owner

I'm open to it, for sure, I think you could do it wherever Dataloader calls Fiber.new, something like:

fiber_locals = {} 
Thread.current.keys.each do |fiber_var_key|
  fiber_locals[fiber_var_key] = Thread.current[fiber_var_key]
end 

Fiber.new do 
  fiber_locals.each { |k, v| Thread.current[k] = v }
  # ...
end 

Is that about what you have in mind? I'm game to give it a shot if it seems reasonable to you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants