-
Notifications
You must be signed in to change notification settings - Fork 99
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow full ecto functionality #25
Comments
The built in loading function does a bit more than I think you realize. Suppose you have a
For truly arbitrary stuff though you only need the
Then you use this via:
The Avoiding ecto queries outside the context module is a central design principle of Dataloader, which I can talk more about if you want. |
Ok, that might be a difference. Ideally you'd like to (lazy) load data within the context. That's not really practical with the current implementation, but my PR is a step into that direction. Interested to hear your thoughts, but I don't really agree with the premise. IMO the dataloader should work at a Repo level, and the contexts should leverage dataloader as tool on top of Ecto or another source, to efficiently load data in terms of IO. So batching, caching and paralyzing. The GraphQL API layer should just call the context, and the context is responsible for leveraging dataloader (or not). What you gain is that every part of your app will load data efficiently. Additionally every dataloading that happens in the business logic layer that does not directly map to graphql fields (like authorization) can now also be efficiently loaded (these things can make the current dataloader pretty worthless in the current form). I agree that queries should not happen outside of contexts. It would be nice if we have the ability to load any queryable even though they might not batch (it's not only a batching library!). One benefit for instance, is that it potentially would be possible to cache direct id lookups if they already appeared in a list for instance (not yet implemented, but would be quite trivial). If we cannot run the list query (might be a untypical query) through dataloader, we cannot do that. We can do |
That's an interesting take, and I'll have to think about it more. My concern is that you'd either have an enormous proliferation of API functions from the context so you could grab stuff from GraphQL, or you'd simply be duplicating the dataloader API and then you're just using dataloader from the resolvers anyway. Setting aside questions of how Dataloader should be used for a moment though, how do you feel that Dataloader itself prevents you from doing what you want? The built in Ecto source takes the stance I've articulated for sure, but you could simply write another source that works differently. The built in Ecto source can't and won't be going undergoing a conceptual rewrite any time soon, at a minimum the book depends on its current functionality. The point of using a protocol however is to let other folks do what they want, and I think that there are a wide variety of options available. |
So the issue above can be implemented in the Ecto source as is. Why not add more flexibility? As for having multiple Ecto sources and these being contexts, that's a design decision but it doesn't really matter too much as you can just create a single As for the dataloader in the current form, yes it severely limits what I can do. I'm not married to the approach. Dataloader now is nice if your database maps exactly to your graphql schema and your context does not have logic that needs to reach out to the database like authorization logic. It's not so much an interesting take to do the dataloading in the business logic (context), but it is the intended purpose of dataloader (the facebook version). I don't get how this approach will result in a proliferation of API functions. The resolver will just call If you use the dataloader in the Anyway to have the ability to use dataloader in contexts, it just needs an abstraction on top of the current form of Dataloader, like the |
If you don't think the |
I don't understand how the context "grabs stuff" from graphql. I think you are referring to re-using the dataloader state across context functions. |
I was hoping you would offer specifics about what you find lacking in the
Correct me if I'm wrong but the Facebook dataloader exclusively does batched key value lookups, so we're already pretty far afield from its functionality. Let me take a step back and articulate the problem I was trying to solve with Dataloader. Dataloader exists to solve not just a technical set of problems, but also some conceptual and structural problems that arise in projects. The ProblemPhoenix Contexts forced Absinthe to re-think data access. The most important characteristic of a context in my view is that it has the responsibility of enforcing your business rules with respect to the data and functionality in its charge. This is I think most important for mutations to that data, but it applies to queries as well. However, this poses a problem for both Absinthe and even regular controllers. For Absinthe, the DataloaderCentral to the Dataloader implementation then was the idea that all of the actual data retrieval and access would happen within the context code. The context would have both full access to all of the data in its domain, as well as the ability to apply restrictions and limit access patterns for any query that came through Dataloader. The exact manner by which this happens of course depends on the Dataloader source, and I very explicitly didn't want Dataloader to be Ecto specific. If you have context A and context B, and A can call a function in B and pass in an Ecto query that relates to B's schemas, then we're violating that control. The whole point is to prevent arbitrary SQL access at arbitrary points in the application. If A wants to use Dataloader within itself to load things, and you want to do so with Ecto query inputs then that seems fine, but at this point we're just basically re-creating Ecto.Repo with a different API. |
I agree with all of the above. But I don't think you get my point. The above explains how you design your modules, agree 100%. However if you use a module you shouldn't be exposed to dataloader at all. Using dataloader within a context should just be an implementation detail inside of contexts, as a consumer of a context (for instance for a GraphQL API layer, I shouldn't have to know or care about it. The point is that in the current implementation if I use the dataloader inside a function in a context, for instance In the current implementation the dataloader is exposed in the graphql layer (should need to be IMO), the graphql layer should just translate a request for something into the right call to the internal API (context). Much like controllers (you call the context, you aren't exposed to Ecto). How can we do that in Elixir when we execute eagerly and don't have a concept like ticks (the javascript dataloader implementation uses a hack where it waits 1 tick before executing queries so it can batch queries that happen at the same time)? Well we need to have lazy evaluation. I did this by implementing a Deferrable protocol. A function can return a Deferrabla then when I want to evaluate this Deferrable I need to do that explicitly. The graphql layer can evaluate these deferables so that all queries are batched/cached etc efficiently. With this solution the Dataloader just becomes an implementation detail in a context. The only thing we need is a simple middleware that knows how to handle deferred values. As for the concrete example it is impossible with the current dataloader implementation to do authorization of fields (for each posts in a list of posts) efficiently whereby you need to query associations. Preloads can be a solution, but that only works for simple cases, and is not ideal. I think this is something that is needed in almost any non-trivial app, so not a very contrived example. |
As per above, it's not really about Source. However it would be nice to make Dataloader.Ecto feature complete so we can run any query (small change).
If you use Facebook's javascript dataloader in the context/model whatever the name where the business logic is it will batch repeat calls, like this:
That only results in a single query. Now if I use dataloader in my elixir get context function it will not batch at al. It will only batch within get. So that is the functionality that is lacking now. And in my opinion it's the most important feature. If the These queries related to authorization cannot be batched currently, but they are the largest source for n+1 issues because it's usually multiple queries per resource. |
Not showing deleted posts is really the job of the function that returns a list of posts in the context IMO, like if I run:
But I might be interested in the deletions in some cases:
The dataloader should query whatever the query is inside of those functions. If we have more complex rules like a deleted field in associations, IMO it's better to have these in a separate function.
I don't think you should use I agree that the problem is not unique to graphql or controllers or however you'd like to talk to the business logic. So that shows that the solution to this should be in the business logic. |
These are a lot of great ideas and I think I better understand your position now. In essence, you want Dataloader to be an implementation detail of the context, instead of an explicit part of the API of the context. Once it's an implementation detail of the context then we should feel free to pass in any argument value (like queries) that we want. I actually feel like we're closer to being on the same page than may be obvious. Dataloader started off as "how can I make an API where the actual data retrieval mechanism is an implementation detail". Deferrables are trying to say "how can I make even using dataloader an implementation detail". I'm not entirely convinced however that deferrables aren't exactly what dataloader is, just with guarantees around the API to try to actually get batching. Here are my questions:
To expand on #2, I'm unclear on how batching actually happens. Maybe I need to simply dig into your implementation of the Deferrables protocol.
This seems seems entirely possible to me, so maybe working through this example will help further this discussion. Can you expand on this situation a bit so I can sketch a "concrete" solution? |
Nice. I'll be speaking at a meetup about this. So I can work on explaining this in a bit more detail and work on the implementation. Let me work out the example in code, so you'll have a better idea why it's not possible with the current dataloader (or there might be a solution for it, I'll doubt it would be a good solution but maybe I'll just don't see it😅). I will also work out the example using the proposed solution of the PR.
it's the acting user (a name convention we use in our codebase)
I'll write an example in code but basically:
|
Dataloader is indeed 90% there. Deferrables solve the last 10%. But let me work it out in code so it's clearer to see. |
Any progress on this front ? Pretty interested in this topic and curious to see where it leads eventually. @benwilson512 did you have a chance to watch @jfrolich talk ? |
Hey @jfrolich @tlvenn I just wanted to provide an update. The last 6 months have been overwhelmingly busy for me both with Cargosense stuff as well as personally. Fortunately we've made a lot of progress on Absinthe 1.5 and hope to release it in the week after ElixirConf. At that point I look forward to focusing on this question. FWIW we've run into this too. It is not possible with the current Dataloader functionality to use it within a context function, which makes it very difficult to have context functions that take on complex business logic while still loading efficiently. I'm very interested in your deferred execution approach. I'm also curious about what could be done with a multiprocess approach. I apologize for letting this languish, but I am hopeful that I will be able to return to it soon. |
@benwilson512: Same with my startup as well, so I also left it in an incomplete state. Looking forward working on a good solution/API for this! Looking forward to your talk! |
@benwilson512 do you have thoughts on these kind of these kinds of loading issues in 2021? Is there a good answer to this with the current iterations of Absinthe / Dataloader? |
Hi @eprothro , this is a pretty old topic, what kind of issues are you trying to solve that the current Dataloader does not currently support ? |
Hi @tlvenn I don't have a concrete issue. I'm evaluating implementing dataloader on a project and was curious about usage from context functions and more complex authorization scenarios. Ben and Jaap mentioned those here, so was curious of their take today and what patterns they're using for the concepts raised in this issue. |
Now we can just load by
id
or a named association. It would be nice if we can support the full ecto functionality by having the ability to load any queryable.We cannot do batching on queryables, so that benefit is gone, but we can execute the queries in parallel, reducing waterfalls and apply simple caching. Any thoughts?
The text was updated successfully, but these errors were encountered: