WIP - Benchmarking validation with a bigger schema and query #1172
Conversation
Now:

🍻 10x slower than the slowest previous example 😅 Let's see what we can find.
Sorry, I should have said first: THANK YOU for such a detailed reproduction of this issue! I know it's a pain to invest time in replicating the bug, but for me it's huge. Now we have a case that we can run some profiling on!
I added some more profiling and started diving in! The benchmarks are down a bit for me:

Before:

```
~/code/graphql-ruby $ be rake bench:validate
Warming up --------------------------------------
validate - introspection            24.000 i/100ms
validate - abstract fragments       53.000 i/100ms
validate - abstract fragments 2     31.000 i/100ms
validate - hackerone query           2.000 i/100ms
Calculating -------------------------------------
validate - introspection           251.626 (± 4.0%) i/s -   1.272k in 5.062746s
validate - abstract fragments      526.332 (± 3.0%) i/s -   2.650k in 5.039200s
validate - abstract fragments 2    310.111 (± 4.5%) i/s -   1.550k in 5.008831s
validate - hackerone query          29.260 (± 3.4%) i/s - 148.000 in 5.067372s
```

After:

```
~/code/graphql-ruby $ be rake bench:validate
Warming up --------------------------------------
validate - introspection            26.000 i/100ms
validate - abstract fragments       57.000 i/100ms
validate - abstract fragments 2     33.000 i/100ms
validate - hackerone query           3.000 i/100ms
Calculating -------------------------------------
validate - introspection           268.893 (± 4.1%) i/s -   1.352k in 5.037168s
validate - abstract fragments      568.874 (± 3.7%) i/s -   2.850k in 5.016582s
validate - abstract fragments 2    334.139 (± 3.3%) i/s -   1.683k in 5.042201s
validate - hackerone query          32.419 (± 3.1%) i/s - 162.000 in 5.004348s
```

But I think the big wins will depend on revisiting how fragment nodes are merged into the contexts where they're spread. Looking at object allocations, that's the big one, also.
This is what stands out to me:
So, I think the next step, looking for a bigger win, is to rethink how fragments are merged into the main operation. For example:
I'm trying to wrap my head around the validation and, especially, the internal representation of nodes, fields and how they work together. I think I came pretty far, but I do have some questions 😱
Yeah, for a query like the one in this PR copying the ast_nodes adds up. My initial thought was: why not freeze the ast_nodes so we don't have to duplicate them, but this dream was shattered once I figured out the rewriter modifies the internal node's ast_nodes. That made me think, can't we have an intermediate object that stores pointers to the original AST nodes and modify those pointers while rewriting? We could even create a "real" AST object once the rewriting is done, so we only copy the nodes once.
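The "pointer" idea above can be sketched in plain Ruby (this is hypothetical, not graphql-ruby's actual API): AST nodes stay frozen and shared, each internal node keeps only references, and real copies are made exactly once, after rewriting finishes.

```ruby
# Hypothetical sketch: share frozen AST nodes behind a lightweight wrapper
# so the rewriter mutates only the wrapper, never the nodes themselves.
ASTNode = Struct.new(:name, :children)

class IRNode
  attr_reader :ast_nodes

  def initialize
    @ast_nodes = [] # references to shared, frozen AST nodes
  end

  def add_ast_node(node)
    @ast_nodes << node # store a pointer; no .dup during rewriting
  end

  # Materialize private, mutable copies exactly once, when rewriting is done.
  def finalize
    @ast_nodes.map { |n| ASTNode.new(n.name, n.children.dup) }
  end
end

shared = ASTNode.new("iface", ["int"]).freeze
a = IRNode.new
b = IRNode.new
a.add_ast_node(shared)
b.add_ast_node(shared)

a.ast_nodes.first.equal?(b.ast_nodes.first) # same object, zero copies so far
copy = a.finalize.first
copy.children << "int2" # safe: the shared, frozen node is untouched
```

The point of the sketch is that duplication moves from "every time a fragment is spread" to "once per finished internal node", which is the cost model the comment above is reaching for.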
Are you talking about fields on the same type but scattered over multiple fragments? If so, are you thinking to have a register per type, so we don't have to duplicate field nodes?
I think that describes the current
Yes, I'm not exactly sure how they're handled, and judging by the number of
Yeah, it's hard to put my finger on it, but we need a world with less copying. The problem as I see it is like this:
Here's an example of the transformation I'm talking about:

```ruby
# $ irb -Ilib
require "graphql"

schema = GraphQL::Schema.from_definition <<-GRAPHQL
  type A implements I {
    int: Int
    int2: Int
  }

  type B implements I {
    int: Int
  }

  type C implements I {
    int: Int
  }

  interface I {
    int: Int
  }

  type Query {
    iface: I
  }
GRAPHQL

query = <<-GRAPHQL
  query GetStuff {
    iface {
      int
    }
    iface {
      ... on A {
        int2
      }
    }
  }
GRAPHQL

puts GraphQL::InternalRepresentation::Print.print(schema, query)
# query GetStuff {
#   ... on Query {
#     iface {
#       ... on A {
#         int
#         int2
#       }
#       ... on B {
#         int
#       }
#       ... on C {
#         int
#       }
#     }
#   }
# }
```

See how the fields selected on the interface are copied into each concrete type's context? That's why they're copied: so that we can merge later without worrying about accidental sharing. But maybe that's what needs fixing, some better structure that doesn't require the copy, but ideally, we can still serve the query analyzer API.
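The "accidental sharing" risk being guarded against can be shown with a plain-Ruby stand-in (this is an illustration, not graphql-ruby's internals): if two type contexts shared one mutable node, merging a field into one context would corrupt the other.

```ruby
# Illustration of why the copies exist: aliased mutable nodes leak merges.
Node = Struct.new(:name, :selections)

shared = Node.new("iface", ["int"])
on_a = shared # naive sharing: both contexts alias the same node
on_b = shared

on_a.selections << "int2" # merge `int2` into the A context...
on_b.selections # ...and the B context silently picks it up too

# A deep-enough copy isolates the contexts -- this is what the copying buys:
on_c = Node.new(shared.name, shared.selections.dup)
on_c.selections << "int3"
shared.selections # unchanged by the merge into on_c
```

A structure that avoids the copy would have to provide this isolation some other way, for example by keeping merges in a per-context overlay rather than in the shared node.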
Running the full query:
When removing the
It results in ~50% more iterations per second. With 102 deletions, these fields were a significant part of the original 475-line query. Which makes me believe that unnecessary duplication of fields (unfortunately) isn't a performance bottleneck.
Isn't that 79% faster? Seems significant to me 😬
Anyhow, it makes me wonder: I can't think of a good incremental solution, so what could we do if we started again from scratch? Even if we can't replace the current implementation, maybe we'd learn something to carry over.
We had an offline discussion about this task; let me summarize it for those following this thread. @mvgijssel raised an interesting question: most queries are not unique, so can't we just cache the rewriter's result? Unfortunately, it's not as easy as caching the rewriter output under a query identifier. Based on an offline discussion with @rmosolgo, the behavior of directives (like `@skip` and `@include`) influences the rewriter's outcome. But caching is a bit of a hack/band-aid; the real problem, still, is the number of copied nodes. Still, not all hope is lost for the caching solution. We can do a two-phase rewriter: in the first pass, the rewriter outputs a cacheable result; in the second pass, we rewrite the nodes that depend on directives. Depending on whether we can split the rewriter, this could be a quick win and significantly improve performance for clients that use Relay. I'd like to take a stab at this in the coming weeks!
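The two-phase idea can be sketched as a toy in plain Ruby (hypothetical names, not graphql-ruby code): phase one is directive-independent, so its output can be cached per query string; phase two applies `@skip` per request using that request's variables.

```ruby
# Toy two-phase rewriter: phase one is cached per query string,
# phase two re-runs on every request because it depends on variables.
class TwoPhaseRewriter
  def initialize
    @phase_one_cache = {}
  end

  def rewrite(query_string, variables)
    plan = @phase_one_cache[query_string] ||= phase_one(query_string)
    phase_two(plan, variables)
  end

  private

  # Stand-in for parsing + directive-independent rewriting.
  # "f1@skip:v1" means: field f1, skipped when variable v1 is truthy.
  def phase_one(query_string)
    query_string.split(",").map do |raw|
      field, skip_var = raw.strip.split("@skip:")
      { field: field, skip_var: skip_var }
    end
  end

  # Cheap per-request pass: drop selections whose @skip variable is truthy.
  def phase_two(plan, variables)
    plan.reject { |sel| sel[:skip_var] && variables[sel[:skip_var]] }
        .map { |sel| sel[:field] }
  end
end

rewriter = TwoPhaseRewriter.new
rewriter.rewrite("f1@skip:v1, f2@skip:v2", "v1" => true, "v2" => false)
# => ["f2"]
```

Running the same query string again with different variables reuses the cached phase-one plan, which is exactly the win described above for non-unique queries.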
I spent some time refreshing myself on this over the weekend, and I found a few more small tweaks. I was thinking about a real fix, some way to make fragment merging more efficient, and I haven't thought of anything yet!
Ok, here are some things I've thought about but given up on:
I share this to say that, although I don't have anything to show for it, I am trying some things 😬 And personally, I really want to solve this issue. If graphql-ruby isn't fast enough to handle Relay without a problem, then that's no good 😖
One more thing I explored: adding a prepare-then-execute workflow, where the work of parsing, validating and preparing a query could be done once, then the same template could be used for re-running that query each time. This is a great feature and definitely one I want to have soon; however, it was harder than I thought. Here's why: in the current implementation, directives like `@skip` are resolved while the query is rewritten.

So, that means if we have a selection like this:

```graphql
{
  f1 @skip(if: $v1) { a }
  f2 @skip(if: $v2) { b }
}
```

And variables like `{ "v1": true, "v2": false }`, the rewritten result is:

```graphql
{
  f2 { b }
}
```

So, the variables (which change from query to query) are mixed with the query string (static). What if you want to separate those two steps? Then you need to maintain the two "branches" of code (`f1` present or absent). The current query plan data structure doesn't allow for this: it's just a tree of parent types, and selections that apply to those types. So, to support runtime evaluation of those directives, the plan structure itself would have to change.
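What a directive-aware plan node might look like can be sketched in plain Ruby (hypothetical, not the current data structure, which discards directives during rewriting): the prepared plan keeps the `@skip` condition and evaluates it per request.

```ruby
# Hypothetical plan node that defers @skip to execution time.
Selection = Struct.new(:field, :skip_if, keyword_init: true) do
  # Evaluated at execution time, against this request's variables.
  def included?(variables)
    skip_if.nil? || !variables[skip_if]
  end
end

# Prepared once, reusable across requests:
plan = [
  Selection.new(field: "f1", skip_if: "v1"),
  Selection.new(field: "f2", skip_if: "v2"),
]

# v1 is true, so f1 is skipped:
plan.select { |s| s.included?("v1" => true, "v2" => false) }.map(&:field)
# => ["f2"]

# Same plan, different variables, different result -- no re-rewrite needed:
plan.select { |s| s.included?("v1" => false, "v2" => false) }.map(&:field)
# => ["f1", "f2"]
```

This is the structural change the paragraph above is pointing at: the plan stops being "one tree per variables set" and becomes one tree with per-node conditions.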
The 1.9-dev branch supports a more dynamic approach than last time this benchmark was run, so I gave it a try:

```diff
$ git diff
diff --git a/benchmark/run.rb b/benchmark/run.rb
index 3afc758d0..3d4adfb24 100644
--- a/benchmark/run.rb
+++ b/benchmark/run.rb
@@ -12,11 +12,18 @@ module GraphQLBenchmark
   SCHEMA = Jazz::Schema
   BENCHMARK_PATH = File.expand_path("../", __FILE__)
-  CARD_SCHEMA = GraphQL::Schema.from_definition(File.read(File.join(BENCHMARK_PATH, "schema.graphql")))
+  CARD_SCHEMA = GraphQL::Schema.from_definition(File.read(File.join(BENCHMARK_PATH, "schema.graphql"))).redefine do
+    use GraphQL::Execution::Interpreter
+    use GraphQL::Analysis::AST
+  end
   ABSTRACT_FRAGMENTS = GraphQL.parse(File.read(File.join(BENCHMARK_PATH, "abstract_fragments.graphql")))
   ABSTRACT_FRAGMENTS_2 = GraphQL.parse(File.read(File.join(BENCHMARK_PATH, "abstract_fragments_2.graphql")))
-  BIG_SCHEMA = GraphQL::Schema.from_definition(File.join(BENCHMARK_PATH, "big_schema.graphql"))
+  BIG_SCHEMA = GraphQL::Schema.from_definition(File.join(BENCHMARK_PATH, "big_schema.graphql")).redefine do
+    use GraphQL::Execution::Interpreter
+    use GraphQL::Analysis::AST
+  end
+
   BIG_QUERY = GraphQL.parse(File.read(File.join(BENCHMARK_PATH, "big_query.graphql")))

   module_function
```

It looks like the benchmarks above run about twice as fast as they used to:
Some of that work is pushed off until runtime, but it really depends on the data now. (Previously, you paid a high price for abstract types regardless of whether the query data exercised the different possibilities.) Besides that, if you have static queries, some of this per-query cost can be paid only once. It's a huge migration to get to both `GraphQL::Execution::Interpreter` and `GraphQL::Analysis::AST`.
I'm going to close this because I think I've really done what I can on it. I wish everything were superfast, but it's almost 3x faster now than it was previously! Feel free to open a new issue if you'd like to keep digging into this with the new validation/execution flow.
@rmosolgo this is awesome! Thanks so much for your work!
Using HackerOne's schema and a complex (Relay) query, I'm benchmarking the validation method.