
WIP - Benchmarking validation with a bigger schema and query #1172

Closed
wants to merge 11 commits

Conversation

@Willianvdv Willianvdv commented Dec 5, 2017

Using HackerOne's schema and a complex (Relay) query, I'm benchmarking the validation method.
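(The repo's real `bench:validate` task uses the benchmark-ips gem against `GraphQL::StaticValidation`; here is a minimal, stdlib-only sketch of the same idea, with `toy_validate` as a hypothetical stand-in for the real validator.)

```ruby
require "benchmark"

# Stand-in for schema.validate(query_string): just checks balanced braces.
def toy_validate(query_string)
  query_string.count("{") == query_string.count("}")
end

QUERY = "query { iface { int } }"
n = 1_000
elapsed = Benchmark.realtime { n.times { toy_validate(QUERY) } }
puts format("toy validate: %.1f i/s", n / elapsed)
```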

@Willianvdv

Now:

validate - introspection
                        25.000  i/100ms
validate - abstract fragments
                        44.000  i/100ms
validate - abstract fragments 2
                        29.000  i/100ms
validate - hackerone query
                         2.000  i/100ms
Calculating -------------------------------------
validate - introspection
                        238.815  (± 5.0%) i/s -      1.200k in   5.036233s
validate - abstract fragments
                        498.230  (± 7.4%) i/s -      2.508k in   5.064500s
validate - abstract fragments 2
                        295.405  (± 5.4%) i/s -      1.479k in   5.021705s
validate - hackerone query
                         25.413  (±11.8%) i/s -    126.000  in   5.072018s


rmosolgo commented Dec 6, 2017

🍻 10x slower than the slowest previous example 😅

Let's see what we can find.


rmosolgo commented Dec 6, 2017

Sorry, I should have said first:

THANK YOU for such a detailed reproduction of this issue! I know it's a pain to invest time in replicating the bug, but for me it's huge. Now we have a case that we can run some profiling on!


rmosolgo commented Dec 6, 2017

I added some more profiling and started diving in! The benchmarks are down a bit for me:

Before
~/code/graphql-ruby $ be rake bench:validate
Warming up --------------------------------------
validate - introspection
                        24.000  i/100ms
validate - abstract fragments
                        53.000  i/100ms
validate - abstract fragments 2
                        31.000  i/100ms
validate - hackerone query
                         2.000  i/100ms
Calculating -------------------------------------
validate - introspection
                        251.626  (± 4.0%) i/s -      1.272k in   5.062746s
validate - abstract fragments
                        526.332  (± 3.0%) i/s -      2.650k in   5.039200s
validate - abstract fragments 2
                        310.111  (± 4.5%) i/s -      1.550k in   5.008831s
validate - hackerone query
                         29.260  (± 3.4%) i/s -    148.000  in   5.067372s

After
~/code/graphql-ruby $ be rake bench:validate
Warming up --------------------------------------
validate - introspection
                        26.000  i/100ms
validate - abstract fragments
                        57.000  i/100ms
validate - abstract fragments 2
                        33.000  i/100ms
validate - hackerone query
                         3.000  i/100ms
Calculating -------------------------------------
validate - introspection
                        268.893  (± 4.1%) i/s -      1.352k in   5.037168s
validate - abstract fragments
                        568.874  (± 3.7%) i/s -      2.850k in   5.016582s
validate - abstract fragments 2
                        334.139  (± 3.3%) i/s -      1.683k in   5.042201s
validate - hackerone query
                         32.419  (± 3.1%) i/s -    162.000  in   5.004348s

But I think the big wins will depend on revisiting how fragment nodes are merged into the contexts where they're spread. Looking at object allocations, that's the big one, also Array#| and a few others.

@Willianvdv Willianvdv changed the title WIP - Benchmarking WIP - Benchmarking validation with a big schema and query Dec 6, 2017
@Willianvdv Willianvdv changed the title WIP - Benchmarking validation with a big schema and query WIP - Benchmarking validation with a bigger schema and query Dec 6, 2017

rmosolgo commented Dec 6, 2017

This is what stands out to me:

  • Lots of objects allocated by .duping irep_nodes:

    allocated objects by location
    -----------------------------------
          1620  /Users/rmosolgo/code/graphql-ruby/lib/graphql/internal_representation/node.rb:192
          1620  /Users/rmosolgo/code/graphql-ruby/lib/graphql/internal_representation/node.rb:89
          1620  /Users/rmosolgo/code/graphql-ruby/lib/graphql/internal_representation/node.rb:90
           858  /Users/rmosolgo/code/graphql-ruby/lib/graphql/internal_representation/rewrite.rb:99
    

    (Probably not the problem itself, but indicative of where work is being done)

  • Lots of time spent merging nodes:

     %self      total      self      wait     child     calls  name
      3.84      0.008     0.004     0.000     0.004     4601   GraphQL::BaseType#==
      3.77      0.009     0.004     0.000     0.005     1620   GraphQL::InternalRepresentation::Node#initialize_copy
      3.56      0.005     0.004     0.000     0.001      944   Array#|
      2.86      0.007     0.003     0.000     0.004     2234   <Module::GraphQL::Execution::Typecast>#subtype?
      2.85      0.003     0.003     0.000     0.000     9150   GraphQL::BaseType#to_s
    

    (Array#| and #subtype? are indicators for me)

So, I think the next step, looking for a bigger win, is to rethink how fragments are merged into the main operation. For example:

  • Can we avoid the .dup calls? We keep arrays of ast_node and definition, but could we keep only one of those?
  • Can we be more efficient in merging nodes?
  • Take a look at which nodes are merged the most. I bet there's duplication (eg id and __typename), and I wonder if we can somehow be smarter with that.
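(For the last point, a quick way to see which nodes would be merged most is to tally field names in the query text; this is a hypothetical sketch, not code from the repo, and the query string is made up. In a Relay query, `id` and `__typename` are the prime suspects.)

```ruby
# Tally identifier occurrences as a rough proxy for merge frequency.
query = "{ a { id __typename } b { id __typename } c { id } }"
counts = query.scan(/[A-Za-z_]\w*/).tally
top = counts.sort_by { |_, n| -n }.first(2)
p top  # most-duplicated field names first
```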

@Willianvdv

I'm trying to wrap my head around the validation and, especially, the internal representation of nodes and fields and how they work together. I think I've come pretty far, but I do have some questions 😱

Can we avoid the .dup calls? We keep arrays of ast_node and definition, but could we keep only one of those?

Yeah, for a query like the one in this PR, copying the ast_nodes adds up. My initial thought was: why not freeze the ast_nodes so we don't have to duplicate them? But that dream was shattered once I figured out that the rewriter modifies the internal node's ast_nodes. That made me think: can't we have an intermediate object that stores pointers to the original AST nodes and modifies those pointers while rewriting? We could even create a "real" AST object once the rewriting is done, so we only copy the nodes once.
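(A minimal sketch of that idea in plain Ruby — `AstNode` and `NodeRef` are hypothetical stand-ins, not graphql-ruby classes: accumulate pointers to frozen source nodes during rewriting, and only materialize one combined node at the very end.)

```ruby
# Stand-in for GraphQL::Language::Nodes::Field.
AstNode = Struct.new(:name, :children)

class NodeRef
  attr_reader :sources

  def initialize(node)
    @sources = [node]  # pointers only -- no dup
  end

  def merge!(other)
    @sources |= other.sources  # accumulate pointers while rewriting
    self
  end

  # One copy at the very end, instead of one per merge.
  def materialize
    AstNode.new(sources.first.name, sources.flat_map(&:children).uniq)
  end
end
```

```ruby
a = AstNode.new("iface", ["int"]).freeze
b = AstNode.new("iface", ["int2"]).freeze
merged = NodeRef.new(a).merge!(NodeRef.new(b)).materialize
merged.children  # => ["int", "int2"], without ever duping a or b
```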

Take a look at which nodes are merged the most. I bet there's duplication (eg id and __typename), and I wonder if we can somehow be smarter with that.

Are you talking about fields on the same type but scattered over multiple fragments? If so, are you thinking to have a register per type, so we don't have to duplicate field nodes?

@rmosolgo

rmosolgo commented Dec 8, 2017

intermediate object that stores pointers to the original AST nodes

I think that describes the current @ast_nodes array, right? 😬

fields on the same type but scattered over multiple fragments?

yes, I'm not exactly sure how they're handled, and judging by the number of id fields in the attached Relay query, I wonder if that's part of the problem!

@rmosolgo

rmosolgo commented Dec 8, 2017

Yeah, it's hard to put my finger on it, but we need a world with less .dup ...

The problem as I see it is like this:

  • We have an API where we iterate over each field on each object type (query analyzers), which would be nice to not break
  • But that means, for abstract types, we have to "reify" their fields on their actual object types
  • That's the tricky part: when there's a selection on an abstract type, and we want to reify it on each object type, we copy it.
  • We copy it because each object type might have some customizations which shouldn't be shared with the other object types.

Here's an example of the transformation I'm talking about:

# $ irb -Ilib 
require "graphql" 
schema = GraphQL::Schema.from_definition <<-GRAPHQL 
type A implements I { 
  int: Int 
  int2: Int
}
type B implements I { 
  int: Int 
}
type C implements I { 
  int: Int 
}

interface I {
  int: Int
}

type Query {
  iface: I 
}
GRAPHQL

query = <<-GRAPHQL
query GetStuff {
  iface {
    int
  }
  iface {
    ... on A  {
      int2
    }
  }  
}
GRAPHQL

puts GraphQL::InternalRepresentation::Print.print(schema, query)

# query GetStuff {
#   ... on Query {
#     iface {
#       ... on A {
#         int
#         int2
#       }
#       ... on B {
#         int
#       }
#       ... on C {
#         int
#       }
#     }
#   }
# }

See how the fields on I were copied to each of A, B and C, but we had to maintain the distinction between them, because additional selections were made on A, which should not be applied to B or C?

That's why they're copied, so that we can merge later without worrying about accidental sharing.

But maybe that's what needs fixing, some better structure that doesn't require the copy, but ideally, we can still serve the query analyzer API.

@Willianvdv

yes, I'm not exactly sure how they're handled, and judging by the number of id fields in the attached Relay query, I wonder if that's part of the problem!

Running the full query:

validate - hackerone query
                         26.703  (±15.0%) i/s -    132.000  in   5.036051s

When removing the id, _id, and __typename:

validate - hackerone query
                         46.996  (± 6.4%) i/s -    236.000  in   5.047717s

It results in roughly 50% more iterations per second. With 102 deletions, these fields were a significant part of the original 475-line query, which makes me believe that unnecessary duplication of fields (unfortunately) isn't the performance bottleneck.

@rmosolgo

rmosolgo commented Dec 8, 2017

Isn't that 79% faster? Seems significant to me 😬

(236.000 - 132.000) / (132)
# => 0.7878787878787878

Anyhow, it makes me wonder, I can't think of a good incremental solution, so what could we do if we started again from scratch? Even if we can't replace the current implementation, maybe we'd learn something to carry over.


Willianvdv commented Dec 8, 2017

We had an offline discussion about this task; let me summarize it for those following this thread. @mvgijssel raised an interesting question: most queries are not unique, so can't we just cache the rewriter's result? Unfortunately, it's not as easy as caching the rewriter output under a query identifier: as @rmosolgo pointed out, the behavior of directives (like @skip and @include) influences the rewriter's outcome.

But caching is a bit of a hack/band-aid. The real problem, still, is the number of .dups we're calling during rewriting. That is the interesting, but much harder, problem we have to address.

Still, not all hope is lost for the caching solution. We could do a two-phase rewrite: in the first pass, the rewriter outputs a cacheable result; in the second pass, we rewrite the nodes that depend on directives. Depending on whether we can split the rewriter, this could be a quick win and significantly improve performance for clients that use Relay. I'd like to take a stab at this in the coming weeks!
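(A toy sketch of the two-phase split, with made-up `Plan`/`Selection` structures rather than anything from graphql-ruby: phase 1 builds a directive-independent plan cached per query, and phase 2 resolves @skip against each run's variables.)

```ruby
# Phase 1 (cacheable): record directives in the plan, but don't evaluate them.
Plan = Struct.new(:selections)
Selection = Struct.new(:field, :skip_if)  # skip_if: variable name or nil

CACHE = {}

def prepare(query_id, selections)
  CACHE[query_id] ||= Plan.new(selections)  # runs once per unique query
end

# Phase 2 (per run): resolve @skip(if: $var) against this run's variables.
def runtime_selections(plan, variables)
  plan.selections.reject { |s| s.skip_if && variables[s.skip_if] }
end
```

```ruby
plan = prepare("GetStuff", [Selection.new("f1", "v1"), Selection.new("f2", "v2")])
runtime_selections(plan, { "v1" => true, "v2" => false }).map(&:field)
# only f2 survives this run; the plan itself is reused across runs
```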

@rmosolgo

I spent some time refreshing myself on this over the weekend, and I found a few more small tweaks. I was thinking about a real fix, some way to make fragment merging more efficient, and I haven't thought of anything yet!

~/code/graphql-ruby $ be rake bench:validate
Warming up --------------------------------------
validate - introspection
                        28.000  i/100ms
validate - abstract fragments
                        59.000  i/100ms
validate - abstract fragments 2
                        36.000  i/100ms
validate - hackerone query
                         3.000  i/100ms
Calculating -------------------------------------
validate - introspection
                        282.117  (± 3.5%) i/s -      1.428k in   5.067636s
validate - abstract fragments
                        598.867  (± 2.8%) i/s -      3.009k in   5.028500s
validate - abstract fragments 2
                        361.353  (± 5.3%) i/s -      1.836k in   5.097977s
validate - hackerone query
                         36.498  (± 2.7%) i/s -    183.000  in   5.021557s


rmosolgo commented Feb 1, 2018

Ok, here are some things I've thought about but given up on:

  • Go back to executing the AST. This won't work because some important features of GraphQL-Ruby are built on query analyzers, which require some ahead-of-time knowledge about what we're going to execute. Maybe we could refine the ahead-of-time structure, but I think we need something. Besides, my suspicion is that if we went back to the AST, we would still have to do complex fragment merging. Also, FWIW, graphql-js switched from AST to an intermediary structure, too: Introduce planning phase to query execution graphql/graphql-js#304
  • Identify some selections which can be merged without .dup. The problem here is that selections have a reference to .parent node. If you don't dup, then you can't be sure that .parent won't be overridden later.
  • Identify some nodes which can be skipped in merging. This actually does work, but it doesn't have a perf benefit. Before merging one field into another, you can check if the two AST nodes are identical (using #==), and if they are, return without actually merging. The problem is that this is almost never true. The only times when it was true in the attached example were id and __typename, which are very simple merges, so although it occurred about 300 times, it made no difference in the speed.
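(The third experiment boils down to an early-return equality check before merging; here is a hypothetical sketch with a stand-in `AstField` struct — the real nodes define their own #==.)

```ruby
# Stand-in for an AST field node; Struct#== compares by value.
AstField = Struct.new(:name, :arguments)

def merge_fields(target, other)
  return target if target == other  # identical nodes (e.g. bare `id`) need no work
  AstField.new(target.name, target.arguments | other.arguments)
end
```

In the attached query this short-circuit only fired for `id` and `__typename`, which are trivial merges anyway, so it saved nothing measurable.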

I share this to say that, although I don't have anything to show for it, I am trying some things 😬 And personally, I really want to solve this issue. If graphql-ruby isn't fast enough to handle Relay without a problem, then that's no good 😖

@rmosolgo

rmosolgo commented Feb 5, 2018

One more thing I explored: adding a prepare-then-execute workflow, where the work of parsing, validating, and preparing a query could be done once, then the same template could be reused for re-running that query each time. This is a great feature and definitely one I want to have soon; however, it was harder than I thought. Here's why:

The current implementation of @skip works like this:

If you find a @skip in the query string, then evaluate its if argument, and if the node should be skipped, then don't add it to the query plan. (Similar for @include(if: false).)

So, that means if we have a selection like this:

{
  f1 @skip(if: $v1) { a }
  f2 @skip(if: $v2) { b }
}

And variables like this: {v1: true, v2: false}, then we simplify the query to:

{ 
  f2 { b } 
}

So, the variables (which change from query to query) are mixed with the query string (which is static). What if you want to separate those two steps? Then you need to maintain the two "branches" of code (@skip(if: $v1), @skip(if: $v2)) and handle the various values of v1/v2 during each run.

The current query plan data structure doesn't allow for this. It's just a tree of parent types, and selections that apply to those types. So, to support runtime evaluation of @skip/@include, we have to alter the query plan structure in some way. This is possible, but it's a big change with lots of consequences (does it change ctx[:irep_node]? query analyzers?), so I want to revisit after releasing 1.8.0.

@rmosolgo rmosolgo mentioned this pull request Apr 5, 2018
@rmosolgo

The 1.9-dev branch supports a more dynamic approach than last time this benchmark was run, so I gave it a try:

$ git diff
diff --git a/benchmark/run.rb b/benchmark/run.rb
index 3afc758d0..3d4adfb24 100644
--- a/benchmark/run.rb
+++ b/benchmark/run.rb
@@ -12,11 +12,18 @@ module GraphQLBenchmark
   SCHEMA = Jazz::Schema

   BENCHMARK_PATH = File.expand_path("../", __FILE__)
-  CARD_SCHEMA = GraphQL::Schema.from_definition(File.read(File.join(BENCHMARK_PATH, "schema.graphql")))
+  CARD_SCHEMA = GraphQL::Schema.from_definition(File.read(File.join(BENCHMARK_PATH, "schema.graphql"))).redefine do
+    use GraphQL::Execution::Interpreter
+    use GraphQL::Analysis::AST
+  end
   ABSTRACT_FRAGMENTS = GraphQL.parse(File.read(File.join(BENCHMARK_PATH, "abstract_fragments.graphql")))
   ABSTRACT_FRAGMENTS_2 = GraphQL.parse(File.read(File.join(BENCHMARK_PATH, "abstract_fragments_2.graphql")))

-  BIG_SCHEMA = GraphQL::Schema.from_definition(File.join(BENCHMARK_PATH, "big_schema.graphql"))
+  BIG_SCHEMA = GraphQL::Schema.from_definition(File.join(BENCHMARK_PATH, "big_schema.graphql")).redefine do
+    use GraphQL::Execution::Interpreter
+    use GraphQL::Analysis::AST
+  end
+
   BIG_QUERY = GraphQL.parse(File.read(File.join(BENCHMARK_PATH, "big_query.graphql")))

   module_function

It looks like the benchmarks above run about twice as fast as they used to:

~/code/graphql-ruby $ be rake bench:validate
Warming up --------------------------------------
validate - introspection
                        65.000  i/100ms
validate - abstract fragments
                       117.000  i/100ms
validate - abstract fragments 2
                        62.000  i/100ms
validate - big query     6.000  i/100ms
Calculating -------------------------------------
validate - introspection
                        653.886  (± 7.8%) i/s -      3.250k in   5.015013s
validate - abstract fragments
                          1.127k (± 6.6%) i/s -      5.616k in   5.007786s
validate - abstract fragments 2
                        753.421  (± 7.3%) i/s -      3.782k in   5.049324s
validate - big query     70.381  (± 5.7%) i/s -    354.000  in   5.048852s

Some of that work is pushed off until runtime, but it really depends on the data now. (Previously, you paid a high price for abstract types regardless of whether the query data exercised the different possibilities.)

Besides that, if you have static queries and use validate: false, then it does essentially nothing ahead of time; it just runs the parsed AST.

It's a huge migration to get to both Interpreter and Analysis::AST, but I think it's worth it!

@rmosolgo

I'm going to close this because I think I've really done what I can on it. I wish everything were superfast, but it's almost 3x faster now than it was previously! Feel free to open a new issue if you'd like to keep digging into this with the new validation/execution flow.

@rmosolgo rmosolgo closed this Dec 17, 2018
@Willianvdv

@rmosolgo this is awesome! Thanks so much for your work!
