
WIP - Benchmarking validation with a bigger schema and query #1172

Closed
wants to merge 11 commits

Conversation

@Willianvdv Willianvdv commented Dec 5, 2017

Using HackerOne's schema and a complex (Relay) query, I'm benchmarking the validation method.
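(The repo's real `bench:validate` task uses the benchmark-ips gem against `GraphQL::StaticValidation`; here is a minimal, stdlib-only sketch of the same idea, with `toy_validate` as a hypothetical stand-in for the real validator.)

```ruby
require "benchmark"

# Stand-in for schema.validate(query_string): just checks balanced braces.
def toy_validate(query_string)
  query_string.count("{") == query_string.count("}")
end

QUERY = "query { iface { int } }"
n = 1_000
elapsed = Benchmark.realtime { n.times { toy_validate(QUERY) } }
puts format("toy validate: %.1f i/s", n / elapsed)
```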

@Willianvdv

Now:

validate - introspection
                        25.000  i/100ms
validate - abstract fragments
                        44.000  i/100ms
validate - abstract fragments 2
                        29.000  i/100ms
validate - hackerone query
                         2.000  i/100ms
Calculating -------------------------------------
validate - introspection
                        238.815  (± 5.0%) i/s -      1.200k in   5.036233s
validate - abstract fragments
                        498.230  (± 7.4%) i/s -      2.508k in   5.064500s
validate - abstract fragments 2
                        295.405  (± 5.4%) i/s -      1.479k in   5.021705s
validate - hackerone query
                         25.413  (±11.8%) i/s -    126.000  in   5.072018s


rmosolgo commented Dec 6, 2017

🍻 10x slower than the slowest previous example 😅

Let's see what we can find.


rmosolgo commented Dec 6, 2017

Sorry, I should have said first:

THANK YOU for such a detailed reproduction of this issue! I know it's a pain to invest time in replicating the bug, but for me it's huge. Now we have a case that we can run some profiling on!


rmosolgo commented Dec 6, 2017

I added some more profiling and started diving in! The benchmarks are down a bit for me:

Before
~/code/graphql-ruby $ be rake bench:validate
Warming up --------------------------------------
validate - introspection
                        24.000  i/100ms
validate - abstract fragments
                        53.000  i/100ms
validate - abstract fragments 2
                        31.000  i/100ms
validate - hackerone query
                         2.000  i/100ms
Calculating -------------------------------------
validate - introspection
                        251.626  (± 4.0%) i/s -      1.272k in   5.062746s
validate - abstract fragments
                        526.332  (± 3.0%) i/s -      2.650k in   5.039200s
validate - abstract fragments 2
                        310.111  (± 4.5%) i/s -      1.550k in   5.008831s
validate - hackerone query
                         29.260  (± 3.4%) i/s -    148.000  in   5.067372s

After
~/code/graphql-ruby $ be rake bench:validate
Warming up --------------------------------------
validate - introspection
                        26.000  i/100ms
validate - abstract fragments
                        57.000  i/100ms
validate - abstract fragments 2
                        33.000  i/100ms
validate - hackerone query
                         3.000  i/100ms
Calculating -------------------------------------
validate - introspection
                        268.893  (± 4.1%) i/s -      1.352k in   5.037168s
validate - abstract fragments
                        568.874  (± 3.7%) i/s -      2.850k in   5.016582s
validate - abstract fragments 2
                        334.139  (± 3.3%) i/s -      1.683k in   5.042201s
validate - hackerone query
                         32.419  (± 3.1%) i/s -    162.000  in   5.004348s

But I think the big wins will depend on revisiting how fragment nodes are merged into the contexts where they're spread. Looking at object allocations, that's the big one, also Array#| and a few others.

@Willianvdv Willianvdv changed the title WIP - Benchmarking WIP - Benchmarking validation with a big schema and query Dec 6, 2017
@Willianvdv Willianvdv changed the title WIP - Benchmarking validation with a big schema and query WIP - Benchmarking validation with a bigger schema and query Dec 6, 2017

rmosolgo commented Dec 6, 2017

This is what stands out to me:

  • Lots of objects allocated by .duping irep_nodes:

    allocated objects by location
    -----------------------------------
          1620  /Users/rmosolgo/code/graphql-ruby/lib/graphql/internal_representation/node.rb:192
          1620  /Users/rmosolgo/code/graphql-ruby/lib/graphql/internal_representation/node.rb:89
          1620  /Users/rmosolgo/code/graphql-ruby/lib/graphql/internal_representation/node.rb:90
           858  /Users/rmosolgo/code/graphql-ruby/lib/graphql/internal_representation/rewrite.rb:99
    

    (Probably not the problem itself, but indicative of where work is being done)

  • Lots of time spent merging nodes:

     %self      total      self      wait     child     calls  name
      3.84      0.008     0.004     0.000     0.004     4601   GraphQL::BaseType#==
      3.77      0.009     0.004     0.000     0.005     1620   GraphQL::InternalRepresentation::Node#initialize_copy
      3.56      0.005     0.004     0.000     0.001      944   Array#|
      2.86      0.007     0.003     0.000     0.004     2234   <Module::GraphQL::Execution::Typecast>#subtype?
      2.85      0.003     0.003     0.000     0.000     9150   GraphQL::BaseType#to_s
    

    (Array#| and #subtype? are indicators for me)

So, I think the next step, looking for a bigger win, is to rethink how fragments are merged into the main operation. For example:

  • Can we avoid the .dup calls? We keep arrays of ast_node and definition, but could we keep only one of those?
  • Can we be more efficient in merging nodes?
  • Take a look at which nodes are merged the most. I bet there's duplication (eg id and __typename), and I wonder if we can somehow be smarter with that.
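(For the last point, a quick way to see which nodes would be merged most is to tally field names in the query text; this is a hypothetical sketch, not code from the repo, and the query string is made up. In a Relay query, `id` and `__typename` are the prime suspects.)

```ruby
# Tally identifier occurrences as a rough proxy for merge frequency.
query = "{ a { id __typename } b { id __typename } c { id } }"
counts = query.scan(/[A-Za-z_]\w*/).tally
top = counts.sort_by { |_, n| -n }.first(2)
p top  # most-duplicated field names first
```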

@Willianvdv

I'm trying to wrap my head around the validation and, especially, the internal representation of nodes and fields and how they work together. I think I've come pretty far, but I do have some questions 😱

Can we avoid the .dup calls? We keep arrays of ast_node and definition, but could we keep only one of those?

Yeah, for a query like the one in this PR, copying the ast_nodes adds up. My initial thought was: why not freeze the ast_nodes so we don't have to duplicate them? But that dream was shattered once I figured out that the rewriter modifies the internal node's ast_nodes. That made me think: can't we have an intermediate object that stores pointers to the original AST nodes and modifies those pointers while rewriting? We could even create a "real" AST object once the rewriting is done, so we only copy the nodes once.
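(A minimal sketch of that idea in plain Ruby — `AstNode` and `NodeRef` are hypothetical stand-ins, not graphql-ruby classes: accumulate pointers to frozen source nodes during rewriting, and only materialize one combined node at the very end.)

```ruby
# Stand-in for GraphQL::Language::Nodes::Field.
AstNode = Struct.new(:name, :children)

class NodeRef
  attr_reader :sources

  def initialize(node)
    @sources = [node]  # pointers only -- no dup
  end

  def merge!(other)
    @sources |= other.sources  # accumulate pointers while rewriting
    self
  end

  # One copy at the very end, instead of one per merge.
  def materialize
    AstNode.new(sources.first.name, sources.flat_map(&:children).uniq)
  end
end
```

```ruby
a = AstNode.new("iface", ["int"]).freeze
b = AstNode.new("iface", ["int2"]).freeze
merged = NodeRef.new(a).merge!(NodeRef.new(b)).materialize
merged.children  # => ["int", "int2"], without ever duping a or b
```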

Take a look at which nodes are merged the most. I bet there's duplication (eg id and __typename), and I wonder if we can somehow be smarter with that.

Are you talking about fields on the same type but scattered over multiple fragments? If so, are you thinking to have a register per type, so we don't have to duplicate field nodes?

@rmosolgo

rmosolgo commented Dec 8, 2017

intermediate object that stores pointers to the original AST nodes

I think that describes the current @ast_nodes array, right? 😬

fields on the same type but scattered over multiple fragments?

yes, I'm not exactly sure how they're handled, and judging by the number of id fields in the attached Relay query, I wonder if that's part of the problem!

@rmosolgo

rmosolgo commented Dec 8, 2017

Yeah, it's hard to put my finger on it, but we need a world with less .dup ...

The problem as I see it is like this:

  • We have an API where we iterate over each field on each object type (query analyzers), which would be nice to not break
  • But that means, for abstract types, we have to "reify" their fields on their actual object types
  • That's the tricky part: when there's a selection on an abstract type, and we want to reify it on each object type, we copy it.
  • We copy it because each object type might have some customizations which shouldn't be shared with the other object types.

Here's an example of the transformation I'm talking about:

# $ irb -Ilib 
require "graphql" 
schema = GraphQL::Schema.from_definition <<-GRAPHQL 
type A implements I { 
  int: Int 
  int2: Int
}
type B implements I { 
  int: Int 
}
type C implements I { 
  int: Int 
}

interface I {
  int: Int
}

type Query {
  iface: I 
}
GRAPHQL

query = <<-GRAPHQL
query GetStuff {
  iface {
    int
  }
  iface {
    ... on A  {
      int2
    }
  }  
}
GRAPHQL

puts GraphQL::InternalRepresentation::Print.print(schema, query)

# query GetStuff {
#   ... on Query {
#     iface {
#       ... on A {
#         int
#         int2
#       }
#       ... on B {
#         int
#       }
#       ... on C {
#         int
#       }
#     }
#   }
# }

See how the fields on I were copied to each of A, B and C, but we had to maintain the distinction between them, because additional selections were made on A, which should not be applied to B or C?

That's why they're copied, so that we can merge later without worrying about accidental sharing.

But maybe that's what needs fixing, some better structure that doesn't require the copy, but ideally, we can still serve the query analyzer API.

@Willianvdv

yes, I'm not exactly sure how they're handled, and judging by the number of id fields in the attached Relay query, I wonder if that's part of the problem!

Running the full query:

validate - hackerone query
                         26.703  (±15.0%) i/s -    132.000  in   5.036051s

When removing the id, _id, and __typename:

validate - hackerone query
                         46.996  (± 6.4%) i/s -    236.000  in   5.047717s

It results in roughly 50% more iterations per second. With 102 deletions, these fields were a significant part of the original 475-line query, which makes me believe that unnecessary duplication of fields (unfortunately) isn't the performance bottleneck.

@rmosolgo

rmosolgo commented Dec 8, 2017

Isn't that 79% faster? Seems significant to me 😬

(236.000 - 132.000) / (132)
# => 0.7878787878787878

Anyhow, it makes me wonder, I can't think of a good incremental solution, so what could we do if we started again from scratch? Even if we can't replace the current implementation, maybe we'd learn something to carry over.


Willianvdv commented Dec 8, 2017

We had an offline discussion about this task; let me summarize it for those following this thread. @mvgijssel raised an interesting question: most queries are not unique, so can't we just cache the rewriter's result? Unfortunately, it's not as easy as caching the rewriter output under a query identifier: as @rmosolgo pointed out, the behavior of directives (like @skip and @include) influences the rewriter's outcome.

But caching is a bit of a hack/band-aid. The real problem, still, is the number of .dups we're calling during rewriting. That is the interesting, but much harder, problem we have to address.

Still, not all hope is lost for the caching solution. We could do a two-phase rewrite: in the first pass, the rewriter outputs a cacheable result; in the second pass, we rewrite the nodes that depend on directives. Depending on whether we can split the rewriter, this could be a quick win and significantly improve performance for clients that use Relay. I'd like to take a stab at this in the coming weeks!
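(A toy sketch of the two-phase split, with made-up `Plan`/`Selection` structures rather than anything from graphql-ruby: phase 1 builds a directive-independent plan cached per query, and phase 2 resolves @skip against each run's variables.)

```ruby
# Phase 1 (cacheable): record directives in the plan, but don't evaluate them.
Plan = Struct.new(:selections)
Selection = Struct.new(:field, :skip_if)  # skip_if: variable name or nil

CACHE = {}

def prepare(query_id, selections)
  CACHE[query_id] ||= Plan.new(selections)  # runs once per unique query
end

# Phase 2 (per run): resolve @skip(if: $var) against this run's variables.
def runtime_selections(plan, variables)
  plan.selections.reject { |s| s.skip_if && variables[s.skip_if] }
end
```

```ruby
plan = prepare("GetStuff", [Selection.new("f1", "v1"), Selection.new("f2", "v2")])
runtime_selections(plan, { "v1" => true, "v2" => false }).map(&:field)
# only f2 survives this run; the plan itself is reused across runs
```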

@rmosolgo

I spent some time refreshing myself on this over the weekend, and I found a few more small tweaks. I was thinking about a real fix, some way to make fragment merging more efficient, and I haven't thought of anything yet!

~/code/graphql-ruby $ be rake bench:validate
Warming up --------------------------------------
validate - introspection
                        28.000  i/100ms
validate - abstract fragments
                        59.000  i/100ms
validate - abstract fragments 2
                        36.000  i/100ms
validate - hackerone query
                         3.000  i/100ms
Calculating -------------------------------------
validate - introspection
                        282.117  (± 3.5%) i/s -      1.428k in   5.067636s
validate - abstract fragments
                        598.867  (± 2.8%) i/s -      3.009k in   5.028500s
validate - abstract fragments 2
                        361.353  (± 5.3%) i/s -      1.836k in   5.097977s
validate - hackerone query
                         36.498  (± 2.7%) i/s -    183.000  in   5.021557s


rmosolgo commented Feb 1, 2018

Ok, here are some things I've thought about but given up on:

  • Go back to executing the AST. This won't work because some important features of GraphQL-Ruby are built on query analyzers, which require some ahead-of-time knowledge about what we're going to execute. Maybe we could refine the ahead-of-time structure, but I think we need something. Besides, my suspicion is that if we went back to the AST, we would still have to do complex fragment merging. Also, FWIW, graphql-js switched from AST to an intermediary structure, too: Introduce planning phase to query execution graphql/graphql-js#304
  • Identify some selections which can be merged without .dup. The problem here is that selections have a reference to .parent node. If you don't dup, then you can't be sure that .parent won't be overridden later.
  • Identify some nodes which can be skipped in merging. This actually does work, but it doesn't have a perf benefit. Before merging one field into another, you can check if the two AST nodes are identical (using #==), and if they are, return without actually merging. The problem is that this is almost never true. The only times when it was true in the attached example were id and __typename, which are very simple merges, so although it occurred about 300 times, it made no difference in the speed.
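(The third experiment boils down to an early-return equality check before merging; here is a hypothetical sketch with a stand-in `AstField` struct — the real nodes define their own #==.)

```ruby
# Stand-in for an AST field node; Struct#== compares by value.
AstField = Struct.new(:name, :arguments)

def merge_fields(target, other)
  return target if target == other  # identical nodes (e.g. bare `id`) need no work
  AstField.new(target.name, target.arguments | other.arguments)
end
```

In the attached query this short-circuit only fired for `id` and `__typename`, which are trivial merges anyway, so it saved nothing measurable.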

I share this to say that, although I don't have anything to show for it, I am trying some things 😬 And personally, I really want to solve this issue. If graphql-ruby isn't fast enough to handle Relay without a problem, then that's no good 😖

@rmosolgo

rmosolgo commented Feb 5, 2018

One more thing I explored: adding a prepare-then-execute workflow, where the work of parsing, validating, and preparing a query could be done once, then the same template could be reused for re-running that query each time. This is a great feature and definitely one I want to have soon; however, it was harder than I thought. Here's why:

The current implementation of @skip works like this:

If you find a @skip in the query string, then evaluate its if argument, and if the node should be skipped, then don't add it to the query plan. (Similar for @include(if: false).)

So, that means if we have a selection like this:

{
  f1 @skip(if: $v1) { a }
  f2 @skip(if: $v2) { b }
}

And variables like this: {v1: true, v2: false}, then we simplify the query to:

{ 
  f2 { b } 
}

So, the variables (which change from query to query) are mixed with the query string (which is static). What if you want to separate those two steps? Then you need to maintain the two "branches" of code (@skip(if: $v1), @skip(if: $v2)) and handle the various values of v1/v2 during each run.

The current query plan data structure doesn't allow for this. It's just a tree of parent types, and selections that apply to those types. So, to support runtime evaluation of @skip/@include, we have to alter the query plan structure in some way. This is possible, but it's a big change with lots of consequences (does it change ctx[:irep_node]? query analyzers?), so I want to revisit after releasing 1.8.0.

@rmosolgo rmosolgo mentioned this pull request Apr 5, 2018
@rmosolgo

The 1.9-dev branch supports a more dynamic approach than last time this benchmark was run, so I gave it a try:

$ git diff
diff --git a/benchmark/run.rb b/benchmark/run.rb
index 3afc758d0..3d4adfb24 100644
--- a/benchmark/run.rb
+++ b/benchmark/run.rb
@@ -12,11 +12,18 @@ module GraphQLBenchmark
   SCHEMA = Jazz::Schema

   BENCHMARK_PATH = File.expand_path("../", __FILE__)
-  CARD_SCHEMA = GraphQL::Schema.from_definition(File.read(File.join(BENCHMARK_PATH, "schema.graphql")))
+  CARD_SCHEMA = GraphQL::Schema.from_definition(File.read(File.join(BENCHMARK_PATH, "schema.graphql"))).redefine do
+    use GraphQL::Execution::Interpreter
+    use GraphQL::Analysis::AST
+  end
   ABSTRACT_FRAGMENTS = GraphQL.parse(File.read(File.join(BENCHMARK_PATH, "abstract_fragments.graphql")))
   ABSTRACT_FRAGMENTS_2 = GraphQL.parse(File.read(File.join(BENCHMARK_PATH, "abstract_fragments_2.graphql")))

-  BIG_SCHEMA = GraphQL::Schema.from_definition(File.join(BENCHMARK_PATH, "big_schema.graphql"))
+  BIG_SCHEMA = GraphQL::Schema.from_definition(File.join(BENCHMARK_PATH, "big_schema.graphql")).redefine do
+    use GraphQL::Execution::Interpreter
+    use GraphQL::Analysis::AST
+  end
+
   BIG_QUERY = GraphQL.parse(File.read(File.join(BENCHMARK_PATH, "big_query.graphql")))

   module_function

It looks like the benchmarks above run about twice as fast as they used to:

~/code/graphql-ruby $ be rake bench:validate
Warming up --------------------------------------
validate - introspection
                        65.000  i/100ms
validate - abstract fragments
                       117.000  i/100ms
validate - abstract fragments 2
                        62.000  i/100ms
validate - big query     6.000  i/100ms
Calculating -------------------------------------
validate - introspection
                        653.886  (± 7.8%) i/s -      3.250k in   5.015013s
validate - abstract fragments
                          1.127k (± 6.6%) i/s -      5.616k in   5.007786s
validate - abstract fragments 2
                        753.421  (± 7.3%) i/s -      3.782k in   5.049324s
validate - big query     70.381  (± 5.7%) i/s -    354.000  in   5.048852s

Some of that work is pushed off until runtime, but it really depends on the data now. (Previously, you paid a high price for abstract types regardless of whether the query data exercised the different possibilities.)

Besides that, if you have static queries and use validate: false, then it does essentially nothing ahead of time; it just runs the parsed AST.

It's a huge migration to get to both Interpreter and Analysis::AST, but I think it's worth it!

@rmosolgo

I'm going to close this because I think I've really done what I can on it. I wish everything were superfast, but it's almost 3x faster now than it was previously! Feel free to open a new issue if you'd like to keep digging into this with the new validation/execution flow.

@rmosolgo rmosolgo closed this Dec 17, 2018
@Willianvdv

@rmosolgo this is awesome! Thanks so much for your work!
