
Performance issues with large data sets #268

Open
mwilliamson-healx opened this issue Sep 2, 2016 · 85 comments

@mwilliamson-healx

For our use case, we send a few thousand objects to the client. We're currently using a normal JSON API, but are considering using GraphQL instead. However, when returning a few thousand objects, the overhead of resolving values makes it impractical to use. For instance, the example below returns 10000 objects with an ID field, and that takes around ten seconds to run.

Is there a recommended way to improve the performance? The approach I've used successfully so far is to use the existing parser to parse the query, and then generate the response by creating dictionaries directly, which avoids the overhead of resolving/completing on every single value.

import graphene

class UserQuery(graphene.ObjectType):
    id = graphene.Int()

class Query(graphene.ObjectType):
    users = graphene.List(UserQuery)

    def resolve_users(self, args, info):
        return users

class User(object):
    def __init__(self, id):
        self.id = id

users = [User(index) for index in range(0, 10000)]

schema = graphene.Schema(query=Query)

print(schema.execute('{ users { id } }').data)
@ekampf
Contributor

ekampf commented Sep 2, 2016

@mwilliamson-healx I've had the same problem.
Fortunately, the next version of Graphene fixes the issue (and also adds performance tests to make sure it doesn't regress).
Though it's a risk to run the library's bleeding edge, I've been running the next version (pip install "graphene>=1.0.dev") for a couple of weeks now in production without problems.

So you should give it a try and see if it solves your problem (and if not, maybe there are some new performance test cases to add to Graphene's performance tests).

@syrusakbary
Member

syrusakbary commented Sep 2, 2016

@mwilliamson-healx as Eran pointed out, the next version has been rewritten with a special focus on performance.

We also added a benchmark for a case similar to the one you describe (retrieving about 100k elements instead of 10k).
https://github.com/graphql-python/graphene/blob/next/graphene/types/tests/test_query.py#L129

Retrieving 10k elements should be about 10-20 times faster in the next branch (50-100ms?).
https://travis-ci.org/graphql-python/graphene/jobs/156652274#L373

It would be great if you could test this case in the next branch and report any non-performant cases you run into; I will happily work on them :)

@mwilliamson-healx
Author

Thanks for the suggestion! I gave Graphene 1.0.dev0 a go, and while it's certainly faster, it still takes around a second to run the example above. Admittedly, I didn't try it out on the speediest of machines, but that suggests it would still be the dominant factor in response time for our real data.

@syrusakbary
Member

@mwilliamson-healx some of the performance bottleneck was also in the OrderedDict generation.
For that, graphql-core uses cyordereddict when available (an implementation of OrderedDict in Cython that runs about 2-6x faster).

Could you try installing cyordereddict with pip install cyordereddict and running the tests again? (No need to modify anything in the code.)

Thanks!
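For reference, "uses cyordereddict when available" is just a conditional import. The exact import site inside graphql-core may differ; this is a sketch of the pattern, not the library's actual code:

```python
# Prefer the Cython-backed OrderedDict when cyordereddict is installed,
# otherwise fall back to the stdlib implementation. No other code needs
# to change -- both expose the same OrderedDict interface.
try:
    from cyordereddict import OrderedDict  # C implementation, ~2-6x faster
except ImportError:
    from collections import OrderedDict  # stdlib fallback

d = OrderedDict([("id", 1), ("name", "alice")])
print(list(d.keys()))  # insertion order preserved either way
```

If cyordereddict isn't installed, everything still works, just slower.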

@syrusakbary
Member

syrusakbary commented Sep 5, 2016

PS: There are plans to port some code to Cython (while still preserving the Python implementation) to make graphene/graphql-core even more performant; however, any other suggestions are always welcome! :)

@mwilliamson-healx
Author

Thanks again for the suggestion! Using cyordereddict shaves about 200ms off the time (from 1s to 0.8s), so it's an improvement, but still not ideal. I had a look around the code, but nothing stuck out to me as an easy way of improving performance. The problem (from my extremely quick and poorly informed glance!) is that you end up resolving every single value, which includes going through any middleware and having to coordinate promises. Keeping that functionality while being competitive with just spitting out dicts directly seems rather tricky.

The proof of concept I've got sidesteps the issue somewhat by parsing the GraphQL query and then relying on the object types being able to generate the requested data directly, without having to further resolve values. It's very much a proof of concept (so it doesn't support fragments, and isn't really GraphQL compliant yet), but feel free to have a look. Assuming the approach is sane, it's hard to see how to reconcile it with the normal GraphQL resolve approach.

@syrusakbary
Member

syrusakbary commented Sep 6, 2016

Hi @mwilliamson-healx,
First of all, congrats on your great proof of concept!

I've been thinking for a while about how we can improve performance in GraphQL. This repository -graphene- uses graphql-core under the hood, which is a close port of the GraphQL-js reference implementation.

The problem we are seeing in both graphql-core and graphql-js is that each type/value is checked at runtime (that is, the resolution+serialization function is "discovered" at runtime each time a value is completed). In JS the performance difference is not as big because the JIT optimizes each of the type/value completion calls. Python, however, has no JIT by default, so this becomes a quite expensive operation.

In the current graphql-js and graphql-core implementations, executing a GraphQL query looks like:

Parse AST from string (==> validate the AST against the given schema) ==> Execute the AST given a Root type.

However, we can add a "Query Builder" as an intermediate step before execution. It knows exactly which fields are being requested, and therefore their associated types and resolvers, so we don't need to "search" for them each time a value is completed.
This way, the process becomes:

Parse AST from string (==> validate the AST against the given schema) ==> Build the Query resolver based on the AST ==> Execute the Query resolver given a Root type.

Your proof of concept does the latter, so the performance difference compared with the current graphql-core implementation is considerable.

I think it's completely reasonable to introduce this extra Query-resolver build step before execution to avoid the performance bottleneck of doing it at runtime. In fact, I would love to have it in graphql-core.

And I also think this would be super valuable in the graphql-js implementation too, as it would improve performance and push forward other language implementations ( @leebyron ).
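The query-builder idea can be sketched in a few lines (hypothetical names; the real graphql-core work is far more involved): resolve each requested field's resolver and serializer once per query, then reuse that plan for every object instead of "discovering" them per value.

```python
# Sketch of the "Query Builder" intermediate step described above.

def build_plan(field_defs, requested_fields):
    # Look up each requested field's (resolver, serializer) pair a single
    # time, up front, instead of once per completed value.
    return [(name,) + field_defs[name] for name in requested_fields]

def execute_plan(plan, objects):
    # Per-object work is now just direct calls -- no runtime "discovery".
    return [{name: serialize(resolve(obj)) for name, resolve, serialize in plan}
            for obj in objects]

# Toy schema: field name -> (resolver, serializer)
field_defs = {
    "id": (lambda obj: obj["id"], int),
    "name": (lambda obj: obj["name"], str),
}

plan = build_plan(field_defs, ["id"])  # built once per query
result = execute_plan(plan, [{"id": i, "name": "u%d" % i} for i in range(3)])
print(result)  # [{'id': 0}, {'id': 1}, {'id': 2}]
```

The saving comes from moving the field lookup out of the per-value hot loop, which is exactly what a JIT would otherwise do for you in JS.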

@mwilliamson
Contributor

Thanks for the kind words. One question I had is how much you'd imagine trusting the query builder. For my implementation, I was planning on putting the responsibility for correctness onto the queries (rather than having the GraphQL implementation check). The result is that, unlike normal GraphQL implementations, it's possible to implement something that doesn't conform to the GraphQL spec.

@syrusakbary
Member

I'm working on the query builder concept. As of right now, the benchmarks show about a 4x improvement when returning large datasets.

Related PR in graphql-core: graphql-python/graphql-core#74

@syrusakbary
Member

syrusakbary commented Sep 8, 2016

Some updates!
I've been working non-stop on improving performance with the Query Builder.

Benchmarks

Retrieving 10k ObjectTypes

Doing something similar to the following query, where the allContainers type is [ObjectType] and x is an Integer:

{
  allContainers {
    x
  }
}

Retrieving a List with 10k Ints

Doing something similar to the following query, where the allInts type is [Integer]:

{
  allInts
}

NOTE: Just serializing a plain list using GraphQLInt.serialize takes about 8ms, so the gains look better once this amount is subtracted from the totals: 4ms vs 22ms

Conclusion

The work I've done so far demonstrates that the code still has room for performance improvement while preserving full compatibility with GraphQL syntax.

The proof-of-concept speedup is between 5x and 15x while maintaining the syntax and features GraphQL has. There is still a lot of work to do, but it's a first approach that will let us discover new paths for speed improvement.

Extra

I think by using Cython for some critical instructions we can gain about another 10-20x in speed.

Transport

Apart from using Cython, I'm thinking about how we can plug multiple kinds of transports into GraphQL.
So instead of creating Python objects each time we access a field and then transforming the result to JSON, another approach could be to transform the values directly into JSON or whatever transport we are using.

This way the result is created directly in the output format, and we can plug in other transports like binary (Cap'n Proto/FlatBuffers/Thrift/others), msgpack, or anything else we can think of.
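The direct-to-transport idea can be sketched like this (illustrative names, not graphql-core API): stream each resolved field straight into the output buffer instead of building an intermediate OrderedDict per object and JSON-encoding it afterwards.

```python
# Sketch: serialize resolved values directly into the transport format
# (JSON here) rather than materializing per-object dicts first.
import json
from io import StringIO

def execute_to_json(objects, fields):
    buf = StringIO()
    buf.write("[")
    for i, obj in enumerate(objects):
        if i:
            buf.write(",")
        buf.write("{")
        for j, (name, resolve) in enumerate(fields):
            if j:
                buf.write(",")
            buf.write(json.dumps(name))   # field name as JSON string
            buf.write(":")
            buf.write(json.dumps(resolve(obj)))  # resolved value
        buf.write("}")
    buf.write("]")
    return buf.getvalue()

out = execute_to_json([{"x": 1}, {"x": 2}], [("x", lambda o: o["x"])])
print(out)  # [{"x":1},{"x":2}]
```

Swapping json for a msgpack or binary encoder changes only the write calls, which is what makes the transport pluggable.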

@mwilliamson
Contributor

Thanks for working on this. I've taken a look at the proof of concept you wrote, but it's not clear to me exactly how it behaves, and how it's saving time versus the existing implementation. It seems like it's still resolving all fields of objects in the response, but I could easily have misread.

I adjusted my proof of concept to (optionally) integrate with GraphQL properly. This means that you can do things like generating the schema, introspection, and all the other stuff that GraphQL does, but it means you hit the performance penalty again. It seems to me that the easiest way of fixing this for my use case would be a way to prevent resolution from descending into the object that my proof of concept produces -- a way of returning a value from resolve functions that doesn't trigger resolution on any fields (since they're already resolved).

Perhaps something like:

def resolve_users(...):
    ...
    return FullyResolvedValue(users)

where users is already fully resolved by inspecting the AST or whatever. Alternatively, a decorator on the function itself might be clearer.

This shifts more responsibility onto the calling code to make sure that the returned value is of the correct shape in order to ensure it's still a valid GraphQL implementation, but that's definitely a good trade-off for me.
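A minimal sketch of how such an escape hatch could behave (FullyResolvedValue is the hypothetical wrapper proposed above, not an existing graphene API):

```python
# A wrapper that tells the executor "this subtree is already resolved --
# emit it as-is" instead of completing every field individually.

class FullyResolvedValue:
    def __init__(self, value):
        self.value = value

def complete_value(value, complete_fields):
    # A completion step that honours the wrapper: skip per-field
    # resolution when the resolver vouches for its return value's shape.
    if isinstance(value, FullyResolvedValue):
        return value.value
    return complete_fields(value)

def resolve_users():
    # Already shaped like the GraphQL response, e.g. built directly from
    # a database query driven by the parsed AST.
    return FullyResolvedValue([{"id": 0}, {"id": 1}])

result = complete_value(resolve_users(), complete_fields=lambda v: v)
print(result)  # [{'id': 0}, {'id': 1}]
```

As the comment notes, the trade-off is that the executor no longer validates the shape of the wrapped value against the schema.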

@qubitron

qubitron commented Mar 1, 2017

@syrusakbary any update on this thread? I am using graphene in production and unfortunately it simply doesn't scale for even the moderate data sets being returned by my API. I'm slowly rewriting my API calls as normal HTTP calls and seeing 10x RPS increases (and therefore 10x reduction in server costs), but it means I'm losing the flexibility of the graphQL approach. Seems like the solution discussed in this thread would save me from this headache!

@mwilliamson-healx
Author

In case it's useful, I've been using the project I mentioned above in production, and performance has been good enough. In particular, it avoids having to run a (potentially asynchronous) resolver for every field. I'm still tweaking the API, but it should be reasonably stable (and better documented!) soon.

https://github.com/healx/python-graphjoiner

@syrusakbary
Member

syrusakbary commented Mar 1, 2017

Hi @qubitron,

If you use the experimental branch features/next-query-builder of graphql-core, you will be able to use a new execution system that significantly improves speed: graphql-python/graphql-core#74.

It should give you a ~3-5x speed improvement for both big and small datasets.

How to use it

  1. Install it with pip install https://github.com/graphql-python/graphql-core/archive/features/next-query-builder.zip

  2. Enable the new executor (execute this code before any query):

from graphql.execution import executor

executor.use_experimental_executor = True

  3. Execute the query.

If you can try it and post your results here, that would be great!

Extra questions

To help us optimize for your use case:

  • Are you in a CPython environment? (not PyPy or Google App Engine; this tells us whether we can easily optimize with Cython)
  • How many fields are resolved? (what is the "size" of the GraphQL output)
  • Did you use any GraphQL middleware?

@qubitron
Copy link

qubitron commented Mar 5, 2017

@syrusakbary it took me a bit of time to get to a place where I had a good test for this. The package you provided makes a big improvement, cutting the total execution time for my request roughly in half, with the graphene portion reduced by a factor of 3x.

Initially it wasn't working because I already had graphql-core installed; doing "pip uninstall graphql-core" before running your command above finally yielded the performance improvements.

More about my workload... I'm using a flask web server with graphene_sqlalchemy and returning objects that inherit from SQLAlchemyObjectType (not sure if that counts as middleware but I get similar results when I return plain graphene.ObjectType).

For this particular example, I have ~300 items being returned, resolving 5 fields on each. The SQL query takes about 18ms to return results, and the full HTTP response takes 78ms.

After installing your package, the request takes about 18ms and the full HTTP response takes 37ms. This is much more reasonable, but there still might be opportunities for improvement.

I ran the CPython profiler for the duration of the request, here is the breakdown of time spent in the graphql libraries with the experimental executor:

   ncalls  cumtime    filename:lineno(function)
        1    0.165    flask/app.py:1605(dispatch_request)
        1    0.165    flask/views.py:82(view)
        1    0.165    flask_graphql/graphqlview.py:58(dispatch_request)
        1    0.162    flask_graphql/graphqlview.py:149(execute_graphql_request)
        1    0.159    flask_graphql/graphqlview.py:146(execute)
        1    0.159    graphql/execution/executor.py:32(execute)
        1    0.159    graphql/execution/experimental/executor.py:14(execute)
        3    0.159    promise/promise.py:42(__init__)
        1    0.159    promise/promise.py:73(do_resolve)
        1    0.159    graphql/execution/experimental/executor.py:42(executor)
        1    0.159    graphql/execution/experimental/executor.py:59(execute_operation)
    323/1    0.159    graphql/execution/experimental/fragment.py:98(resolve)
   2255/1    0.155    graphql/execution/experimental/resolver.py:25(on_complete_resolver)

I'm using a CPython runtime in AWS, do you think your experimental executor is complete/stable enough for me to use it in production (obviously I will test it)?

@syrusakbary
Member

Hi @qubitron, thanks for the info and the profiling data!

I've fixed a few issues in the experimental executor, and it is now as stable as the master branch.
For extra verification, I've executed all the master tests using the experimental executor and all of them pass ☺️

So yes, as stable as master! :)

@mwilliamson-healx
Author

Unfortunately, this is still probably too slow for my use-case -- GraphJoiner is around four times faster. When profiling, it seems like most of the time is spent in (potentially asynchronous) field resolution.

Having said that, I'm not sure that the approach I'm using is really compatible with the way Graphene works. I suspect my comments aren't particularly helpful, so I'll be quiet!

@qubitron

@mwilliamson-healx I agree it would be nice if this could be faster; for me these changes make it usable, but further performance improvements would be welcome. I took a cursory look at GraphJoiner and haven't had time to fully internalize how it works. Although it seems like a promising alternative, I'd prefer if the graphene approach could be made faster, or if some sort of hybrid approach could be used.

One thing that would be interesting for me is if somehow we could select only the columns from SQL that were requested by the user's query, to further improve database performance.
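A sketch of that idea (simplified: real code would walk the selections from graphene's resolve info, e.g. info.field_asts in graphene v2, and feed the columns to the ORM, e.g. via SQLAlchemy's load_only): map the requested field names to table columns and build the SELECT from only those.

```python
# Derive the SQL column list from the fields the client actually
# requested, instead of selecting every mapped column.

def requested_columns(selections, column_map):
    # Keep only selections that correspond to real table columns;
    # computed fields are resolved separately.
    return [column_map[s] for s in selections if s in column_map]

column_map = {"id": "users.id", "name": "users.name", "email": "users.email"}
selections = ["id", "name", "avatarUrl"]  # avatarUrl is computed, not a column

cols = requested_columns(selections, column_map)
query = "SELECT %s FROM users" % ", ".join(cols)
print(query)  # SELECT users.id, users.name FROM users
```

The column names and the mapping here are illustrative; the point is that the selection set is known before the database is hit.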

@syrusakbary
Member

syrusakbary commented Mar 11, 2017

I'm still working on improving performance.
The first step is quite close to ready: a new (and ultra-performant) promise implementation.

I'm going to drop some numbers here, so it's easier to see the advantages of just the faster promise implementation:

Non-optimized GraphQL resolution

Old promise

------------------------------------------------------------------------------------------ benchmark: 5 tests -----------------------------------------------------------------------------------------
Name (time in ms)                                      Min                 Max                Mean             StdDev              Median                IQR            Outliers(*)  Rounds  Iterations
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_big_list_of_ints_serialize                     2.4519 (1.0)        4.8950 (1.0)        2.8593 (1.0)       0.4961 (1.0)        2.6586 (1.0)       0.4846 (1.0)            48;21     380           1
test_big_list_of_ints                              61.0509 (24.90)     73.8399 (15.08)     66.3891 (23.22)     3.7764 (7.61)      66.2786 (24.93)     6.3930 (13.19)            6;0      16           1
test_big_list_objecttypes_with_one_int_field      231.4451 (94.39)    274.0550 (55.99)    253.6332 (88.70)    17.2165 (34.70)    257.7021 (96.93)    27.6580 (57.08)            2;0       5           1
test_big_list_objecttypes_with_two_int_fields     373.6482 (152.39)   407.3970 (83.23)    391.4426 (136.90)   14.5990 (29.43)    391.9201 (147.42)   26.1913 (54.05)            2;0       5           1
test_fragment_resolver_abstract                   233.4590 (95.22)    283.4949 (57.92)    259.2367 (90.66)    21.3765 (43.09)    263.5479 (99.13)    37.4374 (77.26)            2;0       5           1
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

New promise implementation syrusakbary/promise#23

------------------------------------------------------------------------------------------ benchmark: 5 tests -----------------------------------------------------------------------------------------
Name (time in ms)                                      Min                 Max                Mean             StdDev              Median                IQR            Outliers(*)  Rounds  Iterations
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_big_list_of_ints_serialize                     2.4672 (1.0)        7.0231 (1.0)        2.9814 (1.0)       0.5989 (1.0)        2.7701 (1.0)       0.4563 (1.0)            40;31     378           1
test_big_list_of_ints                              23.3240 (9.45)      31.2262 (4.45)      26.8308 (9.00)      1.9695 (3.29)      26.7700 (9.66)      3.2494 (7.12)            14;0      36           1
test_big_list_objecttypes_with_one_int_field      165.3101 (67.00)    201.4430 (28.68)    181.6540 (60.93)    15.7699 (26.33)    181.4460 (65.50)    29.1352 (63.85)            3;0       6           1
test_big_list_objecttypes_with_two_int_fields     248.4190 (100.69)   291.1139 (41.45)    267.6542 (89.77)    17.9228 (29.93)    259.4721 (93.67)    28.7293 (62.96)            2;0       5           1
test_fragment_resolver_abstract                   112.4361 (45.57)    160.6219 (22.87)    139.5578 (46.81)    20.4794 (34.19)    149.4532 (53.95)    35.4158 (77.61)            2;0       7           1
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Optimized GraphQL resolution graphql-python/graphql-core#74

Old Promise

------------------------------------------------------------------------------------------ benchmark: 5 tests -----------------------------------------------------------------------------------------
Name (time in ms)                                      Min                 Max                Mean             StdDev              Median                IQR            Outliers(*)  Rounds  Iterations
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_big_list_of_ints_serialize                     2.4519 (1.0)        5.0600 (1.0)        2.8100 (1.0)       0.4778 (1.0)        2.6290 (1.0)       0.3346 (1.0)            40;35     361           1
test_big_list_of_ints                              48.6422 (19.84)     61.3708 (12.13)     55.8666 (19.88)     2.9545 (6.18)      55.4373 (21.09)     2.9249 (8.74)             6;1      20           1
test_big_list_objecttypes_with_one_int_field      148.5479 (60.58)    192.1201 (37.97)    164.5386 (58.55)    18.2469 (38.19)    153.1000 (58.23)    30.8557 (92.23)            2;0       7           1
test_big_list_objecttypes_with_two_int_fields     214.3099 (87.41)    252.1060 (49.82)    237.2049 (84.41)    16.0745 (33.64)    241.0800 (91.70)    26.6772 (79.74)            1;0       5           1
test_fragment_resolver_abstract                   263.5369 (107.48)   294.0340 (58.11)    275.1848 (97.93)    13.9760 (29.25)    268.7261 (102.21)   24.3396 (72.75)            1;0       5           1
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

New Promise implementation

------------------------------------------------------------------------------------------ benchmark: 5 tests -----------------------------------------------------------------------------------------
Name (time in ms)                                      Min                 Max                Mean             StdDev              Median                IQR            Outliers(*)  Rounds  Iterations
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_big_list_of_ints_serialize                     2.4509 (1.0)        4.5359 (1.0)        2.9296 (1.0)       0.4356 (1.0)        2.7819 (1.0)       0.4752 (1.0)            54;25     351           1
test_big_list_of_ints                              14.3750 (5.87)      20.3481 (4.49)      16.1198 (5.50)      1.0453 (2.40)      15.9812 (5.74)      0.8274 (1.74)            15;6      65           1
test_big_list_objecttypes_with_one_int_field       73.8251 (30.12)    115.9289 (25.56)     92.0637 (31.43)    15.2907 (35.10)     82.6714 (29.72)    27.2505 (57.35)            4;0      12           1
test_big_list_objecttypes_with_two_int_fields      98.5930 (40.23)    149.9560 (33.06)    123.6130 (42.19)    19.3822 (44.50)    128.8331 (46.31)    35.7828 (75.31)            4;0       9           1
test_fragment_resolver_abstract                   115.6740 (47.20)    156.7039 (34.55)    138.5075 (47.28)    16.4670 (37.80)    146.8499 (52.79)    28.6682 (60.33)            3;0       7           1
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

@syrusakbary
Member

syrusakbary commented Mar 11, 2017

When used with PyPy the difference is even bigger, and this is just the beginning.
Also, when an ObjectType has multiple fields, the improvement is quite significant.

After finishing this promise implementation, I will work on separating the serializer, which I assume will give another ~2x gain by using a plain dict instead of OrderedDict for serialization, and maybe more if we serialize directly to JSON. This will also open the possibility of using other serializers like msgpack :)

And after that, optimizations with Cython will help crush all benchmarks! 😊

And all this while preserving 100% compatibility with the GraphQL spec and the current Graphene implementation, with no changes required from developers other than updating the package once the new version is published.

@syrusakbary
Member

PS: Meanwhile, I'm also working on a dataloader implementation for Python that will solve the N+1 problem in GraphQL.
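For context, the dataloader pattern batches the per-key lookups that would otherwise become N+1 queries. A minimal synchronous sketch of the pattern (illustrative names, not the promised library's API):

```python
# Collect keys during a tick, then issue one batched fetch for all of them.

class SimpleLoader:
    def __init__(self, batch_fn):
        self.batch_fn = batch_fn
        self.queue = []

    def load(self, key):
        # Defer the read: queue the key, return a thunk for the value.
        self.queue.append(key)
        return lambda: self.results[key]

    def dispatch(self):
        # One batched query for all queued keys instead of one per key.
        self.results = self.batch_fn(self.queue)
        self.queue = []

calls = []
def batch_get_users(ids):
    calls.append(list(ids))            # record how many queries ran
    return {i: {"id": i} for i in ids}

loader = SimpleLoader(batch_get_users)
pending = [loader.load(i) for i in (1, 2, 3)]
loader.dispatch()
print([p() for p in pending], calls)  # three users, one batched call
```

Real dataloaders key the batching off the promise/event loop rather than an explicit dispatch() call, but the batching idea is the same.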

@qubitron

Amazing work, @syrusakbary! Looking forward to the improvements, let me know if I can help test any changes.

@qubitron

@syrusakbary I am a bit hesitant to use PyPy, I ran into some bugs/compatibility issues with Cython libraries (unrelated to graphene) and was getting mixed performance results using sqlalchemy. That being said, if the wins are there then it's always good to have that option.

@syrusakbary
Member

syrusakbary commented Mar 13, 2017

I've been able to improve type resolution a little more, giving an extra ~35% speed gain: graphql-python/graphql-core@81bcf8c.

New benchmarks (new promise and better type resolution with experimental executor)

--------------------------------------------------------------------------------------- benchmark: 5 tests ---------------------------------------------------------------------------------------
Name (time in ms)                                     Min                 Max               Mean            StdDev             Median               IQR            Outliers(*)  Rounds  Iterations
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_big_list_of_ints_serialize                    2.6469 (1.0)        5.0581 (1.0)       2.9428 (1.0)      0.4469 (1.0)       2.7812 (1.0)      0.2511 (1.0)            47;53     401           1
test_big_list_of_ints                             13.6490 (5.16)      21.1191 (4.18)     15.1494 (5.15)     1.7030 (3.81)     14.3925 (5.18)     1.9491 (7.76)            12;2      62           1
test_big_list_objecttypes_with_one_int_field      60.2801 (22.77)     90.2431 (17.84)    67.1742 (22.83)    9.6505 (21.60)    63.0350 (22.67)    5.5089 (21.94)            2;2      15           1
test_big_list_objecttypes_with_two_int_fields     82.4349 (31.14)    110.2500 (21.80)    90.0414 (30.60)    7.7319 (17.30)    88.1380 (31.69)    9.3712 (37.32)            1;1      12           1
test_fragment_resolver_abstract                   92.1650 (34.82)    107.6009 (21.27)    98.8749 (33.60)    4.5259 (10.13)    97.8079 (35.17)    4.3540 (17.34)            2;0       8           1
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

@syrusakbary
Member

(All these benchmarks are without PyPy, just plain CPython.)

@syrusakbary
Member

syrusakbary commented Mar 13, 2017

The latest next-query-builder branch now includes the ultra-performant version of promise.

Just re-running pip install https://github.com/graphql-python/graphql-core/archive/features/next-query-builder.zip should upgrade promise to promise>=2.0.dev.

(You will also need: executor.use_experimental_executor = True)

@qubitron I'd love to hear about the extra performance improvements!

@jedie

jedie commented Dec 19, 2019

I also made some tests comparing graphene-django with django-rest-framework.

In both cases I used a view with pagination (DjangoConnectionField or PageNumberPagination).
The test requests 100 items out of 1000 existing items.

graphene:

django v1.11.26
graphene v2.1.8
graphene-django v2.7.1

timeit... use 5 * 75 loop...
max...: 10.84 ms
median: 10.66 ms
min...: 10.61 ms
cProfile stats for one request: 30148 function calls (28688 primitive calls) in 0.019 seconds

Rest-API:

django v1.11.26
Rest-Framework v3.9.4

timeit... use 5 * 213 loop...
max...: 3.81 ms
median: 3.63 ms
min...: 3.58 ms
cProfile stats for one request: 14171 function calls (13142 primitive calls) in 0.007 seconds

What immediately stands out is the much higher number of function calls.

I also ran a test with graphene v3 on Django 2.2, with the graphene-django sources from graphql-python/graphene-django#812. It's ~30% slower than graphene v2; I hope this is only because it's not the final code yet.

See also: graphql-python/graphene-django#829

@jkimbo
Member

jkimbo commented Dec 26, 2019

@jedie could you share the code you used to benchmark Graphene vs Rest Framework?

@jedie

jedie commented Feb 5, 2020

@jedie could you share the code you used to benchmark Graphene vs Rest Framework?

Sorry, I can't share the code, but it's really minimal example code.

However, I made another test benchmarking only graphql-core: https://gist.github.com/jedie/581444e02e784ff7c2b9fb1e763759fa

It fetches just a list of 1000 dummy items and takes ~20ms.

@jedie

jedie commented Feb 5, 2020

Now I have also made a similar test with tartiflette.

To my surprise, tartiflette (~57ms) is significantly slower than graphql-core (~20ms).

My benchmark code:

tartiflette: https://gist.github.com/jedie/45ddf8ee7e24704c9485eb8cbcf9ba13
graphql-core: https://gist.github.com/jedie/581444e02e784ff7c2b9fb1e763759fa

EDIT: I re-implemented a "standalone" benchmark test with Django REST Framework that does similar stuff... And yes, it's much faster: ~8ms

https://gist.github.com/jedie/1d658a184eb4435383820aa0c647d7e9

@sostholm

I was fixing a performance issue in graphene-mongo:
graphql-python/graphene-mongo#125

My pull request brought the response time down from 2s to 0.02s on a dataset of 12000 documents in MongoDB.

The solution was to provide the list_slice_length in default_resolver, which prevents the default resolver from calling len() on the collection.

It appears that the default behavior of many ORMs, when len() is called on their collections, is to load every object in the collection.

Although I resolved this particular issue, there were plenty more like it. I stopped trying because fixing them would require some major changes to Graphene.

Will issues like this be fixed for v3?
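The len() trap described here is easy to reproduce with a stand-in for a lazy ORM collection. This sketches the failure mode rather than graphene-mongo's actual code; passing the slice length explicitly (as the list_slice_length fix does) avoids the full load.

```python
# A stand-in for an ORM queryset/cursor that counts full materializations.

class LazyCollection:
    def __init__(self, n):
        self.n = n
        self.full_loads = 0

    def __len__(self):          # ORM: loads *every* document to count them
        self.full_loads += 1
        return self.n

    def __getitem__(self, s):   # ORM: translates the slice to a LIMIT query
        return list(range(self.n))[s]

def resolve_page(coll, first, slice_length=None):
    # Without slice_length, we must call len() to clamp the page bound.
    end = min(first, len(coll) if slice_length is None else slice_length)
    return coll[0:end]

coll = LazyCollection(12000)
resolve_page(coll, first=10)                   # triggers a full load
print(coll.full_loads)                         # 1
resolve_page(coll, first=10, slice_length=10)  # no len(), no full load
print(coll.full_loads)                         # still 1
```

With 12000 documents, that single len() call is the difference between one LIMIT query and materializing the whole collection.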

@doda

doda commented Feb 9, 2021

@qeternity it's really good, but I don't think this issue is about preparing the data: it's more about the time spent inside graphene/graphql-core and calling the resolvers.

Now, there is something wrong: v3 seems to be 2x slower than v2: https://gist.github.com/ktosiek/849e8c7de8852c2df1df5af8ac193287

There doesn't seem to be this discrepancy between 2.1.8 and 3.07b:

Graphene 2: 12.702938017901033
Graphene 3: 12.651812066091225

@ktosiek

ktosiek commented Feb 10, 2021

There doesn't seem to be this discrepancy between 2.1.8 and 3.07b:

Graphene 2: 12.702938017901033
Graphene 3: 12.651812066091225

That particular problem was fixed in graphql-python/graphql-core#54, fix was released in graphql-core 3.1.0.

@cancan101
Copy link

cancan101 commented May 5, 2021

FWIW, I am using graphql-core==3.1.4 with Ariadne and am still seeing a fair bit of unexplained time spent when returning larger data sets. Here is at least one place where I see a lot of time being spent in various forms of completion (e.g. complete_value_catching_error) versus resolving values (e.g. resolve_field_value_or_error):

Function: resolve_field at line 578

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   578                                               def resolve_field(
   579                                                   self,
   580                                                   parent_type: GraphQLObjectType,
   581                                                   source: Any,
   582                                                   field_nodes: List[FieldNode],
   583                                                   path: Path,
   584                                               ) -> AwaitableOrValue[Any]:
   585                                                   """Resolve the field on the given source object.
   586                                           
   587                                                   In particular, this figures out the value that the field returns by calling its
   588                                                   resolve function, then calls complete_value to await coroutine objects,
   589                                                   serialize scalars, or execute the sub-selection-set for objects.
   590                                                   """
   591      2189       1355.0      0.6      1.3          field_node = field_nodes[0]
   592      2189       1523.0      0.7      1.5          field_name = field_node.name.value
   593                                           
   594      2189       4577.0      2.1      4.4          field_def = get_field_def(self.schema, parent_type, field_name)
   595      2189       1159.0      0.5      1.1          if not field_def:
   596                                                       return Undefined
   597                                           
   598      2189       1265.0      0.6      1.2          resolve_fn = field_def.resolve or self.field_resolver
   599                                           
   600      2189       1128.0      0.5      1.1          if self.middleware_manager:
   601      2189       4540.0      2.1      4.4              resolve_fn = self.middleware_manager.get_field_resolver(resolve_fn)
   602                                           
   603      2189      10609.0      4.8     10.3          info = self.build_resolve_info(field_def, field_nodes, parent_type, path)
   604                                           
   605                                                   # Get the resolve function, regardless of if its result is normal or abrupt
   606                                                   # (error).
   607      4378      25759.0      5.9     25.0          result = self.resolve_field_value_or_error(
   608      2189       1140.0      0.5      1.1              field_def, field_nodes, resolve_fn, source, info
   609                                                   )
   610                                           
   611      4378      48599.0     11.1     47.2          return self.complete_value_catching_error(
   612      2189       1314.0      0.6      1.3              field_def.type, field_nodes, info, path, result
   613                                                   )

Perhaps still this old issue: graphql-python/graphql-core#54 (comment)?

And this is from Sentry profiling (spans are created just for the top level instance of any recursive calls):
image
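Part of why completion dominates on large lists is simply call volume: the executor runs a resolve/complete pipeline once per field per object. A stripped-down sketch of that cost model (`resolve_field` and `complete_value` here are placeholders, not graphql-core's real functions):

```python
# Stripped-down sketch: the executor runs a resolve/complete pipeline
# once per field per object, so 10,000 objects with 3 fields means
# 30,000+ Python function calls before any serialization happens.
def complete_value(value):
    # Placeholder for null checks, scalar serialization, type checks.
    return value

def resolve_field(obj, name):
    # Placeholder for the default resolver: attribute or key lookup.
    value = obj.get(name) if isinstance(obj, dict) else getattr(obj, name, None)
    return complete_value(value)

rows = [{"id": i, "name": f"u{i}", "age": i % 90} for i in range(10_000)]
fields = ["id", "name", "age"]
result = [{f: resolve_field(row, f) for f in fields} for row in rows]
assert len(result) == 10_000
assert result[1] == {"id": 1, "name": "u1", "age": 1}
```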

@kevindice
Copy link

In case this helps anyone: I ran some benchmarks against graphene-django and DRF with django-silk and discovered that the FieldTracker from django-model-utils was the cause of my performance issues. The profile showed a heinous amount of time spent in the deepcopy function.

To paint a picture here:

from django.db import models
from model_utils import FieldTracker

class Profile(models.Model):
    bio = models.TextField(blank=True)
    bio_hashtags = models.ManyToManyField(Hashtag, blank=True)

    tracker = FieldTracker()

    def save(self, *args, **kwargs):
        if self.id and self.tracker.has_changed('bio'):
            self.reconcile_bio_hashtags()
        super().save(*args, **kwargs)

Fetching 50 profiles, the following timings were obtained:

  • DRF w/ field-tracker enabled: 10.2 seconds
  • graphene-django (using relay fwiw): 9.7 seconds
  • DRF w/ field-tracker disabled: 0.159 seconds
  • graphene-django w/ field-tracker disabled: 0.7 seconds

0.7 seconds is still pretty bad for a query of 50 things and a single postgres query, but 90% of my problem was not graphene.
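A toy illustration (not the real FieldTracker implementation) of why a change tracker can dominate fetch time: snapshotting initial values with deepcopy on every instantiation adds a per-row cost that disappears when tracking is disabled:

```python
# Not the real FieldTracker: a toy tracker that deep-copies initial
# field values on every instantiation, which is the kind of per-row
# deepcopy cost that showed up in the profile.
import copy

class ToyTracker:
    def __init__(self, fields):
        self._initial = copy.deepcopy(fields)  # snapshot per instance

    def has_changed(self, name, current):
        return self._initial[name] != current

class ToyProfile:
    def __init__(self, bio, track=True):
        self.bio = bio
        self.tracker = ToyTracker({"bio": bio}) if track else None

p = ToyProfile(bio="hello")
p.bio = "changed"
assert p.tracker.has_changed("bio", p.bio)

# Fetching 50 rows instantiates 50 trackers -> 50 deepcopies; with
# track=False that per-row cost vanishes.
untracked = [ToyProfile(bio="x", track=False) for _ in range(50)]
assert all(u.tracker is None for u in untracked)
```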

@Kobold
Copy link

Kobold commented May 7, 2021

Great find @kevindice . Thank you for posting!

@tomduncalf
Copy link

tomduncalf commented Aug 22, 2021

Just to offer my experience: it seems like performance is still an issue. When returning a set of 50 items using Graphene (I'd consider them medium-sized maybe? nothing unusual), requests were taking over 2 seconds to return on a Heroku free dyno – nearly all of this time was spent in GraphQL code according to New Relic, and I had already optimised the queries. Switching to Strawberry made no difference, while switching to DRF greatly improved performance, so it definitely seems like graphql-core is the issue.

I would have liked to investigate more but I am on a deadline, so I ended up switching to Rails with graphql-ruby which, to my surprise, is faster than either Python solution – the same query on Heroku with Rails returns in 300-400ms, so it's several times faster and makes the difference between a good and bad user experience. Interestingly with Rails, it seems using GraphQL is actually a bit faster than a normal REST endpoint!

I prefer the developer ergonomics of Python and Django but to be honest Django and Rails are similar enough in many ways that it's not a big deal for me to switch. Obviously you can't do this if you're deep into a project, but for anyone considering Python for a GraphQL project I think it's worth being aware of these potential performance issues – and also being aware that Graphene seems to be stuck in a potentially-unmaintained limbo, if I'd realised this sooner I'd probably have started with Rails.

@jkimbo
Copy link
Member

jkimbo commented Aug 22, 2021

@tomduncalf were you using Graphene v2 or v3? Also I'm surprised that switching to Strawberry didn't help. Based on these benchmarks: https://twitter.com/jayden_windle/status/1235323199220592644 I would expect Strawberry or Graphene v3 to be significantly better especially for lists of objects.

IMO I would expect GraphQL to always have a bit of a performance overhead compared to a REST endpoint since it's doing quite a bit more. It would be good to get Python performance to a point where it's comparable to other similar languages though (like ruby). Can you share any of your code?

@tomduncalf
Copy link

@jkimbo This was on Graphene v3. Unfortunately I can't share the code as it's not open source and I don't really have the time to dig into exactly why it was slower right now, but if I do find time to make a simple repro comparing Python vs Ruby I will post it here!

@jkimbo
Copy link
Member

jkimbo commented Aug 23, 2021

So @tomduncalf you properly nerd sniped me with this and I ended up building a more "realistic" benchmark. I implemented an API to fetch the top 250 rated IMDB movies in Graphene, Strawberry, DjangoRestFramework and a plain JSON API, all hosted on Django. All the data comes from a SQLite db. The code is here: https://github.com/jkimbo/django-graphql-benchmarks and you can try out the Graphene API here.

Here are the P99 results against the Heroku instance:

requests-time
requests-pre-second

So you can see that Graphene (v3) and Strawberry (v0.73.1, which contains a fix for a performance regression btw) are pretty much neck and neck, which is what I would expect considering that they are just different ways to set up a graphql-core server. DRF is definitely faster (~25% faster) and the plain JSON endpoint is faster still. I couldn't replicate the 2-second response times you were seeing with your API @tomduncalf so I'm not sure what is going on there.

Overall GraphQL in Python is definitely slower than using something like DjangoRestFramework, but not horribly so in my opinion. There are definitely things that can be improved though, and thanks to this exercise I have some ideas for improvements we can make to Strawberry.

Would be interested in how this all compares to graphql-ruby as well but unfortunately my experience there is lacking.

@tomduncalf
Copy link

Hey @jkimbo, thanks for doing this and I hope my initial post didn't come across too negatively – I just wanted to share my experience for anyone else in my situation (i.e. not familiar with either Django or Rails and looking to pick one for a GraphQL project), as I didn't really find much online comparing the two for GraphQL specifically and I didn't realise there was a performance overhead.

You prompted me to do a little bit more digging as I felt bad for just saying it was slow 🙃 as your demo seems to perform pretty well. One thing I didn't think to mention is that I am using Relay with my API – I did a little bit of testing with my API and it seems like using Relay adds a fairly significant overhead – almost doubling the response times on Heroku for the same query vs. a non-Relay version! I wonder if you could try using Relay with Graphene on yours and see if you see similar results? Or if you tell me how to run yours, I can try it (pretty new to Python so couldn't work out how to run yours from a git clone).

My API does still seem quite slow compared to yours and I'm not really sure why, as yours is returning a larger set of data. I'm new to Django so I could be doing something a bit stupid somewhere. To be honest, I am going to stick with Rails at this point as it's probably a slightly better fit for what I am trying to do (build an API with as little code as possible basically, haha – the ecosystem of gems seems a bit more developed for some of the things I want to do), but if you have any suggestions of good ways to profile my Python I could give it a go.

Anyway, you piqued my curiosity so I reproduced your demo in Rails! The code is at https://github.com/tomduncalf/rails-graphql-benchmark and I've deployed it to Heroku in the EU region. It seems like it returns a bit faster than yours, but not dramatically so – I'm not sure how you run your benchmark but I'd be happy to try it on mine if it's useful for comparison.

There are two queries you can run, one Relay and one non-Relay (doesn't seem the Ruby version of GraphiQL supports embedding them in the URL!):

{
  movies {
    edges {
      node {
        id
        imdbId
        title
        year
        imageUrl
        imdbRating
        imdbRatingCount
        director {
          id
          name
        }
      }
    }
  }
}

{
  moviesPlain {
    id
    imdbId
    title
    year
    imageUrl
    imdbRating
    imdbRatingCount
    director {
      id
      name
    }
  }
}

Cheers,
Tom

@Ashish-Bansal
Copy link

Last year I spent a few weekends pulling together a very minimal PoC based on @syrusakbary's idea of generating template code to improve GQL performance in Python.

Here's the link to the same - https://github.com/Ashish-Bansal/graphql-jit

It's a very rough, untested PoC implementation, and needs a lot of work.

I won't get time to work on it, so in case anyone is interested, they can work over that PoC.
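The core code-generation idea can be sketched in a few lines: build source text specialized to one query shape and compile it, so per-row work becomes a single comprehension instead of generic per-field dispatch. This is only a rough illustration of the approach, not the linked PoC's actual code:

```python
# Rough sketch of the "JIT" idea: generate a serialization function
# specialized to one query shape with compile()/exec(), so each row is
# handled by a single comprehension instead of generic dispatch.
def jit_compile(fields):
    body = ", ".join(f"'{f}': r['{f}']" for f in fields)
    src = f"def serialize(rows):\n    return [{{{body}}} for r in rows]\n"
    namespace = {}
    exec(compile(src, "<jit>", "exec"), namespace)
    return namespace["serialize"]

serialize = jit_compile(["id", "name"])
rows = [{"id": 1, "name": "a", "extra": True}]
# Only the requested fields survive, mirroring a GraphQL selection set.
assert serialize(rows) == [{"id": 1, "name": "a"}]
```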

@cglacet
Copy link

cglacet commented Oct 19, 2022

This thread is interesting but kind of hard to get a grasp on. It would be great if someone competent, like a maintainer of graphql-core or graphene, could make a summary of the different solutions that still exist today (potential gain depending on the use case, potential issues, required version of graphene/graphql-core, maturity of the solution).

I also have a few questions:

  • Is there a significant performance benefit in moving from v2 to v3?
  • Are there significant performance differences amongst python versions?
  • Can working with graphene-sqlalchemy be an issue when it comes to performance?
  • What tool(s) are you using to measure asynchronous code performance with decent precision and without interference? (I tried yappi, and it seems decently close to wall-clock time, but the output is not very readable.) I first started with cProfile because I had no idea it couldn't measure async code.
  • What is considered a "large" dataset, and how does size impact performance? @jkimbo's benchmark looks quite "realistic" to me (strangely, when I experiment with the Graphene API I get response times under 200ms).

In my case, all requests I try to make take an enormous amount of time to resolve (300ms for queries with fewer than 10 fields, and up to 3s for larger ones). On the other hand, I sometimes get much lower response times (~3x improvement) for the exact same query on the exact same database (same data state), all using the exact same Docker image; sometimes I wonder if a lack of memory/CPU is causing this.

Anyway, thanks for this discussion. I hope someone smarter than me can summarize the good ideas that lie here and there in this thread.

@flbraun
Copy link

flbraun commented Dec 8, 2022

@cglacet

Is there a significant performance benefit in moving from v2 to v3?

Currently asking myself the same question. I took the liberty to fork jkimbo's benchmark suite mentioned above, update the frameworks to their latest version and include graphene v2 as well: flbraun/django-graphql-benchmarks

Here's the result from a fresh bench I did this morning:
results

As you can see, the difference between v2 and v3 is pretty much non-existent; however, the benchmark suite currently only serializes a bunch of ObjectTypes. Since this is the most primitive building block of a GraphQL API and doesn't cover real-world setups utilizing pagination, connections, etc., your experience may vary.

Maybe this is of any interest for you.

@erikwrede
Copy link
Member

@flbraun nice update! Probably makes sense to add some asyncio into the mix as this adds a lot of overhead as well.

@flbraun
Copy link

flbraun commented Jan 10, 2023

I updated my forked bench suite to be tested against multiple Python versions, see results here.

tl;dr: Graphene (both v2 and v3) performs almost identically on the same Python version. However, Graphene seems to have benefited heavily from performance improvements in the Python interpreter. Jumping from 3.10 to 3.11 alone shaves ~20% off mean response times, which is kinda impressive.
Maybe somebody with more knowledge about the inner workings of Graphene (and graphql-core) can leverage this information.

@qeternity
Copy link

If you're running this in prod, you should try pyston...we've seen perf that is close to pypy but without warmup/other issues

@vade
Copy link

vade commented Dec 8, 2023

Hi.

I'm curious what the latest is on this? We're noticing performance issues similar to what the OP posted, where a lot of time appears to be spent planning SQL queries.

Our SQL takes 60-100ms to process, but time to first byte is close to 1 second. Our profile implies a lot of time is spent planning SQL.

Our Stack:

  • Python 3.11
  • Django 4.2.8
  • DRF 3.14.0
  • Django Graphene 3.1.5

All optimized SQL, and we get numbers like:

Chrome Dev Tools

  • 6 SQL queries: 56.44 ms
  • 44 µs to send request
  • 937 ms waiting for server response
  • 8.48 ms to download
  • Total: ~1 second for a 100 ms DB query!

Our py-spy profile, with Postgres and Django all running locally, implies much of the time is in Graphene / SQL planning?

profile-gunicorn-with-signed-url

@pfcodes
Copy link

pfcodes commented Jul 6, 2024

> I'm curious what the latest is on this? We're noticing performance issues similar to what the OP posted, where a lot of time appears to be spent planning SQL queries […]

Did you ever figure this out? @vade

@vade
Copy link

vade commented Jul 6, 2024

@pfcodes We've done a lot of optimization on our stack in the interim, so I don't know if I can speak to any one single change, but here are some things we've observed.

We use Relay and pagination:

  • Using an ORM SQL optimizer makes a world of difference, especially one that pays close attention to details for more complicated real-world queries. We've been helping debug graphene-django-query-optimizer, which supports not only select_related/prefetch_related optimization but, most importantly, optimizes nested querysets and field objects, supports filtering and Relay pagination, and also optimizes synthetic fields (fields that are actually functions that may use other model fields).
  • We optimized the shit out of our GQL queries and redesigned our schema so things can be a bit more lightweight in terms of the fields we fetch.
  • For some filtered querysets, resolving total count (for Relay pagination) is highly suboptimal: total count takes the majority of the SQL time, and for some reason it breaks some optimization paths. This is one of our last large hurdles.

We've seen 10x improvement in response time with some of the above.

I know it's hand wavy, but a lot of it was just really paying attention to details and ensuring that more complicated queries do in fact get optimized.

We do some hand-tuned queryset tweaks in some cases for fields that require model / db lookups and annotate them, and we found some hot spots where we unintentionally did dumb shit like evaluating a queryset in place rather than using annotated values so the DB could do the work.

cc @rsomani95 - any thing else to add from the work we did that maybe im missing?
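The first point, ORM query optimization, largely comes down to avoiding N+1 queries. A pure-Python sketch of why batching matters for a 50-movie page (`fetch_director` and `fetch_directors_bulk` are made-up stand-ins for per-row fetching versus a select_related-style batched fetch):

```python
# `queries` records each simulated round trip to the database.
queries = []

def fetch_director(movie_id):
    # Naive resolver: one query per movie (the N+1 pattern).
    queries.append(f"SELECT ... WHERE movie_id = {movie_id}")
    return {"name": f"director-{movie_id}"}

def fetch_directors_bulk(movie_ids):
    # Batched: one query for the whole page.
    ids = ", ".join(map(str, movie_ids))
    queries.append(f"SELECT ... WHERE movie_id IN ({ids})")
    return {m: {"name": f"director-{m}"} for m in movie_ids}

movies = list(range(50))

queries.clear()
naive = [fetch_director(m) for m in movies]
assert len(queries) == 50          # 50 round trips

queries.clear()
bulk = fetch_directors_bulk(movies)
assert len(queries) == 1           # a single round trip
assert bulk[0] == naive[0] == {"name": "director-0"}
```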

@vade
Copy link

vade commented Jul 6, 2024

Also, there's a link to an issue with observations about the flame graph in question from the optimizer's author @MrThearMan (sorry for the tag) - they deserve a ton of credit; this optimizer is best in class for Django right now and no one knows it :)

MrThearMan/graphene-django-query-optimizer#86

@dicknetherlands
Copy link

For the very specific use-case that I'm working with, which is synchronous-only and does not support the full range of GraphQL features, I've made a Gist which shows how I worked around this performance issue to get an apparent speedup of 10x in some cases (not formally measured). If your use-case is similar to mine then it might make a useful starting point. https://gist.github.com/dicknetherlands/2f6e8619409fa155a05b3a863f10269a

It works by assuming that the developers will respect the schema. The key speedups come from discovering the type of each field only once, bypassing serializers for builtin scalars, bypassing nullability checks, not doing object type checks for nested fields, not re-validating the schema upon every query, and more efficient resolution of SyncFuture objects returned by graphql_sync_dataloader (if you're using that library).
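One of those speedups, discovering the type of each field only once, can be sketched as a small "plan then execute" loop (`build_plan` and `execute_plan` are illustrative names, not the Gist's actual API):

```python
# Illustrative "plan then execute": inspect field types once, then apply
# the precomputed plan to every row instead of re-checking per value.
def build_plan(field_types):
    plan = []
    for name, typ in field_types.items():
        if typ in (int, float, str, bool):
            plan.append((name, None))   # builtin scalar: emit as-is
        else:
            plan.append((name, typ))    # custom type: call its serializer
    return plan

def execute_plan(plan, rows):
    return [
        {name: (row[name] if fn is None else fn(row[name])) for name, fn in plan}
        for row in rows
    ]

plan = build_plan({"id": int, "name": str})
rows = [{"id": i, "name": f"u{i}"} for i in range(3)]
# Builtin scalars bypass serialization, so the output matches the input.
assert execute_plan(plan, rows) == rows
```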
