Slow queries in GraphQL after upgrading to v3.4.1 #11291
Comments
This is more of a feature request than a bug. The issue is that we aren't doing any prefetch_related or select_related on the GraphQL queries. I replicated the above and 2046 queries are executed. I'm not certain why it got worse from 3.3 to 3.4, but the root of the issue is the DB calls. The bigger issue with GraphQL is that we don't know what will be included in the query, so you can either optimize for the most common queries or optimize queries dynamically. There is a package, https://github.com/tfoxy/graphene-django-optimizer, that optimizes queries dynamically; unfortunately, it doesn't work with the latest Graphene. |
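To illustrate what "dynamically optimize" could mean, here is a minimal sketch in plain Python, assuming the set of requested GraphQL field names has already been extracted from the query. The `RELATION_KINDS` map, the field names in it, and `plan_optimizations` are hypothetical and are not NetBox or graphene-django-optimizer code; only `select_related` and `prefetch_related` in the closing comment are real Django queryset methods.

```python
# Hypothetical mapping from GraphQL field name to the kind of relation behind
# it. The field names are illustrative and not taken from NetBox's schema.
RELATION_KINDS = {
    "vlan": "select",    # forward foreign key: can be fetched with a JOIN
    "site": "select",
    "tags": "prefetch",  # many-to-many: fetched with one extra query
}


def plan_optimizations(requested_fields, relation_kinds=RELATION_KINDS):
    """Return (select_related names, prefetch_related names) for one request."""
    select = sorted(
        f for f in requested_fields if relation_kinds.get(f) == "select"
    )
    prefetch = sorted(
        f for f in requested_fields if relation_kinds.get(f) == "prefetch"
    )
    return select, prefetch


# Only relations that the client actually asked for end up in the plan;
# plain columns such as "prefix" are ignored.
select, prefetch = plan_optimizations({"prefix", "vlan", "tags"})

# With a Django queryset this plan would be applied roughly as:
#   queryset.select_related(*select).prefetch_related(*prefetch)
```

The point of deriving the plan per request is that a query which never touches a relation pays nothing for it, while a query that does touch it avoids one lookup per row.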
I think this is a standard problem in GraphQL APIs that is usually solved using batching/data loaders. Basically, it's an N+1 (lazy-loading) problem. There is one query to the database to fetch all prefixes, and then, when it is time to load the VLAN for a prefix, no batching is implemented, so it lazy-loads with one database query per prefix. The solution is to batch it. When trying to load VLANs, instead of loading them directly, their keys are "cached"; finally, a list of all prefixes that we need to load VLANs for is sent to the batch method, and we can load them all at once using one query to the database. I have implemented this in several GraphQL APIs, but I'm not "fluent in Python or Graphene". :) It seems like Graphene has a solution for it, though: https://docs.graphene-python.org/en/latest/execution/dataloader Using that could possibly be a solution for this issue. |
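The batching idea described above can be sketched in plain Python. This is not Graphene's DataLoader and not NetBox code: `FakeDB`, `BatchLoader`, and the sample prefixes are hypothetical stand-ins, and the "database" only counts how many queries it receives.

```python
class FakeDB:
    """Stand-in for a database that counts the queries it receives."""

    def __init__(self, vlans):
        self.vlans = vlans
        self.queries = 0

    def get_vlan(self, vlan_id):
        # One query per key: the lazy-loading path.
        self.queries += 1
        return self.vlans[vlan_id]

    def get_vlans(self, vlan_ids):
        # One query for many keys, comparable to WHERE id IN (...).
        self.queries += 1
        return {vlan_id: self.vlans[vlan_id] for vlan_id in vlan_ids}


class BatchLoader:
    """Collects keys first, then loads all of them with a single batch call."""

    def __init__(self, batch_fn):
        self.batch_fn = batch_fn
        self.pending = []
        self.cache = {}

    def load(self, key):
        # Remember the key and hand back a callable that resolves it later.
        if key not in self.cache and key not in self.pending:
            self.pending.append(key)
        return lambda: self._resolve(key)

    def _resolve(self, key):
        # The first resolution flushes every pending key in one batch.
        if self.pending:
            self.cache.update(self.batch_fn(self.pending))
            self.pending = []
        return self.cache[key]


def resolve_lazily(db, prefixes):
    return [db.get_vlan(p["vlan_id"])["name"] for p in prefixes]


def resolve_batched(db, prefixes):
    loader = BatchLoader(db.get_vlans)
    thunks = [loader.load(p["vlan_id"]) for p in prefixes]  # collect keys
    return [thunk()["name"] for thunk in thunks]            # one batch query


def make_db():
    return FakeDB({i: {"name": f"vlan_{i}"} for i in range(1, 6)})


PREFIXES = [{"prefix": f"10.0.{i}.0/24", "vlan_id": i} for i in range(1, 6)]
```

Calling `resolve_lazily(db, PREFIXES)` leaves `db.queries` at one per prefix, while `resolve_batched(db, PREFIXES)` returns the same names with a single query. A real data loader defers the flush to the executor instead of the first resolution, but the key-collection step is the same.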
After upgrading to 3.4.3 from 3.3.6, we've also noticed our GraphQL query jump from ~1s to ~7s. Looking at the CPU usage with py-spy, it doesn't seem like the DB queries themselves are slow, but rather the Django bits around them:
I haven't had a chance yet to rerun the query against 3.3.6 to see what it looked like there. |
This issue is causing some trouble for us because most of our queries are taking so long we run into timeouts. Example query we are using:
|
I have identified commit 99cf1b1 as responsible for the massive slowdown of the GraphQL queries. |
My fork reverts the commit on top of the current NetBox release. GraphQL is now as fast as it was prior to v3.4. As is to be expected, filtering does not work on all levels, but we do not use that at the moment. |
Hi guys, I have a similar issue. Since the upgrade to 3.4.x the graphql query ends with a timeout
Less complex queries still work without any issues. Best regards |
I'm going to take another look at this later this week to see if there are some tweaks that can be made. Thanks for the sample queries; I will test with those. |
thanks a lot! |
As a workaround, I split the one script into three scripts, one each for the regions "emea", "apac" and "americas". We created super networks/summary networks with the VLAN ID 4094. We don't actually need them in our GraphQL report; we only need the VLANs/prefixes that are really configured on our switches, routers, and so on. Maybe this is also a workaround for other users.
|
Traced down the issue: https://github.com/netbox-community/netbox/blob/develop/netbox/netbox/filtersets.py#L249 This code causes custom fields to be queried even if they aren't in the GraphQL request. You don't see this in the REST API, as all fields are returned anyway and there are prefetch_related calls in place, but in GraphQL only a specific subset of fields is returned and there is currently no query optimization. This basically causes the query count to double compared to previous versions, as filters are now applied correctly. I made a bunch of VLANs with the query above; here are the query counts: This will need to be fixed separately from a query optimizer: either the code referenced above needs to be changed so the custom fields aren't added to the query (preferred), or the query optimizer would need to detect that the filter is a subclass of NetBoxModelFilterSet and automatically add custom field prefetches. |
@arthanson Thanks for your work. I just tested our use case with your branch and it is (at least) up to speed with previous versions. |
Added issue #11949 to track the custom-fields being auto added to queries. |
PR #11943 has been merged into |
NetBox version
v3.4.1
Python version
3.9
Steps to Reproduce
import/create >= 800 vlans
[{"vid": "1", "name": "vlan_1", "status": "active"} ..... {"vid": "800", "name": "vlan_800", "status": "active"}]
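The 800-entry list above does not have to be written by hand. A minimal sketch that generates it in the same shape follows; `build_vlan_payload` is a hypothetical helper name, and how the resulting JSON is imported into NetBox is not stated in this report.

```python
import json


def build_vlan_payload(count=800):
    """Build VLAN entries shaped like the sample above (vid kept as a string)."""
    return [
        {"vid": str(i), "name": f"vlan_{i}", "status": "active"}
        for i in range(1, count + 1)
    ]


# Serialize the list so it can be pasted or posted wherever the import happens.
payload_json = json.dumps(build_vlan_payload())
```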
run query:
Expected Behavior
Response in less than 10 seconds.
Observed Behavior
In version 3.3 this query took less than 10 seconds; after upgrading to 3.4 it takes between 30 and 40 seconds.
This is reproduced at the demo site: https://demo.netbox.dev/ipam/vlans/