-
Notifications
You must be signed in to change notification settings - Fork 9.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix performance when parsing GraphQL schema into AST which causes downtime on cache stampede #31879
Fix performance when parsing GraphQL schema into AST which causes downtime on cache stampede #31879
Conversation
Hi @convenient. Thank you for your contribution
❗ Automated tests can be triggered manually with an appropriate comment:
You can find more information about the builds here ℹ️ Please run only needed test builds instead of all when developing. Please run all test builds before sending your PR for review. For more details, please, review the Magento Contributor Guide documentation. 🕙 You can find the schedule on the Magento Community Calendar page. 📞 The triage of Pull Requests happens in the queue order. If you want to speed up the delivery of your contribution, please join the Community Contributions Triage session to discuss the appropriate ticket. 🎥 You can find the recording of the previous Community Contributions Triage on the Magento Youtube Channel ✏️ Feel free to post questions/proposals/feedback related to the Community Contributions Triage process to the corresponding Slack Channel |
fb10610
to
0672892
Compare
@magento run all tests |
When I click on any of the build reports they just spin for me never loading. Either way I intend to cover all the tests in one swoop after the discussion on what we do with the meta data that can no longer be asserted against. |
And to aid with the stampeding during deployments I am testing out an overwrite to the core like the following, it's a bit more specific to our situation but just in case other people are reading this in the same scenario --- setup/src/Magento/Setup/Model/Installer.php 2020-06-23 12:03:22.000000000 +0100
+++ setup/src/Magento/Setup/Model/Installer.php 2021-01-29 16:52:16.000000000 +0000
@@ -1279,6 +1279,10 @@
$types = empty($types) ? $availableTypes : array_intersect($availableTypes, $types);
$enabledTypes = $cacheManager->setEnabled($types, $isEnabled);
if ($isEnabled) {
+ // Do not flush varnish during a deployment
+ $enabledTypes = array_filter($enabledTypes, function ($type) {
+ return $type !== 'full_page';
+ });
$cacheManager->clean($enabledTypes);
}
@@ -1308,6 +1312,12 @@
/** @var \Magento\Framework\App\Cache\Manager $cacheManager */
$cacheManager = $this->objectManagerProvider->get()->get(\Magento\Framework\App\Cache\Manager::class);
$types = $cacheManager->getAvailableTypes();
+
+ // Do not flush varnish during a deployment
+ $types = array_filter($types, function ($type) {
+ return $type !== 'full_page';
+ });
+
$cacheManager->clean($types);
$this->log->log('Cache cleared successfully');
} This means that we don't automatically purge varnish during deplyoment, but we have a short enough TTL that his is forgiving and it is better for some stale pages to be served than a 503. |
Another possible solution is to move the SchemaStitching to the deployment phase. After such change, the cache flush would not trigger the expensive logic involved in parsing the massive ASTs. |
Just a note that this has been deployed to production without issue, and no spike in CPU or anything to remark during the deployment. Went through without downtime because of this patch. |
Is there anything you need from me on this? |
FWIW: We're seeing this too; Degraded GQL performance during high load on deployments. New Relic traces point exactly to what @convenient has been describing above. We've applied patch and no issues during UAT so far. |
@magento run all tests |
The requested builds are added to the queue. You should be able to see them here within a few minutes. Please re-request them if they don't show in a reasonable amount of time. |
There was a bug with the enum placeholder ordering Input like the following ``` interface SomeInterface { product_type: String @doc(description: "The type of product, such as simple, configurable, etc.") } enum SomeEnum { } ``` would have placeholder logic ran against it and end up like ``` interface SomeInterface { product_type: String @doc(description: "The type of product, such as simple, configurable, etc.") } enum SomeEnum { placeholder_graphql_field: String } ``` This caused an error, something in the AST was trying to make it resolve against the word "Type" in the @doc description By fixing the enum so that it does not have `: String` in it the AST is generated successfully, we do this by replacing those empty enums first so that they cannot get caught by the empty type check
The newly supported union is the cause of this issue in the parseTypes function. The regex is finding the type of union CompanyStructureEntity but its also grabbing the item below it and including it as part of its schema content, which means CompanyStructureEntity contains the schema for itself as well as CompanyStructureItem , this means I never have a $knownTypes['CompanyStructureItem'] to work with which means it is a missing dependency. ``` <?php use Magento\Framework\App\Bootstrap; require __DIR__ . '/app/bootstrap.php'; $bootstrap = Bootstrap::create(BP, $_SERVER); $obj = $bootstrap->getObjectManager(); /** @var Magento\Framework\GraphQlSchemaStitching\GraphQlReader $graphqlReader */ $graphqlReader = $obj->get(Magento\Framework\GraphQlSchemaStitching\GraphQlReader::class); $reflector = new ReflectionObject($graphqlReader); $method = $reflector->getMethod('parseTypes'); $method->setAccessible(true); $broken = 'type IsCompanyEmailAvailableOutput @doc(description: "Contains the response of a company email validation query") { is_email_available: Boolean! @doc(description: "A value of `true` indicates the email address can be used to create a company") } union CompanyStructureEntity @typeResolver(class: "Magento\\CompanyGraphQl\\Model\\Resolver\\StructureEntityTypeResolver") = CompanyTeam | Customer type CompanyStructureItem @doc(description: "Defines an individual node in the company structure") { id: ID! @doc(description: "The unique ID for a `CompanyStructureItem` object") parent_id: ID @doc(description: "The ID of the parent item in the company hierarchy") entity: CompanyStructureEntity @doc(description: "A union of `CompanyTeam` and `Customer` objects") } type CompanyStructure @doc(description: "Contains an array of the individual nodes that comprise the company structure") { items: [CompanyStructureItem] @doc(description: "An array of elements in a company structure") }'; $result = $method->invoke($graphqlReader, $broken); echo "Got the following types" . PHP_EOL; echo implode(PHP_EOL, array_keys($result)) . PHP_EOL; echo "The values for the union actually contains CompanyStructureItem which is why the initial approach works any mine does not" . PHP_EOL; echo $result['CompanyStructureEntity'] . PHP_EOL; ```
@magento run all tests |
The requested builds are added to the queue. You should be able to see them here within a few minutes. Please re-request them if they don't show in a reasonable amount of time. |
Hey @cpartica As discussed on slack there were 2 issues I encountered, both these errors completely broke all graphql requests.
|
…hql-stitching-perf Conflicts: lib/internal/Magento/Framework/GraphQlSchemaStitching/GraphQlReader.php
@magento run all tests |
The requested builds are added to the queue. You should be able to see them here within a few minutes. Please re-request them if they don't show in a reasonable amount of time. |
@magento run Functional Tests EE, Unit Tests |
The requested builds are added to the queue. You should be able to see them here within a few minutes. Please re-request them if they don't show in a reasonable amount of time. |
Just so its recorded @danslo asked for something to repro the This uses magento <?php
use Magento\Framework\App\Bootstrap;
require __DIR__ . '/app/bootstrap.php';
$bootstrap = Bootstrap::create(BP, $_SERVER);
$obj = $bootstrap->getObjectManager();
/** @var Magento\Framework\GraphQlSchemaStitching\GraphQlReader */
$reader = $obj->get(Magento\Framework\GraphQlSchemaStitching\GraphQlReader::class);
$schema = 'type IsCompanyEmailAvailableOutput @doc(description: "Contains the response of a company email validation query") {
is_email_available: Boolean! @doc(description: "A value of `true` indicates the email address can be used to create a company")
}
union CompanyStructureEntity @typeResolver(class: "Magento\\CompanyGraphQl\\Model\\Resolver\\StructureEntityTypeResolver") = CompanyTeam | Customer
type CompanyStructureItem @doc(description: "Defines an individual node in the company structure") {
id: ID! @doc(description: "The unique ID for a `CompanyStructureItem` object")
parent_id: ID @doc(description: "The ID of the parent item in the company hierarchy")
entity: CompanyStructureEntity @doc(description: "A union of `CompanyTeam` and `Customer` objects")
}
type CompanyStructure @doc(description: "Contains an array of the individual nodes that comprise the company structure") {
items: [CompanyStructureItem] @doc(description: "An array of elements in a company structure")
}';
$method = new ReflectionMethod(\Magento\Framework\GraphQlSchemaStitching\GraphQlReader::class, "parseTypes");
$method->setAccessible(true);
$results = $method->invoke($reader, $schema);
if (!array_key_exists('CompanyStructureItem', $results)) {
echo "FAILURE - Missing CompanyStructureItem" . PHP_EOL;
} else {
echo "SUCCESS" . PHP_EOL;
}
echo implode(PHP_EOL, array_keys($results)) . PHP_EOL; This uses just PHP <?php
$schema = 'type IsCompanyEmailAvailableOutput @doc(description: "Contains the response of a company email validation query") {
is_email_available: Boolean! @doc(description: "A value of `true` indicates the email address can be used to create a company")
}
union CompanyStructureEntity @typeResolver(class: "Magento\\CompanyGraphQl\\Model\\Resolver\\StructureEntityTypeResolver") = CompanyTeam | Customer
type CompanyStructureItem @doc(description: "Defines an individual node in the company structure") {
id: ID! @doc(description: "The unique ID for a `CompanyStructureItem` object")
parent_id: ID @doc(description: "The ID of the parent item in the company hierarchy")
entity: CompanyStructureEntity @doc(description: "A union of `CompanyTeam` and `Customer` objects")
}
type CompanyStructure @doc(description: "Contains an array of the individual nodes that comprise the company structure") {
items: [CompanyStructureItem] @doc(description: "An array of elements in a company structure")
}';
$regex = '/(type|interface|union|enum|input)[\s\t\n\r]+([_A-Za-z][_0-9A-Za-z]+)[\s\t\n\r]+([^\{]*)(\{[^\}]*\})/i';
preg_match_all(
"$regex",
$schema,
$matches
);
if (count($matches[0]) === 4) {
echo "Pass" . PHP_EOL;
} else {
echo "Failure, didnt split out CompanyStructureItem from CompanyStructureEntity" . PHP_EOL;
} If I can get the |
This workaround handles a bug where parseTypes also contains the data from below it ``` union X = Y | Z type foo {} ``` Would return ```php [ 'x' => 'union X = Y | Z type foo {}' ] ``` instead of ```php [ 'x' => 'union X = Y | Z', 'foo' => 'type foo {}' ] ``` This workaround deals only with union types, so doesnt materially affect anything in most cases.
@magento run all tests |
The requested builds are added to the queue. You should be able to see them here within a few minutes. Please re-request them if they don't show in a reasonable amount of time. |
@magento run all tests |
The requested builds are added to the queue. You should be able to see them here within a few minutes. Please re-request them if they don't show in a reasonable amount of time. |
This is the best attempt I can do at resolving the issue with It should be generic enough that it works with any union type I think. It's not great engineering, but downtime is worse so I'm going to leave it as is until someone from Magento wishes to discuss it further. |
Can anyone from Magento chime in on whats happening with this? It's 10 months that it's been open and it's a pretty serious issue. I know a bunch of users in the Magento slack have applied this as a patch and have had it fix their PWA sites. This patch (or an alternative solution) is necessary for sites to perform properly as graphql schema / customisations are added. I can see there's changes in Like #34651 adds a new type which would mean rework for this patch. It's hard to keep fixing a moving target, and I do need to keep fixing it as our clients cant have the downtime. |
Hi @convenient, you PR was merged into internal repo and will change status soon, thank you so much for your contribution |
@vzabaznov thank you! |
This is still a bit work in progress as we need to discuss some of the testing I think.
There is a performance issue when reading all the graphql schema and stitching them together, each new schema adds significant runtime and on a PWA/headless magento site with many customisation this adds up.
Usually this is well hidden because the parsed schema are held in cache but when a cache flush occurs while many graphql requests are occurring the system can suffer a cache stampede which brings down the site.
In this pull request I improve the performance by over 80% percentage on vanilla, but this would be more in a customised magento instance.
Related Pull Requests
The symptoms
When the cache is flushed when the site is under load we experience significant slowdown of the graphql requests which cause a minute or so downtime. Each graphql request tries to parse the configuration and store it in the cache at the same time, and as this process is CPU intensive it causes the issue.
In this screenshot we can see that some of the requests take 89 seconds to process all the graphql schema.
They also go alongside large spikes
We would not expect a cache flush to cause this problem, as the site performs well in all other aspects.
The problem
magento2/lib/internal/Magento/Framework/GraphQlSchemaStitching/GraphQlReader.php
Lines 91 to 105 in 2a4641b
The magento GraphQLSchemaStitching logic currently works like so
schema.graphqls
files and their content$partialSchemaTypes
) with keys likeComplexTextValue
and value liketype ComplexTextValue { html: String! @doc(description: "HTML format")}
$knownTypes
)webonyx/graphql-php
code to build full array of types and schema into an AST, merge these results with all previously built AST ($results
)As we progress further through the list of available
schema.graphqls
the total number of identified types ($knownTypes
) grows which are then fed into the AST generation repeatedly. This is so that we can declare a new schema file and if it refer to something likeMoney
the AST builder needs to have those definitions available or an exception is thrown.There is a performance problem with GraphQL when parsing large data documents, see webonyx/graphql-php#244 and the comment at webonyx/graphql-php#244 (comment)
I added some logging locally
On a vanilla magento instance of 2.4-develop we can see the following results, that the total size of the schema parsed grows and grows and that we parse once per every graphql file. This means that for every new graphql file added we have to parse the entire schema again.
The solution
We cannot realistically fix
webonyx/graphql-php
to fix this issue with large documents, it seems a bit of a systemic issue with graphql. What we can do is try to reduce the amount of documents Magento needs to produce, and to reduce their size.By improving the performance of this task we can remove the CPU bottleneck which causes downtime.
In this pull request
readPartialTypes
by batching the schemas from many differentschema.graphqls
into one parse.schema.graphqls
scales much better and only brings in the required types for that given schema rather than the all schemas.My solution here feels a bit weird, doing some static analysis over the available schema before building the AST, but the results are good.
A blackfire comparison shows a vast improvement
Looking at the same log we have called
readPartialTypes
17 times rather than 34 and the string size of the documents parsed are significantly smaller.Automated testing
I added a test which stitches together all the graphql schema from the system just to verify the process ran complete from end to end without any mocking.
However to make this change I needed to remove some of the work done in #28747. In that pull request meta data is added to the parsed schema per
schema.graphqls
, however we no longer process the data file by file because of the performance issues involved.The test I added covers that the entire schema does generate in order, but it will be breaking the tests added in that PR.
How do we want to proceed with this? Having a test to verify dependencies is a good idea, but for the sake of performance we cannot process each graphql document file by file, it does not work at an enterprise level site with many graphql requests being fired.
Manual testing
Manual testing method
I have a few PHP scripts to investigate the graphql schema in the cache and report on its contents after triggering a graphql request.
You start by running
test.sh
with these scripts in the magento root.Manual testing results
I have to sort the data as my method changes the ordering of some of the fields, I also have to strip out the metadata that I discussed above in automated testing notes.
We can see that before and after this change we produce a string of the same length, with the same md5 hash.
This gives me confidence that we're producing the same graphql schema for the magento system at a fraction of the runtime.
Before change
After change
Contribution checklist (*)