perf(gatsby): drop severe scaling regression caused by analytics #24709

pvdz · 2020-06-02T11:36:57Z

Regression introduced in #22851

The problem seems to be that these calls to v8.serialize cause the GC to start a full stop-the-world mark-and-sweep step sooner. In a benchmark of 150k pages, that step almost always triggered after 100k to 110k queries had run, and it paused the process for 60+ seconds.
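To make the pattern concrete, here is a minimal sketch of per-query serialization done only to record a size metric. It is not the code from #22851: `runQueries`, `measureSize`, and the shape of the fake result object are hypothetical, and the loop is far too small to reproduce the 60+ second pause. It only shows that every `v8.serialize` call allocates a fresh Buffer that the GC later has to deal with.

```javascript
// Hypothetical sketch: runQueries and measureSize are illustrative names,
// not taken from the Gatsby codebase.
const v8 = require("v8");
const { performance } = require("perf_hooks");

// Simulate `count` query results; optionally serialize each one
// purely to record how large it is (the analytics-style metric).
function runQueries(count, measureSize) {
  let totalBytes = 0;
  const start = performance.now();
  for (let i = 0; i < count; i++) {
    const result = {
      data: { page: { id: i, title: `Page ${i}`, body: "x".repeat(256) } },
    };
    if (measureSize) {
      // v8.serialize returns a new Buffer on every call.
      totalBytes += v8.serialize(result).length;
    }
  }
  return { ms: performance.now() - start, totalBytes };
}

const withMetric = runQueries(20000, true);
const withoutMetric = runQueries(20000, false);
console.log(
  `with size metric:    ${withMetric.ms.toFixed(1)} ms, ${withMetric.totalBytes} bytes serialized`
);
console.log(`without size metric: ${withoutMetric.ms.toFixed(1)} ms`);
```

Running a script like this with `node --trace-gc` prints each collection as it happens, which is one way to see whether mark-sweep steps show up earlier when the metric is enabled. Timings will vary by machine and Node version.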

Example benchmark results from before and after that PR:

info bootstrap finished - 86.758 s
success Building production JavaScript and CSS bundles - 9.404s
success run queries - 205.676s - 150002/150002 729.31/s
success Building static HTML for pages - 142.800s - 150002/150002 1050.44/s
info Done building in 451.33 sec

info bootstrap finished - 85.933 s
success Building production JavaScript and CSS bundles - 8.335s
success run queries - 84.795s - 150002/150002 1769.00/s
success Building static HTML for pages - 141.000s - 150002/150002 1063.84/s
info Done building in 320.158 sec

This is very consistent behavior. We looked at the change and agreed that the best option was to drop this measurement, since it only existed for analytics and is a non-vital metric to record. We'd rather have the perf than the metric.

Numbers for the fix, same benchmark, first on current master and then on this PR:

info bootstrap finished - 79.788s
success Building production JavaScript and CSS bundles - 9.635s
success run queries - 201.542s - 150002/150002 744.27/s
success Building static HTML for pages - 141.535s - 150002/150002 1059.82/s
info Done building in 440.766 sec

info bootstrap finished - 80.751s
success Building production JavaScript and CSS bundles - 9.570s
success run queries - 87.162s - 150002/150002 1720.95/s
success Building static HTML for pages - 142.609s - 150002/150002 1051.84/s
info Done building in 319.151 sec

nice!
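For reference, the improvement can be worked out from the two runs above. This is just arithmetic on the numbers already posted (the "run queries" and "Done building" lines for master and for this PR); the helper names are made up for the example.

```javascript
// Figures copied from the master run and the run on this PR above.
const master = { queriesSec: 201.542, totalSec: 440.766 };
const fix = { queriesSec: 87.162, totalSec: 319.151 };

// How many times faster `after` is than `before`.
function speedup(before, after) {
  return before / after;
}

// Percentage of wall-clock time saved going from `before` to `after`.
function percentSaved(before, after) {
  return ((before - after) / before) * 100;
}

console.log(`run queries speedup: ${speedup(master.queriesSec, fix.queriesSec).toFixed(2)}x`);
// → run queries speedup: 2.31x
console.log(`total build time saved: ${percentSaved(master.totalSec, fix.totalSec).toFixed(1)}%`);
// → total build time saved: 27.6%
```

So the query step is roughly 2.3x faster, and the whole build is a bit over a quarter shorter, since bootstrap and HTML generation are unchanged.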


wardpeet

LGTM!

pvdz · 2020-06-02T12:20:25Z

While this is already a good fix, I think I need to dig a little deeper. According to old benchmark dumps this is what master on February 12th did for this benchmark:

info bootstrap finished - 115.315 s
success Building production JavaScript and CSS bundles - 5.162s
success run queries - 55.484s - 145178/145178 2616.58/s
success Building static HTML for pages - 113.598s - 145178/145178 1278.00/s
info Done building in 292.68 sec
Done in 298.79s.

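So it seems queries could still run about 50% faster (2616.58/s back then versus 1720.95/s with this PR, though that older run had 145178 pages rather than 150002). Hopefully that regression is also fairly easy to find.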

pvdz · 2020-06-02T13:09:00Z

Went to repo tag gatsby@2.19.17:

info bootstrap finished - 86.401 s
success Building production JavaScript and CSS bundles - 8.012s
success run queries - 78.374s - 150002/150002 1913.92/s
success Building static HTML for pages - 140.455s - 150002/150002 1067.97/s
info Done building in 312.881 sec

And repo at tag gatsby@2.19.16:

info bootstrap finished - 88.241 s
success Building production JavaScript and CSS bundles - 6.144s
success run queries - 75.819s - 150002/150002 1978.43/s
success Building static HTML for pages - 140.238s - 150002/150002 1069.63/s
info Done building in 312.954 sec

Hard to tell what the regression might have been, or how I was able to get the perf mentioned above. Could be a plugin that wasn't bumped or ... I'm not sure. But I'm pretty sure it's not worth chasing this kind of delta over such a long timespan (over 3 months), so let's look forward.


pvdz requested a review from a team as a code owner June 2, 2020 11:36

gatsbot bot added the "status: triage needed" label Jun 2, 2020

pvdz force-pushed the drop-uniqueOperations branch from eeb1826 to 7c8ce64 June 2, 2020 11:43

wardpeet previously approved these changes Jun 2, 2020


And drop the import

9eaa7a8

pvdz dismissed wardpeet’s stale review via 9eaa7a8 June 2, 2020 12:00

pvdz added the "topic: GraphQL", "topic: performance", and "topic: scaling builds" labels and removed the "status: triage needed" label Jun 2, 2020

wardpeet approved these changes Jun 2, 2020


pvdz merged commit 2528a85 into master Jun 2, 2020

delete-merged-branch bot deleted the drop-uniqueOperations branch June 2, 2020 13:09
