perf(gatsby): drop severe scaling regression caused by analytics #24709

Merged (2 commits) on Jun 2, 2020

@pvdz (Contributor) commented Jun 2, 2020

Regression introduced in #22851

The problem seems to be that these calls to v8.serialize cause the GC to start a full stop-the-world mark-and-sweep step sooner. In a benchmark of 150k pages, that step almost always triggered after between 100k and 110k queries had run, and it paused the process for 60+ seconds.
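
For readers who want to observe this kind of effect themselves, here is a minimal sketch. It is not Gatsby's code: `serializedSize` and `runFakeQueries` are hypothetical stand-ins for a per-query size measurement, and the object shapes are made up. It uses only Node's built-in `v8.serialize` and the `perf_hooks` GC observer, and it assumes a small loop like this is only illustrative; it will not necessarily reproduce the long pause seen at 150k pages.

```javascript
const v8 = require("v8");
const { PerformanceObserver } = require("perf_hooks");

// Hypothetical helper: byte length of a value's V8 serialization.
function serializedSize(value) {
  return v8.serialize(value).length;
}

// Record the duration (in ms) of every GC run that Node reports.
const gcPauses = [];
const observer = new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    gcPauses.push(entry.duration);
  }
});
observer.observe({ entryTypes: ["gc"] });

// Stand-in for "run queries": build a fake result and measure it each time.
function runFakeQueries(count) {
  let totalBytes = 0;
  for (let i = 0; i < count; i++) {
    const result = { data: { page: { id: i, title: `Page ${i}` } } };
    totalBytes += serializedSize(result);
  }
  return totalBytes;
}

const totalBytes = runFakeQueries(10000);

// GC entries are delivered asynchronously, so report on a later tick.
setTimeout(() => {
  observer.disconnect();
  const longest = gcPauses.length > 0 ? Math.max(...gcPauses) : 0;
  console.log(`serialized ${totalBytes} bytes in total`);
  console.log(`${gcPauses.length} GC runs observed, longest ${longest.toFixed(2)} ms`);
}, 0);
```

Comparing the longest observed GC run with and without the `serializedSize` call gives a rough idea of whether the measurement changes GC behavior for a given workload.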

Example benchmark results from before and after that PR:

info bootstrap finished - 86.758 s
success Building production JavaScript and CSS bundles - 9.404s
success run queries - 205.676s - 150002/150002 729.31/s
success Building static HTML for pages - 142.800s - 150002/150002 1050.44/s
info Done building in 451.33 sec

info bootstrap finished - 85.933 s
success Building production JavaScript and CSS bundles - 8.335s
success run queries - 84.795s - 150002/150002 1769.00/s
success Building static HTML for pages - 141.000s - 150002/150002 1063.84/s
info Done building in 320.158 sec

This behavior is very consistent. We looked at the change and agreed that the best option was to just drop this measurement, since it existed only for the sake of analytics and is a non-vital metric to record. We'd rather have the perf than the metric.

Numbers for the fix, same benchmark, first on current master and then on this PR:

info bootstrap finished - 79.788s
success Building production JavaScript and CSS bundles - 9.635s
success run queries - 201.542s - 150002/150002 744.27/s
success Building static HTML for pages - 141.535s - 150002/150002 1059.82/s
info Done building in 440.766 sec

info bootstrap finished - 80.751s
success Building production JavaScript and CSS bundles - 9.570s
success run queries - 87.162s - 150002/150002 1720.95/s
success Building static HTML for pages - 142.609s - 150002/150002 1051.84/s
info Done building in 319.151 sec

nice!
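
To put a number on the difference, the two "run queries" lines above can be compared directly. This is a small sketch only: `parseRunQueries` is a hypothetical helper written for this log format, not part of Gatsby.

```javascript
// Hypothetical parser for a "run queries" line from the build output above.
function parseRunQueries(line) {
  const match = /run queries - ([\d.]+)s - (\d+)\/(\d+) ([\d.]+)\/s/.exec(line);
  if (!match) return null;
  return {
    seconds: Number(match[1]),
    done: Number(match[2]),
    total: Number(match[3]),
    perSecond: Number(match[4]),
  };
}

// The two lines are copied from the benchmark output in this PR description.
const master = parseRunQueries("success run queries - 201.542s - 150002/150002 744.27/s");
const withFix = parseRunQueries("success run queries - 87.162s - 150002/150002 1720.95/s");

// How many times faster the query step is, and how many seconds are saved.
const speedup = master.seconds / withFix.seconds;
const savedSeconds = master.seconds - withFix.seconds;

console.log(`query step speedup: ${speedup.toFixed(2)}x`);
console.log(`seconds saved in query step: ${savedSeconds.toFixed(3)}`);
```

On these numbers the query step is a bit more than twice as fast with the fix, which accounts for nearly all of the difference in total build time.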

@pvdz pvdz requested a review from a team as a code owner June 2, 2020 11:36
@gatsbot gatsbot bot added the status: triage needed Issue or pull request that need to be triaged and assigned to a reviewer label Jun 2, 2020
@pvdz pvdz force-pushed the drop-uniqueOperations branch from eeb1826 to 7c8ce64 Compare June 2, 2020 11:43
wardpeet previously approved these changes Jun 2, 2020

@wardpeet (Contributor) left a comment

LGTM!

@pvdz pvdz added topic: GraphQL Related to Gatsby's GraphQL layer topic: performance Related to runtime & build performance topic: scaling builds and removed status: triage needed Issue or pull request that need to be triaged and assigned to a reviewer labels Jun 2, 2020
@pvdz (Contributor, Author) commented Jun 2, 2020

While this is already a good fix, I think I need to dig a little deeper. According to old benchmark dumps this is what master on February 12th did for this benchmark:

info bootstrap finished - 115.315 s
success Building production JavaScript and CSS bundles - 5.162s
success run queries - 55.484s - 145178/145178 2616.58/s
success Building static HTML for pages - 113.598s - 145178/145178 1278.00/s
info Done building in 292.68 sec
Done in 298.79s.

So it seems we could still run queries about 50% faster (2616.58/s back then versus 1720.95/s with this PR). Hopefully that regression is also fairly easy to find.

@pvdz (Contributor, Author) commented Jun 2, 2020

Went to repo tag gatsby@2.19.17:

info bootstrap finished - 86.401 s
success Building production JavaScript and CSS bundles - 8.012s
success run queries - 78.374s - 150002/150002 1913.92/s
success Building static HTML for pages - 140.455s - 150002/150002 1067.97/s
info Done building in 312.881 sec

And repo at tag gatsby@2.19.16:

info bootstrap finished - 88.241 s
success Building production JavaScript and CSS bundles - 6.144s
success run queries - 75.819s - 150002/150002 1978.43/s
success Building static HTML for pages - 140.238s - 150002/150002 1069.63/s
info Done building in 312.954 sec

Hard to tell what the regression might have been, or how I was able to get the perf mentioned above. Could be a plugin that wasn't bumped or ... I'm not sure. But I'm pretty sure it's not worth chasing this kind of delta over such a long timespan (over 3 months), so let's look forward.

@pvdz pvdz merged commit 2528a85 into master Jun 2, 2020
@delete-merged-branch delete-merged-branch bot deleted the drop-uniqueOperations branch June 2, 2020 13:09
axe312ger pushed a commit that referenced this pull request Jun 23, 2020

* perf(gatsby): drop severe scaling regression caused by analytics
* And drop the import