perf(gatsby-plugin-mdx): prevent babel parse step at sourcing time #25437

pvdz · 2020-07-01T11:13:22Z

The mdx plugin was doing its default parsing step for every time it got called. At sourcing time it's only called to retrieve the import bindings and for this we can be much faster by manually processing the import statements. So that's what this does.

This change reduces baseline mdx processing by 30% by skipping a Babel parse step in one of the bootstrap steps

A very simple flat no-image mdx benchmark cuts down sourcing time in half for this change.

Benchmark numbers

This are the numbers for a simple, flat, local sourced, zero-image, mdx benchmark where each mdx file is automatically generated and look like this:

---
articleNumber: 1
title: "Maiores iusto autem dolorum."
path: 'maiores-iusto-autem-dolorum.'
date: 2018-04-14
---

import { Link } from "gatsby"

<Link to="/">Go Home</Link>

Reiciendis assumenda molestiae a. Ea aperiam aut numquam id. Ratione sint saepe illum est porro asperiores quas ut tempora. Minima sit enim deserunt maxime.
 
Earum iure et explicabo quis qui inventore modi commodi qui. Dolorem ab quo minima neque suscipit voluptas odio deleniti occaecati. Dolore temporibus ut fugiat facilis maiores dignissimos ut eius. Eius similique eos aut magnam quaerat modi. Autem dolor saepe quibusdam reprehenderit error et aut illo.

So there's a header, an import, a link, and two paragraphs of random garble (pre-generated, not part of the timings). The benchmark can run with an arbitrary number of pages (N=1000 yarn bench) so that's what we do for 1k, 2k, 4k, 8k, and 16k. By 2x-ing every step we can detect logaritmic perf problems easier. While the baseline Gatsby pipeline holds for a million pages easily, for mdx that ceiling is a lot lower right now (that's what I'm trying to improve :p )

The benchmark runs on a dedicated Intel NUC, which is a headless linux setup, specifically to benchmark, so numbers ought to be fairly consistent relative to other runs.

Data

Master is currently roughly on gatsby-plugin-mdx@1.2.19 and gatsby@2.23.12 ( + whatever is unpublished).

The raw data for both runs can be found in two gists:

1000 pages:
- master:
  - sourcing: 30s
  - bootstrap: 33s
  - finished: 59s
- PR:
  - sourcing: 14s
  - bootstrap: 17s
  - finished: 45s
- delta: -14s
- e2e page speed 16p/s -> 22p/s
2000 pages:
- master
  - sourcing: 55s
  - bootstrap: 59s
  - finished: 104s
- PR:
  - sourcing: 25s
  - bootstrap: 29s
  - finished: 76s
- delta: -28s
- e2e page speed 19p/s -> 26p/s
4000 pages:
- master
  - sourcing: 107s
  - bootstrap: 112s
  - finished: 193s
- PR:
  - sourcing: 48s
  - bootstrap: 52s
  - finished: 137s
- delta: -56s
- e2e page speed 20p/s -> 29p/s
8000 pages:
- master
  - sourcing: 221s
  - bootstrap: 227s
  - finished: 388s
- PR:
  - sourcing: 104s
  - bootstrap: 110s
  - finished: 281s
- delta: -107s
- e2e page speed 20p/s -> 28p/s
16000 pages:
- master
  - sourcing: 464s
  - bootstrap: 474s
  - finished: 819s
- PR:
  - sourcing: 218s
  - bootstrap: 226s
  - finished: 572s
- delta: -247s
- e2e page speed 19p/s -> 27p/s (I mean, 27.97 ...)

gatsby-cloud-staging · 2020-07-06T09:48:51Z

Your pull request can be previewed in Gatsby Cloud: https://build-fcca516d-a972-47c2-87c6-e2a80c0f9051.staging-previews.gtsb.io

The mdx plugin was doing its default parsing step for every time it got called. At sourcing time it's only called to retrieve the import bindings and for this we can be much faster by manually processing the import statements. So that's what this does. A very simple flat no-image mdx benchmark cuts down sourcing time in half for this change.

packages/gatsby-plugin-mdx/utils/import-parser.js

pvdz · 2020-07-07T14:50:31Z

Confirmed a canary to build (https://github.com/gatsbyjs/gatsby/pull/25569/checks?check_run_id=845921135)

johno

😻

We recently began using a new API in MDX with #25437 which doesn't exist for all versions below 1.5.9. This updates the peer dep for MDX to ensure that the new minimum version is specified. Fixes #25691

…25798) We recently began using a new API in MDX with #25437 which doesn't exist for all versions below 1.5.9. This updates the peer dep for MDX to ensure that the new minimum version is specified. Fixes #25691

pvdz added topic: MDX topic: performance Related to runtime & build performance labels Jul 1, 2020

pvdz requested a review from a team as a code owner July 1, 2020 11:13

gatsbot bot added the status: triage needed Issue or pull request that need to be triaged and assigned to a reviewer label Jul 1, 2020

pvdz removed the status: triage needed Issue or pull request that need to be triaged and assigned to a reviewer label Jul 1, 2020

pvdz force-pushed the skip-mdx-babel-once branch 3 times, most recently from 10daace to c424102 Compare July 6, 2020 09:45

pvdz force-pushed the skip-mdx-babel-once branch from c424102 to 9edcaf3 Compare July 7, 2020 11:46

pvdz mentioned this pull request Jul 7, 2020

test: do not commit #25565

Closed

Support multiple import statements together

6969461

pvdz mentioned this pull request Jul 7, 2020

test; do not merge #25569

Closed

pieh reviewed Jul 7, 2020

View reviewed changes

packages/gatsby-plugin-mdx/utils/import-parser.js Outdated Show resolved Hide resolved

Drop comments

9e4e124

johno approved these changes Jul 7, 2020

View reviewed changes

pvdz merged commit 760845a into master Jul 8, 2020

delete-merged-branch bot deleted the skip-mdx-babel-once branch July 8, 2020 07:24

stefcameron mentioned this pull request Jul 10, 2020

gatsby-plugin-mdx peer dep on mdx-js/mdx must be increased to ^1.5.9 #25691

Closed

johno mentioned this pull request Jul 16, 2020

fix(gatsby-plugin-mdx): Specify new minimum peerDep version for MDX #25798

Merged

johno mentioned this pull request Jul 16, 2020

MDX performance mdx-js/mdx#1152

Closed

blainekasten mentioned this pull request Jul 20, 2020

error "gatsby-plugin-mdx" threw an error while running the onCreateNode lifecycle: #25734

Closed

lucasterra mentioned this pull request Aug 2, 2020

Fix gatsby-plugin-mdx not running plugins on findImports(...) #26191

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf(gatsby-plugin-mdx): prevent babel parse step at sourcing time #25437

perf(gatsby-plugin-mdx): prevent babel parse step at sourcing time #25437

pvdz commented Jul 1, 2020 •

edited

Loading

gatsby-cloud-staging bot commented Jul 6, 2020 •

edited

Loading

pvdz commented Jul 7, 2020

johno left a comment

perf(gatsby-plugin-mdx): prevent babel parse step at sourcing time #25437

perf(gatsby-plugin-mdx): prevent babel parse step at sourcing time #25437

Conversation

pvdz commented Jul 1, 2020 • edited Loading

Benchmark numbers

Data

gatsby-cloud-staging bot commented Jul 6, 2020 • edited Loading

pvdz commented Jul 7, 2020

johno left a comment

Choose a reason for hiding this comment

pvdz commented Jul 1, 2020 •

edited

Loading

gatsby-cloud-staging bot commented Jul 6, 2020 •

edited

Loading