
Performance - Reduce build time and memory usage - use alternative webpack JS loaders #4765

Open
slorber opened this issue May 11, 2021 · 124 comments
Labels
- apprentice: Issues that are good candidates to be handled by a Docusaurus apprentice / trainee
- domain: performance: Related to bundle size or perf optimization
- proposal: This issue is a proposal, usually non-trivial change
Comments

@slorber
Collaborator

slorber commented May 11, 2021

💥 Proposal

With Webpack 5 support, re-build times are now faster.

But we still need to improve the time of the first build, which is currently not great.

It will be hard to decouple Docusaurus totally from Webpack at this point.

But we should at least provide a way for users to use an alternative (non-Babel) JS loader that could be faster and good enough. Docusaurus core should provide a few alternate loaders that work out of the box with the classic theme, enabled by just switching a config flag.

If successful and faster, we could make one of those alternate loaders the default for new sites (when no custom Babel config is found in the project).

Existing PR by @SamChou19815 for esbuild: #4532

@slorber slorber added the proposal label May 11, 2021
@slorber slorber changed the title Decrease build time Decrease build time - use alternative webpack JS loaders May 11, 2021
@slorber slorber changed the title Decrease build time - use alternative webpack JS loaders Reduce build time - use alternative webpack JS loaders May 11, 2021
@slorber
Collaborator Author

slorber commented May 14, 2021

For anyone interested, we added the ability to customize the jsLoader here #4766

This gives you the opportunity to replace Babel with esbuild; you can add this to your config:

```js
  webpack: {
    jsLoader: (isServer) => ({
      loader: require.resolve('esbuild-loader'),
      options: {
        loader: 'tsx',
        format: isServer ? 'cjs' : undefined,
        target: isServer ? 'node12' : 'es2017',
      },
    }),
  },
```

We don't document it yet (apart from here).
We may recommend it later for larger sites if early-adopter feedback shows it works well, so please let us know if it works for your use case.

Important notes:

  • Docusaurus.io builds with esbuild and the above config
  • browser support, syntax, and polyfills might be a little different; this is not a 1-to-1 replacement (https://github.com/privatenumber/esbuild-loader/discussions/170)
  • esbuild does not use the browserslist config; you are responsible for providing the right target value (Support browserslist evanw/esbuild#121)
  • browser support seems good enough with es2017, and the Docusaurus theme works in most recent browsers
  • use a tool like BrowserStack to test browser support (it's easy to get a free account for open-source projects)
  • eventually, use polyfill.io to support even older browsers when some DOM APIs are missing?

@adventure-yunfei
Contributor

Coming from #4785 (comment).

Just wondering: is this issue aiming to reduce build time for the entire site generation (including the md/mdx parsing of docs), or just JSX React pages?

@slorber
Collaborator Author

slorber commented Jun 15, 2021

@adventure-yunfei md docs are compiled to React components by MDX, and an alternative JS loader like esbuild also processes the output of the MDX compiler, so this applies to documentation as well. Check the MDX playground to test the MDX compiler: https://mdxjs.com/playground/

If you have 10k docs, you basically need to transpile 10k React components.
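As a toy illustration of that pipeline (this is NOT the real @mdx-js/mdx output, just a sketch of the shape of the work): every Markdown doc becomes a module exporting a component, and those generated modules are what the JS loader (Babel or esbuild) then has to transpile.

```javascript
// Toy sketch of what the MDX pipeline does conceptually. The real MDX
// compiler emits JSX; here we fake it by turning a heading into HTML,
// just to show that each doc becomes a component module to transpile.
function compileDocToComponent(markdown) {
  const html = markdown.replace(/^# (.*)$/m, '<h1>$1</h1>');
  // Real output is a module with a default-exported React component.
  return { default: () => html };
}

const DocModule = compileDocToComponent('# Hello');
console.log(DocModule.default()); // -> <h1>Hello</h1>
```

So a site with 10k docs produces 10k such modules, and the transpile cost scales with the number (and size) of docs.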

@adventure-yunfei
Contributor

adventure-yunfei commented Jun 15, 2021

@slorber perfect! We're also trying to use esbuild to speed up build times (for an application project). I'll give this a try.

BTW I've created a similar large doc site here from our internal project.

Update:
Tested with a higher perf PC:

  • with Docusaurus 2.0.0-beta.0, doc site generation finished in 63 min
  • with the latest in-dev version, doc site generation finished in 30 min.
    That's a ~50% reduction in build time. 👍

@alphaleonis

alphaleonis commented Jun 23, 2021

This gave a nice performance boost, although I think there is still more to be desired. Out of curiosity, what is actually going on behind the scenes that takes so much time? In our case (a site with around 2,000 .md(x) files), most of the time seems to be spent before and after the "Compiling Client/Compiling Server" progress bars appear and complete.

As it stands, building the site takes around 20 minutes with esbuild, and it was closer to 40 minutes before. Out of curiosity, I then tested adding four versions to our site and building it. Before using esbuild, the process took just shy of 13 hours(!). Using esbuild, it was down to just shy of 8 hours (still way too long to be acceptable). So while it was a big improvement, it still seems very slow.

In the second case, it reported:

[success] [webpackbar] Client: Compiled successfully in 1.33h
[success] [webpackbar] Server: Compiled successfully in 1.36h

What was going on for the remaining 5 hours? Is this normal behavior, or did we configure something incredibly wrong? And why does it take much longer than four times the amount of time with four versions added?

@slorber
Collaborator Author

slorber commented Jun 23, 2021

@alphaleonis it's hard to say without further analysis, but the MDX compiler transforms each md doc into a React component, which is later processed by Babel (or esbuild).

The MDX compiler might be a bottleneck; this is why I'd like to provide an alternate MD parser for cases where MDX is not really needed.

Webpack might also be a bottleneck.

Using esbuild is not enough. Also, when using esbuild as a webpack loader, we are not really leveraging the full speed benefits of esbuild. Unfortunately, we can't easily replace Webpack with esbuild: Webpack is part of our plugin lifecycle API, and it is more full-featured than esbuild (we use various things like file-loader, the SVGR loader...).

What was going on for the remaining 5 hours? Is this normal behavior, or did we configure something incredibly wrong?

Webpack 5 persistent caching is now enabled, and rebuild times are much faster. You need to persist node_modules/.cache across builds to leverage it.
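As a rough sketch of persisting that folder in a generic CI pipeline (the /ci-cache path and commands are illustrative, not an official recipe; most CI systems have a dedicated cache mechanism you should prefer):

```shell
# Restore the Webpack persistent cache saved by a previous run (if any).
if [ -d /ci-cache/webpack ]; then
  mkdir -p node_modules
  cp -r /ci-cache/webpack node_modules/.cache
fi

# Build the site; Webpack reuses the restored cache.
yarn docusaurus build

# Save the cache for the next run.
mkdir -p /ci-cache
rm -rf /ci-cache/webpack
cp -r node_modules/.cache /ci-cache/webpack
```

On GitHub Actions, the same idea is usually expressed with the actions/cache action pointed at node_modules/.cache.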

And why does it take much longer than four times the amount of time with four versions added?

It's hard to tell without measuring on your system. Your system may not have enough memory for Webpack to do its job efficiently, leading to more garbage collection, for example.

@alphaleonis

@slorber Thanks for the explanation. We did try the persistent caching, and it seems to help a lot with the time spent during the "Build server/client" phase (which I assume is Webpack). The machine in question had 16 GB of memory, and the same was specified as max_old_space_size.

Is there any way we can do some further analysis, such as enabling some verbose logging to get more details? Or is this kind of build time expected for sites of that size? (If so, I guess we will have to find another solution for versioning, such as building/deploying each version separately.)

@johnnyreilly
Contributor

Also when using esbuild as a webpack loader, we are not really leveraging the full speed benefits of esbuild

This is true, but there's still a speed benefit to take advantage of, and it's pretty plug-and-play to use. See my post here:

https://blog.logrocket.com/webpack-or-esbuild-why-not-both/

@slorber
Collaborator Author

slorber commented Jun 24, 2021

Is there any way we can do some further analysis, such as enabling some verbose logging to get some more details perhaps?

This is a Webpack-based app, and the plugin system lets you tweak the Webpack config to your needs (the configureWebpack lifecycle) and add logs or anything else that can help troubleshoot the system. You can also modify your local Docusaurus and add tracing code if you need.

I'm not an expert in Webpack performance debugging, so I can't help much on how to configure webpack and what exactly to measure; you'll have to figure it out yourself for now.
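As one concrete sketch of that approach (the plugin name and output path are made up; it assumes webpack is resolvable from your site, which it is for Docusaurus installs), a local plugin can enable Webpack's built-in ProfilingPlugin to write a Chrome trace:

```js
// docusaurus.config.js — a sketch, not an official recipe.
const webpack = require('webpack');

module.exports = {
  // ...the rest of your config
  plugins: [
    function profilingPlugin() {
      return {
        name: 'profiling-plugin', // hypothetical name
        configureWebpack() {
          return {
            plugins: [
              // Writes a trace file you can open in chrome://tracing
              new webpack.debug.ProfilingPlugin({
                outputPath: 'webpack-trace.json',
              }),
            ],
          };
        },
      };
    },
  ],
};
```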

Or is this kind of the expected build time for sites of that size?

It's hard to have meaningful benchmarks. The number of docs is one factor, but the size of the docs obviously matters too, so one site is not strictly comparable to another. A 40 min build time for 2,000 MDX docs with Babel seems expected when comparing with other sites. Obviously it's too much, and we should aim to reduce that build time, but it's probably not an easy thing to do.

(If so, I guess we will have to find another solution for versioning, such as building/deploying each version separately.)

For large sites, it's definitely the way to go, and it is something I'd like to document/encourage more in the future.
It's only useful to keep multiple versions in master when you actively update them.
Once a version becomes unmaintained, you should rather move it to a branch and create a standalone, immutable deployment for it, so that your build time does not keep increasing as time passes and the number of versions grows.

We have made it possible to include "after items" in the version dropdown, so that you can include external links to older versions, and we use it on the Docusaurus site itself:

(screenshot: the Docusaurus version dropdown with external links to older versions)

I also want to have a "docusaurus archive" command to support this workflow better, giving the ability to publish a standalone version of an existing site and then remove that version.

@adventure-yunfei
Contributor

Tested with a higher perf PC:

  • with Docusaurus 2.0.0-beta.0, doc site generation finished in 63 min
  • with the latest in-dev version, doc site generation finished in 30 min.
    That's a ~50% reduction in build time. 👍

Sadly, the process uses a very large amount of memory.
My local testing environment has 32 GB of memory, but the CI/CD environment has a 20 GB limit. The process is killed because of OOM during the emitting phase. From the monitor, memory suddenly increased from 8 GB to 20+ GB.

@slorber
Collaborator Author

slorber commented Jun 26, 2021

It is unexpected that beta.2 is faster than beta.0, maybe you didn't clear your cache?

The process is killed because of OOM during the emitting phase. From the monitor, memory suddenly increased from 8 GB to 20+ GB.

What do you mean by the "emitting phase"? I didn't take much time to investigate all this so any info can be useful.

@adventure-yunfei
Contributor

It is unexpected that beta.2 is faster than beta.0, maybe you didn't clear your cache?

I'm using the esbuild-loader config from the Docusaurus website example, so it should be esbuild that makes the build faster.

What do you mean by the "emitting phase"? I didn't take much time to investigate all this so any info can be useful.

This may not be accurate. The process memory was around 7 GB most of the time. About 20 minutes later, memory jumped to 20.2 GB while the console was showing the client "emitting". After the client build finished, memory dropped back down to 7 GB (the server was still building).

@krillboi

Trying to test esbuild-loader but running into some trouble.

I have added the following to the top level of my docusaurus.config.js file:

```js
  webpack: {
    jsLoader: (isServer) => ({
      loader: require.resolve('esbuild-loader'),
      options: {
        loader: 'tsx',
        format: isServer ? 'cjs' : undefined,
        target: isServer ? 'node12' : 'es2017',
      },
    }),
  },
```

I have added the following to my dependencies in package.json:

    "esbuild-loader": "2.13.1",

The install of esbuild-loader fails. Am I missing more dependencies for this to work? It might also be a Windows problem; I'm unsure right now.

@krillboi

Seems like it was one of the good ol' corporate proxy issues giving me the install troubles.

I'll try to test esbuild-loader to see how much faster it is for me.

@krillboi

Tested yesterday with a production build: it took about 3 hours compared to 6 hours before (~400 docs x 5 versions x 4 languages).

So about half the time with esbuild-loader, which is nice. But we are reaching a number of docs where I am now looking into archiving older versions, as done on the Docusaurus site.

This may not be accurate. The process memory was around 7 GB most of the time. About 20 minutes later, memory jumped to 20.2 GB while the console was showing the client "emitting". After the client build finished, memory dropped back down to 7 GB (the server was still building).

I witnessed the same thing: memory usage would suddenly spike to 25+ GB.

@slorber
Collaborator Author

slorber commented Jun 30, 2021

Thanks for highlighting that; we'll try to figure out why it suddenly takes so much memory.

@slorber
Collaborator Author

slorber commented Jul 15, 2021

Not 100% related, but I expect this PR to improve perf (smaller output) and reduce build time for sites with very large sidebars: #5136 (I can't really tell by how much though; it's site-specific, so please let me know if you see a significant improvement)

@adventure-yunfei
Contributor

adventure-yunfei commented Jul 20, 2021

Not 100% related, but I expect this PR to improve perf (smaller output) and reduce build time for sites with very large sidebars: #5136 (I can't really tell by how much though; it's site-specific, so please let me know if you see a significant improvement)

Tested my application with the latest dev version.

  • Max memory usage: 21 GB
  • Build time: 34 min

It doesn't seem to help in my case.

Update:

  • the site size decreased a little: 115 MB -> 104 MB.

@adventure-yunfei
Contributor

This may not be accurate. The process memory was around 7 GB most of the time. About 20 minutes later, memory jumped to 20.2 GB while the console was showing the client "emitting". After the client build finished, memory dropped back down to 7 GB (the server was still building).

I've made another test, using a plugin to override the .md loader with a no-op:

```js
// inside docusaurus.config.js
{
  // ...
  plugins: [
    function myPlugin() {
      return {
        configureWebpack() {
          return {
            module: {
              rules: [
                {
                  test: /\.mdx?$/,
                  include: /.*/,
                  use: {
                    loader: require('path').resolve(__dirname, './scripts/my-md-loader.js'),
                  },
                },
              ],
            },
          };
        },
      };
    },
  ],
}
```

```js
// scripts/my-md-loader.js
// No-op loader: replaces the content of every md/mdx file with a stub.
module.exports = function emptyMdLoader() {
  const callback = this.async();
  return callback && callback(null, 'empty...');
};
```

And then run doc builder again.

  • build time: 17 min
  • max memory: 20+ GB

So I'm afraid it's the code of the page wrapper (e.g. top bar, side navigation, ...) that causes the max memory usage. Switching mdx-loader to another one may not help.

@slorber
Collaborator Author

slorber commented Aug 31, 2021

@adventure-yunfei it's not clear to me how you took those measurements; can you explain?

If you allow Docusaurus to take up to 20 GB, it may end up taking 20 GB. And it may take more if you give it more. The real question is: how much can you reduce the max_old_space_size Node.js setting before the build starts crashing due to OOM?
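In practice, that means bisecting the heap cap downwards until the build OOMs (a sketch; the values are just example steps, not recommendations):

```shell
# Lower Node's old-space heap cap step by step and rerun the build;
# the smallest value that still succeeds is the real memory floor.
NODE_OPTIONS="--max-old-space-size=16384" yarn docusaurus build
NODE_OPTIONS="--max-old-space-size=8192"  yarn docusaurus build
NODE_OPTIONS="--max-old-space-size=4096"  yarn docusaurus build
```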

So I'm afraid it's the code of page wrapper (e.g. top bar, side navigation, ...) that causes the max memory usage. Switching mdx-loader to another one may won't help.

Proving that a memory issue is not caused by the mdx-loader does not mean it's caused by the "page wrapper". There is much more involved here than the React server-side rendering.

I suspect there are optimizations that can be done in this webpack plugin's fork we use: https://github.com/slorber/static-site-generator-webpack-plugin/blob/master/index.js

Gatsby used it initially and later replaced it with a task-queueing system.

@adventure-yunfei
Contributor

Proving that a memory issue is not caused by the mdx-loader does not mean it's caused by the "page wrapper". There is much more involved here than the React server-side rendering.

That's true. By "page wrapper" I mean any code other than the md page content itself. I'm just trying to provide more perf information to help identify the problem.

More info:

  • max_old_space_size was set to 4096.
  • memory was monitored with the Windows perf monitor: (screenshot)
  • the console was stuck here for a long time: (screenshot)

molant added a commit to electron/website that referenced this issue Sep 30, 2021
Use `esbuild-loader` to reduce build times. This is currently not
documented in Docusaurus' official docs.

With this change my build times went from 117s to 82s (30% faster).

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Ref: facebook/docusaurus#4765
@Josh-Cena Josh-Cena added this to the 2.0.0 GA milestone Oct 29, 2021
thejcannon added a commit to pantsbuild/pantsbuild.org that referenced this issue Jan 19, 2024
Using suggestions in facebook/docusaurus#4765
which are:

- Using `swc-loader` as a webpack JS loader
- Full disclosure, this is greek to me. I'm just being a lemming and
copying Docusaurus' own configuration
- Caching in CI/deployment
@mapachurro

Hello all!

This is the Thread I've Been Looking For.

I'm in the process of trying to get our site live in production; unfortunately the repo is still behind the company GitHub org, but: we've got seventeen locales, a total of ~1,400 articles, on Docusaurus v3, and we've been hitting the GH Actions / Pages RAM issue hard.

I've tried a few of the fixes mentioned here and elsewhere, and it looks like the next step is webpack.

Is there, at this point, an off-the-shelf process for switching to webpack, or do I need to engineer a custom solution, as above?

Apologies if this is documented somewhere and I missed it.

@PrivatePuffin

We over at TrueCharts have it set up for a few thousand pages, using GitHub Actions.
(github.com/truecharts)

@societymartingale

Switching to swc-loader reduced my Docusaurus build time by about 20%. This is for a site with 600 MDX files. The build takes 2.5 minutes on an Apple M3 Pro and 4.5 minutes in a GitLab CI Docker-based build.

@damageboy

@societymartingale do you have an example of how this was done with Docusaurus?

@johnnyreilly
Contributor

@damageboy

damageboy commented Mar 23, 2024

@johnnyreilly I literally found it minutes before you posted; I tried it, and unfortunately for me it doesn't seem to end up with faster builds...

I'm building 3 different docs sites in one build, on a Mac with an M3 Pro.
For each of them I've disabled:

```js
        showLastUpdateAuthor: false,
        showLastUpdateTime: false,
```

To skip anything that isn't pure building.

On top of that, I've also discovered that on macOS it's pretty important to exclude the build/project folders from Spotlight, if you don't want to spend most of the CPU time indexing them during the build...

Summary:

| Times  | Without SWC | With SWC |
| ------ | ----------- | -------- |
| Wall   | 231.47s     | 243.10s  |
| User   | 12.44s      | 12.12s   |
| System | 223%        | 229%     |
| Total  | 1:49.22     | 1:51.32  |

Paper trail

Without swc-loader:

```shell
❯ yarn clear; time yarn build
yarn run v1.22.22
$ docusaurus clear
[SUCCESS] Removed the Webpack persistent cache folder at "/Users/dmg/projects/docu3/node_modules/.cache".
[SUCCESS] Removed the generated folder at "/Users/dmg/projects/docu3/.docusaurus".
[SUCCESS] Removed the build output folder at "/Users/dmg/projects/docu3/build".
✨  Done in 1.13s.
yarn run v1.22.22
$ docusaurus build
[INFO] [en] Creating an optimized production build...

✔ Client

✔ Server
  Compiled successfully in 1.68m
✨  Done in 109.14s.
yarn build  231.47s user 12.44s system 223% cpu 1:49.22 total
```

With swc-loader:

```shell
❯ yarn clear; time yarn build
yarn run v1.22.22
$ docusaurus clear
[SUCCESS] Removed the Webpack persistent cache folder at "/Users/dmg/projects/docu3/node_modules/.cache".
[SUCCESS] Removed the generated folder at "/Users/dmg/projects/docu3/.docusaurus".
[SUCCESS] Removed the build output folder at "/Users/dmg/projects/docu3/build".
✨  Done in 1.09s.
yarn run v1.22.22
$ docusaurus build
[INFO] [en] Creating an optimized production build...

✔ Client

✔ Server
  Compiled successfully in 1.72m
[SUCCESS] Generated static files in "build".
[INFO] Use `npm run serve` command to test your build locally.
✨  Done in 111.24s.
yarn build  243.10s user 12.12s system 229% cpu 1:51.32 total
```

@PrivatePuffin

You can also get a decent performance improvement by not processing .md files as .mdx.

@damageboy

damageboy commented Mar 23, 2024

You can also get a decent performance improvement by not processing .md files as .mdx

Any tips on how to get this done?

Are you referring to this?

@slorber
Collaborator Author

slorber commented Mar 24, 2024

You can also get a decent performance improvement by not processing .md files as .mdx

I'm also curious to know what you mean. Maybe you are right, but my intuition is that it does not have a significant impact, since in both cases the content is compiled to React components.


Note, I'm actively working on perf optimizations for Docusaurus v3.2.

There are no breaking changes in the 3.x branch yet, so if you can run a canary version of Docusaurus and let me know how it improves things, I'd be curious to know how much faster it is on your site.
https://docusaurus.io/community/canary

The last remaining bottleneck is Webpack compilation time; I'll look into that soon.

@societymartingale

@damageboy, I added the following to package.json:

"@swc/core": "^1.4.6",
"swc-loader": "^0.2.6"

And added the following to docusaurus.config.js:

```js
  webpack: {
    jsLoader: (isServer) => ({
      loader: require.resolve("swc-loader"),
      options: {
        jsc: {
          parser: {
            syntax: "typescript",
            tsx: true,
          },
          target: "es2019",
          transform: {
            react: {
              runtime: "automatic",
            },
          },
        },
        module: {
          type: isServer ? "commonjs" : "es6",
        },
      },
    }),
  },
```

As indicated above, I also disabled the last update author/time feature, as it wasn't scaling well. This saved another minute or two.

```js
showLastUpdateAuthor: false,
showLastUpdateTime: false
```

@slorber
Collaborator Author

slorber commented Mar 25, 2024

I also disabled the last update author/time feature, as it wasn't scaling well.

This major perf issue has been fixed in canary (#9890) and will be released in v3.2.

@andrewgbell

3.2 has given us a major improvement in build times (thanks @slorber!). Running on GitHub Actions, on a 4-core 16 GB Linux runner, we see the following:

3.1 build time - 35 mins

3.2 build time - 23 mins
3.2 build time (esbuild) - 27 mins
3.2 build time (swc) - 21 mins

@Steveantor

Why not ditch webpack for esbuild, like Vite did? It's faster and better in every possible way.

@slorber
Collaborator Author

slorber commented Apr 19, 2024

@Steveantor it's easier said than done.

First of all, our plugin system has a configureWebpack hook, which means migrating away from Webpack would break most of our ecosystem: plugins would need to update and provide a version-compatibility matrix to their users. We must also ensure that an upgrade path is possible for plugin authors under the new bundler, at least for all major plugins.

I'm likely to adopt an incremental migration path thanks to unplugin, a more portable abstraction for bundler plugins that would probably be good enough for most Docusaurus plugins that only need a loader.

Also, Vite only uses esbuild in some parts and relies on Rollup for bundling (Rollup v4 is based on SWC). They are also working on Rolldown, a Rust port of Rollup.

Due to how Docusaurus works and how our plugin system is designed, it does not look like a good idea to use different bundlers in dev and build modes.

Afaik esbuild supports live reloading but has no official support for hot module replacement (HMR). We don't want to refresh the browser when you edit JS or MD; that would make your page lose its state.

There's also Rspack, which aims to be 100% backward-compatible with Webpack, and according to this Bun benchmark it's already quite a bit faster than Webpack.

https://bun.sh/blog/bun-bundler

(benchmark chart from the Bun bundler blog post)

Vercel is also actively working on Turbopack, which also aims to be "mostly" compatible with Webpack, but less so than Rspack.

So Rspack is, for me, the most suitable candidate in the short term, given the constraints we have, until other solutions become more mature. I'm likely to introduce "future flags" in Docusaurus to let you swap Webpack for Rspack. It might not work for all third-party plugins (yet), but that should improve over time as Rspack fills the gaps.


Note that bundling is a major bottleneck (mostly for "cold builds" with an empty Webpack cache, less so for rebuilds), but it is not the only performance issue we have in Docusaurus. I fixed some in v3.2, and I have ideas to improve other parts as well.

Notably, I'm not sure the high memory consumption is related to the bundling phase; it is more likely related to the SSG phase.

@slorber
Collaborator Author

slorber commented Apr 19, 2024

FYI in v3.2 I added some site build perf logging (#9975).

This is considered an internal API for now, but if you are curious to see what takes time on your site, you can run the build with `DOCUSAURUS_PERF_LOGGER=true` and get this kind of output:

(screenshot: DOCUSAURUS_PERF_LOGGER timing output)

@snake-py

snake-py commented May 6, 2024

So I am also running into this issue, even though my site is really small (only about three pages right now). For me, the main bottleneck seems to be CssMinimizerPlugin:
(screenshot: build profile showing time spent in CssMinimizerPlugin)

Is it possible to disable the minimizer?

@slorber
Collaborator Author

slorber commented May 7, 2024

You can try running with `USE_SIMPLE_CSS_MINIFIER=true docusaurus build` and see if it improves things.

@Romej

Romej commented Jul 25, 2024

I would love to use Docusaurus with Rspack; we had many legacy CRA projects and successfully migrated all of them to Rspack without much effort.

With Rspack 1.0 around the corner, I am hoping this will be an option soon.

