After updating to v3.1, large repo build takes 3 hours #9754
Hey. No, we didn't change anything recently that could lead to such a significant difference, but your report is not clear enough:

- What exact version of Docusaurus were you using before?
- How long did the build take previously?
- Can you replicate this only on your computer, or also on CI such as GitHub Actions?
- What was the upgrade PR?
- Are we even sure it's Docusaurus's fault? Your log shows Node.js 16.4: how come you are building with Node 16.4 while Docusaurus v3.0 requires Node 18?
First off, huge fan of Docusaurus; wanted to comment along. This might be tangential, but we also saw a 2x increase in build times upgrading from Docusaurus 3.0.1 to 3.1, and we ended up downgrading back to 3.0.1. We use our own CI solution, Harness CI Enterprise. 3.0.1 builds: 8-9 mins. We want to dig in a little further if anyone on the Docusaurus project side can weigh in. The big increase comes between the server compile and the "done" hook.

Node build version: 18.19.0. Thanks for a great project!
Was using 3.0.1.
It took under 30 minutes.
My Docusaurus site is pretty big and doesn't fit on CI machines. RAM usage used to spike to 14GB during the sealing process, and all CI machines crashed at that point.
I am sure. Every other script finishes in under 1 minute; it's only the Docusaurus build step that hangs.
I am using v18.17.1. Where did you get this information, may I ask?
+1 on the sealing process; that's where resource usage/time spikes for us also. Was anything added to that process from 3.0.1 -> 3.1, e.g. onBrokenAnchors? Thanks!
If it's coming from the broken anchors check, you may try setting onBrokenAnchors in your docusaurus.config file to 'ignore'. Maybe you can disable it in your CI but still have a build process somewhere that you run manually / every few releases to check for broken links / anchors.
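For reference, here is a minimal docusaurus.config.js sketch of that suggestion. The onBrokenLinks / onBrokenAnchors options are real Docusaurus config options (each accepts 'ignore', 'log', 'warn', or 'throw'), but the site fields are placeholders and the CI environment-variable toggle is only an illustrative assumption, not something from this thread:

```js
// docusaurus.config.js — minimal sketch, not anyone's actual config from this thread
// @ts-check

/** @type {import('@docusaurus/types').Config} */
const config = {
  title: 'My Site', // placeholder values
  url: 'https://example.com',
  baseUrl: '/',

  // Skip the expensive link/anchor checks on CI,
  // but keep them for local/manual "audit" builds.
  onBrokenLinks: process.env.CI ? 'ignore' : 'throw',
  onBrokenAnchors: process.env.CI ? 'ignore' : 'warn',
};

module.exports = config;
```

A build run without CI set would then still fail on broken links, which matches the "check every few releases" workflow suggested above.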
Just to add, we've recently rolled back from 3.1 to 3.0.1 for this exact same issue (we also have a large site). Normally it would take approx 45 mins to build, and with 3.1 that moved to just over 2 hours.

However, maybe of interest: when we initially rolled back we regenerated our package-lock.json and noticed the build times stayed the same (close to 2 hours). Reverting to the original package-lock.json from before our 3.1 upgrade, i.e. the one we used when originally on 3.0.1, the build went back to 45 mins. I've just tried it again, and when using 3.0.1 and building without a package-lock.json (so picking up the latest dependencies), the build time more than doubles.

As an aside, onBrokenAnchors: "ignore" made no difference for us (and we also fixed all the broken anchors).
Thanks @OzakIOne, it's a great feature. Curious, as we noticed the same behavior with …
@andrewgbell it looks like the build-time increase is not related to the 3.1 upgrade itself, but rather to the upgrade of a transitive dependency that has a perf regression. It would be super helpful for me to be able to see/run that upgrade myself and study the difference. Can someone share a site / branch that builds faster in 3.0.1, and where I could reproduce the build-time regression by upgrading?
@ravilach I'd recommend trying to turn off both onBrokenLinks and onBrokenAnchors: as the snippet below shows, the whole check is skipped only when both are set to 'ignore'. I'll try to optimize this better in the future, but in the meantime the code looks like this:

```js
if (onBrokenLinks === 'ignore' && onBrokenAnchors === 'ignore') {
  return;
}
const brokenLinks = getBrokenLinks({
  routes,
  collectedLinks: normalizeCollectedLinks(collectedLinks),
});
reportBrokenLinks({brokenLinks, onBrokenLinks, onBrokenAnchors});
```

Note: is it possible that you encounter longer build times only due to cache eviction? We use Webpack with persistent caching, and on rebuilds it's supposed to rebuild faster. It may be possible that your site builds longer simply because the caches were empty. In this case I suggest trying to run …
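A concrete way to test this cache theory, assuming only the standard Docusaurus CLI and no details beyond what's in this thread: run docusaurus build twice in a row without cleaning anything and compare the two timings. If the second run is dramatically faster, cold Webpack caches explain part of the gap; running docusaurus clear first forces a cold build for a fair comparison.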
@anaclumos I tried using your repo before the upgrade (https://github.com/anaclumos/extracranial/tree/f144432acdfff55d741a1dbc568ae0b51dd052fe) but the usage of the Bun package manager makes it inconvenient to troubleshoot. First, when I run … Then, the binary format of the lockfile makes it super inconvenient to inspect and diff. Maybe I could try using the exact same version of Bun you are using, and it would not upgrade? For now I'm unable to troubleshoot this using your repo.
Unfortunately the repo isn't (yet) open source, but I can share the package-lock.json files from both runs if they're of any use, and potentially the build log files if there's anything particular you need?
@andrewgbell I'd have to run this locally myself, partially upgrading some libs in a dichotomic (bisecting) way to find out which transitive dep causes the problem. I doubt seeing a diff will be enough to identify the problem, unfortunately; I need to run the code.
Ours is open source: https://github.com/harness/developer-hub, if that helps. Currently on DS 3.0.1. Here is the yarn.lock from the 3.1 upgrade: https://github.com/harness/developer-hub/blob/7b5fbafc4036f61d30e094362a67204cc573cf7a/yarn.lock
@slorber If you need another repo, let me know as I can invite you into our org.
Still investigating your site @ravilach, but it looks like there are 2 problems: …
Have any of you tried to upgrade without fully regenerating the lockfile, and disabling all the broken link checkers?

```sh
yarn upgrade @docusaurus/core@latest @docusaurus/cssnano-preset@latest @docusaurus/plugin-client-redirects@latest @docusaurus/plugin-debug@latest @docusaurus/plugin-google-analytics@latest @docusaurus/plugin-google-gtag@latest @docusaurus/plugin-sitemap@latest @docusaurus/preset-classic@latest @docusaurus/theme-classic@latest @docusaurus/theme-mermaid@latest @docusaurus/theme-search-algolia@latest @docusaurus/module-type-aliases@latest @docusaurus/tsconfig@latest
```
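The point of this targeted yarn upgrade (an inference from the lockfile discussion above, not an extra claim): it bumps only the @docusaurus/* packages while the existing lockfile keeps every transitive dependency pinned, so it isolates the Docusaurus 3.1 code change from the unrelated transitive-dependency regression suspected earlier in the thread.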
Thanks @slorber, much appreciated!
I migrated to pnpm. |
Hi, I've added the two ignores (onBrokenLinks: "ignore" and onBrokenAnchors: "ignore") alongside running the targeted yarn upgrade above, and build time dropped back to the expected (in fact a few minutes quicker, approx 40 mins).

I've tried removing them; however, build time just went back up again to over 2 hours. I've also tried adding these ignores again, but this time upgrading the whole package-lock.json. As of today it now runs through at the same speed as above, so not sure if a dependency has updated since. So it looks like you're correct, and the broken links and anchors checking seems to have a far greater impact.
Thanks for reporting @andrewgbell. I've submitted a PR that should optimize things, likely making them faster than before: #9778. So far it seems to work on @ravilach's site. Could you give it a test by building locally with this modified file?
"use strict";
/**
* Copyright (c) Facebook, Inc. and its affiliates.
*
* This source code is licensed under the MIT license found in the
* LICENSE file in the root directory of this source tree.
*/
Object.defineProperty(exports, "__esModule", { value: true });
exports.handleBrokenLinks = void 0;
const tslib_1 = require("tslib");
const lodash_1 = tslib_1.__importDefault(require("lodash"));
const logger_1 = tslib_1.__importDefault(require("@docusaurus/logger"));
const react_router_config_1 = require("react-router-config");
const utils_1 = require("@docusaurus/utils");
const utils_2 = require("./utils");
function matchRoutes(routeConfig, pathname) {
// @ts-expect-error: React router types RouteConfig with an actual React
// component, but we load route components with string paths.
// We don't actually access component here, so it's fine.
return (0, react_router_config_1.matchRoutes)(routeConfig, pathname);
}
function createBrokenLinksHelper({ collectedLinks, routes, }) {
const validPathnames = new Set(collectedLinks.keys());
// Matching against the route array can be expensive
// If the route is already in the valid pathnames,
// we can avoid matching against it as an optimization
const remainingRoutes = routes
.filter((route) => !validPathnames.has(route.path));
function isPathnameMatchingAnyRoute(pathname) {
if (matchRoutes(remainingRoutes, pathname).length > 0) {
// IMPORTANT: this is an optimization here
    // See https://github.com/facebook/docusaurus/issues/9754
// Large Docusaurus sites have many routes!
// We try to minimize calls to a possibly expensive matchRoutes function
validPathnames.add(pathname);
return true;
}
return false;
}
function isPathBrokenLink(linkPath) {
const pathnames = [linkPath.pathname, decodeURI(linkPath.pathname)];
if (pathnames.some((p) => validPathnames.has(p))) {
return false;
}
if (pathnames.some(isPathnameMatchingAnyRoute)) {
return false;
}
return true;
}
function isAnchorBrokenLink(linkPath) {
const { pathname, hash } = linkPath;
// Link has no hash: it can't be a broken anchor link
if (hash === undefined) {
return false;
}
// Link has empty hash ("#", "/page#"...): we do not report it as broken
// Empty hashes are used for various weird reasons, by us and other users...
    // See for example: https://github.com/facebook/docusaurus/pull/6003
if (hash === '') {
return false;
}
const targetPage = collectedLinks.get(pathname) || collectedLinks.get(decodeURI(pathname));
// link with anchor to a page that does not exist (or did not collect any
// link/anchor) is considered as a broken anchor
if (!targetPage) {
return true;
}
// it's a not broken anchor if the anchor exists on the target page
if (targetPage.anchors.has(hash) ||
targetPage.anchors.has(decodeURIComponent(hash))) {
return false;
}
return true;
}
return {
collectedLinks,
isPathBrokenLink,
isAnchorBrokenLink,
};
}
function getBrokenLinksForPage({ pagePath, helper, }) {
const pageData = helper.collectedLinks.get(pagePath);
const brokenLinks = [];
pageData.links.forEach((link) => {
const linkPath = (0, utils_1.parseURLPath)(link, pagePath);
if (helper.isPathBrokenLink(linkPath)) {
brokenLinks.push({
link,
resolvedLink: (0, utils_1.serializeURLPath)(linkPath),
anchor: false,
});
}
else if (helper.isAnchorBrokenLink(linkPath)) {
brokenLinks.push({
link,
resolvedLink: (0, utils_1.serializeURLPath)(linkPath),
anchor: true,
});
}
});
return brokenLinks;
}
/**
* The route defs can be recursive, and have a parent match-all route. We don't
* want to match broken links like /docs/brokenLink against /docs/*. For this
* reason, we only consider the "final routes" that do not have subroutes.
* We also need to remove the match-all 404 route
*/
function filterIntermediateRoutes(routesInput) {
const routesWithout404 = routesInput.filter((route) => route.path !== '*');
return (0, utils_2.getAllFinalRoutes)(routesWithout404);
}
function getBrokenLinks({ collectedLinks, routes, }) {
const filteredRoutes = filterIntermediateRoutes(routes);
const helper = createBrokenLinksHelper({
collectedLinks,
routes: filteredRoutes,
});
const result = {};
collectedLinks.forEach((_unused, pagePath) => {
try {
result[pagePath] = getBrokenLinksForPage({
pagePath,
helper,
});
}
catch (e) {
throw new Error(`Unable to get broken links for page ${pagePath}.`, {
cause: e,
});
}
});
return result;
}
function brokenLinkMessage(brokenLink) {
const showResolvedLink = brokenLink.link !== brokenLink.resolvedLink;
return `${brokenLink.link}${showResolvedLink ? ` (resolved as: ${brokenLink.resolvedLink})` : ''}`;
}
function createBrokenLinksMessage(pagePath, brokenLinks) {
const type = brokenLinks[0]?.anchor === true ? 'anchor' : 'link';
const anchorMessage = brokenLinks.length > 0
? `- Broken ${type} on source page path = ${pagePath}:
-> linking to ${brokenLinks
.map(brokenLinkMessage)
.join('\n -> linking to ')}`
: '';
return `${anchorMessage}`;
}
function createBrokenAnchorsMessage(brokenAnchors) {
if (Object.keys(brokenAnchors).length === 0) {
return undefined;
}
return `Docusaurus found broken anchors!
Please check the pages of your site in the list below, and make sure you don't reference any anchor that does not exist.
Note: it's possible to ignore broken anchors with the 'onBrokenAnchors' Docusaurus configuration, and let the build pass.
Exhaustive list of all broken anchors found:
${Object.entries(brokenAnchors)
.map(([pagePath, brokenLinks]) => createBrokenLinksMessage(pagePath, brokenLinks))
.join('\n')}
`;
}
function createBrokenPathsMessage(brokenPathsMap) {
if (Object.keys(brokenPathsMap).length === 0) {
return undefined;
}
/**
* If there's a broken link appearing very often, it is probably a broken link
* on the layout. Add an additional message in such case to help user figure
 * this out. See https://github.com/facebook/docusaurus/issues/3567#issuecomment-706973805
*/
function getLayoutBrokenLinksHelpMessage() {
const flatList = Object.entries(brokenPathsMap).flatMap(([pagePage, brokenLinks]) => brokenLinks.map((brokenLink) => ({ pagePage, brokenLink })));
const countedBrokenLinks = lodash_1.default.countBy(flatList, (item) => item.brokenLink.link);
const FrequencyThreshold = 5; // Is this a good value?
const frequentLinks = Object.entries(countedBrokenLinks)
.filter(([, count]) => count >= FrequencyThreshold)
.map(([link]) => link);
if (frequentLinks.length === 0) {
return '';
}
return logger_1.default.interpolate `
It looks like some of the broken links we found appear in many pages of your site.
Maybe those broken links appear on all pages through your site layout?
We recommend that you check your theme configuration for such links (particularly, theme navbar and footer).
Frequent broken links are linking to:${frequentLinks}`;
}
return `Docusaurus found broken links!
Please check the pages of your site in the list below, and make sure you don't reference any path that does not exist.
Note: it's possible to ignore broken links with the 'onBrokenLinks' Docusaurus configuration, and let the build pass.${getLayoutBrokenLinksHelpMessage()}
Exhaustive list of all broken links found:
${Object.entries(brokenPathsMap)
.map(([pagePath, brokenPaths]) => createBrokenLinksMessage(pagePath, brokenPaths))
.join('\n')}
`;
}
function splitBrokenLinks(brokenLinks) {
const brokenPaths = {};
const brokenAnchors = {};
Object.entries(brokenLinks).forEach(([pathname, pageBrokenLinks]) => {
const [anchorBrokenLinks, pathBrokenLinks] = lodash_1.default.partition(pageBrokenLinks, (link) => link.anchor);
if (pathBrokenLinks.length > 0) {
brokenPaths[pathname] = pathBrokenLinks;
}
if (anchorBrokenLinks.length > 0) {
brokenAnchors[pathname] = anchorBrokenLinks;
}
});
return { brokenPaths, brokenAnchors };
}
function reportBrokenLinks({ brokenLinks, onBrokenLinks, onBrokenAnchors, }) {
// We need to split the broken links reporting in 2 for better granularity
// This is because we need to report broken path/anchors independently
// For v3.x retro-compatibility, we can't throw by default for broken anchors
// TODO Docusaurus v4: make onBrokenAnchors throw by default?
const { brokenPaths, brokenAnchors } = splitBrokenLinks(brokenLinks);
const pathErrorMessage = createBrokenPathsMessage(brokenPaths);
if (pathErrorMessage) {
logger_1.default.report(onBrokenLinks)(pathErrorMessage);
}
const anchorErrorMessage = createBrokenAnchorsMessage(brokenAnchors);
if (anchorErrorMessage) {
logger_1.default.report(onBrokenAnchors)(anchorErrorMessage);
}
}
// Users might use the useBrokenLinks() API in weird unexpected ways
// JS users might call "collectLink(undefined)" for example
// TS users might call "collectAnchor('#hash')" with/without #
// We clean/normalize the collected data to avoid obscure errors being thrown
// We also use optimized data structures for a faster algorithm
function normalizeCollectedLinks(collectedLinks) {
const result = new Map();
Object.entries(collectedLinks).forEach(([pathname, pageCollectedData]) => {
result.set(pathname, {
links: new Set(pageCollectedData.links.filter(lodash_1.default.isString)),
anchors: new Set(pageCollectedData.anchors
.filter(lodash_1.default.isString)
.map((anchor) => (anchor.startsWith('#') ? anchor.slice(1) : anchor))),
});
});
return result;
}
async function handleBrokenLinks({ collectedLinks, onBrokenLinks, onBrokenAnchors, routes, }) {
if (onBrokenLinks === 'ignore' && onBrokenAnchors === 'ignore') {
return;
}
const brokenLinks = getBrokenLinks({
routes,
collectedLinks: normalizeCollectedLinks(collectedLinks),
});
reportBrokenLinks({ brokenLinks, onBrokenLinks, onBrokenAnchors });
}
exports.handleBrokenLinks = handleBrokenLinks;
```
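A note on what the patch above actually changes, based only on the code shown: createBrokenLinksHelper seeds a Set of valid pathnames from the collected links, pre-filters out every route whose path is already in that Set, and memoizes each successful matchRoutes hit back into the Set. On a large site where most links point at real pages, each link check becomes a cheap Set lookup instead of a call into the expensive react-router matcher across thousands of routes, which is why the fix can even beat the 3.0.1 times. To test it as suggested, the compiled file can be swapped into an installed site; the path is presumably node_modules/@docusaurus/core/lib/server/brokenLinks.js (an assumption about the package layout, not something stated in the thread).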
@slorber Yes, that worked great! Replaced the file and ran it with onBrokenLinks: "warn", and it built just as quickly as earlier. Thanks for all your help with this!
Thanks @andrewgbell. Don't you see any improvement too? On @ravilach's site (which I simplified a bit: just 1 docs plugin instance instead of 5), I see a significant improvement in the time to handle broken links and in total build time.
Hi @slorber, sorry, yes. I'd been comparing 3.1 with the optimisations against 3.1 with broken links ignored, so hadn't spotted it. But looking again we get:

- 3.0 build time with handleBrokenLinks: 54 mins
- 3.1 (without fix) build time with handleBrokenLinks: 137 mins
- 3.1 (with fix) build time with handleBrokenLinks: 41 mins

So very significant! Thanks!
Awesome news then 🎉 thanks for reporting |
Just updated. It's even faster than before!! Thank you so much 😃 |
Awesome news @anaclumos! Do you mind sharing numbers? How much faster is it?
It used to take around 20 minutes. Now it finishes around 11 minutes. |
🤯 Didn't expect it to have such an impact. Finally this perf regression was a good thing 😄 |
Have you read the Contributing Guidelines on issues?
Prerequisites

- I have tried the npm run clear or yarn clear command.
- I have tried rm -rf node_modules yarn.lock package-lock.json and re-installing packages.

Description
On the last line, take a look at the 3h 30m 19s. Even though the client and server were compiled in ~4m, it just hangs there forever, and the node process takes ~7GB of RAM. In previous versions it went up to ~14GB; was there any change in how Docusaurus limits RAM usage at the expense of compilation speed?
Reproducible demo
https://github.com/anaclumos/extracranial
Steps to reproduce
all-in-one:build
Expected behavior
Compiles relatively fast, preferably under 30 minutes
Actual behavior
Takes 3 hours to build.
Your environment
Self-service