
Bloated doc pages hinder crawlers from finding relevant information #3015

@sacOO7

Description

  • At the moment, Claude Code's webfetch tool reports that the response contains only CSS. See related issue: https://ably.atlassian.net/browse/FTF-227
  • Header-based content negotiation is not sufficient. Modern LLM crawlers, such as Grok, frequently spoof standard browser headers to bypass detection measures. This undermines our reliance on User-Agent or Accept header detection to serve pure Markdown (as referenced in [WEB-4447] Add MDX to Markdown transpilation with content negotiation #3000). Consequently, we must pivot from gating content to ensuring Universal Content Accessibility.
  • Claude Code uses both its websearch and webfetch tools to obtain up-to-date information. When searching for docs on demand, it doesn't look for llms.txt; instead, it uses the websearch tool, which depends on third-party providers such as Google Search to generate relevant links for a given query.
  • In conclusion, supporting only .md content isn't sufficient. There should also be a fallback to HTML documentation that returns relevant content. This will be used by emerging LLMs and ensures proper indexing by search engines.
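To make the failure mode concrete, here is a minimal sketch of header-based negotiation. It is not the logic from #3000: the function name, the marker list, and the sample headers are all assumptions chosen for illustration. It shows that a crawler presenting ordinary browser headers falls through to the HTML path, which is why the HTML itself has to be usable.

```typescript
// Hypothetical sketch of header-based negotiation (NOT the implementation in #3000).
type RequestHeaders = Record<string, string | undefined>;

// Illustrative substrings assumed to identify self-declared LLM agents.
const KNOWN_AGENT_MARKERS: readonly string[] = ['claude', 'gptbot', 'perplexitybot'];

function wantsMarkdown(headers: RequestHeaders): boolean {
  const accept = (headers['accept'] ?? '').toLowerCase();
  const userAgent = (headers['user-agent'] ?? '').toLowerCase();
  // An explicit Accept: text/markdown is the strongest signal.
  if (accept.indexOf('text/markdown') !== -1) return true;
  // Otherwise fall back to User-Agent sniffing.
  return KNOWN_AGENT_MARKERS.some((marker) => userAgent.indexOf(marker) !== -1);
}

// A client that announces itself gets Markdown.
const declaredAgent: RequestHeaders = {
  'user-agent': 'Claude-User/1.0',
  accept: 'text/markdown, */*',
};

// A crawler spoofing browser headers is indistinguishable from a person,
// so it receives the full HTML page.
const spoofedBrowser: RequestHeaders = {
  'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36',
  accept: 'text/html,application/xhtml+xml',
};

const declaredGetsMarkdown = wantsMarkdown(declaredAgent);
const spoofedGetsMarkdown = wantsMarkdown(spoofedBrowser);
```

Here `declaredGetsMarkdown` is `true` and `spoofedGetsMarkdown` is `false`, so no amount of tuning the marker list helps the second client.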

Problem: Pages are 437KB with 91% CSS overhead

  • 400KB: Inline CSS (Tailwind + @ably/ui + syntax highlighting)
  • 30KB: Actual documentation content
  • 7KB: JavaScript bundles

Impact:

  • Poor crawler experience (content buried after 400KB of CSS)
  • Slow page loads
  • Wasted bandwidth
  • LLM tools may truncate responses before reaching content
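The truncation risk can be estimated with a small profiler. The sketch below is an assumption-laden illustration: `profileHtml`, `survivesTruncation`, and the `<main>` content marker are hypothetical names, and string length is used as a stand-in for byte size (accurate only for ASCII markup).

```typescript
interface PageProfile {
  totalSize: number;        // length of the whole document
  inlineStyleSize: number;  // combined length of all <style>...</style> blocks
  contentOffset: number;    // position of the first content marker, or -1 if absent
}

// Measure how much of a page is inline CSS and where the content starts.
// Uses string length as an approximation of bytes (exact for ASCII-only markup).
function profileHtml(html: string, contentMarker: string): PageProfile {
  const stylePattern = new RegExp('<style[^>]*>[\\s\\S]*?</style>', 'gi');
  let inlineStyleSize = 0;
  let match: RegExpExecArray | null = stylePattern.exec(html);
  while (match !== null) {
    inlineStyleSize += match[0].length;
    match = stylePattern.exec(html);
  }
  return {
    totalSize: html.length,
    inlineStyleSize,
    contentOffset: html.indexOf(contentMarker),
  };
}

// True when the content marker appears before a tool's truncation limit.
function survivesTruncation(profile: PageProfile, limit: number): boolean {
  return profile.contentOffset !== -1 && profile.contentOffset < limit;
}

// Synthetic page: 1000 characters of CSS ahead of the documentation body.
const samplePage =
  '<html><head><style>' + new Array(1001).join('a') + '</style></head>' +
  '<body><main>Docs</main></body></html>';
const sampleProfile = profileHtml(samplePage, '<main>');
```

Running `profileHtml` against the output of `gatsby build` with a marker that matches the real docs container gives a concrete offset to compare against whatever limit a given tool applies.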

Comprehensive Guide to Reduce Page Bloat from 437KB to <50KB

🎯 Solution Overview

Target Metrics

  • Total page size: <100KB (77% reduction)
  • Initial HTML: <50KB with content-first structure
  • CSS: Extracted to external file, loaded async
  • Content/Noise ratio: 70% content, 30% overhead

Three-Phase Approach

  1. Phase 1 (Quick Wins): Extract CSS externally - 60% reduction
  2. Phase 2 (Optimization): Tree-shake and code-split - 25% reduction
  3. Phase 3 (Advanced): Critical CSS only - 10% reduction

🚀 Phase 1: Extract CSS Externally (1-2 days)

Goal

Move inline CSS to external file, load it asynchronously after content

Impact

  • Page size: 437KB → 180KB (-59%)
  • Initial HTML: Content-first structure
  • Crawlers: See documentation immediately

Step 1.1: Configure CSS Extraction

File: gatsby-config.ts

Action: Add CSS extraction plugin

// gatsby-config.ts
import type { GatsbyConfig } from 'gatsby';

const config: GatsbyConfig = {
  plugins: [
    // ... existing plugins
    'gatsby-plugin-postcss',

    // ADD THIS PLUGIN (insert after 'gatsby-plugin-postcss')
    {
      resolve: 'gatsby-plugin-extract-css',
      options: {
        // Extract CSS to an external file instead of inlining it.
        // The emitted file name includes a content hash: /styles.{hash}.css
        ignoreOrder: true, // Ignore CSS order warnings
      },
    },

    'gatsby-plugin-image',
    // etc.
  ],
};

export default config;

Install dependency:

npm install gatsby-plugin-extract-css --save-dev
# or
yarn add gatsby-plugin-extract-css --dev

Result: All CSS extracted to /styles.{hash}.css, linked via <link> tag instead of inlined.
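If a plugin is not an option, the same result can be approximated with Gatsby's `onPreRenderHTML` SSR hook (`getHeadComponents` / `replaceHeadComponents`) in gatsby-ssr.js. The sketch below only models the transformation on plain objects so that it runs standalone; `HeadElement` and `externalizeInlineStyles` are hypothetical names. It also assumes Gatsby marks each inlined stylesheet with a `data-href` attribute pointing at the emitted CSS file, which should be verified against the Gatsby version in use.

```typescript
// Simplified stand-in for the React elements found in Gatsby's head components.
interface HeadElement {
  type: string;
  props: Record<string, unknown>;
}

// Replace every inlined <style data-href="..."> with <link rel="stylesheet" href="...">.
// Elements without a data-href (for example hand-written critical CSS) are left untouched.
function externalizeInlineStyles(head: HeadElement[]): HeadElement[] {
  return head.map((element) => {
    const href = element.props['data-href'];
    if (element.type !== 'style' || typeof href !== 'string') {
      return element;
    }
    // The inlined CSS text (dangerouslySetInnerHTML) is intentionally dropped.
    return { type: 'link', props: { rel: 'stylesheet', type: 'text/css', href } };
  });
}

const exampleHead: HeadElement[] = [
  { type: 'meta', props: { charSet: 'utf-8' } },
  {
    type: 'style',
    props: { 'data-href': '/styles.abc123.css', dangerouslySetInnerHTML: { __html: 'body{margin:0}' } },
  },
];
const rewrittenHead = externalizeInlineStyles(exampleHead);
```

In a real gatsby-ssr.js the replacement element would be built with `React.createElement('link', ...)` and passed to `replaceHeadComponents`, and the hook would typically be limited to production builds.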


Step 1.2: Modify Layout to Load CSS Async

File: src/components/Layout/Layout.tsx

Current (imports CSS, gets inlined):

import '../../styles/global.css'; // ← This gets inlined

Change to (load via Helmet after content):

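import React from 'react';
import { Helmet } from 'react-helmet';

type LayoutProps = { children: React.ReactNode };

const Layout = ({ children }: LayoutProps) => {
  return (
    <>
      <Helmet>
        {/* Preload CSS file (starts download early, doesn't block render).
            Adjust href to the hashed file name emitted by the build. */}
        <link id="deferred-css" rel="preload" href="/styles.css" as="style" />
        {/* React expects onLoad to be a function and omits string handlers
            (onLoad="...") from server-rendered HTML, so an inline script
            promotes the preload to a stylesheet instead.
            The id "deferred-css" is an illustrative name. */}
        <script>{`
          var l = document.getElementById('deferred-css');
          if (l) l.onload = function () { l.onload = null; l.rel = 'stylesheet'; };
        `}</script>
        {/* Fallback for no-JS */}
        <noscript>{`<link rel="stylesheet" href="/styles.css">`}</noscript>
      </Helmet>

      {/* Content renders FIRST, before CSS */}
      {children}
    </>
  );
};

export default Layout;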

Alternative (load the stylesheet from a client-side effect; note that clients without JavaScript then receive only the critical CSS):

import React, { useEffect } from 'react';
import { Helmet } from 'react-helmet';

const Layout = ({ children }) => {
  useEffect(() => {
    // Load CSS asynchronously after page loads
    const link = document.createElement('link');
    link.rel = 'stylesheet';
    link.href = '/styles.css';
    document.head.appendChild(link);
  }, []);

  return (
    <>
      <Helmet>
        {/* Critical inline CSS only (see Phase 3) */}
        <style>{`
          /* Minimal critical CSS here - ~5KB */
          body { font-family: sans-serif; margin: 0; }
          .container { max-width: 1200px; margin: 0 auto; }
        `}</style>
      </Helmet>

      {children}
    </>
  );
};

export default Layout;

Step 1.3: Update HTML Structure (Content First)

File: src/html.js (create if it doesn't exist)

Gatsby allows customizing the base HTML template:

// src/html.js
import React from 'react';
import PropTypes from 'prop-types';

export default function HTML(props) {
  return (
    <html {...props.htmlAttributes}>
      <head>
        <meta charSet="utf-8" />
        <meta httpEquiv="x-ua-compatible" content="ie=edge" />
        <meta
          name="viewport"
          content="width=device-width, initial-scale=1, shrink-to-fit=no"
        />

        {/* Meta tags, title - all the SEO stuff */}
        {props.headComponents}

        {/* CRITICAL CSS ONLY - inline just what's needed */}
        <style dangerouslySetInnerHTML={{ __html: `
          /* Minimal above-the-fold CSS */
          body { margin: 0; font-family: system-ui, sans-serif; }
          .docs-content { max-width: 800px; margin: 0 auto; padding: 20px; }
          h1 { font-size: 2rem; font-weight: 700; }
          code { background: #f5f5f5; padding: 2px 6px; border-radius: 3px; }
        `}} />

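        {/* CSS loaded async after content. React omits string event handlers
            (onLoad="...") from server-rendered HTML, so an inline script
            promotes the preload instead. The id is an illustrative name. */}
        <link id="deferred-css" rel="preload" href="/styles.css" as="style" />
        <script dangerouslySetInnerHTML={{ __html: `
          var l = document.getElementById('deferred-css');
          if (l) l.onload = function () { l.onload = null; l.rel = 'stylesheet'; };
        `}} />
        <noscript><link rel="stylesheet" href="/styles.css" /></noscript>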
      </head>

      <body {...props.bodyAttributes}>
        {/* CONTENT COMES FIRST - this is what crawlers see immediately */}
        {props.preBodyComponents}

        <div
          key="body"
          id="___gatsby"
          dangerouslySetInnerHTML={{ __html: props.body }}
        />

        {/* Scripts at the end */}
        {props.postBodyComponents}
      </body>
    </html>
  );
}

HTML.propTypes = {
  htmlAttributes: PropTypes.object,
  headComponents: PropTypes.array,
  bodyAttributes: PropTypes.object,
  preBodyComponents: PropTypes.array,
  body: PropTypes.string,
  postBodyComponents: PropTypes.array,
};

Step 1.4: Test the Changes

# Clean cache
gatsby clean

# Build for production
gatsby build

# Serve and test
gatsby serve

# Check page size
curl -s http://localhost:9000/docs/chat/getting-started/android | wc -c

# Expected: ~150-180KB (down from 437KB)

Verify in browser DevTools:

  1. Open Network tab
  2. Load a docs page
  3. Check HTML document size: Should be ~150-180KB after Phase 1 (the <100KB target is reached after Phase 2)
  4. CSS file loaded separately: ~300KB external file
  5. Content visible before CSS loads

Phase 1 Results

Before:

HTML Document: 437KB
├─ Inline CSS: 400KB (91%)
├─ Content: 30KB (7%)
└─ JS: 7KB (2%)

After Phase 1:

HTML Document: 180KB (-59%)
├─ Critical CSS: 5KB (3%)
├─ Content: 30KB (17%)
└─ Meta/Structure: 145KB (80%)

External CSS: 300KB (loaded async, cached)

Impact:

  • ✅ Crawlers see content in first 100KB
  • ✅ Page loads feel faster (content visible immediately)
  • ✅ CSS cached separately across page navigations
  • ✅ Content/CSS properly separated

🔧 Phase 2: Tree-Shake & Optimize (3-5 days)

Goal

Reduce CSS file size by removing unused styles

Impact

  • CSS file: 300KB → 80KB (-73%)
  • Total page weight (HTML + external CSS): 480KB → 130KB (-73%)

Step 2.1: Optimize Tailwind CSS

File: tailwind.config.js

Current issue: Generating all Tailwind utilities

Solution: Configure content scanning so that only the utilities actually used are generated

// tailwind.config.js
const ablyUIConfig = require('@ably/ui/tailwind.config.js');

module.exports = {
  presets: [ablyUIConfig],

  content: [
    './src/pages/**/*.{js,jsx,ts,tsx,mdx}',
    './src/components/**/*.{js,jsx,ts,tsx}',
    './src/templates/**/*.{js,jsx,ts,tsx}',
    './data/**/*.{js,ts}',
    './content/**/*.mdx',  // If you have MDX content

    // Include @ably/ui components
    './node_modules/@ably/ui/**/*.{js,jsx,ts,tsx}',
  ],

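  // NOTE: Tailwind v3 (which uses the `content` key above) always runs the
  // JIT engine, so `mode` is not needed there. It only applies to v2.1/v2.2,
  // where the paths belong under `purge` instead of `content`.
  mode: 'jit',

  // Keep classes that static scanning cannot detect
  safelist: [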
    // Only safelist classes that are dynamically generated
    // Example: 'bg-blue-500', 'text-red-600'
  ],

  theme: {
    extend: {
      // Your custom theme
    },
  },
};

Add to gatsby-config.ts (ensure Tailwind processes correctly):

{
  resolve: 'gatsby-plugin-postcss',
  options: {
    postCssPlugins: [
      require('tailwindcss'),
      require('autoprefixer'),
      // Purge unused CSS in production
      process.env.NODE_ENV === 'production' &&
        require('@fullhuman/postcss-purgecss')({
          content: [
            './src/**/*.{js,jsx,ts,tsx,mdx}',
            './node_modules/@ably/ui/**/*.{js,jsx,ts,tsx}',
          ],
          defaultExtractor: (content) => content.match(/[\w-/:]+(?<!:)/g) || [],
          safelist: {
            standard: [/^hljs/, /^language-/, /^token/], // Preserve syntax highlighting (Prism emits a bare "token" class)
            deep: [/data-theme/], // Preserve theme classes
          },
        }),
    ].filter(Boolean),
  },
},

Install PurgeCSS:

npm install @fullhuman/postcss-purgecss --save-dev

Expected reduction: Tailwind CSS: 150KB → 30KB
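Why the safelist matters: purging works by scanning source text for candidate class names, so a class assembled at runtime never appears as a whole token. The sketch below applies the same `defaultExtractor` pattern used in the PurgeCSS options to two sample strings; the sample sources and the helper names `extractCandidates` and `isKept` are illustrative assumptions.

```typescript
// Same pattern as the defaultExtractor in the PurgeCSS options; built with
// new RegExp so the lookbehind is not subject to compile-target checks.
function extractCandidates(content: string): string[] {
  const pattern = new RegExp('[\\w-/:]+(?<!:)', 'g');
  return content.match(pattern) || [];
}

// A class survives purging only if it appears verbatim among the candidates.
function isKept(className: string, source: string): boolean {
  return extractCandidates(source).indexOf(className) !== -1;
}

// Class names written out in full are found.
const staticSource = '<div className="md:flex bg-blue-500">';

// A class assembled from a variable is split into fragments
// ("bg-", "color", "-500"), so "bg-red-500" is never seen.
const dynamicSource = 'const cls = `bg-${color}-500`;';

const staticKept = isKept('bg-blue-500', staticSource);
const variantKept = isKept('md:flex', staticSource);
const dynamicKept = isKept('bg-red-500', dynamicSource);
```

`staticKept` and `variantKept` are `true`, while `dynamicKept` is `false`. Any class built this way needs a safelist entry or must be rewritten as a full literal.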


Step 2.2: Lazy Load Syntax Highlighting

File: src/styles/global.css

Current (loads ALL syntax highlighting):

@import '@ably/ui/core/utils/syntax-highlighter.css';  /* ← 50KB! */

Solution: Load dynamically only when code blocks exist

Remove from global.css, add to code component:

File: src/components/blocks/software/Code/Code.tsx

import React, { useEffect, useState } from 'react';

const Code = ({ children, language }) => {
  const [syntaxLoaded, setSyntaxLoaded] = useState(false);

  useEffect(() => {
    // Dynamically import syntax highlighting CSS only when needed
    if (!syntaxLoaded) {
      import('@ably/ui/core/utils/syntax-highlighter.css')
        .then(() => setSyntaxLoaded(true));
    }
  }, []);

  return (
    <pre className={`language-${language}`}>
      <code>{children}</code>
    </pre>
  );
};

export default Code;

Better approach - Use dynamic import for Prism.js itself:

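import React, { useEffect, useRef } from 'react';

type CodeProps = { children: React.ReactNode; language?: string };

const Code = ({ children, language = 'javascript' }: CodeProps) => {
  const codeRef = useRef<HTMLElement>(null);

  useEffect(() => {
    let cancelled = false;

    (async () => {
      // Load the Prism core first: language components register themselves
      // on the Prism instance, so they must not be evaluated before it.
      const mod = await import('prismjs');
      const Prism = (mod as { default?: typeof mod }).default ?? mod;

      // Only load the grammar for this language. Some grammars depend on
      // others (for example jsx on javascript), which must be loaded first.
      await Promise.all([
        import(`prismjs/components/prism-${language}`),
        import('prismjs/themes/prism-tomorrow.css'), // Just one theme
      ]);

      // Highlight only this block instead of rescanning the whole page
      if (!cancelled && codeRef.current) {
        Prism.highlightElement(codeRef.current);
      }
    })();

    return () => {
      cancelled = true;
    };
  }, [language]);

  return (
    <pre className={`language-${language}`}>
      <code ref={codeRef} className={`language-${language}`}>{children}</code>
    </pre>
  );
};

export default Code;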

Expected reduction: Syntax highlighting: 50KB → 5KB (per language, loaded on-demand)
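Because the language name ends up inside a dynamic import path, it is worth normalizing it against a known list first. The following is a sketch under assumptions: the allowlist, the alias map, and the name `resolvePrismLanguage` are hypothetical and would need to match the languages the docs actually use.

```typescript
// Hypothetical allowlist of Prism grammars to load on demand.
const PRISM_LANGUAGES: readonly string[] = [
  'javascript', 'typescript', 'kotlin', 'swift', 'java', 'bash', 'json',
];

// Hypothetical aliases that authors may write in code fences.
const LANGUAGE_ALIASES: Record<string, string | undefined> = {
  js: 'javascript',
  ts: 'typescript',
  sh: 'bash',
  shell: 'bash',
};

// Return the canonical grammar name, or null when the language is unknown
// and the block should be rendered without highlighting.
function resolvePrismLanguage(requested: string | undefined): string | null {
  if (!requested) return null;
  const normalized = requested.trim().toLowerCase();
  const alias = Object.prototype.hasOwnProperty.call(LANGUAGE_ALIASES, normalized)
    ? LANGUAGE_ALIASES[normalized]
    : undefined;
  const canonical = alias ?? normalized;
  return PRISM_LANGUAGES.indexOf(canonical) !== -1 ? canonical : null;
}
```

A component would call `resolvePrismLanguage(language)` and skip the dynamic import entirely when it returns `null`, which also avoids failed chunk requests for misspelled language names.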


Step 2.3: Code-Split @ably/ui Components

File: src/styles/global.css

Current (imports entire @ably/ui library):

@import '@ably/ui/reset/styles.css';
@import '@ably/ui/core/styles.css';      /* ← 150KB! */
@import '@ably/ui/core/CookieMessage/component.css';
@import '@ably/ui/core/Slider/component.css';
@import '@ably/ui/core/Code/component.css';
@import '@ably/ui/core/Flash/component.css';

Solution: Import only what each page needs

Create a minimal global.css:

/* src/styles/global.css - MINIMAL VERSION */

/* Tailwind base - necessary */
@import 'tailwindcss/base';
@import 'tailwindcss/components';
@import 'tailwindcss/utilities';

/* Ably UI reset only */
@import '@ably/ui/reset/styles.css';

/* DO NOT import full @ably/ui/core/styles.css */
/* Instead, import per-component in component files */

Import component styles where used:

File: src/components/CookieConsent.tsx

import '@ably/ui/core/CookieMessage/component.css';
import { CookieMessage } from '@ably/ui';

const CookieConsent = () => <CookieMessage />;

File: src/components/Slider.tsx

import '@ably/ui/core/Slider/component.css';
import { Slider } from '@ably/ui';

const CustomSlider = () => <Slider />;

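Note on code-splitting: Gatsby splits JavaScript per page automatically, but its default production webpack config may still merge all CSS into a single styles bundle. Verify in the build output that the per-component CSS imports actually shrink the CSS served with each page.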

Expected reduction: @ably/ui: 150KB → 40KB (only used components)


Step 2.4: Optimize Component CSS

File: Component CSS files (26 files, ~104KB)

Strategy: Use CSS Modules + tree-shaking

Example - Before (src/components/Menu/styles.css):

/* Lots of unused styles for all menu variants */
.menu { }
.menu-item { }
.menu-item-active { }
.menu-item-hover { }
/* ... 500 more lines */

After - Use CSS Modules properly:

/* src/components/Menu/Menu.module.css */
/* Only styles actually used by Menu component */
.menu {
  composes: flex flex-col from global; /* Use Tailwind */
}

.menuItem {
  /* Only custom styles Tailwind doesn't provide */
  transition: all 0.2s;
}

Configure Gatsby to use CSS Modules:

// gatsby-config.ts (this should already work by default)
{
  resolve: 'gatsby-plugin-postcss',
  options: {
    cssLoaderOptions: {
      modules: {
        auto: true, // Enable CSS Modules for *.module.css files
        localIdentName: '[local]_[hash:base64:5]',
      },
    },
  },
},

Expected reduction: Component CSS: 104KB → 10KB


Phase 2 Results

After Phase 2:

HTML Document: 50KB (-88% from original)
├─ Critical CSS: 5KB (10%)
├─ Content: 30KB (60%)
└─ Structure: 15KB (30%)

External CSS: 80KB (-73% reduction)
├─ Tailwind (purged): 30KB
├─ @ably/ui (code-split): 40KB
└─ Components: 10KB

Total page weight: 130KB (loaded: 50KB HTML + 80KB CSS cached)


🎯 Phase 3: Critical CSS Only (2-3 days)

Goal

Inline ONLY above-the-fold CSS, defer everything else

Impact

  • Initial HTML: 50KB → 35KB (-30%)
  • Perceived load time: Instant content visibility

Step 3.1: Extract Critical CSS

Use a tool to automatically extract above-the-fold CSS

Install Critical CSS plugin:

npm install gatsby-plugin-critical --save-dev

Add to gatsby-config.ts:

{
  resolve: 'gatsby-plugin-critical',
  options: {
    // Extract critical CSS (above-the-fold styles)
    inline: true,
    minify: true,
    extract: true,

    // Dimensions for "above-the-fold"
    dimensions: [
      {
        height: 900,
        width: 1200,
      },
      {
        height: 900,
        width: 768,
      },
      {
        height: 900,
        width: 390,
      },
    ],

    // Which paths to process
    paths: {
      base: './public',
    },
  },
},

How it works:

  1. Plugin renders each page in headless browser
  2. Captures CSS for visible elements
  3. Inlines only that CSS (~5-10KB)
  4. Defers rest of CSS
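The four steps above amount to partitioning the stylesheet by what is visible in the first viewport. The sketch below is a deliberately simplified model, not how any particular plugin is implemented: it assumes the set of above-the-fold selectors is already known (a real tool derives it by rendering the page), and `CssRule`, `splitCriticalCss`, and `serializeRules` are hypothetical names.

```typescript
interface CssRule {
  selector: string;
  declarations: string;
}

interface CriticalSplit {
  critical: CssRule[]; // inlined into the HTML document
  deferred: CssRule[]; // left in the external stylesheet
}

// Partition rules by whether their selector matched an above-the-fold element.
// Rule order is preserved so the cascade behaves the same in both halves.
function splitCriticalCss(rules: CssRule[], aboveFoldSelectors: string[]): CriticalSplit {
  const critical: CssRule[] = [];
  const deferred: CssRule[] = [];
  for (const rule of rules) {
    if (aboveFoldSelectors.indexOf(rule.selector) !== -1) {
      critical.push(rule);
    } else {
      deferred.push(rule);
    }
  }
  return { critical, deferred };
}

// Minified serialization suitable for an inline <style> element.
function serializeRules(rules: CssRule[]): string {
  return rules.map((rule) => rule.selector + '{' + rule.declarations + '}').join('');
}

const exampleRules: CssRule[] = [
  { selector: 'h1', declarations: 'font-size:2rem' },
  { selector: '.footer', declarations: 'padding:40px' },
  { selector: 'body', declarations: 'margin:0' },
];
const exampleSplit = splitCriticalCss(exampleRules, ['body', 'h1']);
const inlineCss = serializeRules(exampleSplit.critical);
```

With the sample input, `inlineCss` is `h1{font-size:2rem}body{margin:0}` and only the `.footer` rule stays in the deferred file.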

Step 3.2: Optimize Font Loading

File: src/html.js or via Helmet

Current (fonts may block render):

<link rel="stylesheet" href="https://fonts.googleapis.com/css2?family=Manrope:wght@400;600;700&display=swap" />

Optimized (preload + async), shown as plain HTML; in JSX, use crossOrigin and a function-valued onLoad instead:

<!-- Preconnect to font CDN -->
<link rel="preconnect" href="https://fonts.googleapis.com" />
<link rel="preconnect" href="https://fonts.gstatic.com" crossorigin="anonymous" />

<!-- Preload the font stylesheet -->
<link
  rel="preload"
  as="style"
  href="https://fonts.googleapis.com/css2?family=Manrope:wght@400;600;700&display=swap"
/>

<!-- Load async: applied as a print stylesheet first, then switched to all media -->
<link
  rel="stylesheet"
  href="https://fonts.googleapis.com/css2?family=Manrope:wght@400;600;700&display=swap"
  media="print"
  onload="this.onload=null;this.media='all'"
/>

<!-- Fallback system font inline -->
<style>
  body {
    font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, sans-serif;
  }
</style>

Or self-host fonts (even better - no external requests):

/* Download fonts to /static/fonts/ */
/* Add to CSS: */
@font-face {
  font-family: 'Manrope';
  src: url('/fonts/manrope-regular.woff2') format('woff2');
  font-weight: 400;
  font-display: swap;
}

Step 3.3: Defer Non-Critical Resources

Images - use lazy loading:

import { GatsbyImage } from 'gatsby-plugin-image';

// Already optimized with gatsby-plugin-image
// Ensure loading="lazy" is set
<GatsbyImage image={imageData} alt="..." loading="lazy" />

Third-party scripts - load async:

<Helmet>
  <script async src="https://analytics.example.com/script.js" />
</Helmet>

Phase 3 Results

Final Result:

Initial HTML: 35KB (-92% from original 437KB!)
├─ Critical CSS: 5KB (14%)
└─ Content: 30KB (86%)

Deferred Resources (cached):
├─ Full CSS: 80KB
├─ Fonts: 40KB
└─ JavaScript: 150KB

First Contentful Paint: <0.5s (was 2-3s)
Crawler Experience: Content in first 35KB


📋 Implementation Checklist

Phase 1 (1-2 days) ✅

  • Install gatsby-plugin-extract-css
  • Configure plugin in gatsby-config.ts
  • Remove CSS import from Layout.tsx
  • Add async CSS loading via Helmet
  • Create custom src/html.js with content-first structure
  • Test build and verify page size reduction
  • Verify content visible before full CSS loads

Phase 2 (3-5 days) 🔧

  • Configure Tailwind JIT mode
  • Add PostCSS PurgeCSS plugin
  • Update Tailwind content paths
  • Move syntax highlighting to dynamic imports
  • Refactor @ably/ui imports (per-component)
  • Update global.css (minimal imports only)
  • Optimize component CSS files
  • Test all pages still render correctly
  • Measure CSS file size reduction

Phase 3 (2-3 days) 🎯

  • Install gatsby-plugin-critical
  • Configure critical CSS extraction
  • Optimize font loading (preload/async)
  • Self-host fonts (optional)
  • Add resource hints (preconnect, dns-prefetch)
  • Defer non-critical JavaScript
  • Test perceived performance
  • Validate with Lighthouse

🧪 Testing & Validation

Automated Tests

1. Page Size Check:

#!/bin/bash
# test-page-size.sh

URL="http://localhost:9000/docs/chat/getting-started/android"
SIZE=$(curl -s "$URL" | wc -c)

echo "Page size: $SIZE bytes"

if [ $SIZE -lt 100000 ]; then
  echo "✅ PASS: Page size under 100KB"
else
  echo "❌ FAIL: Page size over 100KB"
  exit 1
fi

2. Content-First Validation:

#!/bin/bash
# test-content-first.sh

URL="http://localhost:9000/docs/chat/getting-started/android"
FIRST_100KB=$(curl -s "$URL" | head -c 100000)

# Check if documentation content appears in first 100KB
if echo "$FIRST_100KB" | grep -q "implementation.*chat"; then
  echo "✅ PASS: Content found in first 100KB"
else
  echo "❌ FAIL: Content not in first 100KB"
  exit 1
fi

3. CSS Extraction Check:

#!/bin/bash
# test-css-extracted.sh

URL="http://localhost:9000/docs/chat/getting-started/android"
HTML=$(curl -s "$URL")

# Check for external CSS link
if echo "$HTML" | grep -q '<link.*rel="stylesheet".*href=.*\.css'; then
  echo "✅ PASS: CSS extracted to external file"
else
  echo "❌ FAIL: CSS still inlined"
  exit 1
fi

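# Check inline CSS size is small
# (perl replaces `grep -oP`, which requires GNU grep; -0777 reads the whole
#  document at once so multi-line <style> blocks are matched too)
INLINE_CSS_SIZE=$(echo "$HTML" | perl -0777 -ne 'print $& while /<style[^>]*>.*?<\/style>/gs' | wc -c)
if [ $INLINE_CSS_SIZE -lt 10000 ]; then
  echo "✅ PASS: Inline CSS under 10KB"
else
  echo "❌ FAIL: Too much inline CSS ($INLINE_CSS_SIZE bytes)"
  exit 1
fi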

Manual Testing

1. Lighthouse Audit:

# Install Lighthouse CLI
npm install -g lighthouse

# Run audit
lighthouse http://localhost:9000/docs/chat/getting-started/android \
  --output html \
  --output-path ./lighthouse-report.html

# Target scores:
# - Performance: >90
# - First Contentful Paint: <1.5s
# - Largest Contentful Paint: <2.5s

2. Crawler Simulation:

# Simulate crawler (no JavaScript, no CSS)
curl -s http://localhost:9000/docs/chat/getting-started/android | \
  w3m -dump -T text/html | \
  head -50

# Should see:
# - Page title
# - Documentation content
# - Code examples
# - NOT: CSS, JSON data, JavaScript

3. WebPageTest:

  • Go to https://webpagetest.org
  • Test URL: Your docs page
  • Location: Multiple locations
  • Connection: 3G/4G
  • Check:
    • Start Render: <2s
    • Speed Index: <3s
    • Total Page Size: <500KB

📊 Expected Results Summary

Metric          Before  Phase 1  Phase 2  Phase 3  Improvement
HTML Size       437KB   180KB    50KB     35KB     -92%
CSS (inline)    400KB   5KB      5KB      5KB      -99%
CSS (external)  0KB     300KB    80KB     80KB     N/A
Content %       7%      17%      60%      86%      +1129%
FCP             2.5s    1.5s     0.8s     0.5s     -80%
Lighthouse      65      75       88       95       +46%
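The Improvement column is the relative change from Before to Phase 3. As a check on the arithmetic, here is a small sketch (`percentChange` is a hypothetical helper name) that reproduces each figure and shows why the external CSS row has no meaningful percentage.

```typescript
// Relative change from `before` to `after`, rounded to a whole percent.
// A zero baseline has no defined relative change, hence "N/A" for external CSS.
function percentChange(before: number, after: number): number {
  if (before === 0) {
    throw new RangeError('percent change is undefined for a zero baseline');
  }
  return Math.round(((after - before) / before) * 100);
}

const improvements = {
  htmlSize: percentChange(437, 35),    // KB
  inlineCss: percentChange(400, 5),    // KB
  contentShare: percentChange(7, 86),  // percentage points used as plain numbers
  fcp: percentChange(2.5, 0.5),        // seconds
  lighthouse: percentChange(65, 95),   // score
};
```

The values come out as -92, -99, 1129, -80 and 46 respectively, matching the table.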

🎓 Best Practices Going Forward

1. CSS Strategy

  • ✅ Use Tailwind JIT mode
  • ✅ PurgeCSS in production builds
  • ✅ Code-split component CSS
  • ✅ Lazy load syntax highlighting
  • ❌ Don't import entire design systems

2. Build Process

  • ✅ Extract CSS to external files
  • ✅ Generate critical CSS automatically
  • ✅ Minify all assets
  • ✅ Cache bust with content hashes
  • ❌ Don't inline large CSS/JS

3. Content Priority

  • ✅ Content before styling
  • ✅ Semantic HTML first
  • ✅ Progressive enhancement
  • ✅ Async load non-critical resources
  • ❌ Don't block render with CSS

4. Monitoring

  • ✅ Add bundle size checks to CI
  • ✅ Monitor Lighthouse scores
  • ✅ Track Core Web Vitals
  • ✅ Test with slow connections
  • ❌ Don't ship without testing page weight
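A bundle size check for CI can be as simple as comparing measured sizes against fixed budgets. The sketch below assumes the sizes have already been measured by an earlier build step; the interface and function names are hypothetical, and the budget values reuse the 100KB HTML and 10KB inline CSS thresholds from the test scripts in this guide.

```typescript
interface Budget {
  label: string;
  maxBytes: number;
}

interface Measurement {
  label: string;
  bytes: number;
}

// Return one human-readable message per exceeded budget (empty array = pass).
// Measurements without a matching budget are ignored.
function findBudgetViolations(measurements: Measurement[], budgets: Budget[]): string[] {
  const violations: string[] = [];
  for (const measurement of measurements) {
    for (const budget of budgets) {
      if (budget.label === measurement.label && measurement.bytes > budget.maxBytes) {
        violations.push(
          measurement.label + ': ' + measurement.bytes + ' bytes exceeds budget of ' + budget.maxBytes
        );
      }
    }
  }
  return violations;
}

const pageBudgets: Budget[] = [
  { label: 'html', maxBytes: 100000 },
  { label: 'inline-css', maxBytes: 10000 },
];

// Sizes taken from the "before" and "final" figures in this guide.
const beforeViolations = findBudgetViolations(
  [{ label: 'html', bytes: 437000 }, { label: 'inline-css', bytes: 400000 }],
  pageBudgets
);
const afterViolations = findBudgetViolations(
  [{ label: 'html', bytes: 35000 }, { label: 'inline-css', bytes: 5000 }],
  pageBudgets
);
```

A CI step would fail the build whenever the returned array is non-empty; here `beforeViolations` has two entries and `afterViolations` has none.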


💬 Support & Questions

For questions or issues during implementation:

  1. Check Gatsby plugin documentation
  2. Test changes incrementally (one phase at a time)
  3. Use browser DevTools to debug CSS loading
  4. Validate with automated tests before deploying
