GTM-130: Add llms*.txt
#476
✅ Deploy Preview for zero-to-nix ready!
Walkthrough
This PR adds support for generating Large Language Model Metadata Standard (LLMS) files in three formats (standard, small, and full) using Handlebars templates. It introduces new API routes, templates, updates site configuration with metadata, adds the handlebars dependency, updates the footer with links to LLMS files, and refines locale formatting.
Sequence Diagram(s)
sequenceDiagram
participant Client
participant APIRoute as LLMS Route<br/>(llms.txt.ts)
participant Compiler as Handlebars<br/>Compiler
participant Template as Template File<br/>(llms.txt.hbs)
participant Collections as Content<br/>Collections
participant Response
Client->>APIRoute: GET /llms.txt
APIRoute->>Collections: Fetch start & concept pages
Collections-->>APIRoute: Page data
APIRoute->>Template: Load template from disk
Template-->>APIRoute: Template source
APIRoute->>Compiler: Compile & render with data
Compiler->>Compiler: Inject site metadata,<br/>page titles, hrefs,<br/>otherFormats
Compiler-->>APIRoute: Rendered plain text
APIRoute->>Response: Return text/plain
Response-->>Client: LLMS documentation
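The flow in the diagram can be sketched as a pure function, without Astro or Handlebars. This is a minimal illustration, not the PR's code: `buildLlmsBody`, `renderStub`, the `Page`, `Link`, and `LlmsData` shapes, the sample data, and the `/start/<id>` and `/concepts/<id>` path formats are all assumptions. `renderStub` stands in for the function that compiling llms.txt.hbs would produce.

```typescript
// Hypothetical data shapes; the real ones come from src/site.ts and the content collections.
interface Page { id: string; title: string }
interface Link { title: string; href: string }
interface LlmsData {
  title: string;
  description: string;
  startPages: Link[];
  conceptPages: Link[];
}

// A compiled template is a function from data to text.
type Render = (data: LlmsData) => string;

// Assumed path formats; the PR uses helpers from src/lib/utils.ts instead.
const startPagePath = (id: string): string => `/start/${id}`;
const conceptPagePath = (id: string): string => `/concepts/${id}`;

// Header the route would attach when wrapping the body in a Response.
const PLAIN_TEXT_HEADERS = { "Content-Type": "text/plain; charset=utf-8" };

// Steps from the diagram: take page data, inject metadata and hrefs, render to text.
function buildLlmsBody(
  render: Render,
  site: { title: string; description: string },
  startPages: Page[],
  conceptPages: Page[]
): string {
  return render({
    title: site.title,
    description: site.description,
    startPages: startPages.map((p) => ({ title: p.title, href: startPagePath(p.id) })),
    conceptPages: conceptPages.map((p) => ({ title: p.title, href: conceptPagePath(p.id) })),
  });
}

// Stand-in for the compiled llms.txt.hbs template (made-up layout).
const renderStub: Render = (d) => {
  const link = (l: Link): string => `- [${l.title}](${l.href})`;
  return [`# ${d.title}`, "", d.description, "", "## Start pages", ""]
    .concat(d.startPages.map(link))
    .concat(["", "## Concept pages", ""])
    .concat(d.conceptPages.map(link))
    .join("\n");
};

// Made-up sample data, for illustration only.
const body = buildLlmsBody(
  renderStub,
  { title: "Example Site", description: "Made-up description." },
  [{ id: "install", title: "Install" }],
  [{ id: "flakes", title: "Flakes" }]
);
```

In the actual route, the page arrays would come from the content collections and the body would be returned in a Response carrying the plain-text header.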
Estimated code review effort
🎯 3 (Moderate) | ⏱️ ~22 minutes
The changes introduce a new LLMS documentation feature spanning multiple file categories (routes, templates, configuration). While individual files follow consistent patterns (three similar API routes, three similar templates), reviewers must verify template rendering logic, data flow through Handlebars compilation, correctness of content collection queries, and integration with site metadata. Footer restructuring adds minor UI complexity.
Actionable comments posted: 6
🧹 Nitpick comments (3)
src/templates/llms-small.txt.hbs (1)
5-6: Add line break after section header for readability.
The section header runs directly into the description text without a line break, reducing readability. The same issue appears on lines 12-13.
Apply this diff to improve formatting:
-## Start pages These pages take you on a Nix journey from installing Nix through
+## Start pages
+
+These pages take you on a Nix journey from installing Nix through
 accomplishing meaningful tasks with Nix:

And similarly for lines 12-13:

-## Concept pages These pages provide a more theoretical take on some of the
+## Concept pages
+
+These pages provide a more theoretical take on some of the
 trickier corners of Nix:

src/components/Footer.astro (1)
10-42: LGTM! Consider extracting the file list.
The new AI tools section is well-structured and correctly positioned. The implementation is clean and functional.
Optionally, consider extracting the file list to site configuration for easier maintenance:
// In src/site.ts
llms: {
  description: "...",
  files: ["llms.txt", "llms-small.txt", "llms-full.txt"]
}

Then import and use it:

+import { site } from "../site";
+const { llms } = site;
-{["llms.txt", "llms-small.txt", "llms-full.txt"].map((file) => (
+{llms.files.map((file) => (

src/pages/llms-small.txt.ts (1)
11-50: Significant code duplication across llms route files.
The three llms route files (llms.txt.ts, llms-small.txt.ts, llms-full.txt.ts) share nearly identical structure with only minor variations in template paths and data mapping. This creates maintenance overhead.
Consider creating a shared factory function:
// src/lib/llms-routes.ts
export function createLlmsRoute(templateName: string, mapper: (collections) => object) {
  const templateFile = fs.readFileSync(
    path.join(process.cwd(), `src/templates/${templateName}`),
    "utf-8"
  );
  const template = Handlebars.compile(templateFile);
  return async () => {
    const startPages = await getCollection("start");
    const conceptPages = await getCollection("concepts");
    const content = template(mapper({ startPages, conceptPages }));
    return new Response(content, {
      headers: { "Content-Type": "text/plain; charset=utf-8" }
    });
  };
}
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (14)
- .github/workflows/update-flake-lock.yml (1 hunks)
- .npmrc (0 hunks)
- .prettierignore (1 hunks)
- package.json (1 hunks)
- pnpm-workspace.yaml (1 hunks)
- src/components/Footer.astro (1 hunks)
- src/lib/utils.ts (1 hunks)
- src/pages/llms-full.txt.ts (1 hunks)
- src/pages/llms-small.txt.ts (1 hunks)
- src/pages/llms.txt.ts (1 hunks)
- src/site.ts (2 hunks)
- src/templates/llms-full.txt.hbs (1 hunks)
- src/templates/llms-small.txt.hbs (1 hunks)
- src/templates/llms.txt.hbs (1 hunks)
💤 Files with no reviewable changes (1)
- .npmrc
🧰 Additional context used
🧬 Code graph analysis (3)
src/pages/llms-full.txt.ts (3)
- src/site.ts (1): site (54-162)
- src/pages/llms-small.txt.ts (2): GET (18-51), prerender (53-53)
- src/lib/utils.ts (2): startPagePath (11-13), conceptPagePath (3-5)
src/pages/llms.txt.ts (2)
- src/site.ts (1): site (54-162)
- src/lib/utils.ts (2): startPagePath (11-13), conceptPagePath (3-5)
src/pages/llms-small.txt.ts (2)
- src/site.ts (1): site (54-162)
- src/pages/llms-full.txt.ts (2): GET (18-60), prerender (62-62)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: test
🔇 Additional comments (7)
package.json (1)
37-37: LGTM! Handlebars dependency added.
The addition of handlebars ^4.7.8 is appropriate for the new template-based content generation system. Version 4.7.8 is the latest 4.x release.
Based on learnings
pnpm-workspace.yaml (1)
2-2: LGTM! Quote style change.
The change from single to double quotes is functionally equivalent and has no impact on behavior.
src/templates/llms.txt.hbs (1)
1-31: LGTM! Well-structured Handlebars template.
The template correctly uses Handlebars syntax for rendering the LLMs documentation format with proper iteration blocks and Markdown formatting.
.prettierignore (1)
10-10: LGTM! Appropriate exclusion for Handlebars templates.
Adding **/*.hbs to the ignore list prevents Prettier from potentially breaking Handlebars syntax, which is good practice for template files.
src/templates/llms-full.txt.hbs (1)
16-21: Verify that content sources for "start" and "concepts" collections are trusted before rendering unescaped HTML.
Triple-brace expressions in Handlebars produce raw, unescaped output. The template renders body content via {{{this.content}}}, which is safe only if the source is inherently trusted. Confirm whether the "start" and "concepts" collections use Astro's default glob() loader from static repository files (trusted) or custom loaders fetching from remote CMS, databases, or APIs (potentially untrusted). If using custom loaders with user-editable content, consider escaping the output or sanitizing at the loader level.
src/pages/llms.txt.ts (1)
1-56: LGTM!
The implementation correctly uses startPagePath and conceptPagePath utilities to construct proper hrefs, and the template data is well-structured. The code duplication concern has already been noted in llms-small.txt.ts.
src/site.ts (1)
33-39: LGTM!
The type extensions are well-structured and appropriately typed.
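On the triple-brace concern raised for src/templates/llms-full.txt.hbs above, it may help to see what escaping would do to page content. The sketch below uses `escapeHtml`, a hypothetical and simplified stand-in for the escaping that double-brace expressions apply; it is not Handlebars' own code, and the library's escaper covers a few more characters. The sample string is made up.

```typescript
// Simplified stand-in for the HTML escaping applied by double-brace expressions.
// Assumption: not Handlebars' implementation, which also escapes a few more characters.
function escapeHtml(value: string): string {
  return value
    .replace(/&/g, "&amp;") // must run first so later entities are not double-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#x27;");
}

// Made-up page content containing a shell snippet.
const pageBody = 'echo "hi" > out.txt && cat out.txt';

// What an escaped expression would emit.
const escaped = escapeHtml(pageBody);
// escaped is: echo &quot;hi&quot; &gt; out.txt &amp;&amp; cat out.txt

// What a raw (triple-brace) expression would emit: the content unchanged.
const raw = pageBody;
```

Because the routes return text/plain, escaped entities would presumably show up literally in the generated file, so raw output looks like the intended behavior as long as the collections are trusted repository content.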
Actionable comments posted: 0
♻️ Duplicate comments (1)
src/pages/llms.txt.ts (1)
32-37: Template loading pattern duplicated across files.
This is the same duplication noted in llms-small.txt.ts. See the refactor suggestion in that file for a shared helper approach.
🧹 Nitpick comments (1)
src/pages/llms-small.txt.ts (1)
11-16: Consider extracting template loading to a shared helper.
This template loading pattern (fs.readFileSync + path.join + Handlebars.compile) is duplicated across llms.txt.ts, llms-small.txt.ts, and llms-full.txt.ts.
Example helper in src/lib/template-utils.ts:

import Handlebars from "handlebars";
import fs from "node:fs";
import path from "node:path";

export function loadTemplate(templateName: string): HandlebarsTemplateDelegate {
  const templateFile = fs.readFileSync(
    path.join(process.cwd(), `src/templates/${templateName}`),
    "utf-8"
  );
  return Handlebars.compile(templateFile);
}

Then simplify to:

-const templateFile = fs.readFileSync(
-  path.join(process.cwd(), "src/templates/llms-small.txt.hbs"),
-  "utf-8",
-);
-
-const template = Handlebars.compile(templateFile);
+const template = loadTemplate("llms-small.txt.hbs");
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- src/pages/llms-full.txt.ts (1 hunks)
- src/pages/llms-small.txt.ts (1 hunks)
- src/pages/llms.txt.ts (1 hunks)
- src/site.ts (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- src/site.ts
- src/pages/llms-full.txt.ts
🧰 Additional context used
🧬 Code graph analysis (2)
src/pages/llms.txt.ts (4)
- src/site.ts (1): site (54-162)
- src/pages/llms-full.txt.ts (1): GET (19-50)
- src/pages/llms-small.txt.ts (1): GET (18-40)
- src/lib/utils.ts (2): startPagePath (11-13), conceptPagePath (3-5)
src/pages/llms-small.txt.ts (3)
- src/site.ts (1): site (54-162)
- src/pages/llms-full.txt.ts (2): GET (19-50), prerender (52-52)
- src/pages/llms.txt.ts (3): GET (39-64), FORMATS (11-30), prerender (66-66)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: test
🔇 Additional comments (7)
src/pages/llms-small.txt.ts (3)
1-9: LGTM! Clean imports and data extraction.
The imports are well-organized and the destructuring of site data is clear. The dependency on FORMATS from llms.txt.ts is appropriate for maintaining consistency across the different llms*.txt variants.
18-40: LGTM! GET handler correctly implements compact format.
The handler appropriately extracts only titles for the "small" format, omitting hrefs and body content. The otherFormats array correctly references the other two variants, and the Content-Type header is properly set.
42-42: LGTM! Prerendering correctly enabled.
The prerender flag ensures this route is statically generated at build time, which is appropriate for this content.
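One way a shared FORMATS constant could feed each route's otherFormats list is sketched below. The review only states that FORMATS is exported from llms.txt.ts and that each route references the other two variants; the `Format` shape, its field names, the descriptions, and the `otherFormats` helper are assumptions for illustration.

```typescript
// Hypothetical shape; the PR's actual FORMATS entries may carry different fields.
interface Format {
  file: string;
  description: string;
}

// Single source of truth for the three variants.
const FORMATS: Format[] = [
  { file: "llms.txt", description: "Page titles with links" },
  { file: "llms-small.txt", description: "Page titles only" },
  { file: "llms-full.txt", description: "Full page content" },
];

// Each route lists every variant except itself.
function otherFormats(current: string): Format[] {
  return FORMATS.filter((f) => f.file !== current);
}

const forSmall = otherFormats("llms-small.txt").map((f) => f.file);
```

Deriving the list by filtering keeps the three routes in sync if a variant is ever added or renamed.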
src/pages/llms.txt.ts (4)
1-7: LGTM! Imports are complete and well-organized.
All necessary utilities and dependencies are imported, including the path helper functions needed for href construction.
9-30: LGTM! Excellent use of shared constants.
The FORMATS export provides a single source of truth for all three llms*.txt variants, promoting consistency. The tagline alias is clear, and the data structure is well-designed.
39-64: LGTM! GET handler correctly implements standard format.
The handler appropriately includes both titles and hrefs (but not body content), positioning this as the middle ground between the "small" (titles only) and "full" (complete content) variants. The use of utility functions for href construction ensures consistency.
66-66: LGTM! Prerendering correctly enabled.
Consistent with the other llms routes and appropriate for static generation.
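The difference between the three variants described above comes down to how each page entry is mapped before rendering. A minimal sketch, assuming a hypothetical `Page` shape, made-up sample data, and a `/concepts/<id>` path format (the PR uses conceptPagePath from src/lib/utils.ts, whose exact output is not shown here):

```typescript
// Hypothetical entry shape; real entries come from the "start" and "concepts" collections.
interface Page {
  id: string;
  title: string;
  body: string;
}

// Assumed path format, standing in for the helper in src/lib/utils.ts.
const conceptPagePath = (id: string): string => `/concepts/${id}`;

// llms-small.txt: titles only.
const toSmall = (p: Page) => ({ title: p.title });

// llms.txt: titles plus hrefs, no body content.
const toStandard = (p: Page) => ({ title: p.title, href: conceptPagePath(p.id) });

// llms-full.txt: titles plus the page body.
const toFull = (p: Page) => ({ title: p.title, content: p.body });

// Made-up sample entry.
const page: Page = { id: "flakes", title: "Flakes", body: "Made-up body text." };

const views = {
  small: toSmall(page),
  standard: toStandard(page),
  full: toFull(page),
};
```

Mappers like these are the part that would vary per route if the shared factory suggested earlier in the review were adopted.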
LGTM