metadata-keyword-generator #1408

Merged
merged 6 commits into from
Feb 25, 2025

Conversation

cpengilly
Contributor

  • metadata keyword generator for page front matter
  • includes some built-in automation
  • runs only on files touched in the PR, so we can roll out updates gradually through smaller PRs
  • surfaces suggestions in CodeRabbit
  • details are in notes/metadata-update.md

@cpengilly cpengilly requested a review from a team as a code owner February 22, 2025 04:52
netlify bot commented Feb 22, 2025

Deploy Preview for docs-optimism ready!

| Name | Link |
| --- | --- |
| 🔨 Latest commit | b334410 |
| 🔍 Latest deploy log | https://app.netlify.com/sites/docs-optimism/deploys/67b97bbffaa5b500085f3ac7 |
| 😎 Deploy Preview | https://deploy-preview-1408--docs-optimism.netlify.app |

@cpengilly cpengilly requested review from krofax and sbvegan February 22, 2025 04:53
Contributor

coderabbitai bot commented Feb 22, 2025

Warning

Rate limit exceeded

@cpengilly has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 13 minutes and 10 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between 0cdacbb and b334410.

📒 Files selected for processing (1)
  • utils/metadata-analyzer.ts (1 hunks)
📝 Walkthrough

This pull request introduces enhancements to the metadata management and linting processes for Markdown files. The CircleCI configuration now includes an improved lint job that retrieves changed files from pull requests and validates metadata post-linting. New configuration files and review instructions have been established to enforce strict metadata standards for both regular and landing pages. TypeScript scripts and interfaces have been added to facilitate automated analysis, categorization, and batch updating of metadata within .mdx files. Additionally, the package.json and tsconfig.json files have been updated to support ECMAScript module usage and include new scripts for metadata validation and batch processing. These changes aim to standardize metadata management, improve CI workflows, and ensure the consistency and accuracy of documentation.

Sequence Diagram(s)

sequenceDiagram
  participant CircleCI
  participant Lint_Job
  participant GitHub_API
  participant pnpm

  CircleCI->>Lint_Job: Trigger lint job (if CIRCLE_PULL_REQUEST set)
  Lint_Job->>Lint_Job: Extract PR number and check for changed files
  Lint_Job->>GitHub_API: Request list of changed `.mdx` files
  GitHub_API-->>Lint_Job: Return list of changed files
  Lint_Job->>Lint_Job: Export CHANGED_FILES variable
  Lint_Job->>pnpm: Run Markdown linting and execute `pnpm validate-pr-metadata`
sequenceDiagram
  participant User
  participant MetadataCLI
  participant FileSystem
  participant Analyzer
  participant Manager

  User->>MetadataCLI: Execute metadata batch CLI (with options)
  MetadataCLI->>FileSystem: Discover .mdx files matching pattern
  FileSystem-->>MetadataCLI: Return list of files
  MetadataCLI->>Analyzer: Analyze and update frontmatter metadata for each file
  Analyzer-->>MetadataCLI: Provide metadata analysis results
  MetadataCLI->>Manager: Validate and update metadata in files
  Manager-->>MetadataCLI: Return update summary and validation status
  MetadataCLI->>User: Output processing summary and errors (if any)

Suggested reviewers

  • bradleycamacho

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (18)
utils/metadata-manager.ts (2)

16-17: Consider using optional chaining for safer checks.
The condition (metadata.content && metadata.content.includes('<Cards>')) can be replaced by metadata.content?.includes('<Cards>') to avoid potential runtime errors.
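A minimal sketch of that change (the `Metadata` shape here is an assumption for illustration; the real interface lives in the project's type definitions):

```typescript
// Hypothetical shape; only the optional `content` field matters here.
interface Metadata {
  content?: string;
}

// Before: metadata.content && metadata.content.includes('<Cards>')
// After: optional chaining short-circuits safely when content is undefined.
function hasCards(metadata: Metadata): boolean {
  return metadata.content?.includes('<Cards>') ?? false;
}
```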

🧰 Tools
🪛 Biome (1.9.4)

[error] 17-17: Change to an optional chain.

Unsafe fix: Change to an optional chain.

(lint/complexity/useOptionalChain)


13-56: Add unit tests for coverage.
The validateMetadata function is central to enforcing correct metadata schema. Consider adding dedicated tests to confirm that missing fields, invalid personas, or empty categories are appropriately flagged.

Do you want me to generate a skeleton test suite for this function?
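A hedged skeleton of such a suite; the stand-in below mirrors the behavior the review describes (flagging missing personas and empty categories), while the real `validateMetadata` lives in `utils/metadata-manager.ts` and may differ:

```typescript
// Assumed return shape based on the review discussion.
interface ValidationResult {
  isValid: boolean;
  errors: string[];
}

// Stand-in for the real function, so the test cases have something concrete to exercise.
function validateMetadata(frontmatter: Record<string, unknown>): ValidationResult {
  const errors: string[] = [];
  const personas = frontmatter.personas as unknown[] | undefined;
  const categories = frontmatter.categories as unknown[] | undefined;
  if (!personas || personas.length === 0) errors.push('at least one persona is required');
  if (!categories || categories.length === 0) errors.push('categories must not be empty');
  return { isValid: errors.length === 0, errors };
}

// Skeleton cases a dedicated suite could cover:
const missingEverything = validateMetadata({});
const validPage = validateMetadata({ personas: ['app-developer'], categories: ['protocol'] });
```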


utils/metadata-analyzer.ts (4)

128-504: Reduce complexity in detectCategories.
This function relies on numerous if blocks to handle diverse cases, making it tricky to maintain. Splitting this logic into smaller, focused helper functions or adopting a configuration-driven approach may improve readability and flexibility.


509-530: Consider dynamic weighting or a scoring config.
Hardcoded counts in countReferenceSignals can be brittle. Abstracting the scoring logic into a data-driven model (e.g., external config) makes it easier to fine-tune detection without changing the code.
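As an illustration of the data-driven approach (the signal names and weights here are invented for the sketch, not the actual rules in countReferenceSignals):

```typescript
// Each detection signal becomes a data entry; tuning weights means editing
// data (or an external config file), not code.
interface SignalRule {
  name: string;
  pattern: RegExp;
  weight: number;
}

const referenceSignals: SignalRule[] = [
  { name: 'tables', pattern: /\|.*\|/g, weight: 2 },
  { name: 'parameters', pattern: /^Parameters:/gm, weight: 3 },
];

function scoreContent(content: string, rules: SignalRule[]): number {
  return rules.reduce(
    (score, rule) => score + (content.match(rule.pattern) || []).length * rule.weight,
    0
  );
}
```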


567-627: Clear logic for content type detection.
The checks for landing pages, references, and tutorials are concise and well-structured. However, there are repeated references to specialized paths like overview or index. Consolidating these into a single helper method could streamline maintenance.


674-683: Offer a structured output format.
Currently, printSummary prints plain text. If you plan to consume these summaries in other tools or scripts, consider a machine-readable output format such as JSON.
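A small sketch of what that might look like; the summary fields and the JSON toggle are assumptions, not the CLI's actual interface:

```typescript
// Hypothetical summary shape; a flag (e.g. --json) could select the format.
interface RunSummary {
  totalFiles: number;
  needsCategoryReview: number;
}

function formatSummary(summary: RunSummary, asJson: boolean): string {
  if (asJson) return JSON.stringify(summary);
  return `Processed ${summary.totalFiles} files (${summary.needsCategoryReview} need category review)`;
}
```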

utils/types/metadata-types.ts (2)

9-10: Remove or guard debug logging in production.
Excessive console output, especially logging entire config objects, can clutter logs. Consider wrapping this in a debug flag or removing it in production builds.


55-62: Streamline checks with optional chaining.
You can simplify these nested checks by consistently using optional chaining (e.g., config?.metadata_rules?.persona?.validation_rules?.[0]?.enum). This reduces verbosity and risk of runtime errors.

🧰 Tools
🪛 Biome (1.9.4)

[error] 56-58: Change to an optional chain.

Unsafe fix: Change to an optional chain.

(lint/complexity/useOptionalChain)

utils/metadata-batch-cli.ts (2)

34-86: Remove unnecessary continue statements.

The continue statements on lines 62 and 78 are unnecessary as they are the last statements in their respective blocks.

Apply this diff to improve code clarity:

      } catch (e) {
-        continue
      }
🧰 Tools
🪛 Biome (1.9.4)

[error] 62-62: Unnecessary continue statement

Unsafe fix: Delete the unnecessary continue statement

(lint/correctness/noUnnecessaryContinue)


[error] 78-78: Unnecessary continue statement

Unsafe fix: Delete the unnecessary continue statement

(lint/correctness/noUnnecessaryContinue)


123-208: Consider breaking down the main function for better maintainability.

The main function is well-implemented but could be more maintainable by extracting the summary generation and printing logic into separate functions.

Consider refactoring like this:

+async function generateSummary(file: string, dryRun: boolean, verbose: boolean): Promise<ProcessingSummary> {
+  const result = await updateFrontmatter(file, dryRun, verbose)
+  const relativePath = path.relative(process.cwd(), file)
+  
+  const hasUncertainCategories = 
+    !result.categories.length || 
+    (result.categories.length === 1 && result.categories[0] === 'protocol')
+
+  return {
+    path: relativePath,
+    categories: result.categories,
+    uncertainCategories: hasUncertainCategories && !result.isImported,
+    contentType: result.contentType,
+    isImported: result.isImported
+  }
+}
+
+function printSummaries(summaries: ProcessingSummary[], dryRun: boolean): void {
+  console.log('\nProcessing Summary:')
+  console.log('==================')
+  
+  summaries.forEach(summary => {
+    console.log(`\n📄 ${colors.blue}${summary.path}${colors.reset}`)
+    console.log(`   Type: ${summary.contentType}`)
+    if (summary.isImported) {
+      console.log('   Status: Imported content')
+    }
+    console.log(`   Categories: ${summary.categories.join(', ') || 'none'}`)
+    
+    if (summary.uncertainCategories) {
+      console.log(`   ${colors.yellow}⚠️  Categories may need manual review${colors.reset}`)
+    }
+  })
+  
+  const needsReview = summaries.filter(s => s.uncertainCategories).length
+  const importedCount = summaries.filter(s => s.isImported).length
+  
+  console.log('\n=================')
+  console.log('Final Summary:')
+  console.log(`${colors.green}✓ Processed ${summaries.length} files${colors.reset}`)
+  if (importedCount > 0) {
+    console.log(`ℹ️  ${importedCount} imported files`)
+  }
+  if (needsReview > 0) {
+    console.log(`${colors.yellow}⚠️  ${needsReview} files may need category review${colors.reset}`)
+  }
+  if (dryRun) {
+    console.log(`${colors.blue}(Dry run - no changes made)${colors.reset}`)
+  }
+}
.circleci/config.yml (1)

74-81: New Step: Get Changed Files

The new step to extract changed files from a pull request is well implemented. It correctly checks for the presence of the CIRCLE_PULL_REQUEST variable, extracts the PR number, and filters for .mdx files.

Suggestion: It would be beneficial to add a brief comment explaining the rationale or handling possible edge cases (e.g., if the API call fails) in the command block.

notes/metadata-update.md (3)

13-27: Using the Scripts Section: List Indentation

The “Using the Scripts” section is informative; however, there are some minor markdown lint issues—specifically, the unordered list items (e.g., on lines 16 and 20) appear to have inconsistent indentation.

Suggestion: Adjust the indentation to conform with markdownlint MD007 guidelines for cleaner formatting.

🧰 Tools
🪛 markdownlint-cli2 (0.17.2)

20-20: Unordered list indentation
Expected: 0; Actual: 3

(MD007, ul-indent)


24-24: Unordered list indentation
Expected: 0; Actual: 3

(MD007, ul-indent)


40-90: Configuration Files Block Formatting

This section, which introduces the keywords.config.yaml sample, is very detailed and useful.

Suggestion: Please check for any trailing whitespace (as flagged by static analysis) and remove it to ensure consistency. A minor improvement would be to add a comment above the YAML code block to indicate that it is an example configuration.


181-195: Automated PR Checks – CodeRabbit Review Section

The “Automated PR Checks” section is very thorough. In the “CodeRabbit Review” bullet, the heading “Review” is immediately followed by the verb “Reviews,” which reads as repetition.

Suggestion: Consider rephrasing to prevent redundant language, e.g., "Validates that required fields are present in frontmatter" or similar, to improve clarity.

🧰 Tools
🪛 LanguageTool

[grammar] ~191-~191: You’ve repeated a verb. Did you mean to only write one of them?
Context: ... validate-pr-metadata` 2. CodeRabbit Review * Reviews frontmatter in modified files * Chec...

(REPEATED_VERBS)

keywords.config.yaml (4)

1-17: Metadata Configuration: Persona Section

The persona section is clear and enforces that at least one valid persona is selected. The structure and descriptions are effective in communicating the rules.

Note: Be sure to remove any trailing whitespace from these lines (as indicated by static analysis) to maintain a clean YAML file.


18-31: Content Type Section – Spelling and Clarity

The content type block correctly lists all valid content types. However, on line 25, the comment “concept exlainers” contains a typo; it should be “concept explainers.”


45-90: Categories Section Formatting and Trailing Whitespace

This section is comprehensive with a long list of valid category values. The structure is clear; however, multiple lines in this block have trailing spaces (as pointed out by YAMLlint).

Suggestion: Remove the trailing whitespace from all affected lines to adhere to formatting standards. For example, you can update lines such as 59, 60, 63, 72, etc., by trimming the extra spaces.

🧰 Tools
🪛 YAMLlint (1.35.1)

[error] 59-59: trailing spaces

(trailing-spaces)


[error] 60-60: trailing spaces

(trailing-spaces)


[error] 63-63: trailing spaces

(trailing-spaces)


[error] 70-70: trailing spaces

(trailing-spaces)


[error] 72-72: trailing spaces

(trailing-spaces)


[error] 79-79: trailing spaces

(trailing-spaces)


180-237: Timeframe Section – Regex and File Ending

The timeframe section covers various versioning and date formats well. A couple of points to consider:

  1. Season Format Regex:
    The current pattern on line 203 is ^S[6-9]|S[1-9][0-9]+$. This may not be fully anchored as intended. Wrapping the alternatives in parentheses (i.e. ^(S[6-9]|S[1-9][0-9]+)$) would ensure that the entire string is validated correctly.

  2. Newline at End-of-File:
    The static analysis also flagged that there is no new line character at the end of the file. Please add a newline at the end.
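The anchoring issue in point 1 can be shown directly with the two patterns from the comment above:

```typescript
// Unanchored: parses as (^S[6-9]) OR (S[1-9][0-9]+$), so each alternative
// only pins one end of the string.
const loose = /^S[6-9]|S[1-9][0-9]+$/;
// Grouped: the whole string must match one alternative.
const strict = /^(S[6-9]|S[1-9][0-9]+)$/;

const looseAcceptsTrailingJunk = loose.test('S7-extra'); // '^S[6-9]' matches the prefix
const strictRejectsTrailingJunk = !strict.test('S7-extra');
const bothAcceptValid = strict.test('S7') && strict.test('S10');
```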

🧰 Tools
🪛 YAMLlint (1.35.1)

[error] 181-181: trailing spaces

(trailing-spaces)


[error] 237-237: no new line character at the end of file

(new-line-at-end-of-file)

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 957ea3d and 6a577c7.

⛔ Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (11)
  • .circleci/config.yml (1 hunks)
  • .coderabbit.yaml (1 hunks)
  • keywords.config.yaml (1 hunks)
  • notes/README.md (1 hunks)
  • notes/metadata-update.md (1 hunks)
  • package.json (4 hunks)
  • tsconfig.json (2 hunks)
  • utils/metadata-analyzer.ts (1 hunks)
  • utils/metadata-batch-cli.ts (1 hunks)
  • utils/metadata-manager.ts (1 hunks)
  • utils/types/metadata-types.ts (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • notes/README.md
🧰 Additional context used
🪛 YAMLlint (1.35.1)
keywords.config.yaml

[error] 59-59: trailing spaces

(trailing-spaces)


[error] 60-60: trailing spaces

(trailing-spaces)


[error] 63-63: trailing spaces

(trailing-spaces)


[error] 70-70: trailing spaces

(trailing-spaces)


[error] 72-72: trailing spaces

(trailing-spaces)


[error] 79-79: trailing spaces

(trailing-spaces)


[error] 92-92: trailing spaces

(trailing-spaces)


[error] 103-103: trailing spaces

(trailing-spaces)


[error] 111-111: trailing spaces

(trailing-spaces)


[error] 119-119: trailing spaces

(trailing-spaces)


[error] 135-135: trailing spaces

(trailing-spaces)


[error] 146-146: trailing spaces

(trailing-spaces)


[error] 159-159: trailing spaces

(trailing-spaces)


[error] 173-173: trailing spaces

(trailing-spaces)


[error] 181-181: trailing spaces

(trailing-spaces)


[error] 237-237: no new line character at the end of file

(new-line-at-end-of-file)

🪛 LanguageTool
notes/metadata-update.md

[uncategorized] ~175-~175: Loose punctuation mark.
Context: ...ion Files * utils/metadata-manager.ts: Main metadata management system * `util...

(UNLIKELY_OPENING_PUNCTUATION)


[grammar] ~191-~191: You’ve repeated a verb. Did you mean to only write one of them?
Context: ... validate-pr-metadata` 2. CodeRabbit Review * Reviews frontmatter in modified files * Chec...

(REPEATED_VERBS)

🪛 markdownlint-cli2 (0.17.2)
notes/metadata-update.md

20-20: Unordered list indentation
Expected: 0; Actual: 3

(MD007, ul-indent)


24-24: Unordered list indentation
Expected: 0; Actual: 3

(MD007, ul-indent)

🪛 Biome (1.9.4)
utils/metadata-batch-cli.ts

[error] 62-62: Unnecessary continue statement

Unsafe fix: Delete the unnecessary continue statement

(lint/correctness/noUnnecessaryContinue)


[error] 78-78: Unnecessary continue statement

Unsafe fix: Delete the unnecessary continue statement

(lint/correctness/noUnnecessaryContinue)

utils/metadata-manager.ts

[error] 17-17: Change to an optional chain.

Unsafe fix: Change to an optional chain.

(lint/complexity/useOptionalChain)

utils/types/metadata-types.ts

[error] 56-58: Change to an optional chain.

Unsafe fix: Change to an optional chain.

(lint/complexity/useOptionalChain)

⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: Redirect rules - docs-optimism
  • GitHub Check: Header rules - docs-optimism
  • GitHub Check: Pages changed - docs-optimism
🔇 Additional comments (12)
utils/metadata-manager.ts (2)

130-159: Validate environment variable usage.
CHANGED_FILES might be undefined or empty. Confirm that the pipeline sets this variable reliably, and consider handling undefined cases gracefully.
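A defensive reader along those lines might look like this; only the `CHANGED_FILES` variable name comes from the pipeline, while the helper and its filtering are illustrative:

```typescript
// Treat a missing or empty CHANGED_FILES as "nothing to validate" rather
// than crashing on undefined.
function getChangedMdxFiles(env: Record<string, string | undefined>): string[] {
  const raw = env.CHANGED_FILES;
  if (!raw || raw.trim() === '') {
    return [];
  }
  return raw
    .split(/\s+/)
    .filter((file) => file.endsWith('.mdx'));
}
```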


160-168: Great CLI integration!
The inclusion of a direct CLI option via --pr is convenient for automated CI usage, and the fallback to process.exit ensures a clear pass/fail outcome.

utils/types/metadata-types.ts (1)

68-71: Consider dynamic reload of config.
These constants are loaded once at startup. If keywords.config.yaml is updated, your service won't see changes unless it's restarted. Document this or implement a reloading mechanism if live updates are needed.

utils/metadata-batch-cli.ts (3)

1-9: LGTM! Well-structured imports and script setup.

The script correctly uses ES module imports and includes all necessary dependencies for file operations and metadata management.


10-32: LGTM! Well-defined interfaces and constants.

The interfaces are properly typed, and the color constants improve code readability and maintainability.


88-121: LGTM! Well-implemented frontmatter update function.

The function has proper error handling, supports dry run and verbose modes, and returns a well-defined type.

tsconfig.json (1)

9-9: LGTM! Proper TypeScript configuration for ES modules.

The changes correctly enable ES module support and synthetic default imports, which are necessary for the new metadata management scripts.

Also applies to: 12-12, 19-19, 29-33

.coderabbit.yaml (1)

13-40: LGTM! Comprehensive metadata validation requirements.

The instructions clearly define the required metadata fields for both regular and landing pages, and provide helpful guidance for fixing missing metadata.

package.json (1)

5-5: LGTM! Well-configured package for metadata management.

The changes properly set up ES modules support and add necessary scripts and dependencies for metadata management.

Also applies to: 7-8, 17-21, 38-38, 54-54

.circleci/config.yml (2)

69-69: Clarify Lint Job Description

The updated description now indicates that the job both lints Markdown files and validates metadata. This clear description helps future maintainers understand the expanded responsibilities of the job.


85-87: New Step: Validate Metadata

The additional step running pnpm validate-pr-metadata is a clean integration point for ensuring metadata validity. This aligns perfectly with the PR’s objective of enhancing metadata management and linting.

notes/metadata-update.md (1)

1-12: Comprehensive Introduction

The new file starts with a clear title and a brief overview of what the metadata management system does. This introduction is concise and sets the stage well for the detailed instructions that follow.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (1)
utils/metadata-analyzer.ts (1)

589-637: 🛠️ Refactor suggestion

Replace global variables with a proper state management approach.

Using global variables (global.totalFiles, global.filesNeedingCategoryReview, etc.) for tracking state is error-prone. Consider using a class or context object to manage state.

+interface AnalysisState {
+  totalFiles: number;
+  filesNeedingCategoryReview: number;
+  filesNeedingContentTypeReview: number;
+}
+
+class MetadataAnalyzer {
+  private state: AnalysisState = {
+    totalFiles: 0,
+    filesNeedingCategoryReview: 0,
+    filesNeedingContentTypeReview: 0
+  };
+
+  analyzeContent(content: string, filepath: string, verbose: boolean = false): MetadataResult {
     // ... existing implementation ...
+    this.state.totalFiles++;
+  }
+
+  printSummary(): void {
     console.log(`
     Final Summary:
-    ✓ Processed ${global.totalFiles} files
-    ⚠️  ${global.filesNeedingCategoryReview || 0} files may need category review
-    ⚠️  ${global.filesNeedingContentTypeReview || 0} files may need content type review
+    ✓ Processed ${this.state.totalFiles} files
+    ⚠️  ${this.state.filesNeedingCategoryReview} files may need category review
+    ⚠️  ${this.state.filesNeedingContentTypeReview} files may need content type review
     (Dry run - no changes made)
     `);
   }
+}
🧹 Nitpick comments (10)
utils/metadata-manager.ts (2)

21-24: Consider caching and stronger typing for the configuration.

You are reloading and parsing the config on every validation, which could affect performance. Also, return a stricter type from loadConfig instead of Promise<any> to avoid potential runtime errors.

Also applies to: 229-233
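One way to sketch the caching half, with the actual YAML parsing injected rather than assumed (the real loader lives in the project):

```typescript
// Wrap any async loader so it runs at most once; subsequent calls reuse the
// in-flight or resolved promise instead of re-reading and re-parsing the file.
function memoizeAsync<T>(load: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined;
  return () => {
    if (!cached) {
      cached = load();
    }
    return cached;
  };
}
```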


182-189: Leverage concurrency for better performance.

You could handle file validations in parallel with Promise.all, improving speed when many .mdx files are changed.

- for (const file of modifiedFiles) {
-   const result = await updateMetadata(file, { validateOnly: true, prMode: true })
-   ...
- }
+ await Promise.all(modifiedFiles.map(async (file) => {
+   const result = await updateMetadata(file, { validateOnly: true, prMode: true })
+   if (!result.isValid) {
+     ...
+   }
+ }));
utils/types/metadata-types.ts (2)

5-8: Consolidate config loading.

You load keywords.config.yaml here and also in metadata-manager.ts. Refactor to maintain a single entry point for reading and parsing the YAML configuration.


10-18: Apply optional chaining per static analysis recommendation.

Improve readability by using optional chains instead of multiple && checks. For instance:

-function isValidConfig(config: any): config is MetadataConfig {
-  return (
-    config &&
-    config.metadata_rules &&
-    config.metadata_rules.persona?.validation_rules?.[0]?.enum &&
-    config.metadata_rules.content_type?.validation_rules?.[0]?.enum &&
-    config.metadata_rules.categories?.values
-  )
+function isValidConfig(config: any): config is MetadataConfig {
+  return !!(
+    config?.metadata_rules?.persona?.validation_rules?.[0]?.enum &&
+    config?.metadata_rules?.content_type?.validation_rules?.[0]?.enum &&
+    config?.metadata_rules?.categories?.values
+  )
}
🧰 Tools
🪛 Biome (1.9.4)

[error] 12-14: Change to an optional chain.

Unsafe fix: Change to an optional chain.

(lint/complexity/useOptionalChain)

utils/breadcrumbs.ts (1)

45-61: Optimize file reading with concurrency.

You can gather file reads in parallel with Promise.all to reduce overall execution time, especially for directories with many files.

- for (const file of files) {
-   ...
-   const content = await fs.promises.readFile(filePath, 'utf-8');
-   ...
- }
+ await Promise.all(files.map(async (file) => {
+   ...
+   const content = await fs.promises.readFile(filePath, 'utf-8');
+   ...
+ }));
utils/metadata-batch-cli.ts (3)

9-10: Consider using a type-safe approach for glob import.

Instead of using @ts-ignore, consider installing and using @types/glob for proper TypeScript support.

-// @ts-ignore
-const globModule = await import('glob')
+import type { glob } from 'glob'
+const globModule = await import('glob') as { default: typeof glob }

43-84: Reduce code duplication in findParentMetadata.

The file reading and metadata extraction logic is duplicated. Consider extracting it into a helper function.

+async function readMetadataFromFile(filePath: string): Promise<ParentMetadata | null> {
+  try {
+    const content = await fs.readFile(filePath, 'utf8')
+    const { data } = matter(content)
+    return {
+      path: filePath,
+      categories: data.categories || []
+    }
+  } catch (e) {
+    return null
+  }
+}

 async function findParentMetadata(filePath: string): Promise<ParentMetadata | null> {
   try {
     const dir = path.dirname(filePath)
     const parentFiles = ['index.mdx', 'README.mdx']
     
     // Try same directory first
     for (const file of parentFiles) {
       const sameDirPath = path.join(dir, file)
-      try {
-        const content = await fs.readFile(sameDirPath, 'utf8')
-        const { data } = matter(content)
-        return {
-          path: sameDirPath,
-          categories: data.categories || []
-        }
-      } catch (e) {
+      const metadata = await readMetadataFromFile(sameDirPath)
+      if (metadata) {
+        return metadata
+      } else {
         continue
       }
     }
     
     // Try parent directory
     const parentDir = path.dirname(dir)
     for (const file of parentFiles) {
       const parentPath = path.join(parentDir, file)
-      try {
-        const content = await fs.readFile(parentPath, 'utf8')
-        const { data } = matter(content)
-        return {
-          path: parentPath,
-          categories: data.categories || []
-        }
-      } catch (e) {
+      const metadata = await readMetadataFromFile(parentPath)
+      if (metadata) {
+        return metadata
+      } else {
         continue
       }
     }
     
     return null
   } catch (e) {
     return null
   }
 }
🧰 Tools
🪛 Biome (1.9.4)

[error] 60-60: Unnecessary continue statement

Unsafe fix: Delete the unnecessary continue statement

(lint/correctness/noUnnecessaryContinue)


[error] 76-76: Unnecessary continue statement

Unsafe fix: Delete the unnecessary continue statement

(lint/correctness/noUnnecessaryContinue)


121-149: Consider adding batch processing for better performance.

For large sets of files, processing them sequentially might be slow. Consider implementing concurrent processing with a batch size limit.

 async function processFiles(files: string[]): Promise<boolean> {
   let hasErrors = false
   let processedCount = 0
+  const BATCH_SIZE = 5
 
-  for (const file of files) {
+  // Process files in batches
+  for (let i = 0; i < files.length; i += BATCH_SIZE) {
+    const batch = files.slice(i, i + BATCH_SIZE)
+    const results = await Promise.all(batch.map(async (file) => {
       try {
         const result = await updateMetadata(file, { dryRun: true, prMode: true })
         if (!result.isValid) {
           hasErrors = true
           console.log(`\n${colors.red}Error in ${file}:${colors.reset}`)
           result.errors.forEach(error => {
             console.log(`  ${colors.yellow}→${colors.reset} ${error}`)
           })
         }
-        processedCount++
+        return { success: true }
       } catch (e) {
         console.log(`\n${colors.red}Failed to process ${file}: ${e}${colors.reset}`)
-        hasErrors = true
+        return { success: false }
       }
-    }
+    }))
+    
+    processedCount += results.filter(r => r.success).length
+    hasErrors = hasErrors || results.some(r => !r.success)
+  }
utils/metadata-analyzer.ts (1)

498-551: Optimize regular expressions for better performance.

Some regex patterns could be compiled once and reused. Also, consider using more efficient patterns.

+// Compile regex patterns once
+const REFERENCE_PATTERNS = {
+  tables: /\|.*\|/g,
+  codeBlocks: /```[^`]+```/g,
+  specialSyntax: /\{.*?\}/g,
+  technicalTerms: /\b(API|RPC|SDK|JSON|HTTP|URL|ETH)\b/g,
+  parameters: /^Parameters:/gm,
+  returns: /^Returns:/gm,
+  type: /^Type:/gm,
+  endpoint: /^Endpoint:/gm,
+  endpoints: /\bend?points?\b/gi
+};

 function countReferenceSignals(content: string, filepath: string): number {
   let score = 0;
   
   // Technical content indicators
-  score += (content.match(/\|.*\|/g) || []).length * 2; // Tables
+  score += (content.match(REFERENCE_PATTERNS.tables) || []).length * 2; // Tables
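A small self-contained sketch of the precompiled-pattern approach (the `countTables` helper is hypothetical; the pattern matches the `tables` entry in the suggested `REFERENCE_PATTERNS` object):

```typescript
// Compile once at module scope rather than rebuilding the regex per call.
const TABLE_ROW = /\|.*\|/g

function countTables(content: string): number {
  // String.prototype.match with a /g regex returns all matches (or null),
  // so no manual lastIndex bookkeeping is needed here.
  return (content.match(TABLE_ROW) ?? []).length
}
```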
package.json (1)

7-19: Enhance script robustness with error handling.

The scripts using git diff and environment variables could fail silently. Consider adding error handling and validation.

-    "validate-metadata": "CHANGED_FILES=$(git diff --name-only HEAD) node --loader ts-node/esm utils/metadata-manager.ts",
+    "validate-metadata": "CHANGED_FILES=$(git diff --name-only HEAD || echo '') node --loader ts-node/esm utils/metadata-manager.ts",

Also, consider adding a pre and post script for the metadata batch operations to ensure proper setup and cleanup.
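On the consuming side, the script can also tolerate an unset or empty `CHANGED_FILES` (as happens when the `git diff` fallback echoes an empty string). A hedged sketch, with a hypothetical helper name:

```typescript
// Defensive parse of the CHANGED_FILES environment variable: handle
// undefined/empty input and keep only .mdx files.
function parseChangedFiles(raw: string | undefined): string[] {
  return (raw ?? '')
    .split('\n')
    .map((f) => f.trim())
    .filter((f) => f.endsWith('.mdx'))
}
```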

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6a577c7 and 7fcd6dd.

⛔ Files ignored due to path filters (1)
  • dist/tsconfig.tsbuildinfo is excluded by !**/dist/**
📒 Files selected for processing (6)
  • package.json (3 hunks)
  • utils/breadcrumbs.ts (6 hunks)
  • utils/metadata-analyzer.ts (1 hunks)
  • utils/metadata-batch-cli.ts (1 hunks)
  • utils/metadata-manager.ts (1 hunks)
  • utils/types/metadata-types.ts (1 hunks)
🧰 Additional context used
🪛 Biome (1.9.4)
utils/types/metadata-types.ts

[error] 12-14: Change to an optional chain.

Unsafe fix: Change to an optional chain.

(lint/complexity/useOptionalChain)

utils/metadata-batch-cli.ts

[error] 60-60: Unnecessary continue statement

Unsafe fix: Delete the unnecessary continue statement

(lint/correctness/noUnnecessaryContinue)


[error] 76-76: Unnecessary continue statement

Unsafe fix: Delete the unnecessary continue statement

(lint/correctness/noUnnecessaryContinue)

⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: Redirect rules - docs-optimism
  • GitHub Check: Header rules - docs-optimism
  • GitHub Check: Pages changed - docs-optimism
🔇 Additional comments (6)
utils/metadata-manager.ts (2)

134-148: Great job implementing atomic writes!

Using a temporary file and then renaming it is a proper way to avoid partial writes if an error occurs. This effectively addresses the previous suggestion to ensure file consistency.
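The write-then-rename pattern being praised here can be sketched as a standalone helper (the `atomicWrite` name and temp-file naming scheme are illustrative, not the PR's exact implementation):

```typescript
import { promises as fs } from 'node:fs'
import path from 'node:path'

// Write new content to a temp file in the same directory, then rename it
// over the target. rename(2) is atomic on POSIX within one filesystem, so
// readers never observe a half-written file.
async function atomicWrite(target: string, content: string): Promise<void> {
  const tmp = path.join(path.dirname(target), `.${path.basename(target)}.tmp`)
  await fs.writeFile(tmp, content, 'utf8')
  await fs.rename(tmp, target)
}
```

Keeping the temp file in the same directory (rather than `os.tmpdir()`) matters: a cross-filesystem rename would fall back to copy + delete and lose atomicity.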


169-176:

❓ Verification inconclusive

Validate that resolved paths remain within the repository.

Untrusted or malformed CHANGED_FILES could resolve to paths outside the repo. Consider restricting paths to a known safe location.

To automate, you can run a shell script confirming that all resolved paths share the project root:


🏁 Script executed:

#!/bin/bash
# Ensure all resolved .mdx files are within the repository directory
REPO_ROOT="$(pwd)"
while read -r FILE; do
  FULL_PATH="$(realpath "$FILE")"
  if [[ "$FULL_PATH" != "$REPO_ROOT"* ]]; then
    echo "Potentially unsafe path: $FULL_PATH"
    exit 1
  fi
done < <(echo "$CHANGED_FILES" | tr '\n' ' ' | xargs -n1 echo | grep '\.mdx$')

Length of output: 208


Attention: Ensure Resolved Paths Remain Within the Repository

The function in utils/metadata-manager.ts (lines 169–176) processes the CHANGED_FILES variable by resolving each .mdx file’s absolute path. However, our validation shell script encountered an error related to process substitution (/dev/fd/63: No such file or directory), which prevents clear confirmation that all resolved paths are safely constrained within the repository root.

  • Verify manually that the environment running this script supports process substitution.
  • Ensure that the resolved paths from CHANGED_FILES do not escape the repository (i.e., they all begin with the repository’s root path).
  • Consider revising the safety check to use a more robust method or environment-independent approach if needed.
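One environment-independent alternative to the shell check is to do the containment test in Node itself. A hedged sketch (helper name hypothetical):

```typescript
import path from 'node:path'

// True if `file`, resolved against the repo root, stays inside the root.
// Using path.relative avoids prefix-string pitfalls like '/repo' matching
// '/repo-other/file'.
function isInsideRepo(repoRoot: string, file: string): boolean {
  const root = path.resolve(repoRoot)
  const full = path.resolve(root, file)
  const rel = path.relative(root, full)
  return rel !== '' && !rel.startsWith('..') && !path.isAbsolute(rel)
}
```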
utils/breadcrumbs.ts (1)

7-10: ESM setup looks good.

Nicely done converting to ES modules using fileURLToPath and path.dirname to replicate __filename and __dirname.
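For reference, the shim in question looks like this. Since `import.meta.url` only exists inside an ES module, the sketch takes the module URL as a parameter and a sample POSIX file URL stands in for it:

```typescript
import { fileURLToPath } from 'node:url'
import path from 'node:path'

// Reconstruct CommonJS-style __filename/__dirname from a module's URL.
// In a real ES module you would pass import.meta.url.
function dirnameFromModuleUrl(moduleUrl: string): string {
  const filename = fileURLToPath(new URL(moduleUrl))
  return path.dirname(filename)
}
```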

utils/metadata-batch-cli.ts (1)

86-119: Well-structured implementation with proper error handling and logging!

The function effectively handles different modes (dry run, verbose) and provides clear error messages.

utils/metadata-analyzer.ts (1)

36-75: Avoid using a global Set for state management.

Relying on a module-scoped detectedPages can lead to unintended side effects if the script is reused or extended. Consider making this state local or passing it through function parameters to keep the logic deterministic.

package.json (1)

36-36: Dependencies are properly specified with pinned versions.

The addition of js-yaml and its type definitions is well-structured and follows best practices.

Also applies to: 52-52

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
utils/metadata-analyzer.ts (1)

501-510: 🛠️ Refactor suggestion

Avoid using global variables for state management.

The use of global variables (global.totalFiles, global.filesNeedingCategoryReview, global.filesNeedingContentTypeReview) makes the code harder to test and can lead to unexpected behavior when the module is reused.

Consider passing these statistics as parameters or encapsulating them in a class:

-export function printSummary(): void {
+export function printSummary(stats: { 
+  totalFiles: number,
+  filesNeedingCategoryReview: number,
+  filesNeedingContentTypeReview: number
+}): void {
   console.log(`
 Final Summary:
-✓ Processed ${global.totalFiles} files
-⚠️  ${global.filesNeedingCategoryReview || 0} files may need category review
-⚠️  ${global.filesNeedingContentTypeReview || 0} files may need content type review
+✓ Processed ${stats.totalFiles} files
+⚠️  ${stats.filesNeedingCategoryReview} files may need category review
+⚠️  ${stats.filesNeedingContentTypeReview} files may need content type review
 (Dry run - no changes made)
 `);
 }
🧹 Nitpick comments (9)
utils/metadata-batch-cli.ts (5)

9-10: Replace @ts-ignore with proper typing.

Instead of suppressing TypeScript errors, consider adding proper type definitions for the glob module.

-// @ts-ignore
-const globModule = await import('glob')
+import type { glob } from 'glob'
+const globModule = await import('glob') as { glob: typeof glob }

43-84: Optimize parent metadata lookup.

The function can be improved in two ways:

  1. Remove unnecessary continue statements in catch blocks
  2. Use Promise.any for concurrent file checks
 async function findParentMetadata(filePath: string): Promise<ParentMetadata | null> {
   try {
     const dir = path.dirname(filePath)
     const parentFiles = ['index.mdx', 'README.mdx']
     
-    // Try same directory first
-    for (const file of parentFiles) {
-      const sameDirPath = path.join(dir, file)
-      try {
-        const content = await fs.readFile(sameDirPath, 'utf8')
-        const { data } = matter(content)
-        return {
-          path: sameDirPath,
-          categories: data.categories || []
-        }
-      } catch (e) {
-        continue
-      }
-    }
+    // Try files in same directory concurrently
+    const sameDirPromises = parentFiles.map(async file => {
+      const sameDirPath = path.join(dir, file)
+      const content = await fs.readFile(sameDirPath, 'utf8')
+      const { data } = matter(content)
+      return {
+        path: sameDirPath,
+        categories: data.categories || []
+      }
+    })
+    
+    try {
+      return await Promise.any(sameDirPromises)
+    } catch {
+      // Try parent directory if same directory fails
+      const parentDir = path.dirname(dir)
+      const parentPromises = parentFiles.map(async file => {
+        const parentPath = path.join(parentDir, file)
+        const content = await fs.readFile(parentPath, 'utf8')
+        const { data } = matter(content)
+        return {
+          path: parentPath,
+          categories: data.categories || []
+        }
+      })
     
-    // Try parent directory
-    const parentDir = path.dirname(dir)
-    for (const file of parentFiles) {
-      const parentPath = path.join(parentDir, file)
       try {
-        const content = await fs.readFile(parentPath, 'utf8')
-        const { data } = matter(content)
-        return {
-          path: parentPath,
-          categories: data.categories || []
-        }
-      } catch (e) {
-        continue
+        return await Promise.any(parentPromises)
+      } catch {
+        return null
       }
     }
-    
-    return null
   } catch (e) {
     return null
   }
 }
🧰 Tools
🪛 Biome (1.9.4)

[error] 60-60: Unnecessary continue statement

Unsafe fix: Delete the unnecessary continue statement

(lint/correctness/noUnnecessaryContinue)


[error] 76-76: Unnecessary continue statement

Unsafe fix: Delete the unnecessary continue statement

(lint/correctness/noUnnecessaryContinue)


86-90: Improve type safety with a dedicated interface.

Define an interface for the return type to improve type safety and documentation.

+interface FrontmatterUpdateResult {
+  categories: string[]
+  contentType: string
+  isImported: boolean
+}
+
-async function updateFrontmatter(filePath: string, dryRun: boolean = false, verbose: boolean = false): Promise<{
-  categories: string[]
-  contentType: string
-  isImported: boolean
-}> {
+async function updateFrontmatter(filePath: string, dryRun: boolean = false, verbose: boolean = false): Promise<FrontmatterUpdateResult> {

121-147: Enhance error handling with custom error types.

Consider creating custom error types for better error categorization and handling.

+enum FileValidationErrorType {
+  NOT_FOUND = 'NOT_FOUND',
+  NOT_FILE = 'NOT_FILE',
+  NO_ACCESS = 'NO_ACCESS'
+}
+
+interface FileValidationError {
+  type: FileValidationErrorType
+  path: string
+  message: string
+}

 async function validateFilePaths(files: string[]): Promise<string[]> {
   const validFiles = []
-  const errors = []
+  const errors: FileValidationError[] = []

   for (const file of files) {
     try {
       await fs.access(file, fs.constants.R_OK)
       const stats = await fs.stat(file)
       if (stats.isFile()) {
         validFiles.push(file)
       } else {
-        errors.push(`${file} is not a file`)
+        errors.push({
+          type: FileValidationErrorType.NOT_FILE,
+          path: file,
+          message: 'Path exists but is not a file'
+        })
       }
     } catch (error) {
-      errors.push(`Cannot access ${file}: ${error.message}`)
+      errors.push({
+        type: error.code === 'ENOENT' ? FileValidationErrorType.NOT_FOUND : FileValidationErrorType.NO_ACCESS,
+        path: file,
+        message: error.message
+      })
     }
   }

   if (errors.length > 0) {
     console.log(`${colors.yellow}Warning: Some files were skipped:${colors.reset}`)
-    errors.forEach(error => console.log(`  ${colors.yellow}→${colors.reset} ${error}`))
+    errors.forEach(error => console.log(`  ${colors.yellow}→${colors.reset} ${error.path}: ${error.message}`))
   }

   return validFiles
 }

149-177: Optimize file processing with parallel execution.

Consider processing files concurrently for better performance, while maintaining organized error reporting.

 async function processFiles(files: string[]): Promise<boolean> {
   let hasErrors = false
-  let processedCount = 0
+  const results = await Promise.allSettled(
+    files.map(async file => {
+      try {
+        const result = await updateMetadata(file, { dryRun: true, prMode: true })
+        return { file, result }
+      } catch (e) {
+        throw { file, error: e }
+      }
+    })
+  )

-  for (const file of files) {
-    try {
-      const result = await updateMetadata(file, { dryRun: true, prMode: true })
-      if (!result.isValid) {
-        hasErrors = true
-        console.log(`\n${colors.red}Error in ${file}:${colors.reset}`)
-        result.errors.forEach(error => {
-          console.log(`  ${colors.yellow}→${colors.reset} ${error}`)
-        })
-      }
-      processedCount++
-    } catch (e) {
-      console.log(`\n${colors.red}Failed to process ${file}: ${e}${colors.reset}`)
-      hasErrors = true
-    }
-  }
+  const processedCount = results.length
+  const errors = results.reduce((acc, result) => {
+    if (result.status === 'rejected') {
+      hasErrors = true
+      console.log(`\n${colors.red}Failed to process ${result.reason.file}: ${result.reason.error}${colors.reset}`)
+      return acc + 1
+    } else if (!result.value.result.isValid) {
+      hasErrors = true
+      console.log(`\n${colors.red}Error in ${result.value.file}:${colors.reset}`)
+      result.value.result.errors.forEach(error => {
+        console.log(`  ${colors.yellow}→${colors.reset} ${error}`)
+      })
+      return acc + 1
+    }
+    return acc
+  }, 0)

   console.log(
     hasErrors
-      ? `\n${colors.red}✖ Found metadata issues in some files${colors.reset}`
+      ? `\n${colors.red}✖ Found metadata issues in ${errors} of ${processedCount} files${colors.reset}`
       : `\n${colors.green}✓ Validated ${processedCount} files successfully${colors.reset}`
   )

   return hasErrors
 }
package.json (1)

7-20: Add script documentation in package.json.

Consider adding a "scripts-info" section to document the purpose and usage of each script, especially the new metadata-related scripts.

   "scripts": {
     "lint": "eslint . --ext mdx --max-warnings 0 && pnpm spellcheck:lint && pnpm check-breadcrumbs && pnpm check-redirects && pnpm validate-metadata",
     "fix": "eslint . --ext mdx --fix && pnpm spellcheck:fix && pnpm breadcrumbs && pnpm fix-redirects && pnpm metadata-batch-cli",
     ...
   },
+  "scripts-info": {
+    "metadata-batch-cli": "Process and update metadata in MDX files",
+    "metadata-batch-cli:dry": "Preview metadata changes without applying them",
+    "metadata-batch-cli:verbose": "Process metadata with detailed logging",
+    "validate-metadata": "Validate metadata in changed files",
+    "validate-pr-metadata": "Validate metadata in PR files"
+  },
utils/metadata-analyzer.ts (3)

36-75: Eliminate side effects in isLandingPage function.

The function modifies the passed detectedPages Set, which is a side effect that makes the function harder to test and reason about.

Consider returning both the boolean result and the updated Set:

-function isLandingPage(content: string, filepath: string, detectedPages: Set<string>): boolean {
+function isLandingPage(content: string, filepath: string, detectedPages: Set<string>): 
+  { isLanding: boolean; updatedPages: Set<string> } {
   // ... existing checks ...
 
   const isLanding = (isOperatorLanding || (isOverviewPage && hasMultipleCards)) && !isTooDeep && !notLandingPage;
 
+  const updatedPages = new Set(detectedPages);
   if (isLanding) {
-    detectedPages.add(filepath);
+    updatedPages.add(filepath);
     console.log(`Landing page detected: ${filepath}`);
   }
 
-  return isLanding;
+  return { isLanding, updatedPages };
 }

331-364: Extract magic numbers and improve category prioritization.

The category limit of 5 and the priority order array should be configuration constants.

Consider extracting these as configurable constants:

+const MAX_CATEGORIES = 5;
+const CATEGORY_PRIORITY_ORDER = [
+  'protocol',
+  'infrastructure',
+  // ... rest of the categories
+] as const;

 // Limit to 5 most relevant categories
-const priorityOrder = [
-  'protocol',
-  'infrastructure',
-  // ... rest of the categories
-];

 const sortedCategories = Array.from(categories)
-  .sort((a, b) => priorityOrder.indexOf(a) - priorityOrder.indexOf(b))
-  .slice(0, 5);
+  .sort((a, b) => CATEGORY_PRIORITY_ORDER.indexOf(a) - CATEGORY_PRIORITY_ORDER.indexOf(b))
+  .slice(0, MAX_CATEGORIES);

371-424: Improve type safety and configurability of content scoring.

The scoring weights are hardcoded and scattered throughout the functions. Consider making them configurable and type-safe.

Consider using an enum or const object for scoring weights:

+const CONTENT_SCORES = {
+  TABLE_WEIGHT: 2,
+  CODE_BLOCK_WEIGHT: 1,
+  SPECIAL_SYNTAX_WEIGHT: 1,
+  TECHNICAL_TERM_WEIGHT: 1,
+  PARAMETER_WEIGHT: 3,
+  RETURN_WEIGHT: 3,
+  TYPE_WEIGHT: 2,
+  ENDPOINT_WEIGHT: 2,
+  API_REFERENCE_WEIGHT: 5
+} as const;

 function countReferenceSignals(content: string, filepath: string): number {
   let score = 0;
   
-  score += (content.match(/\|.*\|/g) || []).length * 2; // Tables
-  score += (content.match(/```[^`]+```/g) || []).length; // Code blocks
+  score += (content.match(/\|.*\|/g) || []).length * CONTENT_SCORES.TABLE_WEIGHT;
+  score += (content.match(/```[^`]+```/g) || []).length * CONTENT_SCORES.CODE_BLOCK_WEIGHT;
   // ... rest of the scoring logic
 }
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7fcd6dd and 4a4dfa5.

📒 Files selected for processing (3)
  • package.json (3 hunks)
  • utils/metadata-analyzer.ts (1 hunks)
  • utils/metadata-batch-cli.ts (1 hunks)
🧰 Additional context used
🪛 Biome (1.9.4)
utils/metadata-batch-cli.ts

[error] 60-60: Unnecessary continue statement

Unsafe fix: Delete the unnecessary continue statement

(lint/correctness/noUnnecessaryContinue)


[error] 76-76: Unnecessary continue statement

Unsafe fix: Delete the unnecessary continue statement

(lint/correctness/noUnnecessaryContinue)

⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: Redirect rules - docs-optimism
  • GitHub Check: Header rules - docs-optimism
  • GitHub Check: Pages changed - docs-optimism
🔇 Additional comments (1)
package.json (1)

37-37: LGTM!

The js-yaml dependency and its type definitions are correctly added.

Also applies to: 53-53

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (1)
utils/metadata-analyzer.ts (1)

82-121: 🛠️ Refactor suggestion

Avoid using console.log in utility functions.

The isLandingPage function directly uses console.log, which makes it harder to test and control logging behavior. Consider using the provided logger from the config.

-    console.log(`Landing page detected: ${filepath}`);
+    config.logger(`Landing page detected: ${filepath}`);

Additionally, consider passing the detectedPages Set through the config to maintain better state management.

🧹 Nitpick comments (3)
utils/metadata-analyzer.ts (3)

1-49: Consider using environment variables for configuration.

The DEFAULT_CONFIG contains hardcoded values that might need to change based on different environments or requirements. Consider making these configurable through environment variables.

 const DEFAULT_CONFIG: AnalyzerConfig = {
-  defaultLang: 'en-US',
-  defaultTitle: '',
-  defaultDescription: '',
-  maxCategories: 5,
+  defaultLang: process.env.DEFAULT_LANG || 'en-US',
+  defaultTitle: process.env.DEFAULT_TITLE || '',
+  defaultDescription: process.env.DEFAULT_DESCRIPTION || '',
+  maxCategories: parseInt(process.env.MAX_CATEGORIES || '5', 10),
   logger: console.log,
   priorityOrder: [
     // ... existing priorities ...
   ]
 };

393-414: Add threshold constants for better maintainability.

The countReferenceSignals function uses magic numbers for scoring. Consider extracting these into named constants for better maintainability.

+const REFERENCE_SCORE = {
+  TABLE_WEIGHT: 2,
+  CODE_BLOCK_WEIGHT: 1,
+  SPECIAL_SYNTAX_WEIGHT: 1,
+  TECHNICAL_TERM_WEIGHT: 1,
+  PARAMETER_WEIGHT: 3,
+  RETURN_WEIGHT: 3,
+  TYPE_WEIGHT: 2,
+  ENDPOINT_WEIGHT: 2,
+  API_REFERENCE_WEIGHT: 5
+};

 function countReferenceSignals(content: string, filepath: string): number {
   let score = 0;
   
-  score += (content.match(/\|.*\|/g) || []).length * 2; // Tables
+  score += (content.match(/\|.*\|/g) || []).length * REFERENCE_SCORE.TABLE_WEIGHT;
   // ... apply similar changes for other scoring rules

539-550: Consider using a test configuration file.

The testing export exposes internal functions for testing. Consider moving test-specific configurations and exports to a separate test configuration file.

Create a new file utils/__tests__/metadata-analyzer.test-config.ts:

import {
  DEFAULT_CONFIG,
  detectStackCategories,
  detectOperatorCategories,
  detectAppDeveloperCategories,
  detectCommonCategories,
  detectCategories,
  detectContentType,
  isLandingPage,
  getLandingPageCategories
} from '../metadata-analyzer';

export const testConfig = {
  DEFAULT_CONFIG,
  detectStackCategories,
  detectOperatorCategories,
  detectAppDeveloperCategories,
  detectCommonCategories,
  detectCategories,
  detectContentType,
  isLandingPage,
  getLandingPageCategories
};
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4a4dfa5 and 0cdacbb.

📒 Files selected for processing (1)
  • utils/metadata-analyzer.ts (1 hunks)

@bradleycamacho bradleycamacho (Member) left a comment

Looks good to me! I can't validate the TypeScript, but since an engineer has approved that part, the rest looks great to me!

@cpengilly cpengilly merged commit bc814a0 into main Feb 25, 2025
8 checks passed
@cpengilly cpengilly deleted the metadata-test branch February 25, 2025 01:09
@coderabbitai coderabbitai bot mentioned this pull request Mar 4, 2025