Add feature split files by maxTokenSize per file #113

Draft: wants to merge 4 commits into base: main

@fridaystreet fridaystreet commented Oct 9, 2024

related: #71

Summary by CodeRabbit

Release Notes

  • New Features

    • Introduced a new optional property maxTokens for CLI options, allowing users to specify a maximum number of tokens per output file.
    • Added a new property onlyShowPartFilesInRepoStructure in the default configuration.
    • Enhanced output generation to include a new section titled "Repository Size" across different output styles (Markdown, Plain, XML).
  • Bug Fixes

    • Improved handling of undefined properties for file paths and output configurations, ensuring defaults are applied where necessary.
  • Refactor

    • Streamlined output generation logic for better readability and maintainability.
    • Updated the handling of output splits based on token limits for more granular control over output structure.
    • Refined the internal logic for processing output generation to enhance performance and clarity.

coderabbitai bot commented Oct 9, 2024

Walkthrough

The changes in this pull request involve updates to several functions and interfaces related to the command-line interface (CLI) and output generation. Key modifications include enhancements to the handling of configuration options, specifically the introduction of a maxTokens property, updates to output file handling, and the restructuring of output-related interfaces. The changes ensure better defaults and improve the overall handling of output files and configurations.

Changes

File Path | Change Summary
src/cli/actions/defaultAction.ts | Updated runDefaultAction and buildCliConfig for better handling of config.output properties and defaults.
src/cli/actions/remoteAction.ts | Enhanced runRemoteAction to use a default value for output.filePath with the nullish coalescing operator.
src/cli/cliRun.ts | Added optional maxTokens property to CliOptions interface and updated CLI options to include --max-tokens <number>.
src/config/configLoad.ts | Modified mergeConfigs to use nullish coalescing so that the output style falls back to 'plain' when choosing the default filePath.
src/config/configTypes.ts | Introduced RepopackOutputConfig interface and updated related types to use this new interface.
src/config/defaultConfig.ts | Added onlyShowPartFilesInRepoStructure property to defaultConfig.
src/core/output/outputGenerate.ts | Changed generateOutput return type to Promise<string[]> and updated logic for output generation.
src/core/output/outputGeneratorTypes.ts | Updated OutputGeneratorContext interface by removing processedFiles and adding new properties.
src/core/output/outputSplitter.ts | Introduced OutputSplit interface and splitOutput function for handling output splits.
src/core/output/outputStyles/*.ts | Updated generate*Style functions to replace processedFiles with includedFiles and added new properties to context.
src/core/packager.ts | Refined pack function for better output generation and readability, with internal logic improvements.
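
The body of splitOutput is not shown in this conversation, so the following is only a sketch of one way a token-limited splitter could work. It assumes each file already carries a precomputed token count. The names FileWithTokens, SplitPart, and splitByTokenLimit are hypothetical and are not taken from the PR.

```typescript
// Hypothetical shapes for illustration; the PR's real ProcessedFile and
// OutputSplit types are not shown in this conversation.
interface FileWithTokens {
  path: string;
  tokenCount: number;
}

interface SplitPart {
  partNumber: number;
  tokenCount: number;
  includedFiles: FileWithTokens[];
}

// Greedy packing: keep adding files to the current part until the next file
// would push it over the limit, then start a new part. A single file larger
// than the limit still gets a part of its own rather than being dropped.
function splitByTokenLimit(files: FileWithTokens[], maxTokensPerFile: number): SplitPart[] {
  const parts: SplitPart[] = [];
  let current: SplitPart = { partNumber: 1, tokenCount: 0, includedFiles: [] };

  for (const file of files) {
    const wouldOverflow = current.tokenCount + file.tokenCount > maxTokensPerFile;
    if (wouldOverflow && current.includedFiles.length > 0) {
      parts.push(current);
      current = { partNumber: current.partNumber + 1, tokenCount: 0, includedFiles: [] };
    }
    current.includedFiles.push(file);
    current.tokenCount += file.tokenCount;
  }

  // Flush the last part if it holds anything.
  if (current.includedFiles.length > 0) {
    parts.push(current);
  }
  return parts;
}
```

Greedy packing in input order keeps files from the same directory together, at the cost of parts that are not always filled as tightly as possible.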


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 8

🧹 Outside diff range and nitpick comments (14)
src/core/output/outputGeneratorTypes.ts (1)

10-10: Remove redundant comment

The comment // Add the includedFiles property is redundant and doesn't provide any additional information. Consider removing it for cleaner code.

-  includedFiles: ProcessedFile[]; // Add the includedFiles property
+  includedFiles: ProcessedFile[];
src/config/configTypes.ts (1)

3-14: LGTM! Consider adding JSDoc comments for better documentation.

The new RepopackOutputConfig interface is well-structured and aligns with the PR objective. The maxTokensPerFile property has been successfully added to support splitting files by max token size.

To improve code documentation, consider adding JSDoc comments for each property, especially for the new maxTokensPerFile property. This will enhance code readability and provide better context for developers using this interface.

Example for maxTokensPerFile:

/**
 * The maximum number of tokens allowed per file when splitting output.
 * If specified, output files will be split to ensure they don't exceed this token limit.
 */
maxTokensPerFile?: number;
src/core/output/outputStyles/xmlStyle.ts (2)

28-32: LGTM! Consider adding type annotations for clarity.

The changes to the renderContext object look good. The renaming of processedFiles to includedFiles and the addition of new properties (partNumber, totalParts, totalPartFiles, and totalFiles) enhance the context provided to the template. These changes align well with the PR objective of splitting files by maxTokenSize.

Consider adding type annotations to these new properties for improved code clarity and maintainability. For example:

includedFiles: OutputGeneratorContext['includedFiles'],
partNumber: number,
totalParts: number,
totalPartFiles: number,
totalFiles: number

59-63: LGTM! Consider wrapping content in CDATA for consistency.

The addition of the <repository_size> section is a great improvement. It provides valuable metadata about the file structure and organization, which is crucial for understanding split files.

For consistency with other sections in the XML template, consider wrapping the content in a CDATA section. This ensures that any special characters in the content are treated as literal text. Here's a suggested modification:

<repository_size>
<![CDATA[
This file is part {{{partNumber}}} of {{{totalParts}}} of a split representation of the entire codebase.
This file contains {{{totalPartFiles}}} out of a total of {{{totalFiles}}} files.
]]>
</repository_size>
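
One caveat when wrapping arbitrary text in CDATA: a CDATA section ends at the first "]]>", so content containing that sequence would close the section early. The sketch below shows a common workaround. The helper name wrapInCdata is hypothetical and does not appear in the PR.

```typescript
// Hypothetical helper, not part of the PR: wraps text in a CDATA section.
// A CDATA section ends at the first "]]>", so any occurrence inside the text
// is split across two adjacent CDATA sections to keep the XML well-formed.
function wrapInCdata(text: string): string {
  const safe = text.split("]]>").join("]]]]><![CDATA[>");
  return `<![CDATA[${safe}]]>`;
}
```

For the static sentences in the repository_size section this is not a concern, since the interpolated values are numbers. It matters mainly for sections that embed file contents.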
src/cli/cliRun.ts (3)

26-26: Approve the addition of maxTokens property, but remove the comment.

The addition of the maxTokens property to the CliOptions interface is correct and appropriate. However, the comment // Add the maxTokens option is unnecessary as the property name is self-explanatory.

Consider removing the comment:

-  maxTokens?: number; // Add the maxTokens option
+  maxTokens?: number;

48-48: Approve the addition of --max-tokens option, but consider improvements.

The addition of the --max-tokens option is correct and follows the established pattern. However, there are two suggestions for improvement:

  1. The comment // Add the maxTokens option is unnecessary and can be removed.
  2. Consider adding validation for a minimum value to ensure the maxTokens is a positive integer.

Here's a suggested improvement:

-      .option('--max-tokens <number>', 'maximum number of tokens per output file', Number.parseInt) // Add the maxTokens option
+      .option('--max-tokens <number>', 'maximum number of tokens per output file', (value) => {
+        const parsed = Number.parseInt(value, 10);
+        if (Number.isNaN(parsed) || parsed <= 0) {
+          throw new Error('--max-tokens must be a positive integer');
+        }
+        return parsed;
+      })

This change removes the unnecessary comment and adds validation to ensure maxTokens is a positive integer.
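
Number.parseInt stops at the first non-digit character, so an input such as "12abc" would still be accepted as 12. If stricter validation is wanted, a standalone parser along these lines could be used. This is a sketch only, and parsePositiveInteger is a hypothetical name rather than code from the PR.

```typescript
// Hypothetical parser for a --max-tokens style option; not taken from the PR.
// Accepts only plain decimal digits, so inputs such as "12abc", "1.5", or "-3"
// are rejected instead of being silently truncated by parseInt.
function parsePositiveInteger(value: string): number {
  const trimmed = value.trim();
  if (!/^\d+$/.test(trimmed)) {
    throw new Error(`--max-tokens must be a positive integer, got "${value}"`);
  }
  const parsed = Number.parseInt(trimmed, 10);
  // Rejects zero and values too large to be represented exactly.
  if (!Number.isSafeInteger(parsed) || parsed <= 0) {
    throw new Error(`--max-tokens must be a positive integer, got "${value}"`);
  }
  return parsed;
}
```

Keeping the parser as a separate function also makes it easy to unit test without constructing the CLI.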


Successful Implementation of maxTokensPerFile

The maxTokensPerFile option is correctly implemented in the configuration and output splitter modules, effectively limiting the number of tokens per output file as intended.

  • Ensure that the terminology is consistent between the PR objectives and the implementation. The PR mentions "maxTokenSize," whereas the code uses "maxTokensPerFile." Aligning these terms will enhance clarity and maintain consistency across the codebase and documentation.
🔗 Analysis chain

Line range hint 1-48: Consider additional changes and clarify terminology.

The addition of the maxTokens option is a good feature for managing large outputs. However, there are two points to consider:

  1. Additional changes might be needed in other parts of the codebase to implement the actual token limiting functionality. Ensure that the logic for splitting files based on this limit is implemented where the output is generated.

  2. There's a discrepancy between the PR objectives, which mention "split files by maxTokenSize per file", and the implementation, which uses "tokens". Clarify whether the limit should be based on file size or token count, and ensure consistency in terminology across the codebase and documentation.

To verify the implementation of the token limiting functionality, you can run the following script:

This script will help identify areas where the maxTokens functionality might need to be implemented or where similar concepts are already in use.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Search for potential implementations of the maxTokens functionality

# Test: Look for files that might handle output generation or file splitting
rg --type typescript -g '!*test*' -g '!*spec*' -A 10 'output.*file|split.*file|maxTokens'

Length of output: 126


Script:

#!/bin/bash
# Description: Search for potential implementations of the maxTokens functionality

# Test: Look for files that might handle output generation or file splitting in TypeScript files
rg -g '*.ts' -g '*.tsx' -A 10 'output.*file|split.*file|maxTokens'

Length of output: 19469

src/cli/actions/defaultAction.ts (1)

59-60: Approve changes with a minor suggestion

The changes improve the robustness of the runDefaultAction function by adding null checks and default values. This prevents potential undefined errors and ensures that printTopFiles is only called when there are actually files to print.

Consider applying the nullish coalescing operator (??) once, up front, so the default is resolved in a single place:

- if (config.output.topFilesLength && config.output.topFilesLength > 0) {
-   printTopFiles(packResult.fileCharCounts, packResult.fileTokenCounts, config.output.topFilesLength ?? 0);
+ const topFilesLength = config.output.topFilesLength ?? 0;
+ if (topFilesLength > 0) {
+   printTopFiles(packResult.fileCharCounts, packResult.fileTokenCounts, topFilesLength);

This change maintains the same functionality while slightly improving readability.

Also applies to: 71-71

src/cli/actions/remoteAction.ts (1)

Line range hint 85-95: Suggestion: Simplify checkGitInstallation function

While not directly related to the current change, the checkGitInstallation function could be simplified. The current implementation checks stderr, which isn't necessary: if Git is not installed, the awaited execAsync call fails and control passes to the catch block.

Consider refactoring it as follows:

const checkGitInstallation = async (): Promise<boolean> => {
  try {
    await execAsync('git --version');
    return true;
  } catch (error) {
    logger.debug('Git is not installed:', (error as Error).message);
    return false;
  }
};

This simplification maintains the same functionality while reducing unnecessary checks.

src/core/output/outputStyles/markdownStyle.ts (2)

28-32: LGTM! Consider adding a comment for clarity.

The changes to the renderContext object look good. They provide more detailed information about the repository structure and the specific part being represented, which aligns with the PR objectives.

Consider adding a brief comment explaining the purpose of these new properties, especially totalPartFiles and totalFiles, to improve code readability:

// Add context about repository structure and current part
includedFiles: outputGeneratorContext.includedFiles,
partNumber: outputGeneratorContext.partNumber,
totalParts: outputGeneratorContext.totalParts,
totalPartFiles: outputGeneratorContext.includedFiles.length, // Number of files in this part
totalFiles: outputGeneratorContext.totalFiles // Total number of files in the repository

55-58: LGTM! Consider minor formatting adjustment for consistency.

The new "Repository Size" section is a valuable addition to the markdown template. It provides clear information about the repository structure and the current part, which aligns well with the PR objectives.

For consistency with other sections, consider adding a newline before the "Repository Size" section header:

 {{{summaryUsageGuidelines}}}
 
+
 ## Repository Size
 This file is part {{{partNumber}}} of {{{totalParts}}} of a split representation of the entire codebase.
 This file contains {{{totalPartFiles}}} out of a total of {{{totalFiles}}} files.
 

This small change will improve the overall formatting consistency of the generated markdown.
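
To make the data flow concrete, here is a plain TypeScript stand-in for the "Repository Size" section. It is a sketch only: the PR renders this text through its templates, and the names PartInfo and renderRepositorySize are hypothetical.

```typescript
// Hypothetical stand-in for the template section; the PR renders this through
// its own templates, which are not reproduced here.
interface PartInfo {
  partNumber: number;
  totalParts: number;
  totalPartFiles: number;
  totalFiles: number;
}

// Builds the same three lines the markdown template section describes.
function renderRepositorySize(info: PartInfo): string {
  return [
    "## Repository Size",
    `This file is part ${info.partNumber} of ${info.totalParts} of a split representation of the entire codebase.`,
    `This file contains ${info.totalPartFiles} out of a total of ${info.totalFiles} files.`,
  ].join("\n");
}
```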

src/core/output/outputGenerate.ts (3)

64-66: Remove redundant type annotations

The type annotations for totalFiles, totalParts, and partNumber are unnecessary because their types can be inferred from the default values. Removing them simplifies the code without losing type information.

Apply this diff to simplify the code:

 export const buildOutputGeneratorContext = async (
   rootDir: string,
   config: RepopackConfigMerged,
   includedFiles: ProcessedFile[] = [],
   repositoryStructure: string[] = [],
-  totalFiles: number = 1,
-  totalParts: number = 1,
-  partNumber: number = 1
+  totalFiles = 1,
+  totalParts = 1,
+  partNumber = 1
 ): Promise<OutputGeneratorContext> => {
🧰 Tools
🪛 Biome

[error] 64-64: This type annotation is trivially inferred from its initialization.

Safe fix: Remove the type annotation.

(lint/style/noInferrableTypes)


[error] 65-65: This type annotation is trivially inferred from its initialization.

Safe fix: Remove the type annotation.

(lint/style/noInferrableTypes)


[error] 66-66: This type annotation is trivially inferred from its initialization.

Safe fix: Remove the type annotation.

(lint/style/noInferrableTypes)


86-86: Clarify comment or adjust code usage

The comment suggests using includedFiles for treeString, but the code uses repositoryStructure. If the intention is to generate the tree string based on includedFiles, consider updating the code. Otherwise, update the comment to reflect the actual parameter used.

Apply this diff to update the comment:

   generationDate: new Date().toISOString(),
-  treeString: generateTreeString(repositoryStructure), // Use includedFiles for treeString
+  treeString: generateTreeString(repositoryStructure), // Use repositoryStructure for treeString

Alternatively, if you intend to use includedFiles, adjust the code:

-  treeString: generateTreeString(repositoryStructure), // Use includedFiles for treeString
+  treeString: generateTreeString(includedFiles.map(f => f.path)), // Generate treeString from includedFiles

19-19: Use Number.POSITIVE_INFINITY instead of Infinity

To adhere to best practices and improve code clarity, consider using Number.POSITIVE_INFINITY instead of the global Infinity value.

Apply this diff to make the change:

 const maxTokensPerFile = config.output.maxTokensPerFile ?? 
-  Infinity; // Use Infinity if no limit is set
+  Number.POSITIVE_INFINITY; // Use Number.POSITIVE_INFINITY if no limit is set

 const outputSplits: OutputSplit[] =
-  maxTokensPerFile < Infinity
+  maxTokensPerFile < Number.POSITIVE_INFINITY
     ? splitOutput(
         processedFiles,
         maxTokensPerFile
       )

Also applies to: 22-22
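
As a small illustration of the "no limit" fallback discussed here, the limit can be resolved once and then compared against the same constant. The helper names resolveTokenLimit and shouldSplit are hypothetical and only serve the sketch.

```typescript
// Hypothetical helpers showing the "no limit" fallback; names are illustrative.
function resolveTokenLimit(maxTokensPerFile?: number): number {
  // Nullish coalescing keeps an explicit limit and falls back to "no limit".
  return maxTokensPerFile ?? Number.POSITIVE_INFINITY;
}

function shouldSplit(maxTokensPerFile?: number): boolean {
  // Splitting is only needed when a finite limit was configured.
  return resolveTokenLimit(maxTokensPerFile) < Number.POSITIVE_INFINITY;
}
```

Number.POSITIVE_INFINITY holds the same value as the global Infinity. The constant on the Number namespace is spelled POSITIVE_INFINITY; JavaScript defines no Number.Infinity property.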

🧰 Tools
🪛 Biome

[error] 19-19: Use Number.Infinity instead of the equivalent global.

ES2015 moved some globals into the Number namespace for consistency.
Safe fix: Use Number.Infinity instead.

(lint/style/useNumberNamespace)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 08982bd and c924ba3.

📒 Files selected for processing (13)
  • src/cli/actions/defaultAction.ts (3 hunks)
  • src/cli/actions/remoteAction.ts (1 hunks)
  • src/cli/cliRun.ts (2 hunks)
  • src/config/configLoad.ts (1 hunks)
  • src/config/configTypes.ts (2 hunks)
  • src/config/defaultConfig.ts (1 hunks)
  • src/core/output/outputGenerate.ts (1 hunks)
  • src/core/output/outputGeneratorTypes.ts (1 hunks)
  • src/core/output/outputSplitter.ts (1 hunks)
  • src/core/output/outputStyles/markdownStyle.ts (2 hunks)
  • src/core/output/outputStyles/plainStyle.ts (2 hunks)
  • src/core/output/outputStyles/xmlStyle.ts (3 hunks)
  • src/core/packager.ts (5 hunks)

🔇 Additional comments (14)
src/core/output/outputGeneratorTypes.ts (1)

9-13: LGTM: New properties support file splitting feature

The new properties (content, includedFiles, totalFiles, partNumber, totalParts) are well-structured and align with the PR objective of splitting files by maxTokenSize. They provide the necessary context for handling multiple parts of output.

src/config/defaultConfig.ts (1)

17-17: LGTM! Consider adding documentation and updating types.

The addition of the onlyShowPartFilesInRepoStructure property to the output object is a good enhancement to the configuration options. However, I have a few suggestions to improve this change:

  1. Consider adding a comment above this line to explain the purpose and usage of this new property. This will help other developers understand its functionality.

  2. Ensure that the RepopackConfigDefault type (likely defined in configTypes.ts) is updated to include this new property. This will maintain type safety across the project.

To verify if this change is properly reflected in other parts of the codebase, please run the following script:

This script will help ensure that the new property is properly typed and used consistently across the project.

✅ Verification successful

All checks passed! Documentation and type definitions are in place.

The onlyShowPartFilesInRepoStructure property has been correctly added to the RepopackConfigDefault type and is actively used in the codebase.

  • Type Definition: Verified in src/config/configTypes.ts.
  • Usage Locations:
    • src/config/defaultConfig.ts
    • src/core/output/outputGenerate.ts

No further actions are required.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for usage and type definition of the new property

# Test 1: Check if the type definition has been updated
echo "Checking type definition:"
rg --type typescript "onlyShowPartFilesInRepoStructure" src/config/configTypes.ts

# Test 2: Check for usage of the new property in other files
echo "Checking usage in other files:"
rg --type typescript "onlyShowPartFilesInRepoStructure" src

Length of output: 343


Script:

#!/bin/bash
# Description: Check for usage and type definition of the new property

# Test 1: Check if the type definition has been updated
echo "Checking type definition:"
rg "onlyShowPartFilesInRepoStructure" src/config/configTypes.ts

# Test 2: Check for usage of the new property in other files
echo "Checking usage in other files:"
rg "onlyShowPartFilesInRepoStructure" src -g "*.ts" -g "*.tsx"

Length of output: 598

src/config/configTypes.ts (2)

17-17: LGTM! Good refactoring of the output property type.

The change to use RepopackOutputConfig for the output property type in RepopackConfigBase is a good refactoring. It improves code organization and maintainability by centralizing the output configuration options.


Line range hint 1-30: Overall, good improvements to configuration structure and support for new feature.

The changes in this file effectively introduce the maxTokensPerFile option and improve the overall structure of the configuration types. The new RepopackOutputConfig interface centralizes output-related properties, enhancing maintainability and clarity. These changes align well with the PR objective of adding the feature to split files by max token size per file.

Key improvements:

  1. Introduction of RepopackOutputConfig interface
  2. Consistent updates to RepopackConfigBase and RepopackConfigDefault
  3. Addition of maxTokensPerFile property to support the new feature

These changes provide a solid foundation for implementing the file splitting feature based on token size.

src/core/output/outputStyles/xmlStyle.ts (2)

88-88: LGTM! Consistent use of includedFiles.

The update to use includedFiles instead of processedFiles in the <repository_files> section is correct and maintains consistency with the changes made to the renderContext object.


Line range hint 1-105: Overall, excellent improvements to support file splitting feature.

The changes made to this file significantly enhance the XML output generation process by incorporating additional metadata related to file structure and organization. These modifications directly support the PR objective of adding the feature to split files by maxTokenSize.

Key improvements include:

  1. Addition of new properties to the renderContext object, providing more detailed information about the file structure.
  2. Introduction of a new <repository_size> section in the XML template, offering clear metadata about the split files.
  3. Consistent renaming from processedFiles to includedFiles throughout the file.

These changes will greatly improve the clarity and usefulness of the generated XML output, especially in scenarios where the codebase is split across multiple files.

src/cli/actions/defaultAction.ts (1)

106-108: Approve changes and request clarification

The addition of the maxTokens option to the CLI configuration is well-implemented. It correctly handles the case when the option is undefined and adds the value to the appropriate config object.

Could you clarify the reason for changing the property name from maxTokens to maxTokensPerFile? This might indicate a more specific use of this option. To ensure consistency, let's verify if this property name is used correctly throughout the codebase:

This will help us ensure that the property name is used consistently and that we haven't missed any places where it should be updated.

✅ Verification successful

Run updated verification scripts for property usage

The previous search using --type typescript was unsuccessful. Let's adjust the script to target .ts and .tsx files directly:


Verified property naming consistency

The maxTokensPerFile property is used consistently across the codebase, and the maxTokens CLI option correctly maps to it in the configuration. No issues found.
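
The mapping from the CLI flag to the config property might look roughly like the sketch below. The interfaces are trimmed-down and hypothetical (CliOptionsSketch, OutputConfigSketch), and buildOutputOverrides is an illustrative name rather than the PR's buildCliConfig.

```typescript
// Hypothetical, trimmed-down shapes; the real CliOptions and config types in
// the PR contain more fields than shown here.
interface CliOptionsSketch {
  maxTokens?: number;
}

interface OutputConfigSketch {
  maxTokensPerFile?: number;
}

// Copies the CLI flag into the output config only when it was provided, so an
// omitted flag does not overwrite a value from the config file with undefined.
function buildOutputOverrides(options: CliOptionsSketch): OutputConfigSketch {
  const output: OutputConfigSketch = {};
  if (options.maxTokens !== undefined) {
    output.maxTokensPerFile = options.maxTokens;
  }
  return output;
}
```

Leaving the key out entirely matters when configs are merged with object spread, because a key that is present with the value undefined still overrides earlier values.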

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check usage of maxTokensPerFile in the codebase

# Search for maxTokensPerFile
echo "Occurrences of maxTokensPerFile:"
rg "maxTokensPerFile" --type typescript

# Search for maxTokens (excluding this file)
echo "\nOccurrences of maxTokens:"
rg "maxTokens" --type typescript | grep -v "src/cli/actions/defaultAction.ts"

Length of output: 325


Script:

#!/bin/bash
# Description: Check usage of maxTokensPerFile and maxTokens in the codebase

# Search for maxTokensPerFile
echo "Occurrences of maxTokensPerFile:"
rg "maxTokensPerFile" --glob "*.ts" --glob "*.tsx"

# Search for maxTokens (excluding src/cli/actions/defaultAction.ts)
echo "\nOccurrences of maxTokens:"
rg "maxTokens" --glob "*.ts" --glob "*.tsx" | grep -v "src/cli/actions/defaultAction.ts"

Length of output: 1627

src/cli/actions/remoteAction.ts (1)

32-32: Improved robustness with fallback value for output file path

This change enhances the function's reliability by providing a default value for the output file path. Using the nullish coalescing operator ensures that even if result.config.output.filePath is undefined or null, a valid file path ('repopack-output.txt') will be used. This improvement aligns well with the PR's objective of better handling output files and configurations.
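
A minimal sketch of this fallback, using a hypothetical helper named resolveOutputPath:

```typescript
// Hypothetical helper mirroring the fallback described above.
const DEFAULT_OUTPUT_PATH = "repopack-output.txt";

function resolveOutputPath(filePath?: string | null): string {
  // ?? only replaces null and undefined; an empty string would pass through,
  // which is where it differs from the || operator.
  return filePath ?? DEFAULT_OUTPUT_PATH;
}
```

If an empty string should also trigger the default, the || operator or an explicit length check would be the better fit.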

src/config/configLoad.ts (1)

88-88: 🛠️ Refactor suggestion

Improve robustness and readability of default file path assignment

The change improves the code by ensuring a valid style is always used when determining the default file path. However, there are a couple of areas for improvement:

  1. Modifying defaultConfig.output.filePath directly could lead to unexpected behavior if mergeConfigs is called multiple times.
  2. The logic for determining the style and setting the default file path is somewhat complex and could be simplified.

Consider refactoring this part of the function to improve clarity and avoid modifying the defaultConfig object:

let defaultFilePath: string;
if (cliConfig.output?.filePath == null && fileConfig.output?.filePath == null) {
  const style = cliConfig.output?.style ?? fileConfig.output?.style ?? defaultConfig.output.style ?? 'plain';
  defaultFilePath = defaultFilePathMap[style];
} else {
  defaultFilePath = defaultConfig.output.filePath;
}

return {
  cwd,
  output: {
    ...defaultConfig.output,
    filePath: defaultFilePath,
    ...fileConfig.output,
    ...cliConfig.output,
  },
  // ... rest of the merged config
};

This refactoring:

  1. Avoids modifying defaultConfig.
  2. Simplifies the style determination logic.
  3. Clearly separates the default file path logic from the rest of the config merging.

To ensure this change doesn't introduce any regressions, we should verify the usage of defaultConfig and mergeConfigs throughout the codebase. Run the following script:

This script will help us understand if the proposed changes might affect other parts of the codebase.

src/core/output/outputStyles/markdownStyle.ts (1)

Line range hint 1-214: Overall, the changes look good and align with the PR objectives.

The modifications to generateMarkdownStyle function and the markdown template effectively implement the feature to split files by maxTokenSize. The new properties added to renderContext and the "Repository Size" section in the template provide valuable information about the repository structure and the current part being represented.

The changes are well-implemented and there are no major issues. The minor suggestions for improvement (adding a comment for clarity and adjusting formatting for consistency) will enhance code readability and maintain consistent styling.

Great job on this implementation!

src/core/output/outputGenerate.ts (1)

35-35: Ensure correct file paths are used in repository structure

In the ternary operation, when config.output.onlyShowPartFilesInRepoStructure is true, you're mapping outputSplit.includedFiles to their paths. Ensure that f.path correctly represents the file paths relative to rootDir to avoid inconsistencies in the generated repository structure.

Would you like assistance in verifying the file paths used in the repository structure?
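
The selection described in this comment can be sketched as follows. FileEntry and selectStructurePaths are hypothetical names, and the sketch assumes that every file's path is already relative to the repository root.

```typescript
// Hypothetical shapes; only the `path` field of a file is assumed here.
interface FileEntry {
  path: string;
}

// Picks the paths shown in the repository structure section of one part:
// either only the files in that part, or every file in the repository.
function selectStructurePaths(
  allFiles: FileEntry[],
  partFiles: FileEntry[],
  onlyShowPartFiles: boolean,
): string[] {
  const source = onlyShowPartFiles ? partFiles : allFiles;
  return source.map((f) => f.path);
}
```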

src/core/packager.ts (3)

60-63: LGTM!

The reformatting of the runSecurityCheck function call improves readability without altering functionality.


81-86: LGTM!

The call to generateOutput is clear and correctly passes all required parameters.


124-128: LGTM!

The progress callback now includes the file path with a fallback to 'Unknown File', which improves the clarity of the progress messages.

Resolved review threads:

  • src/config/configTypes.ts
  • src/core/output/outputSplitter.ts (2)
  • src/core/output/outputStyles/plainStyle.ts
  • src/core/output/outputGenerate.ts
  • src/core/packager.ts (3)

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (2)
src/core/output/outputGenerate.ts (2)

11-27: LGTM with a minor style suggestion

The implementation of the new feature to split output based on a token limit looks good. It aligns well with the PR objective of adding the ability to split files by maxTokenSize per file.

Consider using Number.POSITIVE_INFINITY instead of the global Infinity, which is what the Biome useNumberNamespace rule asks for. Note that Number.Infinity is not a defined property (it evaluates to undefined), so the replacement must be Number.POSITIVE_INFINITY:

- const maxTokensPerFile = config.output.maxTokensPerFile ?? Infinity;
+ const maxTokensPerFile = config.output.maxTokensPerFile ?? Number.POSITIVE_INFINITY;

- if (maxTokensPerFile < Infinity)
+ if (maxTokensPerFile < Number.POSITIVE_INFINITY)

This is a style-only change: both spellings refer to the same value.
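
As a rough illustration of the fallback discussed here, the sketch below resolves an unset limit to "no limit" and only splits when a finite limit was configured. The OutputConfig shape and both helper names are hypothetical and are not taken from the PR; the fallback is written with Number.POSITIVE_INFINITY, the standard Number-namespace spelling of Infinity.

```typescript
// Hypothetical, simplified config shape used only for this sketch.
interface OutputConfig {
  maxTokensPerFile?: number;
}

// Fall back to "no limit" when the option is not set.
// Number.POSITIVE_INFINITY is the same value as the global Infinity.
function resolveMaxTokens(output: OutputConfig): number {
  return output.maxTokensPerFile ?? Number.POSITIVE_INFINITY;
}

// Splitting is only needed when a finite limit was configured.
function shouldSplit(output: OutputConfig): boolean {
  return Number.isFinite(resolveMaxTokens(output));
}

console.log(shouldSplit({}));                         // false
console.log(shouldSplit({ maxTokensPerFile: 8000 })); // true
```

Using Number.isFinite for the check avoids comparing against the infinity constant a second time.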

🧰 Tools
🪛 Biome

[error] 19-19: Use Number.Infinity instead of the equivalent global.

ES2015 moved some globals into the Number namespace for consistency.
Safe fix: Use Number.Infinity instead.

(lint/style/useNumberNamespace)


[error] 22-22: Use Number.Infinity instead of the equivalent global.

ES2015 moved some globals into the Number namespace for consistency.
Safe fix: Use Number.Infinity instead.

(lint/style/useNumberNamespace)


62-66: Approve changes with minor refactor suggestion

The addition of new parameters to buildOutputGeneratorContext effectively supports the new split output feature while maintaining backward compatibility through default values. This is a good approach.

Consider removing the redundant type annotations for the numeric parameters with default values:

export const buildOutputGeneratorContext = async (
  rootDir: string,
  config: RepopackConfigMerged,
  includedFiles: ProcessedFile[] = [],
  repositoryStructure: string[] = [],
- totalFiles: number = 1,
- totalParts: number = 1,
- partNumber: number = 1
+ totalFiles = 1,
+ totalParts = 1,
+ partNumber = 1
): Promise<OutputGeneratorContext> => {

This simplifies the code without losing type information, as TypeScript infers number from the numeric defaults. The array parameters keep their annotations because an empty array default does not carry an element type, and Biome flags only the three numeric parameters.
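
To make the inference rule concrete, here is a small sketch with a hypothetical describePart function (not part of the PR). It assumes strict TypeScript settings: numeric defaults are inferred as number, while an empty array default gives the compiler no element type to infer, so that annotation is worth keeping.

```typescript
// Hypothetical stand-in for the project's ProcessedFile type.
interface ProcessedFile {
  path: string;
  content: string;
}

function describePart(
  // Keep the annotation: an empty array default carries no element type.
  includedFiles: ProcessedFile[] = [],
  // Inferred as number from the defaults, so no annotation is needed.
  totalParts = 1,
  partNumber = 1,
): string {
  return `part ${partNumber} of ${totalParts} (${includedFiles.length} files)`;
}

console.log(describePart()); // part 1 of 1 (0 files)
```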

🧰 Tools
🪛 Biome

[error] 64-64: This type annotation is trivially inferred from its initialization.

Safe fix: Remove the type annotation.

(lint/style/noInferrableTypes)


[error] 65-65: This type annotation is trivially inferred from its initialization.

Safe fix: Remove the type annotation.

(lint/style/noInferrableTypes)


[error] 66-66: This type annotation is trivially inferred from its initialization.

Safe fix: Remove the type annotation.

(lint/style/noInferrableTypes)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between c924ba3 and f40353b.

📒 Files selected for processing (2)
  • src/core/output/outputGenerate.ts (1 hunks)
  • src/core/output/outputSplitter.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/core/output/outputSplitter.ts
🔇 Additional comments (2)
src/core/output/outputGenerate.ts (2)

29-56: Excellent implementation of parallel output processing

The use of Promise.all to handle multiple output splits is a great approach. It allows for efficient parallel processing of the splits, which can significantly improve performance for large repositories or when dealing with multiple output files.

The updated call to buildOutputGeneratorContext with new parameters ensures that each split is processed with the correct context, including part numbers and total counts.
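
The pattern described above can be sketched roughly as follows. OutputSplit, renderPart, and renderAllParts are hypothetical names chosen for illustration, and the rendering step is a placeholder rather than the PR's actual template logic.

```typescript
// Hypothetical shape of one split; the real outputSplitter types may differ.
interface OutputSplit {
  includedFiles: { path: string; content: string }[];
}

// Stand-in for the per-part rendering step (async to mirror file I/O).
async function renderPart(split: OutputSplit, partNumber: number, totalParts: number): Promise<string> {
  const body = split.includedFiles.map((f) => `${f.path}\n${f.content}`).join('\n');
  return `part ${partNumber} of ${totalParts}\n${body}`;
}

// Render all parts concurrently. Promise.all preserves input order,
// so outputs[i] always corresponds to splits[i].
async function renderAllParts(splits: OutputSplit[]): Promise<string[]> {
  return Promise.all(splits.map((split, index) => renderPart(split, index + 1, splits.length)));
}
```

Two properties of Promise.all matter here: the work is concurrent rather than parallel in the multi-threaded sense, and a single rejected part rejects the whole batch, so callers that need partial results would have to use a different combinator such as Promise.allSettled.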


86-93: Well-structured context object for split outputs

The updates to the returned context object are well-thought-out and provide all the necessary information for handling split outputs. The use of repositoryStructure for treeString generation is a good choice, as it allows for accurate representation of the file structure in each split.

The new properties (includedFiles, totalFiles, totalParts, partNumber) will be very useful for generating comprehensive and accurate output for each split.

@@ -85,7 +85,7 @@ export const mergeConfigs = (
// If the output file path is not provided in the config file or CLI, use the default file path for the style
if (cliConfig.output?.filePath == null && fileConfig.output?.filePath == null) {
const style = cliConfig.output?.style || fileConfig.output?.style || defaultConfig.output.style;
- defaultConfig.output.filePath = defaultFilePathMap[style];
+ defaultConfig.output.filePath = defaultFilePathMap[style ?? 'plain'];
Owner

This change is not necessary, since the style always falls back to plain via defaultConfig.

Suggested change
- defaultConfig.output.filePath = defaultFilePathMap[style ?? 'plain'];
+ defaultConfig.output.filePath = defaultFilePathMap[style];

@@ -44,6 +45,7 @@ export async function run() {
.option('--init', 'initialize a new repopack.config.json file')
.option('--global', 'use global configuration (only applicable with --init)')
.option('--remote <url>', 'process a remote Git repository')
.option('--max-tokens <number>', 'maximum number of tokens per output file', Number.parseInt) // Add the maxTokens option
Owner

I would like to name it output-max-tokens if possible, since this limit applies to the output. I am concerned that max-tokens may be needed for other purposes in the future.

Suggested change
- .option('--max-tokens <number>', 'maximum number of tokens per output file', Number.parseInt) // Add the maxTokens option
+ .option('--output-max-tokens <number>', 'maximum number of tokens per output file', Number.parseInt) // Add the maxTokens option
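
One thing worth considering with this option: passing Number.parseInt directly as the option parser means that any second argument the CLI library supplies (Commander, for example, passes the previous value) is treated as the radix. A dedicated parser avoids that and can also reject invalid input. The sketch below is a hypothetical helper, not code from the PR, and the validation rules are assumptions.

```typescript
// Hypothetical helper: parse and validate a token limit given as a CLI string.
// Extra arguments from the CLI library are ignored because only `value` is declared.
function parseMaxTokens(value: string): number {
  const parsed = Number.parseInt(value, 10); // explicit radix
  if (Number.isNaN(parsed) || parsed <= 0) {
    throw new Error(`Invalid token limit: ${value}`);
  }
  return parsed;
}

console.log(parseMaxTokens('8000')); // 8000
```

Such a function could be passed in place of Number.parseInt as the third argument to .option.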

@@ -29,7 +29,7 @@ export const runRemoteAction = async (repoUrl: string, options: CliOptions): Pro
logger.log('');

const result = await runDefaultAction(tempDir, tempDir, options);
- await copyOutputToCurrentDirectory(tempDir, process.cwd(), result.config.output.filePath);
+ await copyOutputToCurrentDirectory(tempDir, process.cwd(), result.config.output.filePath ?? 'repopack-output.txt');
Owner

This is not necessary, because filePath always ends up with the defaultConfig value during config merging.

Suggested change
- await copyOutputToCurrentDirectory(tempDir, process.cwd(), result.config.output.filePath ?? 'repopack-output.txt');
+ await copyOutputToCurrentDirectory(tempDir, process.cwd(), result.config.output.filePath);

topFilesLength: number;
showLineNumbers: boolean;
};
output: RepopackOutputConfig;
Owner

Since defaultConfig deliberately makes some of these fields non-nullable, I would like this to stay as the original code with only the new option added.
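
A minimal sketch of the idea behind this comment, under assumed and heavily trimmed types: user-supplied config keeps optional fields, while the merged type marks the fields that defaults always fill as required, so downstream code needs no `?? fallback`. The type and function names here are hypothetical; only the default file name and the three style names come from this thread.

```typescript
// Hypothetical, trimmed-down config types for illustration.
interface OutputConfig {
  filePath?: string;
  style?: 'plain' | 'xml' | 'markdown';
  maxTokens?: number;
}

type OutputDefaults = Required<Pick<OutputConfig, 'filePath' | 'style'>>;

// After merging, defaults guarantee filePath and style; maxTokens stays optional.
type OutputConfigMerged = OutputDefaults & Pick<OutputConfig, 'maxTokens'>;

const defaultOutput: OutputDefaults = {
  filePath: 'repopack-output.txt',
  style: 'plain',
};

// CLI values win over file values, which win over defaults.
function mergeOutput(fileConfig: OutputConfig, cliConfig: OutputConfig): OutputConfigMerged {
  return {
    filePath: cliConfig.filePath ?? fileConfig.filePath ?? defaultOutput.filePath,
    style: cliConfig.style ?? fileConfig.style ?? defaultOutput.style,
    maxTokens: cliConfig.maxTokens ?? fileConfig.maxTokens,
  };
}
```

With this shape, `mergeOutput(...).filePath` has type string, and only the genuinely optional limit needs an undefined check.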

@@ -75,7 +85,7 @@ This section contains a summary of this file.
<repository_files>
This section contains the contents of the repository's files.

- {{#each processedFiles}}
+ {{#each includedFiles}}
Owner

Please make the same correction to plainStyle.ts and markdownStyle.ts.

@@ -56,8 +56,8 @@ export const runDefaultAction = async (
spinner.succeed('Packing completed successfully!');
Owner

Regarding the PR as a whole, please fix the failures reported by npm run lint and npm run test.

@@ -59,6 +63,11 @@ Usage Guidelines:
-----------------
{{{summaryUsageGuidelines}}}

Repository Size:
-----------------
Owner

The hyphen underline should match the length of the heading above it.

Suggested change
-----------------
----------------

src/core/output/outputSplitter.ts Show resolved Hide resolved
let currentIncludedFiles: ProcessedFile[] = []; // Initialize currentIncludedFiles

for (const file of processedFiles) {
const fileTokenCount = tokenCounter.countTokens(file.content, file.path);
Owner

With a large number of files, the heavy token counting may hurt performance. In the places where token counts are currently computed, this is slightly mitigated by adding a sleep.

However, I think it is fine to leave this as it is for now and fix it if the problem actually occurs.

progressCallback('Writing output file(s)...');

// Handle the case where filePath is undefined
if (config.output.filePath) {
Owner

The filePath is set in the default config, so it will always be present.

kagux commented Oct 9, 2024

@fridaystreet appreciate a lot that you work on this feature! I've tried using the build from this PR and it produces empty files. They only include the header section with directory/files and nothing from files themselves. Even if no max token option is specified, still single file with no contents besides header.

@fridaystreet
Author

@kagux thanks for the feedback. It was all working and seems to be working locally for me. But just after I generated the PR, I noticed a few code changes and refactors had been applied to the main branch. I pulled them in, fixed things up, and pushed another commit. Possibly I missed something.

It's still a draft PR (or it was at least), as I haven't finished docs or tests yet. I was just looking for initial feedback.

I'll have a look through and retest to see what I've missed.

Thanks again

@fridaystreet
Author

@kagux I just reran it again this morning and it all seems to be working for me. Both with max-tokens set and without.

Also tried with a folder specified and without.

Can you post up the command line you're using or the config.json, not sure what's going on, but sure we can get to the bottom of it.

Cheers

@fridaystreet
Author

I've just noticed something I've missed, though. It's not taking into account the token size of the header. I'll have to add some mechanism to generate the header first and calculate its size, then iterate over the files with the remaining allowable tokens (maxTokens - headerTokens).
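
The approach described here could look roughly like the sketch below: subtract the header's token count from the limit, then fill each part greedily. This is only an illustration under stated assumptions. splitByTokenBudget is a hypothetical name, countTokens is a whitespace-based stand-in for the project's real tokenizer, and the header size is treated as fixed even though in practice it may vary per part.

```typescript
interface ProcessedFile {
  path: string;
  content: string;
}

// Stand-in token counter (whitespace-separated words); a real tokenizer would differ.
const countTokens = (text: string): number => text.split(/\s+/).filter(Boolean).length;

function splitByTokenBudget(files: ProcessedFile[], maxTokens: number, headerTokens: number): ProcessedFile[][] {
  const budget = maxTokens - headerTokens;
  if (budget <= 0) throw new Error('maxTokens must exceed the header size');

  const parts: ProcessedFile[][] = [];
  let current: ProcessedFile[] = [];
  let used = 0;

  for (const file of files) {
    const tokens = countTokens(file.content);
    // Start a new part when this file would overflow the current one.
    if (current.length > 0 && used + tokens > budget) {
      parts.push(current);
      current = [];
      used = 0;
    }
    current.push(file); // an oversized file still gets a part of its own
    used += tokens;
  }
  if (current.length > 0) parts.push(current);
  return parts;
}
```

A file larger than the whole budget ends up alone in a part that exceeds the limit; whether to truncate, split, or reject such files is a separate design decision that this sketch leaves open.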

@fridaystreet
Author

Also some interesting responses when trying to query the content to see if it's understanding it properly.

Based on the provided context, I cannot determine if the last file element node is the absolute last one in the entire file.

Here's why:

Partial File: The context states that the XML file you shared is only "part 1 of 62". This means there are 61 other files with potentially more file element nodes.

It seems to be confused about how to interpret the repo size section. I might have to come up with a clearer way of distinguishing the number of parts from the file count in each part.

<repository_size>
This file is part 1 of 62 of a split representation of the entire codebase.
This file contains 33 out of a total of 1054 files.

</repository_size>

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (2)
src/core/output/outputStyles/plainStyle.ts (2)

66-70: LGTM! New "Repository Size" section adds valuable context.

The addition of the "Repository Size" section provides important information about the split file representation. This aligns well with the new properties in the renderContext.

Consider a minor wording improvement for clarity:

 Repository Size:
 -----------------
-This file is part {{{partNumber}}} of {{{totalParts}}} of a split representation of the entire codebase.
+This file represents part {{{partNumber}}} of {{{totalParts}}} in a split representation of the entire codebase.
 This file contains {{{totalPartFiles}}} out of a total of {{{totalFiles}}} files.

Line range hint 1-94: Overall implementation of split file representation looks good.

The changes in this file successfully implement the split file representation feature. Key points:

  1. The renderContext now includes necessary properties for split representation.
  2. A new "Repository Size" section in the template provides valuable context.
  3. The template has been updated to use includedFiles consistently.

These changes align well with the PR objectives and address the concerns raised in the PR comments about empty files and confusion regarding file counts.

Consider adding a brief comment in the code explaining the purpose of the split file representation feature and how it relates to the maxTokens option mentioned in the PR description. This would improve code maintainability and help future developers understand the feature's context.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between f40353b and 5910bc2.

📒 Files selected for processing (2)
  • src/core/output/outputStyles/markdownStyle.ts (3 hunks)
  • src/core/output/outputStyles/plainStyle.ts (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/core/output/outputStyles/markdownStyle.ts
🧰 Additional context used
🔇 Additional comments (2)
src/core/output/outputStyles/plainStyle.ts (2)

94-94: LGTM! Template updated to use includedFiles.

The change from processedFiles to includedFiles in the template's each loop resolves the inconsistency mentioned in the past review comment. This update ensures that the template correctly uses the new property from the renderContext.


28-32: LGTM! Changes align with new split file representation.

The additions to the renderContext object correctly implement the new properties needed for split file representation. The past comment about inconsistency between includedFiles and processedFiles has been addressed by this change.

To ensure the template is using the new includedFiles property consistently, let's verify:

✅ Verification successful

Verified! 'processedFiles' has been successfully replaced with 'includedFiles' in plainStyle.ts.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Verify that 'processedFiles' has been replaced with 'includedFiles' in the template
rg --type typescript 'processedFiles' src/core/output/outputStyles/plainStyle.ts

Length of output: 115


Script:

#!/bin/bash
# Verify that 'processedFiles' has been replaced with 'includedFiles' in the template
grep 'processedFiles' src/core/output/outputStyles/plainStyle.ts

Length of output: 64

@yamadashy yamadashy added the enhancement New feature or request label Nov 16, 2024