-
-
Notifications
You must be signed in to change notification settings - Fork 282
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add feature split files by maxTokenSize per file #113
base: main
Are you sure you want to change the base?
Conversation
WalkthroughThe changes in this pull request involve updates to several functions and interfaces related to the command-line interface (CLI) and output generation. Key modifications include enhancements to the handling of configuration options, specifically the introduction of a Changes
Possibly related PRs
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 8
🧹 Outside diff range and nitpick comments (14)
src/core/output/outputGeneratorTypes.ts (1)
10-10
: Remove redundant commentThe comment
// Add the includedFiles property
is redundant and doesn't provide any additional information. Consider removing it for cleaner code.- includedFiles: ProcessedFile[]; // Add the includedFiles property + includedFiles: ProcessedFile[];src/config/configTypes.ts (1)
3-14
: LGTM! Consider adding JSDoc comments for better documentation.The new
RepopackOutputConfig
interface is well-structured and aligns with the PR objective. ThemaxTokensPerFile
property has been successfully added to support splitting files by max token size.To improve code documentation, consider adding JSDoc comments for each property, especially for the new
maxTokensPerFile
property. This will enhance code readability and provide better context for developers using this interface.Example for
maxTokensPerFile
:/** * The maximum number of tokens allowed per file when splitting output. * If specified, output files will be split to ensure they don't exceed this token limit. */ maxTokensPerFile?: number;src/core/output/outputStyles/xmlStyle.ts (2)
28-32
: LGTM! Consider adding type annotations for clarity.The changes to the
renderContext
object look good. The renaming ofprocessedFiles
toincludedFiles
and the addition of new properties (partNumber
,totalParts
,totalPartFiles
, andtotalFiles
) enhance the context provided to the template. These changes align well with the PR objective of splitting files by maxTokenSize.Consider adding type annotations to these new properties for improved code clarity and maintainability. For example:
includedFiles: OutputGeneratorContext['includedFiles'], partNumber: number, totalParts: number, totalPartFiles: number, totalFiles: number
59-63
: LGTM! Consider wrapping content in CDATA for consistency.The addition of the
<repository_size>
section is a great improvement. It provides valuable metadata about the file structure and organization, which is crucial for understanding split files.For consistency with other sections in the XML template, consider wrapping the content in a CDATA section. This ensures that any special characters in the content are treated as literal text. Here's a suggested modification:
<repository_size> <![CDATA[ This file is part {{{partNumber}}} of {{{totalParts}}} of a split representation of the entire codebase. This file contains {{{totalPartFiles}}} out of a total of {{{totalFiles}}} files. ]]> </repository_size>src/cli/cliRun.ts (3)
26-26
: Approve the addition ofmaxTokens
property, but remove the comment.The addition of the
maxTokens
property to theCliOptions
interface is correct and appropriate. However, the comment// Add the maxTokens option
is unnecessary as the property name is self-explanatory.Consider removing the comment:
- maxTokens?: number; // Add the maxTokens option + maxTokens?: number;
48-48
: Approve the addition of--max-tokens
option, but consider improvements.The addition of the
--max-tokens
option is correct and follows the established pattern. However, there are two suggestions for improvement:
- The comment
// Add the maxTokens option
is unnecessary and can be removed.- Consider adding validation for a minimum value to ensure the
maxTokens
is a positive integer.Here's a suggested improvement:
- .option('--max-tokens <number>', 'maximum number of tokens per output file', Number.parseInt) // Add the maxTokens option + .option('--max-tokens <number>', 'maximum number of tokens per output file', (value) => { + const parsed = Number.parseInt(value); + if (isNaN(parsed) || parsed <= 0) { + throw new Error('--max-tokens must be a positive integer'); + } + return parsed; + })This change removes the unnecessary comment and adds validation to ensure
maxTokens
is a positive integer.
Successful Implementation of
maxTokensPerFile
The
maxTokensPerFile
option is correctly implemented in the configuration and output splitter modules, effectively limiting the number of tokens per output file as intended.
- Ensure that the terminology is consistent between the PR objectives and the implementation. The PR mentions "maxTokenSize," whereas the code uses "maxTokensPerFile." Aligning these terms will enhance clarity and maintain consistency across the codebase and documentation.
🔗 Analysis chain
Line range hint
1-48
: Consider additional changes and clarify terminology.The addition of the
maxTokens
option is a good feature for managing large outputs. However, there are two points to consider:
Additional changes might be needed in other parts of the codebase to implement the actual token limiting functionality. Ensure that the logic for splitting files based on this limit is implemented where the output is generated.
There's a discrepancy between the PR objectives, which mention "split files by maxTokenSize per file", and the implementation, which uses "tokens". Clarify whether the limit should be based on file size or token count, and ensure consistency in terminology across the codebase and documentation.
To verify the implementation of the token limiting functionality, you can run the following script:
This script will help identify areas where the
maxTokens
functionality might need to be implemented or where similar concepts are already in use.🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Search for potential implementations of the maxTokens functionality # Test: Look for files that might handle output generation or file splitting rg --type typescript -g '!*test*' -g '!*spec*' -A 10 'output.*file|split.*file|maxTokens'Length of output: 126
Script:
#!/bin/bash # Description: Search for potential implementations of the maxTokens functionality # Test: Look for files that might handle output generation or file splitting in TypeScript files rg -g '*.ts' -g '*.tsx' -A 10 'output.*file|split.*file|maxTokens'Length of output: 19469
src/cli/actions/defaultAction.ts (1)
59-60
: Approve changes with a minor suggestionThe changes improve the robustness of the
runDefaultAction
function by adding null checks and default values. This prevents potential undefined errors and ensures thatprintTopFiles
is only called when there are actually files to print.Consider using the nullish coalescing operator (
??
) for a more concise syntax:- if (config.output.topFilesLength && config.output.topFilesLength > 0) { - printTopFiles(packResult.fileCharCounts, packResult.fileTokenCounts, config.output.topFilesLength ?? 0); + if (config.output.topFilesLength > 0) { + printTopFiles(packResult.fileCharCounts, packResult.fileTokenCounts, config.output.topFilesLength);This change maintains the same functionality while slightly improving readability.
Also applies to: 71-71
src/cli/actions/remoteAction.ts (1)
Line range hint
85-95
: Suggestion: SimplifycheckGitInstallation
functionWhile not directly related to the current change, the
checkGitInstallation
function could be simplified. The current implementation checks for stderr, which isn't necessary since the function will throw an error if Git is not installed.Consider refactoring it as follows:
const checkGitInstallation = async (): Promise<boolean> => { try { await execAsync('git --version'); return true; } catch (error) { logger.debug('Git is not installed:', (error as Error).message); return false; } };This simplification maintains the same functionality while reducing unnecessary checks.
src/core/output/outputStyles/markdownStyle.ts (2)
28-32
: LGTM! Consider adding a comment for clarity.The changes to the
renderContext
object look good. They provide more detailed information about the repository structure and the specific part being represented, which aligns with the PR objectives.Consider adding a brief comment explaining the purpose of these new properties, especially
totalPartFiles
andtotalFiles
, to improve code readability:// Add context about repository structure and current part includedFiles: outputGeneratorContext.includedFiles, partNumber: outputGeneratorContext.partNumber, totalParts: outputGeneratorContext.totalParts, totalPartFiles: outputGeneratorContext.includedFiles.length, // Number of files in this part totalFiles: outputGeneratorContext.totalFiles // Total number of files in the repository
55-58
: LGTM! Consider minor formatting adjustment for consistency.The new "Repository Size" section is a valuable addition to the markdown template. It provides clear information about the repository structure and the current part, which aligns well with the PR objectives.
For consistency with other sections, consider adding a newline before the "Repository Size" section header:
{{{summaryUsageGuidelines}}} + ## Repository Size This file is part {{{partNumber}}} of {{{totalParts}}} of a split representation of the entire codebase. This file contains {{{totalPartFiles}}} out of a total of {{{totalFiles}}} files.
This small change will improve the overall formatting consistency of the generated markdown.
src/core/output/outputGenerate.ts (3)
64-66
: Remove redundant type annotationsThe type annotations for
totalFiles
,totalParts
, andpartNumber
are unnecessary because their types can be inferred from the default values. Removing them simplifies the code without losing type information.Apply this diff to simplify the code:
export const buildOutputGeneratorContext = async ( rootDir: string, config: RepopackConfigMerged, includedFiles: ProcessedFile[] = [], repositoryStructure: string[] = [], - totalFiles: number = 1, - totalParts: number = 1, - partNumber: number = 1 + totalFiles = 1, + totalParts = 1, + partNumber = 1 ): Promise<OutputGeneratorContext> => {🧰 Tools
🪛 Biome
[error] 64-64: This type annotation is trivially inferred from its initialization.
Safe fix: Remove the type annotation.
(lint/style/noInferrableTypes)
[error] 65-65: This type annotation is trivially inferred from its initialization.
Safe fix: Remove the type annotation.
(lint/style/noInferrableTypes)
[error] 66-66: This type annotation is trivially inferred from its initialization.
Safe fix: Remove the type annotation.
(lint/style/noInferrableTypes)
86-86
: Clarify comment or adjust code usageThe comment suggests using
includedFiles
fortreeString
, but the code usesrepositoryStructure
. If the intention is to generate the tree string based onincludedFiles
, consider updating the code. Otherwise, update the comment to reflect the actual parameter used.Apply this diff to update the comment:
generationDate: new Date().toISOString(), - treeString: generateTreeString(repositoryStructure), // Use includedFiles for treeString + treeString: generateTreeString(repositoryStructure), // Use repositoryStructure for treeStringAlternatively, if you intend to use
includedFiles
, adjust the code:- treeString: generateTreeString(repositoryStructure), // Use includedFiles for treeString + treeString: generateTreeString(includedFiles.map(f => f.path)), // Generate treeString from includedFiles
19-19
: UseNumber.POSITIVE_INFINITY
instead ofInfinity
To adhere to best practices and improve code clarity, consider using
Number.POSITIVE_INFINITY
instead of the globalInfinity
value.Apply this diff to make the change:
const maxTokensPerFile = config.output.maxTokensPerFile ?? - Infinity; // Use Infinity if no limit is set + Number.POSITIVE_INFINITY; // Use Number.POSITIVE_INFINITY if no limit is set const outputSplits: OutputSplit[] = - maxTokensPerFile < Infinity + maxTokensPerFile < Number.POSITIVE_INFINITY ? splitOutput( processedFiles, maxTokensPerFile )Also applies to: 22-22
🧰 Tools
🪛 Biome
[error] 19-19: Use Number.Infinity instead of the equivalent global.
ES2015 moved some globals into the Number namespace for consistency.
Safe fix: Use Number.Infinity instead.(lint/style/useNumberNamespace)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (13)
- src/cli/actions/defaultAction.ts (3 hunks)
- src/cli/actions/remoteAction.ts (1 hunks)
- src/cli/cliRun.ts (2 hunks)
- src/config/configLoad.ts (1 hunks)
- src/config/configTypes.ts (2 hunks)
- src/config/defaultConfig.ts (1 hunks)
- src/core/output/outputGenerate.ts (1 hunks)
- src/core/output/outputGeneratorTypes.ts (1 hunks)
- src/core/output/outputSplitter.ts (1 hunks)
- src/core/output/outputStyles/markdownStyle.ts (2 hunks)
- src/core/output/outputStyles/plainStyle.ts (2 hunks)
- src/core/output/outputStyles/xmlStyle.ts (3 hunks)
- src/core/packager.ts (5 hunks)
🧰 Additional context used
🪛 Biome
src/core/output/outputGenerate.ts
[error] 64-64: This type annotation is trivially inferred from its initialization.
Safe fix: Remove the type annotation.
(lint/style/noInferrableTypes)
[error] 65-65: This type annotation is trivially inferred from its initialization.
Safe fix: Remove the type annotation.
(lint/style/noInferrableTypes)
[error] 66-66: This type annotation is trivially inferred from its initialization.
Safe fix: Remove the type annotation.
(lint/style/noInferrableTypes)
[error] 19-19: Use Number.Infinity instead of the equivalent global.
ES2015 moved some globals into the Number namespace for consistency.
Safe fix: Use Number.Infinity instead.(lint/style/useNumberNamespace)
[error] 22-22: Use Number.Infinity instead of the equivalent global.
ES2015 moved some globals into the Number namespace for consistency.
Safe fix: Use Number.Infinity instead.(lint/style/useNumberNamespace)
🔇 Additional comments (14)
src/core/output/outputGeneratorTypes.ts (1)
9-13
: LGTM: New properties support file splitting featureThe new properties (
content
,includedFiles
,totalFiles
,partNumber
,totalParts
) are well-structured and align with the PR objective of splitting files by maxTokenSize. They provide the necessary context for handling multiple parts of output.src/config/defaultConfig.ts (1)
17-17
: LGTM! Consider adding documentation and updating types.The addition of the
onlyShowPartFilesInRepoStructure
property to theoutput
object is a good enhancement to the configuration options. However, I have a few suggestions to improve this change:
Consider adding a comment above this line to explain the purpose and usage of this new property. This will help other developers understand its functionality.
Ensure that the
RepopackConfigDefault
type (likely defined inconfigTypes.ts
) is updated to include this new property. This will maintain type safety across the project.To verify if this change is properly reflected in other parts of the codebase, please run the following script:
This script will help ensure that the new property is properly typed and used consistently across the project.
✅ Verification successful
All checks passed! Documentation and type definitions are in place.
The
onlyShowPartFilesInRepoStructure
property has been correctly added to theRepopackConfigDefault
type and is actively used in the codebase.
- Type Definition: Verified in
src/config/configTypes.ts
.- Usage Locations:
src/config/defaultConfig.ts
src/core/output/outputGenerate.ts
No further actions are required.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Check for usage and type definition of the new property # Test 1: Check if the type definition has been updated echo "Checking type definition:" rg --type typescript "onlyShowPartFilesInRepoStructure" src/config/configTypes.ts # Test 2: Check for usage of the new property in other files echo "Checking usage in other files:" rg --type typescript "onlyShowPartFilesInRepoStructure" srcLength of output: 343
Script:
#!/bin/bash # Description: Check for usage and type definition of the new property # Test 1: Check if the type definition has been updated echo "Checking type definition:" rg "onlyShowPartFilesInRepoStructure" src/config/configTypes.ts # Test 2: Check for usage of the new property in other files echo "Checking usage in other files:" rg "onlyShowPartFilesInRepoStructure" src -g "*.ts" -g "*.tsx"Length of output: 598
src/config/configTypes.ts (2)
17-17
: LGTM! Good refactoring of theoutput
property type.The change to use
RepopackOutputConfig
for theoutput
property type inRepopackConfigBase
is a good refactoring. It improves code organization and maintainability by centralizing the output configuration options.
Line range hint
1-30
: Overall, good improvements to configuration structure and support for new feature.The changes in this file effectively introduce the
maxTokensPerFile
option and improve the overall structure of the configuration types. The newRepopackOutputConfig
interface centralizes output-related properties, enhancing maintainability and clarity. These changes align well with the PR objective of adding the feature to split files by max token size per file.Key improvements:
- Introduction of
RepopackOutputConfig
interface- Consistent updates to
RepopackConfigBase
andRepopackConfigDefault
- Addition of
maxTokensPerFile
property to support the new featureThese changes provide a solid foundation for implementing the file splitting feature based on token size.
src/core/output/outputStyles/xmlStyle.ts (2)
88-88
: LGTM! Consistent use ofincludedFiles
.The update to use
includedFiles
instead ofprocessedFiles
in the<repository_files>
section is correct and maintains consistency with the changes made to therenderContext
object.
Line range hint
1-105
: Overall, excellent improvements to support file splitting feature.The changes made to this file significantly enhance the XML output generation process by incorporating additional metadata related to file structure and organization. These modifications directly support the PR objective of adding the feature to split files by maxTokenSize.
Key improvements include:
- Addition of new properties to the
renderContext
object, providing more detailed information about the file structure.- Introduction of a new
<repository_size>
section in the XML template, offering clear metadata about the split files.- Consistent renaming from
processedFiles
toincludedFiles
throughout the file.These changes will greatly improve the clarity and usefulness of the generated XML output, especially in scenarios where the codebase is split across multiple files.
src/cli/actions/defaultAction.ts (1)
106-108
: Approve changes and request clarificationThe addition of the
maxTokens
option to the CLI configuration is well-implemented. It correctly handles the case when the option is undefined and adds the value to the appropriate config object.Could you clarify the reason for changing the property name from
maxTokens
tomaxTokensPerFile
? This might indicate a more specific use of this option. To ensure consistency, let's verify if this property name is used correctly throughout the codebase:This will help us ensure that the property name is used consistently and that we haven't missed any places where it should be updated.
✅ Verification successful
Run updated verification scripts for property usage
The previous search using
--type typescript
was unsuccessful. Let's adjust the script to target.ts
and.tsx
files directly:
Verified property naming consistency
The
maxTokensPerFile
property is used consistently across the codebase, and themaxTokens
CLI option correctly maps to it in the configuration. No issues found.🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Check usage of maxTokensPerFile in the codebase # Search for maxTokensPerFile echo "Occurrences of maxTokensPerFile:" rg "maxTokensPerFile" --type typescript # Search for maxTokens (excluding this file) echo "\nOccurrences of maxTokens:" rg "maxTokens" --type typescript | grep -v "src/cli/actions/defaultAction.ts"Length of output: 325
Script:
#!/bin/bash # Description: Check usage of maxTokensPerFile and maxTokens in the codebase # Search for maxTokensPerFile echo "Occurrences of maxTokensPerFile:" rg "maxTokensPerFile" --glob "*.ts" --glob "*.tsx" # Search for maxTokens (excluding src/cli/actions/defaultAction.ts) echo "\nOccurrences of maxTokens:" rg "maxTokens" --glob "*.ts" --glob "*.tsx" | grep -v "src/cli/actions/defaultAction.ts"Length of output: 1627
src/cli/actions/remoteAction.ts (1)
32-32
: Improved robustness with fallback value for output file pathThis change enhances the function's reliability by providing a default value for the output file path. Using the nullish coalescing operator ensures that even if
result.config.output.filePath
is undefined or null, a valid file path ('repopack-output.txt') will be used. This improvement aligns well with the PR's objective of better handling output files and configurations.src/config/configLoad.ts (1)
88-88
: 🛠️ Refactor suggestionImprove robustness and readability of default file path assignment
The change improves the code by ensuring a valid style is always used when determining the default file path. However, there are a couple of areas for improvement:
- Modifying
defaultConfig.output.filePath
directly could lead to unexpected behavior ifmergeConfigs
is called multiple times.- The logic for determining the style and setting the default file path is somewhat complex and could be simplified.
Consider refactoring this part of the function to improve clarity and avoid modifying the
defaultConfig
object:let defaultFilePath: string; if (cliConfig.output?.filePath == null && fileConfig.output?.filePath == null) { const style = cliConfig.output?.style ?? fileConfig.output?.style ?? defaultConfig.output.style ?? 'plain'; defaultFilePath = defaultFilePathMap[style]; } else { defaultFilePath = defaultConfig.output.filePath; } return { cwd, output: { ...defaultConfig.output, filePath: defaultFilePath, ...fileConfig.output, ...cliConfig.output, }, // ... rest of the merged config };This refactoring:
- Avoids modifying
defaultConfig
.- Simplifies the style determination logic.
- Clearly separates the default file path logic from the rest of the config merging.
To ensure this change doesn't introduce any regressions, we should verify the usage of
defaultConfig
andmergeConfigs
throughout the codebase. Run the following script:This script will help us understand if the proposed changes might affect other parts of the codebase.
src/core/output/outputStyles/markdownStyle.ts (1)
Line range hint
1-214
: Overall, the changes look good and align with the PR objectives.The modifications to
generateMarkdownStyle
function and the markdown template effectively implement the feature to split files by maxTokenSize. The new properties added torenderContext
and the "Repository Size" section in the template provide valuable information about the repository structure and the current part being represented.The changes are well-implemented and there are no major issues. The minor suggestions for improvement (adding a comment for clarity and adjusting formatting for consistency) will enhance code readability and maintain consistent styling.
Great job on this implementation!
src/core/output/outputGenerate.ts (1)
35-35
: Ensure correct file paths are used in repository structureIn the ternary operation, when
config.output.onlyShowPartFilesInRepoStructure
istrue
, you're mappingoutputSplit.includedFiles
to their paths. Ensure thatf.path
correctly represents the file paths relative torootDir
to avoid inconsistencies in the generated repository structure.Would you like assistance in verifying the file paths used in the repository structure?
src/core/packager.ts (3)
60-63
: LGTM!The reformatting of the
runSecurityCheck
function call improves readability without altering functionality.
81-86
: LGTM!The call to
generateOutput
is clear and correctly passes all required parameters.
124-128
: LGTM!The progress callback now includes the file path with a fallback to 'Unknown File', which improves the clarity of the progress messages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (2)
src/core/output/outputGenerate.ts (2)
11-27
: LGTM with a minor style suggestionThe implementation of the new feature to split output based on a token limit looks good. It aligns well with the PR objective of adding the ability to split files by
maxTokenSize
per file.Consider using
Number.Infinity
instead ofInfinity
for consistency with ES2015 standards:- const maxTokensPerFile = config.output.maxTokensPerFile ?? Infinity; + const maxTokensPerFile = config.output.maxTokensPerFile ?? Number.Infinity; - if (maxTokensPerFile < Infinity) + if (maxTokensPerFile < Number.Infinity)This change improves code consistency and follows modern JavaScript best practices.
🧰 Tools
🪛 Biome
[error] 19-19: Use Number.Infinity instead of the equivalent global.
ES2015 moved some globals into the Number namespace for consistency.
Safe fix: Use Number.Infinity instead.(lint/style/useNumberNamespace)
[error] 22-22: Use Number.Infinity instead of the equivalent global.
ES2015 moved some globals into the Number namespace for consistency.
Safe fix: Use Number.Infinity instead.(lint/style/useNumberNamespace)
62-66
: Approve changes with minor refactor suggestionThe addition of new parameters to
buildOutputGeneratorContext
effectively supports the new split output feature while maintaining backward compatibility through default values. This is a good approach.Consider removing the redundant type annotations for parameters with default values:
export const buildOutputGeneratorContext = async ( rootDir: string, config: RepopackConfigMerged, - includedFiles: ProcessedFile[] = [], - repositoryStructure: string[] = [], - totalFiles: number = 1, - totalParts: number = 1, - partNumber: number = 1 + includedFiles = [], + repositoryStructure = [], + totalFiles = 1, + totalParts = 1, + partNumber = 1 ): Promise<OutputGeneratorContext> => {This change simplifies the code without losing type information, as TypeScript can infer the types from the default values.
🧰 Tools
🪛 Biome
[error] 64-64: This type annotation is trivially inferred from its initialization.
Safe fix: Remove the type annotation.
(lint/style/noInferrableTypes)
[error] 65-65: This type annotation is trivially inferred from its initialization.
Safe fix: Remove the type annotation.
(lint/style/noInferrableTypes)
[error] 66-66: This type annotation is trivially inferred from its initialization.
Safe fix: Remove the type annotation.
(lint/style/noInferrableTypes)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (2)
- src/core/output/outputGenerate.ts (1 hunks)
- src/core/output/outputSplitter.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- src/core/output/outputSplitter.ts
🧰 Additional context used
🪛 Biome
src/core/output/outputGenerate.ts
[error] 64-64: This type annotation is trivially inferred from its initialization.
Safe fix: Remove the type annotation.
(lint/style/noInferrableTypes)
[error] 65-65: This type annotation is trivially inferred from its initialization.
Safe fix: Remove the type annotation.
(lint/style/noInferrableTypes)
[error] 66-66: This type annotation is trivially inferred from its initialization.
Safe fix: Remove the type annotation.
(lint/style/noInferrableTypes)
[error] 19-19: Use Number.Infinity instead of the equivalent global.
ES2015 moved some globals into the Number namespace for consistency.
Safe fix: Use Number.Infinity instead.(lint/style/useNumberNamespace)
[error] 22-22: Use Number.Infinity instead of the equivalent global.
ES2015 moved some globals into the Number namespace for consistency.
Safe fix: Use Number.Infinity instead.(lint/style/useNumberNamespace)
🔇 Additional comments (2)
src/core/output/outputGenerate.ts (2)
29-56
: Excellent implementation of parallel output processingThe use of
Promise.all
to handle multiple output splits is a great approach. It allows for efficient parallel processing of the splits, which can significantly improve performance for large repositories or when dealing with multiple output files.The updated call to
buildOutputGeneratorContext
with new parameters ensures that each split is processed with the correct context, including part numbers and total counts.
86-93
: Well-structured context object for split outputsThe updates to the returned context object are well-thought-out and provide all the necessary information for handling split outputs. The use of
repositoryStructure
fortreeString
generation is a good choice, as it allows for accurate representation of the file structure in each split.The new properties (
includedFiles
,totalFiles
,totalParts
,partNumber
) will be very useful for generating comprehensive and accurate output for each split.
@@ -85,7 +85,7 @@ export const mergeConfigs = ( | |||
// If the output file path is not provided in the config file or CLI, use the default file path for the style | |||
if (cliConfig.output?.filePath == null && fileConfig.output?.filePath == null) { | |||
const style = cliConfig.output?.style || fileConfig.output?.style || defaultConfig.output.style; | |||
defaultConfig.output.filePath = defaultFilePathMap[style]; | |||
defaultConfig.output.filePath = defaultFilePathMap[style ?? 'plain']; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change is not necessary since the defaultConfig will eventually make it plain.
defaultConfig.output.filePath = defaultFilePathMap[style ?? 'plain']; | |
defaultConfig.output.filePath = defaultFilePathMap[style]; |
@@ -44,6 +45,7 @@ export async function run() { | |||
.option('--init', 'initialize a new repopack.config.json file') | |||
.option('--global', 'use global configuration (only applicable with --init)') | |||
.option('--remote <url>', 'process a remote Git repository') | |||
.option('--max-tokens <number>', 'maximum number of tokens per output file', Number.parseInt) // Add the maxTokens option |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to make it output-max-tokens if possible since it is max-tokens regarding output. I am concerned about the possibility of max-tokens for other uses in the future.
.option('--max-tokens <number>', 'maximum number of tokens per output file', Number.parseInt) // Add the maxTokens option | |
.option('--output-max-tokens <number>', 'maximum number of tokens per output file', Number.parseInt) // Add the maxTokens option |
@@ -29,7 +29,7 @@ export const runRemoteAction = async (repoUrl: string, options: CliOptions): Pro | |||
logger.log(''); | |||
|
|||
const result = await runDefaultAction(tempDir, tempDir, options); | |||
await copyOutputToCurrentDirectory(tempDir, process.cwd(), result.config.output.filePath); | |||
await copyOutputToCurrentDirectory(tempDir, process.cwd(), result.config.output.filePath ?? 'repopack-output.txt'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not necessary because it will eventually become the value of defaultConfig in the process of config merging.
await copyOutputToCurrentDirectory(tempDir, process.cwd(), result.config.output.filePath ?? 'repopack-output.txt'); | |
await copyOutputToCurrentDirectory(tempDir, process.cwd(), result.config.output.filePath); |
topFilesLength: number; | ||
showLineNumbers: boolean; | ||
}; | ||
output: RepopackOutputConfig; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since defaultConfig dares to make some parts of the code non-nullable, I would like it to be the original code with this option added.
@@ -75,7 +85,7 @@ This section contains a summary of this file. | |||
<repository_files> | |||
This section contains the contents of the repository's files. | |||
|
|||
{{#each processedFiles}} | |||
{{#each includedFiles}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please make the same correction to plainStyle.ts and markdownStyle.ts.
@@ -56,8 +56,8 @@ export const runDefaultAction = async ( | |||
spinner.succeed('Packing completed successfully!'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding the whole thing,
please fix the tests that fail on npm run lint run and npm run test
@@ -59,6 +63,11 @@ Usage Guidelines: | |||
----------------- | |||
{{{summaryUsageGuidelines}}} | |||
|
|||
Repository Size: | |||
----------------- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This hyphen is aligned with the letter above.
----------------- | |
---------------- |
let currentIncludedFiles: ProcessedFile[] = []; // Initialize currentIncludedFiles | ||
|
||
for (const file of processedFiles) { | ||
const fileTokenCount = tokenCounter.countTokens(file.content, file.path); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there is a large number of files, the heavy processing of token count may be affected. Currently, the parts where token counts are being performed are slightly mitigated by adding sleep.
However, I thought it would be better to leave it as it is for now and fix it when the problem actually occurs.
progressCallback('Writing output file(s)...'); | ||
|
||
// Handle the case where filePath is undefined | ||
if (config.output.filePath) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The filePath is set in the default config, so it will always be present.
@fridaystreet appreciate a lot that you work on this feature! I've tried using the build from this PR and it produces empty files. They only include the header section with directory/files and nothing from files themselves. Even if no max token option is specified, still single file with no contents besides header. |
@kagux thanks for the feedback. It was all working and seems to be wirking locally fir me. But Just after I generated the pr there were a few code changes and refactirs applied to the main branch I noticed. I pulled them in and fixed it up and pushed up another commit. Possibly I missed something. It's still a draft pr -(or it was at least) as I haven't finished doco or tests yet. Was just looking fir initial feedback. I'll have a look through and retest see what I've missed. Thanks again |
@kagux I just reran it again this morning and it all seems to be working for me. Both with max-tokens set and without. Also tried with a folder specified and without. Can you post up the command line you're using or the config.json, not sure what's going on, but sure we can get to the bottom of it. Cheers |
I've just noticed something I've missed though. It's not taking into account the token size of the header. I'll have to add some mechanism in to generate the header first and calulate it's size, then iterate over the files with the remaining allowable tokens from maxTokens-headerTokens |
Also some interesting responses when trying to query the content to see if it's understanding it properly. Based on the provided context, I cannot determine if the last file element node is the absolute last one in the entire file. Here's why: Partial File: The context states that the XML file you shared is only "part 1 of 62". This means there are 61 other files with potentially more file element nodes. Seems to be getting confused how to interpret the repo size section. I might have to come up with a clearer way of defining the difference between parts and file count in each part. <repository_size> </repository_size> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (2)
src/core/output/outputStyles/plainStyle.ts (2)
66-70
: LGTM! New "Repository Size" section adds valuable context.The addition of the "Repository Size" section provides important information about the split file representation. This aligns well with the new properties in the
renderContext
.Consider a minor wording improvement for clarity:
Repository Size: ----------------- -This file is part {{{partNumber}}} of {{{totalParts}}} of a split representation of the entire codebase. +This file represents part {{{partNumber}}} of {{{totalParts}}} in a split representation of the entire codebase. This file contains {{{totalPartFiles}}} out of a total of {{{totalFiles}}} files.
Line range hint
1-94
: Overall implementation of split file representation looks good.The changes in this file successfully implement the split file representation feature. Key points:
- The
renderContext
now includes necessary properties for split representation.- A new "Repository Size" section in the template provides valuable context.
- The template has been updated to use
includedFiles
consistently.These changes align well with the PR objectives and address the concerns raised in the PR comments about empty files and confusion regarding file counts.
Consider adding a brief comment in the code explaining the purpose of the split file representation feature and how it relates to the
maxTokens
option mentioned in the PR description. This would improve code maintainability and help future developers understand the feature's context.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (2)
- src/core/output/outputStyles/markdownStyle.ts (3 hunks)
- src/core/output/outputStyles/plainStyle.ts (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- src/core/output/outputStyles/markdownStyle.ts
🧰 Additional context used
🔇 Additional comments (2)
src/core/output/outputStyles/plainStyle.ts (2)
94-94
: LGTM! Template updated to useincludedFiles
.The change from
processedFiles
toincludedFiles
in the template'seach
loop resolves the inconsistency mentioned in the past review comment. This update ensures that the template correctly uses the new property from therenderContext
.
28-32
: LGTM! Changes align with new split file representation.The additions to the
renderContext
object correctly implement the new properties needed for split file representation. The past comment about inconsistency betweenincludedFiles
andprocessedFiles
has been addressed by this change.To ensure the template is using the new
includedFiles
property consistently, let's verify:✅ Verification successful
Verified!
'processedFiles'
has been successfully replaced with'includedFiles'
inplainStyle.ts
.🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Verify that 'processedFiles' has been replaced with 'includedFiles' in the template rg --type typescript 'processedFiles' src/core/output/outputStyles/plainStyle.tsLength of output: 115
Script:
#!/bin/bash # Verify that 'processedFiles' has been replaced with 'includedFiles' in the template grep 'processedFiles' src/core/output/outputStyles/plainStyle.tsLength of output: 64
related: #71
Summary by CodeRabbit
Summary by CodeRabbit
Release Notes
New Features
maxTokens
for CLI options, allowing users to specify a maximum number of tokens per output file.onlyShowPartFilesInRepoStructure
in the default configuration.Bug Fixes
Refactor