Add feature split files by maxTokenSize per file #113

Draft
wants to merge 4 commits into
base: main
Changes from 3 commits
9 changes: 6 additions & 3 deletions src/cli/actions/defaultAction.ts
@@ -56,8 +56,8 @@ export const runDefaultAction = async (
spinner.succeed('Packing completed successfully!');
Owner comment:
Overall, please fix the failures reported by npm run lint and npm run test.

logger.log('');

if (config.output.topFilesLength > 0) {
printTopFiles(packResult.fileCharCounts, packResult.fileTokenCounts, config.output.topFilesLength);
if (config.output.topFilesLength && config.output.topFilesLength > 0) {
printTopFiles(packResult.fileCharCounts, packResult.fileTokenCounts, config.output.topFilesLength ?? 0);
logger.log('');
}

@@ -68,7 +68,7 @@ export const runDefaultAction = async (
packResult.totalFiles,
packResult.totalCharacters,
packResult.totalTokens,
config.output.filePath,
config.output.filePath ?? 'No output file specified',
packResult.suspiciousFilesResults,
config,
);
@@ -103,6 +103,9 @@ const buildCliConfig = (options: CliOptions): RepopackConfigCli => {
if (options.style) {
cliConfig.output = { ...cliConfig.output, style: options.style.toLowerCase() as RepopackOutputStyle };
}
if (options.maxTokens !== undefined) {
cliConfig.output = { ...cliConfig.output, maxTokensPerFile: options.maxTokens };
}

return cliConfig;
};
2 changes: 1 addition & 1 deletion src/cli/actions/remoteAction.ts
@@ -29,7 +29,7 @@ export const runRemoteAction = async (repoUrl: string, options: CliOptions): Pro
logger.log('');

const result = await runDefaultAction(tempDir, tempDir, options);
await copyOutputToCurrentDirectory(tempDir, process.cwd(), result.config.output.filePath);
await copyOutputToCurrentDirectory(tempDir, process.cwd(), result.config.output.filePath ?? 'repopack-output.txt');
Owner comment:
This fallback is unnecessary: during config merging, filePath eventually takes the value from defaultConfig.

Suggested change:
- await copyOutputToCurrentDirectory(tempDir, process.cwd(), result.config.output.filePath ?? 'repopack-output.txt');
+ await copyOutputToCurrentDirectory(tempDir, process.cwd(), result.config.output.filePath);
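
The point about config merging can be illustrated with a minimal sketch of default-filling. The names `OutputOptions` and `mergeOutput`, and the path values in `defaultFilePathMap`, are assumptions made for illustration; this is not the project's actual mergeConfigs implementation.

```typescript
type OutputStyle = 'plain' | 'xml' | 'markdown';

interface OutputOptions {
  filePath?: string;
  style?: OutputStyle;
}

// Hypothetical defaults; the real values live in the project's defaultConfig.
const defaultFilePathMap: Record<OutputStyle, string> = {
  plain: 'output.txt',
  xml: 'output.xml',
  markdown: 'output.md',
};
const defaultOutput: { style: OutputStyle } = { style: 'plain' };

// Later sources win: defaults < config file < CLI.
// The return type has no optional fields, so callers never see undefined.
const mergeOutput = (
  fileOutput: OutputOptions = {},
  cliOutput: OutputOptions = {},
): Required<OutputOptions> => {
  const style = cliOutput.style ?? fileOutput.style ?? defaultOutput.style;
  const filePath = cliOutput.filePath ?? fileOutput.filePath ?? defaultFilePathMap[style];
  return { style, filePath };
};
```

If the merged type carries a non-optional filePath in this way, call sites that receive the merged config would not need their own `??` fallback.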

} finally {
// Clean up the temporary directory
await cleanupTempDirectory(tempDir);
2 changes: 2 additions & 0 deletions src/cli/cliRun.ts
@@ -23,6 +23,7 @@ export interface CliOptions extends OptionValues {
init?: boolean;
global?: boolean;
remote?: string;
maxTokens?: number; // Add the maxTokens option
}

export async function run() {
@@ -44,6 +45,7 @@ export async function run() {
.option('--init', 'initialize a new repopack.config.json file')
.option('--global', 'use global configuration (only applicable with --init)')
.option('--remote <url>', 'process a remote Git repository')
.option('--max-tokens <number>', 'maximum number of tokens per output file', Number.parseInt) // Add the maxTokens option
Owner comment:
I would prefer --output-max-tokens if possible, since this limit applies to the output. A bare --max-tokens might be wanted for other purposes in the future.

Suggested change:
- .option('--max-tokens <number>', 'maximum number of tokens per output file', Number.parseInt) // Add the maxTokens option
+ .option('--output-max-tokens <number>', 'maximum number of tokens per output file', Number.parseInt) // Add the maxTokens option
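
Whichever flag name is chosen, the parser argument on this line may deserve a second look. Commander calls a custom option parser with two arguments, the raw string and the previous value. Passing Number.parseInt directly would therefore let the previous value act as the radix if a default were ever supplied, and non-numeric input would yield NaN. A minimal sketch of a stricter parser follows; `parsePositiveInt` is a hypothetical helper and not part of this PR.

```typescript
// Hypothetical helper. The second parameter mirrors Commander's
// (value, previous) callback shape and is deliberately ignored,
// so it can never be mistaken for a radix.
const parsePositiveInt = (value: string, _previous?: number): number => {
  // Accept plain decimal digits only; rejects "abc", "12abc", "-5", "1.5".
  if (!/^\d+$/.test(value)) {
    throw new Error(`Expected a positive integer, got "${value}"`);
  }
  const parsed = Number.parseInt(value, 10);
  if (parsed <= 0) {
    throw new Error(`Expected a positive integer, got "${value}"`);
  }
  return parsed;
};
```

Such a function could be passed as the third argument to `.option(...)` in place of Number.parseInt.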

.action((directory = '.', options: CliOptions = {}) => executeAction(directory, process.cwd(), options));

await program.parseAsync(process.argv);
2 changes: 1 addition & 1 deletion src/config/configLoad.ts
@@ -85,7 +85,7 @@ export const mergeConfigs = (
// If the output file path is not provided in the config file or CLI, use the default file path for the style
if (cliConfig.output?.filePath == null && fileConfig.output?.filePath == null) {
const style = cliConfig.output?.style || fileConfig.output?.style || defaultConfig.output.style;
defaultConfig.output.filePath = defaultFilePathMap[style];
defaultConfig.output.filePath = defaultFilePathMap[style ?? 'plain'];
Owner comment:
This change is unnecessary: style already falls back to the defaultConfig value, which is plain.

Suggested change:
- defaultConfig.output.filePath = defaultFilePathMap[style ?? 'plain'];
+ defaultConfig.output.filePath = defaultFilePathMap[style];

}

return {
35 changes: 15 additions & 20 deletions src/config/configTypes.ts
@@ -1,16 +1,20 @@
export type RepopackOutputStyle = 'plain' | 'xml' | 'markdown';

export interface RepopackOutputConfig {
filePath?: string;
style?: RepopackOutputStyle;
headerText?: string;
instructionFilePath?: string;
removeComments?: boolean;
removeEmptyLines?: boolean;
topFilesLength?: number;
showLineNumbers?: boolean;
maxTokensPerFile?: number; // Added maxTokensPerFile
onlyShowPartFilesInRepoStructure?: boolean;
}

interface RepopackConfigBase {
output?: {
filePath?: string;
style?: RepopackOutputStyle;
headerText?: string;
instructionFilePath?: string;
removeComments?: boolean;
removeEmptyLines?: boolean;
topFilesLength?: number;
showLineNumbers?: boolean;
};
output?: RepopackOutputConfig;
include?: string[];
ignore?: {
useGitignore?: boolean;
@@ -23,16 +27,7 @@ interface RepopackConfigBase {
}

export type RepopackConfigDefault = RepopackConfigBase & {
output: {
filePath: string;
style: RepopackOutputStyle;
headerText?: string;
instructionFilePath?: string;
removeComments: boolean;
removeEmptyLines: boolean;
topFilesLength: number;
showLineNumbers: boolean;
};
output: RepopackOutputConfig;
yamadashy marked this conversation as resolved.
Owner comment:
defaultConfig deliberately makes these fields non-nullable, so please keep the original type definition and just add the new option to it.
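
One way to read this request is sketched below: the required fields of the default output type stay required, and only the new options are added. The interface name `DefaultOutputConfig`, the optionality chosen for the two new fields, and the example values are assumptions, not the final code of this PR.

```typescript
type RepopackOutputStyle = 'plain' | 'xml' | 'markdown';

// Required fields stay required, as in the original RepopackConfigDefault;
// only the new options are appended.
interface DefaultOutputConfig {
  filePath: string;
  style: RepopackOutputStyle;
  headerText?: string;
  instructionFilePath?: string;
  removeComments: boolean;
  removeEmptyLines: boolean;
  topFilesLength: number;
  showLineNumbers: boolean;
  maxTokensPerFile?: number; // assumption: no default value, so it stays optional
  onlyShowPartFilesInRepoStructure: boolean; // assumption: defaulted to false
}

// Placeholder values for illustration only.
const exampleDefaultOutput: DefaultOutputConfig = {
  filePath: 'example-output.txt',
  style: 'plain',
  removeComments: false,
  removeEmptyLines: false,
  topFilesLength: 5,
  showLineNumbers: false,
  onlyShowPartFilesInRepoStructure: false,
};

// With non-nullable fields, consumers can compare directly without `??`.
const shouldPrintTopFiles = exampleDefaultOutput.topFilesLength > 0;
```

Keeping the fields non-nullable would also make the extra guards added in defaultAction.ts (for example `topFilesLength ?? 0`) unnecessary.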

include: string[];
ignore: {
useGitignore: boolean;
1 change: 1 addition & 0 deletions src/config/defaultConfig.ts
@@ -14,6 +14,7 @@ export const defaultConfig: RepopackConfigDefault = {
removeEmptyLines: false,
topFilesLength: 5,
showLineNumbers: false,
onlyShowPartFilesInRepoStructure: false
},
include: [],
ignore: {
78 changes: 57 additions & 21 deletions src/core/output/outputGenerate.ts
@@ -8,52 +8,88 @@ import type { OutputGeneratorContext } from './outputGeneratorTypes.js';
import { generateMarkdownStyle } from './outputStyles/markdownStyle.js';
import { generatePlainStyle } from './outputStyles/plainStyle.js';
import { generateXmlStyle } from './outputStyles/xmlStyle.js';
import { splitOutput, type OutputSplit } from './outputSplitter.js';

export const generateOutput = async (
rootDir: string,
config: RepopackConfigMerged,
processedFiles: ProcessedFile[],
allFilePaths: string[],
): Promise<string> => {
const outputGeneratorContext = await buildOutputGeneratorContext(rootDir, config, allFilePaths, processedFiles);

let output: string;
switch (config.output.style) {
case 'xml':
output = generateXmlStyle(outputGeneratorContext);
break;
case 'markdown':
output = generateMarkdownStyle(outputGeneratorContext);
break;
default:
output = generatePlainStyle(outputGeneratorContext);
}
): Promise<string[]> => {
const maxTokensPerFile = config.output.maxTokensPerFile ?? Infinity; // Use Infinity if no limit is set

const outputSplits: OutputSplit[] =
maxTokensPerFile < Infinity
? splitOutput(
processedFiles,
maxTokensPerFile
)
: [{ partNumber: 1, tokenCount: 0, includedFiles: processedFiles }];

const outputs = await Promise.all(
outputSplits.map(async (outputSplit) => {
const outputGeneratorContext = await buildOutputGeneratorContext(
rootDir,
config,
outputSplit.includedFiles,
config.output.onlyShowPartFilesInRepoStructure ? outputSplit.includedFiles.map(f => f.path) : allFilePaths,
processedFiles.length,
outputSplits.length,
outputSplit.partNumber,
)

return output;
let output: string;
switch (config.output.style) {
case 'xml':
output = generateXmlStyle(outputGeneratorContext);
break;
case 'markdown':
output = generateMarkdownStyle(outputGeneratorContext);
break;
default:
output = generatePlainStyle(outputGeneratorContext);
}
return output;
}),
);

return outputs;
};

export const buildOutputGeneratorContext = async (
rootDir: string,
config: RepopackConfigMerged,
allFilePaths: string[],
processedFiles: ProcessedFile[],
includedFiles: ProcessedFile[] = [], // Add includedFiles parameter
repositoryStructure: string[] = [],
totalFiles: number = 1,
totalParts: number = 1,
partNumber: number = 1
yamadashy marked this conversation as resolved.
): Promise<OutputGeneratorContext> => {
let repositoryInstruction = '';

if (config.output.instructionFilePath) {
const instructionPath = path.resolve(rootDir, config.output.instructionFilePath);
const instructionPath = path.resolve(
rootDir,
config.output.instructionFilePath,
);
try {
repositoryInstruction = await fs.readFile(instructionPath, 'utf-8');
} catch {
throw new RepopackError(`Instruction file not found at ${instructionPath}`);
throw new RepopackError(
`Instruction file not found at ${instructionPath}`,
);
}
}

return {
generationDate: new Date().toISOString(),
treeString: generateTreeString(allFilePaths),
processedFiles,
treeString: generateTreeString(repositoryStructure), // Use includedFiles for treeString
config,
instruction: repositoryInstruction,
content: '',
includedFiles,
totalFiles,
totalParts,
partNumber
};
};
6 changes: 5 additions & 1 deletion src/core/output/outputGeneratorTypes.ts
@@ -4,7 +4,11 @@ import type { ProcessedFile } from '../file/fileTypes.js';
export interface OutputGeneratorContext {
generationDate: string;
treeString: string;
processedFiles: ProcessedFile[];
config: RepopackConfigMerged;
instruction: string;
content: string;
includedFiles: ProcessedFile[]; // Add the includedFiles property
totalFiles: number,
partNumber: number,
totalParts: number
}
53 changes: 53 additions & 0 deletions src/core/output/outputSplitter.ts
@@ -0,0 +1,53 @@
import type { ProcessedFile } from '../file/fileTypes.js';
import { TokenCounter } from '../tokenCount/tokenCount.js';

export interface OutputSplit {
partNumber: number;
tokenCount: number;
includedFiles: ProcessedFile[]; // Add includedFiles property
}

export const splitOutput = (
processedFiles: ProcessedFile[],
maxTokensPerFile: number,
): OutputSplit[] => {
const tokenCounter = new TokenCounter();
const outputSplits: OutputSplit[] = [];
let currentTokenCount = 0;
let currentOutput = '';
yamadashy marked this conversation as resolved.
let currentIncludedFiles: ProcessedFile[] = []; // Initialize currentIncludedFiles

for (const file of processedFiles) {
const fileTokenCount = tokenCounter.countTokens(file.content, file.path);
Owner comment:
With a large number of files, the cost of counting tokens here may become noticeable. The existing token-counting code mitigates this slightly by adding a sleep.

That said, I think it is better to leave this as it is for now and fix it if the problem actually occurs.


if (currentTokenCount + fileTokenCount > maxTokensPerFile) {
// Start a new part
outputSplits.push({
partNumber: outputSplits.length+1,
tokenCount: currentTokenCount,
includedFiles: currentIncludedFiles, // Add includedFiles to the outputSplit
});

currentTokenCount = 0;
currentOutput = '';
currentIncludedFiles = []; // Reset currentIncludedFiles
}
yamadashy marked this conversation as resolved.

currentOutput += file.content;
currentTokenCount += fileTokenCount;
currentIncludedFiles.push(file); // Add file path to currentIncludedFiles

}

if (currentIncludedFiles.length) {
// Add the last part
outputSplits.push({
partNumber: outputSplits.length+1,
tokenCount: currentTokenCount,
includedFiles: currentIncludedFiles, // Add includedFiles to the outputSplit
});
}
tokenCounter.free();

return outputSplits;
};
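
The greedy grouping in splitOutput can be sketched in isolation. In this version the token counter is injected as a function, an assumption made so the example is self-contained (the PR uses TokenCounter), and the names `FileLike`, `Part`, and `splitByTokenLimit` are hypothetical. As written above, the loop would appear to emit an empty first part when the very first file exceeds the limit; the sketch closes a part only when it already holds at least one file.

```typescript
interface FileLike {
  path: string;
  content: string;
}

interface Part<T> {
  partNumber: number;
  tokenCount: number;
  includedFiles: T[];
}

const splitByTokenLimit = <T extends FileLike>(
  files: T[],
  maxTokensPerPart: number,
  countTokens: (file: T) => number,
): Part<T>[] => {
  const parts: Part<T>[] = [];
  let current: T[] = [];
  let currentTokens = 0;

  // Push the accumulated files as a new part, skipping empty parts.
  const flush = (): void => {
    if (current.length === 0) return;
    parts.push({ partNumber: parts.length + 1, tokenCount: currentTokens, includedFiles: current });
    current = [];
    currentTokens = 0;
  };

  for (const file of files) {
    const tokens = countTokens(file);
    // Close the current part only if it is non-empty and would overflow.
    if (current.length > 0 && currentTokens + tokens > maxTokensPerPart) {
      flush();
    }
    current.push(file);
    currentTokens += tokens;
  }
  flush();

  return parts;
};
```

A single file larger than the limit still ends up in a part of its own that exceeds the limit; whether to warn, truncate, or split such a file is a separate design decision that this sketch leaves open.
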
10 changes: 9 additions & 1 deletion src/core/output/outputStyles/markdownStyle.ts
@@ -25,7 +25,11 @@ export const generateMarkdownStyle = (outputGeneratorContext: OutputGeneratorCon
headerText: outputGeneratorContext.config.output.headerText,
instruction: outputGeneratorContext.instruction,
treeString: outputGeneratorContext.treeString,
processedFiles: outputGeneratorContext.processedFiles,
includedFiles: outputGeneratorContext.includedFiles,
partNumber: outputGeneratorContext.partNumber,
totalParts: outputGeneratorContext.totalParts,
totalPartFiles: outputGeneratorContext.includedFiles.length,
totalFiles: outputGeneratorContext.totalFiles
};

return `${template(renderContext).trim()}\n`;
@@ -48,6 +52,10 @@ const markdownTemplate = /* md */ `
## Usage Guidelines
{{{summaryUsageGuidelines}}}

## Repository Size
This file is part {{{partNumber}}} of {{{totalParts}}} of a split representation of the entire codebase.
This file contains {{{totalPartFiles}}} out of a total of {{{totalFiles}}} files.

## Notes
{{{summaryNotes}}}

11 changes: 10 additions & 1 deletion src/core/output/outputStyles/plainStyle.ts
@@ -25,7 +25,11 @@ export const generatePlainStyle = (outputGeneratorContext: OutputGeneratorContex
headerText: outputGeneratorContext.config.output.headerText,
instruction: outputGeneratorContext.instruction,
treeString: outputGeneratorContext.treeString,
processedFiles: outputGeneratorContext.processedFiles,
includedFiles: outputGeneratorContext.includedFiles,
partNumber: outputGeneratorContext.partNumber,
totalParts: outputGeneratorContext.totalParts,
totalPartFiles: outputGeneratorContext.includedFiles.length,
totalFiles: outputGeneratorContext.totalFiles
yamadashy marked this conversation as resolved.
};

return `${template(renderContext).trim()}\n`;
@@ -59,6 +63,11 @@ Usage Guidelines:
-----------------
{{{summaryUsageGuidelines}}}

Repository Size:
-----------------
Owner comment:
The hyphen underline should be the same length as the heading text above it.

Suggested change:
- -----------------
+ ----------------

This file is part {{{partNumber}}} of {{{totalParts}}} of a split representation of the entire codebase.
This file contains {{{totalPartFiles}}} out of a total of {{{totalFiles}}} files.

Notes:
------
{{{summaryNotes}}}
14 changes: 12 additions & 2 deletions src/core/output/outputStyles/xmlStyle.ts
@@ -25,7 +25,11 @@ export const generateXmlStyle = (outputGeneratorContext: OutputGeneratorContext)
headerText: outputGeneratorContext.config.output.headerText,
instruction: outputGeneratorContext.instruction,
treeString: outputGeneratorContext.treeString,
processedFiles: outputGeneratorContext.processedFiles,
includedFiles: outputGeneratorContext.includedFiles,
partNumber: outputGeneratorContext.partNumber,
totalParts: outputGeneratorContext.totalParts,
totalPartFiles: outputGeneratorContext.includedFiles.length,
totalFiles: outputGeneratorContext.totalFiles
};

return `${template(renderContext).trim()}\n`;
@@ -52,6 +56,12 @@ This section contains a summary of this file.
{{{summaryUsageGuidelines}}}
</usage_guidelines>

<repository_size>
This file is part {{{partNumber}}} of {{{totalParts}}} of a split representation of the entire codebase.
This file contains {{{totalPartFiles}}} out of a total of {{{totalFiles}}} files.

</repository_size>

<notes>
{{{summaryNotes}}}
</notes>
@@ -75,7 +85,7 @@ This section contains a summary of this file.
<repository_files>
This section contains the contents of the repository's files.

{{#each processedFiles}}
{{#each includedFiles}}
Owner comment:
Please make the same change in plainStyle.ts and markdownStyle.ts.

<file path="{{{this.path}}}">
{{{this.content}}}
</file>