feat(pack): Performance Optimization for Large Repositories #307

yamadashy · 2025-01-24T18:04:33Z

Performance Improvement

yamadashy/repomix

Before: 868.73 millis (usr: 1.11 secs, sys: 0.14 secs)
After: 671.26 millis (usr: 1.42 secs, sys: 0.22 secs)

No significant change

facebook/react

Before: 123.31 secs (usr: 118.64s, sys: 1.60s)
After: 7.10 secs (usr: 20.01s, sys: 2.39s)

17x faster

vercel/next.js

Before: 17.85 mins (usr: 16.66 mins, sys: 0.18 mins)
After: 17.27 secs (usr: 52.93 secs, sys: 7.11 secs)

58x faster

Changes

Replace p-map with Piscina worker threads for parallel processing
Implement dedicated workers for file collection, processing, and metrics
Optimize file search with single worker implementation
Add proper resource cleanup for worker pools
Move file manipulation logic to worker threads

Checklist

Run npm run test
Run npm run lint

codecov · 2025-01-24T18:06:54Z

Codecov Report

Attention: Patch coverage is 88.59223% with 47 lines in your changes missing coverage. Please review.

Project coverage is 90.65%. Comparing base (5c466e1) to head (fc20dde).
Report is 7 commits behind head on main.

Files with missing lines	Patch %	Lines
src/shared/processConcurrency.ts	56.52%	10 Missing ⚠️
src/core/metrics/calculateAllFileMetrics.ts	84.61%	6 Missing ⚠️
src/core/file/fileCollect.ts	87.50%	5 Missing ⚠️
src/core/file/fileProcess.ts	85.71%	5 Missing ⚠️
src/core/metrics/workers/fileMetricsWorker.ts	84.37%	5 Missing ⚠️
src/core/metrics/workers/outputMetricsWorker.ts	82.14%	5 Missing ⚠️
src/core/metrics/calculateOutputMetrics.ts	88.88%	3 Missing ⚠️
src/core/security/securityCheck.ts	91.42%	3 Missing ⚠️
src/core/security/workers/securityCheckWorker.ts	93.47%	3 Missing ⚠️
src/core/file/workers/fileCollectWorker.ts	95.45%	2 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #307      +/-   ##
==========================================
- Coverage   92.17%   90.65%   -1.52%     
==========================================
  Files          44       48       +4     
  Lines        2248     2450     +202     
  Branches      493      512      +19     
==========================================
+ Hits         2072     2221     +149     
- Misses        176      229      +53

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

coderabbitai · 2025-01-24T18:14:41Z

📝 Walkthrough

Walkthrough

This pull request introduces a comprehensive refactoring of the codebase to implement concurrent processing using the piscina library. The changes span multiple modules across file collection, processing, metrics calculation, and security checks. The primary focus is on transitioning from sequential to parallel task execution using worker threads.

Key modifications include adding the piscina dependency, updating the runRemoteAction function to enhance dependency injection, and restructuring the collectFiles and processFiles functions to utilize worker pools for concurrent execution. Additionally, new worker modules have been introduced for file collection, processing, and security checks, while existing functions have been updated to utilize these workers. The refactoring enhances concurrency by initializing task runners that can process multiple tasks simultaneously, improving overall performance and scalability of the application.

The changes affect core functionality in file handling, metrics calculation, and security checks, with a consistent pattern of introducing task runners, worker threads, and progress tracking mechanisms across different modules.

Sequence Diagram

sequenceDiagram
    participant Client
    participant Packager
    participant TaskRunner
    participant Workers
    
    Client->>Packager: Initiate pack
    Packager->>TaskRunner: Initialize worker pool
    TaskRunner-->>Packager: Worker pool ready
    
    Packager->>TaskRunner: Collect files
    TaskRunner->>Workers: Process file collection tasks
    Workers-->>TaskRunner: File collection results
    
    Packager->>TaskRunner: Process files
    TaskRunner->>Workers: Process file processing tasks
    Workers-->>TaskRunner: Processed files
    
    Packager->>TaskRunner: Calculate metrics
    TaskRunner->>Workers: Calculate file metrics tasks
    Workers-->>TaskRunner: File metrics
    
    Packager->>TaskRunner: Run security checks
    TaskRunner->>Workers: Security check tasks
    Workers-->>TaskRunner: Security check results
    
    TaskRunner-->>Packager: Aggregated results
    Packager-->>Client: Final output

Possibly related PRs

Handle large files gracefully #302: The main PR introduces a new dependency piscina in package.json, while PR Handle large files gracefully #302 modifies src/core/file/fileCollect.ts to implement a file size limit check using a constant MAX_FILE_SIZE. Both PRs involve changes to file handling and processing, indicating a focus on improving the management of resources and dependencies in the codebase.

✨ Finishing Touches

📝 Generate Docstrings (Beta)

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Generate unit testing code for this file.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai generate unit testing code for this file.
- @coderabbitai modularize this function.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read src/utils.ts and generate unit testing code.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
- @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai full review to do a full review from scratch and review all the files again.
@coderabbitai summary to regenerate the summary of the PR.
@coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai configuration to show the current CodeRabbit configuration for the repository.
@coderabbitai help to get help.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
Please see the configuration documentation for more information.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

coderabbitai

Actionable comments posted: 6

🧹 Nitpick comments (41)

src/core/metrics/calculateOutputMetrics.ts (2)

1-7: Unused imports warning
path and fileURLToPath appear to be unused in this file. Consider removing them if they are no longer needed to keep the codebase clean.

20-29: Enhance the deps parameter with a typed interface
It’s generally good practice to define an explicit interface for deps. This improves readability and discoverability, and ensures that all required dependencies are present.

src/core/metrics/calculateMetrics.ts (2)

20-23: Document the deps parameter
The deps parameter is injected as an object. Provide inline documentation (docstring or comment) describing each of the dependencies (e.g., calculateAllFileMetrics, calculateOutputMetrics) so that future maintainers know how and why they’re being used.

35-40: Check memory usage for large file sets
Looping through all fileMetrics to compute character/token counts is straightforward. However, for extremely large projects, consider streaming or incremental aggregation to avoid potential memory spikes.

src/core/file/fileProcess.ts (3)

8-9: Clearly document the new type
FileProcessTask is a critical type for worker communication. Consider adding JSDoc or TSDoc so that maintainers can quickly understand each property.

10-22: Optionally extend concurrency
The worker pool is configured based on the number of tasks, which is a nice improvement. If certain tasks vary heavily in complexity or size, consider adding optional heuristics to optimize the pool size further.

28-30: Typed deps object
Similar to earlier feedback in the metrics file, define a consistent interface for deps to ensure clarity and enforcements.
src/core/file/fileCollect.ts (2)
9-20: Initialize the worker pool conditionally for zero tasks.

If numOfTasks can be zero due to no files found, consider handling that scenario gracefully (e.g., immediately returning a no-op runner). While this situation may be rare, it helps guard against edge cases.
 const initTaskRunner = (numOfTasks: number) => {
+  if (numOfTasks === 0) {
+    return async () => null;
+  }
   const { minThreads, maxThreads } = getWorkerThreadCount(numOfTasks);
   ...
 }
40-56: Consider using Promise.allSettled.

Relying on Promise.all means that if one file collection task fails, it rejects the entire collection process. Depending on the desired behavior, you might want to collect partial results. If partial success is acceptable, consider Promise.allSettled and filtering out failed tasks for more robust error handling.
src/core/security/securityCheck.ts (2)

16-28: Guard for unexpected zero tasks.

Similar to collectFiles, if numOfTasks can be zero for edge scenarios, you could short-circuit the concurrency or return a no-op runner for clarity.

46-50: Improve partial failure handling.

Using Promise.all will cause a single failure (e.g., a worker throwing an exception for a single file) to halt the entire security check. If partial results are acceptable, consider Promise.allSettled to capture both successes and failures.

src/core/metrics/calculateAllFileMetrics.ts (1)

16-28: Conditional initialization for minimal tasks.

As with the other modules, consider gracefully handling the scenario where numOfTasks is zero to avoid creating idle worker threads unnecessarily.

tests/integration-tests/packager.test.ts (2)

70-76: Leverage progress callback usage in collectFiles.
You’ve introduced an inline wrapper to inject initTaskRunner. Consider passing additional metadata via the progressCallback, such as partial result counts or timing info, for improved visibility into large repository processing.

93-113: Metrics calculation is on track.
Storing file-specific stats in fileCharCounts and fileTokenCounts is helpful. Consider adding aggregated or average metrics as well, especially if used to track performance improvements in large repositories.

src/shared/processConcurrency.ts (2)

6-8: Validate CPU count fallback.
os.cpus() might return an empty array in rare containerized or restricted environments. As a safeguard, consider falling back to 1 for getProcessConcurrency when os.cpus().length is 0.

13-24: Reevaluate max thread constraints for large tasks.
Limiting threads to Math.floor(numOfTasks / 100) could be beneficial in typical workloads, yet might underutilize multi-core systems when tasks exceed a few thousand. Consider making this ratio configurable to better handle diverse large-repository scenarios.

src/core/tokenCount/tokenCount.ts (1)

35-37: Expose encoding retrieval for better introspection.
Providing a getEncoding() method is convenient for debugging or analyzing token usage. Consider also returning additional config info (if relevant) to aid in advanced debugging.

src/core/security/validateFileSafety.ts (1)

14-14: Leverage dependency injection effectively.

Accepting runSecurityCheck via deps is a good approach to enable simpler testing and future extension. Consider adding a docstring or inline comment clarifying why dependency injection is important here (e.g., for test mocks or swapping in different security check implementations).

tests/core/metrics/calculateAllFileMetrics.test.ts (1)

28-30: Ensure coverage of new parameters.

You are now passing an options object containing { initTaskRunner } to calculateAllFileMetrics. Double-check that your tests confirm behavior if initTaskRunner is undefined or fails. This helps solidify the code's resilience.

src/core/metrics/workers/outputMetricsWorker.ts (1)

7-11: Interface design clarity.

Defining OutputMetricsTask makes the worker’s contract explicit. If you expect optional or additional fields in the future (e.g., fileSize or fileType), consider a more flexible interface or referencing a shared global interface.
src/core/file/workers/fileProcessWorker.ts (2)
11-17: Name the default export for easier debugging.
Currently, the default export is an anonymous async function. Explicitly naming it can improve stack traces and debugging clarity.
-export default async ({ rawFile, config }: FileProcessTask): Promise<ProcessedFile> => {
+export default async function processFileTask({ rawFile, config }: FileProcessTask): Promise<ProcessedFile> {
36-41: Be mindful of large file line numbering.
Splitting and rejoining massive content may increase memory usage. Stream-based processing or chunking might be beneficial for extremely large files.
src/core/metrics/workers/fileMetricsWorker.ts (1)

30-39: Log file metrics after calculation.
Currently, metrics are logged at trace level before actually calling calculateIndividualFileMetrics. If calculateIndividualFileMetrics throws an error, logs can be misleading. Consider rearranging the logic or adding error handling.

src/core/file/workers/fileCollectWorker.ts (2)

35-45: Suggest partial reads for very large files.
Although rejecting files over 50 MB helps, partial reading can handle edge cases where slightly over-limit files still have relevant data and reduce memory usage.

60-61: Validate encoding detection and decoding.
jschardet can guess incorrectly. Consider fallback strategies or user-configurable defaults for ambiguous encodings.

src/core/security/workers/securityCheckWorker.ts (2)

40-63: Extend or customize secretlint rules.
Consider exposing additional custom rules or configuration to detect project-specific secrets for more comprehensive security checks.

Would you like me to generate a template for custom secretlint plugin development?

65-72: Consider dynamic config for secretlint.
Create a dynamic config to conditionally enable/disable certain rules based on environment or project type, improving maintainability in large-scale repositories.

tests/core/metrics/calculateMetrics.test.ts (1)

56-56: Extract hard-coded reference to 'o200k_base' into a constant or configuration.

Having 'o200k_base' directly in the test can obscure its meaning. Consider using a named constant or referencing a configuration source to improve clarity and maintainability.

src/cli/actions/remoteAction.ts (1)

50-50: Ensure robust error handling around default action execution.

While the try/catch block handles errors, confirm that runDefaultAction properly cleans up any resources or partial results in case of failure, especially for large projects cloned in the temp directory.
tests/core/file/fileCollect.test.ts (4)
8-10: Use single import statement for related items.
Consider combining these imports into a single statement for better readability and to avoid repeatedly referencing the same path.
-import { collectFiles } from '../../../src/core/file/fileCollect.js';
-import type { FileCollectTask } from '../../../src/core/file/workers/fileCollectWorker.js';
-import { MAX_FILE_SIZE } from '../../../src/core/file/workers/fileCollectWorker.js';
+import {
+  collectFiles,
+  type FileCollectTask,
+  MAX_FILE_SIZE
+} from '../../../src/core/file/workers/fileCollectWorker.js';
20-24: Ensure resilience in your mock task runner.
The mock task runner is straightforward, but consider adding error handling or logging for failures within fileCollectWorker, especially if you anticipate partial failures in concurrent tasks.

Would you like help adding error handling logic in your mock or real implementation?

71-73: Expand testing for partial results.
By skipping the binary file, you are ensuring the rest are processed. Consider adding an assertion that the progress callback or logs reflect the skipped file event.

100-102: Cover large file edge cases more thoroughly.
Currently, the test checks a file slightly over the limit. It may be worth testing a file exactly at the limit as well.

Would you like me to add such a test to catch potential off-by-one scenarios?
tests/core/file/fileProcess.test.ts (7)
3-3: Document the rationale for re-exporting processFiles.
Importing and immediately re-exporting can sometimes hide the real source. Adding a short docstring or comment can help maintainers quickly understand the relation between fileProcess.js and fileProcessWorker.js.

5-6: Check if local solutions suffice before adding external dependencies.
Using a fileProcessWorker is a good approach. Before expanding to more libraries or heavy frameworks, confirm that you can handle concurrency effectively with your existing worker solution.

78-78: Test scenario with partial whitespace-only lines.
This is good coverage for ignoring comments. Additionally test partial whitespace lines to confirm your removal logic is robust.

95-95: Add explicit logging for manipulator absence.
Currently, you return the original content. Adding a debug/warning log might help trace the flow when no manipulator is found.
+ logger.debug(`No file manipulator found for ${filePath}, returning original content`);
127-127: Validate combined config scenarios.
If the user toggles multiple features (e.g. removeComments + showLineNumbers), confirm the correct transformations occur in the correct order.

143-143: Corner case: ensure '1:' label is not repeated multiple times
Some line numbering logic can inadvertently repeat line labels or offset them if the file is empty. Test thoroughly.

159-159: Keep a handle on performance with large line counts.
You’re testing 100 lines, but real files can have thousands. Consider measuring any performance overhead from line numbering on large inputs.
tests/cli/actions/remoteAction.test.ts (1)

11-11: Document the role of SuspiciousFileResult in remoteAction tests.
You’re importing SuspiciousFileResult but not using it. Add clarifications or tests that address suspicious files.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 745df97 and d8cca0b.

⛔ Files ignored due to path filters (1)

package-lock.json is excluded by !**/package-lock.json

📒 Files selected for processing (33)

package.json (1 hunks)
src/cli/actions/remoteAction.ts (2 hunks)
src/core/file/fileCollect.ts (1 hunks)
src/core/file/fileProcess.ts (1 hunks)
src/core/file/workers/fileCollectWorker.ts (1 hunks)
src/core/file/workers/fileProcessWorker.ts (1 hunks)
src/core/metrics/aggregateMetrics.ts (0 hunks)
src/core/metrics/calculateAllFileMetrics.ts (1 hunks)
src/core/metrics/calculateIndividualFileMetrics.ts (0 hunks)
src/core/metrics/calculateMetrics.ts (2 hunks)
src/core/metrics/calculateOutputMetrics.ts (1 hunks)
src/core/metrics/workers/fileMetricsWorker.ts (1 hunks)
src/core/metrics/workers/outputMetricsWorker.ts (1 hunks)
src/core/metrics/workers/types.ts (1 hunks)
src/core/packager.ts (1 hunks)
src/core/security/runSecurityCheckIfEnabled.ts (0 hunks)
src/core/security/securityCheck.ts (1 hunks)
src/core/security/validateFileSafety.ts (1 hunks)
src/core/security/workers/securityCheckWorker.ts (1 hunks)
src/core/tokenCount/tokenCount.ts (2 hunks)
src/shared/processConcurrency.ts (1 hunks)
tests/cli/actions/remoteAction.test.ts (2 hunks)
tests/core/file/fileCollect.test.ts (6 hunks)
tests/core/file/fileProcess.test.ts (9 hunks)
tests/core/metrics/aggregateMetrics.test.ts (0 hunks)
tests/core/metrics/calculateAllFileMetrics.test.ts (1 hunks)
tests/core/metrics/calculateIndividualFileMetrics.test.ts (0 hunks)
tests/core/metrics/calculateMetrics.test.ts (1 hunks)
tests/core/packager.test.ts (1 hunks)
tests/core/security/runSecurityCheckIfEnabled.test.ts (0 hunks)
tests/core/security/securityCheck.test.ts (1 hunks)
tests/core/security/validateFileSafety.test.ts (1 hunks)
tests/integration-tests/packager.test.ts (2 hunks)

💤 Files with no reviewable changes (6)

tests/core/metrics/aggregateMetrics.test.ts
src/core/security/runSecurityCheckIfEnabled.ts
src/core/metrics/calculateIndividualFileMetrics.ts
src/core/metrics/aggregateMetrics.ts
tests/core/security/runSecurityCheckIfEnabled.test.ts
tests/core/metrics/calculateIndividualFileMetrics.test.ts

✅ Files skipped from review due to trivial changes (1)

src/core/metrics/workers/types.ts

⏰ Context from checks skipped due to timeout of 90000ms (3)

GitHub Check: Test (windows-latest, 22.x)
GitHub Check: Analyze (javascript-typescript)
GitHub Check: Cloudflare Pages

🔇 Additional comments (43)

src/core/metrics/calculateOutputMetrics.ts (1)

30-45: Validate error handling and side-effects
The try-catch block logs and re-throws errors while ignoring potential partial states. If this function is critical, consider whether you need additional cleanup logic after an error is caught (e.g., destroying or cleaning up worker pools).

src/core/metrics/calculateMetrics.ts (3)

5-5: Confirm usage alignment
Importing calculateOutputMetrics here is consistent with the new concurrency approach. Ensure that this file is the only place using this function so that the concurrency logic is centralized.

27-30: Parallel execution
Using Promise.all to invoke calculateAllFileMetrics and calculateOutputMetrics concurrently is an excellent use of parallelism for performance.

42-48: Return value clarity
The returned object is clear and self-explanatory. Good job returning separate fields for both aggregated and file-level metrics.

src/core/file/fileProcess.ts (3)

5-5: Ensure getWorkerThreadCount handles edge cases
Validate that getWorkerThreadCount gracefully handles cases where numOfTasks is zero or extremely large.

32-40: Progress callback usage
The approach of updating progressCallback after each file is processed is effective. Ensure that the callback function is performant, especially in large-scale usage where thousands of files are processed—excessive logging or console I/O can degrade performance.

2-2: Verify removal of pMap
pMap was previously used for concurrency. Confirm that no other files rely on it. If not, consider removing it from dependencies entirely to reduce bundle size.

src/core/file/fileCollect.ts (2)

1-2: Imports look good.

The addition of picocolors for formatting and piscina for concurrency is consistent with your overall objective of enhancing performance and providing clearer output logs.

62-65: Re-throwing error ensures transparency.

Catching and then rethrowing the error is a good practice to preserve the original stack trace. This approach makes debugging easier and ensures that upstream error-handling logic remains consistent.

src/core/security/securityCheck.ts (1)

67-70: Appropriate error logging and rethrow.

Logging and then rethrowing preserves the error context, which aligns well with debugging and consistent error handling across modules. Good approach!

src/core/metrics/calculateAllFileMetrics.ts (3)

41-49: Retaining index might be beneficial.

Storing and passing the index of each file is helpful for reporting or debugging, especially if the processing outcome needs to be correlated to the original file list order. This is a good approach.

52-66: Performance clarity in logs.

Logging the hrtime.bigint() duration in milliseconds and the number of files processed is helpful for diagnosing potential bottlenecks. This aligns well with your performance optimization goals.

67-75: Error approach is consistent.

Catching an error, logging it, and then rethrowing is consistent with the rest of the codebase. This pattern ensures that metrics calculation failures are transparent and don’t silently fail.

tests/integration-tests/packager.test.ts (2)

26-30: Ensure concurrency safety in the mock collect file task runner.
It’s great that you’re utilizing a dedicated mock for worker-based file collection. However, ensure that concurrent calls from multiple tests won’t conflict when sharing the same mock instance (if ever used simultaneously).

85-90: Validating file safety with mock security check.
You’re currently returning an empty array from runSecurityCheck. Make sure your integration tests also cover real security checks to confirm your concurrency logic handles potential security flags properly.

src/core/tokenCount/tokenCount.ts (1)

6-9: Initialize encoding name and encoding properly.
Injecting the encodingName and storing it is a good design choice. Ensure any external calls referencing the same TokenCounter instance consistently expect the same encoding.

src/core/security/validateFileSafety.ts (2)

6-6: Use consistent import naming for security logic.

Renaming imports from runSecurityCheckIfEnabled to runSecurityCheck clarifies the underlying functionality. Good job keeping the naming clear. Ensure that all references throughout the codebase reflect the new import name.

18-23: Add error handling and logging around security checks.

While the conditional call to runSecurityCheck is a good step, consider handling exceptions if the security check fails or encounters unexpected data. For example, you could log errors or fallback to skipping suspicious file detection in that scenario to prevent the entire process from crashing.

Would you like me to provide a sample snippet demonstrating graceful error handling?

tests/core/metrics/calculateAllFileMetrics.test.ts (3)

1-7: Improved test boundaries for worker-based metrics.

Importing calculateIndividualFileMetrics directly from fileMetricsWorker.js is a more accurate reflection of the real usage. This reduces the chance of discrepancies between production code and test mocks. Ensure that any utility code needed for advanced scenarios (e.g., file size extremes) is also tested with concurrency.

14-19: Mock initialization logic for task runners.

The mockInitTaskRunner effectively simulates worker-based concurrency. Consider verifying how the concurrency scales with increasing file counts (e.g., performance or memory usage) using additional test cases.

33-34: Token count adjustments in tests.

The updated token counts (13 and 50) might imply a more accurate tokenization method. Confirm these thresholds remain correct under actual tokenization rules, especially for boundary cases like large or non-text files.

src/core/metrics/workers/outputMetricsWorker.ts (2)

13-24: Singleton usage of TokenCounter.

Caching the TokenCounter in a worker-level singleton is efficient for performance. However, ensure your code handles concurrency properly if multiple tasks are processed simultaneously. If TokenCounter is stateful, you may need synchronization or separate instances per task.

44-50: Freeing resources on worker exit.

Good practice to free tokenCounter resources upon worker exit. Verify that no residual memory or references remain, especially if the worker might be re-instantiated frequently in large-scale tasks.

tests/core/security/validateFileSafety.test.ts (2)

24-24: Mocking the correct function name.

Replacing runSecurityCheckIfEnabled with runSecurityCheck aligns with the updated implementation. This ensures consistency in tests and prevents missing or mismatched mock calls.

30-30: Validate arguments for the security check.

The assertion confirms runSecurityCheck is invoked properly. Consider adding a separate test case covering config.security.enableSecurityCheck = false to ensure the function bypasses the check as intended.

src/core/file/workers/fileProcessWorker.ts (2)

1-3: Consider providing explicit error handling for imports.
If any of these imports fail or the module path changes, it could cause runtime issues without clear error logs.

26-28: Validate tool output before using manipulator methods.
If manipulator.removeComments returns undefined or malformatted output, it could break subsequent steps. Consider adding a safety check or fallback.

src/core/metrics/workers/fileMetricsWorker.ts (2)

14-16: Ensure shared token counter is thread-safe.
Since tokenCounter is stored at the module level, concurrent calls could lead to race conditions if the implementation isn’t thread-safe.

52-58: Confirm proper cleanup handling.
Attaching an exit listener ensures that the token counter is freed, but it might not handle forced termination scenarios (e.g., process.kill). Verify your approach for unexpected shutdowns.

src/core/file/workers/fileCollectWorker.ts (1)

46-49: Verify binary detection logic.
isBinary typically checks file extensions or partial content heuristics. Confirm that it’s accurate for different file types (e.g., unusual file extensions or compressed files).

src/core/security/workers/securityCheckWorker.ts (1)

14-38: Add concurrency/error handling.
If many security checks run in parallel, ensure that secretlint usage is safe in multi-threaded contexts. Also, confirm proper re-throws of key security errors so they aren’t silently swallowed.

tests/core/metrics/calculateMetrics.test.ts (1)

50-53: Leverage dependency injection for flexible testing.

Introducing a fifth parameter with calculateAllFileMetrics and calculateOutputMetrics is a smart design choice, making the function more testable and modular.

tests/core/security/securityCheck.test.ts (1)

3-3: Updated import path for worker usage looks correct.

The import from the new securityCheckWorker.js aligns with the concurrency approach. Good job ensuring the test references the proper worker implementation.

tests/core/packager.test.ts (1)

75-75: Good coverage for progress callback parameter.

Adding the progressCallback argument to mockDeps.collectFiles and verifying it here improve visibility into the new progress-tracking logic.

src/cli/actions/remoteAction.ts (1)

22-22: Injecting runDefaultAction into deps enhances testability and consistency.

This change improves your dependency injection pattern and keeps the function logic decoupled from direct imports.

tests/core/file/fileCollect.test.ts (2)

123-125: Confirm that error logging occurs once.
You successfully test that the result array is empty on a file read error, but also ensure that you log only once for repeated read failures if that situation arises.

50-52: Verify that progressCallback is properly tested.
You're passing an empty function (() => {}) as the progress callback. It might be beneficial to verify that the callback is invoked or not invoked as expected, ensuring accurate coverage for progress reporting logic.

tests/core/file/fileProcess.test.ts (3)

36-38: Include separate test for progressCallback usage.
Similar to file collection, ensure that the progress callback is being utilized in processFiles and that it’s tested thoroughly.

63-63: Confirm charset detection fallback.
For advanced usage, consider the scenario where jschardet returns an undefined or low-confidence encoding. Validate that the fallback handles strange charsets correctly.

111-111: Edge case: multiline content with line numbering.
Ensure each line is sequentially numbered, including any lines that might be entirely whitespace or special characters.

tests/cli/actions/remoteAction.test.ts (1)

4-4: Verify usage of DefaultActionRunnerResult.
Ensure that all fields in DefaultActionRunnerResult are indeed required for your tests. If some are optional, the test might mischaracterize the data structure.

src/core/packager.ts (1)

41-41: Good use of progressCallback injection.
Upgrading collectFiles to accept a callback fosters improved user experience and modular design. The approach is aligned with large-scale concurrency improvements.
package.json (1)

79-79: Verify piscina version and remove redundant p-map dependency.

Since p-map is being replaced with piscina for parallel processing, consider removing the p-map dependency to avoid confusion and reduce package size.

Let's verify the piscina version and check for any known vulnerabilities:

Apply this diff to remove the redundant dependency:
-    "p-map": "^7.0.3",
✅ Verification successful

Remove unused p-map dependency, piscina version is appropriate

The migration from p-map to piscina is complete as there are no remaining p-map imports in the codebase. The current piscina version (^4.8.0) is secure and stable for production use.
-    "p-map": "^7.0.3",
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Check latest version and security advisories for piscina

# Get latest version from npm
echo "Latest version from npm:"
npm view piscina version

# Check for known vulnerabilities
echo -e "\nVulnerability check:"
npm audit piscina@4.8.0 --json || true

# Check GitHub Advisory Database
echo -e "\nGitHub Advisory Database check:"
gh api graphql -f query='
{
  securityVulnerabilities(first: 5, ecosystem: NPM, package: "piscina") {
    nodes {
      advisory {
        summary
        severity
        publishedAt
      }
      vulnerableVersionRange
      firstPatchedVersion {
        identifier
      }
    }
  }
}'
Length of output: 1205

Script:
#!/bin/bash
# Search for p-map usage in the codebase
echo "Files containing p-map imports:"
rg "require\(['\"]p-map['\"]|from ['\"]p-map['\"]" -l

echo -e "\nDetailed p-map usage:"
rg "require\(['\"]p-map['\"]|from ['\"]p-map['\"]" -C 2
Length of output: 261

src/core/metrics/calculateOutputMetrics.ts

tests/integration-tests/packager.test.ts

src/core/metrics/workers/outputMetricsWorker.ts

tests/core/file/fileProcess.test.ts

tests/cli/actions/remoteAction.test.ts

…i-threaded mode

cloudflare-workers-and-pages · 2025-01-25T03:02:04Z

Deploying repomix with Cloudflare Pages

Latest commit:	`fc20dde`
Status:	✅ Deploy successful!
Preview URL:	https://b6ae6556.repomix.pages.dev
Branch Preview URL:	https://feat-multithread.repomix.pages.dev

View logs

coderabbitai

Actionable comments posted: 4

♻️ Duplicate comments (2)

src/core/metrics/calculateOutputMetrics.ts (1)
8-18: ⚠️ Potential issue

Worker pool cleanup mechanism is still missing.

The worker pool created by initTaskRunner lacks cleanup, which could lead to resource leaks.

Apply this diff to add cleanup:
 const initTaskRunner = () => {
   const pool = new Piscina({
     filename: new URL('./workers/outputMetricsWorker.js', import.meta.url).href,
     minThreads: 1,
     maxThreads: 1,
     idleTimeout: 5000,
   });

+  const cleanup = async () => {
+    await pool.destroy();
+  };
 
-  return (task: OutputMetricsTask) => pool.run(task);
+  return {
+    runTask: (task: OutputMetricsTask) => pool.run(task),
+    cleanup,
+  };
 };
🧰 Tools

🪛 GitHub Check: codecov/patch

[warning] 9-10: src/core/metrics/calculateOutputMetrics.ts#L9-L10
Added lines #L9 - L10 were not covered by tests

[warning] 12-15: src/core/metrics/calculateOutputMetrics.ts#L12-L15
Added lines #L12 - L15 were not covered by tests

[warning] 17-18: src/core/metrics/calculateOutputMetrics.ts#L17-L18
Added lines #L17 - L18 were not covered by tests
src/core/metrics/workers/outputMetricsWorker.ts (1)
29-42: ⚠️ Potential issue

Add error handling for token counting.

Token counting operations should be wrapped in try-catch to handle potential errors gracefully.

Apply this diff:
 export default async ({ content, encoding, path }: OutputMetricsTask): Promise<number> => {
   const processStartAt = process.hrtime.bigint();
   const counter = getTokenCounter(encoding);
-  const tokenCount = counter.countTokens(content, path);
+  let tokenCount: number;
+  try {
+    tokenCount = counter.countTokens(content, path);
+  } catch (error) {
+    logger.error(`Failed to count tokens for ${path ?? 'unknown file'}:`, error);
+    return 0;
+  }
🧰 Tools

🪛 GitHub Check: codecov/patch

[warning] 29-32: src/core/metrics/workers/outputMetricsWorker.ts#L29-L32
Added lines #L29 - L32 were not covered by tests

🧹 Nitpick comments (18)

src/core/metrics/workers/fileMetricsWorker.ts (3)
14-25: Consider double-checked locking pattern for thread safety.

While the singleton pattern is appropriate for worker-level caching, the current implementation might have a race condition if multiple calls to getTokenCounter happen simultaneously.

Consider this thread-safe implementation:
 let tokenCounter: TokenCounter | null = null;
+let initializationInProgress = false;
 
 const getTokenCounter = (encoding: TiktokenEncoding): TokenCounter => {
   if (!tokenCounter) {
+    if (initializationInProgress) {
+      // Wait for initialization to complete
+      while (initializationInProgress) {
+        // Busy wait or use proper synchronization mechanism
+      }
+      return tokenCounter!;
+    }
+    initializationInProgress = true;
     tokenCounter = new TokenCounter(encoding);
+    initializationInProgress = false;
   }
   return tokenCounter;
 };
41-50: Add input validation for file content.

The function assumes file content is always present and valid.

Consider adding validation:
 export const calculateIndividualFileMetrics = async (
   file: ProcessedFile,
   encoding: TiktokenEncoding,
 ): Promise<FileMetrics> => {
+  if (!file.content) {
+    throw new Error(`No content available for file: ${file.path}`);
+  }
+
   const charCount = file.content.length;
   const tokenCounter = getTokenCounter(encoding);
   const tokenCount = tokenCounter.countTokens(file.content, file.path);
 
   return { path: file.path, charCount, tokenCount };
 };
1-58: Great implementation aligned with performance goals!

The worker implementation effectively supports the PR's performance optimization objectives by:

Using a singleton pattern for TokenCounter to avoid redundant initialization

Implementing proper resource cleanup

Providing clear metrics calculation logic

The significant performance improvements mentioned in the PR objectives (17x-58x speedup) are well-supported by this implementation.

Consider adding metrics for worker pool utilization and memory usage to help tune the number of workers for optimal performance.

🧰 Tools

🪛 GitHub Check: codecov/patch

[warning] 31-31: src/core/metrics/workers/fileMetricsWorker.ts#L31
Added line #L31 was not covered by tests

[warning] 33-36: src/core/metrics/workers/fileMetricsWorker.ts#L33-L36
Added lines #L33 - L36 were not covered by tests

[warning] 38-39: src/core/metrics/workers/fileMetricsWorker.ts#L38-L39
Added lines #L38 - L39 were not covered by tests

[warning] 54-58: src/core/metrics/workers/fileMetricsWorker.ts#L54-L58
Added lines #L54 - L58 were not covered by tests
tests/integration-tests/packager.test.ts (1)

26-31: Consider making the mock more configurable.
mockCollectFileInitTaskRunner is straightforward, but allowing parameters or configurable behavior could facilitate advanced testing scenarios.

src/shared/processConcurrency.ts (2)

6-8: Consider reserving a CPU for system overhead.
Returning os.cpus().length might saturate the system if all cores are in use. For multi-task environments, subtracting one or two cores can prevent resource contention.

10-24: Handle extreme edge cases for worker thread limits.
While the logic is reasonable, if numOfTasks is very large or extremely small, you may need additional safeguards or custom scaling rules to prevent under- or over-utilization of threads.

src/core/security/validateFileSafety.ts (1)

18-23: Add robust error handling for security checks.
Currently, any failures inside runSecurityCheck could go unhandled. Consider try-catch blocks or fallback logic to handle partial or failed scanning gracefully.

tests/core/metrics/calculateAllFileMetrics.test.ts (2)

14-18: Consider expanding task runner test coverage.

The mock implementation is minimal and doesn't test error scenarios or worker pool behavior. Consider adding tests for:

Task runner initialization failures

Task execution errors

Concurrent task execution

28-30: Add more test cases for edge scenarios.

The test only covers a basic scenario. Consider adding test cases for:

Empty files

Very large files

Files with special characters

Error conditions

Also applies to: 33-34

src/core/file/workers/fileCollectWorker.ts (1)

8-10: Consider making MAX_FILE_SIZE configurable.

The file size limit is hardcoded. Consider making it configurable through the application's configuration system.
src/core/security/workers/securityCheckWorker.ts (2)
34-37: Enhance error logging for better debugging.

Consider including more context in the error log by extracting and logging specific error properties.
   } catch (error) {
-    logger.error(`Error checking security on ${filePath}:`, error);
+    logger.error(`Error checking security on ${filePath}:`, {
+      message: error.message,
+      name: error.name,
+      stack: error.stack,
+    });
     throw error;
   }
40-44: Add input validation for file content.

Consider validating the input parameters to ensure robustness.
 export const runSecretLint = async (
   filePath: string,
   content: string,
   config: SecretLintCoreConfig,
 ): Promise<SecretLintCoreResult> => {
+  if (!content || typeof content !== 'string') {
+    throw new Error('Invalid content provided for security check');
+  }
+  if (!filePath || typeof filePath !== 'string') {
+    throw new Error('Invalid file path provided for security check');
+  }
   const result = await lintSource({
tests/core/metrics/calculateMetrics.test.ts (1)
50-53: Improve mock implementation with explicit typing.

The mock implementation could benefit from explicit typing for better maintainability.
     const result = await calculateMetrics(processedFiles, output, progressCallback, config, {
-      calculateAllFileMetrics,
-      calculateOutputMetrics: () => Promise.resolve(30),
+      calculateAllFileMetrics: calculateAllFileMetrics as jest.Mock<Promise<typeof fileMetrics>>,
+      calculateOutputMetrics: async () => 30,
     });
src/core/security/securityCheck.ts (3)
16-28: Consider implementing worker pool cleanup.

The worker pool initialization looks good, but there's no explicit cleanup mechanism. While Piscina has an idleTimeout, it's good practice to properly terminate the worker pool after all tasks are completed.

Consider adding a cleanup function:
 const initTaskRunner = (numOfTasks: number) => {
   const { minThreads, maxThreads } = getWorkerThreadCount(numOfTasks);
   logger.trace(`Initializing worker pool with min=${minThreads}, max=${maxThreads} threads`);

   const pool = new Piscina({
     filename: new URL('./workers/securityCheckWorker.js', import.meta.url).href,
     minThreads,
     maxThreads,
     idleTimeout: 5000,
   });

-  return (task: SecurityCheckTask) => pool.run(task);
+  return {
+    runTask: (task: SecurityCheckTask) => pool.run(task),
+    cleanup: async () => {
+      await pool.destroy();
+    }
+  };
 };
53-61: Consider implementing batch processing for large repositories.

While Promise.all is efficient, processing all files at once might consume too much memory for very large repositories. Consider implementing batch processing to handle such cases more efficiently.

Here's a suggested implementation:
-    const results = await Promise.all(
-      tasks.map((task) =>
-        runTask(task).then((result) => {
-          completedTasks++;
-          progressCallback(`Running security check... (${completedTasks}/${totalTasks}) ${pc.dim(task.filePath)}`);
-          return result;
-        }),
-      ),
-    );
+    const BATCH_SIZE = 100;
+    const results = [];
+    for (let i = 0; i < tasks.length; i += BATCH_SIZE) {
+      const batch = tasks.slice(i, i + BATCH_SIZE);
+      const batchResults = await Promise.all(
+        batch.map((task) =>
+          runTask(task).then((result) => {
+            completedTasks++;
+            progressCallback(`Running security check... (${completedTasks}/${totalTasks}) ${pc.dim(task.filePath)}`);
+            return result;
+          }),
+        ),
+      );
+      results.push(...batchResults);
+    }
68-70: Enhance error reporting with more context.

The current error handling is basic. Consider providing more detailed error information to help with debugging.
   } catch (error) {
-    logger.error('Error during security check:', error);
+    const errorContext = {
+      totalFiles: tasks.length,
+      completedTasks,
+      lastProcessedFile: tasks[completedTasks - 1]?.filePath,
+      error: error instanceof Error ? {
+        name: error.name,
+        message: error.message,
+        stack: error.stack
+      } : error
+    };
+    logger.error('Error during security check:', errorContext);
     throw error;
   }
src/core/metrics/calculateAllFileMetrics.ts (1)
33-75: Consider extracting common worker pool patterns.

The implementation shares significant patterns with securityCheck.ts. Consider extracting these common patterns into a shared utility.

Create a new utility file src/shared/workerPool.ts:
import { Piscina } from 'piscina';
import { logger } from './logger.js';
import { getWorkerThreadCount } from './processConcurrency.js';

export interface WorkerPoolOptions {
  filename: string;
  idleTimeout?: number;
}

export const createWorkerPool = <T, R>(
  numOfTasks: number,
  options: WorkerPoolOptions
) => {
  const { minThreads, maxThreads } = getWorkerThreadCount(numOfTasks);
  logger.trace(`Initializing worker pool with min=${minThreads}, max=${maxThreads} threads`);

  const pool = new Piscina({
    filename: options.filename,
    minThreads,
    maxThreads,
    idleTimeout: options.idleTimeout ?? 5000,
  });

  return {
    runTask: (task: T) => pool.run(task) as Promise<R>,
    cleanup: async () => {
      await pool.destroy();
    }
  };
};
🧰 Tools

🪛 GitHub Check: codecov/patch

[warning] 73-75: src/core/metrics/calculateAllFileMetrics.ts#L73-L75
Added lines #L73 - L75 were not covered by tests
tests/core/file/fileProcess.test.ts (1)
11-15: Add error handling test cases.

The mock task runner should include tests for worker failures and error propagation.

Add test cases for:

Worker initialization failures

Task execution errors

Cleanup failures
it('should handle worker failures gracefully', async () => {
  const mockError = new Error('Worker failed');
  const failingTaskRunner = () => {
    return async () => {
      throw mockError;
    };
  };

  const mockRawFiles = [{ path: 'file.js', content: 'content' }];
  const config = createMockConfig({});

  await expect(
    processFiles(mockRawFiles, config, () => {}, {
      initTaskRunner: failingTaskRunner,
    })
  ).rejects.toThrow(mockError);
});

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d8cca0b and 2be1e27.

⛔ Files ignored due to path filters (1)

package-lock.json is excluded by !**/package-lock.json

📒 Files selected for processing (33)

package.json (1 hunks)
src/cli/actions/remoteAction.ts (2 hunks)
src/core/file/fileCollect.ts (1 hunks)
src/core/file/fileProcess.ts (1 hunks)
src/core/file/workers/fileCollectWorker.ts (1 hunks)
src/core/file/workers/fileProcessWorker.ts (1 hunks)
src/core/metrics/aggregateMetrics.ts (0 hunks)
src/core/metrics/calculateAllFileMetrics.ts (1 hunks)
src/core/metrics/calculateIndividualFileMetrics.ts (0 hunks)
src/core/metrics/calculateMetrics.ts (2 hunks)
src/core/metrics/calculateOutputMetrics.ts (1 hunks)
src/core/metrics/workers/fileMetricsWorker.ts (1 hunks)
src/core/metrics/workers/outputMetricsWorker.ts (1 hunks)
src/core/metrics/workers/types.ts (1 hunks)
src/core/packager.ts (1 hunks)
src/core/security/runSecurityCheckIfEnabled.ts (0 hunks)
src/core/security/securityCheck.ts (1 hunks)
src/core/security/validateFileSafety.ts (1 hunks)
src/core/security/workers/securityCheckWorker.ts (1 hunks)
src/core/tokenCount/tokenCount.ts (2 hunks)
src/shared/processConcurrency.ts (1 hunks)
tests/cli/actions/remoteAction.test.ts (2 hunks)
tests/core/file/fileCollect.test.ts (6 hunks)
tests/core/file/fileProcess.test.ts (9 hunks)
tests/core/metrics/aggregateMetrics.test.ts (0 hunks)
tests/core/metrics/calculateAllFileMetrics.test.ts (1 hunks)
tests/core/metrics/calculateIndividualFileMetrics.test.ts (0 hunks)
tests/core/metrics/calculateMetrics.test.ts (1 hunks)
tests/core/packager.test.ts (1 hunks)
tests/core/security/runSecurityCheckIfEnabled.test.ts (0 hunks)
tests/core/security/securityCheck.test.ts (1 hunks)
tests/core/security/validateFileSafety.test.ts (1 hunks)
tests/integration-tests/packager.test.ts (2 hunks)

💤 Files with no reviewable changes (6)

src/core/metrics/aggregateMetrics.ts
tests/core/metrics/aggregateMetrics.test.ts
src/core/security/runSecurityCheckIfEnabled.ts
src/core/metrics/calculateIndividualFileMetrics.ts
tests/core/metrics/calculateIndividualFileMetrics.test.ts
tests/core/security/runSecurityCheckIfEnabled.test.ts

🚧 Files skipped from review as they are similar to previous changes (9)

package.json
tests/core/security/securityCheck.test.ts
src/core/metrics/workers/types.ts
tests/cli/actions/remoteAction.test.ts
src/core/tokenCount/tokenCount.ts
src/cli/actions/remoteAction.ts
tests/core/security/validateFileSafety.test.ts
tests/core/packager.test.ts
src/core/file/workers/fileProcessWorker.ts

🧰 Additional context used

🪛 GitHub Check: codecov/patch

src/core/metrics/calculateOutputMetrics.ts

[warning] 9-10: src/core/metrics/calculateOutputMetrics.ts#L9-L10
Added lines #L9 - L10 were not covered by tests

[warning] 12-15: src/core/metrics/calculateOutputMetrics.ts#L12-L15
Added lines #L12 - L15 were not covered by tests

[warning] 17-18: src/core/metrics/calculateOutputMetrics.ts#L17-L18
Added lines #L17 - L18 were not covered by tests

[warning] 21-28: src/core/metrics/calculateOutputMetrics.ts#L21-L28
Added lines #L21 - L28 were not covered by tests

[warning] 30-32: src/core/metrics/calculateOutputMetrics.ts#L30-L32
Added lines #L30 - L32 were not covered by tests

[warning] 34-34: src/core/metrics/calculateOutputMetrics.ts#L34
Added line #L34 was not covered by tests

[warning] 36-38: src/core/metrics/calculateOutputMetrics.ts#L36-L38
Added lines #L36 - L38 were not covered by tests

[warning] 40-45: src/core/metrics/calculateOutputMetrics.ts#L40-L45
Added lines #L40 - L45 were not covered by tests

src/core/metrics/calculateAllFileMetrics.ts

[warning] 17-18: src/core/metrics/calculateAllFileMetrics.ts#L17-L18
Added lines #L17 - L18 were not covered by tests

[warning] 20-25: src/core/metrics/calculateAllFileMetrics.ts#L20-L25
Added lines #L20 - L25 were not covered by tests

[warning] 27-28: src/core/metrics/calculateAllFileMetrics.ts#L27-L28
Added lines #L27 - L28 were not covered by tests

[warning] 73-75: src/core/metrics/calculateAllFileMetrics.ts#L73-L75
Added lines #L73 - L75 were not covered by tests

src/core/metrics/workers/outputMetricsWorker.ts

[warning] 1-1: src/core/metrics/workers/outputMetricsWorker.ts#L1
Added line #L1 was not covered by tests

[warning] 4-5: src/core/metrics/workers/outputMetricsWorker.ts#L4-L5
Added lines #L4 - L5 were not covered by tests

[warning] 14-14: src/core/metrics/workers/outputMetricsWorker.ts#L14
Added line #L14 was not covered by tests

[warning] 19-24: src/core/metrics/workers/outputMetricsWorker.ts#L19-L24
Added lines #L19 - L24 were not covered by tests

[warning] 29-32: src/core/metrics/workers/outputMetricsWorker.ts#L29-L32
Added lines #L29 - L32 were not covered by tests

src/core/metrics/workers/fileMetricsWorker.ts

[warning] 31-31: src/core/metrics/workers/fileMetricsWorker.ts#L31
Added line #L31 was not covered by tests

[warning] 33-36: src/core/metrics/workers/fileMetricsWorker.ts#L33-L36
Added lines #L33 - L36 were not covered by tests

[warning] 38-39: src/core/metrics/workers/fileMetricsWorker.ts#L38-L39
Added lines #L38 - L39 were not covered by tests

[warning] 54-58: src/core/metrics/workers/fileMetricsWorker.ts#L54-L58
Added lines #L54 - L58 were not covered by tests

src/core/file/workers/fileCollectWorker.ts

[warning] 56-57: src/core/file/workers/fileCollectWorker.ts#L56-L57
Added lines #L56 - L57 were not covered by tests

src/core/file/fileProcess.ts

[warning] 11-12: src/core/file/fileProcess.ts#L11-L12
Added lines #L11 - L12 were not covered by tests

[warning] 14-19: src/core/file/fileProcess.ts#L14-L19
Added lines #L14 - L19 were not covered by tests

[warning] 21-22: src/core/file/fileProcess.ts#L21-L22
Added lines #L21 - L22 were not covered by tests

[warning] 64-65: src/core/file/fileProcess.ts#L64-L65
Added lines #L64 - L65 were not covered by tests

src/core/file/fileCollect.ts

[warning] 10-11: src/core/file/fileCollect.ts#L10-L11
Added lines #L10 - L11 were not covered by tests

[warning] 13-18: src/core/file/fileCollect.ts#L13-L18
Added lines #L13 - L18 were not covered by tests

[warning] 20-20: src/core/file/fileCollect.ts#L20
Added line #L20 was not covered by tests

[warning] 64-65: src/core/file/fileCollect.ts#L64-L65
Added lines #L64 - L65 were not covered by tests

⏰ Context from checks skipped due to timeout of 90000ms (17)

GitHub Check: Build and run (macos-latest, 18.x)
GitHub Check: Build and run (macos-latest, 18.0.0)
GitHub Check: Test (macos-latest, 22.x)
GitHub Check: Test (windows-latest, 23.x)
GitHub Check: Test (windows-latest, 22.x)
GitHub Check: Build and run (windows-latest, 18.0.0)
GitHub Check: Test (windows-latest, 21.x)
GitHub Check: Test (windows-latest, 20.x)
GitHub Check: Test (windows-latest, 19.x)
GitHub Check: Test (windows-latest, 18.x)
GitHub Check: Build and run (ubuntu-latest, 20.x)
GitHub Check: Test (windows-latest, 18.0.0)
GitHub Check: Build and run (ubuntu-latest, 19.x)
GitHub Check: Build and run (ubuntu-latest, 18.x)
GitHub Check: Build and run (ubuntu-latest, 18.0.0)
GitHub Check: Analyze (javascript-typescript)
GitHub Check: Cloudflare Pages

🔇 Additional comments (18)

src/core/metrics/workers/fileMetricsWorker.ts (3)

1-12: LGTM! Clean imports and well-defined interface.

The imports are properly typed and the FileMetricsTask interface clearly defines the contract for worker tasks.

31-39: Add test coverage for the worker function.

The worker function lacks test coverage according to static analysis.

Could you add tests for:

Successful metric calculation

Performance measurement logging

Error handling scenarios

Would you like me to help generate the test cases?

🧰 Tools

🪛 GitHub Check: codecov/patch

[warning] 31-31: src/core/metrics/workers/fileMetricsWorker.ts#L31
Added line #L31 was not covered by tests

[warning] 33-36: src/core/metrics/workers/fileMetricsWorker.ts#L33-L36
Added lines #L33 - L36 were not covered by tests

[warning] 38-39: src/core/metrics/workers/fileMetricsWorker.ts#L38-L39
Added lines #L38 - L39 were not covered by tests

52-58: Add test coverage for cleanup handler.

The cleanup handler lacks test coverage according to static analysis. This is critical for ensuring proper resource management.

Could you add tests to verify:

TokenCounter is properly freed

TokenCounter reference is nullified

Multiple cleanup calls are handled safely

Would you like me to help generate the test cases?

🧰 Tools

🪛 GitHub Check: codecov/patch

[warning] 54-58: src/core/metrics/workers/fileMetricsWorker.ts#L54-L58
Added lines #L54 - L58 were not covered by tests

tests/integration-tests/packager.test.ts (5)

8-19: Imports align with the new concurrency modules.
These added imports appear consistent and support the PR's objective of modularizing file collection and processing.

70-76: Good approach for injective testing.
Overriding the collectFiles function with a custom init runner is a clean pattern for integration tests, allowing you to isolate each stage.

77-83: Parallelize file processing to fully realize performance gains.
This sequential loop may reintroduce a bottleneck in large repositories. Using a worker pool for parallel processing aligns better with your PR’s performance objectives.

84-90: Ensure comprehensive security test coverage.
Here, validateFileSafety is partially mocked by returning an empty array. Verify real-world security checks in a separate test to confirm detection logic.

91-114: Metric calculation logic is concise and performant.
Aggregating file statistics in one pass is a clean approach. Consider potential edge cases (e.g., empty or binary files) and verify that the aggregator handles them gracefully.

src/core/security/validateFileSafety.ts (1)

6-6: Transition from runSecurityCheckIfEnabled is consistent.
Replacing the conditional wrapper with a direct runSecurityCheck import simplifies the code. Ensure external references are likewise updated.

Also applies to: 14-14

src/core/metrics/workers/outputMetricsWorker.ts (1)

44-50: LGTM: Proper cleanup implementation.

The cleanup on worker termination is well implemented, ensuring proper resource management.

src/core/metrics/calculateMetrics.ts (1)

27-30: LGTM: Excellent performance optimization.

The concurrent execution of file metrics and output metrics calculations using Promise.all is a great performance improvement.

src/core/file/workers/fileCollectWorker.ts (1)

31-68: LGTM: Robust file reading implementation.

The implementation includes:

Proper error handling

Binary file detection

Encoding detection

Detailed logging

Clear warning messages for large files

🧰 Tools

🪛 GitHub Check: codecov/patch

[warning] 56-57: src/core/file/workers/fileCollectWorker.ts#L56-L57
Added lines #L56 - L57 were not covered by tests

src/core/security/workers/securityCheckWorker.ts (1)

1-9: LGTM! Clean interface definition and appropriate imports.

src/core/file/fileProcess.ts (1)

63-65: Consider partial results for recoverable errors.

Your error logging and subsequent re-throw provide visibility into failures, but consider whether partial results should be returned or saved in the event of recoverable errors. This would allow the system to skip problematic files rather than fail the entire batch.

🧰 Tools

🪛 GitHub Check: codecov/patch

[warning] 64-65: src/core/file/fileProcess.ts#L64-L65
Added lines #L64 - L65 were not covered by tests

src/core/packager.ts (1)

41-41: LGTM! Clean implementation of progress tracking.

The progress callback integration is well-structured and consistent across all operations.

src/core/file/fileCollect.ts (1)

62-65: Add test coverage for error scenarios.

The error handling block needs test coverage. Add test cases for:

File read failures

Worker pool initialization failures

Edge cases with empty file lists

🧰 Tools

🪛 GitHub Check: codecov/patch

[warning] 64-65: src/core/file/fileCollect.ts#L64-L65
Added lines #L64 - L65 were not covered by tests

src/core/metrics/calculateAllFileMetrics.ts (1)

16-28: Add test coverage for worker pool initialization.

The static analysis indicates that the worker pool initialization code is not covered by tests.

Consider adding the following test cases:

Verify correct thread count calculation

Verify worker pool creation with correct parameters

Test error handling during initialization

🧰 Tools

🪛 GitHub Check: codecov/patch

[warning] 17-18: src/core/metrics/calculateAllFileMetrics.ts#L17-L18
Added lines #L17 - L18 were not covered by tests

[warning] 20-25: src/core/metrics/calculateAllFileMetrics.ts#L20-L25
Added lines #L20 - L25 were not covered by tests

[warning] 27-28: src/core/metrics/calculateAllFileMetrics.ts#L27-L28
Added lines #L27 - L28 were not covered by tests
tests/core/file/fileCollect.test.ts (1)

20-24: Enhance test coverage for concurrent task processing.

The mock task runner processes tasks sequentially. Consider adding tests for concurrent processing scenarios.
 const mockInitTaskRunner = () => {
+  const processingTasks = new Set();
   return async (task: FileCollectTask) => {
+    if (processingTasks.size > 0) {
+      logger.trace(`Processing ${processingTasks.size} concurrent tasks`);
+    }
+    processingTasks.add(task);
+    try {
       return await fileCollectWorker(task);
+    } finally {
+      processingTasks.delete(task);
+    }
   };
 };

src/core/metrics/workers/fileMetricsWorker.ts

src/core/metrics/calculateOutputMetrics.ts

src/core/file/fileProcess.ts

src/core/file/fileCollect.ts

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (1)

.github/workflows/ci.yml (1)

Line range hint 1-140: Consider workflow optimization for performance improvements

Since this PR focuses on performance optimization, consider these workflow improvements:

Use dependency caching more effectively

Implement job dependencies to avoid redundant work

Consider using matrix includes/excludes for specific version combinations

Would you like me to provide a detailed optimization plan for the workflow?

🧰 Tools

🪛 actionlint (1.7.4)

13-13: label "ubuntu-latest-arm" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2be1e27 and a3c1eef.

📒 Files selected for processing (1)

.github/workflows/ci.yml (9 hunks)

🧰 Additional context used

🪛 actionlint (1.7.4)

.github/workflows/ci.yml

13-13: label "ubuntu-latest-arm" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)

26-26: label "ubuntu-latest-arm" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)

39-39: label "ubuntu-latest-arm" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)

52-52: label "ubuntu-latest-arm" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)

61-61: label "ubuntu-latest-arm" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)

72-72: label "ubuntu-latest-arm" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)

81-81: label "ubuntu-latest-arm" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)

97-97: label "ubuntu-latest-arm" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)

123-123: label "ubuntu-latest-arm" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)

⏰ Context from checks skipped due to timeout of 90000ms (3)

GitHub Check: Build and run (windows-latest, 22.x)
GitHub Check: Analyze (javascript-typescript)
GitHub Check: Cloudflare Pages

🔇 Additional comments (1)

.github/workflows/ci.yml (1)

81-82: Verify Node.js ARM compatibility

The test matrix includes Node.js versions up to 23.x. Please ensure all these versions are available and fully compatible with ARM architecture.

✅ Verification successful

Node.js ARM compatibility verified successfully

All Node.js versions specified in the test matrix (18.0.0 through 23.x) provide official ARM64 builds and are fully compatible with the ARM architecture.
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Description: Check Node.js ARM compatibility for specified versions

# Fetch Node.js release data
curl -s https://nodejs.org/dist/index.json | jq -r '.[] | select(.version | startswith("v18") or startswith("v19") or startswith("v20") or startswith("v21") or startswith("v22") or startswith("v23")) | {version: .version, arm64: (.files | any(contains("arm64")))} | select(.arm64 == false) | .version'
Length of output: 303

Script:
#!/bin/bash
# Fetch Node.js release data and check ARM64 availability
echo "Checking Node.js 18.0.0 specifically:"
curl -s https://nodejs.org/dist/v18.0.0/SHASUMS256.txt | grep -i 'arm64\|aarch64' || echo "No ARM64 builds found for 18.0.0"

echo -e "\nChecking latest versions:"
for version in 18 19 20 21 22 23; do
  latest=$(curl -s https://nodejs.org/dist/index.json | jq -r ".[] | select(.version | startswith(\"v$version\")) | .version" | head -n1)
  if [ ! -z "$latest" ]; then
    echo "Version $latest:"
    curl -s "https://nodejs.org/dist/$latest/SHASUMS256.txt" | grep -i 'arm64\|aarch64' || echo "No ARM64 builds found"
  fi
done
Length of output: 8093
🧰 Tools

🪛 actionlint (1.7.4)

81-81: label "ubuntu-latest-arm" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)

.github/workflows/ci.yml

coderabbitai

Actionable comments posted: 6

♻️ Duplicate comments (3)

tests/integration-tests/packager.test.ts (1)
77-83: 🛠️ Refactor suggestion

Implement parallel file processing for better performance.

The current implementation processes files sequentially, which could be a performance bottleneck for large repositories. This contradicts the PR's performance optimization goals.

Consider using Promise.all for parallel processing:
 processFiles: async (rawFiles, config, progressCallback) => {
-  const processedFiles: ProcessedFile[] = [];
-  for (const rawFile of rawFiles) {
-    processedFiles.push(await fileProcessWorker({ rawFile, config }));
-  }
-  return processedFiles;
+  return Promise.all(
+    rawFiles.map(rawFile => fileProcessWorker({ rawFile, config }))
+  );
 },
src/core/file/fileProcess.ts (1)
10-22: ⚠️ Potential issue

Add worker pool cleanup mechanism.

The worker pool should be properly cleaned up to prevent resource leaks.

Apply this diff:
 const initTaskRunner = (numOfTasks: number) => {
   const { minThreads, maxThreads } = getWorkerThreadCount(numOfTasks);
   logger.trace(`Initializing worker pool with min=${minThreads}, max=${maxThreads} threads`);

   const pool = new Piscina({
     filename: new URL('./workers/fileProcessWorker.js', import.meta.url).href,
     minThreads,
     maxThreads,
     idleTimeout: 5000,
   });

-  return (task: FileProcessTask) => pool.run(task);
+  return {
+    run: (task: FileProcessTask) => pool.run(task),
+    cleanup: async () => {
+      await pool.destroy();
+      logger.trace('Worker pool destroyed');
+    }
+  };
 };
🧰 Tools

🪛 GitHub Check: codecov/patch

[warning] 11-12: src/core/file/fileProcess.ts#L11-L12
Added lines #L11 - L12 were not covered by tests

[warning] 14-19: src/core/file/fileProcess.ts#L14-L19
Added lines #L14 - L19 were not covered by tests

[warning] 21-22: src/core/file/fileProcess.ts#L21-L22
Added lines #L21 - L22 were not covered by tests
src/core/file/fileCollect.ts (1)

62-65: 🛠️ Refactor suggestion

Align error handling with fileProcess.ts.

For consistency, implement the same error handling pattern as suggested for fileProcess.ts.

🧰 Tools

🪛 GitHub Check: codecov/patch

[warning] 64-65: src/core/file/fileCollect.ts#L64-L65
Added lines #L64 - L65 were not covered by tests

🧹 Nitpick comments (16)

src/core/metrics/workers/fileMetricsWorker.ts (3)
14-22: Consider implementing a thread-safe singleton pattern.

While the singleton pattern is appropriate here for resource management, the current implementation might be susceptible to race conditions in a multi-threaded environment. Although Piscina workers run in separate threads, it's a good practice to implement thread-safe patterns.

Consider using a more robust singleton implementation:
-// Worker-level singleton for TokenCounter
-let tokenCounter: TokenCounter | null = null;
-
-const getTokenCounter = (encoding: TiktokenEncoding): TokenCounter => {
-  if (!tokenCounter) {
-    tokenCounter = new TokenCounter(encoding);
-  }
-  return tokenCounter;
-};
+// Worker-level singleton for TokenCounter
+let tokenCounter: TokenCounter | null = null;
+let tokenCounterEncoding: TiktokenEncoding | null = null;
+
+const getTokenCounter = (encoding: TiktokenEncoding): TokenCounter => {
+  if (!tokenCounter || tokenCounterEncoding !== encoding) {
+    if (tokenCounter) {
+      tokenCounter.free();
+    }
+    tokenCounter = new TokenCounter(encoding);
+    tokenCounterEncoding = encoding;
+  }
+  return tokenCounter;
+};
This implementation also handles cases where the encoding type changes between calls.

35-44: Add input validation for file content.

Consider adding validation for the file content to handle edge cases gracefully.

Consider adding these validations:
 export const calculateIndividualFileMetrics = async (
   file: ProcessedFile,
   encoding: TiktokenEncoding,
 ): Promise<FileMetrics> => {
+  if (!file.content) {
+    return { path: file.path, charCount: 0, tokenCount: 0 };
+  }
+
   const charCount = file.content.length;
   const tokenCounter = getTokenCounter(encoding);
   const tokenCount = tokenCounter.countTokens(file.content, file.path);

   return { path: file.path, charCount, tokenCount };
 };
46-52: Enhance worker cleanup handling.

While the current cleanup is good, consider these improvements:

Add test coverage for the cleanup handler

Handle additional process events for more robust cleanup

Consider enhancing the cleanup:
 // Cleanup when worker is terminated
-process.on('exit', () => {
+const cleanup = () => {
   if (tokenCounter) {
     tokenCounter.free();
     tokenCounter = null;
   }
-});
+};
+
+// Handle various termination scenarios
+process.on('exit', cleanup);
+process.on('SIGINT', cleanup);
+process.on('SIGTERM', cleanup);
+process.on('uncaughtException', (error) => {
+  logger.error('Uncaught exception:', error);
+  cleanup();
+  process.exit(1);
+});
🧰 Tools

🪛 GitHub Check: codecov/patch

[warning] 48-52: src/core/metrics/workers/fileMetricsWorker.ts#L48-L52
Added lines #L48 - L52 were not covered by tests
tests/integration-tests/packager.test.ts (2)
26-30: Add JSDoc documentation for the mock function.

Consider adding JSDoc documentation to explain the purpose and usage of this mock function.
+/**
+ * Creates a mock task runner for file collection testing.
+ * @returns An async function that executes fileCollectWorker with the given task
+ */
const mockCollectFileInitTaskRunner = () => {
  return async (task: FileCollectTask) => {
    return await fileCollectWorker(task);
  };
};
93-113: Simplify metrics calculation using array methods.

The metrics calculation can be simplified using array methods for better readability and maintainability.
 calculateMetrics: async (processedFiles, output, progressCallback, config) => {
   return {
     totalFiles: processedFiles.length,
     totalCharacters: processedFiles.reduce((acc, file) => acc + file.content.length, 0),
     totalTokens: processedFiles.reduce((acc, file) => acc + file.content.split(/\s+/).length, 0),
-    fileCharCounts: processedFiles.reduce(
-      (acc, file) => {
-        acc[file.path] = file.content.length;
-        return acc;
-      },
-      {} as Record<string, number>,
-    ),
-    fileTokenCounts: processedFiles.reduce(
-      (acc, file) => {
-        acc[file.path] = file.content.split(/\s+/).length;
-        return acc;
-      },
-      {} as Record<string, number>,
-    ),
+    fileCharCounts: Object.fromEntries(
+      processedFiles.map(file => [file.path, file.content.length])
+    ),
+    fileTokenCounts: Object.fromEntries(
+      processedFiles.map(file => [file.path, file.content.split(/\s+/).length])
+    ),
   };
 },
src/core/security/workers/securityCheckWorker.ts (2)
31-34: Enhance error logging for better debugging.

Consider adding more context to the error log by including the error type and stack trace.
   } catch (error) {
-    logger.error(`Error checking security on ${filePath}:`, error);
+    logger.error(
+      `Error checking security on ${filePath}:`,
+      error instanceof Error ? { type: error.constructor.name, message: error.message, stack: error.stack } : error
+    );
     throw error;
   }
62-69: Add configuration logging for better transparency.

Consider logging the SecretLint configuration for better debugging and transparency.
 export const createSecretLintConfig = (): SecretLintCoreConfig => ({
   rules: [
     {
       id: '@secretlint/secretlint-rule-preset-recommend',
       rule: creator,
     },
   ],
 });
+logger.trace('SecretLint configuration:', createSecretLintConfig());
src/core/file/workers/fileCollectWorker.ts (3)
8-10: Consider making MAX_FILE_SIZE configurable.

While 50MB is a reasonable default, consider making this limit configurable through the application's configuration to accommodate different use cases.
-// Maximum file size to process (50MB)
-// This prevents out-of-memory errors when processing very large files
-export const MAX_FILE_SIZE = 50 * 1024 * 1024;
+// Maximum file size to process (defaults to 50MB)
+// This prevents out-of-memory errors when processing very large files
+export const MAX_FILE_SIZE = Number(process.env.REPOMIX_MAX_FILE_SIZE) || 50 * 1024 * 1024;
37-42: Enhance the large file warning message.

The warning message could be more actionable by providing the exact command to add the file to .repomixignore.
       logger.log('');
       logger.log('⚠️ Large File Warning:');
       logger.log('──────────────────────');
       logger.log(`File exceeds size limit: ${sizeMB}MB > ${MAX_FILE_SIZE / 1024 / 1024}MB (${filePath})`);
-      logger.note('Add this file to .repomixignore if you want to exclude it permanently');
+      logger.note(`Run 'echo "${filePath}" >> .repomixignore' to exclude this file permanently`);
       logger.log('');
60-62: Log encoding detection fallback.

Add logging when falling back to utf-8 encoding to help diagnose potential encoding issues.
-    const encoding = jschardet.detect(buffer).encoding || 'utf-8';
+    const detected = jschardet.detect(buffer);
+    const encoding = detected.encoding || 'utf-8';
+    if (!detected.encoding) {
+      logger.debug(`No encoding detected for ${filePath}, falling back to utf-8`);
+    }
     const content = iconv.decode(buffer, encoding);
src/core/security/securityCheck.ts (2)
16-28: Consider increasing the idle timeout for large repositories.

The current idleTimeout of 5000ms might be too short for processing large repositories, potentially leading to premature worker termination between batches. This could cause unnecessary overhead from worker pool recreation.
  const pool = new Piscina({
    filename: new URL('./workers/securityCheckWorker.js', import.meta.url).href,
    minThreads,
    maxThreads,
-   idleTimeout: 5000,
+   idleTimeout: 30000, // 30 seconds for better handling of large repositories
  });
53-61: Consider implementing batch processing for large repositories.

While the current implementation efficiently processes files in parallel, for very large repositories, processing all files simultaneously might consume excessive memory. Consider implementing batch processing to better control resource usage.
-    const results = await Promise.all(
-      tasks.map((task) =>
-        runTask(task).then((result) => {
-          completedTasks++;
-          progressCallback(`Running security check... (${completedTasks}/${totalTasks}) ${pc.dim(task.filePath)}`);
-          return result;
-        }),
-      ),
-    );
+    const batchSize = 100; // Adjust based on available system resources
+    const results = [];
+    for (let i = 0; i < tasks.length; i += batchSize) {
+      const batch = tasks.slice(i, i + batchSize);
+      const batchResults = await Promise.all(
+        batch.map((task) =>
+          runTask(task).then((result) => {
+            completedTasks++;
+            progressCallback(`Running security check... (${completedTasks}/${totalTasks}) ${pc.dim(task.filePath)}`);
+            return result;
+          }),
+        ),
+      );
+      results.push(...batchResults);
+    }
tests/core/file/fileCollect.test.ts (1)
20-24: Enhance test coverage for concurrent file collection.

The current mockInitTaskRunner processes tasks sequentially. Consider adding tests that verify concurrent file collection behavior.
 const mockInitTaskRunner = () => {
+  const activeTasks = new Set();
   return async (task: FileCollectTask) => {
+    activeTasks.add(task);
+    if (activeTasks.size > 1) {
+      logger.debug(`Processing ${activeTasks.size} concurrent tasks`);
+    }
     const result = await fileCollectWorker(task);
+    activeTasks.delete(task);
     return result;
   };
 };
src/core/file/fileProcess.ts (2)
41-65: Enhance error handling with specific error types.

The current error handling is too generic. Consider:

Creating specific error types for different failure scenarios

Adding retry logic for transient failures

Including more context in error messages

Example implementation:
+ class FileProcessingError extends Error {
+   constructor(message: string, public readonly cause?: unknown) {
+     super(message);
+     this.name = 'FileProcessingError';
+   }
+ }

  try {
    // ... existing code ...
  } catch (error) {
-   logger.error('Error during file processing:', error);
-   throw error;
+   logger.error('Error during file processing:', {
+     error,
+     taskCount: tasks.length,
+     completedTasks
+   });
+   throw new FileProcessingError('File processing failed', error);
  }
🧰 Tools

🪛 GitHub Check: codecov/patch

[warning] 64-65: src/core/file/fileProcess.ts#L64-L65
Added lines #L64 - L65 were not covered by tests

32-56: Consider implementing batch processing for better performance.

Instead of processing all files at once, consider implementing batch processing to:

Reduce memory usage

Provide more granular progress updates

Allow for better error recovery

Would you like me to provide an example implementation of batch processing?
src/core/file/fileCollect.ts (1)
47-56: Improve progress tracking with estimated time remaining.

The current progress updates could be enhanced with time estimates.

Example implementation:
const startTime = process.hrtime.bigint();
let completedTasks = 0;
const totalTasks = tasks.length;

const updateProgress = (task: FileCollectTask) => {
  completedTasks++;
  const elapsed = Number(process.hrtime.bigint() - startTime) / 1e9;
  const rate = completedTasks / elapsed;
  const remaining = (totalTasks - completedTasks) / rate;
  
  progressCallback(
    `Collect file... (${completedTasks}/${totalTasks}, ~${remaining.toFixed(1)}s remaining) ${pc.dim(task.filePath)}`
  );
  logger.trace(`Collect files... (${completedTasks}/${totalTasks}) ${task.filePath}`);
};

const results = await Promise.all(
  tasks.map((task) =>
    runTask(task).then((result) => {
      updateProgress(task);
      return result;
    }),
  ),
);

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a3c1eef and ee46a4f.

📒 Files selected for processing (30)

src/cli/actions/remoteAction.ts (2 hunks)
src/core/file/fileCollect.ts (1 hunks)
src/core/file/fileProcess.ts (1 hunks)
src/core/file/workers/fileCollectWorker.ts (1 hunks)
src/core/file/workers/fileProcessWorker.ts (1 hunks)
src/core/metrics/aggregateMetrics.ts (0 hunks)
src/core/metrics/calculateAllFileMetrics.ts (1 hunks)
src/core/metrics/calculateIndividualFileMetrics.ts (0 hunks)
src/core/metrics/calculateMetrics.ts (2 hunks)
src/core/metrics/calculateOutputMetrics.ts (1 hunks)
src/core/metrics/workers/fileMetricsWorker.ts (1 hunks)
src/core/metrics/workers/outputMetricsWorker.ts (1 hunks)
src/core/metrics/workers/types.ts (1 hunks)
src/core/packager.ts (1 hunks)
src/core/security/runSecurityCheckIfEnabled.ts (0 hunks)
src/core/security/securityCheck.ts (1 hunks)
src/core/security/validateFileSafety.ts (1 hunks)
src/core/security/workers/securityCheckWorker.ts (1 hunks)
src/shared/processConcurrency.ts (1 hunks)
tests/cli/actions/remoteAction.test.ts (2 hunks)
tests/core/file/fileCollect.test.ts (6 hunks)
tests/core/file/fileProcess.test.ts (9 hunks)
tests/core/metrics/aggregateMetrics.test.ts (0 hunks)
tests/core/metrics/calculateAllFileMetrics.test.ts (1 hunks)
tests/core/metrics/calculateIndividualFileMetrics.test.ts (0 hunks)
tests/core/metrics/calculateMetrics.test.ts (1 hunks)
tests/core/packager.test.ts (1 hunks)
tests/core/security/runSecurityCheckIfEnabled.test.ts (0 hunks)
tests/core/security/validateFileSafety.test.ts (1 hunks)
tests/integration-tests/packager.test.ts (2 hunks)

💤 Files with no reviewable changes (6)

tests/core/metrics/aggregateMetrics.test.ts
src/core/metrics/aggregateMetrics.ts
tests/core/metrics/calculateIndividualFileMetrics.test.ts
src/core/security/runSecurityCheckIfEnabled.ts
src/core/metrics/calculateIndividualFileMetrics.ts
tests/core/security/runSecurityCheckIfEnabled.test.ts

🚧 Files skipped from review as they are similar to previous changes (8)

src/core/metrics/workers/types.ts
src/cli/actions/remoteAction.ts
tests/cli/actions/remoteAction.test.ts
src/core/file/workers/fileProcessWorker.ts
src/core/security/validateFileSafety.ts
tests/core/security/validateFileSafety.test.ts
tests/core/packager.test.ts
src/core/packager.ts

🧰 Additional context used

🪛 GitHub Check: codecov/patch

src/core/metrics/workers/fileMetricsWorker.ts

[warning] 25-30: src/core/metrics/workers/fileMetricsWorker.ts#L25-L30
Added lines #L25 - L30 were not covered by tests

[warning] 32-33: src/core/metrics/workers/fileMetricsWorker.ts#L32-L33
Added lines #L32 - L33 were not covered by tests

[warning] 48-52: src/core/metrics/workers/fileMetricsWorker.ts#L48-L52
Added lines #L48 - L52 were not covered by tests

src/core/metrics/calculateAllFileMetrics.ts

[warning] 12-13: src/core/metrics/calculateAllFileMetrics.ts#L12-L13
Added lines #L12 - L13 were not covered by tests

[warning] 15-20: src/core/metrics/calculateAllFileMetrics.ts#L15-L20
Added lines #L15 - L20 were not covered by tests

[warning] 22-23: src/core/metrics/calculateAllFileMetrics.ts#L22-L23
Added lines #L22 - L23 were not covered by tests

[warning] 65-67: src/core/metrics/calculateAllFileMetrics.ts#L65-L67
Added lines #L65 - L67 were not covered by tests

src/core/file/fileCollect.ts

[warning] 10-11: src/core/file/fileCollect.ts#L10-L11
Added lines #L10 - L11 were not covered by tests

[warning] 13-18: src/core/file/fileCollect.ts#L13-L18
Added lines #L13 - L18 were not covered by tests

[warning] 20-20: src/core/file/fileCollect.ts#L20
Added line #L20 was not covered by tests

[warning] 64-65: src/core/file/fileCollect.ts#L64-L65
Added lines #L64 - L65 were not covered by tests

src/core/file/fileProcess.ts

[warning] 11-12: src/core/file/fileProcess.ts#L11-L12
Added lines #L11 - L12 were not covered by tests

[warning] 14-19: src/core/file/fileProcess.ts#L14-L19
Added lines #L14 - L19 were not covered by tests

[warning] 21-22: src/core/file/fileProcess.ts#L21-L22
Added lines #L21 - L22 were not covered by tests

[warning] 64-65: src/core/file/fileProcess.ts#L64-L65
Added lines #L64 - L65 were not covered by tests

src/core/metrics/workers/outputMetricsWorker.ts

[warning] 1-3: src/core/metrics/workers/outputMetricsWorker.ts#L1-L3
Added lines #L1 - L3 were not covered by tests

[warning] 12-12: src/core/metrics/workers/outputMetricsWorker.ts#L12
Added line #L12 was not covered by tests

[warning] 14-19: src/core/metrics/workers/outputMetricsWorker.ts#L14-L19
Added lines #L14 - L19 were not covered by tests

[warning] 21-24: src/core/metrics/workers/outputMetricsWorker.ts#L21-L24
Added lines #L21 - L24 were not covered by tests

[warning] 26-31: src/core/metrics/workers/outputMetricsWorker.ts#L26-L31
Added lines #L26 - L31 were not covered by tests

[warning] 33-34: src/core/metrics/workers/outputMetricsWorker.ts#L33-L34
Added lines #L33 - L34 were not covered by tests