Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(pack): Performance Optimization for Large Repositories #307

Merged
merged 6 commits into from
Jan 25, 2025

Conversation

yamadashy
Copy link
Owner

@yamadashy yamadashy commented Jan 24, 2025

Performance Improvement

yamadashy/repomix

  • Before: 868.73 millis (usr: 1.11 secs, sys: 0.14 secs)
  • After: 671.26 millis (usr: 1.42 secs, sys: 0.22 secs)

No significant change

facebook/react

  • Before: 123.31 secs (usr: 118.64s, sys: 1.60s)
  • After: 7.10 secs (usr: 20.01s, sys: 2.39s)

17x faster

vercel/next.js

  • Before: 17.85 mins (usr: 16.66 mins, sys: 0.18 mins)
  • After: 17.27 secs (usr: 52.93 secs, sys: 7.11 secs)

58x faster

Changes

  • Replace p-map with Piscina worker threads for parallel processing
  • Implement dedicated workers for file collection, processing, and metrics
  • Optimize file search with single worker implementation
  • Add proper resource cleanup for worker pools
  • Move file manipulation logic to worker threads

Checklist

  • Run npm run test
  • Run npm run lint

Copy link

codecov bot commented Jan 24, 2025

Codecov Report

Attention: Patch coverage is 88.59223% with 47 lines in your changes missing coverage. Please review.

Project coverage is 90.65%. Comparing base (5c466e1) to head (fc20dde).
Report is 7 commits behind head on main.

Files with missing lines Patch % Lines
src/shared/processConcurrency.ts 56.52% 10 Missing ⚠️
src/core/metrics/calculateAllFileMetrics.ts 84.61% 6 Missing ⚠️
src/core/file/fileCollect.ts 87.50% 5 Missing ⚠️
src/core/file/fileProcess.ts 85.71% 5 Missing ⚠️
src/core/metrics/workers/fileMetricsWorker.ts 84.37% 5 Missing ⚠️
src/core/metrics/workers/outputMetricsWorker.ts 82.14% 5 Missing ⚠️
src/core/metrics/calculateOutputMetrics.ts 88.88% 3 Missing ⚠️
src/core/security/securityCheck.ts 91.42% 3 Missing ⚠️
src/core/security/workers/securityCheckWorker.ts 93.47% 3 Missing ⚠️
src/core/file/workers/fileCollectWorker.ts 95.45% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #307      +/-   ##
==========================================
- Coverage   92.17%   90.65%   -1.52%     
==========================================
  Files          44       48       +4     
  Lines        2248     2450     +202     
  Branches      493      512      +19     
==========================================
+ Hits         2072     2221     +149     
- Misses        176      229      +53     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

coderabbitai bot commented Jan 24, 2025

📝 Walkthrough

Walkthrough

This pull request introduces a comprehensive refactoring of the codebase to implement concurrent processing using the piscina library. The changes span multiple modules across file collection, processing, metrics calculation, and security checks. The primary focus is on transitioning from sequential to parallel task execution using worker threads.

Key modifications include adding the piscina dependency, updating the runRemoteAction function to enhance dependency injection, and restructuring the collectFiles and processFiles functions to utilize worker pools for concurrent execution. Additionally, new worker modules have been introduced for file collection, processing, and security checks, while existing functions have been updated to utilize these workers. The refactoring enhances concurrency by initializing task runners that can process multiple tasks simultaneously, improving overall performance and scalability of the application.

The changes affect core functionality in file handling, metrics calculation, and security checks, with a consistent pattern of introducing task runners, worker threads, and progress tracking mechanisms across different modules.

Sequence Diagram

sequenceDiagram
    participant Client
    participant Packager
    participant TaskRunner
    participant Workers
    
    Client->>Packager: Initiate pack
    Packager->>TaskRunner: Initialize worker pool
    TaskRunner-->>Packager: Worker pool ready
    
    Packager->>TaskRunner: Collect files
    TaskRunner->>Workers: Process file collection tasks
    Workers-->>TaskRunner: File collection results
    
    Packager->>TaskRunner: Process files
    TaskRunner->>Workers: Process file processing tasks
    Workers-->>TaskRunner: Processed files
    
    Packager->>TaskRunner: Calculate metrics
    TaskRunner->>Workers: Calculate file metrics tasks
    Workers-->>TaskRunner: File metrics
    
    Packager->>TaskRunner: Run security checks
    TaskRunner->>Workers: Security check tasks
    Workers-->>TaskRunner: Security check results
    
    TaskRunner-->>Packager: Aggregated results
    Packager-->>Client: Final output
Loading

Possibly related PRs

  • Handle large files gracefully #302: The main PR introduces a new dependency piscina in package.json, while PR Handle large files gracefully #302 modifies src/core/file/fileCollect.ts to implement a file size limit check using a constant MAX_FILE_SIZE. Both PRs involve changes to file handling and processing, indicating a focus on improving the management of resources and dependencies in the codebase.
✨ Finishing Touches
  • 📝 Generate Docstrings (Beta)

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

🧹 Nitpick comments (41)
src/core/metrics/calculateOutputMetrics.ts (2)

1-7: Unused imports warning
path and fileURLToPath appear to be unused in this file. Consider removing them if they are no longer needed to keep the codebase clean.


20-29: Enhance the deps parameter with a typed interface
It’s generally good practice to define an explicit interface for deps. This improves readability and discoverability, and ensures that all required dependencies are present.

src/core/metrics/calculateMetrics.ts (2)

20-23: Document the deps parameter
The deps parameter is injected as an object. Provide inline documentation (docstring or comment) describing each of the dependencies (e.g., calculateAllFileMetrics, calculateOutputMetrics) so that future maintainers know how and why they’re being used.


35-40: Check memory usage for large file sets
Looping through all fileMetrics to compute character/token counts is straightforward. However, for extremely large projects, consider streaming or incremental aggregation to avoid potential memory spikes.

src/core/file/fileProcess.ts (3)

8-9: Clearly document the new type
FileProcessTask is a critical type for worker communication. Consider adding JSDoc or TSDoc so that maintainers can quickly understand each property.


10-22: Optionally extend concurrency
The worker pool is configured based on the number of tasks, which is a nice improvement. If certain tasks vary heavily in complexity or size, consider adding optional heuristics to optimize the pool size further.


28-30: Typed deps object
Similar to earlier feedback in the metrics file, define a consistent interface for deps to ensure clarity and enforcements.

src/core/file/fileCollect.ts (2)

9-20: Initialize the worker pool conditionally for zero tasks.

If numOfTasks can be zero due to no files found, consider handling that scenario gracefully (e.g., immediately returning a no-op runner). While this situation may be rare, it helps guard against edge cases.

 const initTaskRunner = (numOfTasks: number) => {
+  if (numOfTasks === 0) {
+    return async () => null;
+  }
   const { minThreads, maxThreads } = getWorkerThreadCount(numOfTasks);
   ...
 }

40-56: Consider using Promise.allSettled.

Relying on Promise.all means that if one file collection task fails, it rejects the entire collection process. Depending on the desired behavior, you might want to collect partial results. If partial success is acceptable, consider Promise.allSettled and filtering out failed tasks for more robust error handling.

src/core/security/securityCheck.ts (2)

16-28: Guard for unexpected zero tasks.

Similar to collectFiles, if numOfTasks can be zero for edge scenarios, you could short-circuit the concurrency or return a no-op runner for clarity.


46-50: Improve partial failure handling.

Using Promise.all will cause a single failure (e.g., a worker throwing an exception for a single file) to halt the entire security check. If partial results are acceptable, consider Promise.allSettled to capture both successes and failures.

src/core/metrics/calculateAllFileMetrics.ts (1)

16-28: Conditional initialization for minimal tasks.

As with the other modules, consider gracefully handling the scenario where numOfTasks is zero to avoid creating idle worker threads unnecessarily.

tests/integration-tests/packager.test.ts (2)

70-76: Leverage progress callback usage in collectFiles.
You’ve introduced an inline wrapper to inject initTaskRunner. Consider passing additional metadata via the progressCallback, such as partial result counts or timing info, for improved visibility into large repository processing.


93-113: Metrics calculation is on track.
Storing file-specific stats in fileCharCounts and fileTokenCounts is helpful. Consider adding aggregated or average metrics as well, especially if used to track performance improvements in large repositories.

src/shared/processConcurrency.ts (2)

6-8: Validate CPU count fallback.
os.cpus() might return an empty array in rare containerized or restricted environments. As a safeguard, consider falling back to 1 for getProcessConcurrency when os.cpus().length is 0.


13-24: Reevaluate max thread constraints for large tasks.
Limiting threads to Math.floor(numOfTasks / 100) could be beneficial in typical workloads, yet might underutilize multi-core systems when tasks exceed a few thousand. Consider making this ratio configurable to better handle diverse large-repository scenarios.

src/core/tokenCount/tokenCount.ts (1)

35-37: Expose encoding retrieval for better introspection.
Providing a getEncoding() method is convenient for debugging or analyzing token usage. Consider also returning additional config info (if relevant) to aid in advanced debugging.

src/core/security/validateFileSafety.ts (1)

14-14: Leverage dependency injection effectively.

Accepting runSecurityCheck via deps is a good approach to enable simpler testing and future extension. Consider adding a docstring or inline comment clarifying why dependency injection is important here (e.g., for test mocks or swapping in different security check implementations).

tests/core/metrics/calculateAllFileMetrics.test.ts (1)

28-30: Ensure coverage of new parameters.

You are now passing an options object containing { initTaskRunner } to calculateAllFileMetrics. Double-check that your tests confirm behavior if initTaskRunner is undefined or fails. This helps solidify the code's resilience.

src/core/metrics/workers/outputMetricsWorker.ts (1)

7-11: Interface design clarity.

Defining OutputMetricsTask makes the worker’s contract explicit. If you expect optional or additional fields in the future (e.g., fileSize or fileType), consider a more flexible interface or referencing a shared global interface.

src/core/file/workers/fileProcessWorker.ts (2)

11-17: Name the default export for easier debugging.
Currently, the default export is an anonymous async function. Explicitly naming it can improve stack traces and debugging clarity.

-export default async ({ rawFile, config }: FileProcessTask): Promise<ProcessedFile> => {
+export default async function processFileTask({ rawFile, config }: FileProcessTask): Promise<ProcessedFile> {

36-41: Be mindful of large file line numbering.
Splitting and rejoining massive content may increase memory usage. Stream-based processing or chunking might be beneficial for extremely large files.

src/core/metrics/workers/fileMetricsWorker.ts (1)

30-39: Log file metrics after calculation.
Currently, metrics are logged at trace level before actually calling calculateIndividualFileMetrics. If calculateIndividualFileMetrics throws an error, logs can be misleading. Consider rearranging the logic or adding error handling.

src/core/file/workers/fileCollectWorker.ts (2)

35-45: Suggest partial reads for very large files.
Although rejecting files over 50 MB helps, partial reading can handle edge cases where slightly over-limit files still have relevant data and reduce memory usage.


60-61: Validate encoding detection and decoding.
jschardet can guess incorrectly. Consider fallback strategies or user-configurable defaults for ambiguous encodings.

src/core/security/workers/securityCheckWorker.ts (2)

40-63: Extend or customize secretlint rules.
Consider exposing additional custom rules or configuration to detect project-specific secrets for more comprehensive security checks.

Would you like me to generate a template for custom secretlint plugin development?


65-72: Consider dynamic config for secretlint.
Create a dynamic config to conditionally enable/disable certain rules based on environment or project type, improving maintainability in large-scale repositories.

tests/core/metrics/calculateMetrics.test.ts (1)

56-56: Extract hard-coded reference to 'o200k_base' into a constant or configuration.

Having 'o200k_base' directly in the test can obscure its meaning. Consider using a named constant or referencing a configuration source to improve clarity and maintainability.

src/cli/actions/remoteAction.ts (1)

50-50: Ensure robust error handling around default action execution.

While the try/catch block handles errors, confirm that runDefaultAction properly cleans up any resources or partial results in case of failure, especially for large projects cloned in the temp directory.

tests/core/file/fileCollect.test.ts (4)

8-10: Use single import statement for related items.
Consider combining these imports into a single statement for better readability and to avoid repeatedly referencing the same path.

-import { collectFiles } from '../../../src/core/file/fileCollect.js';
-import type { FileCollectTask } from '../../../src/core/file/workers/fileCollectWorker.js';
-import { MAX_FILE_SIZE } from '../../../src/core/file/workers/fileCollectWorker.js';
+import {
+  collectFiles,
+  type FileCollectTask,
+  MAX_FILE_SIZE
+} from '../../../src/core/file/workers/fileCollectWorker.js';

20-24: Ensure resilience in your mock task runner.
The mock task runner is straightforward, but consider adding error handling or logging for failures within fileCollectWorker, especially if you anticipate partial failures in concurrent tasks.

Would you like help adding error handling logic in your mock or real implementation?


71-73: Expand testing for partial results.
By skipping the binary file, you are ensuring the rest are processed. Consider adding an assertion that the progress callback or logs reflect the skipped file event.


100-102: Cover large file edge cases more thoroughly.
Currently, the test checks a file slightly over the limit. It may be worth testing a file exactly at the limit as well.

Would you like me to add such a test to catch potential off-by-one scenarios?

tests/core/file/fileProcess.test.ts (7)

3-3: Document the rationale for re-exporting processFiles.
Importing and immediately re-exporting can sometimes hide the real source. Adding a short docstring or comment can help maintainers quickly understand the relation between fileProcess.js and fileProcessWorker.js.


5-6: Check if local solutions suffice before adding external dependencies.
Using a fileProcessWorker is a good approach. Before expanding to more libraries or heavy frameworks, confirm that you can handle concurrency effectively with your existing worker solution.


78-78: Test scenario with partial whitespace-only lines.
This is good coverage for ignoring comments. Additionally test partial whitespace lines to confirm your removal logic is robust.


95-95: Add explicit logging for manipulator absence.
Currently, you return the original content. Adding a debug/warning log might help trace the flow when no manipulator is found.

+ logger.debug(`No file manipulator found for ${filePath}, returning original content`);

127-127: Validate combined config scenarios.
If the user toggles multiple features (e.g. removeComments + showLineNumbers), confirm the correct transformations occur in the correct order.


143-143: Corner case: ensure '1:' label is not repeated multiple times
Some line numbering logic can inadvertently repeat line labels or offset them if the file is empty. Test thoroughly.


159-159: Keep a handle on performance with large line counts.
You’re testing 100 lines, but real files can have thousands. Consider measuring any performance overhead from line numbering on large inputs.

tests/cli/actions/remoteAction.test.ts (1)

11-11: Document the role of SuspiciousFileResult in remoteAction tests.
You’re importing SuspiciousFileResult but not using it. Add clarifications or tests that address suspicious files.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 745df97 and d8cca0b.

⛔ Files ignored due to path filters (1)
  • package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (33)
  • package.json (1 hunks)
  • src/cli/actions/remoteAction.ts (2 hunks)
  • src/core/file/fileCollect.ts (1 hunks)
  • src/core/file/fileProcess.ts (1 hunks)
  • src/core/file/workers/fileCollectWorker.ts (1 hunks)
  • src/core/file/workers/fileProcessWorker.ts (1 hunks)
  • src/core/metrics/aggregateMetrics.ts (0 hunks)
  • src/core/metrics/calculateAllFileMetrics.ts (1 hunks)
  • src/core/metrics/calculateIndividualFileMetrics.ts (0 hunks)
  • src/core/metrics/calculateMetrics.ts (2 hunks)
  • src/core/metrics/calculateOutputMetrics.ts (1 hunks)
  • src/core/metrics/workers/fileMetricsWorker.ts (1 hunks)
  • src/core/metrics/workers/outputMetricsWorker.ts (1 hunks)
  • src/core/metrics/workers/types.ts (1 hunks)
  • src/core/packager.ts (1 hunks)
  • src/core/security/runSecurityCheckIfEnabled.ts (0 hunks)
  • src/core/security/securityCheck.ts (1 hunks)
  • src/core/security/validateFileSafety.ts (1 hunks)
  • src/core/security/workers/securityCheckWorker.ts (1 hunks)
  • src/core/tokenCount/tokenCount.ts (2 hunks)
  • src/shared/processConcurrency.ts (1 hunks)
  • tests/cli/actions/remoteAction.test.ts (2 hunks)
  • tests/core/file/fileCollect.test.ts (6 hunks)
  • tests/core/file/fileProcess.test.ts (9 hunks)
  • tests/core/metrics/aggregateMetrics.test.ts (0 hunks)
  • tests/core/metrics/calculateAllFileMetrics.test.ts (1 hunks)
  • tests/core/metrics/calculateIndividualFileMetrics.test.ts (0 hunks)
  • tests/core/metrics/calculateMetrics.test.ts (1 hunks)
  • tests/core/packager.test.ts (1 hunks)
  • tests/core/security/runSecurityCheckIfEnabled.test.ts (0 hunks)
  • tests/core/security/securityCheck.test.ts (1 hunks)
  • tests/core/security/validateFileSafety.test.ts (1 hunks)
  • tests/integration-tests/packager.test.ts (2 hunks)
💤 Files with no reviewable changes (6)
  • tests/core/metrics/aggregateMetrics.test.ts
  • src/core/security/runSecurityCheckIfEnabled.ts
  • src/core/metrics/calculateIndividualFileMetrics.ts
  • src/core/metrics/aggregateMetrics.ts
  • tests/core/security/runSecurityCheckIfEnabled.test.ts
  • tests/core/metrics/calculateIndividualFileMetrics.test.ts
✅ Files skipped from review due to trivial changes (1)
  • src/core/metrics/workers/types.ts
⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: Test (windows-latest, 22.x)
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: Cloudflare Pages
🔇 Additional comments (43)
src/core/metrics/calculateOutputMetrics.ts (1)

30-45: Validate error handling and side-effects
The try-catch block logs and re-throws errors while ignoring potential partial states. If this function is critical, consider whether you need additional cleanup logic after an error is caught (e.g., destroying or cleaning up worker pools).

src/core/metrics/calculateMetrics.ts (3)

5-5: Confirm usage alignment
Importing calculateOutputMetrics here is consistent with the new concurrency approach. Ensure that this file is the only place using this function so that the concurrency logic is centralized.


27-30: Parallel execution
Using Promise.all to invoke calculateAllFileMetrics and calculateOutputMetrics concurrently is an excellent use of parallelism for performance.


42-48: Return value clarity
The returned object is clear and self-explanatory. Good job returning separate fields for both aggregated and file-level metrics.

src/core/file/fileProcess.ts (3)

5-5: Ensure getWorkerThreadCount handles edge cases
Validate that getWorkerThreadCount gracefully handles cases where numOfTasks is zero or extremely large.


32-40: Progress callback usage
The approach of updating progressCallback after each file is processed is effective. Ensure that the callback function is performant, especially in large-scale usage where thousands of files are processed—excessive logging or console I/O can degrade performance.


2-2: Verify removal of pMap
pMap was previously used for concurrency. Confirm that no other files rely on it. If not, consider removing it from dependencies entirely to reduce bundle size.

src/core/file/fileCollect.ts (2)

1-2: Imports look good.

The addition of picocolors for formatting and piscina for concurrency is consistent with your overall objective of enhancing performance and providing clearer output logs.


62-65: Re-throwing error ensures transparency.

Catching and then rethrowing the error is a good practice to preserve the original stack trace. This approach makes debugging easier and ensures that upstream error-handling logic remains consistent.

src/core/security/securityCheck.ts (1)

67-70: Appropriate error logging and rethrow.

Logging and then rethrowing preserves the error context, which aligns well with debugging and consistent error handling across modules. Good approach!

src/core/metrics/calculateAllFileMetrics.ts (3)

41-49: Retaining index might be beneficial.

Storing and passing the index of each file is helpful for reporting or debugging, especially if the processing outcome needs to be correlated to the original file list order. This is a good approach.


52-66: Performance clarity in logs.

Logging the hrtime.bigint() duration in milliseconds and the number of files processed is helpful for diagnosing potential bottlenecks. This aligns well with your performance optimization goals.


67-75: Error approach is consistent.

Catching an error, logging it, and then rethrowing is consistent with the rest of the codebase. This pattern ensures that metrics calculation failures are transparent and don’t silently fail.

tests/integration-tests/packager.test.ts (2)

26-30: Ensure concurrency safety in the mock collect file task runner.
It’s great that you’re utilizing a dedicated mock for worker-based file collection. However, ensure that concurrent calls from multiple tests won’t conflict when sharing the same mock instance (if ever used simultaneously).


85-90: Validating file safety with mock security check.
You’re currently returning an empty array from runSecurityCheck. Make sure your integration tests also cover real security checks to confirm your concurrency logic handles potential security flags properly.

src/core/tokenCount/tokenCount.ts (1)

6-9: Initialize encoding name and encoding properly.
Injecting the encodingName and storing it is a good design choice. Ensure any external calls referencing the same TokenCounter instance consistently expect the same encoding.

src/core/security/validateFileSafety.ts (2)

6-6: Use consistent import naming for security logic.

Renaming imports from runSecurityCheckIfEnabled to runSecurityCheck clarifies the underlying functionality. Good job keeping the naming clear. Ensure that all references throughout the codebase reflect the new import name.


18-23: Add error handling and logging around security checks.

While the conditional call to runSecurityCheck is a good step, consider handling exceptions if the security check fails or encounters unexpected data. For example, you could log errors or fallback to skipping suspicious file detection in that scenario to prevent the entire process from crashing.

Would you like me to provide a sample snippet demonstrating graceful error handling?

tests/core/metrics/calculateAllFileMetrics.test.ts (3)

1-7: Improved test boundaries for worker-based metrics.

Importing calculateIndividualFileMetrics directly from fileMetricsWorker.js is a more accurate reflection of the real usage. This reduces the chance of discrepancies between production code and test mocks. Ensure that any utility code needed for advanced scenarios (e.g., file size extremes) is also tested with concurrency.


14-19: Mock initialization logic for task runners.

The mockInitTaskRunner effectively simulates worker-based concurrency. Consider verifying how the concurrency scales with increasing file counts (e.g., performance or memory usage) using additional test cases.


33-34: Token count adjustments in tests.

The updated token counts (13 and 50) might imply a more accurate tokenization method. Confirm these thresholds remain correct under actual tokenization rules, especially for boundary cases like large or non-text files.

src/core/metrics/workers/outputMetricsWorker.ts (2)

13-24: Singleton usage of TokenCounter.

Caching the TokenCounter in a worker-level singleton is efficient for performance. However, ensure your code handles concurrency properly if multiple tasks are processed simultaneously. If TokenCounter is stateful, you may need synchronization or separate instances per task.


44-50: Freeing resources on worker exit.

Good practice to free tokenCounter resources upon worker exit. Verify that no residual memory or references remain, especially if the worker might be re-instantiated frequently in large-scale tasks.

tests/core/security/validateFileSafety.test.ts (2)

24-24: Mocking the correct function name.

Replacing runSecurityCheckIfEnabled with runSecurityCheck aligns with the updated implementation. This ensures consistency in tests and prevents missing or mismatched mock calls.


30-30: Validate arguments for the security check.

The assertion confirms runSecurityCheck is invoked properly. Consider adding a separate test case covering config.security.enableSecurityCheck = false to ensure the function bypasses the check as intended.

src/core/file/workers/fileProcessWorker.ts (2)

1-3: Consider providing explicit error handling for imports.
If any of these imports fail or the module path changes, it could cause runtime issues without clear error logs.


26-28: Validate tool output before using manipulator methods.
If manipulator.removeComments returns undefined or malformatted output, it could break subsequent steps. Consider adding a safety check or fallback.

src/core/metrics/workers/fileMetricsWorker.ts (2)

14-16: Ensure shared token counter is thread-safe.
Since tokenCounter is stored at the module level, concurrent calls could lead to race conditions if the implementation isn’t thread-safe.


52-58: Confirm proper cleanup handling.
Attaching an exit listener ensures that the token counter is freed, but it might not handle forced termination scenarios (e.g., process.kill). Verify your approach for unexpected shutdowns.

src/core/file/workers/fileCollectWorker.ts (1)

46-49: Verify binary detection logic.
isBinary typically checks file extensions or partial content heuristics. Confirm that it’s accurate for different file types (e.g., unusual file extensions or compressed files).

src/core/security/workers/securityCheckWorker.ts (1)

14-38: Add concurrency/error handling.
If many security checks run in parallel, ensure that secretlint usage is safe in multi-threaded contexts. Also, confirm proper re-throws of key security errors so they aren’t silently swallowed.

tests/core/metrics/calculateMetrics.test.ts (1)

50-53: Leverage dependency injection for flexible testing.

Introducing a fifth parameter with calculateAllFileMetrics and calculateOutputMetrics is a smart design choice, making the function more testable and modular.

tests/core/security/securityCheck.test.ts (1)

3-3: Updated import path for worker usage looks correct.

The import from the new securityCheckWorker.js aligns with the concurrency approach. Good job ensuring the test references the proper worker implementation.

tests/core/packager.test.ts (1)

75-75: Good coverage for progress callback parameter.

Adding the progressCallback argument to mockDeps.collectFiles and verifying it here improve visibility into the new progress-tracking logic.

src/cli/actions/remoteAction.ts (1)

22-22: Injecting runDefaultAction into deps enhances testability and consistency.

This change improves your dependency injection pattern and keeps the function logic decoupled from direct imports.

tests/core/file/fileCollect.test.ts (2)

123-125: Confirm that error logging occurs once.
You successfully test that the result array is empty on a file read error, but also ensure that you log only once for repeated read failures if that situation arises.


50-52: Verify that progressCallback is properly tested.
You're passing an empty function (() => {}) as the progress callback. It might be beneficial to verify that the callback is invoked or not invoked as expected, ensuring accurate coverage for progress reporting logic.

tests/core/file/fileProcess.test.ts (3)

36-38: Include separate test for progressCallback usage.
Similar to file collection, ensure that the progress callback is being utilized in processFiles and that it’s tested thoroughly.


63-63: Confirm charset detection fallback.
For advanced usage, consider the scenario where jschardet returns an undefined or low-confidence encoding. Validate that the fallback handles strange charsets correctly.


111-111: Edge case: multiline content with line numbering.
Ensure each line is sequentially numbered, including any lines that might be entirely whitespace or special characters.

tests/cli/actions/remoteAction.test.ts (1)

4-4: Verify usage of DefaultActionRunnerResult.
Ensure that all fields in DefaultActionRunnerResult are indeed required for your tests. If some are optional, the test might mischaracterize the data structure.

src/core/packager.ts (1)

41-41: Good use of progressCallback injection.
Upgrading collectFiles to accept a callback fosters improved user experience and modular design. The approach is aligned with large-scale concurrency improvements.

package.json (1)

79-79: Verify piscina version and remove redundant p-map dependency.

Since p-map is being replaced with piscina for parallel processing, consider removing the p-map dependency to avoid confusion and reduce package size.

Let's verify the piscina version and check for any known vulnerabilities:

Apply this diff to remove the redundant dependency:

-    "p-map": "^7.0.3",
✅ Verification successful

Remove unused p-map dependency, piscina version is appropriate

The migration from p-map to piscina is complete as there are no remaining p-map imports in the codebase. The current piscina version (^4.8.0) is secure and stable for production use.

-    "p-map": "^7.0.3",
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check latest version and security advisories for piscina

# Get latest version from npm
echo "Latest version from npm:"
npm view piscina version

# Check for known vulnerabilities
echo -e "\nVulnerability check:"
npm audit piscina@4.8.0 --json || true

# Check GitHub Advisory Database
echo -e "\nGitHub Advisory Database check:"
gh api graphql -f query='
{
  securityVulnerabilities(first: 5, ecosystem: NPM, package: "piscina") {
    nodes {
      advisory {
        summary
        severity
        publishedAt
      }
      vulnerableVersionRange
      firstPatchedVersion {
        identifier
      }
    }
  }
}'

Length of output: 1205


Script:

#!/bin/bash
# Search for p-map usage in the codebase
echo "Files containing p-map imports:"
rg "require\(['\"]p-map['\"]|from ['\"]p-map['\"]" -l

echo -e "\nDetailed p-map usage:"
rg "require\(['\"]p-map['\"]|from ['\"]p-map['\"]" -C 2

Length of output: 261

src/core/metrics/calculateOutputMetrics.ts Outdated Show resolved Hide resolved
tests/integration-tests/packager.test.ts Show resolved Hide resolved
tests/core/file/fileProcess.test.ts Show resolved Hide resolved
tests/cli/actions/remoteAction.test.ts Show resolved Hide resolved
Copy link

cloudflare-workers-and-pages bot commented Jan 25, 2025

Deploying repomix with  Cloudflare Pages  Cloudflare Pages

Latest commit: fc20dde
Status: ✅  Deploy successful!
Preview URL: https://b6ae6556.repomix.pages.dev
Branch Preview URL: https://feat-multithread.repomix.pages.dev

View logs

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

♻️ Duplicate comments (2)
src/core/metrics/calculateOutputMetrics.ts (1)

8-18: ⚠️ Potential issue

Worker pool cleanup mechanism is still missing.

The worker pool created by initTaskRunner lacks cleanup, which could lead to resource leaks.

Apply this diff to add cleanup:

 const initTaskRunner = () => {
   const pool = new Piscina({
     filename: new URL('./workers/outputMetricsWorker.js', import.meta.url).href,
     minThreads: 1,
     maxThreads: 1,
     idleTimeout: 5000,
   });

+  const cleanup = async () => {
+    await pool.destroy();
+  };
 
-  return (task: OutputMetricsTask) => pool.run(task);
+  return {
+    runTask: (task: OutputMetricsTask) => pool.run(task),
+    cleanup,
+  };
 };
🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 9-10: src/core/metrics/calculateOutputMetrics.ts#L9-L10
Added lines #L9 - L10 were not covered by tests


[warning] 12-15: src/core/metrics/calculateOutputMetrics.ts#L12-L15
Added lines #L12 - L15 were not covered by tests


[warning] 17-18: src/core/metrics/calculateOutputMetrics.ts#L17-L18
Added lines #L17 - L18 were not covered by tests

src/core/metrics/workers/outputMetricsWorker.ts (1)

29-42: ⚠️ Potential issue

Add error handling for token counting.

Token counting operations should be wrapped in try-catch to handle potential errors gracefully.

Apply this diff:

 export default async ({ content, encoding, path }: OutputMetricsTask): Promise<number> => {
   const processStartAt = process.hrtime.bigint();
   const counter = getTokenCounter(encoding);
-  const tokenCount = counter.countTokens(content, path);
+  let tokenCount: number;
+  try {
+    tokenCount = counter.countTokens(content, path);
+  } catch (error) {
+    logger.error(`Failed to count tokens for ${path ?? 'unknown file'}:`, error);
+    return 0;
+  }
🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 29-32: src/core/metrics/workers/outputMetricsWorker.ts#L29-L32
Added lines #L29 - L32 were not covered by tests

🧹 Nitpick comments (18)
src/core/metrics/workers/fileMetricsWorker.ts (3)

14-25: Consider double-checked locking pattern for thread safety.

While the singleton pattern is appropriate for worker-level caching, the current implementation might have a race condition if multiple calls to getTokenCounter happen simultaneously.

Consider this thread-safe implementation:

 let tokenCounter: TokenCounter | null = null;
+let initializationInProgress = false;
 
 const getTokenCounter = (encoding: TiktokenEncoding): TokenCounter => {
   if (!tokenCounter) {
+    if (initializationInProgress) {
+      // Wait for initialization to complete
+      while (initializationInProgress) {
+        // Busy wait or use proper synchronization mechanism
+      }
+      return tokenCounter!;
+    }
+    initializationInProgress = true;
     tokenCounter = new TokenCounter(encoding);
+    initializationInProgress = false;
   }
   return tokenCounter;
 };

41-50: Add input validation for file content.

The function assumes file content is always present and valid.

Consider adding validation:

 export const calculateIndividualFileMetrics = async (
   file: ProcessedFile,
   encoding: TiktokenEncoding,
 ): Promise<FileMetrics> => {
+  if (!file.content) {
+    throw new Error(`No content available for file: ${file.path}`);
+  }
+
   const charCount = file.content.length;
   const tokenCounter = getTokenCounter(encoding);
   const tokenCount = tokenCounter.countTokens(file.content, file.path);
 
   return { path: file.path, charCount, tokenCount };
 };

1-58: Great implementation aligned with performance goals!

The worker implementation effectively supports the PR's performance optimization objectives by:

  • Using a singleton pattern for TokenCounter to avoid redundant initialization
  • Implementing proper resource cleanup
  • Providing clear metrics calculation logic

The significant performance improvements mentioned in the PR objectives (17x-58x speedup) are well-supported by this implementation.

Consider adding metrics for worker pool utilization and memory usage to help tune the number of workers for optimal performance.

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 31-31: src/core/metrics/workers/fileMetricsWorker.ts#L31
Added line #L31 was not covered by tests


[warning] 33-36: src/core/metrics/workers/fileMetricsWorker.ts#L33-L36
Added lines #L33 - L36 were not covered by tests


[warning] 38-39: src/core/metrics/workers/fileMetricsWorker.ts#L38-L39
Added lines #L38 - L39 were not covered by tests


[warning] 54-58: src/core/metrics/workers/fileMetricsWorker.ts#L54-L58
Added lines #L54 - L58 were not covered by tests

tests/integration-tests/packager.test.ts (1)

26-31: Consider making the mock more configurable.
mockCollectFileInitTaskRunner is straightforward, but allowing parameters or configurable behavior could facilitate advanced testing scenarios.

src/shared/processConcurrency.ts (2)

6-8: Consider reserving a CPU for system overhead.
Returning os.cpus().length might saturate the system if all cores are in use. For multi-task environments, subtracting one or two cores can prevent resource contention.


10-24: Handle extreme edge cases for worker thread limits.
While the logic is reasonable, if numOfTasks is very large or extremely small, you may need additional safeguards or custom scaling rules to prevent under- or over-utilization of threads.

src/core/security/validateFileSafety.ts (1)

18-23: Add robust error handling for security checks.
Currently, any failures inside runSecurityCheck could go unhandled. Consider try-catch blocks or fallback logic to handle partial or failed scanning gracefully.

tests/core/metrics/calculateAllFileMetrics.test.ts (2)

14-18: Consider expanding task runner test coverage.

The mock implementation is minimal and doesn't test error scenarios or worker pool behavior. Consider adding tests for:

  • Task runner initialization failures
  • Task execution errors
  • Concurrent task execution

28-30: Add more test cases for edge scenarios.

The test only covers a basic scenario. Consider adding test cases for:

  • Empty files
  • Very large files
  • Files with special characters
  • Error conditions

Also applies to: 33-34

src/core/file/workers/fileCollectWorker.ts (1)

8-10: Consider making MAX_FILE_SIZE configurable.

The file size limit is hardcoded. Consider making it configurable through the application's configuration system.

src/core/security/workers/securityCheckWorker.ts (2)

34-37: Enhance error logging for better debugging.

Consider including more context in the error log by extracting and logging specific error properties.

   } catch (error) {
-    logger.error(`Error checking security on ${filePath}:`, error);
+    logger.error(`Error checking security on ${filePath}:`, {
+      message: error.message,
+      name: error.name,
+      stack: error.stack,
+    });
     throw error;
   }

40-44: Add input validation for file content.

Consider validating the input parameters to ensure robustness.

 export const runSecretLint = async (
   filePath: string,
   content: string,
   config: SecretLintCoreConfig,
 ): Promise<SecretLintCoreResult> => {
+  if (!content || typeof content !== 'string') {
+    throw new Error('Invalid content provided for security check');
+  }
+  if (!filePath || typeof filePath !== 'string') {
+    throw new Error('Invalid file path provided for security check');
+  }
   const result = await lintSource({
tests/core/metrics/calculateMetrics.test.ts (1)

50-53: Improve mock implementation with explicit typing.

The mock implementation could benefit from explicit typing for better maintainability.

     const result = await calculateMetrics(processedFiles, output, progressCallback, config, {
-      calculateAllFileMetrics,
-      calculateOutputMetrics: () => Promise.resolve(30),
+      calculateAllFileMetrics: calculateAllFileMetrics as jest.Mock<Promise<typeof fileMetrics>>,
+      calculateOutputMetrics: async () => 30,
     });
src/core/security/securityCheck.ts (3)

16-28: Consider implementing worker pool cleanup.

The worker pool initialization looks good, but there's no explicit cleanup mechanism. While Piscina has an idleTimeout, it's good practice to properly terminate the worker pool after all tasks are completed.

Consider adding a cleanup function:

 const initTaskRunner = (numOfTasks: number) => {
   const { minThreads, maxThreads } = getWorkerThreadCount(numOfTasks);
   logger.trace(`Initializing worker pool with min=${minThreads}, max=${maxThreads} threads`);

   const pool = new Piscina({
     filename: new URL('./workers/securityCheckWorker.js', import.meta.url).href,
     minThreads,
     maxThreads,
     idleTimeout: 5000,
   });

-  return (task: SecurityCheckTask) => pool.run(task);
+  return {
+    runTask: (task: SecurityCheckTask) => pool.run(task),
+    cleanup: async () => {
+      await pool.destroy();
+    }
+  };
 };

53-61: Consider implementing batch processing for large repositories.

While Promise.all is efficient, processing all files at once might consume too much memory for very large repositories. Consider implementing batch processing to handle such cases more efficiently.

Here's a suggested implementation:

-    const results = await Promise.all(
-      tasks.map((task) =>
-        runTask(task).then((result) => {
-          completedTasks++;
-          progressCallback(`Running security check... (${completedTasks}/${totalTasks}) ${pc.dim(task.filePath)}`);
-          return result;
-        }),
-      ),
-    );
+    const BATCH_SIZE = 100;
+    const results = [];
+    for (let i = 0; i < tasks.length; i += BATCH_SIZE) {
+      const batch = tasks.slice(i, i + BATCH_SIZE);
+      const batchResults = await Promise.all(
+        batch.map((task) =>
+          runTask(task).then((result) => {
+            completedTasks++;
+            progressCallback(`Running security check... (${completedTasks}/${totalTasks}) ${pc.dim(task.filePath)}`);
+            return result;
+          }),
+        ),
+      );
+      results.push(...batchResults);
+    }

68-70: Enhance error reporting with more context.

The current error handling is basic. Consider providing more detailed error information to help with debugging.

   } catch (error) {
-    logger.error('Error during security check:', error);
+    const errorContext = {
+      totalFiles: tasks.length,
+      completedTasks,
+      lastProcessedFile: tasks[completedTasks - 1]?.filePath,
+      error: error instanceof Error ? {
+        name: error.name,
+        message: error.message,
+        stack: error.stack
+      } : error
+    };
+    logger.error('Error during security check:', errorContext);
     throw error;
   }
src/core/metrics/calculateAllFileMetrics.ts (1)

33-75: Consider extracting common worker pool patterns.

The implementation shares significant patterns with securityCheck.ts. Consider extracting these common patterns into a shared utility.

Create a new utility file src/shared/workerPool.ts:

import { Piscina } from 'piscina';
import { logger } from './logger.js';
import { getWorkerThreadCount } from './processConcurrency.js';

export interface WorkerPoolOptions {
  filename: string;
  idleTimeout?: number;
}

export const createWorkerPool = <T, R>(
  numOfTasks: number,
  options: WorkerPoolOptions
) => {
  const { minThreads, maxThreads } = getWorkerThreadCount(numOfTasks);
  logger.trace(`Initializing worker pool with min=${minThreads}, max=${maxThreads} threads`);

  const pool = new Piscina({
    filename: options.filename,
    minThreads,
    maxThreads,
    idleTimeout: options.idleTimeout ?? 5000,
  });

  return {
    runTask: (task: T) => pool.run(task) as Promise<R>,
    cleanup: async () => {
      await pool.destroy();
    }
  };
};
🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 73-75: src/core/metrics/calculateAllFileMetrics.ts#L73-L75
Added lines #L73 - L75 were not covered by tests

tests/core/file/fileProcess.test.ts (1)

11-15: Add error handling test cases.

The mock task runner should include tests for worker failures and error propagation.

Add test cases for:

  1. Worker initialization failures
  2. Task execution errors
  3. Cleanup failures
it('should handle worker failures gracefully', async () => {
  const mockError = new Error('Worker failed');
  const failingTaskRunner = () => {
    return async () => {
      throw mockError;
    };
  };

  const mockRawFiles = [{ path: 'file.js', content: 'content' }];
  const config = createMockConfig({});

  await expect(
    processFiles(mockRawFiles, config, () => {}, {
      initTaskRunner: failingTaskRunner,
    })
  ).rejects.toThrow(mockError);
});
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d8cca0b and 2be1e27.

⛔ Files ignored due to path filters (1)
  • package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (33)
  • package.json (1 hunks)
  • src/cli/actions/remoteAction.ts (2 hunks)
  • src/core/file/fileCollect.ts (1 hunks)
  • src/core/file/fileProcess.ts (1 hunks)
  • src/core/file/workers/fileCollectWorker.ts (1 hunks)
  • src/core/file/workers/fileProcessWorker.ts (1 hunks)
  • src/core/metrics/aggregateMetrics.ts (0 hunks)
  • src/core/metrics/calculateAllFileMetrics.ts (1 hunks)
  • src/core/metrics/calculateIndividualFileMetrics.ts (0 hunks)
  • src/core/metrics/calculateMetrics.ts (2 hunks)
  • src/core/metrics/calculateOutputMetrics.ts (1 hunks)
  • src/core/metrics/workers/fileMetricsWorker.ts (1 hunks)
  • src/core/metrics/workers/outputMetricsWorker.ts (1 hunks)
  • src/core/metrics/workers/types.ts (1 hunks)
  • src/core/packager.ts (1 hunks)
  • src/core/security/runSecurityCheckIfEnabled.ts (0 hunks)
  • src/core/security/securityCheck.ts (1 hunks)
  • src/core/security/validateFileSafety.ts (1 hunks)
  • src/core/security/workers/securityCheckWorker.ts (1 hunks)
  • src/core/tokenCount/tokenCount.ts (2 hunks)
  • src/shared/processConcurrency.ts (1 hunks)
  • tests/cli/actions/remoteAction.test.ts (2 hunks)
  • tests/core/file/fileCollect.test.ts (6 hunks)
  • tests/core/file/fileProcess.test.ts (9 hunks)
  • tests/core/metrics/aggregateMetrics.test.ts (0 hunks)
  • tests/core/metrics/calculateAllFileMetrics.test.ts (1 hunks)
  • tests/core/metrics/calculateIndividualFileMetrics.test.ts (0 hunks)
  • tests/core/metrics/calculateMetrics.test.ts (1 hunks)
  • tests/core/packager.test.ts (1 hunks)
  • tests/core/security/runSecurityCheckIfEnabled.test.ts (0 hunks)
  • tests/core/security/securityCheck.test.ts (1 hunks)
  • tests/core/security/validateFileSafety.test.ts (1 hunks)
  • tests/integration-tests/packager.test.ts (2 hunks)
💤 Files with no reviewable changes (6)
  • src/core/metrics/aggregateMetrics.ts
  • tests/core/metrics/aggregateMetrics.test.ts
  • src/core/security/runSecurityCheckIfEnabled.ts
  • src/core/metrics/calculateIndividualFileMetrics.ts
  • tests/core/metrics/calculateIndividualFileMetrics.test.ts
  • tests/core/security/runSecurityCheckIfEnabled.test.ts
🚧 Files skipped from review as they are similar to previous changes (9)
  • package.json
  • tests/core/security/securityCheck.test.ts
  • src/core/metrics/workers/types.ts
  • tests/cli/actions/remoteAction.test.ts
  • src/core/tokenCount/tokenCount.ts
  • src/cli/actions/remoteAction.ts
  • tests/core/security/validateFileSafety.test.ts
  • tests/core/packager.test.ts
  • src/core/file/workers/fileProcessWorker.ts
🧰 Additional context used
🪛 GitHub Check: codecov/patch
src/core/metrics/calculateOutputMetrics.ts

[warning] 9-10: src/core/metrics/calculateOutputMetrics.ts#L9-L10
Added lines #L9 - L10 were not covered by tests


[warning] 12-15: src/core/metrics/calculateOutputMetrics.ts#L12-L15
Added lines #L12 - L15 were not covered by tests


[warning] 17-18: src/core/metrics/calculateOutputMetrics.ts#L17-L18
Added lines #L17 - L18 were not covered by tests


[warning] 21-28: src/core/metrics/calculateOutputMetrics.ts#L21-L28
Added lines #L21 - L28 were not covered by tests


[warning] 30-32: src/core/metrics/calculateOutputMetrics.ts#L30-L32
Added lines #L30 - L32 were not covered by tests


[warning] 34-34: src/core/metrics/calculateOutputMetrics.ts#L34
Added line #L34 was not covered by tests


[warning] 36-38: src/core/metrics/calculateOutputMetrics.ts#L36-L38
Added lines #L36 - L38 were not covered by tests


[warning] 40-45: src/core/metrics/calculateOutputMetrics.ts#L40-L45
Added lines #L40 - L45 were not covered by tests

src/core/metrics/calculateAllFileMetrics.ts

[warning] 17-18: src/core/metrics/calculateAllFileMetrics.ts#L17-L18
Added lines #L17 - L18 were not covered by tests


[warning] 20-25: src/core/metrics/calculateAllFileMetrics.ts#L20-L25
Added lines #L20 - L25 were not covered by tests


[warning] 27-28: src/core/metrics/calculateAllFileMetrics.ts#L27-L28
Added lines #L27 - L28 were not covered by tests


[warning] 73-75: src/core/metrics/calculateAllFileMetrics.ts#L73-L75
Added lines #L73 - L75 were not covered by tests

src/core/metrics/workers/outputMetricsWorker.ts

[warning] 1-1: src/core/metrics/workers/outputMetricsWorker.ts#L1
Added line #L1 was not covered by tests


[warning] 4-5: src/core/metrics/workers/outputMetricsWorker.ts#L4-L5
Added lines #L4 - L5 were not covered by tests


[warning] 14-14: src/core/metrics/workers/outputMetricsWorker.ts#L14
Added line #L14 was not covered by tests


[warning] 19-24: src/core/metrics/workers/outputMetricsWorker.ts#L19-L24
Added lines #L19 - L24 were not covered by tests


[warning] 29-32: src/core/metrics/workers/outputMetricsWorker.ts#L29-L32
Added lines #L29 - L32 were not covered by tests

src/core/metrics/workers/fileMetricsWorker.ts

[warning] 31-31: src/core/metrics/workers/fileMetricsWorker.ts#L31
Added line #L31 was not covered by tests


[warning] 33-36: src/core/metrics/workers/fileMetricsWorker.ts#L33-L36
Added lines #L33 - L36 were not covered by tests


[warning] 38-39: src/core/metrics/workers/fileMetricsWorker.ts#L38-L39
Added lines #L38 - L39 were not covered by tests


[warning] 54-58: src/core/metrics/workers/fileMetricsWorker.ts#L54-L58
Added lines #L54 - L58 were not covered by tests

src/core/file/workers/fileCollectWorker.ts

[warning] 56-57: src/core/file/workers/fileCollectWorker.ts#L56-L57
Added lines #L56 - L57 were not covered by tests

src/core/file/fileProcess.ts

[warning] 11-12: src/core/file/fileProcess.ts#L11-L12
Added lines #L11 - L12 were not covered by tests


[warning] 14-19: src/core/file/fileProcess.ts#L14-L19
Added lines #L14 - L19 were not covered by tests


[warning] 21-22: src/core/file/fileProcess.ts#L21-L22
Added lines #L21 - L22 were not covered by tests


[warning] 64-65: src/core/file/fileProcess.ts#L64-L65
Added lines #L64 - L65 were not covered by tests

src/core/file/fileCollect.ts

[warning] 10-11: src/core/file/fileCollect.ts#L10-L11
Added lines #L10 - L11 were not covered by tests


[warning] 13-18: src/core/file/fileCollect.ts#L13-L18
Added lines #L13 - L18 were not covered by tests


[warning] 20-20: src/core/file/fileCollect.ts#L20
Added line #L20 was not covered by tests


[warning] 64-65: src/core/file/fileCollect.ts#L64-L65
Added lines #L64 - L65 were not covered by tests

⏰ Context from checks skipped due to timeout of 90000ms (17)
  • GitHub Check: Build and run (macos-latest, 18.x)
  • GitHub Check: Build and run (macos-latest, 18.0.0)
  • GitHub Check: Test (macos-latest, 22.x)
  • GitHub Check: Test (windows-latest, 23.x)
  • GitHub Check: Test (windows-latest, 22.x)
  • GitHub Check: Build and run (windows-latest, 18.0.0)
  • GitHub Check: Test (windows-latest, 21.x)
  • GitHub Check: Test (windows-latest, 20.x)
  • GitHub Check: Test (windows-latest, 19.x)
  • GitHub Check: Test (windows-latest, 18.x)
  • GitHub Check: Build and run (ubuntu-latest, 20.x)
  • GitHub Check: Test (windows-latest, 18.0.0)
  • GitHub Check: Build and run (ubuntu-latest, 19.x)
  • GitHub Check: Build and run (ubuntu-latest, 18.x)
  • GitHub Check: Build and run (ubuntu-latest, 18.0.0)
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: Cloudflare Pages
🔇 Additional comments (18)
src/core/metrics/workers/fileMetricsWorker.ts (3)

1-12: LGTM! Clean imports and well-defined interface.

The imports are properly typed and the FileMetricsTask interface clearly defines the contract for worker tasks.


31-39: Add test coverage for the worker function.

The worker function lacks test coverage according to static analysis.

Could you add tests for:

  • Successful metric calculation
  • Performance measurement logging
  • Error handling scenarios

Would you like me to help generate the test cases?

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 31-31: src/core/metrics/workers/fileMetricsWorker.ts#L31
Added line #L31 was not covered by tests


[warning] 33-36: src/core/metrics/workers/fileMetricsWorker.ts#L33-L36
Added lines #L33 - L36 were not covered by tests


[warning] 38-39: src/core/metrics/workers/fileMetricsWorker.ts#L38-L39
Added lines #L38 - L39 were not covered by tests


52-58: Add test coverage for cleanup handler.

The cleanup handler lacks test coverage according to static analysis. This is critical for ensuring proper resource management.

Could you add tests to verify:

  • TokenCounter is properly freed
  • TokenCounter reference is nullified
  • Multiple cleanup calls are handled safely

Would you like me to help generate the test cases?

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 54-58: src/core/metrics/workers/fileMetricsWorker.ts#L54-L58
Added lines #L54 - L58 were not covered by tests

tests/integration-tests/packager.test.ts (5)

8-19: Imports align with the new concurrency modules.
These added imports appear consistent and support the PR's objective of modularizing file collection and processing.


70-76: Good approach for injective testing.
Overriding the collectFiles function with a custom init runner is a clean pattern for integration tests, allowing you to isolate each stage.


77-83: Parallelize file processing to fully realize performance gains.
This sequential loop may reintroduce a bottleneck in large repositories. Using a worker pool for parallel processing aligns better with your PR’s performance objectives.


84-90: Ensure comprehensive security test coverage.
Here, validateFileSafety is partially mocked by returning an empty array. Verify real-world security checks in a separate test to confirm detection logic.


91-114: Metric calculation logic is concise and performant.
Aggregating file statistics in one pass is a clean approach. Consider potential edge cases (e.g., empty or binary files) and verify that the aggregator handles them gracefully.

src/core/security/validateFileSafety.ts (1)

6-6: Transition from runSecurityCheckIfEnabled is consistent.
Replacing the conditional wrapper with a direct runSecurityCheck import simplifies the code. Ensure external references are likewise updated.

Also applies to: 14-14

src/core/metrics/workers/outputMetricsWorker.ts (1)

44-50: LGTM: Proper cleanup implementation.

The cleanup on worker termination is well implemented, ensuring proper resource management.

src/core/metrics/calculateMetrics.ts (1)

27-30: LGTM: Excellent performance optimization.

The concurrent execution of file metrics and output metrics calculations using Promise.all is a great performance improvement.

src/core/file/workers/fileCollectWorker.ts (1)

31-68: LGTM: Robust file reading implementation.

The implementation includes:

  • Proper error handling
  • Binary file detection
  • Encoding detection
  • Detailed logging
  • Clear warning messages for large files
🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 56-57: src/core/file/workers/fileCollectWorker.ts#L56-L57
Added lines #L56 - L57 were not covered by tests

src/core/security/workers/securityCheckWorker.ts (1)

1-9: LGTM! Clean interface definition and appropriate imports.

src/core/file/fileProcess.ts (1)

63-65: Consider partial results for recoverable errors.

Your error logging and subsequent re-throw provide visibility into failures, but consider whether partial results should be returned or saved in the event of recoverable errors. This would allow the system to skip problematic files rather than fail the entire batch.

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 64-65: src/core/file/fileProcess.ts#L64-L65
Added lines #L64 - L65 were not covered by tests

src/core/packager.ts (1)

41-41: LGTM! Clean implementation of progress tracking.

The progress callback integration is well-structured and consistent across all operations.

src/core/file/fileCollect.ts (1)

62-65: Add test coverage for error scenarios.

The error handling block needs test coverage. Add test cases for:

  • File read failures
  • Worker pool initialization failures
  • Edge cases with empty file lists
🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 64-65: src/core/file/fileCollect.ts#L64-L65
Added lines #L64 - L65 were not covered by tests

src/core/metrics/calculateAllFileMetrics.ts (1)

16-28: Add test coverage for worker pool initialization.

The static analysis indicates that the worker pool initialization code is not covered by tests.

Consider adding the following test cases:

  1. Verify correct thread count calculation
  2. Verify worker pool creation with correct parameters
  3. Test error handling during initialization
🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 17-18: src/core/metrics/calculateAllFileMetrics.ts#L17-L18
Added lines #L17 - L18 were not covered by tests


[warning] 20-25: src/core/metrics/calculateAllFileMetrics.ts#L20-L25
Added lines #L20 - L25 were not covered by tests


[warning] 27-28: src/core/metrics/calculateAllFileMetrics.ts#L27-L28
Added lines #L27 - L28 were not covered by tests

tests/core/file/fileCollect.test.ts (1)

20-24: Enhance test coverage for concurrent task processing.

The mock task runner processes tasks sequentially. Consider adding tests for concurrent processing scenarios.

 const mockInitTaskRunner = () => {
+  const processingTasks = new Set();
   return async (task: FileCollectTask) => {
+    if (processingTasks.size > 0) {
+      logger.trace(`Processing ${processingTasks.size} concurrent tasks`);
+    }
+    processingTasks.add(task);
+    try {
       return await fileCollectWorker(task);
+    } finally {
+      processingTasks.delete(task);
+    }
   };
 };

src/core/metrics/workers/fileMetricsWorker.ts Outdated Show resolved Hide resolved
src/core/metrics/calculateOutputMetrics.ts Show resolved Hide resolved
src/core/file/fileProcess.ts Outdated Show resolved Hide resolved
src/core/file/fileCollect.ts Outdated Show resolved Hide resolved
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
.github/workflows/ci.yml (1)

Line range hint 1-140: Consider workflow optimization for performance improvements

Since this PR focuses on performance optimization, consider these workflow improvements:

  1. Use dependency caching more effectively
  2. Implement job dependencies to avoid redundant work
  3. Consider using matrix includes/excludes for specific version combinations

Would you like me to provide a detailed optimization plan for the workflow?

🧰 Tools
🪛 actionlint (1.7.4)

13-13: label "ubuntu-latest-arm" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2be1e27 and a3c1eef.

📒 Files selected for processing (1)
  • .github/workflows/ci.yml (9 hunks)
🧰 Additional context used
🪛 actionlint (1.7.4)
.github/workflows/ci.yml

13-13: label "ubuntu-latest-arm" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)


26-26: label "ubuntu-latest-arm" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)


39-39: label "ubuntu-latest-arm" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)


52-52: label "ubuntu-latest-arm" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)


61-61: label "ubuntu-latest-arm" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)


72-72: label "ubuntu-latest-arm" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)


81-81: label "ubuntu-latest-arm" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)


97-97: label "ubuntu-latest-arm" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)


123-123: label "ubuntu-latest-arm" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)

⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: Build and run (windows-latest, 22.x)
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: Cloudflare Pages
🔇 Additional comments (1)
.github/workflows/ci.yml (1)

81-82: Verify Node.js ARM compatibility

The test matrix includes Node.js versions up to 23.x. Please ensure all these versions are available and fully compatible with ARM architecture.

✅ Verification successful

Node.js ARM compatibility verified successfully

All Node.js versions specified in the test matrix (18.0.0 through 23.x) provide official ARM64 builds and are fully compatible with the ARM architecture.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check Node.js ARM compatibility for specified versions

# Fetch Node.js release data
curl -s https://nodejs.org/dist/index.json | jq -r '.[] | select(.version | startswith("v18") or startswith("v19") or startswith("v20") or startswith("v21") or startswith("v22") or startswith("v23")) | {version: .version, arm64: (.files | any(contains("arm64")))} | select(.arm64 == false) | .version'

Length of output: 303


Script:

#!/bin/bash
# Fetch Node.js release data and check ARM64 availability
echo "Checking Node.js 18.0.0 specifically:"
curl -s https://nodejs.org/dist/v18.0.0/SHASUMS256.txt | grep -i 'arm64\|aarch64' || echo "No ARM64 builds found for 18.0.0"

echo -e "\nChecking latest versions:"
for version in 18 19 20 21 22 23; do
  latest=$(curl -s https://nodejs.org/dist/index.json | jq -r ".[] | select(.version | startswith(\"v$version\")) | .version" | head -n1)
  if [ ! -z "$latest" ]; then
    echo "Version $latest:"
    curl -s "https://nodejs.org/dist/$latest/SHASUMS256.txt" | grep -i 'arm64\|aarch64' || echo "No ARM64 builds found"
  fi
done

Length of output: 8093

🧰 Tools
🪛 actionlint (1.7.4)

81-81: label "ubuntu-latest-arm" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)

.github/workflows/ci.yml Outdated Show resolved Hide resolved
.github/workflows/ci.yml Outdated Show resolved Hide resolved
@yamadashy yamadashy force-pushed the feat/multithread branch 2 times, most recently from 69d8798 to 37e2a9d Compare January 25, 2025 03:41
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

♻️ Duplicate comments (3)
tests/integration-tests/packager.test.ts (1)

77-83: 🛠️ Refactor suggestion

Implement parallel file processing for better performance.

The current implementation processes files sequentially, which could be a performance bottleneck for large repositories. This contradicts the PR's performance optimization goals.

Consider using Promise.all for parallel processing:

 processFiles: async (rawFiles, config, progressCallback) => {
-  const processedFiles: ProcessedFile[] = [];
-  for (const rawFile of rawFiles) {
-    processedFiles.push(await fileProcessWorker({ rawFile, config }));
-  }
-  return processedFiles;
+  return Promise.all(
+    rawFiles.map(rawFile => fileProcessWorker({ rawFile, config }))
+  );
 },
src/core/file/fileProcess.ts (1)

10-22: ⚠️ Potential issue

Add worker pool cleanup mechanism.

The worker pool should be properly cleaned up to prevent resource leaks.

Apply this diff:

 const initTaskRunner = (numOfTasks: number) => {
   const { minThreads, maxThreads } = getWorkerThreadCount(numOfTasks);
   logger.trace(`Initializing worker pool with min=${minThreads}, max=${maxThreads} threads`);

   const pool = new Piscina({
     filename: new URL('./workers/fileProcessWorker.js', import.meta.url).href,
     minThreads,
     maxThreads,
     idleTimeout: 5000,
   });

-  return (task: FileProcessTask) => pool.run(task);
+  return {
+    run: (task: FileProcessTask) => pool.run(task),
+    cleanup: async () => {
+      await pool.destroy();
+      logger.trace('Worker pool destroyed');
+    }
+  };
 };
🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 11-12: src/core/file/fileProcess.ts#L11-L12
Added lines #L11 - L12 were not covered by tests


[warning] 14-19: src/core/file/fileProcess.ts#L14-L19
Added lines #L14 - L19 were not covered by tests


[warning] 21-22: src/core/file/fileProcess.ts#L21-L22
Added lines #L21 - L22 were not covered by tests

src/core/file/fileCollect.ts (1)

62-65: 🛠️ Refactor suggestion

Align error handling with fileProcess.ts.

For consistency, implement the same error handling pattern as suggested for fileProcess.ts.

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 64-65: src/core/file/fileCollect.ts#L64-L65
Added lines #L64 - L65 were not covered by tests

🧹 Nitpick comments (16)
src/core/metrics/workers/fileMetricsWorker.ts (3)

14-22: Consider implementing a thread-safe singleton pattern.

While the singleton pattern is appropriate here for resource management, the current implementation might be susceptible to race conditions in a multi-threaded environment. Although Piscina workers run in separate threads, it's a good practice to implement thread-safe patterns.

Consider using a more robust singleton implementation:

-// Worker-level singleton for TokenCounter
-let tokenCounter: TokenCounter | null = null;
-
-const getTokenCounter = (encoding: TiktokenEncoding): TokenCounter => {
-  if (!tokenCounter) {
-    tokenCounter = new TokenCounter(encoding);
-  }
-  return tokenCounter;
-};
+// Worker-level singleton for TokenCounter
+let tokenCounter: TokenCounter | null = null;
+let tokenCounterEncoding: TiktokenEncoding | null = null;
+
+const getTokenCounter = (encoding: TiktokenEncoding): TokenCounter => {
+  if (!tokenCounter || tokenCounterEncoding !== encoding) {
+    if (tokenCounter) {
+      tokenCounter.free();
+    }
+    tokenCounter = new TokenCounter(encoding);
+    tokenCounterEncoding = encoding;
+  }
+  return tokenCounter;
+};

This implementation also handles cases where the encoding type changes between calls.


35-44: Add input validation for file content.

Consider adding validation for the file content to handle edge cases gracefully.

Consider adding these validations:

 export const calculateIndividualFileMetrics = async (
   file: ProcessedFile,
   encoding: TiktokenEncoding,
 ): Promise<FileMetrics> => {
+  if (!file.content) {
+    return { path: file.path, charCount: 0, tokenCount: 0 };
+  }
+
   const charCount = file.content.length;
   const tokenCounter = getTokenCounter(encoding);
   const tokenCount = tokenCounter.countTokens(file.content, file.path);

   return { path: file.path, charCount, tokenCount };
 };

46-52: Enhance worker cleanup handling.

While the current cleanup is good, consider these improvements:

  1. Add test coverage for the cleanup handler
  2. Handle additional process events for more robust cleanup

Consider enhancing the cleanup:

 // Cleanup when worker is terminated
-process.on('exit', () => {
+const cleanup = () => {
   if (tokenCounter) {
     tokenCounter.free();
     tokenCounter = null;
   }
-});
+};
+
+// Handle various termination scenarios
+process.on('exit', cleanup);
+process.on('SIGINT', cleanup);
+process.on('SIGTERM', cleanup);
+process.on('uncaughtException', (error) => {
+  logger.error('Uncaught exception:', error);
+  cleanup();
+  process.exit(1);
+});
🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 48-52: src/core/metrics/workers/fileMetricsWorker.ts#L48-L52
Added lines #L48 - L52 were not covered by tests

tests/integration-tests/packager.test.ts (2)

26-30: Add JSDoc documentation for the mock function.

Consider adding JSDoc documentation to explain the purpose and usage of this mock function.

+/**
+ * Creates a mock task runner for file collection testing.
+ * @returns An async function that executes fileCollectWorker with the given task
+ */
const mockCollectFileInitTaskRunner = () => {
  return async (task: FileCollectTask) => {
    return await fileCollectWorker(task);
  };
};

93-113: Simplify metrics calculation using array methods.

The metrics calculation can be simplified using array methods for better readability and maintainability.

 calculateMetrics: async (processedFiles, output, progressCallback, config) => {
   return {
     totalFiles: processedFiles.length,
     totalCharacters: processedFiles.reduce((acc, file) => acc + file.content.length, 0),
     totalTokens: processedFiles.reduce((acc, file) => acc + file.content.split(/\s+/).length, 0),
-    fileCharCounts: processedFiles.reduce(
-      (acc, file) => {
-        acc[file.path] = file.content.length;
-        return acc;
-      },
-      {} as Record<string, number>,
-    ),
-    fileTokenCounts: processedFiles.reduce(
-      (acc, file) => {
-        acc[file.path] = file.content.split(/\s+/).length;
-        return acc;
-      },
-      {} as Record<string, number>,
-    ),
+    fileCharCounts: Object.fromEntries(
+      processedFiles.map(file => [file.path, file.content.length])
+    ),
+    fileTokenCounts: Object.fromEntries(
+      processedFiles.map(file => [file.path, file.content.split(/\s+/).length])
+    ),
   };
 },
src/core/security/workers/securityCheckWorker.ts (2)

31-34: Enhance error logging for better debugging.

Consider adding more context to the error log by including the error type and stack trace.

   } catch (error) {
-    logger.error(`Error checking security on ${filePath}:`, error);
+    logger.error(
+      `Error checking security on ${filePath}:`,
+      error instanceof Error ? { type: error.constructor.name, message: error.message, stack: error.stack } : error
+    );
     throw error;
   }

62-69: Add configuration logging for better transparency.

Consider logging the SecretLint configuration for better debugging and transparency.

 export const createSecretLintConfig = (): SecretLintCoreConfig => ({
   rules: [
     {
       id: '@secretlint/secretlint-rule-preset-recommend',
       rule: creator,
     },
   ],
 });
+logger.trace('SecretLint configuration:', createSecretLintConfig());
src/core/file/workers/fileCollectWorker.ts (3)

8-10: Consider making MAX_FILE_SIZE configurable.

While 50MB is a reasonable default, consider making this limit configurable through the application's configuration to accommodate different use cases.

-// Maximum file size to process (50MB)
-// This prevents out-of-memory errors when processing very large files
-export const MAX_FILE_SIZE = 50 * 1024 * 1024;
+// Maximum file size to process (defaults to 50MB)
+// This prevents out-of-memory errors when processing very large files
+export const MAX_FILE_SIZE = Number(process.env.REPOMIX_MAX_FILE_SIZE) || 50 * 1024 * 1024;

37-42: Enhance the large file warning message.

The warning message could be more actionable by providing the exact command to add the file to .repomixignore.

       logger.log('');
       logger.log('⚠️ Large File Warning:');
       logger.log('──────────────────────');
       logger.log(`File exceeds size limit: ${sizeMB}MB > ${MAX_FILE_SIZE / 1024 / 1024}MB (${filePath})`);
-      logger.note('Add this file to .repomixignore if you want to exclude it permanently');
+      logger.note(`Run 'echo "${filePath}" >> .repomixignore' to exclude this file permanently`);
       logger.log('');

60-62: Log encoding detection fallback.

Add logging when falling back to utf-8 encoding to help diagnose potential encoding issues.

-    const encoding = jschardet.detect(buffer).encoding || 'utf-8';
+    const detected = jschardet.detect(buffer);
+    const encoding = detected.encoding || 'utf-8';
+    if (!detected.encoding) {
+      logger.debug(`No encoding detected for ${filePath}, falling back to utf-8`);
+    }
     const content = iconv.decode(buffer, encoding);
src/core/security/securityCheck.ts (2)

16-28: Consider increasing the idle timeout for large repositories.

The current idleTimeout of 5000ms might be too short for processing large repositories, potentially leading to premature worker termination between batches. This could cause unnecessary overhead from worker pool recreation.

  const pool = new Piscina({
    filename: new URL('./workers/securityCheckWorker.js', import.meta.url).href,
    minThreads,
    maxThreads,
-   idleTimeout: 5000,
+   idleTimeout: 30000, // 30 seconds for better handling of large repositories
  });

53-61: Consider implementing batch processing for large repositories.

While the current implementation efficiently processes files in parallel, for very large repositories, processing all files simultaneously might consume excessive memory. Consider implementing batch processing to better control resource usage.

-    const results = await Promise.all(
-      tasks.map((task) =>
-        runTask(task).then((result) => {
-          completedTasks++;
-          progressCallback(`Running security check... (${completedTasks}/${totalTasks}) ${pc.dim(task.filePath)}`);
-          return result;
-        }),
-      ),
-    );
+    const batchSize = 100; // Adjust based on available system resources
+    const results = [];
+    for (let i = 0; i < tasks.length; i += batchSize) {
+      const batch = tasks.slice(i, i + batchSize);
+      const batchResults = await Promise.all(
+        batch.map((task) =>
+          runTask(task).then((result) => {
+            completedTasks++;
+            progressCallback(`Running security check... (${completedTasks}/${totalTasks}) ${pc.dim(task.filePath)}`);
+            return result;
+          }),
+        ),
+      );
+      results.push(...batchResults);
+    }
tests/core/file/fileCollect.test.ts (1)

20-24: Enhance test coverage for concurrent file collection.

The current mockInitTaskRunner processes tasks sequentially. Consider adding tests that verify concurrent file collection behavior.

 const mockInitTaskRunner = () => {
+  const activeTasks = new Set();
   return async (task: FileCollectTask) => {
+    activeTasks.add(task);
+    if (activeTasks.size > 1) {
+      logger.debug(`Processing ${activeTasks.size} concurrent tasks`);
+    }
     const result = await fileCollectWorker(task);
+    activeTasks.delete(task);
     return result;
   };
 };
src/core/file/fileProcess.ts (2)

41-65: Enhance error handling with specific error types.

The current error handling is too generic. Consider:

  1. Creating specific error types for different failure scenarios
  2. Adding retry logic for transient failures
  3. Including more context in error messages

Example implementation:

+ class FileProcessingError extends Error {
+   constructor(message: string, public readonly cause?: unknown) {
+     super(message);
+     this.name = 'FileProcessingError';
+   }
+ }

  try {
    // ... existing code ...
  } catch (error) {
-   logger.error('Error during file processing:', error);
-   throw error;
+   logger.error('Error during file processing:', {
+     error,
+     taskCount: tasks.length,
+     completedTasks
+   });
+   throw new FileProcessingError('File processing failed', error);
  }
🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 64-65: src/core/file/fileProcess.ts#L64-L65
Added lines #L64 - L65 were not covered by tests


32-56: Consider implementing batch processing for better performance.

Instead of processing all files at once, consider implementing batch processing to:

  1. Reduce memory usage
  2. Provide more granular progress updates
  3. Allow for better error recovery

Would you like me to provide an example implementation of batch processing?

src/core/file/fileCollect.ts (1)

47-56: Improve progress tracking with estimated time remaining.

The current progress updates could be enhanced with time estimates.

Example implementation:

const startTime = process.hrtime.bigint();
let completedTasks = 0;
const totalTasks = tasks.length;

const updateProgress = (task: FileCollectTask) => {
  completedTasks++;
  const elapsed = Number(process.hrtime.bigint() - startTime) / 1e9;
  const rate = completedTasks / elapsed;
  const remaining = (totalTasks - completedTasks) / rate;
  
  progressCallback(
    `Collect file... (${completedTasks}/${totalTasks}, ~${remaining.toFixed(1)}s remaining) ${pc.dim(task.filePath)}`
  );
  logger.trace(`Collect files... (${completedTasks}/${totalTasks}) ${task.filePath}`);
};

const results = await Promise.all(
  tasks.map((task) =>
    runTask(task).then((result) => {
      updateProgress(task);
      return result;
    }),
  ),
);
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a3c1eef and ee46a4f.

📒 Files selected for processing (30)
  • src/cli/actions/remoteAction.ts (2 hunks)
  • src/core/file/fileCollect.ts (1 hunks)
  • src/core/file/fileProcess.ts (1 hunks)
  • src/core/file/workers/fileCollectWorker.ts (1 hunks)
  • src/core/file/workers/fileProcessWorker.ts (1 hunks)
  • src/core/metrics/aggregateMetrics.ts (0 hunks)
  • src/core/metrics/calculateAllFileMetrics.ts (1 hunks)
  • src/core/metrics/calculateIndividualFileMetrics.ts (0 hunks)
  • src/core/metrics/calculateMetrics.ts (2 hunks)
  • src/core/metrics/calculateOutputMetrics.ts (1 hunks)
  • src/core/metrics/workers/fileMetricsWorker.ts (1 hunks)
  • src/core/metrics/workers/outputMetricsWorker.ts (1 hunks)
  • src/core/metrics/workers/types.ts (1 hunks)
  • src/core/packager.ts (1 hunks)
  • src/core/security/runSecurityCheckIfEnabled.ts (0 hunks)
  • src/core/security/securityCheck.ts (1 hunks)
  • src/core/security/validateFileSafety.ts (1 hunks)
  • src/core/security/workers/securityCheckWorker.ts (1 hunks)
  • src/shared/processConcurrency.ts (1 hunks)
  • tests/cli/actions/remoteAction.test.ts (2 hunks)
  • tests/core/file/fileCollect.test.ts (6 hunks)
  • tests/core/file/fileProcess.test.ts (9 hunks)
  • tests/core/metrics/aggregateMetrics.test.ts (0 hunks)
  • tests/core/metrics/calculateAllFileMetrics.test.ts (1 hunks)
  • tests/core/metrics/calculateIndividualFileMetrics.test.ts (0 hunks)
  • tests/core/metrics/calculateMetrics.test.ts (1 hunks)
  • tests/core/packager.test.ts (1 hunks)
  • tests/core/security/runSecurityCheckIfEnabled.test.ts (0 hunks)
  • tests/core/security/validateFileSafety.test.ts (1 hunks)
  • tests/integration-tests/packager.test.ts (2 hunks)
💤 Files with no reviewable changes (6)
  • tests/core/metrics/aggregateMetrics.test.ts
  • src/core/metrics/aggregateMetrics.ts
  • tests/core/metrics/calculateIndividualFileMetrics.test.ts
  • src/core/security/runSecurityCheckIfEnabled.ts
  • src/core/metrics/calculateIndividualFileMetrics.ts
  • tests/core/security/runSecurityCheckIfEnabled.test.ts
🚧 Files skipped from review as they are similar to previous changes (8)
  • src/core/metrics/workers/types.ts
  • src/cli/actions/remoteAction.ts
  • tests/cli/actions/remoteAction.test.ts
  • src/core/file/workers/fileProcessWorker.ts
  • src/core/security/validateFileSafety.ts
  • tests/core/security/validateFileSafety.test.ts
  • tests/core/packager.test.ts
  • src/core/packager.ts
🧰 Additional context used
🪛 GitHub Check: codecov/patch
src/core/metrics/workers/fileMetricsWorker.ts

[warning] 25-30: src/core/metrics/workers/fileMetricsWorker.ts#L25-L30
Added lines #L25 - L30 were not covered by tests


[warning] 32-33: src/core/metrics/workers/fileMetricsWorker.ts#L32-L33
Added lines #L32 - L33 were not covered by tests


[warning] 48-52: src/core/metrics/workers/fileMetricsWorker.ts#L48-L52
Added lines #L48 - L52 were not covered by tests

src/core/metrics/calculateAllFileMetrics.ts

[warning] 12-13: src/core/metrics/calculateAllFileMetrics.ts#L12-L13
Added lines #L12 - L13 were not covered by tests


[warning] 15-20: src/core/metrics/calculateAllFileMetrics.ts#L15-L20
Added lines #L15 - L20 were not covered by tests


[warning] 22-23: src/core/metrics/calculateAllFileMetrics.ts#L22-L23
Added lines #L22 - L23 were not covered by tests


[warning] 65-67: src/core/metrics/calculateAllFileMetrics.ts#L65-L67
Added lines #L65 - L67 were not covered by tests

src/core/file/fileCollect.ts

[warning] 10-11: src/core/file/fileCollect.ts#L10-L11
Added lines #L10 - L11 were not covered by tests


[warning] 13-18: src/core/file/fileCollect.ts#L13-L18
Added lines #L13 - L18 were not covered by tests


[warning] 20-20: src/core/file/fileCollect.ts#L20
Added line #L20 was not covered by tests


[warning] 64-65: src/core/file/fileCollect.ts#L64-L65
Added lines #L64 - L65 were not covered by tests

src/core/file/fileProcess.ts

[warning] 11-12: src/core/file/fileProcess.ts#L11-L12
Added lines #L11 - L12 were not covered by tests


[warning] 14-19: src/core/file/fileProcess.ts#L14-L19
Added lines #L14 - L19 were not covered by tests


[warning] 21-22: src/core/file/fileProcess.ts#L21-L22
Added lines #L21 - L22 were not covered by tests


[warning] 64-65: src/core/file/fileProcess.ts#L64-L65
Added lines #L64 - L65 were not covered by tests

src/core/metrics/workers/outputMetricsWorker.ts

[warning] 1-3: src/core/metrics/workers/outputMetricsWorker.ts#L1-L3
Added lines #L1 - L3 were not covered by tests


[warning] 12-12: src/core/metrics/workers/outputMetricsWorker.ts#L12
Added line #L12 was not covered by tests


[warning] 14-19: src/core/metrics/workers/outputMetricsWorker.ts#L14-L19
Added lines #L14 - L19 were not covered by tests


[warning] 21-24: src/core/metrics/workers/outputMetricsWorker.ts#L21-L24
Added lines #L21 - L24 were not covered by tests


[warning] 26-31: src/core/metrics/workers/outputMetricsWorker.ts#L26-L31
Added lines #L26 - L31 were not covered by tests


[warning] 33-34: src/core/metrics/workers/outputMetricsWorker.ts#L33-L34
Added lines #L33 - L34 were not covered by tests

src/core/file/workers/fileCollectWorker.ts

[warning] 56-57: src/core/file/workers/fileCollectWorker.ts#L56-L57
Added lines #L56 - L57 were not covered by tests

src/core/metrics/calculateOutputMetrics.ts

[warning] 9-10: src/core/metrics/calculateOutputMetrics.ts#L9-L10
Added lines #L9 - L10 were not covered by tests


[warning] 12-15: src/core/metrics/calculateOutputMetrics.ts#L12-L15
Added lines #L12 - L15 were not covered by tests


[warning] 17-18: src/core/metrics/calculateOutputMetrics.ts#L17-L18
Added lines #L17 - L18 were not covered by tests


[warning] 21-28: src/core/metrics/calculateOutputMetrics.ts#L21-L28
Added lines #L21 - L28 were not covered by tests


[warning] 30-32: src/core/metrics/calculateOutputMetrics.ts#L30-L32
Added lines #L30 - L32 were not covered by tests


[warning] 34-34: src/core/metrics/calculateOutputMetrics.ts#L34
Added line #L34 was not covered by tests


[warning] 36-38: src/core/metrics/calculateOutputMetrics.ts#L36-L38
Added lines #L36 - L38 were not covered by tests


[warning] 40-45: src/core/metrics/calculateOutputMetrics.ts#L40-L45
Added lines #L40 - L45 were not covered by tests

⏰ Context from checks skipped due to timeout of 90000ms (8)
  • GitHub Check: Build and run (windows-latest, 23.x)
  • GitHub Check: Build and run (windows-latest, 22.x)
  • GitHub Check: Build and run (windows-latest, 21.x)
  • GitHub Check: Build and run (windows-latest, 20.x)
  • GitHub Check: Build and run (windows-latest, 19.x)
  • GitHub Check: Build and run (windows-latest, 18.x)
  • GitHub Check: Build and run (windows-latest, 18.0.0)
  • GitHub Check: Cloudflare Pages
🔇 Additional comments (18)
src/core/metrics/workers/fileMetricsWorker.ts (3)

1-12: LGTM! Well-structured imports and interface definition.

The imports are properly typed and the FileMetricsTask interface is well-defined with clear properties.


24-33: Add test coverage for the main worker function.

The static analysis indicates that this section lacks test coverage. Consider adding tests for:

  • Successful metric calculation
  • Performance timing measurement
  • Error scenarios

Would you like me to help generate test cases for this function?

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 25-30: src/core/metrics/workers/fileMetricsWorker.ts#L25-L30
Added lines #L25 - L30 were not covered by tests


[warning] 32-33: src/core/metrics/workers/fileMetricsWorker.ts#L32-L33
Added lines #L32 - L33 were not covered by tests


1-52: Great implementation of the worker thread functionality! 🚀

The implementation aligns perfectly with the PR's performance optimization goals, as evidenced by the impressive improvements in processing times across various repositories. The code is well-structured, properly typed, and includes essential features like resource management and performance measurements.

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 25-30: src/core/metrics/workers/fileMetricsWorker.ts#L25-L30
Added lines #L25 - L30 were not covered by tests


[warning] 32-33: src/core/metrics/workers/fileMetricsWorker.ts#L32-L33
Added lines #L32 - L33 were not covered by tests


[warning] 48-52: src/core/metrics/workers/fileMetricsWorker.ts#L48-L52
Added lines #L48 - L52 were not covered by tests

tests/integration-tests/packager.test.ts (2)

8-19: LGTM! Well-organized imports.

The new imports align well with the PR's objective of implementing worker-based processing for performance optimization.


85-90: Verify mock security validation behavior.

The current mock implementation of validateFileSafety returns an empty array without performing any validation. While this might be intentional for testing, it's important to ensure this doesn't mask potential security issues.

Consider adding test cases that verify the security validation behavior:

  1. Add test cases with known unsafe files
  2. Verify that the actual implementation properly filters these files
  3. Document why the mock returns an empty array in this test
✅ Verification successful

Mock implementation is appropriate, security validation is properly tested

The empty array mock in the integration test is intentional and correct. Security validation is thoroughly tested in dedicated test files:

  • tests/core/security/validateFileSafety.test.ts verifies the main validation flow
  • tests/core/security/filterOutUntrustedFiles.test.ts covers both suspicious and clean file scenarios
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for actual security validation tests
rg -l "validateFileSafety.*test" --type ts

Length of output: 44


Script:

#!/bin/bash
# Find validateFileSafety implementation and related files
rg -l "validateFileSafety" --type ts

# Find filterOutUntrustedFiles implementation
rg -l "filterOutUntrustedFiles" --type ts

# Look for security-related test files
rg -l "security.*test|test.*security" --type ts

# Search for actual implementation of security validation
ast-grep --pattern 'function validateFileSafety($$$) { $$$ }'

Length of output: 646


Script:

#!/bin/bash
# Check content of security validation test files
echo "=== validateFileSafety.test.ts ==="
cat tests/core/security/validateFileSafety.test.ts

echo -e "\n=== filterOutUntrustedFiles.test.ts ==="
cat tests/core/security/filterOutUntrustedFiles.test.ts

Length of output: 3377

src/shared/processConcurrency.ts (2)

3-5: LGTM! Efficient CPU core detection.

The implementation correctly uses the modern availableParallelism() API with a fallback to cpus().length.


7-24: Consider adding upper bound validation for numOfTasks.

While the worker thread calculation logic is sound, extremely large numOfTasks values could cause integer overflow in the division operation.

Add input validation:

 export const getWorkerThreadCount = (numOfTasks: number): { minThreads: number; maxThreads: number } => {
+  if (!Number.isFinite(numOfTasks) || numOfTasks <= 0) {
+    throw new Error('numOfTasks must be a positive finite number');
+  }
   const processConcurrency = getProcessConcurrency();
✅ Verification successful

The suggested input validation is valuable and should be implemented

While integer overflow isn't a practical concern due to JavaScript's number handling, the validation is important because:

  • It prevents negative numbers and zero which could cause worker pool initialization issues
  • It handles edge cases like NaN and Infinity that could impact system stability
  • It provides clear error messages for debugging
🏁 Scripts executed

The following scripts were executed for the analysis:

Error: No shell scripts found


Script:

#!/bin/bash
# Search for calls to getWorkerThreadCount to understand usage patterns
rg "getWorkerThreadCount\(" -A 2

# Search for tests to understand expected behavior
fd "test" -e ts -e js --exec rg -l "getWorkerThreadCount"

Length of output: 1174

src/core/metrics/workers/outputMetricsWorker.ts (2)

21-34: Consider handling token counting errors.

While counting tokens is usually straightforward, malformed encoding strings or unexpectedly large content might throw errors.

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 21-24: src/core/metrics/workers/outputMetricsWorker.ts#L21-L24
Added lines #L21 - L24 were not covered by tests


[warning] 26-31: src/core/metrics/workers/outputMetricsWorker.ts#L26-L31
Added lines #L26 - L31 were not covered by tests


[warning] 33-34: src/core/metrics/workers/outputMetricsWorker.ts#L33-L34
Added lines #L33 - L34 were not covered by tests


36-42: LGTM! Proper resource cleanup.

The cleanup implementation correctly frees resources when the worker is terminated.

tests/core/metrics/calculateAllFileMetrics.test.ts (1)

14-18: LGTM! Clean mock implementation.

The mock task runner correctly simulates the worker thread behavior for testing.

src/core/metrics/calculateOutputMetrics.ts (2)

8-18: Consider adding worker pool cleanup.

The worker pool should be cleaned up to prevent resource leaks.

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 9-10: src/core/metrics/calculateOutputMetrics.ts#L9-L10
Added lines #L9 - L10 were not covered by tests


[warning] 12-15: src/core/metrics/calculateOutputMetrics.ts#L12-L15
Added lines #L12 - L15 were not covered by tests


[warning] 17-18: src/core/metrics/calculateOutputMetrics.ts#L17-L18
Added lines #L17 - L18 were not covered by tests


20-45: LGTM! Good error handling and performance tracking.

The implementation includes:

  • Proper error propagation
  • Detailed logging
  • Performance measurement
🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 21-28: src/core/metrics/calculateOutputMetrics.ts#L21-L28
Added lines #L21 - L28 were not covered by tests


[warning] 30-32: src/core/metrics/calculateOutputMetrics.ts#L30-L32
Added lines #L30 - L32 were not covered by tests


[warning] 34-34: src/core/metrics/calculateOutputMetrics.ts#L34
Added line #L34 was not covered by tests


[warning] 36-38: src/core/metrics/calculateOutputMetrics.ts#L36-L38
Added lines #L36 - L38 were not covered by tests


[warning] 40-45: src/core/metrics/calculateOutputMetrics.ts#L40-L45
Added lines #L40 - L45 were not covered by tests

src/core/metrics/calculateMetrics.ts (1)

27-30: LGTM! Efficient concurrent execution.

Good use of Promise.all to parallelize file metrics and output metrics calculation.

src/core/security/workers/securityCheckWorker.ts (1)

6-9: LGTM! Well-structured interface for the security check task.

The SecurityCheckTask interface clearly defines the required input for the worker thread.

src/core/file/workers/fileCollectWorker.ts (1)

46-49: Add test coverage for binary file detection.

The binary file detection logic lacks test coverage. Add test cases for both file path and content-based binary detection.

Also applies to: 55-58

src/core/metrics/calculateAllFileMetrics.ts (1)

49-57: Consider implementing batch processing for large repositories.

Similar to the security check implementation, consider adding batch processing to better manage memory usage when calculating metrics for large repositories.

tests/core/file/fileProcess.test.ts (1)

11-15: Enhance test coverage for concurrent file processing.

The current mockInitTaskRunner processes tasks sequentially. Consider adding tests that verify concurrent file processing behavior.

tests/core/metrics/calculateMetrics.test.ts (1)

50-53: Add test cases for error scenarios and edge cases.

While the happy path is tested, consider adding test cases for:

  • Worker initialization failures
  • Edge cases with empty file lists
  • Error propagation from worker tasks

Would you like me to help generate additional test cases?

src/core/metrics/workers/fileMetricsWorker.ts Show resolved Hide resolved
src/core/metrics/calculateAllFileMetrics.ts Show resolved Hide resolved
src/core/metrics/calculateAllFileMetrics.ts Outdated Show resolved Hide resolved
src/core/file/fileCollect.ts Outdated Show resolved Hide resolved
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

🧹 Nitpick comments (12)
tests/shared/processConcurrency.test.ts (1)

20-59: Consider adding edge cases for getWorkerThreadCount.

While the current test cases cover the basic scenarios well, consider adding tests for:

  • Negative task counts
  • Decimal/float task counts
  • Very large numbers (Number.MAX_SAFE_INTEGER)
+    it('should handle negative task counts', () => {
+      const { minThreads, maxThreads } = getWorkerThreadCount(-10);
+      expect(minThreads).toBe(1);
+      expect(maxThreads).toBe(1);
+    });
+
+    it('should handle decimal task counts', () => {
+      const { minThreads, maxThreads } = getWorkerThreadCount(150.75);
+      expect(minThreads).toBe(1);
+      expect(maxThreads).toBe(2);
+    });
+
+    it('should handle very large task counts', () => {
+      const { minThreads, maxThreads } = getWorkerThreadCount(Number.MAX_SAFE_INTEGER);
+      expect(minThreads).toBe(1);
+      expect(maxThreads).toBe(8); // Limited by CPU count
+    });
tests/core/metrics/calculateOutputMetrics.test.ts (2)

9-13: Consider testing task runner initialization failure.

The mock task runner could be enhanced to test initialization failures.

+const mockFailedInitTaskRunner = () => {
+  throw new Error('Failed to initialize task runner');
+};
+
+it('should handle task runner initialization failure', async () => {
+  const content = 'test content';
+  const encoding = 'o200k_base';
+
+  await expect(
+    calculateOutputMetrics(content, encoding, undefined, {
+      initTaskRunner: mockFailedInitTaskRunner,
+    }),
+  ).rejects.toThrow('Failed to initialize task runner');
+});

70-80: Enhance complex content test assertions.

The current test only verifies that the result is a positive number. Consider adding more specific assertions.

   it('should work with longer complex content', async () => {
     const content = 'This is a longer test content with multiple sentences. It should work correctly.';
     const encoding = 'o200k_base';
 
     const result = await calculateOutputMetrics(content, encoding, undefined, {
       initTaskRunner: mockInitTaskRunner,
     });
 
     expect(result).toBeGreaterThan(0);
     expect(typeof result).toBe('number');
+    // Add specific token count assertion
+    expect(result).toBe(16); // Verify exact token count
+    // Test that spaces don't affect token count
+    const contentWithExtraSpaces = '  ' + content + '   ';
+    const resultWithSpaces = await calculateOutputMetrics(contentWithExtraSpaces, encoding, undefined, {
+      initTaskRunner: mockInitTaskRunner,
+    });
+    expect(resultWithSpaces).toBe(result);
   });
src/core/metrics/calculateOutputMetrics.ts (1)

6-9: Consider optimizing thread count and adding pool cleanup.

The worker pool is initialized with a single thread which might be a bottleneck for large repositories. Additionally, there's no cleanup mechanism for the worker pool.

 const initTaskRunner = () => {
   const pool = initPiscina(1, new URL('./workers/outputMetricsWorker.js', import.meta.url).href);
-  return (task: OutputMetricsTask) => pool.run(task);
+  return {
+    runTask: (task: OutputMetricsTask) => pool.run(task),
+    cleanup: async () => {
+      await pool.destroy();
+      logger.trace('Output metrics worker pool destroyed');
+    }
+  };
 };
src/core/security/workers/securityCheckWorker.ts (3)

6-28: Consider adding specific error type handling.

While the error handling is good, it could be improved by handling specific error types differently.

 } catch (error) {
-    logger.error(`Error checking security on ${filePath}:`, error);
-    throw error;
+    if (error instanceof SyntaxError) {
+      logger.warn(`Syntax error in ${filePath}:`, error);
+      return null;
+    } else {
+      logger.error(`Error checking security on ${filePath}:`, error);
+      throw error;
+    }
 }

30-54: Consider structured logging for better analysis.

The logging implementation could be improved to facilitate log parsing and analysis.

-    logger.trace(`Found ${result.messages.length} issues in ${filePath}`);
-    logger.trace(result.messages.map((message) => `  - ${message.message}`).join('\n'));
+    logger.trace({
+      event: 'security_check_issues',
+      file: filePath,
+      issueCount: result.messages.length,
+      issues: result.messages.map((message) => message.message)
+    });

56-63: Consider adding configuration customization.

The SecretLint configuration is currently static. Consider allowing rule customization through configuration options.

-export const createSecretLintConfig = (): SecretLintCoreConfig => ({
+export const createSecretLintConfig = (
+  options: { customRules?: Array<{ id: string; rule: any }> } = {}
+): SecretLintCoreConfig => ({
   rules: [
     {
       id: '@secretlint/secretlint-rule-preset-recommend',
       rule: creator,
-    },
+    },
+    ...(options.customRules || []),
   ],
 });
src/core/file/fileProcess.ts (1)

31-55: Consider implementing batch processing for very large repositories.

While Promise.all works well, processing all files at once might consume too much memory for very large repositories.

+const processBatch = async (tasks: FileProcessTask[], runTask: any, batchSize: number = 100) => {
+  const results = [];
+  for (let i = 0; i < tasks.length; i += batchSize) {
+    const batch = tasks.slice(i, i + batchSize);
+    const batchResults = await Promise.all(batch.map(task => runTask(task)));
+    results.push(...batchResults);
+  }
+  return results;
+};

 try {
   const startTime = process.hrtime.bigint();
   logger.trace(`Starting file processing for ${rawFiles.length} files using worker pool`);

   let completedTasks = 0;
   const totalTasks = tasks.length;

-  const results = await Promise.all(
-    tasks.map((task) =>
+  const results = await processBatch(
+    tasks,
+    (task: FileProcessTask) =>
       runTask(task).then((result) => {
         completedTasks++;
         progressCallback(`Processing file... (${completedTasks}/${totalTasks}) ${pc.dim(task.rawFile.path)}`);
         return result;
       }),
-    ),
   );
src/core/security/securityCheck.ts (1)

25-31: Consider batching tasks for better performance.

For large repositories, creating individual tasks for each file might lead to overhead. Consider batching files to reduce the number of worker tasks.

Example implementation:

-  const tasks = rawFiles.map(
-    (file) =>
-      ({
-        filePath: file.path,
-        content: file.content,
-      }) satisfies SecurityCheckTask,
-  );
+  const BATCH_SIZE = 100;
+  const tasks = [];
+  for (let i = 0; i < rawFiles.length; i += BATCH_SIZE) {
+    const batch = rawFiles.slice(i, i + BATCH_SIZE);
+    tasks.push({
+      files: batch.map(file => ({
+        filePath: file.path,
+        content: file.content,
+      })),
+    } satisfies SecurityCheckTask);
+  }
src/core/metrics/calculateAllFileMetrics.ts (2)

24-31: Consider adding file type validation.

The task creation assumes all files can be processed for metrics. Consider validating file types to avoid unnecessary processing.

+const isProcessableFile = (file: ProcessedFile) => {
+  const PROCESSABLE_EXTENSIONS = ['.js', '.ts', '.jsx', '.tsx', '.py', '.java'];
+  return PROCESSABLE_EXTENSIONS.some(ext => file.path.endsWith(ext));
+};

 const tasks = processedFiles
+  .filter(isProcessableFile)
   .map(
     (file, index) =>
       ({
         file,
         index,
         totalFiles: processedFiles.length,
         encoding: tokenCounterEncoding,
       }) satisfies FileMetricsTask,
   );

38-47: Consider implementing task prioritization.

For large repositories, prioritizing certain file types or smaller files could improve perceived performance.

+const prioritizeTasks = (tasks: FileMetricsTask[]): FileMetricsTask[] => {
+  return [...tasks].sort((a, b) => {
+    // Prioritize smaller files
+    return a.file.content.length - b.file.content.length;
+  });
+};

-    const results = await Promise.all(
+    const results = await Promise.all(
+      prioritizeTasks(tasks).map((task) =>
-      tasks.map((task) =>
         runTask(task).then((result) => {
           completedTasks++;
           progressCallback(`Calculating metrics... (${completedTasks}/${task.totalFiles}) ${pc.dim(task.file.path)}`);
           return result;
         }),
       ),
     );
tests/core/security/securityCheck.test.ts (1)

13-24: Add more edge cases to test data.

The mock files could include more diverse scenarios to ensure robust testing.

 const mockFiles: RawFile[] = [
   {
     path: 'test1.js',
     // secretlint-disable
     content: 'URL: https://user:pass@example.com', // Clear security issue
     // secretlint-enable
   },
   {
     path: 'test2.js',
     content: 'console.log("Hello World");', // No secrets
   },
+  {
+    path: 'test3.js',
+    content: '', // Empty file
+  },
+  {
+    path: 'test4.js',
+    content: Buffer.from('binary data').toString(), // Binary-like content
+  },
+  {
+    path: 'test5.js',
+    content: 'a'.repeat(1024 * 1024), // Large file
+  },
 ];
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ee46a4f and fc20dde.

📒 Files selected for processing (19)
  • src/config/configLoad.ts (0 hunks)
  • src/core/file/fileCollect.ts (1 hunks)
  • src/core/file/fileProcess.ts (1 hunks)
  • src/core/metrics/calculateAllFileMetrics.ts (1 hunks)
  • src/core/metrics/calculateOutputMetrics.ts (1 hunks)
  • src/core/security/securityCheck.ts (1 hunks)
  • src/core/security/workers/securityCheckWorker.ts (1 hunks)
  • src/shared/processConcurrency.ts (1 hunks)
  • tests/cli/actions/remoteAction.test.ts (2 hunks)
  • tests/cli/cliPrint.test.ts (1 hunks)
  • tests/config/configSchema.test.ts (1 hunks)
  • tests/core/metrics/calculateAllFileMetrics.test.ts (1 hunks)
  • tests/core/metrics/calculateOutputMetrics.test.ts (1 hunks)
  • tests/core/packager/copyToClipboardIfEnabled.test.ts (0 hunks)
  • tests/core/security/securityCheck.test.ts (1 hunks)
  • tests/core/security/workers/securityCheckWorker.test.ts (1 hunks)
  • tests/shared/processConcurrency.test.ts (1 hunks)
  • website/compose.yml (1 hunks)
  • website/server/src/utils/cache.ts (1 hunks)
💤 Files with no reviewable changes (2)
  • src/config/configLoad.ts
  • tests/core/packager/copyToClipboardIfEnabled.test.ts
✅ Files skipped from review due to trivial changes (4)
  • website/compose.yml
  • tests/cli/cliPrint.test.ts
  • tests/config/configSchema.test.ts
  • website/server/src/utils/cache.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/cli/actions/remoteAction.test.ts
🧰 Additional context used
🪛 Gitleaks (8.21.2)
tests/core/security/workers/securityCheckWorker.test.ts

24-24: Identified a Slack Bot token, which may compromise bot integrations and communication channel security.

(slack-bot-token)


16-16: Uncovered a GitHub Personal Access Token, potentially leading to unauthorized repository access and sensitive content exposure.

(github-pat)


23-23: Identified a Slack Legacy Workspace token, potentially compromising access to workspace data and legacy features.

(slack-legacy-workspace-token)


29-43: Identified a Private Key, which may compromise cryptographic security and sensitive data encryption.

(private-key)

⏰ Context from checks skipped due to timeout of 90000ms (8)
  • GitHub Check: Test (macos-latest, 18.x)
  • GitHub Check: Test (windows-latest, 23.x)
  • GitHub Check: Test (windows-latest, 22.x)
  • GitHub Check: Test (windows-latest, 21.x)
  • GitHub Check: Test (windows-latest, 20.x)
  • GitHub Check: Test (windows-latest, 19.x)
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: Cloudflare Pages
🔇 Additional comments (10)
tests/shared/processConcurrency.test.ts (1)

9-17: LGTM! Good test coverage for getProcessConcurrency.

The test effectively verifies the behavior when os.availableParallelism is available.

tests/core/security/workers/securityCheckWorker.test.ts (1)

51-76: LGTM! Good negative test case.

The test for normal content helps ensure no false positives.

src/core/metrics/calculateOutputMetrics.ts (2)

11-18: Well-structured implementation with good practices!

The function follows good practices:

  • Clean separation of concerns
  • Dependency injection for better testability
  • Optional parameters with sensible defaults

21-36: Excellent error handling and performance logging!

The implementation includes:

  • High-resolution performance measurements
  • Detailed error logging with context
  • Proper error propagation
src/core/file/fileProcess.ts (2)

9-12: Add worker pool cleanup mechanism.

The worker pool should be cleaned up after task completion to prevent resource leaks.


22-28: Excellent type-safe implementation!

The task mapping implementation is clean and type-safe, using the TypeScript satisfies operator effectively.

src/core/file/fileCollect.ts (3)

8-11: Add worker pool cleanup mechanism.

The worker pool should be cleaned up after task completion to prevent resource leaks.


13-20: Well-structured function signature with good defaults!

The implementation follows best practices:

  • Sensible default for progressCallback
  • Clean dependency injection pattern

31-55: Consider implementing batch processing for very large repositories.

Similar to fileProcess.ts, consider implementing batch processing for better memory management with large repositories.

tests/core/metrics/calculateAllFileMetrics.test.ts (1)

12-16: Enhance test coverage with more scenarios.

The mock implementation is basic. Consider adding tests for:

  • Worker initialization failures
  • Task execution errors
  • Edge cases (empty files, large files)
✅ Verification successful

The review comment is valid - test coverage needs improvement

The implementation includes error handling and worker pool management that should be tested. The current mock bypasses important mechanisms. Specific improvements needed:

  • Test worker pool initialization failures
  • Test error propagation from the worker pool
  • Test progress callback accuracy
  • Test cleanup of worker pool resources
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check current test coverage
rg -l "calculateAllFileMetrics.*test" tests/
# Look for existing error handling tests
rg "test.*error.*calculateAllFileMetrics" tests/

Length of output: 97


Script:

#!/bin/bash
# Find the test file and implementation
fd calculateAllFileMetrics
# Check test contents
rg "describe|it\(" -A 2 tests/core/metrics/calculateAllFileMetrics.test.ts
# Look for worker-related error handling
rg "catch|throw|error" tests/core/metrics/calculateAllFileMetrics.test.ts

Length of output: 706


Script:

#!/bin/bash
# Check implementation for error handling
cat src/core/metrics/calculateAllFileMetrics.ts
# Look for worker implementation
fd fileMetricsWorker

Length of output: 2097

src/shared/processConcurrency.ts Show resolved Hide resolved
src/shared/processConcurrency.ts Show resolved Hide resolved
src/core/security/securityCheck.ts Show resolved Hide resolved
src/core/security/securityCheck.ts Show resolved Hide resolved
tests/core/security/securityCheck.test.ts Show resolved Hide resolved
@yamadashy yamadashy merged commit c444cfd into main Jan 25, 2025
54 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant