-
-
Notifications
You must be signed in to change notification settings - Fork 364
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(pack): Performance Optimization for Large Repositories #307
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #307 +/- ##
==========================================
- Coverage 92.17% 90.65% -1.52%
==========================================
Files 44 48 +4
Lines 2248 2450 +202
Branches 493 512 +19
==========================================
+ Hits 2072 2221 +149
- Misses 176 229 +53 ☔ View full report in Codecov by Sentry. |
📝 WalkthroughWalkthroughThis pull request introduces a comprehensive refactoring of the codebase to implement concurrent processing using the Key modifications include adding the The changes affect core functionality in file handling, metrics calculation, and security checks, with a consistent pattern of introducing task runners, worker threads, and progress tracking mechanisms across different modules. Sequence DiagramsequenceDiagram
participant Client
participant Packager
participant TaskRunner
participant Workers
Client->>Packager: Initiate pack
Packager->>TaskRunner: Initialize worker pool
TaskRunner-->>Packager: Worker pool ready
Packager->>TaskRunner: Collect files
TaskRunner->>Workers: Process file collection tasks
Workers-->>TaskRunner: File collection results
Packager->>TaskRunner: Process files
TaskRunner->>Workers: Process file processing tasks
Workers-->>TaskRunner: Processed files
Packager->>TaskRunner: Calculate metrics
TaskRunner->>Workers: Calculate file metrics tasks
Workers-->>TaskRunner: File metrics
Packager->>TaskRunner: Run security checks
TaskRunner->>Workers: Security check tasks
Workers-->>TaskRunner: Security check results
TaskRunner-->>Packager: Aggregated results
Packager-->>Client: Final output
Possibly related PRs
✨ Finishing Touches
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 6
🧹 Nitpick comments (41)
src/core/metrics/calculateOutputMetrics.ts (2)
1-7
: Unused imports warning
path
andfileURLToPath
appear to be unused in this file. Consider removing them if they are no longer needed to keep the codebase clean.
20-29
: Enhance thedeps
parameter with a typed interface
It’s generally good practice to define an explicit interface fordeps
. This improves readability and discoverability, and ensures that all required dependencies are present.src/core/metrics/calculateMetrics.ts (2)
20-23
: Document thedeps
parameter
Thedeps
parameter is injected as an object. Provide inline documentation (docstring or comment) describing each of the dependencies (e.g.,calculateAllFileMetrics
,calculateOutputMetrics
) so that future maintainers know how and why they’re being used.
35-40
: Check memory usage for large file sets
Looping through all fileMetrics to compute character/token counts is straightforward. However, for extremely large projects, consider streaming or incremental aggregation to avoid potential memory spikes.src/core/file/fileProcess.ts (3)
8-9
: Clearly document the new type
FileProcessTask
is a critical type for worker communication. Consider adding JSDoc or TSDoc so that maintainers can quickly understand each property.
10-22
: Optionally extend concurrency
The worker pool is configured based on the number of tasks, which is a nice improvement. If certain tasks vary heavily in complexity or size, consider adding optional heuristics to optimize the pool size further.
28-30
: Typeddeps
object
Similar to earlier feedback in the metrics file, define a consistent interface fordeps
to ensure clarity and enforcements.src/core/file/fileCollect.ts (2)
9-20
: Initialize the worker pool conditionally for zero tasks.If
numOfTasks
can be zero due to no files found, consider handling that scenario gracefully (e.g., immediately returning a no-op runner). While this situation may be rare, it helps guard against edge cases.const initTaskRunner = (numOfTasks: number) => { + if (numOfTasks === 0) { + return async () => null; + } const { minThreads, maxThreads } = getWorkerThreadCount(numOfTasks); ... }
40-56
: Consider usingPromise.allSettled
.Relying on
Promise.all
means that if one file collection task fails, it rejects the entire collection process. Depending on the desired behavior, you might want to collect partial results. If partial success is acceptable, considerPromise.allSettled
and filtering out failed tasks for more robust error handling.src/core/security/securityCheck.ts (2)
16-28
: Guard for unexpected zero tasks.Similar to
collectFiles
, ifnumOfTasks
can be zero for edge scenarios, you could short-circuit the concurrency or return a no-op runner for clarity.
46-50
: Improve partial failure handling.Using
Promise.all
will cause a single failure (e.g., a worker throwing an exception for a single file) to halt the entire security check. If partial results are acceptable, considerPromise.allSettled
to capture both successes and failures.src/core/metrics/calculateAllFileMetrics.ts (1)
16-28
: Conditional initialization for minimal tasks.As with the other modules, consider gracefully handling the scenario where
numOfTasks
is zero to avoid creating idle worker threads unnecessarily.tests/integration-tests/packager.test.ts (2)
70-76
: Leverage progress callback usage in collectFiles.
You’ve introduced an inline wrapper to injectinitTaskRunner
. Consider passing additional metadata via theprogressCallback
, such as partial result counts or timing info, for improved visibility into large repository processing.
93-113
: Metrics calculation is on track.
Storing file-specific stats infileCharCounts
andfileTokenCounts
is helpful. Consider adding aggregated or average metrics as well, especially if used to track performance improvements in large repositories.src/shared/processConcurrency.ts (2)
6-8
: Validate CPU count fallback.
os.cpus()
might return an empty array in rare containerized or restricted environments. As a safeguard, consider falling back to 1 forgetProcessConcurrency
whenos.cpus().length
is 0.
13-24
: Reevaluate max thread constraints for large tasks.
Limiting threads toMath.floor(numOfTasks / 100)
could be beneficial in typical workloads, yet might underutilize multi-core systems when tasks exceed a few thousand. Consider making this ratio configurable to better handle diverse large-repository scenarios.src/core/tokenCount/tokenCount.ts (1)
35-37
: Expose encoding retrieval for better introspection.
Providing agetEncoding()
method is convenient for debugging or analyzing token usage. Consider also returning additional config info (if relevant) to aid in advanced debugging.src/core/security/validateFileSafety.ts (1)
14-14
: Leverage dependency injection effectively.Accepting
runSecurityCheck
viadeps
is a good approach to enable simpler testing and future extension. Consider adding a docstring or inline comment clarifying why dependency injection is important here (e.g., for test mocks or swapping in different security check implementations).tests/core/metrics/calculateAllFileMetrics.test.ts (1)
28-30
: Ensure coverage of new parameters.You are now passing an options object containing
{ initTaskRunner }
tocalculateAllFileMetrics
. Double-check that your tests confirm behavior ifinitTaskRunner
is undefined or fails. This helps solidify the code's resilience.src/core/metrics/workers/outputMetricsWorker.ts (1)
7-11
: Interface design clarity.Defining
OutputMetricsTask
makes the worker’s contract explicit. If you expect optional or additional fields in the future (e.g.,fileSize
orfileType
), consider a more flexible interface or referencing a shared global interface.src/core/file/workers/fileProcessWorker.ts (2)
11-17
: Name the default export for easier debugging.
Currently, the default export is an anonymous async function. Explicitly naming it can improve stack traces and debugging clarity.-export default async ({ rawFile, config }: FileProcessTask): Promise<ProcessedFile> => { +export default async function processFileTask({ rawFile, config }: FileProcessTask): Promise<ProcessedFile> {
36-41
: Be mindful of large file line numbering.
Splitting and rejoining massive content may increase memory usage. Stream-based processing or chunking might be beneficial for extremely large files.src/core/metrics/workers/fileMetricsWorker.ts (1)
30-39
: Log file metrics after calculation.
Currently, metrics are logged at trace level before actually callingcalculateIndividualFileMetrics
. IfcalculateIndividualFileMetrics
throws an error, logs can be misleading. Consider rearranging the logic or adding error handling.src/core/file/workers/fileCollectWorker.ts (2)
35-45
: Suggest partial reads for very large files.
Although rejecting files over 50 MB helps, partial reading can handle edge cases where slightly over-limit files still have relevant data and reduce memory usage.
60-61
: Validate encoding detection and decoding.
jschardet
can guess incorrectly. Consider fallback strategies or user-configurable defaults for ambiguous encodings.src/core/security/workers/securityCheckWorker.ts (2)
40-63
: Extend or customize secretlint rules.
Consider exposing additional custom rules or configuration to detect project-specific secrets for more comprehensive security checks.Would you like me to generate a template for custom secretlint plugin development?
65-72
: Consider dynamic config for secretlint.
Create a dynamic config to conditionally enable/disable certain rules based on environment or project type, improving maintainability in large-scale repositories.tests/core/metrics/calculateMetrics.test.ts (1)
56-56
: Extract hard-coded reference to'o200k_base'
into a constant or configuration.Having
'o200k_base'
directly in the test can obscure its meaning. Consider using a named constant or referencing a configuration source to improve clarity and maintainability.src/cli/actions/remoteAction.ts (1)
50-50
: Ensure robust error handling around default action execution.While the
try/catch
block handles errors, confirm thatrunDefaultAction
properly cleans up any resources or partial results in case of failure, especially for large projects cloned in the temp directory.tests/core/file/fileCollect.test.ts (4)
8-10
: Use single import statement for related items.
Consider combining these imports into a single statement for better readability and to avoid repeatedly referencing the same path.-import { collectFiles } from '../../../src/core/file/fileCollect.js'; -import type { FileCollectTask } from '../../../src/core/file/workers/fileCollectWorker.js'; -import { MAX_FILE_SIZE } from '../../../src/core/file/workers/fileCollectWorker.js'; +import { + collectFiles, + type FileCollectTask, + MAX_FILE_SIZE +} from '../../../src/core/file/workers/fileCollectWorker.js';
20-24
: Ensure resilience in your mock task runner.
The mock task runner is straightforward, but consider adding error handling or logging for failures withinfileCollectWorker
, especially if you anticipate partial failures in concurrent tasks.Would you like help adding error handling logic in your mock or real implementation?
71-73
: Expand testing for partial results.
By skipping the binary file, you are ensuring the rest are processed. Consider adding an assertion that the progress callback or logs reflect the skipped file event.
100-102
: Cover large file edge cases more thoroughly.
Currently, the test checks a file slightly over the limit. It may be worth testing a file exactly at the limit as well.Would you like me to add such a test to catch potential off-by-one scenarios?
tests/core/file/fileProcess.test.ts (7)
3-3
: Document the rationale for re-exporting processFiles.
Importing and immediately re-exporting can sometimes hide the real source. Adding a short docstring or comment can help maintainers quickly understand the relation betweenfileProcess.js
andfileProcessWorker.js
.
5-6
: Check if local solutions suffice before adding external dependencies.
Using afileProcessWorker
is a good approach. Before expanding to more libraries or heavy frameworks, confirm that you can handle concurrency effectively with your existing worker solution.
78-78
: Test scenario with partial whitespace-only lines.
This is good coverage for ignoring comments. Additionally test partial whitespace lines to confirm your removal logic is robust.
95-95
: Add explicit logging for manipulator absence.
Currently, you return the original content. Adding a debug/warning log might help trace the flow when no manipulator is found.+ logger.debug(`No file manipulator found for ${filePath}, returning original content`);
127-127
: Validate combined config scenarios.
If the user toggles multiple features (e.g. removeComments + showLineNumbers), confirm the correct transformations occur in the correct order.
143-143
: Corner case: ensure '1:' label is not repeated multiple times
Some line numbering logic can inadvertently repeat line labels or offset them if the file is empty. Test thoroughly.
159-159
: Keep a handle on performance with large line counts.
You’re testing 100 lines, but real files can have thousands. Consider measuring any performance overhead from line numbering on large inputs.tests/cli/actions/remoteAction.test.ts (1)
11-11
: Document the role of SuspiciousFileResult in remoteAction tests.
You’re importingSuspiciousFileResult
but not using it. Add clarifications or tests that address suspicious files.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
package-lock.json
is excluded by!**/package-lock.json
📒 Files selected for processing (33)
package.json
(1 hunks)src/cli/actions/remoteAction.ts
(2 hunks)src/core/file/fileCollect.ts
(1 hunks)src/core/file/fileProcess.ts
(1 hunks)src/core/file/workers/fileCollectWorker.ts
(1 hunks)src/core/file/workers/fileProcessWorker.ts
(1 hunks)src/core/metrics/aggregateMetrics.ts
(0 hunks)src/core/metrics/calculateAllFileMetrics.ts
(1 hunks)src/core/metrics/calculateIndividualFileMetrics.ts
(0 hunks)src/core/metrics/calculateMetrics.ts
(2 hunks)src/core/metrics/calculateOutputMetrics.ts
(1 hunks)src/core/metrics/workers/fileMetricsWorker.ts
(1 hunks)src/core/metrics/workers/outputMetricsWorker.ts
(1 hunks)src/core/metrics/workers/types.ts
(1 hunks)src/core/packager.ts
(1 hunks)src/core/security/runSecurityCheckIfEnabled.ts
(0 hunks)src/core/security/securityCheck.ts
(1 hunks)src/core/security/validateFileSafety.ts
(1 hunks)src/core/security/workers/securityCheckWorker.ts
(1 hunks)src/core/tokenCount/tokenCount.ts
(2 hunks)src/shared/processConcurrency.ts
(1 hunks)tests/cli/actions/remoteAction.test.ts
(2 hunks)tests/core/file/fileCollect.test.ts
(6 hunks)tests/core/file/fileProcess.test.ts
(9 hunks)tests/core/metrics/aggregateMetrics.test.ts
(0 hunks)tests/core/metrics/calculateAllFileMetrics.test.ts
(1 hunks)tests/core/metrics/calculateIndividualFileMetrics.test.ts
(0 hunks)tests/core/metrics/calculateMetrics.test.ts
(1 hunks)tests/core/packager.test.ts
(1 hunks)tests/core/security/runSecurityCheckIfEnabled.test.ts
(0 hunks)tests/core/security/securityCheck.test.ts
(1 hunks)tests/core/security/validateFileSafety.test.ts
(1 hunks)tests/integration-tests/packager.test.ts
(2 hunks)
💤 Files with no reviewable changes (6)
- tests/core/metrics/aggregateMetrics.test.ts
- src/core/security/runSecurityCheckIfEnabled.ts
- src/core/metrics/calculateIndividualFileMetrics.ts
- src/core/metrics/aggregateMetrics.ts
- tests/core/security/runSecurityCheckIfEnabled.test.ts
- tests/core/metrics/calculateIndividualFileMetrics.test.ts
✅ Files skipped from review due to trivial changes (1)
- src/core/metrics/workers/types.ts
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: Test (windows-latest, 22.x)
- GitHub Check: Analyze (javascript-typescript)
- GitHub Check: Cloudflare Pages
🔇 Additional comments (43)
src/core/metrics/calculateOutputMetrics.ts (1)
30-45
: Validate error handling and side-effects
The try-catch block logs and re-throws errors while ignoring potential partial states. If this function is critical, consider whether you need additional cleanup logic after an error is caught (e.g., destroying or cleaning up worker pools).src/core/metrics/calculateMetrics.ts (3)
5-5
: Confirm usage alignment
ImportingcalculateOutputMetrics
here is consistent with the new concurrency approach. Ensure that this file is the only place using this function so that the concurrency logic is centralized.
27-30
: Parallel execution
UsingPromise.all
to invokecalculateAllFileMetrics
andcalculateOutputMetrics
concurrently is an excellent use of parallelism for performance.
42-48
: Return value clarity
The returned object is clear and self-explanatory. Good job returning separate fields for both aggregated and file-level metrics.src/core/file/fileProcess.ts (3)
5-5
: EnsuregetWorkerThreadCount
handles edge cases
Validate thatgetWorkerThreadCount
gracefully handles cases wherenumOfTasks
is zero or extremely large.
32-40
: Progress callback usage
The approach of updatingprogressCallback
after each file is processed is effective. Ensure that the callback function is performant, especially in large-scale usage where thousands of files are processed—excessive logging or console I/O can degrade performance.
2-2
: Verify removal of pMap
pMap
was previously used for concurrency. Confirm that no other files rely on it. If not, consider removing it from dependencies entirely to reduce bundle size.src/core/file/fileCollect.ts (2)
1-2
: Imports look good.The addition of
picocolors
for formatting andpiscina
for concurrency is consistent with your overall objective of enhancing performance and providing clearer output logs.
62-65
: Re-throwing error ensures transparency.Catching and then rethrowing the error is a good practice to preserve the original stack trace. This approach makes debugging easier and ensures that upstream error-handling logic remains consistent.
src/core/security/securityCheck.ts (1)
67-70
: Appropriate error logging and rethrow.Logging and then rethrowing preserves the error context, which aligns well with debugging and consistent error handling across modules. Good approach!
src/core/metrics/calculateAllFileMetrics.ts (3)
41-49
: Retainingindex
might be beneficial.Storing and passing the
index
of each file is helpful for reporting or debugging, especially if the processing outcome needs to be correlated to the original file list order. This is a good approach.
52-66
: Performance clarity in logs.Logging the
hrtime.bigint()
duration in milliseconds and the number of files processed is helpful for diagnosing potential bottlenecks. This aligns well with your performance optimization goals.
67-75
: Error approach is consistent.Catching an error, logging it, and then rethrowing is consistent with the rest of the codebase. This pattern ensures that metrics calculation failures are transparent and don’t silently fail.
tests/integration-tests/packager.test.ts (2)
26-30
: Ensure concurrency safety in the mock collect file task runner.
It’s great that you’re utilizing a dedicated mock for worker-based file collection. However, ensure that concurrent calls from multiple tests won’t conflict when sharing the same mock instance (if ever used simultaneously).
85-90
: Validating file safety with mock security check.
You’re currently returning an empty array fromrunSecurityCheck
. Make sure your integration tests also cover real security checks to confirm your concurrency logic handles potential security flags properly.src/core/tokenCount/tokenCount.ts (1)
6-9
: Initialize encoding name and encoding properly.
Injecting theencodingName
and storing it is a good design choice. Ensure any external calls referencing the same TokenCounter instance consistently expect the same encoding.src/core/security/validateFileSafety.ts (2)
6-6
: Use consistent import naming for security logic.Renaming imports from
runSecurityCheckIfEnabled
torunSecurityCheck
clarifies the underlying functionality. Good job keeping the naming clear. Ensure that all references throughout the codebase reflect the new import name.
18-23
: Add error handling and logging around security checks.While the conditional call to
runSecurityCheck
is a good step, consider handling exceptions if the security check fails or encounters unexpected data. For example, you could log errors or fallback to skipping suspicious file detection in that scenario to prevent the entire process from crashing.Would you like me to provide a sample snippet demonstrating graceful error handling?
tests/core/metrics/calculateAllFileMetrics.test.ts (3)
1-7
: Improved test boundaries for worker-based metrics.Importing
calculateIndividualFileMetrics
directly fromfileMetricsWorker.js
is a more accurate reflection of the real usage. This reduces the chance of discrepancies between production code and test mocks. Ensure that any utility code needed for advanced scenarios (e.g., file size extremes) is also tested with concurrency.
14-19
: Mock initialization logic for task runners.The
mockInitTaskRunner
effectively simulates worker-based concurrency. Consider verifying how the concurrency scales with increasing file counts (e.g., performance or memory usage) using additional test cases.
33-34
: Token count adjustments in tests.The updated token counts (13 and 50) might imply a more accurate tokenization method. Confirm these thresholds remain correct under actual tokenization rules, especially for boundary cases like large or non-text files.
src/core/metrics/workers/outputMetricsWorker.ts (2)
13-24
: Singleton usage ofTokenCounter
.Caching the
TokenCounter
in a worker-level singleton is efficient for performance. However, ensure your code handles concurrency properly if multiple tasks are processed simultaneously. IfTokenCounter
is stateful, you may need synchronization or separate instances per task.
44-50
: Freeing resources on worker exit.Good practice to free
tokenCounter
resources upon worker exit. Verify that no residual memory or references remain, especially if the worker might be re-instantiated frequently in large-scale tasks.tests/core/security/validateFileSafety.test.ts (2)
24-24
: Mocking the correct function name.Replacing
runSecurityCheckIfEnabled
withrunSecurityCheck
aligns with the updated implementation. This ensures consistency in tests and prevents missing or mismatched mock calls.
30-30
: Validate arguments for the security check.The assertion confirms
runSecurityCheck
is invoked properly. Consider adding a separate test case coveringconfig.security.enableSecurityCheck = false
to ensure the function bypasses the check as intended.src/core/file/workers/fileProcessWorker.ts (2)
1-3
: Consider providing explicit error handling for imports.
If any of these imports fail or the module path changes, it could cause runtime issues without clear error logs.
26-28
: Validate tool output before using manipulator methods.
Ifmanipulator.removeComments
returns undefined or malformatted output, it could break subsequent steps. Consider adding a safety check or fallback.src/core/metrics/workers/fileMetricsWorker.ts (2)
14-16
: Ensure shared token counter is thread-safe.
SincetokenCounter
is stored at the module level, concurrent calls could lead to race conditions if the implementation isn’t thread-safe.
52-58
: Confirm proper cleanup handling.
Attaching anexit
listener ensures that the token counter is freed, but it might not handle forced termination scenarios (e.g.,process.kill
). Verify your approach for unexpected shutdowns.src/core/file/workers/fileCollectWorker.ts (1)
46-49
: Verify binary detection logic.
isBinary
typically checks file extensions or partial content heuristics. Confirm that it’s accurate for different file types (e.g., unusual file extensions or compressed files).src/core/security/workers/securityCheckWorker.ts (1)
14-38
: Add concurrency/error handling.
If many security checks run in parallel, ensure that secretlint usage is safe in multi-threaded contexts. Also, confirm proper re-throws of key security errors so they aren’t silently swallowed.tests/core/metrics/calculateMetrics.test.ts (1)
50-53
: Leverage dependency injection for flexible testing.Introducing a fifth parameter with
calculateAllFileMetrics
andcalculateOutputMetrics
is a smart design choice, making the function more testable and modular.tests/core/security/securityCheck.test.ts (1)
3-3
: Updated import path for worker usage looks correct.The import from the new
securityCheckWorker.js
aligns with the concurrency approach. Good job ensuring the test references the proper worker implementation.tests/core/packager.test.ts (1)
75-75
: Good coverage for progress callback parameter.Adding the
progressCallback
argument tomockDeps.collectFiles
and verifying it here improve visibility into the new progress-tracking logic.src/cli/actions/remoteAction.ts (1)
22-22
: InjectingrunDefaultAction
into deps enhances testability and consistency.This change improves your dependency injection pattern and keeps the function logic decoupled from direct imports.
tests/core/file/fileCollect.test.ts (2)
123-125
: Confirm that error logging occurs once.
You successfully test that the result array is empty on a file read error, but also ensure that you log only once for repeated read failures if that situation arises.
50-52
: Verify that progressCallback is properly tested.
You're passing an empty function (() => {}
) as the progress callback. It might be beneficial to verify that the callback is invoked or not invoked as expected, ensuring accurate coverage for progress reporting logic.tests/core/file/fileProcess.test.ts (3)
36-38
: Include separate test for progressCallback usage.
Similar to file collection, ensure that the progress callback is being utilized inprocessFiles
and that it’s tested thoroughly.
63-63
: Confirm charset detection fallback.
For advanced usage, consider the scenario where jschardet returns an undefined or low-confidence encoding. Validate that the fallback handles strange charsets correctly.
111-111
: Edge case: multiline content with line numbering.
Ensure each line is sequentially numbered, including any lines that might be entirely whitespace or special characters.tests/cli/actions/remoteAction.test.ts (1)
4-4
: Verify usage of DefaultActionRunnerResult.
Ensure that all fields inDefaultActionRunnerResult
are indeed required for your tests. If some are optional, the test might mischaracterize the data structure.src/core/packager.ts (1)
41-41
: Good use of progressCallback injection.
UpgradingcollectFiles
to accept a callback fosters improved user experience and modular design. The approach is aligned with large-scale concurrency improvements.package.json (1)
79-79
: Verify piscina version and remove redundant p-map dependency.Since p-map is being replaced with piscina for parallel processing, consider removing the p-map dependency to avoid confusion and reduce package size.
Let's verify the piscina version and check for any known vulnerabilities:
Apply this diff to remove the redundant dependency:
- "p-map": "^7.0.3",
✅ Verification successful
Remove unused p-map dependency, piscina version is appropriate
The migration from p-map to piscina is complete as there are no remaining p-map imports in the codebase. The current piscina version (^4.8.0) is secure and stable for production use.
- "p-map": "^7.0.3",
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Check latest version and security advisories for piscina # Get latest version from npm echo "Latest version from npm:" npm view piscina version # Check for known vulnerabilities echo -e "\nVulnerability check:" npm audit piscina@4.8.0 --json || true # Check GitHub Advisory Database echo -e "\nGitHub Advisory Database check:" gh api graphql -f query=' { securityVulnerabilities(first: 5, ecosystem: NPM, package: "piscina") { nodes { advisory { summary severity publishedAt } vulnerableVersionRange firstPatchedVersion { identifier } } } }'Length of output: 1205
Script:
#!/bin/bash # Search for p-map usage in the codebase echo "Files containing p-map imports:" rg "require\(['\"]p-map['\"]|from ['\"]p-map['\"]" -l echo -e "\nDetailed p-map usage:" rg "require\(['\"]p-map['\"]|from ['\"]p-map['\"]" -C 2Length of output: 261
d8cca0b
to
2be1e27
Compare
Deploying repomix with Cloudflare Pages
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
♻️ Duplicate comments (2)
src/core/metrics/calculateOutputMetrics.ts (1)
8-18
:⚠️ Potential issueWorker pool cleanup mechanism is still missing.
The worker pool created by
initTaskRunner
lacks cleanup, which could lead to resource leaks.Apply this diff to add cleanup:
const initTaskRunner = () => { const pool = new Piscina({ filename: new URL('./workers/outputMetricsWorker.js', import.meta.url).href, minThreads: 1, maxThreads: 1, idleTimeout: 5000, }); + const cleanup = async () => { + await pool.destroy(); + }; - return (task: OutputMetricsTask) => pool.run(task); + return { + runTask: (task: OutputMetricsTask) => pool.run(task), + cleanup, + }; };🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 9-10: src/core/metrics/calculateOutputMetrics.ts#L9-L10
Added lines #L9 - L10 were not covered by tests
[warning] 12-15: src/core/metrics/calculateOutputMetrics.ts#L12-L15
Added lines #L12 - L15 were not covered by tests
[warning] 17-18: src/core/metrics/calculateOutputMetrics.ts#L17-L18
Added lines #L17 - L18 were not covered by testssrc/core/metrics/workers/outputMetricsWorker.ts (1)
29-42
:⚠️ Potential issueAdd error handling for token counting.
Token counting operations should be wrapped in try-catch to handle potential errors gracefully.
Apply this diff:
export default async ({ content, encoding, path }: OutputMetricsTask): Promise<number> => { const processStartAt = process.hrtime.bigint(); const counter = getTokenCounter(encoding); - const tokenCount = counter.countTokens(content, path); + let tokenCount: number; + try { + tokenCount = counter.countTokens(content, path); + } catch (error) { + logger.error(`Failed to count tokens for ${path ?? 'unknown file'}:`, error); + return 0; + }🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 29-32: src/core/metrics/workers/outputMetricsWorker.ts#L29-L32
Added lines #L29 - L32 were not covered by tests
🧹 Nitpick comments (18)
src/core/metrics/workers/fileMetricsWorker.ts (3)
14-25
: Consider double-checked locking pattern for thread safety.While the singleton pattern is appropriate for worker-level caching, the current implementation might have a race condition if multiple calls to
getTokenCounter
happen simultaneously.Consider this thread-safe implementation:
let tokenCounter: TokenCounter | null = null; +let initializationInProgress = false; const getTokenCounter = (encoding: TiktokenEncoding): TokenCounter => { if (!tokenCounter) { + if (initializationInProgress) { + // Wait for initialization to complete + while (initializationInProgress) { + // Busy wait or use proper synchronization mechanism + } + return tokenCounter!; + } + initializationInProgress = true; tokenCounter = new TokenCounter(encoding); + initializationInProgress = false; } return tokenCounter; };
41-50
: Add input validation for file content.The function assumes file content is always present and valid.
Consider adding validation:
export const calculateIndividualFileMetrics = async ( file: ProcessedFile, encoding: TiktokenEncoding, ): Promise<FileMetrics> => { + if (!file.content) { + throw new Error(`No content available for file: ${file.path}`); + } + const charCount = file.content.length; const tokenCounter = getTokenCounter(encoding); const tokenCount = tokenCounter.countTokens(file.content, file.path); return { path: file.path, charCount, tokenCount }; };
1-58
: Great implementation aligned with performance goals!The worker implementation effectively supports the PR's performance optimization objectives by:
- Using a singleton pattern for TokenCounter to avoid redundant initialization
- Implementing proper resource cleanup
- Providing clear metrics calculation logic
The significant performance improvements mentioned in the PR objectives (17x-58x speedup) are well-supported by this implementation.
Consider adding metrics for worker pool utilization and memory usage to help tune the number of workers for optimal performance.
🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 31-31: src/core/metrics/workers/fileMetricsWorker.ts#L31
Added line #L31 was not covered by tests
[warning] 33-36: src/core/metrics/workers/fileMetricsWorker.ts#L33-L36
Added lines #L33 - L36 were not covered by tests
[warning] 38-39: src/core/metrics/workers/fileMetricsWorker.ts#L38-L39
Added lines #L38 - L39 were not covered by tests
[warning] 54-58: src/core/metrics/workers/fileMetricsWorker.ts#L54-L58
Added lines #L54 - L58 were not covered by teststests/integration-tests/packager.test.ts (1)
26-31
: Consider making the mock more configurable.
mockCollectFileInitTaskRunner
is straightforward, but allowing parameters or configurable behavior could facilitate advanced testing scenarios.src/shared/processConcurrency.ts (2)
6-8
: Consider reserving a CPU for system overhead.
Returningos.cpus().length
might saturate the system if all cores are in use. For multi-task environments, subtracting one or two cores can prevent resource contention.
10-24
: Handle extreme edge cases for worker thread limits.
While the logic is reasonable, ifnumOfTasks
is very large or extremely small, you may need additional safeguards or custom scaling rules to prevent under- or over-utilization of threads.src/core/security/validateFileSafety.ts (1)
18-23
: Add robust error handling for security checks.
Currently, any failures insiderunSecurityCheck
could go unhandled. Consider try-catch blocks or fallback logic to handle partial or failed scanning gracefully.tests/core/metrics/calculateAllFileMetrics.test.ts (2)
14-18
: Consider expanding task runner test coverage.The mock implementation is minimal and doesn't test error scenarios or worker pool behavior. Consider adding tests for:
- Task runner initialization failures
- Task execution errors
- Concurrent task execution
28-30
: Add more test cases for edge scenarios.The test only covers a basic scenario. Consider adding test cases for:
- Empty files
- Very large files
- Files with special characters
- Error conditions
Also applies to: 33-34
src/core/file/workers/fileCollectWorker.ts (1)
8-10
: Consider making MAX_FILE_SIZE configurable.The file size limit is hardcoded. Consider making it configurable through the application's configuration system.
src/core/security/workers/securityCheckWorker.ts (2)
34-37
: Enhance error logging for better debugging.Consider including more context in the error log by extracting and logging specific error properties.
} catch (error) { - logger.error(`Error checking security on ${filePath}:`, error); + logger.error(`Error checking security on ${filePath}:`, { + message: error.message, + name: error.name, + stack: error.stack, + }); throw error; }
40-44
: Add input validation for file content.Consider validating the input parameters to ensure robustness.
export const runSecretLint = async ( filePath: string, content: string, config: SecretLintCoreConfig, ): Promise<SecretLintCoreResult> => { + if (!content || typeof content !== 'string') { + throw new Error('Invalid content provided for security check'); + } + if (!filePath || typeof filePath !== 'string') { + throw new Error('Invalid file path provided for security check'); + } const result = await lintSource({tests/core/metrics/calculateMetrics.test.ts (1)
50-53
: Improve mock implementation with explicit typing.The mock implementation could benefit from explicit typing for better maintainability.
const result = await calculateMetrics(processedFiles, output, progressCallback, config, { - calculateAllFileMetrics, - calculateOutputMetrics: () => Promise.resolve(30), + calculateAllFileMetrics: calculateAllFileMetrics as jest.Mock<Promise<typeof fileMetrics>>, + calculateOutputMetrics: async () => 30, });src/core/security/securityCheck.ts (3)
16-28
: Consider implementing worker pool cleanup.The worker pool initialization looks good, but there's no explicit cleanup mechanism. While Piscina has an
idleTimeout
, it's good practice to properly terminate the worker pool after all tasks are completed.Consider adding a cleanup function:
const initTaskRunner = (numOfTasks: number) => { const { minThreads, maxThreads } = getWorkerThreadCount(numOfTasks); logger.trace(`Initializing worker pool with min=${minThreads}, max=${maxThreads} threads`); const pool = new Piscina({ filename: new URL('./workers/securityCheckWorker.js', import.meta.url).href, minThreads, maxThreads, idleTimeout: 5000, }); - return (task: SecurityCheckTask) => pool.run(task); + return { + runTask: (task: SecurityCheckTask) => pool.run(task), + cleanup: async () => { + await pool.destroy(); + } + }; };
53-61
: Consider implementing batch processing for large repositories.While
Promise.all
is efficient, processing all files at once might consume too much memory for very large repositories. Consider implementing batch processing to handle such cases more efficiently.Here's a suggested implementation:
- const results = await Promise.all( - tasks.map((task) => - runTask(task).then((result) => { - completedTasks++; - progressCallback(`Running security check... (${completedTasks}/${totalTasks}) ${pc.dim(task.filePath)}`); - return result; - }), - ), - ); + const BATCH_SIZE = 100; + const results = []; + for (let i = 0; i < tasks.length; i += BATCH_SIZE) { + const batch = tasks.slice(i, i + BATCH_SIZE); + const batchResults = await Promise.all( + batch.map((task) => + runTask(task).then((result) => { + completedTasks++; + progressCallback(`Running security check... (${completedTasks}/${totalTasks}) ${pc.dim(task.filePath)}`); + return result; + }), + ), + ); + results.push(...batchResults); + }
68-70
: Enhance error reporting with more context.The current error handling is basic. Consider providing more detailed error information to help with debugging.
} catch (error) { - logger.error('Error during security check:', error); + const errorContext = { + totalFiles: tasks.length, + completedTasks, + lastProcessedFile: tasks[completedTasks - 1]?.filePath, + error: error instanceof Error ? { + name: error.name, + message: error.message, + stack: error.stack + } : error + }; + logger.error('Error during security check:', errorContext); throw error; }src/core/metrics/calculateAllFileMetrics.ts (1)
33-75
: Consider extracting common worker pool patterns.The implementation shares significant patterns with
securityCheck.ts
. Consider extracting these common patterns into a shared utility.Create a new utility file
src/shared/workerPool.ts
:import { Piscina } from 'piscina'; import { logger } from './logger.js'; import { getWorkerThreadCount } from './processConcurrency.js'; export interface WorkerPoolOptions { filename: string; idleTimeout?: number; } export const createWorkerPool = <T, R>( numOfTasks: number, options: WorkerPoolOptions ) => { const { minThreads, maxThreads } = getWorkerThreadCount(numOfTasks); logger.trace(`Initializing worker pool with min=${minThreads}, max=${maxThreads} threads`); const pool = new Piscina({ filename: options.filename, minThreads, maxThreads, idleTimeout: options.idleTimeout ?? 5000, }); return { runTask: (task: T) => pool.run(task) as Promise<R>, cleanup: async () => { await pool.destroy(); } }; };🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 73-75: src/core/metrics/calculateAllFileMetrics.ts#L73-L75
Added lines #L73 - L75 were not covered by teststests/core/file/fileProcess.test.ts (1)
11-15
: Add error handling test cases.The mock task runner should include tests for worker failures and error propagation.
Add test cases for:
- Worker initialization failures
- Task execution errors
- Cleanup failures
it('should handle worker failures gracefully', async () => { const mockError = new Error('Worker failed'); const failingTaskRunner = () => { return async () => { throw mockError; }; }; const mockRawFiles = [{ path: 'file.js', content: 'content' }]; const config = createMockConfig({}); await expect( processFiles(mockRawFiles, config, () => {}, { initTaskRunner: failingTaskRunner, }) ).rejects.toThrow(mockError); });
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
package-lock.json
is excluded by!**/package-lock.json
📒 Files selected for processing (33)
package.json
(1 hunks)src/cli/actions/remoteAction.ts
(2 hunks)src/core/file/fileCollect.ts
(1 hunks)src/core/file/fileProcess.ts
(1 hunks)src/core/file/workers/fileCollectWorker.ts
(1 hunks)src/core/file/workers/fileProcessWorker.ts
(1 hunks)src/core/metrics/aggregateMetrics.ts
(0 hunks)src/core/metrics/calculateAllFileMetrics.ts
(1 hunks)src/core/metrics/calculateIndividualFileMetrics.ts
(0 hunks)src/core/metrics/calculateMetrics.ts
(2 hunks)src/core/metrics/calculateOutputMetrics.ts
(1 hunks)src/core/metrics/workers/fileMetricsWorker.ts
(1 hunks)src/core/metrics/workers/outputMetricsWorker.ts
(1 hunks)src/core/metrics/workers/types.ts
(1 hunks)src/core/packager.ts
(1 hunks)src/core/security/runSecurityCheckIfEnabled.ts
(0 hunks)src/core/security/securityCheck.ts
(1 hunks)src/core/security/validateFileSafety.ts
(1 hunks)src/core/security/workers/securityCheckWorker.ts
(1 hunks)src/core/tokenCount/tokenCount.ts
(2 hunks)src/shared/processConcurrency.ts
(1 hunks)tests/cli/actions/remoteAction.test.ts
(2 hunks)tests/core/file/fileCollect.test.ts
(6 hunks)tests/core/file/fileProcess.test.ts
(9 hunks)tests/core/metrics/aggregateMetrics.test.ts
(0 hunks)tests/core/metrics/calculateAllFileMetrics.test.ts
(1 hunks)tests/core/metrics/calculateIndividualFileMetrics.test.ts
(0 hunks)tests/core/metrics/calculateMetrics.test.ts
(1 hunks)tests/core/packager.test.ts
(1 hunks)tests/core/security/runSecurityCheckIfEnabled.test.ts
(0 hunks)tests/core/security/securityCheck.test.ts
(1 hunks)tests/core/security/validateFileSafety.test.ts
(1 hunks)tests/integration-tests/packager.test.ts
(2 hunks)
💤 Files with no reviewable changes (6)
- src/core/metrics/aggregateMetrics.ts
- tests/core/metrics/aggregateMetrics.test.ts
- src/core/security/runSecurityCheckIfEnabled.ts
- src/core/metrics/calculateIndividualFileMetrics.ts
- tests/core/metrics/calculateIndividualFileMetrics.test.ts
- tests/core/security/runSecurityCheckIfEnabled.test.ts
🚧 Files skipped from review as they are similar to previous changes (9)
- package.json
- tests/core/security/securityCheck.test.ts
- src/core/metrics/workers/types.ts
- tests/cli/actions/remoteAction.test.ts
- src/core/tokenCount/tokenCount.ts
- src/cli/actions/remoteAction.ts
- tests/core/security/validateFileSafety.test.ts
- tests/core/packager.test.ts
- src/core/file/workers/fileProcessWorker.ts
🧰 Additional context used
🪛 GitHub Check: codecov/patch
src/core/metrics/calculateOutputMetrics.ts
[warning] 9-10: src/core/metrics/calculateOutputMetrics.ts#L9-L10
Added lines #L9 - L10 were not covered by tests
[warning] 12-15: src/core/metrics/calculateOutputMetrics.ts#L12-L15
Added lines #L12 - L15 were not covered by tests
[warning] 17-18: src/core/metrics/calculateOutputMetrics.ts#L17-L18
Added lines #L17 - L18 were not covered by tests
[warning] 21-28: src/core/metrics/calculateOutputMetrics.ts#L21-L28
Added lines #L21 - L28 were not covered by tests
[warning] 30-32: src/core/metrics/calculateOutputMetrics.ts#L30-L32
Added lines #L30 - L32 were not covered by tests
[warning] 34-34: src/core/metrics/calculateOutputMetrics.ts#L34
Added line #L34 was not covered by tests
[warning] 36-38: src/core/metrics/calculateOutputMetrics.ts#L36-L38
Added lines #L36 - L38 were not covered by tests
[warning] 40-45: src/core/metrics/calculateOutputMetrics.ts#L40-L45
Added lines #L40 - L45 were not covered by tests
src/core/metrics/calculateAllFileMetrics.ts
[warning] 17-18: src/core/metrics/calculateAllFileMetrics.ts#L17-L18
Added lines #L17 - L18 were not covered by tests
[warning] 20-25: src/core/metrics/calculateAllFileMetrics.ts#L20-L25
Added lines #L20 - L25 were not covered by tests
[warning] 27-28: src/core/metrics/calculateAllFileMetrics.ts#L27-L28
Added lines #L27 - L28 were not covered by tests
[warning] 73-75: src/core/metrics/calculateAllFileMetrics.ts#L73-L75
Added lines #L73 - L75 were not covered by tests
src/core/metrics/workers/outputMetricsWorker.ts
[warning] 1-1: src/core/metrics/workers/outputMetricsWorker.ts#L1
Added line #L1 was not covered by tests
[warning] 4-5: src/core/metrics/workers/outputMetricsWorker.ts#L4-L5
Added lines #L4 - L5 were not covered by tests
[warning] 14-14: src/core/metrics/workers/outputMetricsWorker.ts#L14
Added line #L14 was not covered by tests
[warning] 19-24: src/core/metrics/workers/outputMetricsWorker.ts#L19-L24
Added lines #L19 - L24 were not covered by tests
[warning] 29-32: src/core/metrics/workers/outputMetricsWorker.ts#L29-L32
Added lines #L29 - L32 were not covered by tests
src/core/metrics/workers/fileMetricsWorker.ts
[warning] 31-31: src/core/metrics/workers/fileMetricsWorker.ts#L31
Added line #L31 was not covered by tests
[warning] 33-36: src/core/metrics/workers/fileMetricsWorker.ts#L33-L36
Added lines #L33 - L36 were not covered by tests
[warning] 38-39: src/core/metrics/workers/fileMetricsWorker.ts#L38-L39
Added lines #L38 - L39 were not covered by tests
[warning] 54-58: src/core/metrics/workers/fileMetricsWorker.ts#L54-L58
Added lines #L54 - L58 were not covered by tests
src/core/file/workers/fileCollectWorker.ts
[warning] 56-57: src/core/file/workers/fileCollectWorker.ts#L56-L57
Added lines #L56 - L57 were not covered by tests
src/core/file/fileProcess.ts
[warning] 11-12: src/core/file/fileProcess.ts#L11-L12
Added lines #L11 - L12 were not covered by tests
[warning] 14-19: src/core/file/fileProcess.ts#L14-L19
Added lines #L14 - L19 were not covered by tests
[warning] 21-22: src/core/file/fileProcess.ts#L21-L22
Added lines #L21 - L22 were not covered by tests
[warning] 64-65: src/core/file/fileProcess.ts#L64-L65
Added lines #L64 - L65 were not covered by tests
src/core/file/fileCollect.ts
[warning] 10-11: src/core/file/fileCollect.ts#L10-L11
Added lines #L10 - L11 were not covered by tests
[warning] 13-18: src/core/file/fileCollect.ts#L13-L18
Added lines #L13 - L18 were not covered by tests
[warning] 20-20: src/core/file/fileCollect.ts#L20
Added line #L20 was not covered by tests
[warning] 64-65: src/core/file/fileCollect.ts#L64-L65
Added lines #L64 - L65 were not covered by tests
⏰ Context from checks skipped due to timeout of 90000ms (17)
- GitHub Check: Build and run (macos-latest, 18.x)
- GitHub Check: Build and run (macos-latest, 18.0.0)
- GitHub Check: Test (macos-latest, 22.x)
- GitHub Check: Test (windows-latest, 23.x)
- GitHub Check: Test (windows-latest, 22.x)
- GitHub Check: Build and run (windows-latest, 18.0.0)
- GitHub Check: Test (windows-latest, 21.x)
- GitHub Check: Test (windows-latest, 20.x)
- GitHub Check: Test (windows-latest, 19.x)
- GitHub Check: Test (windows-latest, 18.x)
- GitHub Check: Build and run (ubuntu-latest, 20.x)
- GitHub Check: Test (windows-latest, 18.0.0)
- GitHub Check: Build and run (ubuntu-latest, 19.x)
- GitHub Check: Build and run (ubuntu-latest, 18.x)
- GitHub Check: Build and run (ubuntu-latest, 18.0.0)
- GitHub Check: Analyze (javascript-typescript)
- GitHub Check: Cloudflare Pages
🔇 Additional comments (18)
src/core/metrics/workers/fileMetricsWorker.ts (3)
1-12
: LGTM! Clean imports and well-defined interface.The imports are properly typed and the
FileMetricsTask
interface clearly defines the contract for worker tasks.
31-39
: Add test coverage for the worker function.The worker function lacks test coverage according to static analysis.
Could you add tests for:
- Successful metric calculation
- Performance measurement logging
- Error handling scenarios
Would you like me to help generate the test cases?
🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 31-31: src/core/metrics/workers/fileMetricsWorker.ts#L31
Added line #L31 was not covered by tests
[warning] 33-36: src/core/metrics/workers/fileMetricsWorker.ts#L33-L36
Added lines #L33 - L36 were not covered by tests
[warning] 38-39: src/core/metrics/workers/fileMetricsWorker.ts#L38-L39
Added lines #L38 - L39 were not covered by tests
52-58
: Add test coverage for cleanup handler.The cleanup handler lacks test coverage according to static analysis. This is critical for ensuring proper resource management.
Could you add tests to verify:
- TokenCounter is properly freed
- TokenCounter reference is nullified
- Multiple cleanup calls are handled safely
Would you like me to help generate the test cases?
🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 54-58: src/core/metrics/workers/fileMetricsWorker.ts#L54-L58
Added lines #L54 - L58 were not covered by teststests/integration-tests/packager.test.ts (5)
8-19
: Imports align with the new concurrency modules.
These added imports appear consistent and support the PR's objective of modularizing file collection and processing.
70-76
: Good approach for injective testing.
Overriding thecollectFiles
function with a custom init runner is a clean pattern for integration tests, allowing you to isolate each stage.
77-83
: Parallelize file processing to fully realize performance gains.
This sequential loop may reintroduce a bottleneck in large repositories. Using a worker pool for parallel processing aligns better with your PR’s performance objectives.
84-90
: Ensure comprehensive security test coverage.
Here,validateFileSafety
is partially mocked by returning an empty array. Verify real-world security checks in a separate test to confirm detection logic.
91-114
: Metric calculation logic is concise and performant.
Aggregating file statistics in one pass is a clean approach. Consider potential edge cases (e.g., empty or binary files) and verify that the aggregator handles them gracefully.src/core/security/validateFileSafety.ts (1)
6-6
: Transition fromrunSecurityCheckIfEnabled
is consistent.
Replacing the conditional wrapper with a directrunSecurityCheck
import simplifies the code. Ensure external references are likewise updated.Also applies to: 14-14
src/core/metrics/workers/outputMetricsWorker.ts (1)
44-50
: LGTM: Proper cleanup implementation.The cleanup on worker termination is well implemented, ensuring proper resource management.
src/core/metrics/calculateMetrics.ts (1)
27-30
: LGTM: Excellent performance optimization.The concurrent execution of file metrics and output metrics calculations using Promise.all is a great performance improvement.
src/core/file/workers/fileCollectWorker.ts (1)
31-68
: LGTM: Robust file reading implementation.The implementation includes:
- Proper error handling
- Binary file detection
- Encoding detection
- Detailed logging
- Clear warning messages for large files
🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 56-57: src/core/file/workers/fileCollectWorker.ts#L56-L57
Added lines #L56 - L57 were not covered by testssrc/core/security/workers/securityCheckWorker.ts (1)
1-9
: LGTM! Clean interface definition and appropriate imports.src/core/file/fileProcess.ts (1)
63-65
: Consider partial results for recoverable errors.Your error logging and subsequent re-throw provide visibility into failures, but consider whether partial results should be returned or saved in the event of recoverable errors. This would allow the system to skip problematic files rather than fail the entire batch.
🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 64-65: src/core/file/fileProcess.ts#L64-L65
Added lines #L64 - L65 were not covered by testssrc/core/packager.ts (1)
41-41
: LGTM! Clean implementation of progress tracking.The progress callback integration is well-structured and consistent across all operations.
src/core/file/fileCollect.ts (1)
62-65
: Add test coverage for error scenarios.The error handling block needs test coverage. Add test cases for:
- File read failures
- Worker pool initialization failures
- Edge cases with empty file lists
🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 64-65: src/core/file/fileCollect.ts#L64-L65
Added lines #L64 - L65 were not covered by testssrc/core/metrics/calculateAllFileMetrics.ts (1)
16-28
: Add test coverage for worker pool initialization.The static analysis indicates that the worker pool initialization code is not covered by tests.
Consider adding the following test cases:
- Verify correct thread count calculation
- Verify worker pool creation with correct parameters
- Test error handling during initialization
🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 17-18: src/core/metrics/calculateAllFileMetrics.ts#L17-L18
Added lines #L17 - L18 were not covered by tests
[warning] 20-25: src/core/metrics/calculateAllFileMetrics.ts#L20-L25
Added lines #L20 - L25 were not covered by tests
[warning] 27-28: src/core/metrics/calculateAllFileMetrics.ts#L27-L28
Added lines #L27 - L28 were not covered by teststests/core/file/fileCollect.test.ts (1)
20-24
: Enhance test coverage for concurrent task processing.The mock task runner processes tasks sequentially. Consider adding tests for concurrent processing scenarios.
const mockInitTaskRunner = () => { + const processingTasks = new Set(); return async (task: FileCollectTask) => { + if (processingTasks.size > 0) { + logger.trace(`Processing ${processingTasks.size} concurrent tasks`); + } + processingTasks.add(task); + try { return await fileCollectWorker(task); + } finally { + processingTasks.delete(task); + } }; };
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (1)
.github/workflows/ci.yml (1)
Line range hint
1-140
: Consider workflow optimization for performance improvementsSince this PR focuses on performance optimization, consider these workflow improvements:
- Use dependency caching more effectively
- Implement job dependencies to avoid redundant work
- Consider using matrix includes/excludes for specific version combinations
Would you like me to provide a detailed optimization plan for the workflow?
🧰 Tools
🪛 actionlint (1.7.4)
13-13: label "ubuntu-latest-arm" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file
(runner-label)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
.github/workflows/ci.yml
(9 hunks)
🧰 Additional context used
🪛 actionlint (1.7.4)
.github/workflows/ci.yml
13-13: label "ubuntu-latest-arm" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file
(runner-label)
26-26: label "ubuntu-latest-arm" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file
(runner-label)
39-39: label "ubuntu-latest-arm" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file
(runner-label)
52-52: label "ubuntu-latest-arm" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file
(runner-label)
61-61: label "ubuntu-latest-arm" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file
(runner-label)
72-72: label "ubuntu-latest-arm" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file
(runner-label)
81-81: label "ubuntu-latest-arm" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file
(runner-label)
97-97: label "ubuntu-latest-arm" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file
(runner-label)
123-123: label "ubuntu-latest-arm" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file
(runner-label)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: Build and run (windows-latest, 22.x)
- GitHub Check: Analyze (javascript-typescript)
- GitHub Check: Cloudflare Pages
🔇 Additional comments (1)
.github/workflows/ci.yml (1)
81-82
: Verify Node.js ARM compatibilityThe test matrix includes Node.js versions up to 23.x. Please ensure all these versions are available and fully compatible with ARM architecture.
✅ Verification successful
Node.js ARM compatibility verified successfully
All Node.js versions specified in the test matrix (18.0.0 through 23.x) provide official ARM64 builds and are fully compatible with the ARM architecture.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Check Node.js ARM compatibility for specified versions # Fetch Node.js release data curl -s https://nodejs.org/dist/index.json | jq -r '.[] | select(.version | startswith("v18") or startswith("v19") or startswith("v20") or startswith("v21") or startswith("v22") or startswith("v23")) | {version: .version, arm64: (.files | any(contains("arm64")))} | select(.arm64 == false) | .version'Length of output: 303
Script:
#!/bin/bash # Fetch Node.js release data and check ARM64 availability echo "Checking Node.js 18.0.0 specifically:" curl -s https://nodejs.org/dist/v18.0.0/SHASUMS256.txt | grep -i 'arm64\|aarch64' || echo "No ARM64 builds found for 18.0.0" echo -e "\nChecking latest versions:" for version in 18 19 20 21 22 23; do latest=$(curl -s https://nodejs.org/dist/index.json | jq -r ".[] | select(.version | startswith(\"v$version\")) | .version" | head -n1) if [ ! -z "$latest" ]; then echo "Version $latest:" curl -s "https://nodejs.org/dist/$latest/SHASUMS256.txt" | grep -i 'arm64\|aarch64' || echo "No ARM64 builds found" fi doneLength of output: 8093
🧰 Tools
🪛 actionlint (1.7.4)
81-81: label "ubuntu-latest-arm" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2022", "windows-2019", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-22.04", "ubuntu-20.04", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "macos-12-xl", "macos-12-xlarge", "macos-12-large", "macos-12", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file
(runner-label)
69d8798
to
37e2a9d
Compare
37e2a9d
to
ebacdd9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 6
♻️ Duplicate comments (3)
tests/integration-tests/packager.test.ts (1)
77-83
: 🛠️ Refactor suggestionImplement parallel file processing for better performance.
The current implementation processes files sequentially, which could be a performance bottleneck for large repositories. This contradicts the PR's performance optimization goals.
Consider using Promise.all for parallel processing:
processFiles: async (rawFiles, config, progressCallback) => { - const processedFiles: ProcessedFile[] = []; - for (const rawFile of rawFiles) { - processedFiles.push(await fileProcessWorker({ rawFile, config })); - } - return processedFiles; + return Promise.all( + rawFiles.map(rawFile => fileProcessWorker({ rawFile, config })) + ); },src/core/file/fileProcess.ts (1)
10-22
:⚠️ Potential issueAdd worker pool cleanup mechanism.
The worker pool should be properly cleaned up to prevent resource leaks.
Apply this diff:
const initTaskRunner = (numOfTasks: number) => { const { minThreads, maxThreads } = getWorkerThreadCount(numOfTasks); logger.trace(`Initializing worker pool with min=${minThreads}, max=${maxThreads} threads`); const pool = new Piscina({ filename: new URL('./workers/fileProcessWorker.js', import.meta.url).href, minThreads, maxThreads, idleTimeout: 5000, }); - return (task: FileProcessTask) => pool.run(task); + return { + run: (task: FileProcessTask) => pool.run(task), + cleanup: async () => { + await pool.destroy(); + logger.trace('Worker pool destroyed'); + } + }; };🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 11-12: src/core/file/fileProcess.ts#L11-L12
Added lines #L11 - L12 were not covered by tests
[warning] 14-19: src/core/file/fileProcess.ts#L14-L19
Added lines #L14 - L19 were not covered by tests
[warning] 21-22: src/core/file/fileProcess.ts#L21-L22
Added lines #L21 - L22 were not covered by testssrc/core/file/fileCollect.ts (1)
62-65
: 🛠️ Refactor suggestionAlign error handling with fileProcess.ts.
For consistency, implement the same error handling pattern as suggested for fileProcess.ts.
🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 64-65: src/core/file/fileCollect.ts#L64-L65
Added lines #L64 - L65 were not covered by tests
🧹 Nitpick comments (16)
src/core/metrics/workers/fileMetricsWorker.ts (3)
14-22
: Consider implementing a thread-safe singleton pattern.While the singleton pattern is appropriate here for resource management, the current implementation might be susceptible to race conditions in a multi-threaded environment. Although Piscina workers run in separate threads, it's a good practice to implement thread-safe patterns.
Consider using a more robust singleton implementation:
-// Worker-level singleton for TokenCounter -let tokenCounter: TokenCounter | null = null; - -const getTokenCounter = (encoding: TiktokenEncoding): TokenCounter => { - if (!tokenCounter) { - tokenCounter = new TokenCounter(encoding); - } - return tokenCounter; -}; +// Worker-level singleton for TokenCounter +let tokenCounter: TokenCounter | null = null; +let tokenCounterEncoding: TiktokenEncoding | null = null; + +const getTokenCounter = (encoding: TiktokenEncoding): TokenCounter => { + if (!tokenCounter || tokenCounterEncoding !== encoding) { + if (tokenCounter) { + tokenCounter.free(); + } + tokenCounter = new TokenCounter(encoding); + tokenCounterEncoding = encoding; + } + return tokenCounter; +};This implementation also handles cases where the encoding type changes between calls.
35-44
: Add input validation for file content.Consider adding validation for the file content to handle edge cases gracefully.
Consider adding these validations:
export const calculateIndividualFileMetrics = async ( file: ProcessedFile, encoding: TiktokenEncoding, ): Promise<FileMetrics> => { + if (!file.content) { + return { path: file.path, charCount: 0, tokenCount: 0 }; + } + const charCount = file.content.length; const tokenCounter = getTokenCounter(encoding); const tokenCount = tokenCounter.countTokens(file.content, file.path); return { path: file.path, charCount, tokenCount }; };
46-52
: Enhance worker cleanup handling.While the current cleanup is good, consider these improvements:
- Add test coverage for the cleanup handler
- Handle additional process events for more robust cleanup
Consider enhancing the cleanup:
// Cleanup when worker is terminated -process.on('exit', () => { +const cleanup = () => { if (tokenCounter) { tokenCounter.free(); tokenCounter = null; } -}); +}; + +// Handle various termination scenarios +process.on('exit', cleanup); +process.on('SIGINT', cleanup); +process.on('SIGTERM', cleanup); +process.on('uncaughtException', (error) => { + logger.error('Uncaught exception:', error); + cleanup(); + process.exit(1); +});🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 48-52: src/core/metrics/workers/fileMetricsWorker.ts#L48-L52
Added lines #L48 - L52 were not covered by teststests/integration-tests/packager.test.ts (2)
26-30
: Add JSDoc documentation for the mock function.Consider adding JSDoc documentation to explain the purpose and usage of this mock function.
+/** + * Creates a mock task runner for file collection testing. + * @returns An async function that executes fileCollectWorker with the given task + */ const mockCollectFileInitTaskRunner = () => { return async (task: FileCollectTask) => { return await fileCollectWorker(task); }; };
93-113
: Simplify metrics calculation using array methods.The metrics calculation can be simplified using array methods for better readability and maintainability.
calculateMetrics: async (processedFiles, output, progressCallback, config) => { return { totalFiles: processedFiles.length, totalCharacters: processedFiles.reduce((acc, file) => acc + file.content.length, 0), totalTokens: processedFiles.reduce((acc, file) => acc + file.content.split(/\s+/).length, 0), - fileCharCounts: processedFiles.reduce( - (acc, file) => { - acc[file.path] = file.content.length; - return acc; - }, - {} as Record<string, number>, - ), - fileTokenCounts: processedFiles.reduce( - (acc, file) => { - acc[file.path] = file.content.split(/\s+/).length; - return acc; - }, - {} as Record<string, number>, - ), + fileCharCounts: Object.fromEntries( + processedFiles.map(file => [file.path, file.content.length]) + ), + fileTokenCounts: Object.fromEntries( + processedFiles.map(file => [file.path, file.content.split(/\s+/).length]) + ), }; },src/core/security/workers/securityCheckWorker.ts (2)
31-34
: Enhance error logging for better debugging.Consider adding more context to the error log by including the error type and stack trace.
} catch (error) { - logger.error(`Error checking security on ${filePath}:`, error); + logger.error( + `Error checking security on ${filePath}:`, + error instanceof Error ? { type: error.constructor.name, message: error.message, stack: error.stack } : error + ); throw error; }
62-69
: Add configuration logging for better transparency.Consider logging the SecretLint configuration for better debugging and transparency.
export const createSecretLintConfig = (): SecretLintCoreConfig => ({ rules: [ { id: '@secretlint/secretlint-rule-preset-recommend', rule: creator, }, ], }); +logger.trace('SecretLint configuration:', createSecretLintConfig());
src/core/file/workers/fileCollectWorker.ts (3)
8-10
: Consider making MAX_FILE_SIZE configurable.While 50MB is a reasonable default, consider making this limit configurable through the application's configuration to accommodate different use cases.
-// Maximum file size to process (50MB) -// This prevents out-of-memory errors when processing very large files -export const MAX_FILE_SIZE = 50 * 1024 * 1024; +// Maximum file size to process (defaults to 50MB) +// This prevents out-of-memory errors when processing very large files +export const MAX_FILE_SIZE = Number(process.env.REPOMIX_MAX_FILE_SIZE) || 50 * 1024 * 1024;
37-42
: Enhance the large file warning message.The warning message could be more actionable by providing the exact command to add the file to .repomixignore.
logger.log(''); logger.log('⚠️ Large File Warning:'); logger.log('──────────────────────'); logger.log(`File exceeds size limit: ${sizeMB}MB > ${MAX_FILE_SIZE / 1024 / 1024}MB (${filePath})`); - logger.note('Add this file to .repomixignore if you want to exclude it permanently'); + logger.note(`Run 'echo "${filePath}" >> .repomixignore' to exclude this file permanently`); logger.log('');
60-62
: Log encoding detection fallback.Add logging when falling back to utf-8 encoding to help diagnose potential encoding issues.
- const encoding = jschardet.detect(buffer).encoding || 'utf-8'; + const detected = jschardet.detect(buffer); + const encoding = detected.encoding || 'utf-8'; + if (!detected.encoding) { + logger.debug(`No encoding detected for ${filePath}, falling back to utf-8`); + } const content = iconv.decode(buffer, encoding);src/core/security/securityCheck.ts (2)
16-28
: Consider increasing the idle timeout for large repositories.The current
idleTimeout
of 5000ms might be too short for processing large repositories, potentially leading to premature worker termination between batches. This could cause unnecessary overhead from worker pool recreation.const pool = new Piscina({ filename: new URL('./workers/securityCheckWorker.js', import.meta.url).href, minThreads, maxThreads, - idleTimeout: 5000, + idleTimeout: 30000, // 30 seconds for better handling of large repositories });
53-61
: Consider implementing batch processing for large repositories.While the current implementation efficiently processes files in parallel, for very large repositories, processing all files simultaneously might consume excessive memory. Consider implementing batch processing to better control resource usage.
- const results = await Promise.all( - tasks.map((task) => - runTask(task).then((result) => { - completedTasks++; - progressCallback(`Running security check... (${completedTasks}/${totalTasks}) ${pc.dim(task.filePath)}`); - return result; - }), - ), - ); + const batchSize = 100; // Adjust based on available system resources + const results = []; + for (let i = 0; i < tasks.length; i += batchSize) { + const batch = tasks.slice(i, i + batchSize); + const batchResults = await Promise.all( + batch.map((task) => + runTask(task).then((result) => { + completedTasks++; + progressCallback(`Running security check... (${completedTasks}/${totalTasks}) ${pc.dim(task.filePath)}`); + return result; + }), + ), + ); + results.push(...batchResults); + }tests/core/file/fileCollect.test.ts (1)
20-24
: Enhance test coverage for concurrent file collection.The current
mockInitTaskRunner
processes tasks sequentially. Consider adding tests that verify concurrent file collection behavior.const mockInitTaskRunner = () => { + const activeTasks = new Set(); return async (task: FileCollectTask) => { + activeTasks.add(task); + if (activeTasks.size > 1) { + logger.debug(`Processing ${activeTasks.size} concurrent tasks`); + } const result = await fileCollectWorker(task); + activeTasks.delete(task); return result; }; };src/core/file/fileProcess.ts (2)
41-65
: Enhance error handling with specific error types.The current error handling is too generic. Consider:
- Creating specific error types for different failure scenarios
- Adding retry logic for transient failures
- Including more context in error messages
Example implementation:
+ class FileProcessingError extends Error { + constructor(message: string, public readonly cause?: unknown) { + super(message); + this.name = 'FileProcessingError'; + } + } try { // ... existing code ... } catch (error) { - logger.error('Error during file processing:', error); - throw error; + logger.error('Error during file processing:', { + error, + taskCount: tasks.length, + completedTasks + }); + throw new FileProcessingError('File processing failed', error); }🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 64-65: src/core/file/fileProcess.ts#L64-L65
Added lines #L64 - L65 were not covered by tests
32-56
: Consider implementing batch processing for better performance.Instead of processing all files at once, consider implementing batch processing to:
- Reduce memory usage
- Provide more granular progress updates
- Allow for better error recovery
Would you like me to provide an example implementation of batch processing?
src/core/file/fileCollect.ts (1)
47-56
: Improve progress tracking with estimated time remaining.The current progress updates could be enhanced with time estimates.
Example implementation:
const startTime = process.hrtime.bigint(); let completedTasks = 0; const totalTasks = tasks.length; const updateProgress = (task: FileCollectTask) => { completedTasks++; const elapsed = Number(process.hrtime.bigint() - startTime) / 1e9; const rate = completedTasks / elapsed; const remaining = (totalTasks - completedTasks) / rate; progressCallback( `Collect file... (${completedTasks}/${totalTasks}, ~${remaining.toFixed(1)}s remaining) ${pc.dim(task.filePath)}` ); logger.trace(`Collect files... (${completedTasks}/${totalTasks}) ${task.filePath}`); }; const results = await Promise.all( tasks.map((task) => runTask(task).then((result) => { updateProgress(task); return result; }), ), );
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (30)
src/cli/actions/remoteAction.ts
(2 hunks)src/core/file/fileCollect.ts
(1 hunks)src/core/file/fileProcess.ts
(1 hunks)src/core/file/workers/fileCollectWorker.ts
(1 hunks)src/core/file/workers/fileProcessWorker.ts
(1 hunks)src/core/metrics/aggregateMetrics.ts
(0 hunks)src/core/metrics/calculateAllFileMetrics.ts
(1 hunks)src/core/metrics/calculateIndividualFileMetrics.ts
(0 hunks)src/core/metrics/calculateMetrics.ts
(2 hunks)src/core/metrics/calculateOutputMetrics.ts
(1 hunks)src/core/metrics/workers/fileMetricsWorker.ts
(1 hunks)src/core/metrics/workers/outputMetricsWorker.ts
(1 hunks)src/core/metrics/workers/types.ts
(1 hunks)src/core/packager.ts
(1 hunks)src/core/security/runSecurityCheckIfEnabled.ts
(0 hunks)src/core/security/securityCheck.ts
(1 hunks)src/core/security/validateFileSafety.ts
(1 hunks)src/core/security/workers/securityCheckWorker.ts
(1 hunks)src/shared/processConcurrency.ts
(1 hunks)tests/cli/actions/remoteAction.test.ts
(2 hunks)tests/core/file/fileCollect.test.ts
(6 hunks)tests/core/file/fileProcess.test.ts
(9 hunks)tests/core/metrics/aggregateMetrics.test.ts
(0 hunks)tests/core/metrics/calculateAllFileMetrics.test.ts
(1 hunks)tests/core/metrics/calculateIndividualFileMetrics.test.ts
(0 hunks)tests/core/metrics/calculateMetrics.test.ts
(1 hunks)tests/core/packager.test.ts
(1 hunks)tests/core/security/runSecurityCheckIfEnabled.test.ts
(0 hunks)tests/core/security/validateFileSafety.test.ts
(1 hunks)tests/integration-tests/packager.test.ts
(2 hunks)
💤 Files with no reviewable changes (6)
- tests/core/metrics/aggregateMetrics.test.ts
- src/core/metrics/aggregateMetrics.ts
- tests/core/metrics/calculateIndividualFileMetrics.test.ts
- src/core/security/runSecurityCheckIfEnabled.ts
- src/core/metrics/calculateIndividualFileMetrics.ts
- tests/core/security/runSecurityCheckIfEnabled.test.ts
🚧 Files skipped from review as they are similar to previous changes (8)
- src/core/metrics/workers/types.ts
- src/cli/actions/remoteAction.ts
- tests/cli/actions/remoteAction.test.ts
- src/core/file/workers/fileProcessWorker.ts
- src/core/security/validateFileSafety.ts
- tests/core/security/validateFileSafety.test.ts
- tests/core/packager.test.ts
- src/core/packager.ts
🧰 Additional context used
🪛 GitHub Check: codecov/patch
src/core/metrics/workers/fileMetricsWorker.ts
[warning] 25-30: src/core/metrics/workers/fileMetricsWorker.ts#L25-L30
Added lines #L25 - L30 were not covered by tests
[warning] 32-33: src/core/metrics/workers/fileMetricsWorker.ts#L32-L33
Added lines #L32 - L33 were not covered by tests
[warning] 48-52: src/core/metrics/workers/fileMetricsWorker.ts#L48-L52
Added lines #L48 - L52 were not covered by tests
src/core/metrics/calculateAllFileMetrics.ts
[warning] 12-13: src/core/metrics/calculateAllFileMetrics.ts#L12-L13
Added lines #L12 - L13 were not covered by tests
[warning] 15-20: src/core/metrics/calculateAllFileMetrics.ts#L15-L20
Added lines #L15 - L20 were not covered by tests
[warning] 22-23: src/core/metrics/calculateAllFileMetrics.ts#L22-L23
Added lines #L22 - L23 were not covered by tests
[warning] 65-67: src/core/metrics/calculateAllFileMetrics.ts#L65-L67
Added lines #L65 - L67 were not covered by tests
src/core/file/fileCollect.ts
[warning] 10-11: src/core/file/fileCollect.ts#L10-L11
Added lines #L10 - L11 were not covered by tests
[warning] 13-18: src/core/file/fileCollect.ts#L13-L18
Added lines #L13 - L18 were not covered by tests
[warning] 20-20: src/core/file/fileCollect.ts#L20
Added line #L20 was not covered by tests
[warning] 64-65: src/core/file/fileCollect.ts#L64-L65
Added lines #L64 - L65 were not covered by tests
src/core/file/fileProcess.ts
[warning] 11-12: src/core/file/fileProcess.ts#L11-L12
Added lines #L11 - L12 were not covered by tests
[warning] 14-19: src/core/file/fileProcess.ts#L14-L19
Added lines #L14 - L19 were not covered by tests
[warning] 21-22: src/core/file/fileProcess.ts#L21-L22
Added lines #L21 - L22 were not covered by tests
[warning] 64-65: src/core/file/fileProcess.ts#L64-L65
Added lines #L64 - L65 were not covered by tests
src/core/metrics/workers/outputMetricsWorker.ts
[warning] 1-3: src/core/metrics/workers/outputMetricsWorker.ts#L1-L3
Added lines #L1 - L3 were not covered by tests
[warning] 12-12: src/core/metrics/workers/outputMetricsWorker.ts#L12
Added line #L12 was not covered by tests
[warning] 14-19: src/core/metrics/workers/outputMetricsWorker.ts#L14-L19
Added lines #L14 - L19 were not covered by tests
[warning] 21-24: src/core/metrics/workers/outputMetricsWorker.ts#L21-L24
Added lines #L21 - L24 were not covered by tests
[warning] 26-31: src/core/metrics/workers/outputMetricsWorker.ts#L26-L31
Added lines #L26 - L31 were not covered by tests
[warning] 33-34: src/core/metrics/workers/outputMetricsWorker.ts#L33-L34
Added lines #L33 - L34 were not covered by tests
src/core/file/workers/fileCollectWorker.ts
[warning] 56-57: src/core/file/workers/fileCollectWorker.ts#L56-L57
Added lines #L56 - L57 were not covered by tests
src/core/metrics/calculateOutputMetrics.ts
[warning] 9-10: src/core/metrics/calculateOutputMetrics.ts#L9-L10
Added lines #L9 - L10 were not covered by tests
[warning] 12-15: src/core/metrics/calculateOutputMetrics.ts#L12-L15
Added lines #L12 - L15 were not covered by tests
[warning] 17-18: src/core/metrics/calculateOutputMetrics.ts#L17-L18
Added lines #L17 - L18 were not covered by tests
[warning] 21-28: src/core/metrics/calculateOutputMetrics.ts#L21-L28
Added lines #L21 - L28 were not covered by tests
[warning] 30-32: src/core/metrics/calculateOutputMetrics.ts#L30-L32
Added lines #L30 - L32 were not covered by tests
[warning] 34-34: src/core/metrics/calculateOutputMetrics.ts#L34
Added line #L34 was not covered by tests
[warning] 36-38: src/core/metrics/calculateOutputMetrics.ts#L36-L38
Added lines #L36 - L38 were not covered by tests
[warning] 40-45: src/core/metrics/calculateOutputMetrics.ts#L40-L45
Added lines #L40 - L45 were not covered by tests
⏰ Context from checks skipped due to timeout of 90000ms (8)
- GitHub Check: Build and run (windows-latest, 23.x)
- GitHub Check: Build and run (windows-latest, 22.x)
- GitHub Check: Build and run (windows-latest, 21.x)
- GitHub Check: Build and run (windows-latest, 20.x)
- GitHub Check: Build and run (windows-latest, 19.x)
- GitHub Check: Build and run (windows-latest, 18.x)
- GitHub Check: Build and run (windows-latest, 18.0.0)
- GitHub Check: Cloudflare Pages
🔇 Additional comments (18)
src/core/metrics/workers/fileMetricsWorker.ts (3)
1-12
: LGTM! Well-structured imports and interface definition.The imports are properly typed and the
FileMetricsTask
interface is well-defined with clear properties.
24-33
: Add test coverage for the main worker function.The static analysis indicates that this section lacks test coverage. Consider adding tests for:
- Successful metric calculation
- Performance timing measurement
- Error scenarios
Would you like me to help generate test cases for this function?
🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 25-30: src/core/metrics/workers/fileMetricsWorker.ts#L25-L30
Added lines #L25 - L30 were not covered by tests
[warning] 32-33: src/core/metrics/workers/fileMetricsWorker.ts#L32-L33
Added lines #L32 - L33 were not covered by tests
1-52
: Great implementation of the worker thread functionality! 🚀The implementation aligns perfectly with the PR's performance optimization goals, as evidenced by the impressive improvements in processing times across various repositories. The code is well-structured, properly typed, and includes essential features like resource management and performance measurements.
🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 25-30: src/core/metrics/workers/fileMetricsWorker.ts#L25-L30
Added lines #L25 - L30 were not covered by tests
[warning] 32-33: src/core/metrics/workers/fileMetricsWorker.ts#L32-L33
Added lines #L32 - L33 were not covered by tests
[warning] 48-52: src/core/metrics/workers/fileMetricsWorker.ts#L48-L52
Added lines #L48 - L52 were not covered by teststests/integration-tests/packager.test.ts (2)
8-19
: LGTM! Well-organized imports.The new imports align well with the PR's objective of implementing worker-based processing for performance optimization.
85-90
: Verify mock security validation behavior.The current mock implementation of
validateFileSafety
returns an empty array without performing any validation. While this might be intentional for testing, it's important to ensure this doesn't mask potential security issues.Consider adding test cases that verify the security validation behavior:
- Add test cases with known unsafe files
- Verify that the actual implementation properly filters these files
- Document why the mock returns an empty array in this test
✅ Verification successful
Mock implementation is appropriate, security validation is properly tested
The empty array mock in the integration test is intentional and correct. Security validation is thoroughly tested in dedicated test files:
tests/core/security/validateFileSafety.test.ts
verifies the main validation flowtests/core/security/filterOutUntrustedFiles.test.ts
covers both suspicious and clean file scenarios🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Search for actual security validation tests rg -l "validateFileSafety.*test" --type tsLength of output: 44
Script:
#!/bin/bash # Find validateFileSafety implementation and related files rg -l "validateFileSafety" --type ts # Find filterOutUntrustedFiles implementation rg -l "filterOutUntrustedFiles" --type ts # Look for security-related test files rg -l "security.*test|test.*security" --type ts # Search for actual implementation of security validation ast-grep --pattern 'function validateFileSafety($$$) { $$$ }'Length of output: 646
Script:
#!/bin/bash # Check content of security validation test files echo "=== validateFileSafety.test.ts ===" cat tests/core/security/validateFileSafety.test.ts echo -e "\n=== filterOutUntrustedFiles.test.ts ===" cat tests/core/security/filterOutUntrustedFiles.test.tsLength of output: 3377
src/shared/processConcurrency.ts (2)
3-5
: LGTM! Efficient CPU core detection.The implementation correctly uses the modern
availableParallelism()
API with a fallback tocpus().length
.
7-24
: Consider adding upper bound validation for numOfTasks.While the worker thread calculation logic is sound, extremely large
numOfTasks
values could cause integer overflow in the division operation.Add input validation:
export const getWorkerThreadCount = (numOfTasks: number): { minThreads: number; maxThreads: number } => { + if (!Number.isFinite(numOfTasks) || numOfTasks <= 0) { + throw new Error('numOfTasks must be a positive finite number'); + } const processConcurrency = getProcessConcurrency();✅ Verification successful
The suggested input validation is valuable and should be implemented
While integer overflow isn't a practical concern due to JavaScript's number handling, the validation is important because:
- It prevents negative numbers and zero which could cause worker pool initialization issues
- It handles edge cases like NaN and Infinity that could impact system stability
- It provides clear error messages for debugging
🏁 Scripts executed
The following scripts were executed for the analysis:
Error: No shell scripts found
Script:
#!/bin/bash # Search for calls to getWorkerThreadCount to understand usage patterns rg "getWorkerThreadCount\(" -A 2 # Search for tests to understand expected behavior fd "test" -e ts -e js --exec rg -l "getWorkerThreadCount"Length of output: 1174
src/core/metrics/workers/outputMetricsWorker.ts (2)
21-34
: Consider handling token counting errors.While counting tokens is usually straightforward, malformed encoding strings or unexpectedly large content might throw errors.
🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 21-24: src/core/metrics/workers/outputMetricsWorker.ts#L21-L24
Added lines #L21 - L24 were not covered by tests
[warning] 26-31: src/core/metrics/workers/outputMetricsWorker.ts#L26-L31
Added lines #L26 - L31 were not covered by tests
[warning] 33-34: src/core/metrics/workers/outputMetricsWorker.ts#L33-L34
Added lines #L33 - L34 were not covered by tests
36-42
: LGTM! Proper resource cleanup.The cleanup implementation correctly frees resources when the worker is terminated.
tests/core/metrics/calculateAllFileMetrics.test.ts (1)
14-18
: LGTM! Clean mock implementation.The mock task runner correctly simulates the worker thread behavior for testing.
src/core/metrics/calculateOutputMetrics.ts (2)
8-18
: Consider adding worker pool cleanup.The worker pool should be cleaned up to prevent resource leaks.
🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 9-10: src/core/metrics/calculateOutputMetrics.ts#L9-L10
Added lines #L9 - L10 were not covered by tests
[warning] 12-15: src/core/metrics/calculateOutputMetrics.ts#L12-L15
Added lines #L12 - L15 were not covered by tests
[warning] 17-18: src/core/metrics/calculateOutputMetrics.ts#L17-L18
Added lines #L17 - L18 were not covered by tests
20-45
: LGTM! Good error handling and performance tracking.The implementation includes:
- Proper error propagation
- Detailed logging
- Performance measurement
🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 21-28: src/core/metrics/calculateOutputMetrics.ts#L21-L28
Added lines #L21 - L28 were not covered by tests
[warning] 30-32: src/core/metrics/calculateOutputMetrics.ts#L30-L32
Added lines #L30 - L32 were not covered by tests
[warning] 34-34: src/core/metrics/calculateOutputMetrics.ts#L34
Added line #L34 was not covered by tests
[warning] 36-38: src/core/metrics/calculateOutputMetrics.ts#L36-L38
Added lines #L36 - L38 were not covered by tests
[warning] 40-45: src/core/metrics/calculateOutputMetrics.ts#L40-L45
Added lines #L40 - L45 were not covered by testssrc/core/metrics/calculateMetrics.ts (1)
27-30
: LGTM! Efficient concurrent execution.Good use of Promise.all to parallelize file metrics and output metrics calculation.
src/core/security/workers/securityCheckWorker.ts (1)
6-9
: LGTM! Well-structured interface for the security check task.The
SecurityCheckTask
interface clearly defines the required input for the worker thread.src/core/file/workers/fileCollectWorker.ts (1)
46-49
: Add test coverage for binary file detection.The binary file detection logic lacks test coverage. Add test cases for both file path and content-based binary detection.
Also applies to: 55-58
src/core/metrics/calculateAllFileMetrics.ts (1)
49-57
: Consider implementing batch processing for large repositories.Similar to the security check implementation, consider adding batch processing to better manage memory usage when calculating metrics for large repositories.
tests/core/file/fileProcess.test.ts (1)
11-15
: Enhance test coverage for concurrent file processing.The current
mockInitTaskRunner
processes tasks sequentially. Consider adding tests that verify concurrent file processing behavior.tests/core/metrics/calculateMetrics.test.ts (1)
50-53
: Add test cases for error scenarios and edge cases.While the happy path is tested, consider adding test cases for:
- Worker initialization failures
- Edge cases with empty file lists
- Error propagation from worker tasks
Would you like me to help generate additional test cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 7
🧹 Nitpick comments (12)
tests/shared/processConcurrency.test.ts (1)
20-59
: Consider adding edge cases for getWorkerThreadCount.While the current test cases cover the basic scenarios well, consider adding tests for:
- Negative task counts
- Decimal/float task counts
- Very large numbers (Number.MAX_SAFE_INTEGER)
+ it('should handle negative task counts', () => { + const { minThreads, maxThreads } = getWorkerThreadCount(-10); + expect(minThreads).toBe(1); + expect(maxThreads).toBe(1); + }); + + it('should handle decimal task counts', () => { + const { minThreads, maxThreads } = getWorkerThreadCount(150.75); + expect(minThreads).toBe(1); + expect(maxThreads).toBe(2); + }); + + it('should handle very large task counts', () => { + const { minThreads, maxThreads } = getWorkerThreadCount(Number.MAX_SAFE_INTEGER); + expect(minThreads).toBe(1); + expect(maxThreads).toBe(8); // Limited by CPU count + });tests/core/metrics/calculateOutputMetrics.test.ts (2)
9-13
: Consider testing task runner initialization failure.The mock task runner could be enhanced to test initialization failures.
+const mockFailedInitTaskRunner = () => { + throw new Error('Failed to initialize task runner'); +}; + +it('should handle task runner initialization failure', async () => { + const content = 'test content'; + const encoding = 'o200k_base'; + + await expect( + calculateOutputMetrics(content, encoding, undefined, { + initTaskRunner: mockFailedInitTaskRunner, + }), + ).rejects.toThrow('Failed to initialize task runner'); +});
70-80
: Enhance complex content test assertions.The current test only verifies that the result is a positive number. Consider adding more specific assertions.
it('should work with longer complex content', async () => { const content = 'This is a longer test content with multiple sentences. It should work correctly.'; const encoding = 'o200k_base'; const result = await calculateOutputMetrics(content, encoding, undefined, { initTaskRunner: mockInitTaskRunner, }); expect(result).toBeGreaterThan(0); expect(typeof result).toBe('number'); + // Add specific token count assertion + expect(result).toBe(16); // Verify exact token count + // Test that spaces don't affect token count + const contentWithExtraSpaces = ' ' + content + ' '; + const resultWithSpaces = await calculateOutputMetrics(contentWithExtraSpaces, encoding, undefined, { + initTaskRunner: mockInitTaskRunner, + }); + expect(resultWithSpaces).toBe(result); });src/core/metrics/calculateOutputMetrics.ts (1)
6-9
: Consider optimizing thread count and adding pool cleanup.The worker pool is initialized with a single thread which might be a bottleneck for large repositories. Additionally, there's no cleanup mechanism for the worker pool.
const initTaskRunner = () => { const pool = initPiscina(1, new URL('./workers/outputMetricsWorker.js', import.meta.url).href); - return (task: OutputMetricsTask) => pool.run(task); + return { + runTask: (task: OutputMetricsTask) => pool.run(task), + cleanup: async () => { + await pool.destroy(); + logger.trace('Output metrics worker pool destroyed'); + } + }; };src/core/security/workers/securityCheckWorker.ts (3)
6-28
: Consider adding specific error type handling.While the error handling is good, it could be improved by handling specific error types differently.
} catch (error) { - logger.error(`Error checking security on ${filePath}:`, error); - throw error; + if (error instanceof SyntaxError) { + logger.warn(`Syntax error in ${filePath}:`, error); + return null; + } else { + logger.error(`Error checking security on ${filePath}:`, error); + throw error; + } }
30-54
: Consider structured logging for better analysis.The logging implementation could be improved to facilitate log parsing and analysis.
- logger.trace(`Found ${result.messages.length} issues in ${filePath}`); - logger.trace(result.messages.map((message) => ` - ${message.message}`).join('\n')); + logger.trace({ + event: 'security_check_issues', + file: filePath, + issueCount: result.messages.length, + issues: result.messages.map((message) => message.message) + });
56-63
: Consider adding configuration customization.The SecretLint configuration is currently static. Consider allowing rule customization through configuration options.
-export const createSecretLintConfig = (): SecretLintCoreConfig => ({ +export const createSecretLintConfig = ( + options: { customRules?: Array<{ id: string; rule: any }> } = {} +): SecretLintCoreConfig => ({ rules: [ { id: '@secretlint/secretlint-rule-preset-recommend', rule: creator, - }, + }, + ...(options.customRules || []), ], });src/core/file/fileProcess.ts (1)
31-55
: Consider implementing batch processing for very large repositories.While Promise.all works well, processing all files at once might consume too much memory for very large repositories.
+const processBatch = async (tasks: FileProcessTask[], runTask: any, batchSize: number = 100) => { + const results = []; + for (let i = 0; i < tasks.length; i += batchSize) { + const batch = tasks.slice(i, i + batchSize); + const batchResults = await Promise.all(batch.map(task => runTask(task))); + results.push(...batchResults); + } + return results; +}; try { const startTime = process.hrtime.bigint(); logger.trace(`Starting file processing for ${rawFiles.length} files using worker pool`); let completedTasks = 0; const totalTasks = tasks.length; - const results = await Promise.all( - tasks.map((task) => + const results = await processBatch( + tasks, + (task: FileProcessTask) => runTask(task).then((result) => { completedTasks++; progressCallback(`Processing file... (${completedTasks}/${totalTasks}) ${pc.dim(task.rawFile.path)}`); return result; }), - ), );src/core/security/securityCheck.ts (1)
25-31
: Consider batching tasks for better performance.For large repositories, creating individual tasks for each file might lead to overhead. Consider batching files to reduce the number of worker tasks.
Example implementation:
- const tasks = rawFiles.map( - (file) => - ({ - filePath: file.path, - content: file.content, - }) satisfies SecurityCheckTask, - ); + const BATCH_SIZE = 100; + const tasks = []; + for (let i = 0; i < rawFiles.length; i += BATCH_SIZE) { + const batch = rawFiles.slice(i, i + BATCH_SIZE); + tasks.push({ + files: batch.map(file => ({ + filePath: file.path, + content: file.content, + })), + } satisfies SecurityCheckTask); + }src/core/metrics/calculateAllFileMetrics.ts (2)
24-31
: Consider adding file type validation.The task creation assumes all files can be processed for metrics. Consider validating file types to avoid unnecessary processing.
+const isProcessableFile = (file: ProcessedFile) => { + const PROCESSABLE_EXTENSIONS = ['.js', '.ts', '.jsx', '.tsx', '.py', '.java']; + return PROCESSABLE_EXTENSIONS.some(ext => file.path.endsWith(ext)); +}; const tasks = processedFiles + .filter(isProcessableFile) .map( (file, index) => ({ file, index, totalFiles: processedFiles.length, encoding: tokenCounterEncoding, }) satisfies FileMetricsTask, );
38-47
: Consider implementing task prioritization.For large repositories, prioritizing certain file types or smaller files could improve perceived performance.
+const prioritizeTasks = (tasks: FileMetricsTask[]): FileMetricsTask[] => { + return [...tasks].sort((a, b) => { + // Prioritize smaller files + return a.file.content.length - b.file.content.length; + }); +}; - const results = await Promise.all( + const results = await Promise.all( + prioritizeTasks(tasks).map((task) => - tasks.map((task) => runTask(task).then((result) => { completedTasks++; progressCallback(`Calculating metrics... (${completedTasks}/${task.totalFiles}) ${pc.dim(task.file.path)}`); return result; }), ), );tests/core/security/securityCheck.test.ts (1)
13-24
: Add more edge cases to test data.The mock files could include more diverse scenarios to ensure robust testing.
const mockFiles: RawFile[] = [ { path: 'test1.js', // secretlint-disable content: 'URL: https://user:pass@example.com', // Clear security issue // secretlint-enable }, { path: 'test2.js', content: 'console.log("Hello World");', // No secrets }, + { + path: 'test3.js', + content: '', // Empty file + }, + { + path: 'test4.js', + content: Buffer.from('binary data').toString(), // Binary-like content + }, + { + path: 'test5.js', + content: 'a'.repeat(1024 * 1024), // Large file + }, ];
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (19)
src/config/configLoad.ts
(0 hunks)src/core/file/fileCollect.ts
(1 hunks)src/core/file/fileProcess.ts
(1 hunks)src/core/metrics/calculateAllFileMetrics.ts
(1 hunks)src/core/metrics/calculateOutputMetrics.ts
(1 hunks)src/core/security/securityCheck.ts
(1 hunks)src/core/security/workers/securityCheckWorker.ts
(1 hunks)src/shared/processConcurrency.ts
(1 hunks)tests/cli/actions/remoteAction.test.ts
(2 hunks)tests/cli/cliPrint.test.ts
(1 hunks)tests/config/configSchema.test.ts
(1 hunks)tests/core/metrics/calculateAllFileMetrics.test.ts
(1 hunks)tests/core/metrics/calculateOutputMetrics.test.ts
(1 hunks)tests/core/packager/copyToClipboardIfEnabled.test.ts
(0 hunks)tests/core/security/securityCheck.test.ts
(1 hunks)tests/core/security/workers/securityCheckWorker.test.ts
(1 hunks)tests/shared/processConcurrency.test.ts
(1 hunks)website/compose.yml
(1 hunks)website/server/src/utils/cache.ts
(1 hunks)
💤 Files with no reviewable changes (2)
- src/config/configLoad.ts
- tests/core/packager/copyToClipboardIfEnabled.test.ts
✅ Files skipped from review due to trivial changes (4)
- website/compose.yml
- tests/cli/cliPrint.test.ts
- tests/config/configSchema.test.ts
- website/server/src/utils/cache.ts
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/cli/actions/remoteAction.test.ts
🧰 Additional context used
🪛 Gitleaks (8.21.2)
tests/core/security/workers/securityCheckWorker.test.ts
24-24: Identified a Slack Bot token, which may compromise bot integrations and communication channel security.
(slack-bot-token)
16-16: Uncovered a GitHub Personal Access Token, potentially leading to unauthorized repository access and sensitive content exposure.
(github-pat)
23-23: Identified a Slack Legacy Workspace token, potentially compromising access to workspace data and legacy features.
(slack-legacy-workspace-token)
29-43: Identified a Private Key, which may compromise cryptographic security and sensitive data encryption.
(private-key)
⏰ Context from checks skipped due to timeout of 90000ms (8)
- GitHub Check: Test (macos-latest, 18.x)
- GitHub Check: Test (windows-latest, 23.x)
- GitHub Check: Test (windows-latest, 22.x)
- GitHub Check: Test (windows-latest, 21.x)
- GitHub Check: Test (windows-latest, 20.x)
- GitHub Check: Test (windows-latest, 19.x)
- GitHub Check: Analyze (javascript-typescript)
- GitHub Check: Cloudflare Pages
🔇 Additional comments (10)
tests/shared/processConcurrency.test.ts (1)
9-17
: LGTM! Good test coverage for getProcessConcurrency.The test effectively verifies the behavior when os.availableParallelism is available.
tests/core/security/workers/securityCheckWorker.test.ts (1)
51-76
: LGTM! Good negative test case.The test for normal content helps ensure no false positives.
src/core/metrics/calculateOutputMetrics.ts (2)
11-18
: Well-structured implementation with good practices!The function follows good practices:
- Clean separation of concerns
- Dependency injection for better testability
- Optional parameters with sensible defaults
21-36
: Excellent error handling and performance logging!The implementation includes:
- High-resolution performance measurements
- Detailed error logging with context
- Proper error propagation
src/core/file/fileProcess.ts (2)
9-12
: Add worker pool cleanup mechanism.The worker pool should be cleaned up after task completion to prevent resource leaks.
22-28
: Excellent type-safe implementation!The task mapping implementation is clean and type-safe, using the TypeScript satisfies operator effectively.
src/core/file/fileCollect.ts (3)
8-11
: Add worker pool cleanup mechanism.The worker pool should be cleaned up after task completion to prevent resource leaks.
13-20
: Well-structured function signature with good defaults!The implementation follows best practices:
- Sensible default for progressCallback
- Clean dependency injection pattern
31-55
: Consider implementing batch processing for very large repositories.Similar to fileProcess.ts, consider implementing batch processing for better memory management with large repositories.
tests/core/metrics/calculateAllFileMetrics.test.ts (1)
12-16
: Enhance test coverage with more scenarios.The mock implementation is basic. Consider adding tests for:
- Worker initialization failures
- Task execution errors
- Edge cases (empty files, large files)
✅ Verification successful
The review comment is valid - test coverage needs improvement
The implementation includes error handling and worker pool management that should be tested. The current mock bypasses important mechanisms. Specific improvements needed:
- Test worker pool initialization failures
- Test error propagation from the worker pool
- Test progress callback accuracy
- Test cleanup of worker pool resources
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Check current test coverage rg -l "calculateAllFileMetrics.*test" tests/ # Look for existing error handling tests rg "test.*error.*calculateAllFileMetrics" tests/Length of output: 97
Script:
#!/bin/bash # Find the test file and implementation fd calculateAllFileMetrics # Check test contents rg "describe|it\(" -A 2 tests/core/metrics/calculateAllFileMetrics.test.ts # Look for worker-related error handling rg "catch|throw|error" tests/core/metrics/calculateAllFileMetrics.test.tsLength of output: 706
Script:
#!/bin/bash # Check implementation for error handling cat src/core/metrics/calculateAllFileMetrics.ts # Look for worker implementation fd fileMetricsWorkerLength of output: 2097
Performance Improvement
yamadashy/repomix
868.73 millis (usr: 1.11 secs, sys: 0.14 secs)
671.26 millis (usr: 1.42 secs, sys: 0.22 secs)
No significant change
facebook/react
123.31 secs (usr: 118.64s, sys: 1.60s)
7.10 secs (usr: 20.01s, sys: 2.39s)
17x faster
vercel/next.js
17.85 mins (usr: 16.66 mins, sys: 0.18 mins)
17.27 secs (usr: 52.93 secs, sys: 7.11 secs)
58x faster
Changes
p-map
with Piscina worker threads for parallel processingChecklist
npm run test
npm run lint