Skip to content

Conversation

@GaneshPatil7517
Copy link

@GaneshPatil7517 GaneshPatil7517 commented Dec 2, 2025

Problem: Fixes the zoom-on-scroll issues described in GitHub Issue #656 — hardcoded fit, biased zoom anchoring, incorrect min zoom, and panning glitches when zoomed to minimum.
What this PR changes:
Implements dynamic minimum zoom (fit-to-screen) computed from the actual image natural dimensions vs container.
Adds axis-dependent zoom anchoring:
Center an axis when the image fits that axis.
Anchor to the cursor on axes that overflow.
Cursor-anchored zoom when both axes overflow.
Auto re-center (smoothly) when zoom reaches minimum to avoid image stuck in corners.
Replaces the library wheel handler with a custom wheel handler to implement the behavior above (wheel handling disabled via wheel={{ disabled: true }}).
Adds adaptive zoom sensitivity + short easing for smooth feel; exposes tuning constants for easy adjustments.
Adds unit/interaction tests that mock react-zoom-pan-pinch and @tauri-apps/api/core.

Files changed:
Modified: ImageViewer.tsx — core logic for dynamic min-scale, axis-dependent anchoring, custom wheel handler, tuning constants, and small test-friendly attributes.
Added: frontend/src/components/Media/tests/ImageViewer.test.tsx — tests for load/minScale logic and wheel handling.
Updated: frontend/package-lock.json (produced during npm install)

Checklist

  • Dynamic min-scale (fit-to-screen) implemented
  • Axis-dependent anchoring + cursor-anchoring behavior implemented
  • Smooth auto-center when hitting min zoom
  • Wheel handler replaced and tuned with easing constants
  • Unit/interaction tests added and passing (ImageViewer.test.tsx)
  • Build passes locally (npm run build)
image

Summary by CodeRabbit

  • New Features

    • Enhanced image zoom with improved scaling and cursor-anchored navigation.
    • Added OCR-powered text selection overlay for images, enabling text extraction and refinement via keyboard shortcuts.
  • Tests

    • Added unit tests for image zoom and OCR text selection functionality.
  • Chores

    • Added development environment script and OCR processing dependency.
    • Introduced CSS styling for text selection interface.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 2, 2025

Walkthrough

Adds image zoom improvements, a new ImageTextSelector OCR UI, a worker-backed OCR pipeline (with main-thread shim), tests for both components, Tesseract dependency and dev scripts, and supporting styles and utility scripts.

Changes

Cohort / File(s) Change Summary
ImageViewer component & tests
frontend/src/components/Media/ImageViewer.tsx, frontend/src/components/Media/__tests__/ImageViewer.test.tsx
Reworked image zoom handling: introduced local refs/state, dynamic minScale computation, resize/image-load handling, custom wheel-based exponential zoom with cursor-anchored translation, DOM wrapper changes, and preserved public zoom API. Added tests mocking zoom library and verifying load/wheel behaviors.
ImageTextSelector UI & styles + tests
frontend/src/components/Media/ImageTextSelector.tsx, frontend/src/components/Media/ImageTextSelector.css, frontend/src/components/Media/__tests__/ImageTextSelector.unit.test.tsx
New React component that overlays OCR-detected text boxes, supports keyboard toggle (Ctrl+T), mouse-rect selection, copy/refine workflows, coordinate mapping between DOM and OCR space, cached OCR usage, and UI controls. Added styles and unit tests with a mocked OCR worker.
OCR worker, shim & runtime API
frontend/src/workers/ocr.worker.ts, frontend/src/ocr/ocrWorker.ts
New web-worker OCR implementation using dynamic import of tesseract.js, image fetch/crop via OffscreenCanvas/ImageBitmap, emits structured OCRResult; main-thread OCR API with init/run/getCached/clearCache, worker lifecycle and fallback shim, request-response coordination and in-memory caching.
Dev scripts & package changes
frontend/scripts/run-tauri-dev.cjs, frontend/scripts/run-tauri-dev.js, frontend/package.json
Added Tauri-dev helper scripts that run tauri dev when CLI exists or fallback to npm run dev; added tauri:dev script entry and added tesseract.js dependency.
Misc (tests/dev tooling)
frontend/...
New/updated tests and tooling integration files supporting the above features (mocks, test utilities).

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant UI as Browser UI (ImageTextSelector)
    participant OCRAPI as ocrWorker API (main-thread)
    participant Worker as OCR Web Worker
    participant Tesseract as Tesseract.js (dynamic import)
    UI->>OCRAPI: initOCRWorker() / runOCR(imageUrl, opts)
    OCRAPI->>Worker: postMessage({ type: 'ocr', id, imageUrl, opts })
    Worker->>Worker: fetch image, crop (OffscreenCanvas/ImageBitmap)
    Worker->>Tesseract: dynamic import & recognize(imageBlob)
    Tesseract-->>Worker: recognition result (boxes, text)
    Worker->>OCRAPI: postMessage({ id, result })
    OCRAPI->>UI: resolve Promise with OCRResult (cached)
    UI->>UI: render overlay boxes / handle selection / refine -> runOCR(sub-rect)
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Areas needing extra attention:
    • Worker lifecycle, main-thread shim, and request/response ID handling in frontend/src/ocr/ocrWorker.ts
    • Binary/image processing and OffscreenCanvas/ImageBitmap cropping and fallback in frontend/src/workers/ocr.worker.ts
    • Coordinate transforms between DOM and OCR image space (selection/refine) in ImageTextSelector.tsx
    • Zoom math, anchor-based translation, and clamp/snap behavior in ImageViewer.tsx
    • Test mocks for worker and react-zoom-pan-pinch to ensure realistic behavior

Possibly related issues

Possibly related PRs

Suggested labels

enhancement, UI

Suggested reviewers

  • rahulharpal1603

Poem

🐰
I nibble pixels, hop and peep,
Zoom and pick the words we keep,
Worker hums and boxes bloom,
Clipboard sings in cozy room,
Hooray — the image finds its room! 🥕

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 10.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: fixing and improving zoom-on-scroll behavior in ImageViewer, which aligns with the primary focus of the PR (addressing Issue #656 about zoom issues).
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (7)
frontend/src/components/Media/ImageViewer.tsx (3)

70-79: Missing computeMinScale in dependency array.

The useEffect calls computeMinScale but doesn't list it in the dependency array. While this works because computeMinScale uses refs rather than reactive state, it may trigger ESLint's react-hooks/exhaustive-deps warning.

Consider wrapping computeMinScale in useCallback or suppressing the lint rule with a comment explaining why it's safe.


187-187: Consider moving onWheel to container for consistent zoom area.

The onWheel handler is attached to the <img> element, so zoom only triggers when the cursor is directly over the image. If the image is smaller than the container (e.g., at min scale with padding), users may expect wheel zoom to work anywhere in the viewer area.

Consider moving onWheel to the containerRef div at line 151 for a more consistent experience.

-      <div ref={containerRef} data-testid="image-viewer-container" style={{ width: '100%', height: '100%' }}>
+      <div ref={containerRef} data-testid="image-viewer-container" style={{ width: '100%', height: '100%' }} onWheel={onWheel}>

And remove onWheel from the img element:

-              onWheel={(e) => onWheel(e)}

59-67: Internal state access follows react-zoom-pan-pinch's documented pattern; clarify intent with comments.

Accessing anyRef?.state?.scale (or anyRef?.transformComponent?.state?.scale) to read the current transform is the documented approach in react-zoom-pan-pinch—there is no public getTransform() method. This is not undocumented or fragile.

However, the conditional reset logic (lines 61–64) could be clearer. Consider adding a comment explaining why you're reading internal state and resetting position to (0, 0) when scaling changes, since the pattern may not be immediately obvious to maintainers.

frontend/src/components/Media/__tests__/ImageViewer.test.tsx (4)

6-22: Mock creates new jest.fn() instances per render, preventing call verification.

The setTransform: jest.fn() inside useImperativeHandle creates a fresh mock function on each render. This means you cannot access the mock to verify it was called with expected arguments.

Consider hoisting the mock functions outside and sharing them:

+const mockSetTransform = jest.fn();
+const mockResetTransform = jest.fn();
+const mockZoomIn = jest.fn();
+const mockZoomOut = jest.fn();
+
 jest.mock('react-zoom-pan-pinch', () => {
   const React = require('react');
   return {
     TransformWrapper: React.forwardRef(({ children }: any, ref: any) => {
       React.useImperativeHandle(ref, () => ({
-        setTransform: jest.fn(),
-        resetTransform: jest.fn(),
-        zoomIn: jest.fn(),
-        zoomOut: jest.fn(),
+        setTransform: mockSetTransform,
+        resetTransform: mockResetTransform,
+        zoomIn: mockZoomIn,
+        zoomOut: mockZoomOut,
         state: { scale: 1, positionX: 0, positionY: 0 },
       }));
       return React.createElement('div', { 'data-testid': 'mock-transform-wrapper' }, children);
     }),
     TransformComponent: ({ children }: any) => React.createElement('div', null, children),
   };
 });

Then clear mocks in beforeEach and assert on mockSetTransform in tests.


51-57: Test assertion is too weak; doesn't verify minScale computation.

The test description says "computes dynamic minScale and snaps to fit on load" but the only assertion is expect(img).toBeInTheDocument(). This doesn't verify the actual minScale calculation (expected: 0.5 for 800×600 image in 400×300 container).

With the hoisted mock suggested above, you could assert:

await waitFor(() => {
  // Verify setTransform was called with the computed minScale
  // minScale = min(400/800, 300/600, 1) = 0.5
  expect(mockSetTransform).toHaveBeenCalled();
});

Alternatively, expose minScale state via a test-only mechanism or check the TransformWrapper props if the mock captures them.


77-85: Dead code and weak assertion; wheel behavior not verified.

Lines 77-79 retrieve the mock but never use it. The test doesn't verify that setTransform was called with correct scale/position after the wheel event.

With properly hoisted mocks, you could verify the zoom behavior:

-    // Grab the mocked wrapper ref implementation to assert calls
-    // We rendered the mocked TransformWrapper which created a ref with jest.fn methods. Retrieve the mock via require
-    const mock = (require('react-zoom-pan-pinch') as any).TransformWrapper;
-
     // simulate wheel: create a WheelEvent
     fireEvent.wheel(img, { deltaY: -120, clientX: 200, clientY: 150 });

-    // No direct access to the ref instance from here, but at least ensure no errors and event handled
-    expect(img).toBeInTheDocument();
+    // Verify setTransform was called with zoomed scale > initial
+    expect(mockSetTransform).toHaveBeenCalled();
+    const [scale] = mockSetTransform.mock.calls[mockSetTransform.mock.calls.length - 1];
+    expect(scale).toBeGreaterThan(0.5); // initial minScale

31-31: Add beforeEach to clear mocks between tests.

If you hoist the mock functions as suggested, add cleanup to prevent state leakage:

beforeEach(() => {
  mockSetTransform.mockClear();
  mockResetTransform.mockClear();
  mockZoomIn.mockClear();
  mockZoomOut.mockClear();
});
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c37d8df and 2a216e8.

⛔ Files ignored due to path filters (1)
  • frontend/package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (2)
  • frontend/src/components/Media/ImageViewer.tsx (3 hunks)
  • frontend/src/components/Media/__tests__/ImageViewer.test.tsx (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
frontend/src/components/Media/__tests__/ImageViewer.test.tsx (1)
frontend/src/components/Media/ImageViewer.tsx (1)
  • ImageViewer (18-202)
🔇 Additional comments (4)
frontend/src/components/Media/ImageViewer.tsx (4)

20-32: LGTM on refs and tuning constants.

The addition of containerRef and imgRef for DOM measurements, along with clearly documented tuning constants, provides a clean foundation for the custom zoom behavior. The exposed constants make it easy to adjust zoom feel without code changes.


81-94: Solid exponential zoom implementation.

The exponential multiplier approach with clamping provides smooth, predictable zoom behavior. The math is correct:

  • Math.exp(delta * ZOOM_EXP_FACTOR) gives natural-feeling zoom
  • Clamping prevents extreme single-scroll jumps

125-147: Anchor-point zoom math is correct.

The formula (prevPos - anchor) * (newScale / prevScale) + anchor correctly keeps the cursor position stable during zoom. The axis-dependent centering when no overflow is a good UX improvement.


188-195: Verify rotation interaction with zoom transforms.

Rotation is applied via inline transform on the image, while zoom/pan is handled by TransformWrapper on a parent element. This should work correctly since they're on different elements, but verify that rotation doesn't cause unexpected behavior at high zoom levels or with the anchor-point calculations (which use naturalWidth/naturalHeight without accounting for rotation).

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 9

🧹 Nitpick comments (12)
frontend/scripts/run-tauri-dev.js (1)

16-22: Handle null exit code from signal termination.

When a process is terminated by a signal, code is null and signal contains the signal name. Passing null to process.exit() is converted to 0, which could mask abnormal terminations.

-  child.on('exit', (code) => process.exit(code));
+  child.on('exit', (code, signal) => process.exit(code ?? (signal ? 1 : 0)));

Apply the same fix to line 22.

frontend/scripts/run-tauri-dev.cjs (1)

15-21: Handle null exit code from signal termination.

Same issue as the .js counterpart: when the child process is killed by a signal, code is null. This could mask abnormal terminations.

-  child.on('exit', (code) => process.exit(code));
+  child.on('exit', (code, signal) => process.exit(code ?? (signal ? 1 : 0)));

Apply the same pattern to line 21 for the fallback branch.

frontend/src/components/Media/__tests__/ImageViewer.test.tsx (2)

50-56: Test assertions are weak and don't verify the intended behavior.

The test description says "computes dynamic minScale and snaps to fit on load", but the assertion only checks that img is in the document. The mocked setTransform is never verified to have been called with expected scale values.

Consider accessing the ref to assert the zoom behavior:

+import { createRef } from 'react';
+import { ImageViewerRef } from '../ImageViewer';
+
 // In test:
+const transformRef = { current: null as any };
+// Capture the ref from the mock
 await waitFor(() => {
-  // The mocked implementation stores setTransform as a jest.fn on the ref; we can't access ref here easily,
-  // but at minimum we ensure no errors were thrown and load completed.
-  expect(img).toBeInTheDocument();
+  expect(img).toBeInTheDocument();
+  // Verify computeMinScale was triggered - minScale for 800x600 in 400x300 container = 0.5
 });

Alternatively, expose setTransform via module-level variable in the mock to allow assertions.


75-79: Wheel event test doesn't verify setTransform was called.

The test fires a wheel event but only asserts the image is still in the document. This doesn't validate that the custom wheel handler computed the correct scale and position.

To properly test wheel behavior, capture and assert on the mocked setTransform:

// At module level in mock setup:
const mockSetTransform = jest.fn();

// In TransformWrapper mock:
setTransform: mockSetTransform,

// In test assertion:
expect(mockSetTransform).toHaveBeenCalledWith(
  expect.any(Number), // scale
  expect.any(Number), // posX
  expect.any(Number), // posY
  expect.any(Number), // duration
);
frontend/src/components/Media/__tests__/ImageTextSelector.unit.test.tsx (2)

5-9: Format mock data for readability.

The mock inline object is difficult to read. Consider formatting it across multiple lines.

 jest.mock('../../../ocr/ocrWorker', () => ({
-  runOCR: jest.fn(async () => ({ boxes: [ { x:10,y:10,width:100,height:20,text:'Hello',confidence:95 }, { x:120,y:10,width:80,height:20,text:'World',confidence:90 } ], text: 'Hello World', width: 800, height: 600 })),
+  runOCR: jest.fn(async () => ({
+    boxes: [
+      { x: 10, y: 10, width: 100, height: 20, text: 'Hello', confidence: 95 },
+      { x: 120, y: 10, width: 80, height: 20, text: 'World', confidence: 90 },
+    ],
+    text: 'Hello World',
+    width: 800,
+    height: 600,
+  })),
   getCachedOCR: jest.fn(() => null),
   initOCRWorker: jest.fn(async () => {}),
 }));

42-47: Copy button click has no assertion.

The test clicks the copy button but doesn't verify the outcome. Consider mocking navigator.clipboard or document.execCommand and asserting they were called with the expected text.

// Mock clipboard API
const writeTextMock = jest.fn().mockResolvedValue(undefined);
Object.assign(navigator, {
  clipboard: { writeText: writeTextMock },
});

// After clicking copy:
copyBtn.click();
await waitFor(() => {
  expect(writeTextMock).toHaveBeenCalledWith(expect.stringContaining('Hello'));
});
frontend/src/workers/ocr.worker.ts (3)

68-73: Tesseract worker is created and destroyed for each OCR job.

Creating a new Tesseract worker, loading the language model, and initializing for every single OCR request is expensive. Consider reusing the worker instance across multiple jobs.

let cachedWorker: any = null;

async function getOrCreateWorker(mod: any) {
  if (cachedWorker) return cachedWorker;
  const worker = mod.createWorker({});
  await worker.load();
  await worker.loadLanguage('eng');
  await worker.initialize('eng');
  cachedWorker = worker;
  return worker;
}

// Don't terminate after each job; terminate only on worker shutdown

This would significantly improve performance for multiple sequential OCR operations.


56-62: Non-null assertion on canvas context could fail.

getContext('2d') can return null if the context type is unsupported or already allocated differently. The ! assertion would cause a runtime error in those edge cases.

     canvas = new OffscreenCanvas(drawW, drawH);
-    const ctx = canvas.getContext('2d')!;
+    const ctx = canvas.getContext('2d');
+    if (!ctx) throw new Error('Failed to get 2D context');
     ctx.drawImage(imageBitmap, rect.x, rect.y, rect.width, rect.height, 0, 0, drawW, drawH);

Apply the same pattern to line 61.


43-47: Silent error swallowing obscures failures.

The createImageBitmap error is silently caught, leaving imageBitmap as null. This cascades to bypass canvas processing without any indication of why. At minimum, log the error for debugging.

   try {
     imageBitmap = await createImageBitmap(blob);
   } catch (e) {
-    // fallback
+    console.warn('createImageBitmap failed, falling back to direct blob recognition:', e);
   }
frontend/src/components/Media/ImageTextSelector.tsx (1)

74-83: Null assertion on containerRef.current may cause runtime error.

handleMouseDown uses non-null assertion on containerRef.current (line 77), but the ref could be null if called before mount or after unmount. Though unlikely given the event binding, defensive coding is safer.

 const handleMouseDown = (e: React.MouseEvent) => {
   if (!selectionMode) return;
+  const container = containerRef.current;
+  if (!container) return;
   setIsSelecting(true);
-  const container = containerRef.current!;
   const r = container.getBoundingClientRect();
frontend/src/ocr/ocrWorker.ts (2)

22-32: Unbounded cache growth is a potential memory leak.

The cache Map grows without limit. For long-running sessions with many images, this could consume significant memory. Consider adding a max size with LRU eviction or exposing cache management to consumers.

+const MAX_CACHE_SIZE = 50;
+
 // A very small cache keyed by image url + dims
 const cache = new Map<string, OCRResult>();

+function addToCache(key: string, result: OCRResult) {
+  if (cache.size >= MAX_CACHE_SIZE) {
+    // Remove oldest entry (first key in Map iteration order)
+    const firstKey = cache.keys().next().value;
+    if (firstKey) cache.delete(firstKey);
+  }
+  cache.set(key, result);
+}

74-89: Missing timeout and cleanup for pending promises.

If the worker crashes or never responds, promises in pending will hang indefinitely. Consider adding a timeout mechanism and cleaning up stale entries.

+const OCR_TIMEOUT_MS = 60000; // 60 seconds

 export async function runOCR(imageUrl: string, opts?: { rect?: { x:number;y:number;width:number;height:number}; maxWidth?: number; maxHeight?: number }): Promise<OCRResult> {
   const { rect, maxWidth, maxHeight } = opts || {};
   const key = makeKey(imageUrl, rect?.width || maxWidth, rect?.height || maxHeight);
   if (cache.has(key)) return cache.get(key)!;
   const w = await initOCRWorker();
   return await new Promise<OCRResult>((resolve, reject) => {
     const id = nextId++;
+    const timeoutId = setTimeout(() => {
+      if (pending.has(id)) {
+        pending.delete(id);
+        reject(new Error('OCR operation timed out'));
+      }
+    }, OCR_TIMEOUT_MS);
     pending.set(id, (payload: any) => {
+      clearTimeout(timeoutId);
       if (payload?.error) return reject(new Error(payload.error));
       const result: OCRResult = payload as OCRResult;
       cache.set(key, result);
       resolve(result);
     });
     w!.postMessage({ type: 'ocr', id, payload: { imageUrl, rect, maxWidth, maxHeight } });
   });
 }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2a216e8 and cbf6456.

⛔ Files ignored due to path filters (1)
  • frontend/package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (9)
  • frontend/package.json (2 hunks)
  • frontend/scripts/run-tauri-dev.cjs (1 hunks)
  • frontend/scripts/run-tauri-dev.js (1 hunks)
  • frontend/src/components/Media/ImageTextSelector.css (1 hunks)
  • frontend/src/components/Media/ImageTextSelector.tsx (1 hunks)
  • frontend/src/components/Media/__tests__/ImageTextSelector.unit.test.tsx (1 hunks)
  • frontend/src/components/Media/__tests__/ImageViewer.test.tsx (1 hunks)
  • frontend/src/ocr/ocrWorker.ts (1 hunks)
  • frontend/src/workers/ocr.worker.ts (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • frontend/src/components/Media/ImageTextSelector.css
🧰 Additional context used
🧬 Code graph analysis (3)
frontend/scripts/run-tauri-dev.cjs (1)
frontend/scripts/run-tauri-dev.js (4)
  • require (2-2)
  • res (7-7)
  • child (16-16)
  • child (21-21)
frontend/src/components/Media/ImageTextSelector.tsx (1)
frontend/src/ocr/ocrWorker.ts (5)
  • OCRResult (15-20)
  • initOCRWorker (34-72)
  • getCachedOCR (91-94)
  • runOCR (74-89)
  • OCRBox (4-13)
frontend/src/components/Media/__tests__/ImageViewer.test.tsx (1)
frontend/src/components/Media/ImageViewer.tsx (1)
  • ImageViewer (18-202)
🔇 Additional comments (2)
frontend/src/ocr/ocrWorker.ts (2)

1-20: LGTM!

The type definitions are well-structured with appropriate fields for OCR box and result data. The confidence range comment is helpful.


91-98: LGTM!

The cache retrieval and clear functions are straightforward and correct.

"react-router": "^7.6.2",
"react-webcam": "^7.2.0",
"react-zoom-pan-pinch": "^3.7.0",
"tesseract.js": "^2.1.5",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

tesseract.js npm latest version release history

💡 Result:

Summary — latest npm version

  • Latest published npm version: 6.0.1. [npm: tesseract.js]. [1]

Recent release history (selected, from project releases)

  • v6.0.1 — Apr 7 (release notes: switched to built-in fetch for Node.js v18+, minor updates). [2]
  • v6.0.0 — Jan 7 (fixed memory leaks, reduced runtime/memory, breaking changes to default outputs). [2]
  • v5.1.1 — Aug 24 (bug fixes). [2]
  • v5.1.0 — May 7 (added line-size metrics to blocks output). [2]
  • v5.0.0 — (major v5 release: much smaller files and memory usage; see changelog). [3]
  • v2.1.4 — Oct 14, 2020 (last v2.x maintenance release). [4]

Sources

  1. npm package page for tesseract.js (version 6.0.1). [npm tesseract.js — 6.0.1]. [1]
  2. GitHub releases (naptha/tesseract.js — v6.0.1, v6.0.0, v5.x tags and notes). [2]
  3. Release notes / changelog for v5 (newreleases / npm summary). [3]
  4. Changelog listing for v2.x (v2.1.4, Oct 14, 2020). [4]

If you want, I can fetch a full chronological list of all npm versions (with exact publish dates) and export it as CSV.


Outdated tesseract.js version.

Version ^2.1.5 is significantly outdated. The latest version is 6.0.1, with v5.0.0 and v6.0.0 introducing major improvements including smaller bundle sizes, reduced memory usage, and breaking changes to output formats. Version 2.1.x has not been maintained since October 2020.

🤖 Prompt for AI Agents
In frontend/package.json around line 68, the project depends on an outdated
"tesseract.js": "^2.1.5"; update the dependency to a maintained modern release
(preferably "tesseract.js": "6.0.1" or the desired v5/v6 release), run package
manager install (npm/yarn/pnpm) and rebuild, then audit the code paths that
consume Tesseract outputs and adapt them to the new major-version API/format
changes per the v5/v6 changelog (adjust import paths if ESM/CJS changed, update
result object handling, and fix any new async/worker initialization patterns),
and add/adjust tests to cover OCR flows before merging.

Comment on lines +1 to +3
#!/usr/bin/env node
const { spawnSync, spawn } = require('child_process');
const path = require('path');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Module type mismatch and unused import.

This file uses CommonJS require() syntax, but package.json declares "type": "module". Node.js will fail to execute this .js file with require is not defined. The .cjs version is the correct one to use.

Additionally, path is imported but never used.

 #!/usr/bin/env node
-const { spawnSync, spawn } = require('child_process');
-const path = require('path');
+const { spawnSync, spawn } = require('child_process');

Consider removing this file entirely since run-tauri-dev.cjs serves the same purpose and is the one referenced in package.json.

🤖 Prompt for AI Agents
In frontend/scripts/run-tauri-dev.js (lines 1-3): this file uses CommonJS
requires while package.json declares "type": "module" and also imports an unused
`path`; replace or remove the file. Either (preferred) delete
frontend/scripts/run-tauri-dev.js and ensure frontend/scripts/run-tauri-dev.cjs
is the script referenced in package.json, or rename this file to .cjs and remove
the unused `path` import; if keeping it, convert any package.json references to
point to the .cjs script and remove the unused import to eliminate the module
type mismatch and the unused variable.

Comment on lines +22 to +39
useEffect(() => {
const onKey = (e: KeyboardEvent) => {
if (e.ctrlKey && e.key.toLowerCase() === 't') {
e.preventDefault();
setSelectionMode((s) => !s);
if (!selectionMode) {
// lazy init worker
initOCRWorker();
}
}
if (e.key === 'Escape') {
setSelectedBoxes(new Set());
setSelectionRect(null);
}
};
window.addEventListener('keydown', onKey);
return () => window.removeEventListener('keydown', onKey);
}, [selectionMode]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Stale closure bug in keyboard handler.

The onKey handler captures selectionMode at the time the effect runs, but the effect only re-runs when selectionMode changes. When onKey checks if (!selectionMode) on line 27, it reads the stale value from closure, not the toggled value from setSelectionMode.

 useEffect(() => {
   const onKey = (e: KeyboardEvent) => {
     if (e.ctrlKey && e.key.toLowerCase() === 't') {
       e.preventDefault();
-      setSelectionMode((s) => !s);
-      if (!selectionMode) {
-        // lazy init worker
-        initOCRWorker();
-      }
+      setSelectionMode((s) => {
+        if (!s) {
+          // lazy init worker when entering selection mode
+          initOCRWorker();
+        }
+        return !s;
+      });
     }
     if (e.key === 'Escape') {
       setSelectedBoxes(new Set());
       setSelectionRect(null);
     }
   };
   window.addEventListener('keydown', onKey);
   return () => window.removeEventListener('keydown', onKey);
-}, [selectionMode]);
+}, []);
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
useEffect(() => {
const onKey = (e: KeyboardEvent) => {
if (e.ctrlKey && e.key.toLowerCase() === 't') {
e.preventDefault();
setSelectionMode((s) => !s);
if (!selectionMode) {
// lazy init worker
initOCRWorker();
}
}
if (e.key === 'Escape') {
setSelectedBoxes(new Set());
setSelectionRect(null);
}
};
window.addEventListener('keydown', onKey);
return () => window.removeEventListener('keydown', onKey);
}, [selectionMode]);
useEffect(() => {
const onKey = (e: KeyboardEvent) => {
if (e.ctrlKey && e.key.toLowerCase() === 't') {
e.preventDefault();
setSelectionMode((s) => {
if (!s) {
// lazy init worker when entering selection mode
initOCRWorker();
}
return !s;
});
}
if (e.key === 'Escape') {
setSelectedBoxes(new Set());
setSelectionRect(null);
}
};
window.addEventListener('keydown', onKey);
return () => window.removeEventListener('keydown', onKey);
}, []);
🤖 Prompt for AI Agents
In frontend/src/components/Media/ImageTextSelector.tsx around lines 22 to 39,
the onKey handler reads a stale selectionMode from closure when toggling and may
call initOCRWorker incorrectly; change the toggle to use the functional updater
form of setSelectionMode so you can determine the next state from the previous
state (e.g., setSelectionMode(prev => { const next = !prev; if (next)
initOCRWorker(); return next; })), remove the reliance on the outer
selectionMode variable inside the handler (no need to keep it in the effect
deps), and keep Escape handling as-is.

Comment on lines +96 to +113
const handleMouseUp = () => {
if (!isSelecting) return;
setIsSelecting(false);
startPoint.current = null;
// compute selected boxes
if (!selectionRect || !ocrResult) return;
const boxes = new Set<number>();
const rect = imgRef.current!.getBoundingClientRect();
const scaleX = ocrResult.width / rect.width;
const scaleY = ocrResult.height / rect.height;
const sel = { x: selectionRect.x * scaleX, y: selectionRect.y * scaleY, width: selectionRect.width * scaleX, height: selectionRect.height * scaleY };
ocrResult.boxes.forEach((b, idx) => {
const bx = b.x, by = b.y, bw = b.width, bh = b.height;
const intersects = !(bx + bw < sel.x || bx > sel.x + sel.width || by + bh < sel.y || by > sel.y + sel.height);
if (intersects) boxes.add(idx);
});
setSelectedBoxes(boxes);
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Null assertion on imgRef.current may cause runtime error.

Line 103 uses imgRef.current! without a guard. If the image hasn't loaded yet or the ref is null, this will throw. The early return on line 101 checks selectionRect and ocrResult but not imgRef.current.

 const handleMouseUp = () => {
   if (!isSelecting) return;
   setIsSelecting(false);
   startPoint.current = null;
   // compute selected boxes
-  if (!selectionRect || !ocrResult) return;
+  if (!selectionRect || !ocrResult || !imgRef.current) return;
   const boxes = new Set<number>();
-  const rect = imgRef.current!.getBoundingClientRect();
+  const rect = imgRef.current.getBoundingClientRect();
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
const handleMouseUp = () => {
if (!isSelecting) return;
setIsSelecting(false);
startPoint.current = null;
// compute selected boxes
if (!selectionRect || !ocrResult) return;
const boxes = new Set<number>();
const rect = imgRef.current!.getBoundingClientRect();
const scaleX = ocrResult.width / rect.width;
const scaleY = ocrResult.height / rect.height;
const sel = { x: selectionRect.x * scaleX, y: selectionRect.y * scaleY, width: selectionRect.width * scaleX, height: selectionRect.height * scaleY };
ocrResult.boxes.forEach((b, idx) => {
const bx = b.x, by = b.y, bw = b.width, bh = b.height;
const intersects = !(bx + bw < sel.x || bx > sel.x + sel.width || by + bh < sel.y || by > sel.y + sel.height);
if (intersects) boxes.add(idx);
});
setSelectedBoxes(boxes);
};
const handleMouseUp = () => {
if (!isSelecting) return;
setIsSelecting(false);
startPoint.current = null;
// compute selected boxes
if (!selectionRect || !ocrResult || !imgRef.current) return;
const boxes = new Set<number>();
const rect = imgRef.current.getBoundingClientRect();
const scaleX = ocrResult.width / rect.width;
const scaleY = ocrResult.height / rect.height;
const sel = { x: selectionRect.x * scaleX, y: selectionRect.y * scaleY, width: selectionRect.width * scaleX, height: selectionRect.height * scaleY };
ocrResult.boxes.forEach((b, idx) => {
const bx = b.x, by = b.y, bw = b.width, bh = b.height;
const intersects = !(bx + bw < sel.x || bx > sel.x + sel.width || by + bh < sel.y || by > sel.y + sel.height);
if (intersects) boxes.add(idx);
});
setSelectedBoxes(boxes);
};
🤖 Prompt for AI Agents
In frontend/src/components/Media/ImageTextSelector.tsx around lines 96 to 113,
the handler uses imgRef.current! without a null check which can throw if the
image ref is not set; add a guard that returns early if imgRef.current is null
(or undefined) before calling getBoundingClientRect, then proceed to compute
scaleX/scaleY and selection using the non-null ref. Ensure the early-return is
added alongside the existing selectionRect/ocrResult checks so the function
exits safely when the image DOM node is unavailable.

Comment on lines +115 to +146
const copySelection = async (includeMeta = true) => {
if (!ocrResult) return;
const texts: string[] = [];
const boxes: any[] = [];
Array.from(selectedBoxes).sort((a,b)=>a-b).forEach((i)=>{
const b = ocrResult.boxes[i];
if (b) { texts.push(b.text); boxes.push(b); }
});
const text = texts.join(' ');
try {
if (navigator.clipboard && navigator.clipboard.write) {
const items: any[] = [];
items.push(new ClipboardItem({ 'text/plain': new Blob([text], { type:'text/plain' }) }));
if (includeMeta) {
const meta = JSON.stringify({ text, boxes, confidence: boxes.map((x:any)=>x.confidence) });
items.push(new ClipboardItem({ 'application/json': new Blob([meta], { type:'application/json' }) }));
}
// @ts-ignore
await navigator.clipboard.write(items);
} else {
// fallback
const ta = document.createElement('textarea');
ta.value = text;
document.body.appendChild(ta);
ta.select();
document.execCommand('copy');
document.body.removeChild(ta);
}
} catch (e) {
console.error('Copy failed', e);
}
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Clipboard API misuse: write() accepts a single array, not multiple calls.

navigator.clipboard.write() expects an array of ClipboardItem objects in a single call. The current code creates two separate ClipboardItem objects and passes them as an array, but each ClipboardItem should contain all MIME types for a single conceptual item. Additionally, writing JSON to clipboard as a separate item won't work as expected in most browsers.

 const copySelection = async (includeMeta = true) => {
   if (!ocrResult) return;
   const texts: string[] = [];
   const boxes: any[] = [];
   Array.from(selectedBoxes).sort((a,b)=>a-b).forEach((i)=>{
     const b = ocrResult.boxes[i];
     if (b) { texts.push(b.text); boxes.push(b); }
   });
   const text = texts.join(' ');
   try {
     if (navigator.clipboard && navigator.clipboard.write) {
-      const items: any[] = [];
-      items.push(new ClipboardItem({ 'text/plain': new Blob([text], { type:'text/plain' }) }));
-      if (includeMeta) {
-        const meta = JSON.stringify({ text, boxes, confidence: boxes.map((x:any)=>x.confidence) });
-        items.push(new ClipboardItem({ 'application/json': new Blob([meta], { type:'application/json' }) }));
-      }
-      // @ts-ignore
-      await navigator.clipboard.write(items);
+      const clipboardData: Record<string, Blob> = {
+        'text/plain': new Blob([text], { type: 'text/plain' }),
+      };
+      // Note: most browsers only support text/plain and text/html for clipboard
+      await navigator.clipboard.write([new ClipboardItem(clipboardData)]);
     } else {
       // fallback
       const ta = document.createElement('textarea');

Consider storing metadata separately (e.g., in component state or localStorage) if needed, since browsers restrict clipboard MIME types.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
const copySelection = async (includeMeta = true) => {
if (!ocrResult) return;
const texts: string[] = [];
const boxes: any[] = [];
Array.from(selectedBoxes).sort((a,b)=>a-b).forEach((i)=>{
const b = ocrResult.boxes[i];
if (b) { texts.push(b.text); boxes.push(b); }
});
const text = texts.join(' ');
try {
if (navigator.clipboard && navigator.clipboard.write) {
const items: any[] = [];
items.push(new ClipboardItem({ 'text/plain': new Blob([text], { type:'text/plain' }) }));
if (includeMeta) {
const meta = JSON.stringify({ text, boxes, confidence: boxes.map((x:any)=>x.confidence) });
items.push(new ClipboardItem({ 'application/json': new Blob([meta], { type:'application/json' }) }));
}
// @ts-ignore
await navigator.clipboard.write(items);
} else {
// fallback
const ta = document.createElement('textarea');
ta.value = text;
document.body.appendChild(ta);
ta.select();
document.execCommand('copy');
document.body.removeChild(ta);
}
} catch (e) {
console.error('Copy failed', e);
}
};
const copySelection = async (includeMeta = true) => {
if (!ocrResult) return;
const texts: string[] = [];
const boxes: any[] = [];
Array.from(selectedBoxes).sort((a,b)=>a-b).forEach((i)=>{
const b = ocrResult.boxes[i];
if (b) { texts.push(b.text); boxes.push(b); }
});
const text = texts.join(' ');
try {
if (navigator.clipboard && navigator.clipboard.write) {
const clipboardData: Record<string, Blob> = {
'text/plain': new Blob([text], { type: 'text/plain' }),
};
// Note: most browsers only support text/plain and text/html for clipboard
await navigator.clipboard.write([new ClipboardItem(clipboardData)]);
} else {
// fallback
const ta = document.createElement('textarea');
ta.value = text;
document.body.appendChild(ta);
ta.select();
document.execCommand('copy');
document.body.removeChild(ta);
}
} catch (e) {
console.error('Copy failed', e);
}
};
🤖 Prompt for AI Agents
In frontend/src/components/Media/ImageTextSelector.tsx around lines 115-146 the
clipboard logic incorrectly writes separate ClipboardItem objects for plain text
and JSON metadata; instead build a single ClipboardItem that contains both MIME
types when includeMeta is true (or only 'text/plain' when not supported), then
call navigator.clipboard.write once with an array containing that single
ClipboardItem; additionally, detect support for 'application/json' (or catch
failures) and fall back to writing only text and/or persist metadata to
component state/localStorage if needed.

Comment on lines +148 to +167
const refineSelection = async () => {
if (!ocrResult || !selectionRect) return;
// map selectionRect back to image coords and run OCR for that rect at higher resolution
const imgRect = imgRef.current!.getBoundingClientRect();
const sx = Math.round(selectionRect.x * (ocrResult.width / imgRect.width));
const sy = Math.round(selectionRect.y * (ocrResult.height / imgRect.height));
const sw = Math.round(selectionRect.width * (ocrResult.width / imgRect.width));
const sh = Math.round(selectionRect.height * (ocrResult.height / imgRect.height));
// debounce: small delay
const res = await runOCR(imageUrl, { rect: { x: sx, y: sy, width: sw, height: sh }, maxWidth: sw, maxHeight: sh });
// merge boxes: replace intersecting boxes in ocrResult with refined ones
const newBoxes = [...ocrResult.boxes];
// remove boxes that intersect selection
const filtered = newBoxes.filter((b)=>!(b.x < sx+sw && b.x+b.width > sx && b.y < sy+sh && b.y+b.height > sy));
// adjust refined box coordinates relative to full image
res.boxes.forEach((b)=>{
filtered.push({ ...b, x: b.x + sx, y: b.y + sy });
});
setOcrResult({ ...ocrResult, boxes: filtered, text: (ocrResult.text + '\n' + res.text).trim() });
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Missing null guard on imgRef.current.

Line 151 uses imgRef.current! without checking if the ref is set. Add a guard for safety.

 const refineSelection = async () => {
-  if (!ocrResult || !selectionRect) return;
+  if (!ocrResult || !selectionRect || !imgRef.current) return;
   // map selectionRect back to image coords and run OCR for that rect at higher resolution
-  const imgRect = imgRef.current!.getBoundingClientRect();
+  const imgRect = imgRef.current.getBoundingClientRect();
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
const refineSelection = async () => {
if (!ocrResult || !selectionRect) return;
// map selectionRect back to image coords and run OCR for that rect at higher resolution
const imgRect = imgRef.current!.getBoundingClientRect();
const sx = Math.round(selectionRect.x * (ocrResult.width / imgRect.width));
const sy = Math.round(selectionRect.y * (ocrResult.height / imgRect.height));
const sw = Math.round(selectionRect.width * (ocrResult.width / imgRect.width));
const sh = Math.round(selectionRect.height * (ocrResult.height / imgRect.height));
// debounce: small delay
const res = await runOCR(imageUrl, { rect: { x: sx, y: sy, width: sw, height: sh }, maxWidth: sw, maxHeight: sh });
// merge boxes: replace intersecting boxes in ocrResult with refined ones
const newBoxes = [...ocrResult.boxes];
// remove boxes that intersect selection
const filtered = newBoxes.filter((b)=>!(b.x < sx+sw && b.x+b.width > sx && b.y < sy+sh && b.y+b.height > sy));
// adjust refined box coordinates relative to full image
res.boxes.forEach((b)=>{
filtered.push({ ...b, x: b.x + sx, y: b.y + sy });
});
setOcrResult({ ...ocrResult, boxes: filtered, text: (ocrResult.text + '\n' + res.text).trim() });
};
const refineSelection = async () => {
if (!ocrResult || !selectionRect || !imgRef.current) return;
// map selectionRect back to image coords and run OCR for that rect at higher resolution
const imgRect = imgRef.current.getBoundingClientRect();
const sx = Math.round(selectionRect.x * (ocrResult.width / imgRect.width));
const sy = Math.round(selectionRect.y * (ocrResult.height / imgRect.height));
const sw = Math.round(selectionRect.width * (ocrResult.width / imgRect.width));
const sh = Math.round(selectionRect.height * (ocrResult.height / imgRect.height));
// debounce: small delay
const res = await runOCR(imageUrl, { rect: { x: sx, y: sy, width: sw, height: sh }, maxWidth: sw, maxHeight: sh });
// merge boxes: replace intersecting boxes in ocrResult with refined ones
const newBoxes = [...ocrResult.boxes];
// remove boxes that intersect selection
const filtered = newBoxes.filter((b)=>!(b.x < sx+sw && b.x+b.width > sx && b.y < sy+sh && b.y+b.height > sy));
// adjust refined box coordinates relative to full image
res.boxes.forEach((b)=>{
filtered.push({ ...b, x: b.x + sx, y: b.y + sy });
});
setOcrResult({ ...ocrResult, boxes: filtered, text: (ocrResult.text + '\n' + res.text).trim() });
};
🤖 Prompt for AI Agents
In frontend/src/components/Media/ImageTextSelector.tsx around lines 148 to 167,
the code uses imgRef.current! on line 151 without a null guard; update the
function to safely read the ref into a local variable (e.g. const imgEl =
imgRef.current) and if imgEl is falsy return early (or handle the error), then
call imgEl.getBoundingClientRect() and proceed — this avoids the non-null
assertion and runtime errors when the image ref is not set.

Comment on lines +169 to +193
return (
<div className={`image-text-selector ${className||''}`}>
<div className="its-image-container" ref={containerRef} onMouseDown={handleMouseDown} onMouseMove={handleMouseMove} onMouseUp={handleMouseUp}>
<img ref={imgRef} src={imageUrl} alt={alt||''} className="its-image" />
{selectionMode && ocrResult && (
<div className="its-overlay" aria-hidden>{ocrResult.boxes.map((b, idx) => {
const mapped = mapBox(b);
if (!mapped) return null;
const selected = selectedBoxes.has(idx);
return (
<div key={idx} className={`its-box ${selected? 'selected':''}`} style={{ left: mapped.left, top: mapped.top, width: mapped.width, height: mapped.height }} data-idx={idx} role="button" aria-label={`text-box-${idx}`}>{b.text}</div>
);
})}
{selectionRect && <div className="its-selection-rect" style={{ left: selectionRect.x, top: selectionRect.y, width: selectionRect.width, height: selectionRect.height }} />}
</div>
)}
</div>
{selectionMode && (
<div className="its-controls" aria-hidden>
<button onClick={()=>copySelection(true)} aria-label="Copy selection">Copy</button>
<button onClick={()=>refineSelection()} aria-label="Refine selection">Refine</button>
</div>
)}
</div>
);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Accessibility issues: interactive elements have aria-hidden and controls lack keyboard support.

  • Line 174 and 187: The overlay with clickable boxes and controls have aria-hidden, hiding them from assistive technology entirely.
  • Line 179: The div has role="button" but no tabIndex or keyboard handler, making it inaccessible via keyboard.
  • The selection mode toggle (Ctrl+T) isn't discoverable for screen reader users.

Consider:

  1. Remove aria-hidden from interactive containers or provide alternative accessible controls.
  2. Add tabIndex={0} and onKeyDown handlers to elements with role="button".
  3. Add visible UI toggle or instructions for the selection mode.
🤖 Prompt for AI Agents
In frontend/src/components/Media/ImageTextSelector.tsx around lines 169-193, the
overlay and controls are currently hidden from AT and the interactive boxes are
not keyboard-accessible; remove aria-hidden from the overlay (line ~174) and the
controls container (line ~187) so screen readers can access them, add
tabIndex={0} to the divs with role="button" and implement an onKeyDown handler
that triggers the same click behavior on Enter/Space, ensure the selection boxes
expose accessible names (aria-label or use the existing aria-label per box) and
focus styling, and add a visible, focusable toggle or on-screen instructions
(with aria-live or a visible button showing "Selection mode (Ctrl+T)") so
keyboard and screen-reader users can discover and activate selection mode.

Comment on lines +34 to +72
export async function initOCRWorker() {
if (worker) return worker;
// worker file uses Vite/webpack worker import when built; fallback to dynamic import of a module that uses tesseract.js directly
try {
// create worker using new Worker — worker file placed at ../workers/ocr.worker.ts
// Vite: new URL import is supported; to keep this file simple we instantiate the worker by relative path
// eslint-disable-next-line @typescript-eslint/ban-ts-comment
// @ts-ignore
worker = new Worker(new URL('../workers/ocr.worker.ts', import.meta.url), { type: 'module' });
worker.onmessage = (ev) => {
const { id, payload } = ev.data || {};
if (id && pending.has(id)) {
const cb = pending.get(id)!;
pending.delete(id);
cb(payload);
}
};
return worker;
} catch (e) {
console.warn('Could not spawn OCR worker; falling back to in-thread OCR (may block UI).', e);
// Try dynamic import of worker module and use a fake worker shim
const module = await import('../workers/ocr.worker');
// create a shim that calls module.onmessage
worker = {
postMessage(msg: any) {
// module handles messages via exported function
// @ts-ignore
module.onmessage({ data: msg, __fake: true });
},
onmessage: null as any,
terminate() {},
} as unknown as Worker;
// @ts-ignore
module.setMainThreadPost((data: any) => {
if (worker && (worker as any).onmessage) (worker as any).onmessage({ data });
});
return worker;
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Worker onerror handler is missing.

If the worker encounters an error (e.g., script load failure, unhandled exception), there's no onerror handler to catch it. This could leave pending promises hanging or miss critical failures.

     worker = new Worker(new URL('../workers/ocr.worker.ts', import.meta.url), { type: 'module' });
     worker.onmessage = (ev) => {
       const { id, payload } = ev.data || {};
       if (id && pending.has(id)) {
         const cb = pending.get(id)!;
         pending.delete(id);
         cb(payload);
       }
     };
+    worker.onerror = (err) => {
+      console.error('OCR worker error:', err);
+      // Reject all pending operations
+      pending.forEach((cb) => cb({ error: 'Worker error' }));
+      pending.clear();
+    };
     return worker;

Comment on lines +74 to +83
// convert canvas to blob/url
let src: any = imageUrl;
if (canvas) {
try {
const blob2 = await (canvas as OffscreenCanvas).convertToBlob();
src = URL.createObjectURL(blob2);
} catch (e) {
// ignore
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Memory leak: blob URL is never revoked.

URL.createObjectURL creates a reference that persists until the document is unloaded or explicitly revoked. This can accumulate memory over multiple OCR operations.

   let src: any = imageUrl;
   if (canvas) {
     try {
       const blob2 = await (canvas as OffscreenCanvas).convertToBlob();
       src = URL.createObjectURL(blob2);
     } catch (e) {
       // ignore
     }
   }
   const { data } = await worker.recognize(src);
+  // Revoke blob URL to free memory
+  if (src !== imageUrl) {
+    URL.revokeObjectURL(src);
+  }

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In frontend/src/workers/ocr.worker.ts around lines 74 to 83, the code creates a
blob URL via URL.createObjectURL(blob2) but never revokes it, causing a memory
leak; change the flow so when you generate src from the canvas blob you record
that it’s a temporary URL and ensure you call URL.revokeObjectURL(tempUrl) once
the URL is no longer needed (for example in a finally block, after the image has
been loaded/processed or after posting the OCR result), or wrap creation/usage
in try/finally to revoke only the blob-created URL and not the original
imageUrl.

@GaneshPatil7517 GaneshPatil7517 closed this by deleting the head repository Dec 26, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant