252 changes: 217 additions & 35 deletions frontend/package-lock.json

Large diffs are not rendered by default.

2 changes: 2 additions & 0 deletions frontend/package.json
@@ -10,6 +10,7 @@
"test:watch": "jest --watch",
"preview": "vite preview",
"tauri": "tauri",
"tauri:dev": "node ./scripts/run-tauri-dev.cjs",
"setup:linux": "cd scripts && ./setup_env.sh",
"lint:check": "eslint --max-warnings 0 --config .eslintrc.json .",
"lint:fix": "eslint --max-warnings 0 --config .eslintrc.json . --fix",
@@ -64,6 +65,7 @@
"react-router": "^7.6.2",
"react-webcam": "^7.2.0",
"react-zoom-pan-pinch": "^3.7.0",
"tesseract.js": "^2.1.5",

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

tesseract.js npm latest version release history

💡 Result:

Summary — latest npm version

  • Latest published npm version: 6.0.1. [npm: tesseract.js]. [1]

Recent release history (selected, from project releases)

  • v6.0.1 — Apr 7 (release notes: switched to built-in fetch for Node.js v18+, minor updates). [2]
  • v6.0.0 — Jan 7 (fixed memory leaks, reduced runtime/memory, breaking changes to default outputs). [2]
  • v5.1.1 — Aug 24 (bug fixes). [2]
  • v5.1.0 — May 7 (added line-size metrics to blocks output). [2]
  • v5.0.0 — (major v5 release: much smaller files and memory usage; see changelog). [3]
  • v2.1.4 — Oct 14, 2020 (last v2.x maintenance release). [4]

Sources

  1. npm package page for tesseract.js (version 6.0.1). [npm tesseract.js — 6.0.1]. [1]
  2. GitHub releases (naptha/tesseract.js — v6.0.1, v6.0.0, v5.x tags and notes). [2]
  3. Release notes / changelog for v5 (newreleases / npm summary). [3]
  4. Changelog listing for v2.x (v2.1.4, Oct 14, 2020). [4]



Outdated tesseract.js version.

The ^2.1.5 range resolves only to 2.x releases, while the latest published version is 6.0.1. v5.0.0 and v6.0.0 introduced major improvements, including smaller files and reduced memory usage, along with breaking changes to output formats. The 2.x line has had no maintenance releases since around October 2020, according to the release history above.

🤖 Prompt for AI Agents
In frontend/package.json around line 68, the project depends on an outdated
"tesseract.js": "^2.1.5"; update the dependency to a maintained modern release
(preferably "tesseract.js": "6.0.1" or the desired v5/v6 release), run package
manager install (npm/yarn/pnpm) and rebuild, then audit the code paths that
consume Tesseract outputs and adapt them to the new major-version API/format
changes per the v5/v6 changelog (adjust import paths if ESM/CJS changed, update
result object handling, and fix any new async/worker initialization patterns),
and add/adjust tests to cover OCR flows before merging.
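
To make the version gap concrete: for major versions of 1 and above, a caret range such as `^2.1.5` only admits releases with the same major version, so a plain install will not move the project to v5 or v6. The sketch below is a simplified illustration of that rule, not npm's actual resolver; `parseVersion` and `satisfiesCaret` are hypothetical helpers that ignore pre-release tags and 0.x ranges.

```typescript
type Version = [number, number, number];

// Parse "2.1.5" or "^2.1.5" into numeric parts (pre-release tags are not handled).
function parseVersion(input: string): Version {
  const m = /^\^?(\d+)\.(\d+)\.(\d+)$/.exec(input);
  if (!m) {
    throw new Error(`Unsupported version string: ${input}`);
  }
  return [Number(m[1]), Number(m[2]), Number(m[3])];
}

// Simplified caret rule for major >= 1: same major, and not lower than the base.
function satisfiesCaret(range: string, candidate: string): boolean {
  const [baseMajor, baseMinor, basePatch] = parseVersion(range);
  const [major, minor, patch] = parseVersion(candidate);
  if (baseMajor === 0) {
    throw new Error("0.x caret ranges follow different rules");
  }
  if (major !== baseMajor) return false;
  if (minor !== baseMinor) return minor > baseMinor;
  return patch >= basePatch;
}
```

Under this rule `satisfiesCaret("^2.1.5", "6.0.1")` is false, which is why the upgrade needs an explicit change to package.json rather than a lockfile refresh.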

"tailwind-merge": "^3.3.0",
"tailwindcss": "^4.1.8",
"ts-node": "^10.9.2",
22 changes: 22 additions & 0 deletions frontend/scripts/run-tauri-dev.cjs
@@ -0,0 +1,22 @@
#!/usr/bin/env node
const { spawnSync, spawn } = require('child_process');

function hasTauriCli() {
  try {
    const res = spawnSync('tauri', ['--version'], { stdio: 'ignore' });
    return res.status === 0;
  } catch (e) {
    return false;
  }
}

if (hasTauriCli()) {
  console.log('Tauri CLI detected. Running `tauri dev`...');
  const child = spawn('tauri', ['dev'], { stdio: 'inherit', shell: true });
  child.on('exit', (code) => process.exit(code));
} else {
  console.warn('Tauri CLI not found locally. Falling back to starting the web dev server (vite).');
  console.warn('If you want full Tauri dev experience, install Tauri toolchain (Rust + @tauri-apps/cli).');
  const child = spawn('npm', ['run', 'dev'], { stdio: 'inherit', shell: true });
  child.on('exit', (code) => process.exit(code));
}
23 changes: 23 additions & 0 deletions frontend/scripts/run-tauri-dev.js
@@ -0,0 +1,23 @@
#!/usr/bin/env node
const { spawnSync, spawn } = require('child_process');
const path = require('path');
Comment on lines +1 to +3

⚠️ Potential issue | 🟡 Minor

Module type mismatch and unused import.

This file uses CommonJS require() syntax, but package.json declares "type": "module", so Node.js treats this .js file as an ES module and fails with "ReferenceError: require is not defined in ES module scope". The .cjs version is the correct one to use.

Additionally, path is imported but never used.

 #!/usr/bin/env node
 const { spawnSync, spawn } = require('child_process');
-const path = require('path');

Consider removing this file entirely since run-tauri-dev.cjs serves the same purpose and is the one referenced in package.json.

🤖 Prompt for AI Agents
In frontend/scripts/run-tauri-dev.js (lines 1-3): this file uses CommonJS
requires while package.json declares "type": "module" and also imports an unused
`path`; replace or remove the file. Either (preferred) delete
frontend/scripts/run-tauri-dev.js and ensure frontend/scripts/run-tauri-dev.cjs
is the script referenced in package.json, or rename this file to .cjs and remove
the unused `path` import; if keeping it, convert any package.json references to
point to the .cjs script and remove the unused import to eliminate the module
type mismatch and the unused variable.
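
The mismatch follows from how Node.js chooses a module system per file. As a rough sketch of that rule (the `moduleKindFor` helper is hypothetical and covers only the three extensions relevant here): `.cjs` is always CommonJS, `.mjs` is always an ES module, and `.js` follows the `"type"` field of the nearest package.json, which has traditionally defaulted to CommonJS when absent.

```typescript
type ModuleKind = "commonjs" | "module";

// Hypothetical helper: how a script file is interpreted, given the
// "type" field of the nearest package.json (undefined if the field is absent).
function moduleKindFor(filename: string, packageType?: ModuleKind): ModuleKind {
  if (filename.endsWith(".cjs")) return "commonjs"; // extension wins over "type"
  if (filename.endsWith(".mjs")) return "module";   // extension wins over "type"
  return packageType ?? "commonjs";                 // .js follows package.json
}
```

With `"type": "module"`, `run-tauri-dev.js` maps to `"module"` (so `require` is unavailable), while `run-tauri-dev.cjs` still maps to `"commonjs"`.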


function hasTauriCli() {
  try {
    const res = spawnSync('tauri', ['--version'], { stdio: 'ignore' });
    return res.status === 0;
  } catch (e) {
    return false;
  }
}

if (hasTauriCli()) {
  console.log('Tauri CLI detected. Running `tauri dev`...');
  const child = spawn('tauri', ['dev'], { stdio: 'inherit', shell: true });
  child.on('exit', (code) => process.exit(code));
} else {
  console.warn('Tauri CLI not found locally. Falling back to starting the web dev server (vite).');
  console.warn('If you want full Tauri dev experience, install Tauri toolchain (Rust + @tauri-apps/cli).');
  const child = spawn('npm', ['run', 'dev'], { stdio: 'inherit', shell: true });
  child.on('exit', (code) => process.exit(code));
}
9 changes: 9 additions & 0 deletions frontend/src/components/Media/ImageTextSelector.css
@@ -0,0 +1,9 @@
.image-text-selector{position:relative}
.its-image-container{position:relative; width:100%; height:100%; overflow:hidden}
.its-image{display:block; max-width:100%; max-height:100%}
.its-overlay{position:absolute; left:0; top:0; right:0; bottom:0; pointer-events:none}
.its-box{position:absolute; border:1px dashed rgba(0,0,0,0.35); background:rgba(255,255,0,0.08); padding:2px; font-size:12px; pointer-events:auto}
.its-box.selected{outline:2px solid #0078D4; background:rgba(0,120,212,0.12)}
.its-selection-rect{position:absolute; border:2px solid #0b84ff44; background:rgba(11,132,255,0.08)}
.its-controls{position:fixed; right:20px; bottom:20px; display:flex; gap:8px}
.its-controls button{padding:6px 10px}
196 changes: 196 additions & 0 deletions frontend/src/components/Media/ImageTextSelector.tsx
@@ -0,0 +1,196 @@
import React, { useEffect, useRef, useState } from 'react';
import type { OCRResult, OCRBox } from '../../ocr/ocrWorker';
import { runOCR, getCachedOCR, initOCRWorker } from '../../ocr/ocrWorker';
import './ImageTextSelector.css';

type Props = {
  imageUrl: string;
  alt?: string;
  className?: string;
};

export const ImageTextSelector: React.FC<Props> = ({ imageUrl, alt, className }) => {
  const imgRef = useRef<HTMLImageElement | null>(null);
  const containerRef = useRef<HTMLDivElement | null>(null);
  const [ocrResult, setOcrResult] = useState<OCRResult | null>(null);
  const [selectionMode, setSelectionMode] = useState(false);
  const [selectedBoxes, setSelectedBoxes] = useState<Set<number>>(new Set());
  const [selectionRect, setSelectionRect] = useState<{ x:number;y:number;width:number;height:number } | null>(null);
  const [isSelecting, setIsSelecting] = useState(false);
  const startPoint = useRef<{x:number;y:number}|null>(null);

  useEffect(() => {
    const onKey = (e: KeyboardEvent) => {
      if (e.ctrlKey && e.key.toLowerCase() === 't') {
        e.preventDefault();
        setSelectionMode((s) => !s);
        if (!selectionMode) {
          // lazy init worker
          initOCRWorker();
        }
      }
      if (e.key === 'Escape') {
        setSelectedBoxes(new Set());
        setSelectionRect(null);
      }
    };
    window.addEventListener('keydown', onKey);
    return () => window.removeEventListener('keydown', onKey);
  }, [selectionMode]);
Comment on lines +22 to +39

⚠️ Potential issue | 🟠 Major

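Stale closure risk in keyboard handler.

The onKey handler reads selectionMode from the closure created when the effect last ran. Because selectionMode is in the dependency array, the listener is re-registered after every toggle, so the if (!selectionMode) check on line 27 normally sees the current value; it can still read a stale value if Ctrl+T fires again before React has re-rendered and re-run the effect. Deriving the decision from the previous state inside the functional updater removes the dependency on the closure and lets the effect register the listener once.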

 useEffect(() => {
   const onKey = (e: KeyboardEvent) => {
     if (e.ctrlKey && e.key.toLowerCase() === 't') {
       e.preventDefault();
-      setSelectionMode((s) => !s);
-      if (!selectionMode) {
-        // lazy init worker
-        initOCRWorker();
-      }
+      setSelectionMode((s) => {
+        if (!s) {
+          // lazy init worker when entering selection mode
+          initOCRWorker();
+        }
+        return !s;
+      });
     }
     if (e.key === 'Escape') {
       setSelectedBoxes(new Set());
       setSelectionRect(null);
     }
   };
   window.addEventListener('keydown', onKey);
   return () => window.removeEventListener('keydown', onKey);
-}, [selectionMode]);
+}, []);
🤖 Prompt for AI Agents
In frontend/src/components/Media/ImageTextSelector.tsx around lines 22 to 39,
the onKey handler reads a stale selectionMode from closure when toggling and may
call initOCRWorker incorrectly; change the toggle to use the functional updater
form of setSelectionMode so you can determine the next state from the previous
state (e.g., setSelectionMode(prev => { const next = !prev; if (next)
initOCRWorker(); return next; })), remove the reliance on the outer
selectionMode variable inside the handler (no need to keep it in the effect
deps), and keep Escape handling as-is.
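
One caveat on the suggested change: React expects updater functions passed to a state setter to be pure, and in development Strict Mode it calls them twice to surface side effects. If `initOCRWorker()` is not idempotent, calling it inside the updater could start the worker twice. The sketch below imitates that double invocation with a hypothetical `createStrictState` helper (not a React API) and a counter standing in for `initOCRWorker`, to contrast a side effect inside the updater with one triggered outside it.

```typescript
// Hypothetical stand-in for a Strict Mode state cell: it calls each updater
// twice and keeps only one result, similar to React in development.
function createStrictState<T>(initial: T) {
  let value = initial;
  return {
    get: (): T => value,
    set: (updater: (prev: T) => T): void => {
      updater(value);         // extra invocation, result discarded
      value = updater(value); // result that is actually stored
    },
  };
}

let initCalls = 0;
const initWorker = (): void => { initCalls += 1; }; // stands in for initOCRWorker

// Variant A: side effect inside the updater (runs on both invocations).
const modeA = createStrictState(false);
modeA.set((prev) => {
  if (!prev) initWorker();
  return !prev;
});
const callsInsideUpdater = initCalls;

// Variant B: decide from the current value and keep the updater pure.
initCalls = 0;
const modeB = createStrictState(false);
const entering = !modeB.get();
modeB.set((prev) => !prev);
if (entering) initWorker();
const callsOutsideUpdater = initCalls;
```

In the component, one way to get the behavior of variant B would be to move the `initOCRWorker()` call into an effect that runs when `selectionMode` becomes true, leaving the updater as a plain `(s) => !s`.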


  useEffect(() => {
    if (!selectionMode) return;
    // run OCR lazily
    let mounted = true;
    (async () => {
      try {
        const cached = getCachedOCR(imageUrl);
        if (cached) {
          if (!mounted) return;
          setOcrResult(cached);
          return;
        }
        const res = await runOCR(imageUrl);
        if (!mounted) return;
        setOcrResult(res);
      } catch (e) {
        console.error('OCR error', e);
      }
    })();
    return () => { mounted = false; };
  }, [selectionMode, imageUrl]);

  // map OCR coords to DOM coords using image natural dims + getBoundingClientRect
  const mapBox = (box: OCRBox) => {
    const img = imgRef.current;
    const container = containerRef.current;
    if (!img || !container || !ocrResult) return null;
    const rect = img.getBoundingClientRect();
    const scaleX = rect.width / ocrResult.width;
    const scaleY = rect.height / ocrResult.height;
    return { left: box.x * scaleX, top: box.y * scaleY, width: box.width * scaleX, height: box.height * scaleY };
  };

  const handleMouseDown = (e: React.MouseEvent) => {
    if (!selectionMode) return;
    setIsSelecting(true);
    const container = containerRef.current!;
    const r = container.getBoundingClientRect();
    const x = e.clientX - r.left;
    const y = e.clientY - r.top;
    startPoint.current = { x, y };
    setSelectionRect({ x, y, width: 0, height: 0 });
  };
  const handleMouseMove = (e: React.MouseEvent) => {
    if (!isSelecting || !startPoint.current) return;
    const container = containerRef.current!;
    const r = container.getBoundingClientRect();
    const x = e.clientX - r.left;
    const y = e.clientY - r.top;
    const sx = Math.min(startPoint.current.x, x);
    const sy = Math.min(startPoint.current.y, y);
    const w = Math.abs(x - startPoint.current.x);
    const h = Math.abs(y - startPoint.current.y);
    setSelectionRect({ x: sx, y: sy, width: w, height: h });
  };
  const handleMouseUp = () => {
    if (!isSelecting) return;
    setIsSelecting(false);
    startPoint.current = null;
    // compute selected boxes
    if (!selectionRect || !ocrResult) return;
    const boxes = new Set<number>();
    const rect = imgRef.current!.getBoundingClientRect();
    const scaleX = ocrResult.width / rect.width;
    const scaleY = ocrResult.height / rect.height;
    const sel = { x: selectionRect.x * scaleX, y: selectionRect.y * scaleY, width: selectionRect.width * scaleX, height: selectionRect.height * scaleY };
    ocrResult.boxes.forEach((b, idx) => {
      const bx = b.x, by = b.y, bw = b.width, bh = b.height;
      const intersects = !(bx + bw < sel.x || bx > sel.x + sel.width || by + bh < sel.y || by > sel.y + sel.height);
      if (intersects) boxes.add(idx);
    });
    setSelectedBoxes(boxes);
  };
Comment on lines +96 to +113

⚠️ Potential issue | 🟡 Minor

Null assertion on imgRef.current may cause runtime error.

Line 103 uses the non-null assertion imgRef.current! without a runtime guard. The assertion only silences the type checker; if the ref is null when the handler runs, getBoundingClientRect() will throw. The early return on line 101 checks selectionRect and ocrResult but not imgRef.current.

 const handleMouseUp = () => {
   if (!isSelecting) return;
   setIsSelecting(false);
   startPoint.current = null;
   // compute selected boxes
-  if (!selectionRect || !ocrResult) return;
+  if (!selectionRect || !ocrResult || !imgRef.current) return;
   const boxes = new Set<number>();
-  const rect = imgRef.current!.getBoundingClientRect();
+  const rect = imgRef.current.getBoundingClientRect();
🤖 Prompt for AI Agents
In frontend/src/components/Media/ImageTextSelector.tsx around lines 96 to 113,
the handler uses imgRef.current! without a null check which can throw if the
image ref is not set; add a guard that returns early if imgRef.current is null
(or undefined) before calling getBoundingClientRect, then proceed to compute
scaleX/scaleY and selection using the non-null ref. Ensure the early-return is
added alongside the existing selectionRect/ocrResult checks so the function
exits safely when the image DOM node is unavailable.
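
The hit-testing in `handleMouseUp` can also be expressed as pure helpers, which keeps the DOM lookup (and its null guard) in one place and makes the geometry testable on its own. This is a sketch under assumptions: `Rect`, `Size`, `toImageSpace`, `intersects`, and `selectBoxes` are illustrative names, and the box shape mirrors only the `x`, `y`, `width`, and `height` fields used in the diff.

```typescript
interface Rect { x: number; y: number; width: number; height: number }
interface Size { width: number; height: number }

// Scale a rectangle from displayed (DOM) pixels into OCR/image pixels.
// Returns null when the image has no layout size yet, instead of dividing by zero.
function toImageSpace(sel: Rect, displayed: Size, image: Size): Rect | null {
  if (displayed.width <= 0 || displayed.height <= 0) return null;
  const scaleX = image.width / displayed.width;
  const scaleY = image.height / displayed.height;
  return {
    x: sel.x * scaleX,
    y: sel.y * scaleY,
    width: sel.width * scaleX,
    height: sel.height * scaleY,
  };
}

// Axis-aligned overlap test; touching edges count as intersecting, as in the diff.
function intersects(a: Rect, b: Rect): boolean {
  return !(a.x + a.width < b.x || a.x > b.x + b.width ||
           a.y + a.height < b.y || a.y > b.y + b.height);
}

// Indices of all boxes that overlap the selection (both in image coordinates).
function selectBoxes(boxes: Rect[], sel: Rect): number[] {
  const hits: number[] = [];
  boxes.forEach((box, index) => {
    if (intersects(box, sel)) hits.push(index);
  });
  return hits;
}
```

The handler would then read `imgRef.current` once, return early if it is null, and pass the measured size into `toImageSpace`.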


  const copySelection = async (includeMeta = true) => {
    if (!ocrResult) return;
    const texts: string[] = [];
    const boxes: any[] = [];
    Array.from(selectedBoxes).sort((a,b)=>a-b).forEach((i)=>{
      const b = ocrResult.boxes[i];
      if (b) { texts.push(b.text); boxes.push(b); }
    });
    const text = texts.join(' ');
    try {
      if (navigator.clipboard && navigator.clipboard.write) {
        const items: any[] = [];
        items.push(new ClipboardItem({ 'text/plain': new Blob([text], { type:'text/plain' }) }));
        if (includeMeta) {
          const meta = JSON.stringify({ text, boxes, confidence: boxes.map((x:any)=>x.confidence) });
          items.push(new ClipboardItem({ 'application/json': new Blob([meta], { type:'application/json' }) }));
        }
        // @ts-ignore
        await navigator.clipboard.write(items);
      } else {
        // fallback
        const ta = document.createElement('textarea');
        ta.value = text;
        document.body.appendChild(ta);
        ta.select();
        document.execCommand('copy');
        document.body.removeChild(ta);
      }
    } catch (e) {
      console.error('Copy failed', e);
    }
  };
Comment on lines +115 to +146

⚠️ Potential issue | 🟠 Major

Clipboard API misuse: put all MIME types in a single ClipboardItem.

navigator.clipboard.write() takes an array of ClipboardItem objects, and each ClipboardItem should carry every MIME representation of one conceptual item. The current code instead pushes the plain text and the JSON metadata as two separate ClipboardItem objects, and some browsers may reject a write that contains more than one item. In addition, browsers restrict which MIME types may be written, so an application/json entry is unlikely to work as expected in most of them.

 const copySelection = async (includeMeta = true) => {
   if (!ocrResult) return;
   const texts: string[] = [];
   const boxes: any[] = [];
   Array.from(selectedBoxes).sort((a,b)=>a-b).forEach((i)=>{
     const b = ocrResult.boxes[i];
     if (b) { texts.push(b.text); boxes.push(b); }
   });
   const text = texts.join(' ');
   try {
     if (navigator.clipboard && navigator.clipboard.write) {
-      const items: any[] = [];
-      items.push(new ClipboardItem({ 'text/plain': new Blob([text], { type:'text/plain' }) }));
-      if (includeMeta) {
-        const meta = JSON.stringify({ text, boxes, confidence: boxes.map((x:any)=>x.confidence) });
-        items.push(new ClipboardItem({ 'application/json': new Blob([meta], { type:'application/json' }) }));
-      }
-      // @ts-ignore
-      await navigator.clipboard.write(items);
+      const clipboardData: Record<string, Blob> = {
+        'text/plain': new Blob([text], { type: 'text/plain' }),
+      };
+      // Note: most browsers only support text/plain and text/html for clipboard
+      await navigator.clipboard.write([new ClipboardItem(clipboardData)]);
     } else {
       // fallback
       const ta = document.createElement('textarea');

Consider storing metadata separately (e.g., in component state or localStorage) if needed, since browsers restrict clipboard MIME types.

🤖 Prompt for AI Agents
In frontend/src/components/Media/ImageTextSelector.tsx around lines 115-146 the
clipboard logic incorrectly writes separate ClipboardItem objects for plain text
and JSON metadata; instead build a single ClipboardItem that contains both MIME
types when includeMeta is true (or only 'text/plain' when not supported), then
call navigator.clipboard.write once with an array containing that single
ClipboardItem; additionally, detect support for 'application/json' (or catch
failures) and fall back to writing only text and/or persist metadata to
component state/localStorage if needed.
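
A sketch of the single-item approach described in this comment. To keep it runnable outside a browser, the MIME selection is a pure function and the support check is injected; in a browser the predicate could be backed by `ClipboardItem.supports` where that static method is available (an assumption to verify against the target browsers). `buildClipboardPayload` and `SupportsFn` are hypothetical names, not part of the PR.

```typescript
type SupportsFn = (mimeType: string) => boolean;

// Build the MIME-type -> content map for ONE clipboard item.
// Plain text is always included; JSON metadata is added only when the
// caller supplies it and the environment reports support for the type.
function buildClipboardPayload(
  text: string,
  meta: unknown,
  supports: SupportsFn,
): Record<string, string> {
  const payload: Record<string, string> = { "text/plain": text };
  if (meta !== undefined && supports("application/json")) {
    payload["application/json"] = JSON.stringify(meta);
  }
  return payload;
}
```

In the component, each string would be wrapped in a `Blob` with the matching `type`, and the resulting map passed to one `new ClipboardItem(...)` inside a single `navigator.clipboard.write([...])` call.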


  const refineSelection = async () => {
    if (!ocrResult || !selectionRect) return;
    // map selectionRect back to image coords and run OCR for that rect at higher resolution
    const imgRect = imgRef.current!.getBoundingClientRect();
    const sx = Math.round(selectionRect.x * (ocrResult.width / imgRect.width));
    const sy = Math.round(selectionRect.y * (ocrResult.height / imgRect.height));
    const sw = Math.round(selectionRect.width * (ocrResult.width / imgRect.width));
    const sh = Math.round(selectionRect.height * (ocrResult.height / imgRect.height));
    // debounce: small delay
    const res = await runOCR(imageUrl, { rect: { x: sx, y: sy, width: sw, height: sh }, maxWidth: sw, maxHeight: sh });
    // merge boxes: replace intersecting boxes in ocrResult with refined ones
    const newBoxes = [...ocrResult.boxes];
    // remove boxes that intersect selection
    const filtered = newBoxes.filter((b)=>!(b.x < sx+sw && b.x+b.width > sx && b.y < sy+sh && b.y+b.height > sy));
    // adjust refined box coordinates relative to full image
    res.boxes.forEach((b)=>{
      filtered.push({ ...b, x: b.x + sx, y: b.y + sy });
    });
    setOcrResult({ ...ocrResult, boxes: filtered, text: (ocrResult.text + '\n' + res.text).trim() });
  };
Comment on lines +148 to +167

⚠️ Potential issue | 🟡 Minor

Missing null guard on imgRef.current.

Line 151 uses imgRef.current! without checking if the ref is set. Add a guard for safety.

 const refineSelection = async () => {
-  if (!ocrResult || !selectionRect) return;
+  if (!ocrResult || !selectionRect || !imgRef.current) return;
   // map selectionRect back to image coords and run OCR for that rect at higher resolution
-  const imgRect = imgRef.current!.getBoundingClientRect();
+  const imgRect = imgRef.current.getBoundingClientRect();
🤖 Prompt for AI Agents
In frontend/src/components/Media/ImageTextSelector.tsx around lines 148 to 167,
the code uses imgRef.current! on line 151 without a null guard; update the
function to safely read the ref into a local variable (e.g. const imgEl =
imgRef.current) and if imgEl is falsy return early (or handle the error), then
call imgEl.getBoundingClientRect() and proceed — this avoids the non-null
assertion and runtime errors when the image ref is not set.
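
Pulling the merge step of `refineSelection` out of the component also shrinks what sits behind the ref guard: once the region is known in image coordinates, the rest needs no DOM access. The sketch below assumes a minimal box shape with a `text` field; `Box`, `Region`, `overlaps`, and `mergeRefined` are illustrative names rather than types from `ocrWorker`.

```typescript
interface Region { x: number; y: number; width: number; height: number }
interface Box extends Region { text: string }

// Strict overlap test, matching the filter used in refineSelection.
function overlaps(b: Region, r: Region): boolean {
  return b.x < r.x + r.width && b.x + b.width > r.x &&
         b.y < r.y + r.height && b.y + b.height > r.y;
}

// Drop existing boxes that overlap the refined region, then append the
// refined boxes shifted from region-relative to full-image coordinates.
function mergeRefined(existing: Box[], refined: Box[], region: Region): Box[] {
  const kept = existing.filter((b) => !overlaps(b, region));
  const shifted = refined.map((b) => ({ ...b, x: b.x + region.x, y: b.y + region.y }));
  return [...kept, ...shifted];
}
```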


  return (
    <div className={`image-text-selector ${className||''}`}>
      <div className="its-image-container" ref={containerRef} onMouseDown={handleMouseDown} onMouseMove={handleMouseMove} onMouseUp={handleMouseUp}>
        <img ref={imgRef} src={imageUrl} alt={alt||''} className="its-image" />
        {selectionMode && ocrResult && (
          <div className="its-overlay" aria-hidden>{ocrResult.boxes.map((b, idx) => {
            const mapped = mapBox(b);
            if (!mapped) return null;
            const selected = selectedBoxes.has(idx);
            return (
              <div key={idx} className={`its-box ${selected? 'selected':''}`} style={{ left: mapped.left, top: mapped.top, width: mapped.width, height: mapped.height }} data-idx={idx} role="button" aria-label={`text-box-${idx}`}>{b.text}</div>
            );
          })}
          {selectionRect && <div className="its-selection-rect" style={{ left: selectionRect.x, top: selectionRect.y, width: selectionRect.width, height: selectionRect.height }} />}
          </div>
        )}
      </div>
      {selectionMode && (
        <div className="its-controls" aria-hidden>
          <button onClick={()=>copySelection(true)} aria-label="Copy selection">Copy</button>
          <button onClick={()=>refineSelection()} aria-label="Refine selection">Refine</button>
        </div>
      )}
    </div>
  );
Comment on lines +169 to +193

⚠️ Potential issue | 🟡 Minor

Accessibility issues: interactive elements have aria-hidden and controls lack keyboard support.

  • Lines 174 and 187: The overlay with clickable boxes and the controls container both have aria-hidden, which hides them from assistive technology entirely.
  • Line 179: The div has role="button" but no tabIndex or keyboard handler, making it inaccessible via keyboard.
  • The selection mode toggle (Ctrl+T) isn't discoverable for screen reader users.

Consider:

  1. Remove aria-hidden from interactive containers or provide alternative accessible controls.
  2. Add tabIndex={0} and onKeyDown handlers to elements with role="button".
  3. Add a visible UI toggle or on-screen instructions for the selection mode.
🤖 Prompt for AI Agents
In frontend/src/components/Media/ImageTextSelector.tsx around lines 169-193, the
overlay and controls are currently hidden from AT and the interactive boxes are
not keyboard-accessible; remove aria-hidden from the overlay (line ~174) and the
controls container (line ~187) so screen readers can access them, add
tabIndex={0} to the divs with role="button" and implement an onKeyDown handler
that triggers the same click behavior on Enter/Space, ensure the selection boxes
expose accessible names (aria-label or use the existing aria-label per box) and
focus styling, and add a visible, focusable toggle or on-screen instructions
(with aria-live or a visible button showing "Selection mode (Ctrl+T)") so
keyboard and screen-reader users can discover and activate selection mode.
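
As a minimal sketch of the keyboard support mentioned above, the Enter/Space handling for an element with role="button" can be factored into a small helper. `KeyLikeEvent` and `onActivationKey` are hypothetical names; the event type is kept structural so the helper does not depend on React, though a React keyboard event would satisfy it.

```typescript
// Minimal structural event type: the only parts of a keyboard event used here.
interface KeyLikeEvent {
  key: string;
  preventDefault(): void;
}

// Returns a keydown handler that runs `activate` on Enter or Space,
// mirroring how a native <button> responds to the keyboard.
function onActivationKey(activate: () => void): (e: KeyLikeEvent) => void {
  return (e) => {
    if (e.key === "Enter" || e.key === " ") {
      e.preventDefault(); // keep Space from scrolling the page
      activate();
    }
  };
}
```

On each box this could be wired up alongside `tabIndex={0}`, for example `onKeyDown={onActivationKey(() => toggleBox(idx))}`, where `toggleBox` is a hypothetical function that applies the same selection logic as a click.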

};

export default ImageTextSelector;