Releases: naptha/tesseract.js


28 Aug 04:36

What's Changed

  • Fixed bug causing excessive memory use when using FS + writeFile function (#812)
  • Fixed bug where setting output option debug: true was forcing recognition to be run (#788)
  • Added warning message when setParameters is used to set options that can only be set during initialize (#816)
  • Minor edits to reduce memory use (#815)
  • Minor changes to documentation, types, and example code (#575, #791, #803, #805, #810, #817)
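One way to avoid the new setParameters warning is to separate initialization-only options from runtime parameters before calling the worker. The sketch below is illustrative only: `splitParameters` and `INIT_ONLY` are hypothetical names, and the set contains just the three initialization parameters named in the v4.0.0 notes, so it is not an exhaustive list.

```javascript
// Assumption: only the three initialization parameters named in the
// v4.0.0 notes are listed here; the real set is larger.
const INIT_ONLY = new Set(['load_system_dawg', 'load_number_dawg', 'load_punc_dawg']);

// Hypothetical helper: split a parameter object into the part that belongs
// in worker.initialize and the part that can go to worker.setParameters.
function splitParameters(params) {
  const init = {};
  const runtime = {};
  for (const [key, value] of Object.entries(params)) {
    if (INIT_ONLY.has(key)) {
      init[key] = value;
    } else {
      runtime[key] = value;
    }
  }
  return { init, runtime };
}
```

The `init` object would then be passed as the third argument to worker.initialize, and `runtime` to worker.setParameters.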

Full Changelog: v4.1.1...v4.1.2


21 Jun 05:14

What's Changed

  • Fixed detection of image orientation metadata (#783)
    • Allows Tesseract.js to work with images taken on iOS devices
  • Minor changes to documentation and types (#781, #782, #778)

Full Changelog: v4.1.0...v4.1.1


03 Jun 00:57

What's Changed

  • Added ability to run layout analysis without recognition (#656)
  • Added support for OffscreenCanvas in browser version by @nathanbabcock (#766)
  • Fixed bug where recognize was running OCR even when not necessary (#769)
  • Fixed bug where certain valid langPath URLs caused errors in browser version (#558)
  • Removed problematic file-type and resolve-url dependencies (#773, #711)

Full Changelog: v4.0.6...v4.1.0


16 May 03:23

What's Changed

  • Invalid langData (.traineddata files) are now cleared from cache (#753)
    • Note: setting cacheMethod: 'none' or cacheMethod: 'refresh' to prevent invalid files from being cached should no longer be necessary
  • Added source maps to esm build (#761)
  • Various updates to documentation

Full Changelog: v4.0.5...v4.0.6


03 May 01:52

What's Changed

  • No changes to code
    • Removed unnecessary files to reduce the size of the npm package

Full Changelog: v4.0.4...v4.0.5


01 May 00:57

What's Changed

  • Added SIMD-detection when corePath is manually specified (#735)
    • Important note for users who set corePath: for significantly faster performance, set corePath to a directory that includes both tesseract-core.wasm.js and tesseract-core-simd.wasm.js
    • See this comment for explanation
  • Improved auto-rotate feature (rotateAuto: true) (#747)
  • Switched default CDN from unpkg to jsdelivr (#743)
  • Updated various dependencies (#729, #736, #737, #739, #741)
  • Reduced size of npm package (#731, #734, #740)
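For users who set corePath manually, a quick sanity check on the target directory can catch a missing SIMD build. The following is a rough sketch rather than anything shipped with the library: `missingCoreFiles` is a hypothetical helper, and the two file names are the ones given in the note above.

```javascript
// File names taken from the corePath note in this release.
const REQUIRED_CORE_FILES = [
  'tesseract-core.wasm.js',
  'tesseract-core-simd.wasm.js',
];

// Hypothetical helper: given the file names found in the corePath
// directory, return the required core files that are missing.
function missingCoreFiles(fileNames) {
  const present = new Set(fileNames);
  return REQUIRED_CORE_FILES.filter((name) => !present.has(name));
}
```

In Node, the listing could come from `fs.readdirSync(dir)`; an empty result suggests the directory is suitable to pass as corePath.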

Full Changelog: v4.0.3...v4.0.4


30 Mar 02:41

What's Changed

  • Updated Tesseract to v5.3.0
    • This resolves a bug with inverted (white on black) text recognition (#717)
  • Minor documentation fixes (#612, #614, #682, #673)
  • Improved types for addJob by @nathanbabcock (#719)

Full Changelog: v4.0.2...v4.0.3


18 Dec 06:16

What's Changed

  • Fixed bug breaking compatibility with certain devices (#701)

Full Changelog: v4.0.1...v4.0.2


10 Dec 05:17

What's Changed

  • Running recognize or detect with an invalid image argument now throws an error (#699)
  • Fixed bug with custom langdata paths (#697)
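Since an invalid image argument now surfaces as an error, callers may want to handle it explicitly. This is a minimal sketch under assumptions: `tryRecognize` is a hypothetical wrapper, `worker` is an already-initialized worker, and the exact error type is not stated in these notes, so both Error objects and plain values are handled.

```javascript
// Hypothetical wrapper: turn a failed recognize call into a result object
// instead of letting the error propagate.
async function tryRecognize(worker, image) {
  try {
    const { data } = await worker.recognize(image);
    return { ok: true, text: data.text };
  } catch (err) {
    // The notes do not specify the error type, so accept either an Error
    // object or a plain value.
    const message = err instanceof Error ? err.message : String(err);
    return { ok: false, error: message };
  }
}
```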

Full Changelog: v4.0.0...v4.0.1


25 Nov 20:24

Breaking Changes

  1. createWorker is now async
    1. In most code this means worker = Tesseract.createWorker() should be replaced with worker = await Tesseract.createWorker()
    2. Calling with invalid workerPath or corePath now produces error/rejected promise (#654)
  2. worker.load is no longer needed (createWorker now returns worker pre-loaded)
  3. getPDF function replaced by pdf recognize option (#488)
    1. This allows PDFs to be created when using a scheduler
    2. See browser and node examples for usage
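The first two breaking changes can be sketched together as follows. This is an illustrative sketch, not the project's official example: `recognizeText` is a hypothetical helper, and the Tesseract module is passed in as a parameter so the snippet does not depend on how the library is loaded.

```javascript
// Hypothetical helper showing the v4 calling pattern.
// `Tesseract` is the imported tesseract.js module (v4.x assumed).
async function recognizeText(Tesseract, image, lang = 'eng') {
  // v4: createWorker is async, so the result must be awaited.
  const worker = await Tesseract.createWorker();
  try {
    // v4: no worker.load() call; the worker is returned pre-loaded.
    await worker.loadLanguage(lang);
    await worker.initialize(lang);
    const { data } = await worker.recognize(image);
    return data.text;
  } finally {
    // Release the worker even if recognition fails.
    await worker.terminate();
  }
}
```

Code written for v3 would typically differ only in the missing `await` on createWorker and an extra `await worker.load()` line.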

Major New Features

  1. Processed images created by Tesseract can be retrieved using imageColor, imageGrey, and imageBinary options (#588)
    1. See image-processing.html example for usage
  2. Image rotation options rotateAuto and rotateRadians have been added, which significantly improve accuracy on certain documents
    1. See issue #648 for an example of how auto-rotation improves accuracy
    2. See image-processing.html example for usage of rotateAuto option
  3. Tesseract parameters (usually set using worker.setParameters) can now be set for single jobs using worker.recognize options (#665)
    1. For example, a single job can be set to recognize only numbers using worker.recognize(image, {tessedit_char_whitelist: "0123456789"})
    2. As these settings are reverted after the job, this allows for using different parameters for specific jobs when working with schedulers
  4. Initialization parameters (e.g. load_system_dawg, load_number_dawg, and load_punc_dawg) can now be set (#613)
    1. The third argument to worker.initialize now accepts either (1) an object with key/value pairs or (2) a string containing contents to write to a config file
    2. For example, both of these lines set load_number_dawg to 0:
      1. worker.initialize('eng', "0", {load_number_dawg: "0"});
      2. worker.initialize('eng', "0", "load_number_dawg 0");
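Features 3 and 4 above can be wrapped in small helpers, sketched here under assumptions: `recognizeDigits` and `initializeWithoutNumberDawg` are hypothetical names, and `worker` is a v4 worker obtained from createWorker. The calls themselves mirror the examples given in the notes.

```javascript
// Hypothetical helper: restrict a single job to digits.
// Assumes `worker` has already been initialized.
async function recognizeDigits(worker, image) {
  // Per the notes, parameters passed here apply to this job only and are
  // reverted once the job finishes.
  const { data } = await worker.recognize(image, {
    tessedit_char_whitelist: '0123456789',
  });
  return data.text;
}

// Hypothetical helper: set an initialization-only parameter.
// The second argument ("0") is copied from the example in the notes.
async function initializeWithoutNumberDawg(worker, lang = 'eng') {
  // Object form; the notes describe the string "load_number_dawg 0"
  // as an equivalent alternative.
  await worker.initialize(lang, '0', { load_number_dawg: '0' });
}
```

When working with a scheduler, the same options object would presumably be passed as the extra argument to `scheduler.addJob('recognize', image, options)`.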

Other Changes

  1. loadLanguage now resolves without error when language is loaded but writing to cache fails
    1. This allows for running in Firefox incognito mode using default settings (#609)
  2. detect returns null values when OS (orientation and script) detection fails rather than throwing an error (#526)
  3. Memory leak causing crashes fixed (#678)
  4. Cache corruption should now be much less common (#666)
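Because detect can now return null values instead of throwing, calling code should check for them before use. A minimal sketch, with assumptions stated: `describeDetection` is a hypothetical helper, and the field names `script` and `orientation_degrees` are taken from the library's typings rather than from these notes.

```javascript
// Hypothetical helper: summarize the `data` object returned by
// worker.detect(image). Field names are an assumption based on typings.
function describeDetection(data) {
  // `== null` matches both null and undefined.
  if (data.script == null || data.orientation_degrees == null) {
    return 'Orientation and script could not be detected';
  }
  return `${data.script}, rotated ${data.orientation_degrees} degrees`;
}
```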

Full Changelog: v3.0.3...v4.0.0