forked from tesseract-ocr/tesseract
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Introducing new APIs to assist with detecting and reporting overlarge input images.

Available to both userland and tesseract internal code, these can be used to report and fail early on images which are too large to fit in memory. Some very lenient defaults are used for the memory pressure allowance (1.5 GByte for 32-bit builds, 64 GByte for 64-bit builds), but this can be tuned to suit your preferences and local machine setup via the Tesseract Global Variable `allowed_image_memory_capacity` (DOUBLE type).

NOTE: the allowance limit can be effectively removed by setting this variable to an 'insane' value, e.g. `1.0e30`. HOWEVER, the CheckAndReportIfImageTooLarge() API will still fire for images with either width or height dimension >= TDIMENSION_MAX, which in the default build is the classic INT16_MAX (32767px); when compiled with defined(LARGE_IMAGES), the width/height limit is raised to 24 bits, i.e. ~16.7 million px per dimension, which would then tolerate images smaller than 16777216 x 16777216px. (This latter part is a work-in-progress.)

Related:
- tesseract-ocr#3184
- tesseract-ocr#3885
- tesseract-ocr#3435 (pullreq by @stweil -- WIP)

# Conflicts:
# src/api/baseapi.cpp
# src/ccmain/tesseractclass.h
# src/ccmain/thresholder.cpp
# src/ccutil/params.h
# src/textord/tordmain.cpp
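The two checks described in the commit message (a per-dimension limit and a memory allowance) can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: `kDimensionMax`, `kGByte`, `DefaultAllowanceBytes`, `EstimateImageBytes` and `ImageTooLarge` are hypothetical names, the cost model counts raw pixel storage only, and 1 GByte is assumed to mean 2^30 bytes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for TDIMENSION_MAX in a default (non-LARGE_IMAGES) build.
constexpr std::int32_t kDimensionMax = INT16_MAX;  // 32767

// Assumption: 1 GByte = 2^30 bytes; the commit message does not say which unit is meant.
constexpr double kGByte = 1024.0 * 1024.0 * 1024.0;

// Default allowance as stated in the commit message:
// 1.5 GByte for 32-bit builds, 64 GByte for 64-bit builds.
constexpr double DefaultAllowanceBytes(bool is_64bit = sizeof(void*) >= 8) {
  return (is_64bit ? 64.0 : 1.5) * kGByte;
}

// Hypothetical cost model: raw pixel storage only. The real estimate may
// include additional working buffers.
constexpr double EstimateImageBytes(std::int32_t width, std::int32_t height,
                                    int bytes_per_pixel) {
  return static_cast<double>(width) * height * bytes_per_pixel;
}

// True when either dimension reaches the dimension limit, or when the
// estimated cost exceeds the allowance. The dimension check fires even if
// the allowance is set to an 'insane' value such as 1.0e30.
constexpr bool ImageTooLarge(std::int32_t width, std::int32_t height,
                             int bytes_per_pixel, double allowance_bytes) {
  return width >= kDimensionMax || height >= kDimensionMax ||
         EstimateImageBytes(width, height, bytes_per_pixel) > allowance_bytes;
}

// Usage example: a 30000 x 30000 RGBA image needs 3.6e9 bytes of pixel data.
inline void CheckExamples() {
  assert(ImageTooLarge(30000, 30000, 4, DefaultAllowanceBytes(false)));  // 32-bit default
  assert(!ImageTooLarge(30000, 30000, 4, DefaultAllowanceBytes(true)));  // 64-bit default
  assert(ImageTooLarge(32767, 10, 1, 1.0e30));  // dimension limit still applies
}
```

Doing the arithmetic in `double` avoids integer overflow when multiplying large dimensions, which matters once the 24-bit limit of a LARGE_IMAGES build is in play.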
1 parent 9d71da7, commit 7d2e851
Showing 10 changed files with 640 additions and 8 deletions.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,62 @@ | ||
/********************************************************************** | ||
* File: memcost_estimate.h | ||
 * Description: Image memory cost estimation and reporting for input images | ||
* Author: Ger Hobbelt | ||
* | ||
* (C) Copyright 1990, Hewlett-Packard Ltd. | ||
** Licensed under the Apache License, Version 2.0 (the "License"); | ||
** you may not use this file except in compliance with the License. | ||
** You may obtain a copy of the License at | ||
** http://www.apache.org/licenses/LICENSE-2.0 | ||
** Unless required by applicable law or agreed to in writing, software | ||
** distributed under the License is distributed on an "AS IS" BASIS, | ||
** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
** See the License for the specific language governing permissions and | ||
** limitations under the License. | ||
* | ||
**********************************************************************/ | ||
|
||
#ifndef T_MEMCOST_ESTIMATE_H | ||
#define T_MEMCOST_ESTIMATE_H | ||
|
||
#include <string> | ||
|
||
namespace tesseract { | ||
|
||
// Image memory capacity cost estimate report. Cost is measured in BYTES. Cost is reported | ||
// (`to_string()`) in GBYTES. | ||
// | ||
// Uses `allowed_image_memory_capacity` plus some compile-time heuristics to indicate | ||
// whether the estimated cost is oversized --> `cost.is_too_large()` | ||
struct ImageCostEstimate { | ||
float cost; | ||
|
||
protected: | ||
float allowed_image_memory_capacity; | ||
|
||
public: | ||
ImageCostEstimate() | ||
: ImageCostEstimate(0.0f, 1e30f) { | ||
} | ||
|
||
ImageCostEstimate(float c, float allowance = 1e30f); | ||
|
||
static float get_max_system_allowance(); | ||
|
||
float get_max_allowance() const; | ||
|
||
void set_max_allowance(float allowance); | ||
|
||
bool is_too_large() const; | ||
|
||
std::string to_string() const; | ||
|
||
// implicit conversion | ||
operator std::string() const; | ||
|
||
static std::string capacity_to_string(float cost); | ||
}; | ||
|
||
} // namespace tesseract. | ||
|
||
#endif |
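The header above only declares the interface; the implementation file is not shown here. As a rough idea of how the members might fit together, here is a self-contained sketch placed in a separate `sketch` namespace. Everything in it is an assumption: the comparison used by `is_too_large()`, the "%.2f GByte" output format, the 2^30 bytes-per-GByte conversion, and the omission of `get_max_system_allowance()` are illustrative choices, not the commit's actual behaviour.

```cpp
#include <cstdio>
#include <string>

namespace sketch {

// Simplified, hypothetical re-creation of the declared interface.
struct ImageCostEstimate {
  float cost;  // estimated cost in bytes

  // Default allowance of 1e30f mirrors the header: effectively "no limit".
  ImageCostEstimate(float c = 0.0f, float allowance = 1e30f)
      : cost(c), allowed_image_memory_capacity(allowance) {}

  float get_max_allowance() const { return allowed_image_memory_capacity; }
  void set_max_allowance(float allowance) { allowed_image_memory_capacity = allowance; }

  // Assumption: "too large" simply means the cost exceeds the allowance.
  bool is_too_large() const { return cost > allowed_image_memory_capacity; }

  // Formats a byte count in GBytes (assumed here to be 2^30 bytes each).
  static std::string capacity_to_string(float cost) {
    char buf[64];
    std::snprintf(buf, sizeof(buf), "%.2f GByte",
                  cost / (1024.0f * 1024.0f * 1024.0f));
    return buf;
  }

  std::string to_string() const { return capacity_to_string(cost); }

  // Implicit conversion, as declared in the header.
  operator std::string() const { return to_string(); }

 protected:
  float allowed_image_memory_capacity;  // allowance in bytes
};

}  // namespace sketch
```

With this sketch, `sketch::ImageCostEstimate(2.0f * 1024 * 1024 * 1024, 1.5f * 1024 * 1024 * 1024)` reports `is_too_large()` as true, and raising the allowance via `set_max_allowance()` clears it. Note that `float` carries only about 7 significant decimal digits, so byte counts in the multi-GByte range are approximate; that is adequate for a coarse "will this fit" check.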