Command line tool that extracts into individual png files an image of each textline as per the contours defined in a Page XML file
PRHLT/pageLineExtractor-deprecated
<<This tool is now deprecated and will not be maintained any longer. >>

README
-------------------------------------------------

1. INSTALLATION

To install the software, run the command:

    make install

This produces the following command line tool:

    page_format_tool

2. USAGE

The command line tool has 4 modes of usage:

2.1 Help - lists the command line options

    ./page_format_tool --help

    Allowed options:
      -h [ --help ]                           Generates this help message
      -i [ --input_image ] arg (=image.jpg)   Input image from which to extract
                                              the line images (by default
                                              ./image.jpg)
      -l [ --page_file ] arg (=page.xml)      Page file path (by default
                                              ./page.xml)
      -m [ --operation_mode ] arg (=DISPLAY)  Operation mode of the command line
                                              tool, list printing out the list of
                                              line regions (LIST) or save the line
                                              images to (FILE) (default value is
                                              LIST)
      -v [ --verbosity ] arg (=0)             % Verbosity os messages during
                                              execution [0-2]

2.2 List - lists the line regions in reading order and indicates their height and width

    ./page_format_tool -i 072_047_003.jpg -l 072_047_003.xml -m LIST -v 0

    #Name: 072_047_003.tif
    #Width: 2731
    #Height: 4096
    65 55 r1 r100
    51 55 r2 r101
    1638 140 r4 r102
    1699 112 r5 r103
    1687 96 r5 r104
    796 78 r5 r105
    1368 127 r5 r106
    1697 80 r5 r107
    1719 92 r5 r108
    1703 107 r5 r109
    1683 100 r5 r110
    805 86 r5 r111
    1336 130 r5 r112
    1695 103 r5 r113
    1718 111 r5 r114
    885 94 r5 r30
    220 61 r5 r31
    1678 115 r5 r115
    128 50 r5 r32
    1688 134 r5 r116
    1734 137 r5 r117
    1682 120 r5 r118
    1568 127 r5 r119
    1704 114 r5 r120
    1693 134 r5 r121
    1718 125 r5 r122
    1711 119 r5 r123
    1689 112 r5 r33
    45 40 r5 r34

2.3 File - saves each line present in the PAGE XML to an individual png file, in reading order:

    ./page_format_tool -i 072_047_003.jpg -l 072_047_003.xml -m FILE -v 0

Additionally, FILE mode reports the lines that are present in the PAGE file but are not stored in reading order:

    3383 [0x7f6fc27739c0] ERROR PRHLT.Page_File null - Out of place line >> 12 for 21
    3396 [0x7f6fc27739c0] ERROR PRHLT.Page_File null - Out of place line >> 13 for 22
    3398 [0x7f6fc27739c0] ERROR PRHLT.Page_File null - Out of place line >> 14 for 12
    3429 [0x7f6fc27739c0] ERROR PRHLT.Page_File null - Out of place line >> 15 for 23
    3430 [0x7f6fc27739c0] ERROR PRHLT.Page_File null - Out of place line >> 16 for 13
    3466 [0x7f6fc27739c0] ERROR PRHLT.Page_File null - Out of place line >> 17 for 14
    3503 [0x7f6fc27739c0] ERROR PRHLT.Page_File null - Out of place line >> 18 for 15
    3535 [0x7f6fc27739c0] ERROR PRHLT.Page_File null - Out of place line >> 19 for 16
    3567 [0x7f6fc27739c0] ERROR PRHLT.Page_File null - Out of place line >> 20 for 17
    3598 [0x7f6fc27739c0] ERROR PRHLT.Page_File null - Out of place line >> 21 for 18
    3635 [0x7f6fc27739c0] ERROR PRHLT.Page_File null - Out of place line >> 22 for 19
    3669 [0x7f6fc27739c0] ERROR PRHLT.Page_File null - Out of place line >> 23 for 20

2.4 Default - if no parameters are given, the command line tool executes the LIST option and expects files with specific names to exist:

    ./page_format_tool

is equivalent to

    ./page_format_tool -i image.jpg -l page.xml -m LIST -v 0
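
The LIST output from section 2.2 is regular enough to post-process with a short script. The following is a minimal sketch in Python, not part of the tool itself. It assumes that each data row has four whitespace-separated fields, that the two numbers are width followed by height (the README does not state the column order; this is inferred from the sample values), and that the last two fields are a region id and a line id. The names `LineEntry` and `parse_list_output` are hypothetical.

```python
from typing import Dict, List, NamedTuple, Tuple


class LineEntry(NamedTuple):
    width: int      # assumed: first numeric column
    height: int     # assumed: second numeric column
    region_id: str  # assumed: third column, e.g. "r5"
    line_id: str    # assumed: fourth column, e.g. "r103"


def parse_list_output(text: str) -> Tuple[Dict[str, str], List[LineEntry]]:
    """Split LIST-mode output into header fields and per-line entries."""
    header: Dict[str, str] = {}
    entries: List[LineEntry] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Header lines look like "#Name: 072_047_003.tif"
            key, _, value = line[1:].partition(":")
            header[key.strip()] = value.strip()
            continue
        fields = line.split()
        # Keep only rows shaped like "<int> <int> <id> <id>"; skip log messages
        if len(fields) != 4 or not (fields[0].isdigit() and fields[1].isdigit()):
            continue
        entries.append(LineEntry(int(fields[0]), int(fields[1]), fields[2], fields[3]))
    return header, entries


# First rows of the sample output shown in section 2.2
sample = """\
#Name: 072_047_003.tif
#Width: 2731
#Height: 4096
65 55 r1 r100
51 55 r2 r101
1638 140 r4 r102
"""

header, entries = parse_list_output(sample)
print(header["Name"], len(entries))
print(entries[2])
```

In practice the text would probably come from capturing the tool's output, for example with `subprocess.run([...], capture_output=True, text=True).stdout`, assuming the listing is written to standard output.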