OCR4Linux is a versatile text extraction tool that allows you to take a screenshot of a selected area, extract text using OCR, and copy it to the clipboard. It supports both Wayland and X11 sessions and offers multiple language support.
Note: This script currently targets Arch Linux only. It may work on other Arch-based distributions, but this has not been tested yet.
I couldn't find an easy tool on Linux that does what the PowerTool app does on Windows. This motivated me to create OCR4Linux, a simple and efficient tool that captures a screenshot, extracts the text, and copies it to the clipboard in one seamless process.

Features:

- Screenshot Capture
  - Wayland support via `grimblast`
  - X11 support via `scrot`
  - Configurable screenshot directory
- Text Extraction
  - Automatic language detection
  - Multi-language OCR support
  - Image preprocessing for better accuracy
  - UTF-8 text output
- Clipboard Integration
  - Wayland: `wl-copy` and `cliphist`
  - X11: `xclip`
- Additional Features
  - Optional screenshot retention
  - Comprehensive logging system
  - Command-line interface
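
Since the tools differ between Wayland and X11, the script has to decide which set to use at run time. The snippet below is a minimal sketch of one common way to do that, assuming the session type is read from the standard `XDG_SESSION_TYPE` environment variable. The `TOOLS` mapping and the function names are illustrative and are not taken from the OCR4Linux source.

```python
import os

# Tool names come from the feature list; the mapping itself is hypothetical.
TOOLS = {
    "wayland": {"screenshot": "grimblast", "clipboard": "wl-copy"},
    "x11": {"screenshot": "scrot", "clipboard": "xclip"},
}


def detect_session(env=None):
    """Return "wayland" or "x11" based on XDG_SESSION_TYPE."""
    env = os.environ if env is None else env
    session = env.get("XDG_SESSION_TYPE", "").lower()
    if session not in TOOLS:
        raise RuntimeError(f"Unsupported session type: {session or 'unset'}")
    return session


def tools_for(session):
    """Look up the screenshot and clipboard tools for a session type."""
    return TOOLS[session]
```

Passing `env` explicitly keeps the detection easy to test without touching the real environment.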

Requirements:

- Arch Linux or an Arch-based distribution
- Python 3.x
- `yay` package manager (will be installed if needed)
- `tesseract` OCR engine
- `tesseract-data-eng` English language pack
- `tesseract-data-ara` Arabic language pack
- If you need a language other than the two above, search for it using the command:

      sudo pacman -Ss tesseract-data-{lang}

- `python-pillow`
- `python-pytesseract`
- Wayland: `grimblast-git`, `wl-clipboard`, `cliphist`, `rofi-wayland`
- X11: `scrot`, `xclip`, `rofi`
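
Before running the tool, it can help to confirm that the required executables are on `PATH`. The following is a rough sketch using the standard-library `shutil.which`. It assumes the packages above install binaries named `tesseract`, `grimblast`, `wl-copy`, `cliphist`, `rofi`, `scrot`, and `xclip`. The function and constant names are hypothetical, and `setup.sh` may perform its checks differently.

```python
import shutil

# Executable names assumed to be provided by the packages listed above.
COMMON = ["tesseract"]
PER_SESSION = {
    "wayland": ["grimblast", "wl-copy", "cliphist", "rofi"],
    "x11": ["scrot", "xclip", "rofi"],
}


def missing_tools(session, which=shutil.which):
    """Return the required executables that cannot be found on PATH."""
    required = COMMON + PER_SESSION[session]
    return [tool for tool in required if which(tool) is None]


if __name__ == "__main__":
    for session in PER_SESSION:
        print(session, "missing:", missing_tools(session))
```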

Installation and usage:

- Clone the repository:

      git clone https://github.com/moheladwy/OCR4Linux.git
      cd OCR4Linux

- Run the setup script to install the required packages and copy the necessary files to the configuration directory:

      chmod +x setup.sh
      ./setup.sh

- Run the main script to take a screenshot, extract text, and copy it to the clipboard:

      chmod +x OCR4Linux.sh
      ./OCR4Linux.sh

- The script will take a screenshot of the selected area, extract the text from the image, and copy it to the clipboard.
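
The capture, extract, and copy flow described above can be sketched in Python as follows. This is only an illustration under stated assumptions and is not the contents of `OCR4Linux.sh`: it assumes `grimblast save area FILE` and `scrot -s FILE` for region capture, and `wl-copy` or `xclip -selection clipboard` reading text from standard input. The function names are hypothetical.

```python
import subprocess


def screenshot_cmd(session, image_path):
    """Build the region-capture command for the given session type."""
    if session == "wayland":
        return ["grimblast", "save", "area", image_path]
    return ["scrot", "-s", image_path]


def clipboard_cmd(session):
    """Build the command that copies text from stdin to the clipboard."""
    if session == "wayland":
        return ["wl-copy"]
    return ["xclip", "-selection", "clipboard"]


def run_pipeline(session, image_path, text_path):
    """Capture a region, run the OCR script, and copy the result."""
    subprocess.run(screenshot_cmd(session, image_path), check=True)
    # The OCR step reuses the documented CLI: OCR4Linux.py <image> <output>.
    subprocess.run(["python", "OCR4Linux.py", image_path, text_path], check=True)
    with open(text_path, encoding="utf-8") as handle:
        text = handle.read()
    subprocess.run(clipboard_cmd(session), input=text, text=True, check=True)
    return text
```

Keeping command construction separate from execution makes it possible to check the commands without taking a real screenshot.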

Options for the shell script (`OCR4Linux.sh`):

Option | Description | Default |
---|---|---|
`-r` | Remove screenshot after processing | false |
`-d DIR` | Set screenshot directory | `$HOME/Pictures/screenshots` |
`-l` | Keep logs | false |
`-h` | Show help message | - |

Arguments for the Python script (`OCR4Linux.py`):

Option | Description | Required |
---|---|---|
`image_path` | Path to input image | Yes |
`output_path` | Path to save extracted text | Yes |
`-l, --list-langs` | List available OCR languages | No |
`-h, --help` | Show help message | No |
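
To make the Python script's arguments concrete, here is a minimal `argparse` sketch that mirrors the table above. It is an assumption about how such an interface could be wired, not the actual parser in `OCR4Linux.py`. The positional arguments are declared optional at the parser level so that `--list-langs` can be used on its own, and are then validated by hand.

```python
import argparse


def build_parser():
    """Create a parser matching the documented Python script arguments."""
    parser = argparse.ArgumentParser(
        prog="OCR4Linux.py",
        description="Extract text from an image using OCR.",
    )
    parser.add_argument("image_path", nargs="?", help="Path to input image")
    parser.add_argument("output_path", nargs="?", help="Path to save extracted text")
    parser.add_argument(
        "-l", "--list-langs", action="store_true",
        help="List available OCR languages",
    )
    return parser  # -h/--help is added by argparse automatically


def parse_args(argv):
    """Parse argv and require both paths unless --list-langs is given."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if not args.list_langs and (args.image_path is None or args.output_path is None):
        parser.error("image_path and output_path are required")
    return args
```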

Shell script examples:

    # Basic usage
    ./OCR4Linux.sh
    # Save logs and remove screenshot after processing
    ./OCR4Linux.sh -l -r
    # Custom screenshot directory with logging
    ./OCR4Linux.sh -d ~/Documents/screenshots -l
    # Show help
    ./OCR4Linux.sh -h

Python script examples:

    # Basic usage
    python OCR4Linux.py input.png output.txt
    # List available languages
    python OCR4Linux.py --list-langs
    # Show help
    python OCR4Linux.py --help

You can create a keyboard shortcut to run the script for easy access:

- Hyprland: put the following lines in your `hyprland.conf` file:

      $OCR4Linux = ~/.config/OCR4Linux/OCR4Linux.sh
      bind = $mainMod SHIFT, E, exec, $OCR4Linux # OCR4Linux script

- dwm: put the following lines in your `config.h` file (the second line belongs inside the `keys[]` array):

      static const char *ocr4linux[] = { "sh", "-c", "~/.config/OCR4Linux/OCR4Linux.sh", NULL };
      { MODKEY | ShiftMask, XK_e, spawn, {.v = ocr4linux } }, // OCR4Linux script

Files:

- OCR4Linux.py: Python script to preprocess the image and extract text using `tesseract`.
- OCR4Linux.sh: Shell script to take a screenshot, pass it to the Python script, get the extracted text back, and copy it to the clipboard.
- setup.sh: Shell script to install the required packages and copy the necessary files to the configuration directory (run it only once, after you first clone the repository).
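
As a rough illustration of what the preprocess-then-extract step in `OCR4Linux.py` could look like, the sketch below uses Pillow and pytesseract from the requirements list. The specific preprocessing (grayscale plus autocontrast), the default language tuple, and the function names are assumptions chosen for the example. The real script may preprocess differently.

```python
def join_langs(langs):
    """Tesseract accepts several languages joined with '+', e.g. 'eng+ara'."""
    return "+".join(langs)


def extract_text(image_path, langs=("eng", "ara")):
    """Preprocess an image and return the text Tesseract recognizes in it."""
    # Third-party imports (python-pillow, python-pytesseract) are kept local
    # so the helper above stays usable even when they are not installed.
    from PIL import Image, ImageOps
    import pytesseract

    with Image.open(image_path) as image:
        gray = ImageOps.grayscale(image)        # drop color information
        prepared = ImageOps.autocontrast(gray)  # stretch contrast for clearer glyphs
        return pytesseract.image_to_string(prepared, lang=join_langs(langs))
```

Calling `extract_text("input.png")` would then return the recognized text as a Python string, provided the `eng` and `ara` language packs are installed.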
We welcome contributions from the community to help improve OCR4Linux and make it available for all Linux users and distributions. Whether it's reporting bugs, suggesting new features, or submitting patches, your help is greatly appreciated. Please check out our contributing guidelines to get started.
This project is licensed under the MIT License. See the LICENSE file for more details.