entrusted_client

What is this?

This module provides graphical and command-line interfaces for the document sanitization solution.

What is required?

What does it look like?

Graphical desktop client

The user interface is built with the FLTK toolkit because it’s lightweight and relatively easy to use.

./images/screenshot-gui-settings.png

./images/screenshot-gui-convert.png

Command-line client

The CLI can be convenient for “terminal warriors” or for use in shell scripts.

./images/screenshot-cli.png

How to build this?

Please check out the build instructions on the wiki for more information, including software dependencies.

On Linux, you’ll need the Xorg and libxcb development libraries (-dev or -devel packages, depending on your Linux distribution).

From the root of this project, open a command prompt and run:

cargo build --features=gui,fltk/fltk-bundled,fltk/use-wayland

How to run this?

Desktop usage

./target/debug/entrusted-gui

Command line usage

In the example below, a suspicious PDF file is converted to a searchable PDF (OCR, enabled by the ocr-lang parameter) instead of a PDF made only of page images.

  • OCR is time-consuming compared to only generating PDF images for the pages of the original input.
  • You only need OCR if you want to select or search text in the resulting PDF.
    • It significantly increases processing time.
    • It uses the Tesseract OCR engine behind the scenes.

Basic usage

cp ../test_data/gnus-logo.pdf suspicious_file.pdf
./target/debug/entrusted-cli --input-filename suspicious_file.pdf --ocr-lang eng
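The choice between the two output modes can be wrapped in a small shell function. This is a minimal sketch rather than part of entrusted: the function name sanitize_pdf and the CONVERTER variable are hypothetical, entrusted-cli is assumed to be on your PATH, and the sketch assumes that omitting --ocr-lang yields a PDF made of page images (an ocr-lang value in the configuration file may change that).

```shell
# Hypothetical helper (not part of entrusted): sanitize one file, enabling
# OCR only when a Tesseract language code is given as the second argument.
# CONVERTER is an assumption of this sketch: it defaults to entrusted-cli,
# and CONVERTER=echo prints the command line instead of running it.
sanitize_pdf() {
  input="$1"
  ocr_lang="${2:-}"
  converter="${CONVERTER:-entrusted-cli}"
  if [ -n "$ocr_lang" ]; then
    # Searchable PDF: slower, since the pages go through Tesseract
    "$converter" --input-filename "$input" --ocr-lang "$ocr_lang"
  else
    # Image-only PDF: faster, but text cannot be selected or searched
    "$converter" --input-filename "$input"
  fi
}
```

For example, sanitize_pdf suspicious_file.pdf eng requests OCR, while sanitize_pdf suspicious_file.pdf leaves the --ocr-lang option out.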

Batch conversion

There’s no explicit command-line support for batch conversion, because in a UNIX/Linux environment shell scripting is much more flexible.

In the example below, all PDF files in the Downloads folder of the user me are converted, including files in subfolders (find traverses folders recursively).

find /home/me/Downloads \
     -type f \
     -name "*.pdf" \
     -exec entrusted-cli --input-filename {} \;
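With -exec ... \;, find keeps going when a conversion fails and does not reflect that failure in its own exit status, so a failed file is easy to miss in a large folder. A minimal sketch of a wrapper that reports failures follows; the function name convert_pdfs and the CONVERTER variable are hypothetical, and reading one path per line handles spaces but not newlines in file names.

```shell
# Hypothetical batch helper (not part of entrusted): convert every PDF
# below a folder, print the files that failed, and return non-zero if any did.
# CONVERTER is an assumption of this sketch: it defaults to entrusted-cli,
# and CONVERTER=echo turns the run into a dry run.
convert_pdfs() {
  find "$1" -type f -name '*.pdf' | (
    status=0
    while IFS= read -r pdf; do
      # </dev/null keeps the converter from consuming the list of paths
      if ! "${CONVERTER:-entrusted-cli}" --input-filename "$pdf" </dev/null; then
        printf 'Conversion failed: %s\n' "$pdf" >&2
        status=1
      fi
    done
    # This exit only ends the explicit subshell; its status becomes
    # the return value of the function
    exit "$status"
  )
}
```

Usage: convert_pdfs /home/me/Downloads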

Is there a configuration file?

Yes. The configuration file (config.toml) is optional, and its location depends on the operating system.

Configuration file location

| Operating System | Configuration File Location |
| --- | --- |
| Linux & Others | $XDG_CONFIG_HOME/com.rimerosolutions.entrusted.entrusted_client/config.toml |
| macOS | $HOME/Library/Application Support/com.rimerosolutions.entrusted.entrusted_client/config.toml |
| Windows | %APPDATA%\com.rimerosolutions.entrusted.entrusted_client\config.toml |
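The lookup in the table can be sketched as a shell function, for example to check whether a configuration file already exists. The function name entrusted_config_path is hypothetical; the Windows branch assumes a Git Bash, MSYS or Cygwin shell, and the $HOME/.config fallback for an unset XDG_CONFIG_HOME is an assumption based on the XDG Base Directory defaults, not something stated by entrusted.

```shell
# Hypothetical helper (not part of entrusted): print the config.toml path
# for an OS name (first argument, defaults to the output of uname -s).
# Assumption: an unset or empty XDG_CONFIG_HOME falls back to $HOME/.config.
entrusted_config_path() {
  app_dir="com.rimerosolutions.entrusted.entrusted_client"
  case "${1:-$(uname -s)}" in
    Darwin)
      printf '%s\n' "$HOME/Library/Application Support/$app_dir/config.toml" ;;
    MINGW*|MSYS*|CYGWIN*)
      # Assumed Windows shells; APPDATA is set by Windows itself
      printf '%s\n' "$APPDATA\\$app_dir\\config.toml" ;;
    *)
      # Linux and other Unix-like systems
      printf '%s\n' "${XDG_CONFIG_HOME:-$HOME/.config}/$app_dir/config.toml" ;;
  esac
}
```

For example, test -f "$(entrusted_config_path)" && echo found reports whether a configuration file is already in place.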

Configuration format

The configuration file uses TOML, whose syntax is similar to that of INI files.

Example

# This must be a valid tesseract lang code
# See https://tesseract-ocr.github.io/tessdoc/Data-Files-in-different-versions.html
ocr-lang = "eng"

# Converted files will be named as follows: original-name-sanitized.pdf
file-suffix = "sanitized"

# This is meant mostly for advanced usage (self-hosting, development, etc.)
# container-image-name = "docker.io/MY_USERNAME_HERE/entrusted_container:1.2.3"

# The requested visual quality of the resulting PDF affects processing time and file size
# Allowed values are 'low', 'medium' or 'high' (default: 'medium')
visual-quality = "medium"
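To create a configuration file from a terminal, a minimal sketch is to write the settings with a here-document. The function name write_entrusted_config is hypothetical; pass it the location listed for your operating system in the table above, and adjust the values, which mirror the example.

```shell
# Hypothetical helper (not part of entrusted): write a minimal config.toml
# to the path given as the first argument, creating parent folders first.
# An existing file at that path is overwritten.
# The quoted 'EOF' delimiter keeps the shell from expanding anything inside.
write_entrusted_config() {
  mkdir -p "$(dirname "$1")" || return 1
  cat > "$1" <<'EOF'
ocr-lang = "eng"
file-suffix = "sanitized"
visual-quality = "medium"
EOF
}
```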

Overview

| Parameter | Description |
| --- | --- |
| ocr-lang | Tesseract OCR language code, if OCR is desired (slows down conversions) |
| file-suffix | Custom file suffix for converted files (defaults to entrusted) |
| container-image-name | Custom container image used for conversions (advanced option) |
| visual-quality | Visual quality of the result: low, medium or high (affects file size and processing time) |
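The naming rule from the example configuration (original-name-suffix.pdf) can be illustrated with a small shell function. This is a sketch with a hypothetical name, sanitized_name; it assumes the suffix is appended to the original file name minus its extension, falls back to the default suffix entrusted, and does not cover where the converted file is written.

```shell
# Hypothetical helper (not part of entrusted): compute the converted file
# name as <original-name>-<file-suffix>.pdf; the suffix defaults to "entrusted".
sanitized_name() {
  base="${1##*/}"      # drop any leading folders
  stem="${base%.*}"    # drop the extension, if there is one
  printf '%s-%s.pdf\n' "$stem" "${2:-entrusted}"
}
```

For example, sanitized_name /tmp/report.pdf sanitized prints report-sanitized.pdf.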