Advanced usage
DocBleach is split into multiple modules: an API, a CLI and different bleaches.
The Java API is available in api/src/main/java/xyz/docbleach/api. Maven packages are published on OSS Sonatype.
This API lets you define your own bleaches by implementing xyz.docbleach.api.bleach.Bleach.
As of today (2017-04-20), DocBleach has not reached a stable state, so feel free to play around, but please don't depend on it.
The same API lets you build apps that use DocBleach; that is what the CLI does.
The API is built around three main concepts:
- BleachSession. Defines a "session", the state of the sanitisation process. It stores the threats found and the actions taken.
- Bleach. Defines a sanitiser: a class that accepts an InputStream and a BleachSession, and writes the sanitised content to an OutputStream.
- Threat. Along with ThreatSeverity and ThreatType, defines a threat: a piece of bad content in the file.
As an app developer depending on DocBleach, the minimal code required is:
// Define your inputStream and outputStream here
BleachSession session = new BleachSession();
new DefaultBleach().sanitize(inputStream, outputStream, session);
DefaultBleach is a magical Bleach: it discovers the available bleaches thanks to the ServiceLoader.
For instance, the OLE2 bleach is defined in module/module-office/src/main/resources/META-INF/services/xyz.docbleach.api.bleach.Bleach.
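To illustrate the ServiceLoader mechanism mentioned above, here is a minimal sketch using only the JDK. It does not use DocBleach itself: DemoBleach is a hypothetical stand-in for xyz.docbleach.api.bleach.Bleach, and the discover helper is an assumption about how such a lookup could be written, not DefaultBleach's actual code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

// Hypothetical stand-in for xyz.docbleach.api.bleach.Bleach, declared here
// only so the sketch compiles without DocBleach on the classpath.
interface DemoBleach {
    String getName();
}

class ServiceDiscoverySketch {
    // Collects every implementation listed in a classpath resource named
    // META-INF/services/<fully qualified name of the service interface>.
    static <T> List<T> discover(Class<T> service) {
        List<T> found = new ArrayList<>();
        for (T provider : ServiceLoader.load(service)) {
            found.add(provider);
        }
        return found;
    }

    public static void main(String[] args) {
        List<DemoBleach> bleaches = discover(DemoBleach.class);
        System.out.println(bleaches.size() + " provider(s) found");
        for (DemoBleach bleach : bleaches) {
            System.out.println(" - " + bleach.getName());
        }
    }
}
```

With no provider file on the classpath the list is simply empty. Adding a jar that ships a matching META-INF/services file is enough for its implementations to show up, with no change to the calling code.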
The easiest way to use DocBleach is through the Java app:
$ java -jar docbleach.jar -in ./original.pdf -out ./sane.pdf
WARN Sanitized file has been saved, 2 potential threat(s) removed.
It is possible to get more verbose output by adding -v or -vv.
$ java -jar docbleach.jar -in ./original.pdf -out ./sane.pdf -v
[main] DEBUG xyz.docbleach.cli.Main - Log Level: DEBUG
[main] DEBUG xyz.docbleach.cli.Main - Checking input name : ./original.pdf
[main] DEBUG xyz.docbleach.cli.Main - Checking output name : ./sane.pdf
[main] DEBUG xyz.docbleach.modules.pdf.PdfBleach - Password was guessed: 'null'
[main] DEBUG xyz.docbleach.modules.pdf.PdfBleach - No AcroForms found
[main] DEBUG xyz.docbleach.modules.pdf.PdfBleach - Found and removed Additionnal Actions
[main] DEBUG xyz.docbleach.modules.pdf.PdfBleach - Found and removed Additionnal Actions
[main] WARN xyz.docbleach.cli.Main - Sanitized file has been saved, 2 potential threat(s) removed.
The first column ([main]) displays the thread name, the second the log level, and the third the Java class that generated the log line.
If you pass a dash (-) as the argument to -in, STDIN is used as input.
$ java -jar docbleach.jar -in - -out ./sane.pdf < ./original.pdf
...
If you pass a dash (-) as the argument to -out, the sanitised file is written to STDOUT.
$ java -jar docbleach.jar -in original.pdf -out - > ./sane.pdf
...
It is possible to combine these two tweaks, giving ugly command lines. 👎
$ java -jar docbleach.jar -in - -out - < original.pdf > ./sane.pdf
...
This also allows you to curl documents directly into DocBleach:
$ curl https://----/document.pdf | java -jar docbleach.jar -in - -out - > ./sane.pdf
...
The default format is meant for humans.
DocBleach can also output a JSON object containing all the useful information about the bleach process, using the -json toggle.
$ java -jar docbleach.jar -in ./original.pdf -out ./sane.pdf -json
{"threats":[{"type":"ACTIVE_CONTENT","severity":"HIGH","location":"?","details":"Additional Actions","action":"REMOVE"},{"type":"ACTIVE_CONTENT","severity":"HIGH","location":"?","details":"Additional Actions","action":"REMOVE"}]}
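If you consume this report from Java, a dependency-free sketch like the one below can pull out the severity of each threat. It assumes the compact layout shown above and uses a regular expression for brevity; the class and method names are hypothetical, and a real JSON parser would be more robust.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ThreatReportSketch {
    // Matches "severity":"<value>" pairs. This is a shortcut that relies on
    // the flat report layout shown above, not a general JSON parser.
    private static final Pattern SEVERITY =
            Pattern.compile("\"severity\"\\s*:\\s*\"([^\"]*)\"");

    // Returns the severity of each reported threat, in report order.
    static List<String> severities(String json) {
        List<String> result = new ArrayList<>();
        Matcher matcher = SEVERITY.matcher(json);
        while (matcher.find()) {
            result.add(matcher.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        String report = "{\"threats\":[{\"type\":\"ACTIVE_CONTENT\",\"severity\":\"HIGH\","
                + "\"location\":\"?\",\"details\":\"Additional Actions\",\"action\":\"REMOVE\"}]}";
        System.out.println(severities(report)); // prints [HIGH]
    }
}
```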
This JSON output is sent to STDERR. You may redirect it to STDOUT and pass it to other commands (like jq):
$ java -jar docbleach.jar -in ./original.pdf -out ./sane.pdf -json 2>&1
Warning! When using -out - and -json, place the 2>&1 redirection before the > ./sane.pdf.
If you don't, both the sanitised file and the JSON output will be written to sane.pdf.
✅ Good:
$ java -jar docbleach.jar -in ./original.pdf -out - -json 2>&1 > ./sane.pdf
{.....}
❌ Bad:
$ java -jar docbleach.jar -in ./original.pdf -out - -json > ./sane.pdf 2>&1
(no output)
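When launching the CLI from a Java program rather than a shell, the same separation can be obtained with ProcessBuilder. The sketch below mirrors the "Good" command: STDOUT goes to the output file and STDERR is read as the JSON report. The class and method names are hypothetical, and docbleach.jar, original.pdf and sane.pdf are the placeholder names used in the commands above.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

class DocBleachCliSketch {
    // Mirrors: java -jar <jar> -in <input> -out - -json 2>&1 > <output>
    static ProcessBuilder prepare(String jar, String input, File output) {
        ProcessBuilder builder = new ProcessBuilder(
                "java", "-jar", jar, "-in", input, "-out", "-", "-json");
        // STDOUT (the sanitised document) is written straight to the file.
        builder.redirectOutput(output);
        // STDERR stays a pipe (the ProcessBuilder default), so the JSON
        // report never ends up mixed into the sanitised document.
        return builder;
    }

    // Starts the process and returns whatever it wrote to STDERR.
    static String run(ProcessBuilder builder) throws IOException, InterruptedException {
        Process process = builder.start();
        String report = new String(
                process.getErrorStream().readAllBytes(), StandardCharsets.UTF_8);
        process.waitFor();
        return report;
    }

    public static void main(String[] args) throws Exception {
        String report = run(prepare("docbleach.jar", "./original.pdf", new File("./sane.pdf")));
        System.out.println(report);
    }
}
```

Keeping the two streams on separate channels avoids the ordering pitfall of shell redirections entirely.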
Enjoy! 😄