Tools to extract/transform data from PDF
Inspired by the pdf-to-markdown project.
npm install @bsorrentino/pdf-tools -g
- Node.js >= 16
- pdf-tools uses canvas, a Cairo-backed Canvas implementation for Node.js, so check canvas's own requirements before installing
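The Node.js requirement above can be checked before installing. The following is a minimal sketch, not part of pdf-tools: `node_major_ok` and `REQUIRED_MAJOR` are hypothetical names introduced here, and the script only inspects the output of `node --version`. It does not check the native libraries that canvas needs.

```shell
#!/bin/sh
# Minimum Node.js major version stated in the requirements list.
REQUIRED_MAJOR=16

# Hypothetical helper: succeeds when a version string such as "v18.17.0"
# has a major version of at least REQUIRED_MAJOR.
node_major_ok() {
  version="${1#v}"        # drop the leading "v"
  major="${version%%.*}"  # keep everything before the first dot
  [ "$major" -ge "$REQUIRED_MAJOR" ]
}

if command -v node >/dev/null 2>&1; then
  if node_major_ok "$(node --version)"; then
    echo "Node.js $(node --version) is recent enough"
  else
    echo "Node.js $(node --version) is too old; need >= $REQUIRED_MAJOR"
  fi
else
  echo "node not found on PATH"
fi
```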
common options
-o, --outdir [folder] output folder (default: "out")
extract images (as png) from pdf and save them to the given folder
Usage:
pdftools pdfximages|pxi [options] <pdf>
create an image (as png) for each pdf page
Usage:
pdftools pdf2images|p2i <pdf>
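As an illustration of the two image commands above, the sketch below extracts the embedded images and then renders each page. The file name `document.pdf` is a placeholder, and the guard around the calls is an addition for safety, not something pdf-tools requires. Only `pdfximages` is given `--outdir`, since its usage line is the one that lists `[options]`.

```shell
#!/bin/sh
PDF="document.pdf"  # hypothetical input file
OUTDIR="out"        # same value as the documented default of --outdir

if command -v pdftools >/dev/null 2>&1 && [ -f "$PDF" ]; then
  # Extract every embedded image as a PNG into OUTDIR.
  pdftools pdfximages --outdir "$OUTDIR" "$PDF"
  # Create one PNG per page.
  pdftools pdf2images "$PDF"
else
  echo "skipping: pdftools or $PDF not available"
fi
```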
convert pdf to markdown format.
Usage:
pdftools pdf2md|p2md [options] <pdf>
Options:
-ps, --pageseparator [separator] add page separator (default: "---")
--imageurl [url prefix] image url prefix
--stats print stats information
--debug print debug information
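A possible invocation combining the options listed above is sketched here. The input file, the `images/` prefix and the `***` separator are illustrative values only, and passing `--outdir` assumes the common option applies to `pdf2md` as its `[options]` placeholder suggests.

```shell
#!/bin/sh
PDF="document.pdf"     # hypothetical input file
OUTDIR="out"           # same value as the documented default of --outdir
IMAGE_URL="images/"    # hypothetical prefix for image links in the markdown
PAGE_SEPARATOR="***"   # illustrative override of the default "---"

if command -v pdftools >/dev/null 2>&1 && [ -f "$PDF" ]; then
  # Convert the PDF to markdown and print stats information.
  pdftools pdf2md \
    --outdir "$OUTDIR" \
    --pageseparator "$PAGE_SEPARATOR" \
    --imageurl "$IMAGE_URL" \
    --stats \
    "$PDF"
else
  echo "skipping: pdftools or $PDF not available"
fi
```

The short alias `p2md` shown in the usage line can be used in place of `pdf2md`.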
- Detect headers
- Detect and extract images
- Extract plain text
- Extract fonts and allow custom mapping through a generated file <document name>.font.json. Supported fonts: bold, italic, monospace, bold+italic
- Detect code blocks (i.e. ```)
- Detect external links
- Detect TOC