@krakz999/tabula-node

A package for extracting tables from PDF files. It is a wrapper written entirely in TypeScript, built on top of the popular tabula-java library.

Installation

To use this package, install it via your favourite package manager:

pnpm i @krakz999/tabula-node

Usage

import { extractTables } from '@krakz999/tabula-node';

const results = await extractTables("./test.pdf", {
    pages: "all",
    guess: true
});

console.log(results);

Options

The extractTables function accepts an options object with the following properties:

  • area: Portion of the page to analyze. Example: "269.875,12.75,790.5,561". Accepts top,left,bottom,right i.e. y1,x1,y2,x2 where all values are in points relative to the top left corner. If all values are between 0-100 (inclusive) and preceded by '%', input will be taken as % of actual height or width of the page. Example: "%0,0,100,50". To specify multiple areas, pass an array. Default is entire page.
  • columns: X coordinates of column boundaries. Example: "10.1,20.2,30.3". If all values are between 0-100 (inclusive) and preceded by '%', input will be taken as % of actual width of the page. Example: "%25,50,80.6".
  • format: Output format, one of CSV, TSV or JSON. Default: CSV
  • guess: Guess the portion of the page to analyze per page.
  • lattice: Force PDF to be extracted using lattice-mode extraction (if there are ruling lines separating each cell, as in a PDF of an Excel spreadsheet)
  • noSpreadsheet: [Deprecated in favor of stream] Force PDF not to be extracted using spreadsheet-style extraction (if there are no ruling lines separating each cell)
  • pages: Comma-separated list of page ranges, or "all". Examples: "1-3,5-7", "3" or "all". Default is "1"
  • password: Password to decrypt document. Default is empty
  • stream: Force PDF to be extracted using stream-mode extraction (if there are no ruling lines separating each cell)
  • useLineReturns: Use embedded line returns in cells. (Only in spreadsheet mode.)
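
The area string format described above (top,left,bottom,right, with an optional leading '%' for percentages) is easy to get wrong by hand. As a minimal sketch, the helper below builds such strings from a named box. Note that formatArea and the Box interface are hypothetical names introduced here for illustration; they are not part of @krakz999/tabula-node.

```typescript
// Hypothetical helper, NOT part of @krakz999/tabula-node.
interface Box {
  top: number;
  left: number;
  bottom: number;
  right: number;
}

// Formats a box as "top,left,bottom,right".
// With percent = true, prefixes "%" and requires every value to be in 0-100,
// matching the percentage rule described for the area option.
function formatArea(box: Box, percent = false): string {
  const values = [box.top, box.left, box.bottom, box.right];
  if (percent && values.some((v) => v < 0 || v > 100)) {
    throw new RangeError("percentage areas need all values between 0 and 100");
  }
  return (percent ? "%" : "") + values.join(",");
}

// Absolute coordinates in points, relative to the top left corner.
const absolute = formatArea({ top: 269.875, left: 12.75, bottom: 790.5, right: 561 });

// Left half of the page, expressed as percentages.
const leftHalf = formatArea({ top: 0, left: 0, bottom: 100, right: 50 }, true);

console.log(absolute); // 269.875,12.75,790.5,561
console.log(leftHalf); // %0,0,100,50
```

The resulting strings match the two examples given for area, so they should be usable as the area value in the options object passed to extractTables.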