Merge pull request #456 from tabulapdf/new-detection-algorithm
Implement new table detector
jazzido committed Feb 17, 2016
2 parents 9fcb8fd + 7de1120 commit 6c1856b
Showing 3 changed files with 3 additions and 4 deletions.
Binary file not shown.
5 changes: 2 additions & 3 deletions lib/tabula_job_executor/jobs/detect_tables.rb
@@ -14,15 +14,14 @@ def perform
begin
extractor = Tabula::Extraction::ObjectExtractor.new(filepath, :all)
page_count = extractor.page_count
- sea = Java::TechnologyTabulaExtractors::SpreadsheetExtractionAlgorithm.new
+ nda = Java::TechnologyTabulaDetectors::NurminenDetectionAlgorithm.new
extractor.extract.each do |page|
page_index = page.getPageNumber

at( (page_count + page_index) / 2, page_count, "auto-detecting tables...") #starting at 50%...

- cells = Java::TechnologyTabulaExtractors::SpreadsheetExtractionAlgorithm.findCells(page.getHorizontalRulings, page.getVerticalRulings)
- areas = sea.findSpreadsheetsFromCells(cells)
+ areas = nda.detect(page)
page_areas_by_page << areas.map { |rect|
[ rect.getLeft,
rect.getTop,
2 changes: 1 addition & 1 deletion webapp/tabula_web.rb
@@ -9,7 +9,7 @@
require 'fileutils'
require 'securerandom'

- require_relative '../lib/jars/tabula-extractor-0.7.4-SNAPSHOT-jar-with-dependencies.jar'
+ require_relative '../lib/jars/tabula-0.8.0-jar-with-dependencies.jar'

require_relative '../lib/tabula_java_wrapper.rb'
java_import 'java.io.ByteArrayOutputStream'
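
Note on the progress arithmetic in detect_tables.rb: the `at(...)` call reports `(page_count + page_index) / 2` out of `page_count`, so detection fills the second half of the progress bar. A minimal sketch of that mapping follows. The helper name `detection_progress` is hypothetical and not part of this commit, and it assumes `page.getPageNumber` is 1-based and that both values are integers, so Ruby's `/` truncates.

```ruby
# Hypothetical helper (not in the commit): mirrors the first argument
# passed to at(...) in DetectTablesJob#perform.
# Assumes integer inputs, so `/` is integer division.
def detection_progress(page_index, page_count)
  (page_count + page_index) / 2
end

# For a 10-page document (assuming 1-based page numbers), the reported
# step moves from about half of page_count up to page_count.
page_count = 10
steps = (1..page_count).map { |i| detection_progress(i, page_count) }
puts steps.inspect
```

The first page lands at roughly 50% and the last page at 100%, which matches the `#starting at 50%...` comment in the job.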