Survey scanner for vektorprogrammet

Dependencies

Install the required packages with pip.

Usage

Running the program

The scanner takes a single PDF file and the expected number of boxes (k) as inputs. Note: k has no real effect.

./main.py inputs.pdf k

Outputs: stdout, stderr

The results are written to stdout, while the program logs its progress to stderr. To suppress the log output or redirect it to a file, use:

./main.py inputs.pdf k 2>/path/to/log  # Write to log file
./main.py inputs.pdf k 2>/dev/null     # Suppress log outputs
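The same separation of streams can be used when calling the scanner from another Python script. The following is a minimal sketch, not part of the repository: `run_scanner` is a hypothetical helper name, and the value passed for k in the commented example is made up. It discards stderr and parses whatever arrives on stdout as JSON.

```python
import json
import subprocess


def run_scanner(command):
    """Run a scanner command, discard its stderr log, and parse stdout as JSON."""
    completed = subprocess.run(
        command,
        stdout=subprocess.PIPE,      # capture the JSON result
        stderr=subprocess.DEVNULL,   # same effect as 2>/dev/null
        check=True,                  # raise if the program exits with an error
        text=True,                   # decode stdout to str
    )
    return json.loads(completed.stdout)


# Example call mirroring the shell commands above (the k value 30 is hypothetical):
# result = run_scanner(["./main.py", "inputs.pdf", "30"])
```

Passing the command as a list avoids shell quoting problems with file names that contain spaces.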

Output data structure

The program outputs JSON: a single object with three keys, "boxes", "image_folder" and "pages". "image_folder" is the folder where the images are placed. "boxes" holds the checkboxes the scanner has found, keyed by id, each with its coordinates, e.g.

Box coordinates ("boxes")

{
  "0": {
    "x": 1214,
    "y": 908
  },
  "1": {
    "x": 1418,
    "y": 1151
  },
  "2": {
    "x": 845,
    "y": 1373
  }
}

In addition, "pages" lists every page together with the boxes that were checked on it, e.g.

Pages ("pages")

[
  {
    "boxes": {
      "0": true,
      "3": true,
      "5": true,
      "12": true,
      "14": true,
      "17": true,
      "20": true,
      "22": true,
      "27": true,
      "28": true
    },
    "page": 1
  }
]
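To show how the two structures fit together, here is a minimal sketch that joins the checked box ids of each page with their coordinates. The function name `checked_boxes_per_page` and the "image_folder" value in the sample are hypothetical. Whether unchecked boxes are omitted or reported as false is an assumption here, so the code handles both.

```python
import json


def checked_boxes_per_page(result):
    """Map each page number to the (id, x, y) of every box marked as checked."""
    coords = result["boxes"]
    summary = {}
    for page in result["pages"]:
        checked = []
        for box_id, is_checked in page["boxes"].items():
            # Skip boxes reported as false and ids without known coordinates.
            if is_checked and box_id in coords:
                checked.append((box_id, coords[box_id]["x"], coords[box_id]["y"]))
        summary[page["page"]] = checked
    return summary


# Small sample in the documented shape; "images" is a made-up folder name.
sample = json.loads("""
{
  "image_folder": "images",
  "boxes": {"0": {"x": 1214, "y": 908}, "1": {"x": 1418, "y": 1151}},
  "pages": [{"boxes": {"0": true}, "page": 1}]
}
""")

print(checked_boxes_per_page(sample))
```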

Some implementation caveats

This implementation works best when the input PDF has many pages. During detection, each candidate is approximated by a four-sided polygon, which must meet certain criteria to be regarded as a box (i.e. it must be mostly square). Large check marks that extend beyond the box boundaries often trip this detection up, so a few boxes go undetected on every page. To mitigate this, the program takes all pages into account when determining where the boxes are, and hence the PDF must have enough pages before we can ensure that all boxes are detected.
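The two ideas above can be illustrated with a short pure-Python sketch. It is not the repository's code: the function names, the tolerance, the merge radius and the greedy clustering are all assumptions chosen for illustration, and the real program may use different criteria. `is_mostly_square` checks that a four-point polygon has roughly equal sides and diagonals, and `merge_detections` pools box centres from every page so that a box missed on one page is still found if it was seen on others.

```python
from math import dist


def is_mostly_square(corners, tolerance=0.2):
    """Return True if a 4-point polygon has nearly equal sides and diagonals."""
    if len(corners) != 4:
        return False
    sides = [dist(corners[i], corners[(i + 1) % 4]) for i in range(4)]
    diagonals = [dist(corners[0], corners[2]), dist(corners[1], corners[3])]
    if min(sides) == 0 or min(diagonals) == 0:
        return False  # degenerate polygon
    sides_ok = max(sides) / min(sides) <= 1 + tolerance
    diagonals_ok = max(diagonals) / min(diagonals) <= 1 + tolerance
    return sides_ok and diagonals_ok


def merge_detections(pages, radius=20.0, min_pages=2):
    """Greedily cluster box centres from all pages.

    `pages` is a list with one list of (x, y) centres per page. A cluster is
    kept only if it was seen on at least `min_pages` different pages.
    """
    clusters = []
    for page_index, centres in enumerate(pages):
        for x, y in centres:
            for cluster in clusters:
                mean = (cluster["sum_x"] / cluster["count"],
                        cluster["sum_y"] / cluster["count"])
                if dist((x, y), mean) <= radius:
                    cluster["sum_x"] += x
                    cluster["sum_y"] += y
                    cluster["count"] += 1
                    cluster["pages"].add(page_index)
                    break
            else:
                # No nearby cluster: start a new one for this detection.
                clusters.append({"sum_x": x, "sum_y": y, "count": 1,
                                 "pages": {page_index}})
    return [
        (round(c["sum_x"] / c["count"]), round(c["sum_y"] / c["count"]))
        for c in clusters
        if len(c["pages"]) >= min_pages
    ]


# Three pages: the second page misses one box, the third has a stray detection.
detections = [
    [(100, 100), (300, 100)],
    [(102, 98)],
    [(99, 101), (302, 102), (500, 500)],
]
print(merge_detections(detections))
```

With only one or two pages there is little to pool, which is why a longer PDF gives more reliable box positions.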

Results

The system correctly identifies checked boxes.

HoughLines vs contours

We first attempted to use HoughLines to detect the boxes, but contour approximation turned out to be much more effective, as can be seen below.

Attempt 1, HoughLines:

Attempt 2, using contour detection:
