Detect text contours
Text contours are detected when image.py's WarpedImage initialisation method calls self.contour_list = self.contour_info(text=True)
def contour_info(self, text=True):
    c_type = "text" if text else "line"
    mask = Mask(self.stem, self.small, self.pagemask, c_type)
    return mask.contours()
The contour_info method is just a gate to detect either text contours (i.e. to detect lines of text) or line contours (table borders etc.). It makes a Mask and then calls the contours method on that Mask.
- The Mask class comes from the mask.py module, and uses the name (stem), shrunk image (small), rectangular black and white mask (pagemask) and contour type (here "text")
The component steps are:
- The shrunk image is converted to a grayscale copy, sgray, by mixing the RGB channels into a single channel (reducing the final dimension of the image from 3 to 1)
- The grayscale image is binarised (the input is 8-bit and the output is also 8-bit, but only contains the values 0 or 255)
- The threshold type is either binary (dark pixels become 0, light pixels become 255) or inverse binary (the reverse, so black text pixels with low grayscale values near 0 become 255 when binarised). We use the inverse binary so text becomes high valued and logically means "True" or "on" (as masks are used for logic operations).
sgray = cvtColor(self.small, COLOR_RGB2GRAY)
mask = adaptiveThreshold(
    ...,  # other arguments not shown here
    C=25 if self.text else 7,
)
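To make the grayscale and inverse binary steps concrete, here is a minimal NumPy sketch. It is not OpenCV's adaptiveThreshold: it compares every pixel against one global mean minus C rather than a local window mean, and the helper names to_gray and inverse_binary_threshold are hypothetical. The channel weights are the standard RGB-to-gray luminance weights.

```python
import numpy as np


def to_gray(rgb):
    """Hypothetical helper: mix the three RGB channels into one (h, w) channel."""
    weights = np.array([0.299, 0.587, 0.114])
    return np.rint(rgb @ weights).astype(np.uint8)


def inverse_binary_threshold(gray, c):
    """Hypothetical helper: pixels darker than (global mean - c) become 255, the rest 0.

    Assumption: a single global mean stands in for the local window mean
    that an adaptive threshold would use.
    """
    threshold = gray.mean() - c
    return np.where(gray < threshold, 255, 0).astype(np.uint8)


# A light page (value 250) with two dark "ink" pixels
rgb = np.full((3, 4, 3), 250, dtype=np.uint8)
rgb[1, 1] = 10
rgb[1, 2] = 20

gray = to_gray(rgb)  # final dimension of size 3 is gone
mask = inverse_binary_threshold(gray, c=25)  # ink pixels are now "on"
```

Only the two dark pixels end up at 255, which is the sense in which text becomes "on" in the mask.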
(These steps are applied in reverse order for the table borders)
mask = dilate(mask, box(9, 1)) if self.text else erode(mask, box(3, 1), iterations=3)
mask = erode(mask, box(1, 3)) if self.text else dilate(mask, box(8, 2))
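As a rough illustration of why a wide, flat dilation helps for text, the sketch below dilates a single row of pixels horizontally in plain NumPy. It assumes that box(9, 1) denotes a kernel 9 pixels wide and 1 pixel tall; dilate_horizontal is a hypothetical stand-in for the OpenCV call and uses a narrower window so the example stays small.

```python
import numpy as np


def dilate_horizontal(mask, width):
    """Hypothetical helper: a pixel turns on if any pixel in the centred
    window of `width` columns (odd width assumed) is on."""
    half = width // 2
    padded = np.pad(mask, ((0, 0), (half, half)))  # zero padding left and right
    windows = [padded[:, k:k + mask.shape[1]] for k in range(width)]
    return np.max(windows, axis=0)


# Two "letters" separated by a gap of three pixels
row = np.array([[0, 1, 0, 0, 0, 1, 0]], dtype=np.uint8)

narrow = dilate_horizontal(row, 3)  # the gap survives
wide = dilate_horizontal(row, 5)    # the gap is bridged into one run
```

With the wider window the two marks merge into one connected run, which is how separate letters can end up in a single line-shaped contour.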
The pagemask is then 'applied' to the dilated/eroded mask by taking the element-wise minimum of the two, i.e. every pixel that is 'off' (zero) in the pagemask will be zero in the result even if a text contour was detected there, so it will be 'switched off' or ignored in the mask.
Before the connected component analysis happens, a filtering step removes contours "which are too tall (compared to their width) or too thick to be text"
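The pagemask step can be illustrated with NumPy alone. This is a small sketch with made-up arrays, assuming both masks are 8-bit images holding only 0 or 255.

```python
import numpy as np

# Detected text pixels (255 = "on")
text_mask = np.array([[255, 255, 0],
                      [0, 255, 255]], dtype=np.uint8)

# Page area (255 = inside the page, 0 = margin to ignore)
pagemask = np.array([[0, 255, 255],
                     [0, 255, 0]], dtype=np.uint8)

# Element-wise minimum: a pixel stays on only if it is on in both masks
combined = np.minimum(text_mask, pagemask)
```

For masks restricted to 0 and 255, the minimum behaves like a logical AND.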
Back in image.py, the WarpedImage class immediately calls the contours method of the Mask, which wraps a call to get_contours from contours.py
def get_contours(name, small, mask):
    contours, _ = findContours(mask, RETR_EXTERNAL, CHAIN_APPROX_NONE)
    contours_out = []
    for contour in contours:
        rect = boundingRect(contour)
        xmin, ymin, width, height = rect
        # skip contours whose bounding box is too small or too tall for text
        if (
            width < cfg.contour_opts.TEXT_MIN_WIDTH
            or height < cfg.contour_opts.TEXT_MIN_HEIGHT
            or width < cfg.contour_opts.TEXT_MIN_ASPECT * height
        ):
            continue
        tight_mask = make_tight_mask(contour, xmin, ymin, width, height)
        # skip contours that are too thick in any column
        if tight_mask.sum(axis=0).max() > cfg.contour_opts.TEXT_MAX_THICKNESS:
            continue
        contours_out.append(ContourInfo(contour, rect, tight_mask))
    if cfg.debug_lvl_opt.DEBUG_LEVEL >= 2:
        visualize_contours(name, small, contours_out)
    return contours_out
This procedure discards a contour if any of the following conditions are met:
- the width of the bounding box of each [text] contour (i.e. the outline of some text) is below the TEXT_MIN_WIDTH (default: 15px)
- ...its height is below TEXT_MIN_HEIGHT (default: 2px)
- ...its aspect ratio is below TEXT_MIN_ASPECT (default: 1.5 i.e. width:height 3:2), i.e. it should be significantly wider than it is tall
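The three bounding-box conditions can be sketched as a standalone predicate. The constants below copy the defaults quoted above; passes_box_filter is a hypothetical name, not a function from the project.

```python
# Defaults as quoted in the text
TEXT_MIN_WIDTH = 15
TEXT_MIN_HEIGHT = 2
TEXT_MIN_ASPECT = 1.5


def passes_box_filter(width, height):
    """Hypothetical helper: True if a bounding box is plausibly a run of text."""
    too_narrow = width < TEXT_MIN_WIDTH
    too_short = height < TEXT_MIN_HEIGHT
    too_tall_for_width = width < TEXT_MIN_ASPECT * height
    return not (too_narrow or too_short or too_tall_for_width)


word_like = passes_box_filter(40, 10)   # wide and low: kept
speck = passes_box_filter(10, 5)        # narrower than 15px: dropped
hairline = passes_box_filter(40, 1)     # shorter than 2px: dropped
squarish = passes_box_filter(20, 18)    # 20 < 1.5 * 18: dropped
```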
def make_tight_mask(contour, xmin, ymin, width, height):
It then runs the make_tight_mask function (whose signature is given above) and checks that the maximum of the column-wise (axis=0) totals does not exceed the pre-set TEXT_MAX_THICKNESS (default: 10px) before accepting the contour
- In other words, if any column in a detected piece of text has more than 10 filled pixels, the entire block will be discarded as "too thick"
- You might imagine something like a shaded rectangle or ellipse in a diagram matching these criteria. Note that there are no other checks in place to prevent overly large objects being detected as 'text', so the 'thickness' check is a way of preventing large and 'blocky' or 'chunky' marks from being registered as text. It probably wouldn't permit text drawn with a thick marker pen for example.
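The thickness test is just a column sum over a 0/1 mask. Here is a minimal sketch with two made-up masks; is_too_thick is a hypothetical helper and the threshold copies the default quoted above.

```python
import numpy as np

TEXT_MAX_THICKNESS = 10  # default quoted in the text


def is_too_thick(tight_mask):
    """Hypothetical helper: True if any column has more filled pixels than allowed."""
    column_totals = tight_mask.sum(axis=0)  # one total per column
    return bool(column_totals.max() > TEXT_MAX_THICKNESS)


line_of_text = np.ones((8, 40), dtype=np.uint8)    # every column holds 8 filled pixels
shaded_block = np.ones((12, 40), dtype=np.uint8)   # every column holds 12 filled pixels

line_rejected = is_too_thick(line_of_text)
block_rejected = is_too_thick(shaded_block)
```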
def make_tight_mask(contour, xmin, ymin, width, height):
    tight_mask = np.zeros((height, width), dtype=np.uint8)
    tight_contour = contour - np.array((xmin, ymin)).reshape((-1, 1, 2))
    drawContours(tight_mask, [tight_contour], contourIdx=0, color=1, thickness=-1)
    return tight_mask
- First the mask is initialised with all zeroes, with the same width and height as the text region described by the contour (note: not simply the shape of the contour array)
- The tight_contour is formed by subtracting the bounding rectangle's (xmin, ymin) corner from every contour point: the corner is reshaped to shape (1, 1, 2) so that it broadcasts against the contour array's shape of ({number_of_contour_points}, 1, 2)
  - I would describe this as having an effect of making the coordinates of the contour relative to that corner of its bounding rectangle (the top-left corner in image coordinates, where y increases downwards)
- The contour is drawn by connecting the points on the mask with cv2.drawContours (passing a list of a single contour at a time)
  - Here the fill colour is 1 (so that the column total is a count of filled pixels)
  - Again, the thickness of -1 means "filled" rather than outline
  - The contourIdx argument "indicates a contour to draw": so the 0 indicates the first item in the singleton list (the only item)
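The coordinate shift relies on NumPy broadcasting and can be checked without OpenCV. The contour below is a made-up rectangle in the (N, 1, 2) layout that OpenCV contours use.

```python
import numpy as np

# Four (x, y) points in OpenCV's contour layout: shape (4, 1, 2)
contour = np.array([[[12, 30]], [[20, 30]], [[20, 34]], [[12, 34]]])
xmin, ymin = 12, 30

# Shape (1, 1, 2): broadcasts across all four points
offset = np.array((xmin, ymin)).reshape((-1, 1, 2))

# Coordinates are now relative to the (xmin, ymin) corner
tight_contour = contour - offset
```

After the subtraction the smallest x and y are both 0, so the contour fits inside a mask of shape (height, width).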
...and that's the end of the sequence of events that happened when Mask.contours() was called within the contour_info method during initialisation of the WarpedImage class, to populate its contour_list attribute:
self.contour_list = self.contour_info(text=True)
- Recall that this call began in image.py, the mask was made in mask.py using the contour function from contours.py. Now step back to image.py to proceed.
- As mentioned above, this gets re-run with text=False to do table borders but we'll omit that as it's very similar to this part.
Next in the WarpedImage initialisation comes iteratively_assemble_spans, whose docstring says:
First try to assemble spans from contours, if too few spans then make spans by line detection (borders of a table box) rather than text detection.
This is referred to as "connected component analysis" (i.e. going from pixels to symbols, by grouping or 'labeling' them according to some connectivity requirement, either 4- or 8-connected).
Here, we go from the pixel lines (contours) to symbols called 'spans'. The default variables in the config for this section are SPAN_MIN_WIDTH of 30px and SPAN_PX_PER_STEP of 20px ("reduced spacing for sampling along spans").
Again we step into a function: assemble_spans, from spans.py
spans = assemble_spans(self.stem, self.small, self.pagemask, self.contour_list)
def assemble_spans(name, small, pagemask, cinfo_list):
    cinfo_list = sorted(cinfo_list, key=lambda cinfo: cinfo.rect[1])
    candidate_edges = []
    for i, cinfo_i in enumerate(cinfo_list):
        for j in range(i):
            # note e is of the form (score, left_cinfo, right_cinfo)
            edge = generate_candidate_edge(cinfo_i, cinfo_list[j])
            if edge is not None:
                candidate_edges.append(edge)
- First the contours are sorted by the 2nd element of the rect attribute (the y value), so contours are ordered from top-most first to bottom-most last (in image coordinates, y increases downwards)
  - Note that they're not sorted by x value, just y value
  - Recall: the rect attribute was the boundingRect of the contour, whose elements are x,y,w,h
- The y-sorted contour list is iterated through (i.e. iterating down the page) and generate_candidate_edge is called on all possible pairs of that contour and every previous one in the list (i.e. every one whose bounding rectangle's top edge is at or above the current contour's)
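The pairing pattern (sort by y, then pair each item with every earlier one) can be seen in isolation with plain Python. The rectangles and their names are made up for illustration.

```python
# Hypothetical bounding rectangles as (x, y, w, h)
rects = {
    "a": (50, 10, 30, 8),
    "b": (5, 40, 30, 8),
    "c": (90, 25, 30, 8),
}

# Sort names by the y value only (2nd element); x is ignored
ordered = sorted(rects, key=lambda name: rects[name][1])

# Pair each item with every item that precedes it in the sorted order
pairs = [
    (ordered[i], ordered[j])
    for i in range(len(ordered))
    for j in range(i)
]
```

Every unordered pair appears exactly once, so n contours give n(n-1)/2 candidate pairs, which is why this stage is quadratic in the number of contours.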
Before we look at the rest of the assemble_spans function, let's look at what generate_candidate_edge does (it's a little complicated, pay close attention). It comes from the same module, spans.py
def generate_candidate_edge(cinfo_a, cinfo_b):
    """
    We want a left of b (so a's successor will be b and b's
    predecessor will be a). Make sure right endpoint of b is to the
    right of left endpoint of a (swap them if not the case).
    """
    if cinfo_a.point0[0] > cinfo_b.point1[0]:
        tmp = cinfo_a
        cinfo_a = cinfo_b
        cinfo_b = tmp
    x_overlap_a = cinfo_a.local_overlap(cinfo_b)
    x_overlap_b = cinfo_b.local_overlap(cinfo_a)
    overall_tangent = cinfo_b.center - cinfo_a.center
    overall_angle = np.arctan2(overall_tangent[1], overall_tangent[0])
    # largest angular deviation from the joining line, in degrees
    delta_angle = np.divide(
        max(
            angle_dist(cinfo_a.angle, overall_angle),
            angle_dist(cinfo_b.angle, overall_angle),
        )
        * 180,
        np.pi,
    )
    # we want the largest overlap in x to be small
    x_overlap = max(x_overlap_a, x_overlap_b)
    dist = np.linalg.norm(cinfo_b.point0 - cinfo_a.point1)
    if not (
        dist > cfg.edge_opts.EDGE_MAX_LENGTH
        or x_overlap > cfg.edge_opts.EDGE_MAX_OVERLAP
        or delta_angle > cfg.edge_opts.EDGE_MAX_ANGLE
    ):
        score = dist + delta_angle * cfg.edge_opts.EDGE_ANGLE_COST
        return (score, cinfo_a, cinfo_b)
    # else return None
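The scoring rule can be tried out on plain numbers. The sketch below is an assumption-laden simplification: angle_dist is not shown in this walkthrough, so the version here is a guess at its intent (the absolute angular difference, wrapped into [0, pi]); the three constants are illustrative placeholders rather than the project's configured values; and the x-overlap test is left out for brevity.

```python
import math

# Illustrative placeholder values, not taken from the project's config
EDGE_MAX_LENGTH = 100.0
EDGE_MAX_ANGLE = 7.5   # degrees
EDGE_ANGLE_COST = 10.0


def angle_dist(angle_b, angle_a):
    """Assumed behaviour: absolute difference between two angles, wrapped to [0, pi]."""
    diff = (angle_b - angle_a + math.pi) % (2 * math.pi) - math.pi
    return abs(diff)


def edge_score(dist, angle_a, angle_b, overall_angle):
    """Hypothetical helper: score a candidate edge, or None if it is rejected."""
    delta_angle = math.degrees(
        max(angle_dist(angle_a, overall_angle), angle_dist(angle_b, overall_angle))
    )
    if dist > EDGE_MAX_LENGTH or delta_angle > EDGE_MAX_ANGLE:
        return None
    return dist + delta_angle * EDGE_ANGLE_COST


aligned = edge_score(20.0, 0.0, 0.0, 0.0)                  # collinear and close
skewed = edge_score(20.0, 0.0, 0.0, math.radians(30))      # joining line tilted 30 degrees
far_apart = edge_score(150.0, 0.0, 0.0, 0.0)               # gap too long
wrapped = angle_dist(math.pi - 0.01, -math.pi + 0.01)      # angles either side of +/- pi
```

Lower scores are better: a short gap between two contours that both point along the line joining them scores lowest.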
The attributes generate_candidate_edge uses (point0, point1 [the leftmost and rightmost point in the contour], center, and angle) were set in the initialisation of the ContourInfo class in contours.py
def __init__(self, contour, rect, mask):
    self.contour = contour
    self.rect = rect
    self.mask = mask
    self.center, self.tangent = blob_mean_and_tangent(contour)
    self.angle = np.arctan2(self.tangent[1], self.tangent[0])
    clx = [self.proj_x(point) for point in contour]
    lxmin, lxmax = min(clx), max(clx)
    self.local_xrng = (lxmin, lxmax)
    self.point0 = self.center + self.tangent * lxmin
    self.point1 = self.center + self.tangent * lxmax
    self.pred = None
    self.succ = None
where the center and tangent attributes were set by this function:
def blob_mean_and_tangent(contour):
    moments = cv2_moments(contour)
    area = moments["m00"]
    mean_x = moments["m10"] / area
    mean_y = moments["m01"] / area
    # 2x2 matrix of second-order central moments, normalised by area
    moments_matrix = np.divide(
        [[moments["mu20"], moments["mu11"]], [moments["mu11"], moments["mu02"]]], area
    )
    _, svd_u, _ = SVDecomp(moments_matrix)
    center = np.array([mean_x, mean_y])
    tangent = svd_u[:, 0].flatten().copy()
    return center, tangent
- The "moments" here are the image moments. I couldn't find a clearly written exposition of image moments so I wrote one: see Background on image moments
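The idea of a mean plus a dominant direction can be reproduced with NumPy alone. Note the swap of technique: the sketch below uses the covariance of a set of points and np.linalg.eigh, whereas the function above uses OpenCV's area moments of the contour and an SVD. The helper name mean_and_tangent is hypothetical.

```python
import numpy as np


def mean_and_tangent(points):
    """Hypothetical helper: centroid and principal direction of an (N, 2) point set."""
    center = points.mean(axis=0)
    centered = points - center
    covariance = centered.T @ centered / len(points)  # 2x2, symmetric
    eigenvalues, eigenvectors = np.linalg.eigh(covariance)
    # The eigenvector with the largest eigenvalue points along the blob
    tangent = eigenvectors[:, np.argmax(eigenvalues)]
    return center, tangent


# Points lying on the line y = 0.5 * x
points = np.array([[x, 0.5 * x] for x in range(10)], dtype=float)
center, tangent = mean_and_tangent(points)
slope = tangent[1] / tangent[0]
```

The sign of the tangent is arbitrary (a direction and its opposite describe the same axis), so the slope is a safer thing to inspect than the vector itself.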
The local_overlap method being used to calculate x axis overlap was also defined on the ContourInfo class
def local_overlap(self, other):
    xmin = self.proj_x(other.point0)
    xmax = self.proj_x(other.point1)
    return interval_measure_overlap(self.local_xrng, (xmin, xmax))
where the local_xrng attribute is set in the ContourInfo initialisation as:
clx = [self.proj_x(point) for point in contour]
lxmin, lxmax = min(clx), max(clx)
self.local_xrng = (lxmin, lxmax)
...using proj_x which takes the dot product np.dot(self.tangent, point.flatten() - self.center)
The interval_measure_overlap function that local_overlap wraps simply returns:
min(int_a[1], int_b[1]) - max(int_a[0], int_b[0])
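Both pieces are small enough to try on their own. In this sketch proj_x is written as a free function taking the center and tangent explicitly (in the source it is a method on ContourInfo), and the input values are made up.

```python
import numpy as np


def proj_x(center, tangent, point):
    """Signed distance of `point` from `center`, measured along `tangent`."""
    return float(np.dot(tangent, np.asarray(point, dtype=float).flatten() - center))


def interval_measure_overlap(int_a, int_b):
    """Length of the overlap of two intervals; negative means there is a gap."""
    return min(int_a[1], int_b[1]) - max(int_a[0], int_b[0])


center = np.array([10.0, 5.0])
tangent = np.array([1.0, 0.0])  # a perfectly horizontal contour

along = proj_x(center, tangent, [[14, 9]])                 # only the x offset counts
overlapping = interval_measure_overlap((-5, 5), (3, 12))   # intervals share [3, 5]
separated = interval_measure_overlap((-5, 5), (8, 12))     # a gap from 5 to 8
```

A negative result is useful rather than an error: it tells the caller how far apart the two contours are along the local x axis.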
    candidate_edges.sort()  # sort candidate edges by score (lower is better)
    for _, cinfo_a, cinfo_b in candidate_edges:  # for each candidate edge
        # if left and right are unassigned, join them
        if cinfo_a.succ is None and cinfo_b.pred is None:
            cinfo_a.succ = cinfo_b
            cinfo_b.pred = cinfo_a
    spans = []
    while cinfo_list:
        cinfo = cinfo_list[0]  # get the first on the list
        # keep following predecessors until none exists
        while cinfo.pred:
            cinfo = cinfo.pred
        cur_span = []  # start a new span
        width = 0.0
        while cinfo:  # follow successors til end of span
            # remove from list (sadly making this loop *also* O(n^2))
            cinfo_list.remove(cinfo)
            cur_span.append(cinfo)  # add to span
            width += cinfo.local_xrng[1] - cinfo.local_xrng[0]
            cinfo = cinfo.succ  # set successor
        if width > cfg.span_opts.SPAN_MIN_WIDTH:
            spans.append(cur_span)  # add if long enough
    if cfg.debug_lvl_opt.DEBUG_LEVEL >= 2:
        visualize_spans(name, small, pagemask, spans)
    return spans
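The greedy linking and chain-following can be exercised without any image data. The sketch below uses a hypothetical Node class in place of ContourInfo, made-up scores and widths, and an explicit sort key so that tied scores never fall back to comparing the objects themselves.

```python
class Node:
    """Hypothetical stand-in for ContourInfo: a width plus pred/succ links."""

    def __init__(self, name, width):
        self.name = name
        self.width = width
        self.pred = None
        self.succ = None


def link_and_chain(nodes, candidate_edges, min_width):
    # Greedy linking: best (lowest) score first, each node gets at most
    # one successor and one predecessor
    for _, left, right in sorted(candidate_edges, key=lambda edge: edge[0]):
        if left.succ is None and right.pred is None:
            left.succ = right
            right.pred = left

    remaining = list(nodes)
    spans = []
    while remaining:
        node = remaining[0]
        while node.pred:  # rewind to the start of the chain
            node = node.pred
        span, width = [], 0.0
        while node:  # walk forward to the end of the chain
            remaining.remove(node)
            span.append(node.name)
            width += node.width
            node = node.succ
        if width > min_width:  # keep only chains that are long enough
            spans.append(span)
    return spans


a, b, c, d = Node("a", 20), Node("b", 20), Node("c", 20), Node("d", 5)
edges = [(5.0, a, b), (7.0, b, c), (6.0, a, c)]
spans = link_and_chain([c, a, d, b], edges, min_width=30)
```

The edge from a to c is skipped because a already has a successor by the time it is considered, and the lone node d is dropped for being narrower than the minimum width.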