Detect text contours
Text contours are detected when the initialisation method of `WarpedImage` in `image.py` calls `self.contour_list = self.contour_info(text=True)`.
def contour_info(self, text=True):
    c_type = "text" if text else "line"
    mask = Mask(self.stem, self.small, self.pagemask, c_type)
    return mask.contours()
The `contour_info` method is just a gate to detect either text contours (i.e. to detect lines of text) or line contours (table borders etc.). It makes a `Mask` and then calls the `contours` method on that `Mask`.
- The `Mask` class comes from the `mask.py` module, and uses the name (`stem`), shrunk image (`small`), rectangular black and white mask (`pagemask`) and contour type (here `"text"`).
The component steps are:
- The shrunk image is converted to a grayscale copy, `sgray`, by mixing the RGB channels into a single channel (collapsing the 3 colour channels in the final dimension of the image into 1)
- The grayscale image is binarised: the input is 8-bit and the output is also 8-bit, but it only contains the values 0 or 255
  - The threshold type is either binary (black stays black, white stays white) or inverse binary (white becomes black and vice versa, so black text pixels with a low grayscale value near 0 become high values, i.e. 255, when binarised). We use the inverse binary so that text becomes high valued and logically means "True" or "on" (as masks are used for logic operations).
sgray = cvtColor(self.small, COLOR_RGB2GRAY)
mask = adaptiveThreshold(
    src=sgray,
    maxValue=255,
    adaptiveMethod=ADAPTIVE_THRESH_MEAN_C,
    thresholdType=THRESH_BINARY_INV,
    blockSize=cfg.mask_opts.ADAPTIVE_WINSZ,
    C=25 if self.text else 7,
)
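To make the inverse binary idea concrete, here is a rough NumPy-only sketch of a mean-based adaptive threshold. It is not OpenCV's implementation (rounding and border handling may differ), and the function name, the tiny test image and the window size of 3 are all made up for illustration.

```python
import numpy as np


def adaptive_mean_threshold_inv(gray, block_size, c, max_value=255):
    """Naive inverse-binary adaptive mean threshold (illustrative sketch only)."""
    pad = block_size // 2  # block_size is assumed to be odd
    # Assumption: pad by repeating the edge pixels so every window is full size
    padded = np.pad(gray.astype(np.float64), pad, mode="edge")
    out = np.zeros_like(gray, dtype=np.uint8)
    height, width = gray.shape
    for y in range(height):
        for x in range(width):
            window = padded[y : y + block_size, x : x + block_size]
            threshold = window.mean() - c
            # Inverse binary: pixels darker than the local threshold switch 'on'
            if gray[y, x] <= threshold:
                out[y, x] = max_value
    return out


# A light page (value 200) with a dark 'text' stroke (value 20) in the middle row
page = np.full((5, 7), 200, dtype=np.uint8)
page[2, 1:6] = 20
mask = adaptive_mean_threshold_inv(page, block_size=3, c=25)
```

The dark stroke ends up as 255 in `mask` and the light background as 0, which is the "text is on" convention described above.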
The mask is then dilated and eroded (these two steps are applied in the reverse order for the table borders):
mask = dilate(mask, box(9, 1)) if self.text else erode(mask, box(3, 1), iterations=3)
mask = erode(mask, box(1, 3)) if self.text else dilate(mask, box(8, 2))
The `pagemask` is then 'applied' to the dilated/eroded mask by taking the element-wise minimum of the two, i.e. wherever the `pagemask` is zero ('off') the result will be zero even if a text contour was detected there, so those pixels will be 'switched off' or ignored in the mask.

Before the connected component analysis happens, a filtering step discards contours "which are too tall (compared to their width) or too thick to be text".
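As a small illustration of how the page mask is applied by taking a minimum, here is a NumPy sketch. Both arrays are made-up single-row examples, not values from the program.

```python
import numpy as np

# Hypothetical mask after dilation/erosion: 255 wherever 'text' was detected
mask = np.array([[255, 255, 0, 255, 0, 255]], dtype=np.uint8)
# Hypothetical page mask: 255 inside the region to keep, 0 elsewhere
pagemask = np.array([[0, 255, 255, 255, 255, 0]], dtype=np.uint8)

# Element-wise minimum: a pixel stays 'on' only if it is 'on' in both masks
masked = np.minimum(mask, pagemask)
```

Because both arrays only contain 0 or 255, the minimum behaves like a logical AND: the detections in the first and last columns are switched off.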
Back in `image.py`, the `WarpedImage` class immediately calls the `contours` method of the `Mask`, which wraps a call to `get_contours` from `contours.py`.
def get_contours(name, small, mask):
    contours, _ = findContours(mask, RETR_EXTERNAL, CHAIN_APPROX_NONE)
    contours_out = []
    for contour in contours:
        rect = boundingRect(contour)
        xmin, ymin, width, height = rect
        if (
            width < cfg.contour_opts.TEXT_MIN_WIDTH
            or height < cfg.contour_opts.TEXT_MIN_HEIGHT
            or width < cfg.contour_opts.TEXT_MIN_ASPECT * height
        ):
            continue
        tight_mask = make_tight_mask(contour, xmin, ymin, width, height)
        if tight_mask.sum(axis=0).max() > cfg.contour_opts.TEXT_MAX_THICKNESS:
            continue
        contours_out.append(ContourInfo(contour, rect, tight_mask))
    if cfg.debug_lvl_opt.DEBUG_LEVEL >= 2:
        visualize_contours(name, small, contours_out)
    return contours_out
This procedure skips a contour if any of the following conditions are met:
- the width of the bounding box of the [text] contour (i.e. the outline of some text) is below `TEXT_MIN_WIDTH` (default: 15px)
- ...its height is below `TEXT_MIN_HEIGHT` (default: 2px)
- ...its aspect ratio (width divided by height) is below `TEXT_MIN_ASPECT` (default: 1.5 i.e. width:height 3:2), i.e. it should be significantly wider than it is tall
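The three size checks can be tried out in isolation with a small stand-alone function. This is only a sketch: the constants are the defaults quoted above, and the function name is my own rather than anything in the package.

```python
# Defaults quoted in the text above
TEXT_MIN_WIDTH = 15
TEXT_MIN_HEIGHT = 2
TEXT_MIN_ASPECT = 1.5


def fails_size_checks(width, height):
    """Return True if a bounding box of this size would be skipped (hypothetical helper)."""
    return (
        width < TEXT_MIN_WIDTH
        or height < TEXT_MIN_HEIGHT
        or width < TEXT_MIN_ASPECT * height
    )


wide_word = fails_size_checks(40, 10)  # wide enough, tall enough, aspect 4.0
narrow_mark = fails_size_checks(10, 5)  # narrower than 15px
flat_speck = fails_size_checks(40, 1)  # shorter than 2px
square_blob = fails_size_checks(20, 20)  # aspect 1.0 is below 1.5
```

Only the first box survives; note that a box at exactly the 3:2 ratio (e.g. 30 by 20) is kept, since the comparison is a strict "less than".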
def make_tight_mask(contour, xmin, ymin, width, height):
It then runs the `make_tight_mask` function (whose signature is given above) and checks that the maximum of the column-wise (`axis=0`) totals does not exceed the pre-set `TEXT_MAX_THICKNESS` (default: 10px) before accepting the contour
- In other words, if any column in a detected piece of text has more than 10 filled pixels, the entire contour will be discarded as "too thick"
- You might imagine something like a shaded rectangle or ellipse in a diagram being caught by this check. Note that there are no other checks in place to prevent overly large objects being detected as 'text', so the 'thickness' check is a way of preventing large and 'blocky' or 'chunky' marks from being registered as text. It probably wouldn't permit text drawn with a thick marker pen for example.
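Here is a minimal sketch of the thickness check on two made-up filled masks. The helper name is hypothetical, and the masks are plain rectangles of ones rather than real drawn contours.

```python
import numpy as np

TEXT_MAX_THICKNESS = 10  # default quoted in the text above


def is_too_thick(tight_mask, max_thickness=TEXT_MAX_THICKNESS):
    """True if any column of the 0/1 mask has more filled pixels than allowed."""
    column_totals = tight_mask.sum(axis=0)  # one total per column
    return int(column_totals.max()) > max_thickness


# Hypothetical filled masks: a line of text 8px tall and a blob 12px tall
text_line = np.ones((8, 40), dtype=np.uint8)
chunky_blob = np.ones((12, 40), dtype=np.uint8)

text_line_rejected = is_too_thick(text_line)
chunky_blob_rejected = is_too_thick(chunky_blob)
```

The 8px line is kept and the 12px blob is rejected; a column of exactly 10 filled pixels would still pass.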
tight_mask = np.zeros((height, width), dtype=np.uint8)
tight_contour = contour - np.array((xmin, ymin)).reshape((-1, 1, 2))
drawContours(tight_mask, [tight_contour], contourIdx=0, color=1, thickness=-1)
return tight_mask
- First the mask is initialised with all zeroes, with the same width and height as the bounding box of the text region described by the contour (note: not simply the shape of the contour array)
- The `tight_contour` is formed by subtracting the top-left corner of the contour's bounding box, `(xmin, ymin)`, from every point of the contour (the corner is reshaped to shape `1,1,2` so that it broadcasts against the contour array's shape `{number_of_contour_points},1,2`)
  - I would describe this as having the effect of making the coordinates of the contour relative to the top-left corner of its bounding box (image y coordinates increase downwards, so `ymin` is the top edge)
- The contour is drawn by connecting the points on the mask (similar to the `cv2.rectangle` earlier), with `cv2.drawContours` (but passing a list of a single contour at a time)
  - Here the fill colour is 1 (so that the column total is a count of filled pixels)
  - Again, the thickness of `-1` means "filled" rather than outline
  - The `contourIdx` argument "indicates a contour to draw": so the 0 indicates the first item in the singleton list (the only item)
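The coordinate shift is easy to check with NumPy alone. The contour below is a made-up rectangle of four points in the `{number_of_contour_points},1,2` layout that OpenCV uses.

```python
import numpy as np

# Hypothetical contour: 4 points, each stored as [[x, y]], so the shape is (4, 1, 2)
contour = np.array([[[12, 30]], [[20, 30]], [[20, 34]], [[12, 34]]])
xmin, ymin = 12, 30  # top-left corner of its bounding box

# Reshape the corner to (1, 1, 2) so it broadcasts across all contour points
offset = np.array((xmin, ymin)).reshape((-1, 1, 2))
tight_contour = contour - offset
```

After the subtraction the smallest x and y in `tight_contour` are both 0, so the points can be drawn straight onto a mask that is only as big as the bounding box.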
...and that's the end of the sequence of events that happened when `Mask.contours()` was called within the `contour_info` method during initialisation of the `WarpedImage` class, to populate its `contour_list` attribute:
self.contour_list = self.contour_info(text=True)
- Recall that this call began in `image.py`, the mask was made in `mask.py` using the contour function from `contours.py`. Now step back to `image.py` to proceed.
- As mentioned above, this gets re-run with `text=False` to do table borders but we'll omit that as it's very similar to this part.
Next in the `WarpedImage` initialisation comes `iteratively_assemble_spans`, whose docstring says:
First try to assemble spans from contours, if too few spans then make spans by line detection (borders of a table box) rather than text detection.
This is referred to as "connected component analysis" (i.e. going from pixels to symbols, by grouping or 'labeling' them according to some connectivity requirement, either 4- or 8-connected).
Here, we go from the pixel lines (contours) to symbols called 'spans'. The default variables in the config for this section are `SPAN_MIN_WIDTH` of 30px and `SPAN_PX_PER_STEP` of 20px ("reduced spacing for sampling along spans").
Again we step into a function: `assemble_spans`, from `spans.py`
spans = assemble_spans(self.stem, self.small, self.pagemask, self.contour_list)
⇣
def assemble_spans(name, small, pagemask, cinfo_list):
    cinfo_list = sorted(cinfo_list, key=lambda cinfo: cinfo.rect[1])
    candidate_edges = []
    for i, cinfo_i in enumerate(cinfo_list):
        for j in range(i):
            # note e is of the form (score, left_cinfo, right_cinfo)
            edge = generate_candidate_edge(cinfo_i, cinfo_list[j])
            if edge is not None:
                candidate_edges.append(edge)
- First the contours are sorted by the 2nd element of the `rect` (its `y` value). Image y coordinates increase downwards, so the contours are ordered from top-most first to bottom-most last
  - Note that they're not sorted by x value, just y value
  - Recall: the `rect` attribute was the `boundingRect` of the contour, whose elements are `x,y,w,h`
- The y-sorted contour list is iterated through (i.e. iterating down the page) and `generate_candidate_edge` is called on every pair formed by that contour and each previous one in the list (i.e. each one whose bounding rectangle's top edge is at or above the current contour's top edge)
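The sort-then-pair pattern can be seen on its own with a few made-up bounding rectangles. `Cinfo` here is a hypothetical stand-in for `ContourInfo` that only carries a label and a `rect`.

```python
from collections import namedtuple

# Hypothetical stand-in for ContourInfo: just a label and an (x, y, w, h) rect
Cinfo = namedtuple("Cinfo", ["name", "rect"])

cinfo_list = [
    Cinfo("c", (5, 40, 30, 8)),
    Cinfo("a", (50, 10, 30, 8)),
    Cinfo("b", (10, 12, 30, 8)),
]

# Sort by y only (the 2nd element of the rect), as assemble_spans does
cinfo_list = sorted(cinfo_list, key=lambda cinfo: cinfo.rect[1])

# Pair each contour with every earlier one in the sorted list
pairs = []
for i, cinfo_i in enumerate(cinfo_list):
    for j in range(i):
        pairs.append((cinfo_i.name, cinfo_list[j].name))
```

With three contours this visits three pairs, `("b", "a")`, `("c", "a")` and `("c", "b")`. In general n contours give n(n-1)/2 pairs, each considered exactly once.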
Before we look at the rest of the `assemble_spans` function, let's look at what `generate_candidate_edge` does (it's a little complicated, pay close attention). It comes from the same module, `spans.py`
def generate_candidate_edge(cinfo_a, cinfo_b):
    """
    We want a left of b (so a's successor will be b and b's
    predecessor will be a). Make sure right endpoint of b is to the
    right of left endpoint of a (swap them if not the case).
    """
    if cinfo_a.point0[0] > cinfo_b.point1[0]:
        tmp = cinfo_a
        cinfo_a = cinfo_b
        cinfo_b = tmp
    x_overlap_a = cinfo_a.local_overlap(cinfo_b)
    x_overlap_b = cinfo_b.local_overlap(cinfo_a)
    overall_tangent = cinfo_b.center - cinfo_a.center
    overall_angle = np.arctan2(overall_tangent[1], overall_tangent[0])
    delta_angle = np.divide(
        max(
            angle_dist(cinfo_a.angle, overall_angle),
            angle_dist(cinfo_b.angle, overall_angle),
        )
        * 180,
        np.pi,
    )
    # we want the largest overlap in x to be small
    x_overlap = max(x_overlap_a, x_overlap_b)
    dist = np.linalg.norm(cinfo_b.point0 - cinfo_a.point1)
    if not (
        dist > cfg.edge_opts.EDGE_MAX_LENGTH
        or x_overlap > cfg.edge_opts.EDGE_MAX_OVERLAP
        or delta_angle > cfg.edge_opts.EDGE_MAX_ANGLE
    ):
        score = dist + delta_angle * cfg.edge_opts.EDGE_ANGLE_COST
        return (score, cinfo_a, cinfo_b)
    # else return None
- The process of generating candidate edges is covered in more detail in the next section in the context of span assembly from the candidates
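To get a feel for the score, here is a simplified sketch of the distance and angle parts of the calculation. It leaves out the x overlap check, uses my own stand-in for `angle_dist`, and the three threshold values are assumptions picked for illustration (the real ones come from `cfg.edge_opts`).

```python
import numpy as np

# Assumed values for illustration only; the real ones live in cfg.edge_opts
EDGE_MAX_LENGTH = 100.0
EDGE_MAX_ANGLE = 7.5  # degrees
EDGE_ANGLE_COST = 10.0


def angle_dist(angle_b, angle_a):
    """Absolute difference between two angles, wrapped into [0, pi] (stand-in)."""
    diff = (angle_b - angle_a + np.pi) % (2 * np.pi) - np.pi
    return abs(diff)


def edge_score(a_point1, a_center, a_angle, b_point0, b_center, b_angle):
    """Score a left-to-right link from contour a to contour b, or return None."""
    overall_tangent = b_center - a_center
    overall_angle = np.arctan2(overall_tangent[1], overall_tangent[0])
    delta_angle = (
        max(angle_dist(a_angle, overall_angle), angle_dist(b_angle, overall_angle))
        * 180
        / np.pi
    )
    dist = np.linalg.norm(b_point0 - a_point1)
    if dist > EDGE_MAX_LENGTH or delta_angle > EDGE_MAX_ANGLE:
        return None
    return dist + delta_angle * EDGE_ANGLE_COST


a_center, a_point1 = np.array([10.0, 50.0]), np.array([20.0, 50.0])
# b sits on the same horizontal line as a, with a 10px gap between them
score_same_line = edge_score(
    a_point1, a_center, 0.0, np.array([30.0, 50.0]), np.array([40.0, 50.0]), 0.0
)
# b sits diagonally below a, so the joining line is 45 degrees off both contours
score_diagonal = edge_score(
    a_point1, a_center, 0.0, np.array([30.0, 80.0]), np.array([40.0, 80.0]), 0.0
)
```

Two horizontal contours on the same line score just their gap (10.0 here), while the diagonal pair is rejected outright because the angle exceeds the assumed maximum. Lower scores mean better candidate links.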
The attributes it's using (`point0`, `point1` [the leftmost and rightmost point in the contour], `center`, and `angle`) were set in the initialisation of the `ContourInfo` class in `contours.py`:
def __init__(self, contour, rect, mask):
    self.contour = contour
    self.rect = rect
    self.mask = mask
    self.center, self.tangent = blob_mean_and_tangent(contour)
    self.angle = np.arctan2(self.tangent[1], self.tangent[0])
    clx = [self.proj_x(point) for point in contour]
    lxmin, lxmax = min(clx), max(clx)
    self.local_xrng = (lxmin, lxmax)
    self.point0 = self.center + self.tangent * lxmin
    self.point1 = self.center + self.tangent * lxmax
    self.pred = None
    self.succ = None
where the `center` and `tangent` attributes were set by this function:
def blob_mean_and_tangent(contour):
    """
    Construct blob image's covariance matrix from second order central moments
    (i.e. dividing them by the 0-order 'area moment' to make them translationally
    invariant), from the eigenvectors of which the blob orientation can be
    extracted (they are its principal components).
    """
    moments = cv2_moments(contour)
    area = moments["m00"]
    mean_x = moments["m10"] / area
    mean_y = moments["m01"] / area
    covariance_matrix = np.divide(
        [[moments["mu20"], moments["mu11"]], [moments["mu11"], moments["mu02"]]], area
    )
    _, svd_u, _ = SVDecomp(covariance_matrix)
    center = np.array([mean_x, mean_y])
    tangent = svd_u[:, 0].flatten().copy()
    return center, tangent
- The "moments" here are image moments. I couldn't find a clearly written exposition of image moments so I wrote one: see Background on image moments
- Computing the SVD of the covariance matrix (which you should note is a 2x2 matrix) gives its 2 eigenvectors: these are the principal components, which give the orientation, and the first of them is the major axis (`svd_u[:, 0]`)
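The same idea can be tried with NumPy instead of OpenCV. This sketch builds the covariance matrix from a made-up grid of points rather than from image moments, and note that `np.linalg.svd` returns `(u, s, vh)`, a different order from the `SVDecomp` call above.

```python
import numpy as np

# Hypothetical blob: a 40 wide by 4 tall block of points, so it is elongated in x
xs, ys = np.meshgrid(np.arange(40), np.arange(4))
points = np.column_stack([xs.ravel(), ys.ravel()]).astype(float)

center = points.mean(axis=0)
covariance_matrix = np.cov(points, rowvar=False)  # 2x2, like the moments version

svd_u, singular_values, _ = np.linalg.svd(covariance_matrix)
tangent = svd_u[:, 0]  # first column: direction of greatest spread (major axis)
angle = np.arctan2(tangent[1], tangent[0])
```

The tangent comes out along the x axis, as expected for a wide, flat blob. Its sign is arbitrary (it may point left or right), which is worth keeping in mind when reading the angle.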
The `local_overlap` method being used to calculate x axis overlap was also defined on the `ContourInfo` class:
def local_overlap(self, other):
    xmin = self.proj_x(other.point0)
    xmax = self.proj_x(other.point1)
    return interval_measure_overlap(self.local_xrng, (xmin, xmax))
where the `local_xrng` attribute is set in the `ContourInfo` initialisation as:
clx = [self.proj_x(point) for point in contour]
lxmin, lxmax = min(clx), max(clx)
self.local_xrng = (lxmin, lxmax)
...using `proj_x` which takes the dot product `np.dot(self.tangent, point.flatten() - self.center)` (i.e. between the contour direction, `tangent`, and the relative position vector of the point w.r.t. the blob centre).
The name of this method indicates the assumption that the text we've contoured is running from left to right: the tangent of the blob is in the x direction, and so the leftmost and rightmost points will have the most negative and most positive projected values.
- The left- and right-most points on the contour lie furthest along the tangent direction, so their projections are the largest in magnitude, whereas the intermediate points such as those above and below the centre are displaced almost orthogonally to the tangent, and thus their projected value (dot product with the tangent vector) will fall nearer to zero.
- Long story short, the `local_xrng` indicates the min and max projections, from which the corresponding end points are recreated by stepping from the centre along the tangent by these amounts to regain `self.point0` and `self.point1` (the leftmost and rightmost extents of the contour along its tangent)
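Here is the projection and reprojection worked through on a made-up contour, assuming a perfectly horizontal tangent. `proj_x` is written as a plain function rather than a method to keep the sketch short.

```python
import numpy as np

# Hypothetical blob: centre and a unit tangent pointing along the x axis
center = np.array([50.0, 20.0])
tangent = np.array([1.0, 0.0])
# Made-up contour points in OpenCV's (N, 1, 2) layout
contour = np.array([[[30, 18]], [[70, 19]], [[70, 22]], [[30, 21]], [[50, 25]]])


def proj_x(point):
    """Signed distance of a point from the centre, measured along the tangent."""
    return np.dot(tangent, point.flatten() - center)


clx = [proj_x(point) for point in contour]
lxmin, lxmax = min(clx), max(clx)
local_xrng = (lxmin, lxmax)

# Step from the centre along the tangent to recover the two end points
point0 = center + tangent * lxmin
point1 = center + tangent * lxmax
```

The point directly below the centre, `(50, 25)`, projects to 0, while the left and right edges project to -20 and 20. The recovered `point0` and `point1` are `(30, 20)` and `(70, 20)`: they sit on the line through the centre along the tangent, level with the contour's left and right extremes.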
The `interval_measure_overlap` function which `local_overlap` wraps is simply returning:
min(int_a[1], int_b[1]) - max(int_a[0], int_b[0])
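i.e. the length of the intersection of the two intervals (negative if they don't overlap at all), where the second interval is the blob's own projection of the other blob's leftmost and rightmost points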
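As a quick check of that expression, here it is wrapped in a function with two made-up pairs of intervals. The function body is the line quoted above; only the example values are mine.

```python
def interval_measure_overlap(int_a, int_b):
    """Length of the overlap of two (min, max) intervals; negative if disjoint."""
    return min(int_a[1], int_b[1]) - max(int_a[0], int_b[0])


overlapping = interval_measure_overlap((0, 10), (5, 15))  # share the stretch 5..10
disjoint = interval_measure_overlap((0, 10), (12, 20))  # separated by a gap of 2
```

The first pair gives 5 and the second gives -2, so a negative result measures the size of the gap between two blobs that don't overlap.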
The `visualize_contours` function just reuses the aforementioned SVD/PCA tangent-relative leftmost and rightmost points, joining them by a line (with a circle at the midpoint, `ContourInfo.center`)
for j, cinfo in enumerate(cinfo_list):
    color = cCOLOURS[j % len(cCOLOURS)]
    color = tuple(c // 4 for c in color)
    circle(display, fltp(cinfo.center), 3, (255, 255, 255), 1, LINE_AA)
    line(
        display,
        fltp(cinfo.point0),
        fltp(cinfo.point1),
        (255, 255, 255),
        1,
        LINE_AA,
    )
(This actually comes at the end of the span assembly, which is the next step: see the next part of this series)