Detect text contours

Louis Maddox edited this page Apr 27, 2021 · 5 revisions

Text contours are detected when the WarpedImage initialisation method in image.py sets self.contour_list = self.contour_info(text=True).

def contour_info(self, text=True): 
    c_type = "text" if text else "line" 
    mask = Mask(self.stem, self.small, self.pagemask, c_type) 
    return mask.contours() 

The contour_info method is just a gate that selects between detecting text contours (the outlines of lines of text) and line contours (table borders etc.).

It makes a Mask and then calls the contours method on that Mask.

  • The Mask class comes from the mask.py module. It takes the name (stem), the shrunk image (small), the rectangular black and white page mask (pagemask) and the contour type (here "text").

The component steps are:

a) Adaptive threshold

  • The shrunk image is converted to a grayscale copy, sgray, by mixing the RGB channels into a single channel (so the array goes from shape (height, width, 3) to (height, width))
  • The grayscale image is then binarised: both the input and the output are 8-bit, but the output only contains the values 0 or 255
    • The threshold type is either binary (dark pixels become 0 and light pixels become 255) or inverse binary (the reverse, so dark text pixels with grayscale values near 0 become 255 when binarised). We use the inverse binary so that text becomes high valued and logically means "True" or "on" (as masks are used for logic operations).

sgray = cvtColor(self.small, COLOR_RGB2GRAY)
mask = adaptiveThreshold(
    src=sgray,
    maxValue=255,
    adaptiveMethod=ADAPTIVE_THRESH_MEAN_C,
    thresholdType=THRESH_BINARY_INV,
    blockSize=cfg.mask_opts.ADAPTIVE_WINSZ,
    C=25 if self.text else 7,
)
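To make the inverse binary behaviour concrete, here is a minimal NumPy sketch of mean adaptive thresholding on a tiny synthetic page. It is an illustration, not OpenCV's implementation: the helper name adaptive_mean_inv, the edge padding and the 7x7 test image are all my own assumptions. Each pixel is compared with the mean of its blockSize x blockSize neighbourhood minus C, and with the inverse type a pixel at or below that local threshold is switched 'on'.

```python
import numpy as np


def adaptive_mean_inv(gray, block_size, c, max_value=255):
    """Inverse-binary mean adaptive threshold (illustrative sketch only)."""
    pad = block_size // 2
    # Repeat the border pixels so every pixel has a full neighbourhood.
    padded = np.pad(gray.astype(np.float64), pad, mode="edge")
    out = np.zeros(gray.shape, dtype=np.uint8)
    for y in range(gray.shape[0]):
        for x in range(gray.shape[1]):
            window = padded[y : y + block_size, x : x + block_size]
            # Dark pixels (at or below the local mean minus c) become "on".
            if gray[y, x] <= window.mean() - c:
                out[y, x] = max_value
    return out


# A light page (200) with a short dark stroke (20) standing in for text.
page = np.full((7, 7), 200, dtype=np.uint8)
page[3, 2:5] = 20

mask = adaptive_mean_inv(page, block_size=5, c=25)
```

In this toy case only the three dark stroke pixels end up at 255, which is the "text is on" convention the rest of the pipeline relies on.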

b) Dilation and erosion

(For the table borders the two operations are applied in the reverse order: erosion first, then dilation.)

mask = dilate(mask, box(9, 1)) if self.text else erode(mask, box(3, 1), iterations=3)
mask = erode(mask, box(1, 3)) if self.text else dilate(mask, box(8, 2))

The pagemask is then 'applied' to the dilated/eroded mask by taking the element-wise minimum of the two. Wherever the pagemask is 'off' (0), the minimum is 0 even if a text contour was detected there, so anything outside the pagemask's 'on' region is 'switched off' or ignored in the mask.
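As a rough illustration of these two ideas, the sketch below dilates a toy mask horizontally and then applies a page mask with an element-wise minimum. The helper dilate_rows is a hypothetical NumPy stand-in for OpenCV's dilate (the pipeline itself uses OpenCV), and it assumes box(9, 1) means a kernel 9 pixels wide and 1 pixel tall. The pixel positions are made up for the example.

```python
import numpy as np


def dilate_rows(mask, width):
    """Horizontal dilation: each pixel becomes the max of a width-wide window."""
    half = width // 2
    padded = np.pad(mask, ((0, 0), (half, half)), mode="constant")
    # One shifted copy of the mask per kernel column, then take the max.
    windows = [padded[:, i : i + mask.shape[1]] for i in range(width)]
    return np.max(windows, axis=0)


# Two "letters" on the middle row, separated by a three pixel gap.
mask = np.zeros((3, 14), dtype=np.uint8)
mask[1, [2, 3, 7, 8]] = 255

dilated = dilate_rows(mask, 9)  # the gap between the letters is bridged

# Hypothetical page mask: "on" only for columns 2 to 11.
pagemask = np.zeros((3, 14), dtype=np.uint8)
pagemask[:, 2:12] = 255

result = np.minimum(dilated, pagemask)  # "off" in the pagemask wins
```

The wide, flat kernel is what joins neighbouring letters into one blob per line of text, and the minimum keeps only the part of that blob which falls inside the page mask.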

pre-c) Filtering step to eliminate blobs

"which are too tall (compared to their width) or too thick to be text"

This filtering step happens before the connected component analysis.

Back in image.py, the WarpedImage class immediately calls the contours method of the Mask, which wraps a call to get_contours from contours.py.

def get_contours(name, small, mask):
    contours, _ = findContours(mask, RETR_EXTERNAL, CHAIN_APPROX_NONE)
    contours_out = []
    for contour in contours:
        rect = boundingRect(contour)
        xmin, ymin, width, height = rect
        if (
            width < cfg.contour_opts.TEXT_MIN_WIDTH
            or height < cfg.contour_opts.TEXT_MIN_HEIGHT
            or width < cfg.contour_opts.TEXT_MIN_ASPECT * height
        ):
            continue
        tight_mask = make_tight_mask(contour, xmin, ymin, width, height)
        if tight_mask.sum(axis=0).max() > cfg.contour_opts.TEXT_MAX_THICKNESS:
            continue
        contours_out.append(ContourInfo(contour, rect, tight_mask))
    if cfg.debug_lvl_opt.DEBUG_LEVEL >= 2:
        visualize_contours(name, small, contours_out)
    return contours_out

This procedure discards a contour if any of the following conditions is met:

  • the width of the contour's bounding box (i.e. of the outline of some text) is below TEXT_MIN_WIDTH (default: 15px)
  • ...its height is below TEXT_MIN_HEIGHT (default: 2px)
  • ...its aspect ratio (width divided by height) is below TEXT_MIN_ASPECT (default: 1.5, i.e. width:height 3:2), i.e. it should be significantly wider than it is tall
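The three size checks can be restated as a small standalone predicate. This is only a sketch: passes_size_filter is a hypothetical name of mine, and the constants are the defaults quoted in the list above rather than values read from the config.

```python
# Defaults as quoted above; the real values come from cfg.contour_opts.
TEXT_MIN_WIDTH = 15
TEXT_MIN_HEIGHT = 2
TEXT_MIN_ASPECT = 1.5


def passes_size_filter(width, height):
    """Return True if a bounding box survives the three size checks."""
    too_narrow = width < TEXT_MIN_WIDTH
    too_short = height < TEXT_MIN_HEIGHT
    # Same form as the source: compares width with aspect * height,
    # which avoids dividing by the height.
    too_tall_for_width = width < TEXT_MIN_ASPECT * height
    return not (too_narrow or too_short or too_tall_for_width)


word_like = passes_size_filter(40, 8)     # wide and shallow
speck = passes_size_filter(10, 4)         # narrower than 15px
hairline = passes_size_filter(40, 1)      # shorter than 2px
squarish = passes_size_filter(20, 16)     # 20 < 1.5 * 16
```

Only the first of these four boxes is kept; each of the others trips exactly one of the conditions.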
def make_tight_mask(contour, xmin, ymin, width, height):

It then runs the make_tight_mask function (whose signature is given above) and only accepts the contour if the maximum of the column-wise (axis=0) totals of the resulting mask does not exceed the pre-set TEXT_MAX_THICKNESS (default: 10px).

  • In other words, if any column in a detected piece of text has more than 10 filled pixels, the entire contour will be discarded as "too thick"
    • You might imagine something like a shaded rectangle or ellipse in a diagram being discarded this way. Note that there are no other checks in place to prevent overly large objects being detected as 'text', so the 'thickness' check is the way of preventing large and 'blocky' or 'chunky' marks from being registered as text. It probably wouldn't permit text drawn with a thick marker pen, for example.

tight_mask = np.zeros((height, width), dtype=np.uint8)
tight_contour = contour - np.array((xmin, ymin)).reshape((-1, 1, 2))
drawContours(tight_mask, [tight_contour], contourIdx=0, color=1, thickness=-1)
return tight_mask

  • First the mask is initialised with all zeroes, with the same width and height as the text region described by the contour (note: not simply the shape of the contour array)
  • The tight_contour is formed by subtracting the top-left corner (xmin, ymin) of the contour's bounding box from every point of the contour. The corner is reshaped to shape 1,1,2 so that it broadcasts against the contour's shape of {number_of_contour_points},1,2
    • I would describe this as having an effect of making the coordinates of the contour relative to the top-left corner of its bounding box (in image coordinates the origin is at the top left and y increases downwards)
  • The contour is drawn by connecting the points on the mask (similar to the cv2.rectangle earlier), with cv2.drawContours (but passing a list of a single contour at a time)
    • Here the fill colour is 1 (so that the column total is a count of filled pixels)
    • Again, the thickness of -1 means "filled" rather than outline
    • The contourIdx argument "indicates a contour to draw": so the 0 indicates the first item in the singleton list (the only item)
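Here is a small NumPy sketch of the coordinate shift and the thickness check. To keep it free of OpenCV, the filled mask is built by hand for a rectangular contour instead of being drawn with drawContours; that shortcut, the example coordinates and the variable names thickness and keep are my own assumptions.

```python
import numpy as np

TEXT_MAX_THICKNESS = 10  # default quoted above

# A rectangular contour in page coordinates, in the (N, 1, 2) layout.
contour = np.array([[[12, 30]], [[12, 33]], [[19, 33]], [[19, 30]]])
xmin, ymin, width, height = 12, 30, 8, 4  # the bounding rect of those points

# Shape (1, 1, 2) broadcasts against the contour's shape (4, 1, 2).
offset = np.array((xmin, ymin)).reshape((-1, 1, 2))
tight_contour = contour - offset  # now relative to the rect's top-left corner

# Stand-in for the filled drawing step: this contour is a rectangle,
# so its filled interior is every pixel of the tight mask.
tight_mask = np.ones((height, width), dtype=np.uint8)

# Column totals count filled pixels per column; the max is the "thickness".
thickness = tight_mask.sum(axis=0).max()
keep = thickness <= TEXT_MAX_THICKNESS
```

With a fill colour of 1, each column total is simply the number of filled pixels in that column, so this 4 pixel tall blob is kept, whereas a filled shape with any column taller than 10 pixels would be dropped.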

...and that's the end of the sequence of events triggered when Mask.contours() is called within the contour_info method during initialisation of the WarpedImage class, to populate its contour_list attribute:

self.contour_list = self.contour_info(text=True)

  • Recall that this call began in image.py, and the mask was made in mask.py using the get_contours function from contours.py. Now step back to image.py to proceed.
  • As mentioned above, this gets re-run with text=False to do table borders, but we'll omit that as it's very similar to this part.

c) Connected component analysis

Next in the WarpedImage initialisation comes iteratively_assemble_spans, whose docstring says:

First try to assemble spans from contours, if too few spans then make spans by line detection (borders of a table box) rather than text detection.

This is referred to as "connected component analysis" (i.e. going from pixels to symbols, by grouping or 'labeling' them according to some connectivity requirement, either 4- or 8-connected).

Here, we go from the pixel lines (contours) to symbols called 'spans'. The default variables in the config for this section are SPAN_MIN_WIDTH of 30px and SPAN_PX_PER_STEP of 20px ("reduced spacing for sampling along spans").

Again we step into a function: assemble_spans, from spans.py

spans = assemble_spans(self.stem, self.small, self.pagemask, self.contour_list)

def assemble_spans(name, small, pagemask, cinfo_list):
    cinfo_list = sorted(cinfo_list, key=lambda cinfo: cinfo.rect[1])
    candidate_edges = []
    for i, cinfo_i in enumerate(cinfo_list):
        for j in range(i):
            # note e is of the form (score, left_cinfo, right_cinfo)
            edge = generate_candidate_edge(cinfo_i, cinfo_list[j])
            if edge is not None:
                candidate_edges.append(edge)

  • First the contours are sorted by the 2nd element of the rect (its y value). Image coordinates have y increasing downwards, so contours are ordered from top-most first to bottom-most last
    • Note that they're not sorted by x value, just y value
    • Recall: the rect attribute was the boundingRect of the contour, whose elements are x,y,w,h (with x,y the top-left corner)
  • The y-sorted contour list is iterated through (i.e. iterating down the page) and generate_candidate_edge is called on every pair made of the current contour and a previous one in the list (i.e. one whose bounding rectangle top is at or above the current contour's bounding rectangle top)
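The sort and the pairing can be tried out on plain tuples. This is a sketch with made-up bounding rects; the names rects, ordered and pairs are hypothetical and stand in for the ContourInfo objects.

```python
# Hypothetical bounding rects as (x, y, w, h), with (x, y) the top-left corner.
rects = {
    "a": (50, 40, 30, 6),
    "b": (10, 12, 30, 6),
    "c": (45, 12, 30, 6),
    "d": (10, 41, 30, 6),
}

# Same sort key as assemble_spans: the y value only.
ordered = sorted(rects, key=lambda name: rects[name][1])

# Pair each contour with every earlier one in the sorted list.
pairs = [
    (ordered[i], ordered[j])
    for i in range(len(ordered))
    for j in range(i)
]
```

Every unordered pair is visited exactly once, so n contours produce n(n-1)/2 calls to generate_candidate_edge (6 for the 4 rects here). The y ordering does not decide which contour ends up on the left; generate_candidate_edge sorts that out itself.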

Before we look at the rest of the assemble_spans function, let's look at what generate_candidate_edge does (it's a little complicated, pay close attention). It comes from the same module, spans.py

def generate_candidate_edge(cinfo_a, cinfo_b):
    """
    We want a left of b (so a's successor will be b and b's
    predecessor will be a). Make sure right endpoint of b is to the
    right of left endpoint of a (swap them if not the case).
    """
    if cinfo_a.point0[0] > cinfo_b.point1[0]:
        tmp = cinfo_a
        cinfo_a = cinfo_b
        cinfo_b = tmp
    x_overlap_a = cinfo_a.local_overlap(cinfo_b)
    x_overlap_b = cinfo_b.local_overlap(cinfo_a)
    overall_tangent = cinfo_b.center - cinfo_a.center
    overall_angle = np.arctan2(overall_tangent[1], overall_tangent[0])
    delta_angle = np.divide(
        max(
            angle_dist(cinfo_a.angle, overall_angle),
            angle_dist(cinfo_b.angle, overall_angle),
        )
        * 180,
        np.pi,
    )
    # we want the largest overlap in x to be small
    x_overlap = max(x_overlap_a, x_overlap_b)
    dist = np.linalg.norm(cinfo_b.point0 - cinfo_a.point1)
    if not (
        dist > cfg.edge_opts.EDGE_MAX_LENGTH
        or x_overlap > cfg.edge_opts.EDGE_MAX_OVERLAP
        or delta_angle > cfg.edge_opts.EDGE_MAX_ANGLE
    ):
        score = dist + delta_angle * cfg.edge_opts.EDGE_ANGLE_COST
        return (score, cinfo_a, cinfo_b)
    # else return None
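To get a feel for the scoring, here is a dependency-free sketch of the same arithmetic on plain tuples. Several things are assumptions of mine: the threshold values are placeholders (the real ones live in cfg.edge_opts), angle_dist is not shown on this page so the version here is a guess at a wrapped absolute angle difference, and edge_score takes the x overlap as an argument instead of computing it.

```python
import math

# Placeholder thresholds for illustration only.
EDGE_MAX_LENGTH = 100.0
EDGE_MAX_OVERLAP = 1.0
EDGE_MAX_ANGLE = 7.5  # degrees
EDGE_ANGLE_COST = 10.0


def angle_dist(angle_b, angle_a):
    """Assumed helper: absolute angle difference, wrapped into [0, pi]."""
    diff = (angle_b - angle_a + math.pi) % (2 * math.pi) - math.pi
    return abs(diff)


def edge_score(a_center, a_angle, a_point1, b_center, b_angle, b_point0, x_overlap):
    """Score joining a (left) to b (right), or None if the edge is rejected."""
    overall_angle = math.atan2(b_center[1] - a_center[1], b_center[0] - a_center[0])
    delta_angle = math.degrees(
        max(angle_dist(a_angle, overall_angle), angle_dist(b_angle, overall_angle))
    )
    # Gap between the right end of a and the left end of b.
    dist = math.hypot(b_point0[0] - a_point1[0], b_point0[1] - a_point1[1])
    if (
        dist > EDGE_MAX_LENGTH
        or x_overlap > EDGE_MAX_OVERLAP
        or delta_angle > EDGE_MAX_ANGLE
    ):
        return None
    return dist + delta_angle * EDGE_ANGLE_COST


# Two horizontal contours on the same text line, 10px apart.
same_line = edge_score((20, 10), 0.0, (35, 10), (60, 10), 0.0, (45, 10), -10)
# The same left contour paired with one on a lower line.
next_line = edge_score((20, 10), 0.0, (35, 10), (60, 40), 0.0, (45, 40), -10)
```

The first pair scores just its 10px gap because everything is collinear. The second is rejected because the line between the two centers is tilted far more than the angle limit, which is how contours on different text lines avoid being chained together.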

The attributes it's using (point0 and point1 [the two ends of the contour's extent along its tangent direction, i.e. roughly its leftmost and rightmost points], center, and angle) were set in the initialisation of the ContourInfo class in contours.py:

def __init__(self, contour, rect, mask):
    self.contour = contour
    self.rect = rect
    self.mask = mask
    self.center, self.tangent = blob_mean_and_tangent(contour)
    self.angle = np.arctan2(self.tangent[1], self.tangent[0])
    clx = [self.proj_x(point) for point in contour]
    lxmin, lxmax = min(clx), max(clx)
    self.local_xrng = (lxmin, lxmax)
    self.point0 = self.center + self.tangent * lxmin
    self.point1 = self.center + self.tangent * lxmax
    self.pred = None
    self.succ = None

where the center and tangent attributes were set by this function:

def blob_mean_and_tangent(contour):
    moments = cv2_moments(contour)
    area = moments["m00"]
    mean_x = moments["m10"] / area
    mean_y = moments["m01"] / area
    moments_matrix = np.divide(
        [[moments["mu20"], moments["mu11"]], [moments["mu11"], moments["mu02"]]], area
    )
    _, svd_u, _ = SVDecomp(moments_matrix)
    center = np.array([mean_x, mean_y])
    tangent = svd_u[:, 0].flatten().copy()
    return center, tangent

  • The "moments" here are the image moments. I couldn't find a clearly written exposition of image moments, so I wrote one: see Background on image moments
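As a rough analogue of what this function computes, the sketch below finds the mean and dominant direction of a blob given as a list of pixel coordinates. It is a simplification and an assumption on my part: it treats the pixels as a point cloud and uses NumPy's SVD, whereas the function above works from the image moments of the contour. The name mean_and_tangent is hypothetical.

```python
import numpy as np


def mean_and_tangent(points):
    """Mean position and dominant direction of a set of (x, y) points."""
    points = np.asarray(points, dtype=np.float64)
    center = points.mean(axis=0)
    centered = points - center
    # 2x2 matrix of second central moments (the covariance of x and y).
    second_moments = centered.T @ centered / len(points)
    # NumPy returns (u, s, vh); the first column of u is the main axis.
    u, _, _ = np.linalg.svd(second_moments)
    tangent = u[:, 0]
    return center, tangent


# A flat blob, 20 pixels wide and 3 pixels tall, like a short word.
blob = [(x, y) for x in range(20) for y in range(3)]
center, tangent = mean_and_tangent(blob)
angle = np.arctan2(tangent[1], tangent[0])
```

For this flat blob the tangent points along the x axis. Its sign is arbitrary (the decomposition may return either direction), so the angle can come out as 0 or as pi for the same blob.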

The local_overlap method being used to calculate x axis overlap was also defined on the ContourInfo class:

def local_overlap(self, other):
    xmin = self.proj_x(other.point0)
    xmax = self.proj_x(other.point1)
    return interval_measure_overlap(self.local_xrng, (xmin, xmax))

where the local_xrng attribute is set in the ContourInfo initialisation as:

clx = [self.proj_x(point) for point in contour]
lxmin, lxmax = min(clx), max(clx)
self.local_xrng = (lxmin, lxmax)

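...using proj_x, which takes the dot product np.dot(self.tangent, point.flatten() - self.center), i.e. the position of the point along the contour's tangent direction, measured from its center.

The interval_measure_overlap function that local_overlap wraps simply returns the following (positive when the two intervals overlap, negative when there is a gap between them):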

min(int_a[1], int_b[1]) - max(int_a[0], int_b[0])
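A quick way to see what this measures is to run it on a few intervals. In the sketch below interval_measure_overlap has the body quoted above, while proj_x is rewritten as a plain function on tuples (in the source it is a method using np.dot) and the example coordinates are made up.

```python
def interval_measure_overlap(int_a, int_b):
    """Length of the overlap of two intervals; negative means a gap."""
    return min(int_a[1], int_b[1]) - max(int_a[0], int_b[0])


def proj_x(tangent, center, point):
    """Position of a point along a tangent direction, relative to a center."""
    return tangent[0] * (point[0] - center[0]) + tangent[1] * (point[1] - center[1])


overlapping = interval_measure_overlap((0, 10), (8, 20))
separated = interval_measure_overlap((0, 10), (15, 20))

# A horizontal contour centred at (20, 10) spanning -15..15 along its tangent,
# compared with another contour whose end points are (45, 10) and (75, 10).
tangent, center, local_xrng = (1.0, 0.0), (20.0, 10.0), (-15.0, 15.0)
other_range = (proj_x(tangent, center, (45, 10)), proj_x(tangent, center, (75, 10)))
local_overlap = interval_measure_overlap(local_xrng, other_range)
```

The first pair of intervals shares 2 units, the second is separated by a gap of 5 (hence -5), and the two contours in the last case are 10 units apart along the tangent, so their overlap is -10. A candidate edge is only rejected when the overlap is larger than EDGE_MAX_OVERLAP, so gaps like these pass that check.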
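
Returning to assemble_spans, the rest of the function sorts the candidate edges, links contours together, and collects the linked chains into spans:

    candidate_edges.sort()  # sort candidate edges by score (lower is better)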
    for _, cinfo_a, cinfo_b in candidate_edges:  # for each candidate edge
        # if left and right are unassigned, join them
        if cinfo_a.succ is None and cinfo_b.pred is None:
            cinfo_a.succ = cinfo_b
            cinfo_b.pred = cinfo_a
    spans = []
    while cinfo_list:
        cinfo = cinfo_list[0]  # get the first on the list
        # keep following predecessors until none exists
        while cinfo.pred:
            cinfo = cinfo.pred
        cur_span = []  # start a new span
        width = 0.0
        while cinfo:  # follow successors til end of span
            # remove from list (sadly making this loop *also* O(n^2))
            cinfo_list.remove(cinfo)
            cur_span.append(cinfo)  # add to span
            width += cinfo.local_xrng[1] - cinfo.local_xrng[0]
            cinfo = cinfo.succ  # set successor
        if width > cfg.span_opts.SPAN_MIN_WIDTH:
            spans.append(cur_span)  # add if long enough
    if cfg.debug_lvl_opt.DEBUG_LEVEL >= 2:
        visualize_spans(name, small, pagemask, spans)
    return spans
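The greedy linking and the span collection can be tried on toy objects. This is a sketch under my own assumptions: Node is a hypothetical stand-in for ContourInfo that stores a width directly instead of a local_xrng, the scores are invented, and the edges are sorted with an explicit key so that the toy objects never need to be compared with each other.

```python
class Node:
    """Minimal stand-in for ContourInfo: a name, a width and two links."""

    def __init__(self, name, width):
        self.name = name
        self.width = width
        self.pred = None
        self.succ = None


def link_and_collect(nodes, edges, min_width):
    # Lowest score first; an edge is used only if both ends are still free.
    for _, left, right in sorted(edges, key=lambda edge: edge[0]):
        if left.succ is None and right.pred is None:
            left.succ = right
            right.pred = left
    remaining = list(nodes)
    spans = []
    while remaining:
        node = remaining[0]
        while node.pred:  # rewind to the start of the chain
            node = node.pred
        span, width = [], 0.0
        while node:  # walk forwards, consuming the chain
            remaining.remove(node)
            span.append(node.name)
            width += node.width
            node = node.succ
        if width > min_width:
            spans.append(span)
    return spans


a, b, c, d = Node("a", 20), Node("b", 20), Node("c", 20), Node("d", 10)
edges = [(5.0, a, b), (8.0, b, c), (6.0, a, c)]
spans = link_and_collect([a, b, c, d], edges, min_width=30)
```

The edge from a to c is skipped because a already has a successor by the time it is considered, so the chain becomes a, b, c. The isolated contour d forms a span of its own that is too short to keep (30 here mirrors the SPAN_MIN_WIDTH default mentioned earlier).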

d) Text contours are approximated by their best fitting line segment using PCA

...?
