Obtain Character-Level Bounding Boxes of Generated Text #3921

Closed
enchainingrealm opened this issue Jun 27, 2019 · 9 comments

@enchainingrealm

enchainingrealm commented Jun 27, 2019

I'm placing some text on an image as follows, using Python 3.7.3 and Pillow (PIL) 5.4.1:

from PIL import Image, ImageDraw, ImageFont

image = ...
font_filepath = ...
font_size = ...

draw = ImageDraw.Draw(image)
font = ImageFont.truetype(font_filepath, font_size)

xy = ...   # generate random (x, y) coordinates to place the text at
text = ...

draw.text(xy, text, font=font)

I want to get the character-level bounding boxes around the text placed on the image. For example, if text = "hello", I want a list of five rectangles, each of which bounds the corresponding letter in "hello". The bounding box for "l" should be thinner and taller than the bounding box for "o", and the bounding boxes for the two "l"s should have different x-positions and the same y-positions.

I have investigated using:

size = font.getsize(text)
mask = font.getmask(text)

However, I don't know how to interpret mask, because:

  • mask is an ImagingCore object instead of an Image object
  • len(mask) does not even equal the area calculated by size[0] * size[1]

What is the easiest way to obtain character-level bounding boxes for text placed on an image by PIL?

@radarhere
Member

from PIL import Image, ImageDraw, ImageFont

image = Image.new("RGB", (200, 100))
font_filepath = "/Library/Fonts/Arial.ttf"
font_size = 50

draw = ImageDraw.Draw(image)
font = ImageFont.truetype(font_filepath, font_size)

xy = (50, 20)
text = "hello"

draw.text(xy, text, font=font)

for char in text:
	print(font.getmask(char).size)

@radarhere
Member

If it helps, here is where 'size' is defined for ImagingCore in Pillow's C source:

{ "size", (getter) _getattr_size },

Let us know if this doesn't answer your question, or if you have any further questions.
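To see why `len(mask)` differs from `size[0] * size[1]` in the original question: the mask is a bitmap of just the rendered glyphs, so its `size` is its own (width, height) and `len()` counts its pixels, while in the Pillow versions discussed here `getsize()` also adds the offset from the drawing origin. A small sketch, assuming the built-in `ImageFont.load_default()` font in place of a TrueType file, reads the mask's pixels with the core object's `getpixel()`:

```python
from PIL import ImageFont

font = ImageFont.load_default()  # stand-in for ImageFont.truetype(path, size)
mask = font.getmask("hello")

# `size` is (width, height) of the rendered glyph bitmap itself.
width, height = mask.size
# len() counts the mask's pixels, so it matches the mask's own area.
print(len(mask), width * height)

# Print the mask as ASCII art: '#' where the glyphs have ink.
for y in range(height):
    print("".join("#" if mask.getpixel((x, y)) else "." for x in range(width)))
```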

@enchainingrealm
Author

This gives me the size of each character's bounding box, but how do I retrieve the location of each character's bounding box?

@radarhere
Member

from PIL import Image, ImageDraw, ImageFont

image = Image.new("RGB", (200, 100))
font_filepath = "/Library/Fonts/Arial.ttf"
font_size = 50

draw = ImageDraw.Draw(image)
font = ImageFont.truetype(font_filepath, font_size)

xy = (40, 20)
text = "hello"

draw.text(xy, text, font=font)

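for i, char in enumerate(text):
    # Right edge: width of the text up to and including this character
    right, bottom = font.getsize(text[:i+1])
    # Box size: the character's own rendered glyph
    width, height = font.getmask(char).size
    # Shift from text-relative to image coordinates
    right += xy[0]
    bottom += xy[1]
    top = bottom - height
    left = right - width

    draw.rectangle((left, top, right, bottom), None, "#f00")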

image.save("out.png")

[Image: out.png]

@enchainingrealm
Author

That makes sense; it seems you can just assume the boxes for each character are adjacent to each other and aligned along the bottom.

That answers my question. I'm closing this issue now.

@lamhoangtung

lamhoangtung commented Sep 19, 2019

Since @radarhere's solution is incorrect for text with multiple words, like this:

[Image: out.png]

I slightly modified it so that it works in more cases:

from PIL import Image, ImageDraw, ImageFont

image = Image.new("RGB", (500, 100))
font_filepath = "./template/arial.ttf"
font_size = 50

draw = ImageDraw.Draw(image)
font = ImageFont.truetype(font_filepath, font_size)

xy = (40, 20)
text = "Hoàng Tùng Lâm"

draw.text(xy, text, font=font)

for i, char in enumerate(text):
    # getsize() height is roughly the distance from the top of the line to the
    # lowest ink: once for this character alone, once for the text up to it
    bottom_1 = font.getsize(char)[1]
    right, bottom_2 = font.getsize(text[:i+1])
    bottom = min(bottom_1, bottom_2)
    width, height = font.getmask(char).size
    right += xy[0]
    bottom += xy[1]
    top = bottom - height
    left = right - width

    draw.rectangle((left, top, right, bottom), None, "#f00")

image.save("out.png")

[Image: out.png]

Hope this could help someone :P

@indigoviolet

The above is close, but not always correct for slanted fonts: see #4789 (comment)

@jinyu121

@lamhoangtung's solution is great, but when processing Chinese punctuation, this code does not always produce a "tight" bounding box. For example:

[Image]

The font is msyhbd.ttf, and the text is text = "Hoàng Tùng Lâm 测试,测试。测试!测试?Test. Test?"

Is this caused by the font file itself?

@lamhoangtung

@jinyu121 It's likely due to the font itself.
