Add new Augmentation - Text Strikethrough #63

Closed
shaheryar1 opened this issue Aug 26, 2021 · 20 comments

@shaheryar1
Contributor

shaheryar1 commented Aug 26, 2021

(attached image: Untitled)

I just noticed this in some report-oriented documents; right now we cannot achieve this with the current pipeline. Maybe we can add this kind of augmentation to augraphy to support a hand-cut lines feature. What do you guys think about it?

@proofconstruction
Contributor

I think this is a pretty interesting idea, and probably has a similar approach to how we'd solve the redaction effect mentioned here in Issue #43.

The hardest part would be identifying regions of text to add the strikethrough effect to, but we can use Sobel edge detection etc., randomly choose a y-value in the image, then check for edges at that height (so, varying the x-value and keeping that y-value constant). If we find an edge at that height, we can keep searching for more edges at that height, and if we find a second one, we have a word there. We can then generate a line connecting the two edge-points we found, and overlay that on the ink layer.

For maximum realism, when we find two such edge points in the image, we can check y-values above and below to find the top and bottom of the characters, then average these values to find the middle of the character. We could draw the line connecting the midpoints, like someone would in person.
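A minimal sketch of the row scan and midpoint search described above. It assumes a binary ink mask (1 = ink, 0 = background) instead of a Sobel edge map, takes the first and last transitions on the row as the endpoints, and assumes the page margins are background; the function names are hypothetical, not part of augraphy.

```python
def row_edges(ink, y):
    """Columns on row y where the pixel differs from its left neighbour."""
    row = ink[y]
    return [x for x in range(1, len(row)) if row[x] != row[x - 1]]


def vertical_mid(ink, x, y):
    """Walk up and down from (x, y) through ink and return the middle row."""
    top = bottom = y
    while top > 0 and ink[top - 1][x]:
        top -= 1
    while bottom < len(ink) - 1 and ink[bottom + 1][x]:
        bottom += 1
    return (top + bottom) // 2


def strike_endpoints(ink, y):
    """Return ((x0, mid0), (x1, mid1)) for the text crossed by row y, or None."""
    row = ink[y]
    if row[0] or row[-1]:
        return None  # assumption: margins are background, so edges come in pairs
    edges = row_edges(ink, y)
    if len(edges) < 2:
        return None  # fewer than two edges: no text at this height
    x0 = edges[0]        # first ink pixel on the row
    x1 = edges[-1] - 1   # last ink pixel on the row
    return (x0, vertical_mid(ink, x0, y)), (x1, vertical_mid(ink, x1, y))
```

In practice `y` would be drawn at random (for example with `random.randrange(len(ink))`) and retried until `strike_endpoints` returns something other than `None`.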

There are some straightforward algorithms for generating the drawn line (e.g. advance five pixels in the x-direction and randomly +1 or -1 in the y-direction), and we could continue it past the endpoints to be even more realistic.
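The random-walk idea could look roughly like this. The step of five pixels and the ±1 jitter come from the comment above; the `overshoot` parameter, the function name, and the optional seeded `rng` are assumptions added for illustration.

```python
import random


def hand_drawn_line(start, end, step=5, overshoot=10, rng=None):
    """Points of a jittery left-to-right line from before `start` to past `end`.

    Each move advances `step` pixels in x and shifts y by +1 or -1 at random.
    """
    rng = rng or random.Random()
    (x0, y0), (x1, _) = start, end
    x, y = x0 - overshoot, y0          # begin a little before the first endpoint
    points = [(x, y)]
    while x < x1 + overshoot:          # continue a little past the last endpoint
        x += step
        y += rng.choice((-1, 1))
        points.append((x, y))
    return points
```

Because the walk never steers back toward the end point's y-value, long lines can drift; that drift is part of the hand-drawn look, but a real implementation might want to bound it.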

@proofconstruction proofconstruction changed the title Add new Augmentation - Hand cut text lines Add new Augmentation - Text Strikethrough Aug 26, 2021
@jboarman
Member

jboarman commented Aug 26, 2021

This technique to find contours may help identify where the text is located. From my experience with this, it can be hit or miss. But, it should work decently well for Augraphy's purpose.

Stack Overflow: Finding contours with lines of text in OpenCV
https://stackoverflow.com/a/50777937/764307

@shaheryar1
Contributor Author

> This technique to find contours may help identify where the text is located. From my experience with this, it can be hit or miss. But, it should work decently well for Augraphy's purpose.

Yes, I was thinking the same. I also have experience with this kind of work, so it won't take long to add this.

@shaheryar1
Contributor Author

I have started doing some experiments on this. It's not fully optimized yet.

> There are some straightforward algorithms for generating the drawn line (e.g. advance five pixels in the x-direction and randomly +1 or -1 in the y-direction), and we could continue it past the endpoints to be even more realistic.

I will try to implement this one too.
https://colab.research.google.com/drive/11-_ne7ZJuqH8mkcmtM6WrwWGgD0v3VEt?usp=sharing

@jboarman
Member

This looks really nice! I'm curious if the effect works well with the same techniques used in the PencilScribbles augmentation. We might need to pull that pencil effect out into a shared lib as I sense we might use it in more than one place.

@proofconstruction
Contributor

This looks great, and I agree it'd look even better with PencilScribbles generating the strikethrough line. I'm definitely in favor of pulling that code out into augraphy/augmentations/lib.py. In the future, we could expand on this and PencilScribbles to create an Annotation augmentation that accepts a string and some font and writes that text onto the page using "pencil" in that style, so we could replicate the added text in the picture above too.

@shaheryar1
Contributor Author

I have applied Chaikin's algorithm to smooth the Text Strikethrough effect. It now looks more realistic to me. Kindly have a look and provide feedback.
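For readers unfamiliar with it, Chaikin's corner-cutting scheme replaces each segment of a polyline with two points at 1/4 and 3/4 of its length. The sketch below is a generic version, not the code from the notebook; keeping the original endpoints fixed is an assumption made here so the line still starts and ends where it was drawn.

```python
def chaikin(points, iterations=2):
    """Smooth an open polyline of (x, y) tuples by Chaikin corner cutting."""
    for _ in range(iterations):
        if len(points) < 3:
            break  # nothing to smooth on a single segment
        smoothed = [points[0]]  # assumption: keep the first endpoint fixed
        for (x0, y0), (x1, y1) in zip(points, points[1:]):
            # Cut each corner: one point 1/4 along the segment, one 3/4 along.
            smoothed.append((0.75 * x0 + 0.25 * x1, 0.75 * y0 + 0.25 * y1))
            smoothed.append((0.25 * x0 + 0.75 * x1, 0.25 * y0 + 0.75 * y1))
        smoothed.append(points[-1])  # and the last endpoint
        points = smoothed
    return points
```

Each iteration roughly doubles the number of points, so two or three iterations are usually enough for a jittery line of a few dozen points.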

https://colab.research.google.com/drive/11-_ne7ZJuqH8mkcmtM6WrwWGgD0v3VEt?usp=sharing

(attached image)

@jboarman
Member

jboarman commented Sep 5, 2021

This looks quite amazing!

I think we’re ready to see the addition of the pencil effect used in PencilScribbles.

Is that effect pulled out into a shared lib yet?

It might be helpful to add antialiasing beforehand, but I’m not sure (via CV_AA, which current OpenCV Python bindings expose as cv2.LINE_AA). See example on stackoverflow.

This was referenced Sep 8, 2021
@proofconstruction
Contributor

Merged #86

@jboarman
Member

I came across this underline example, and it made me think that perhaps we could adapt code from Strikethrough to create an Underline augmentation. Eventually, even a Highlight augmentation could evolve from this code. Any thoughts on the best way to share that code between augmentations? Is there a generalization of the approach that could be extracted into the shared library functions?

(attached image)

@shaheryar1
Contributor Author

Exactly. Yesterday I was also going through memos and reports looking for sample documents containing text strikethrough, and I came across multiple samples where underlining was used extensively. I think I can make the code generic enough to cover this underline effect.

@kwcckw
Collaborator

kwcckw commented Sep 14, 2021

> Exactly, Yesterday I was also going through Memos and Report looking for sample documents containing text strikethrough and came across multiple samples where underlining was intensively used. I think I can make the code generic to cater this underline effect

Or maybe a flag to insert the line at the center of the text or under the text?
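Such a flag might amount to little more than choosing a different row inside each detected text box. A rough sketch, assuming boxes in the `(x, y, w, h)` form that `cv2.boundingRect` returns; the function name and the flag values are hypothetical.

```python
def line_row(box, position="strikethrough"):
    """Row at which to draw the line for a text bounding box (x, y, w, h)."""
    x, y, w, h = box
    if position == "strikethrough":
        return y + h // 2   # through the vertical center of the text
    if position == "underline":
        return y + h        # along the bottom edge of the box
    raise ValueError(f"unknown position: {position!r}")
```

For a box of `(10, 40, 200, 20)` this gives row 50 for a strikethrough and row 60 for an underline; everything else in the line-drawing code could stay the same.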

@shaheryar1
Contributor Author

Here are the results with minor changes in the existing strikethrough code

(attached images: download (1), download (3))

@proofconstruction
Contributor

Does this work with different font sizes?

If we pick a standard width y for the highlighter effect, we could use the Chaikin algorithm to generate the points p for the underline, then add e.g. a yellow or pink highlighter tint (we could do this in RGB and convert back to grayscale for the most faithful effect) to all the points in the slice [p:p + y]. The underlines are already not perfectly straight, so this should look pretty natural too, like an unsteady hand highlighting a text area.
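A small sketch of the tinting step, under several assumptions: the image is a list of rows of `(r, g, b)` tuples, the curve supplies one point per column (interpolation between Chaikin points is left out), and the tint is applied with a multiply blend so dark ink stays dark. The function name and the blend choice are illustrative, not taken from augraphy.

```python
def highlight(image, points, width, tint=(255, 255, 0)):
    """Tint `width` rows starting at each curve point; returns a new image."""
    out = [row[:] for row in image]      # copy rows so the input is untouched
    n_rows, n_cols = len(out), len(out[0])
    seen = set()                          # tint each column at most once
    for px, py in points:
        x, y = int(round(px)), int(round(py))
        if x in seen or not 0 <= x < n_cols:
            continue
        seen.add(x)
        # The slice [p : p + width] from the comment above, clipped to the page.
        for row in range(max(y, 0), min(y + width, n_rows)):
            r, g, b = out[row][x]
            out[row][x] = (r * tint[0] // 255,
                           g * tint[1] // 255,
                           b * tint[2] // 255)
    return out
```

With the default yellow tint, white paper becomes `(255, 255, 0)` while black text stays `(0, 0, 0)`, which is roughly what a real highlighter does. Converting the result back to grayscale, as suggested above, would be a separate step.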

Underline/Strikethrough, Highlight, and PencilScribbles are all created in the real world by someone using a writing tool on the document. I think what we really have here are different instances of a more general augmentation; we could have a Markup class with some flags to pick one of these sub-effects, or we could (probably) pull a lot of the code out into lib.py and share it between smaller classes for each of these effects.

I can see good reasons to do either of these.

@shaheryar1
Contributor Author

> Does this work with different font sizes?

Yes, it does.

> Underline/Strikethrough, Highlight, and PencilScribbles are all created in the real world by someone using a writing tool on the document. I think what we really have here are different instances of a more general augmentation; we could have a Markup class with some flags to pick one of these sub-effects, or we could (probably) pull a lot of the code out into lib.py and share it between smaller classes for each of these effects

I guess for end users we should create a Markup class with multiple options, and for contributors the core module should live in lib.py, especially Chaikin's algorithm and the script that extracts the position of each text line.

@proofconstruction
Contributor

@shaheryar1

Is it necessary to blur the image when making the strikethrough lines? https://github.com/shaheryar1/augraphy/blob/dev/augraphy/augmentations/strikethrough.py#L72

Does this help with contour detection?

@proofconstruction
Contributor

I removed the blur in testing and it doesn't seem to affect the number or placement of the resulting contours. Unless you think we should keep it in, I'm going to remove the blur from this augmentation so the Strikethrough only draws lines on text, without blurring the original image.

@shaheryar1
Contributor Author

> Is it necessary to blur the image when making the strikethrough lines? https://github.com/shaheryar1/augraphy/blob/dev/augraphy/augmentations/strikethrough.py#L72

Yes, it helps when the spacing between words is relatively large. I noticed that the code blurs the original image instead of making a copy and applying the blur to that. I'll fix this in my next PR.

@proofconstruction
Contributor

Can we blur a copy of the image to detect the contours, then use the contours to draw lines on a copy without blur? Otherwise this augmentation is really Blur + Strikethrough 😕
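The separation could be sketched as follows, on a binary ink mask with plain lists. A horizontal smear stands in for blur plus threshold here (an assumption for illustration, not the augmentation's actual code): it closes the gaps between words on the detection copy, while the line is drawn on a clean copy of the input.

```python
def smear_rows(ink, radius):
    """New mask in which ink spreads `radius` pixels sideways (input untouched)."""
    width = len(ink[0])
    return [[int(any(row[max(0, x - radius):x + radius + 1]))
             for x in range(width)] for row in ink]


def ink_runs(row):
    """(start, end) column spans of consecutive ink pixels on one row."""
    runs, start = [], None
    for x, value in enumerate(row):
        if value and start is None:
            start = x
        elif not value and start is not None:
            runs.append((start, x - 1))
            start = None
    if start is not None:
        runs.append((start, len(row) - 1))
    return runs


def strikethrough(ink, y, radius=2):
    """Draw a line on row y across each text run found on a smeared copy."""
    detection = smear_rows(ink, radius)   # used only to locate text
    output = [row[:] for row in ink]      # clean copy that receives the line
    for x0, x1 in ink_runs(detection[y]):
        for x in range(x0, x1 + 1):
            output[y][x] = 1
    return output
```

On a row with two words separated by a two-pixel gap, the unsmeared mask yields two runs and the smeared one yields a single run, which matches the observation that blurring helps when word spacing is large. The input mask is never modified.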

@shaheryar1
Contributor Author

shaheryar1 commented Sep 16, 2021

> Can we blur a copy of the image to detect the contours, then use the contours to draw lines on a copy without blur?

Yes, this is exactly what it is supposed to do. But I guess that in the currently deployed code I forgot to make a copy of the image.
