
mfwz247/Arabic-CNN-Based-OCR


Arabic-CNN-Based-OCR

object detection based

The code is still under development.

Hi, this is Mohamed Fawzy.

This project tackles the stubborn case of Arabic handwritten word recognition, which is hard because of the structure of the word itself...

Arabic words are written continuously, with no gaps between letters, unlike English words!

image

English words, on the other hand, are written as separate letters:

image

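Another problem is that every single character changes shape depending on its position in the word: most letters have initial, medial, and final forms, plus an isolated form.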
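To make the shape problem concrete, here is a small sketch using only Python's standard `unicodedata` module. Unicode encodes the contextual shapes of ب as separate presentation-form code points, and NFKC normalization folds each of them back to the same base letter. The names `BEH_FORMS`, `describe_forms`, and `base_letter` are made up for this illustration and are not part of this repo.

```python
import unicodedata

# Presentation-form code points for the letter ب (beh),
# taken from the Unicode "Arabic Presentation Forms-B" block.
BEH_FORMS = ["\uFE8F", "\uFE90", "\uFE91", "\uFE92"]


def describe_forms(forms):
    """Return (character, Unicode name) pairs for each presentation form."""
    return [(ch, unicodedata.name(ch)) for ch in forms]


def base_letter(ch):
    """Fold a presentation form back to its base letter using NFKC."""
    return unicodedata.normalize("NFKC", ch)


for ch, name in describe_forms(BEH_FORMS):
    # Print code points instead of glyphs so the output stays ASCII.
    print(f"U+{ord(ch):04X} {name} -> U+{ord(base_letter(ch)):04X}")
```

A classifier has to map all of these shapes to one label, which is part of why character-level recognition is harder here than for printed Latin letters.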

How I tried to solve this:

First Approach: OpenCV find/draw contours

The most popular approach to character recognition, finding contours with OpenCV, doesn't work for Arabic words!

Why? Because contouring a single character is not possible when every character is connected to its neighbors.
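The effect can be shown without OpenCV at all. The sketch below is a pure-Python stand-in, not the OpenCV call itself: it counts connected groups of ink pixels with a flood fill, which is roughly what an external-contour pass gives you (one outline per connected blob). It uses 4-connectivity for simplicity, and the two tiny grids are invented toy data.

```python
def count_components(grid):
    """Count 4-connected groups of ink pixels (1s) in a binary grid."""
    rows, cols = len(grid), len(grid[0])
    seen = set()
    count = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 1 or (r, c) in seen:
                continue
            # Found an unvisited ink pixel: flood-fill its whole blob.
            count += 1
            seen.add((r, c))
            stack = [(r, c)]
            while stack:
                y, x = stack.pop()
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and grid[ny][nx] == 1 and (ny, nx) not in seen):
                        seen.add((ny, nx))
                        stack.append((ny, nx))
    return count


# Three separate strokes, like printed Latin letters.
separated = [
    [1, 0, 1, 0, 1],
    [1, 0, 1, 0, 1],
    [1, 0, 1, 0, 1],
]

# The same strokes joined along a baseline, like connected script.
connected = [
    [1, 0, 1, 0, 1],
    [1, 0, 1, 0, 1],
    [1, 1, 1, 1, 1],
]

print(count_components(separated), count_components(connected))
```

The separated strokes come out as three blobs, while the joined strokes collapse into a single blob, so there is no per-character outline to crop.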

marked areas

While in English:

image

Second Approach: Training on words that an actual OCR had recognized successfully.

This approach is quite limited, as Arabic has about 12.3 million words and a word-level model can only recognize the words it was trained on.

I trained a VGG16 model on augmented handwritten word images and their labels, using the AlexU dataset, which contains 109 unique words.

67-3

The model reached 85% validation accuracy.
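To put the limitation in numbers, here is a quick back-of-the-envelope sketch using the two figures quoted above. It also shows why a closed-vocabulary classifier can never output a word it was not trained on: it always picks one of its known labels. The function names, the three placeholder labels, and the scores are hypothetical, not taken from the trained model.

```python
ARABIC_WORD_COUNT = 12_300_000  # word count quoted above
TRAINED_WORDS = 109             # unique words in the training set


def vocabulary_coverage(trained, total):
    """Fraction of the full vocabulary a word-level classifier can output."""
    return trained / total


def predict_word(scores, labels):
    """Closed-set prediction: always returns one of the known labels."""
    best = max(range(len(scores)), key=scores.__getitem__)
    return labels[best]


coverage = vocabulary_coverage(TRAINED_WORDS, ARABIC_WORD_COUNT)
print(f"coverage: {coverage:.6%}")

# Hypothetical labels and scores for an image of an unseen word:
# the answer is still forced to be one of the known labels.
labels = ["word_a", "word_b", "word_c"]
scores = [0.2, 0.5, 0.3]
print(predict_word(scores, labels))
```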

Third Approach: Using an already-sliced character dataset

I found the dataset here: https://github.com/HusseinYoussef/Arabic-OCR

Thanks to Eng. Hussein Osama!

Here's a sample of ب:

0 4 13

I also thought of splitting the image into vertical rectangles, where each rectangle may contain a character.

sliding_window_example

To implement it, I used the code here: https://pyimagesearch.com/2020/06/22/turning-any-cnn-image-classifier-into-an-object-detector-with-keras-tensorflow-and-opencv/

It splits the image into ROIs (regions of interest) and searches each one for characters.
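The vertical-rectangle idea can be sketched in a few lines. This is a simplified illustration, not the code from the linked article: it only generates full-height strips across the image width and crops them out of a nested-list image. The function names, window width, and step size are assumptions picked for the example; a real pipeline would pass each crop to the character classifier.

```python
def sliding_windows(image_width, image_height, window_width, step):
    """Yield (x, y, w, h) boxes for full-height vertical strips, left to right."""
    if window_width <= 0 or step <= 0:
        raise ValueError("window_width and step must be positive")
    x = 0
    # Stop once the window would run past the right edge of the image.
    while x + window_width <= image_width:
        yield (x, 0, window_width, image_height)
        x += step


def crop(image, box):
    """Cut a box out of an image stored as a list of pixel rows."""
    x, y, w, h = box
    return [row[x:x + w] for row in image[y:y + h]]


# Toy 4x10 "image" where each pixel stores its own column index.
image = [list(range(10)) for _ in range(4)]

boxes = list(sliding_windows(image_width=10, image_height=4, window_width=4, step=3))
rois = [crop(image, box) for box in boxes]
print(boxes)
```

Since Arabic is read right to left, iterating over `reversed(boxes)` would visit the strips in reading order.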

Fourth Approach: Mimicking an actual OCR (Reinforcement Learning)

We can take an OCR that already works well with Arabic and have it give its results back to the model for every batch!

This way we don't have to worry about tuning deep nets.

Reinforcement learning means learning dynamically by adjusting actions based on continuous feedback to maximize a reward.

It will just take ages to train.
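One possible way to turn the existing OCR's output into a reward is to score how close the model's prediction is to it. The sketch below is only an assumption about how that feedback could look, not something implemented in this repo: the reward is 1 minus the normalized edit distance between the two strings. Both function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def reward(predicted, teacher):
    """Reward in [0, 1]: 1.0 when the prediction matches the teacher OCR exactly."""
    longest = max(len(predicted), len(teacher))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(predicted, teacher) / longest


# The model read "كتب" where the teacher OCR read "كتاب": one missing letter.
print(reward("كتب", "كتاب"))
```

A graded reward like this gives partial credit for near misses, which is gentler than a plain right/wrong signal per word.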
