Arabic-CNN-Based-OCR

object detection based

The code is still under development.

Hi, this is Mohamed Fawzy.

This repo is about the stubborn case of Arabic handwritten word recognition, which is so hard because of the structure of the word itself.

Arabic words are written continuously, with no gaps between the letters, unlike English words!

But English words:

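Another problem is that every single character changes its form/shape depending on where it sits in the word: at the start, in the middle, or at the end (plus an isolated form when it stands alone).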
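To make the shape problem concrete, here is a small sketch using Python's standard `unicodedata` module. Unicode lists separate presentation-form code points for ب (initial, medial, and final, plus an isolated form), and all of them fold back to the same base letter. The names `BEH_FORMS` and `describe_forms` are only illustrative and are not taken from the notebooks in this repo.

```python
import unicodedata

# Code points from the Unicode "Arabic Presentation Forms-B" block
# for the letter beh (ب), one per position in the word.
BEH_FORMS = {
    "isolated": "\uFE8F",
    "final": "\uFE90",
    "initial": "\uFE91",
    "medial": "\uFE92",
}


def describe_forms(forms):
    """Return {position: (glyph, unicode_name, base_letter)} for each form."""
    described = {}
    for position, glyph in forms.items():
        # NFKC normalization folds a presentation form back to its base letter.
        base = unicodedata.normalize("NFKC", glyph)
        described[position] = (glyph, unicodedata.name(glyph), base)
    return described


if __name__ == "__main__":
    for position, (glyph, name, base) in describe_forms(BEH_FORMS).items():
        print(f"{position:9s} {glyph}  {name}  -> base {base}")
```

A recognizer therefore has to map several visually different glyphs to one label, which is why the class count grows well beyond the size of the alphabet.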

How I tried to solve this:

First Approach: OpenCV find/draw contours

The most popular approach to character recognition, finding contours with OpenCV, doesn't work for Arabic words!

Why? Because contouring a single character is not possible, as every character is connected to its neighbors.
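The effect can be shown without OpenCV at all. The sketch below uses a plain flood fill as a stand-in for what `cv2.findContours` would report on a binarized image: three separate strokes give three blobs, but once a baseline joins them there is only one blob left to contour. The two toy grids and the function name `count_components` are made up for illustration.

```python
def count_components(grid):
    """Count 4-connected blobs of ink ('#') in a small binary image."""
    rows, cols = len(grid), len(grid[0])
    seen = set()
    count = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != "#" or (r, c) in seen:
                continue
            # Found an unvisited ink pixel: flood-fill its whole blob.
            count += 1
            stack = [(r, c)]
            seen.add((r, c))
            while stack:
                y, x = stack.pop()
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and grid[ny][nx] == "#" and (ny, nx) not in seen:
                        seen.add((ny, nx))
                        stack.append((ny, nx))
    return count


# Toy "word" made of three separate strokes (like printed Latin letters).
SEPARATE = [
    "#..#..#",
    "#..#..#",
    "#..#..#",
]

# The same strokes joined by a baseline (like a connected Arabic word).
JOINED = [
    "#..#..#",
    "#..#..#",
    "#######",
]

if __name__ == "__main__":
    print("separate strokes:", count_components(SEPARATE))
    print("joined strokes:  ", count_components(JOINED))
```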

While in English:

Second Approach: Training on words that an actual OCR had recognized successfully.

This approach is kind of limited, as Arabic has around 12.3 million words.

I trained a VGG16 model on augmented handwritten word images and their labels from the AlexU dataset, which contains 109 unique words, and reached 85% validation accuracy.

Third Approach: Using an already sliced characters dataset

I found it here: https://github.com/HusseinYoussef/Arabic-OCR

Thanks to Eng. Hussein Osama!

Here's a sample of ب:

I also thought of splitting the image into vertical rectangles, where every rectangle may contain a character.

To implement it, I used the code here: https://pyimagesearch.com/2020/06/22/turning-any-cnn-image-classifier-into-an-object-detector-with-keras-tensorflow-and-opencv/

It splits the image into ROIs (regions of interest) and searches them for characters.
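The vertical-rectangle idea can be sketched in a few lines. This is a simplified illustration, not the code from the linked tutorial: it only generates full-height column ranges and crops them, and it assumes the image is a plain list of rows. The names `vertical_windows` and `crop_columns`, and the window width and step values, are hypothetical. In practice each crop would then be handed to the character classifier.

```python
def vertical_windows(image_width, window_width, step):
    """Yield (x_start, x_end) column ranges for full-height vertical strips."""
    if window_width <= 0 or step <= 0:
        raise ValueError("window_width and step must be positive")
    x = 0
    # Stop as soon as the window would run past the right edge.
    while x + window_width <= image_width:
        yield (x, x + window_width)
        x += step


def crop_columns(image, x_start, x_end):
    """Crop a list-of-rows image to the columns [x_start, x_end)."""
    return [row[x_start:x_end] for row in image]


if __name__ == "__main__":
    # Toy 2x10 "image"; each letter stands for one pixel column.
    image = ["abcdefghij", "abcdefghij"]
    for x_start, x_end in vertical_windows(len(image[0]), window_width=4, step=3):
        print((x_start, x_end), crop_columns(image, x_start, x_end))
```

A step smaller than the window width makes the strips overlap, so a character that straddles one boundary still falls fully inside a neighboring strip.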

Fourth Approach: Mimicking an actual OCR (Reinforcement Learning)

We can actually take an OCR that already works well with Arabic and have it give its results back to the model for every batch!

This way we don't have to worry about tuning deep nets.

Reinforcement learning means learning dynamically by adjusting actions based on continuous feedback to maximize a reward.

It will just take ages to train.
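One possible shape for that feedback signal, as a rough sketch only: score each prediction by how closely it matches the text returned by the existing OCR, and average the scores over a batch. This covers just the reward, not a full reinforcement learning loop. The edit-distance scoring is my own assumption, and the names `edit_distance`, `reward`, and `batch_reward` are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


def reward(predicted, teacher_output):
    """Reward in [0, 1]; 1.0 means the prediction equals the teacher OCR text."""
    longest = max(len(predicted), len(teacher_output))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(predicted, teacher_output) / longest


def batch_reward(predictions, teacher_outputs):
    """Average reward over one batch of (prediction, teacher text) pairs."""
    if len(predictions) != len(teacher_outputs):
        raise ValueError("predictions and teacher_outputs must have equal length")
    if not predictions:
        return 0.0
    pairs = zip(predictions, teacher_outputs)
    return sum(reward(p, t) for p, t in pairs) / len(predictions)


if __name__ == "__main__":
    # Second prediction is missing one letter compared with the teacher text.
    print(batch_reward(["كتاب", "كتب"], ["كتاب", "كتاب"]))
```

A graded reward like this gives partial credit for nearly correct words, which tends to be a more useful signal than a plain right/wrong score early in training.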

