Use Convolutional Recurrent Neural Network to recognize the Handwritten line text image without pre segmentation into words or characters. Use CTC loss Function to train.
Deep Learning self extracts features with a deep neural networks and classify itself. Compare to traditional Algorithms it performance increase with Amount of Data.
- First Use Convolutional Recurrent Neural Network to extract the important features from the handwritten line text Image.
- The output before CNN FC layer (512x100x8) is passed to the BLSTM which is for sequence dependency and time-sequence operations.
- Then CTC LOSS Alex Graves is used to train the RNN which eliminate the Alignment problem in Handwritten, since handwritten have different alignment of every writers. We just gave the what is written in the image (Ground Truth Text) and BLSTM output, then it calculates loss simply as
-log("gtText")
; aim to minimize negative maximum likelihood path. - Finally CTC finds out the possible paths from the given labels. Loss is given by for (X,Y) pair is:
- Finally CTC Decode is used to decode the output during Prediction.
- Project consists of Three steps:
- Tensorflow 1.8.0
- Flask
- Numpy
- OpenCv 3
- Spell Checker
autocorrect
>=0.3.0pip install autocorrect
- IAM dataset download from here
- Only needed the lines images and lines.txt (ASCII).
- Place the downloaded files inside data directory
You can find trained model to download from here. Download and extract all files inside the model/
directory.
To Train the model from scratch
$ python main.py --train
To validate the model
$ python main.py --validate
To Prediction
$ python main.py
Run in Web with Flask
$ python upload.py
Validation character error rate of saved model: 8.654728%
Python: 3.6.4
Tensorflow: 1.8.0
Init with stored values from ../model/snapshot-24
Without Correction clothed leaf by leaf with the dioappoistmest
With Correction clothed leaf by leaf with the dioappoistmest
Prediction output on IAM Test Data
Prediction output on Self Test Data
- Line segementation can be added for full paragraph text recognition.
- Better Image preprocessing such as: reduce backgoround noise to handle real time image more accurately.
- Better Decoding approach to improve accuracy. Some of the CTC Decoder found here
This is part of my last semester project of Computer Science Program From Tribhuvan University. July 2019