Analysis of handwriting has been a common exercise in the fields of Computer Vision and Artificial Intelligence. However, most research involving handwriting has focused on recognizing the contents of the text rather than on identifying information about the writer. Writer identification could be a useful tool for criminal investigations and forensics, so this gap in research is a missed opportunity for development in these fields. One 2018 study proposed a method based on Cloud of Line Distribution (COLD) features of handwriting that outperformed the existing method for identifying a writer's nationality from English handwriting (Nag, Shivakumara, Yirui, Pal, & Lu). However, that method was designed only to distinguish among writers from five countries whose native languages use different scripts. It would therefore likely not be effective at distinguishing between people with similar backgrounds, which may be a common situation when considering suspects in a criminal investigation.
We propose creating software that matches a handwriting sample to its author, given a group of potential authors and other handwriting samples written by each of them. We will approach this by implementing a convolutional neural network (CNN). The network will be trained on data from the IAM Handwriting Database, which contains over 1,500 pages of handwritten text by over 650 writers. Each page can be broken down into individual sentences or words, each labeled with its author (Marti and Bunke, 2002). We will break these into randomly positioned square patches of text because we would like the approach to be language-independent, so that it can later be applied to languages that are not written in left-to-right horizontal lines, such as Arabic, which is written right to left.
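The patch-based approach described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual implementation: it assumes each page is already loaded as a 2-D grayscale NumPy array with dark ink on a light background, and the helper names `sample_square_patches` and `predict_author`, the ink threshold `min_ink`, and the gray cutoff of 128 are all hypothetical. The source does not say how patch-level CNN outputs are combined; here the per-patch softmax scores over the candidate authors are simply averaged, which is one common choice.

```python
import numpy as np


def sample_square_patches(page, patch_size, n_patches, rng=None, min_ink=0.05):
    """Randomly crop square patches from a grayscale page (hypothetical helper).

    Assumes `page` is a 2-D uint8 array with dark ink on a light background.
    Patches whose fraction of "ink" pixels is below `min_ink` (an assumed
    threshold) are skipped so mostly blank paper is not used for training.
    """
    rng = np.random.default_rng(rng)
    h, w = page.shape
    if patch_size > h or patch_size > w:
        raise ValueError("patch_size is larger than the page")
    patches = []
    attempts = 0
    # Cap the number of attempts so a nearly blank page cannot loop forever.
    while len(patches) < n_patches and attempts < 50 * n_patches:
        attempts += 1
        y = rng.integers(0, h - patch_size + 1)
        x = rng.integers(0, w - patch_size + 1)
        patch = page[y:y + patch_size, x:x + patch_size]
        # Fraction of pixels darker than mid-gray, treated as ink (assumption).
        if np.mean(patch < 128) >= min_ink:
            patches.append(patch)
    if not patches:
        return np.empty((0, patch_size, patch_size), dtype=page.dtype)
    return np.stack(patches)


def predict_author(patch_probs, candidate_ids):
    """Choose the candidate with the highest mean patch-level probability.

    `patch_probs` has shape (n_patches, n_candidates) and is assumed to hold
    the CNN's softmax outputs for patches cut from the questioned sample.
    """
    mean_probs = np.asarray(patch_probs).mean(axis=0)
    return candidate_ids[int(np.argmax(mean_probs))]
```

For example, patches sampled from a questioned page would be passed through the trained CNN, and the resulting probability matrix handed to `predict_author` together with the list of candidate writer IDs. Because patches ignore line direction, the same pipeline does not assume left-to-right text.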
Dependencies:
Python 3.6.x
TensorFlow
Keras
scikit-learn
matplotlib
glob (part of the Python standard library)