Handwritten Character Segmentation

Introduction

In the context of the Venice atlas project, automatic recognition of handwritten text is a crucial issue. And to achieve a good text recognition, a common approach involves the segmentation of the handwritten script into individual characters which can then be identified by the usage of a character recognition algorithm. Ultimately, the goal of this project is thus to achieve a robust segmentation of handwritten text into separate characters.

A binary classifier for text segmentation

The main idea is to build a binary classifier to solve the segmentation problem. One assumption is that the text has already been segmented into individual lines which is relatively easy to achieve. Then, at each point along the line, a set of features is extracted and we ask ourselves the question whether we should cut the line or not at a specific point. The binary classifier would answer that question and our task is to propose a set of relevant features which allow for a good text segmentation.

The crucial point is that character segmentation is easier if we have information about the trajectory of the writing. For example, it is possible to do character segmentation and recognition on modern gadgets like tablets because one can take into account information about the speed and the acceleration at each point in time when the person was writing the text. Of course the material we deal with is just a 2D image where temporal information is lost. It is just static test, also called offline writing as opposed to online writing for the tablet example. Studies and experiments have proven the difficulties to successfully segment an offline text. This is where the idea of recovering the temporal information about the handwritten text comes in. Indeed, there are ways to recover the drawing order from a static text, which could then be used as a feature when performing the segmentation. Moreover, the temporal information will possibly be useful also for character recognition. However, this goes beyond the scope of this project.

Connections with Graph Theory

Our first source of inspiration was a paper by Yoshiharu Kato and Makoto Yasuhara [1]. They propose a strategy that relies on a strong and rigorous graph theory. To keep things simple, we are now dealing with text written in a single stroke. By stroke we mean a trace of pen-tip movement which starts at pen-down and ends at pen-up. We read an interesting article explaining a method to recover the drawing order. In graph theory this corresponds to a semi-Eulerian graph. An example is shown on the right. Semi-Eulerian DHgraphs are well understood. Their main property is indeed the fact that the two endpoints of the stroke can be connected traversing all of the edges exactly once. However, such path is not unique and our goal is to identify the ‘right’ one among all possible alternatives. There are many ways to approach this problem. Considering all possible paths has been proven to be computationally too expensive as it often leads to combinatory explosions.

There is a simpler and more elegant solution. The main property of semi-Eulerian graphs is that all the vertices have degree 4. The only exception is for the endpoints, that have degree 1. By degree of a vertex we mean the number of his neighboring vertices. Hence our algorithm will take advantage of this property in the following way: start from an endpoint, follow the edges, and when encountering an intersection, out the 3 possible directions, choose the smoothest, which we suppose the be the middle one. One can see how this assumption is indeed very natural, by imagining the drawing of the number 8.

One important remark is that semi-Eulerian graphs do not cover all possible drawings. We are excluding double-traces lines, which add much complexity to the graphs.

Implementation

Another source of inspiration came from an article by Anuj Sharma [2]. In his paper he follows again the idea of following the smoothest among all possible trajectories, and provided us the MatLab code that would allow us to achieve this. However, this code is still an ongoing project and as such was still incomplete and with a few bugs and conceptual errors. So we still had to do some coding of our own to make this program function properly. Here we’ll briefly explain how it is implemented.

The main problem is that we are given an image and not a graph. In order to get something similar to a graph we first have to perform a thinning algorithm. A good performance of this algorithm is the foundation for a proper recovery of drawing order. In particular, we will later see that at times clusters are created close to intersections. These clusters can compromise the reliability of the whole program. There are many thinning algorithms out there. After some tests, we chose to work with one built in MatLab. Below one skeleton of two cursive letters.
D_all J_all

Now that we obtained a line of width one pixel, we can really start recovering the drawing order. First , we look for the two pixels with only one neighbor. We choose to start with the one located closer to the left margin. Then we proceed by following the pixels which have not be covered yet. We do so until we are in the situation where more than one direction can be taken. We remark that does not imply that we actually arrived at a vertex of degree 4. Indeed, when dealing with pixels, there is plenty of circumstances  where a pixel has more than 2 neighbors. This is why the thinning algorithm is so important. As we stated before, we wish to choose the smoothest path. This is done by comparing the angles of possible directions with the angle of the previous step. The process is iterated until reaching the last pixel.

Results

The good news is that the algorithm indeed works. Below one can see how the drawing order has been successfully recovered for the images we had shown before ( it is necessary to click on the image to properly see the animation. )

draw_D
draw_J

However, it is not as good as it seems. The program relies too much on the thinning algorithm. One paradoxical example is that, given the same image the code performs differently  depending to what size we rescale the image. This is because the intersections can be wrongly interpreted. Below one can see a clear example of this improper behavior.

C_all

We decided to perform an ultimate test on some actual text we wrote ourselves by hand. As one can see the result is partially positive: 3 intersections out of 4 have been interpreted as they should.

draw_file file2

Conclusions

We can declare ourselves satisfied with the results. Hopefully we came one step closer in solving the diffucult problem of handwritten text recognition. The task was challenging and we managed to recover the drawing order to a potentially wide range of handwritten images. Still, there is room for much improvement. More care should be given to the thinning algorithm, as we already mentioned. Moreover, one could extend the code to manage the cases of double-traced lines. Finally, with the aid of machine learning, one could train the program to improve its behavior based on previous experience with handwriting.

References

[1] Yoshiharu Kato and Makoto Yasuhara, Recovery of Drawing Order from Single-Stroke Handwriting Images, IEEE, 2000
[2] Anuj Sharma, Recovery of drawing order in handwritten digit images , IEEE, 2013