Handwritten Character Segmentation

Introduction

The purpose of this blog post is to briefly summarize the progress and intermediate results we obtained over the last few weeks.

Progress

As announced in our last blog post, we have started the implementation stage of our project.

Recovering the drawing order of a handwritten script is a difficult task that requires some preprocessing steps before solving the main problem. Since the drawing order recovery algorithm requires an image composed of lines with a width of one pixel, one of the most critical preprocessing steps in the algorithm is to get the skeleton (a thinner version) of our image.

Plenty of thinning algorithms already exist. Most of them are written in C++ and rely on the OpenCV library. In most cases, however, the resulting skeleton comes with undesirable artifacts, which is especially problematic in our case because we are dealing with handwritten scripts. That is why we spent the last few weeks looking for a thinning algorithm that produces as few artifacts as possible and that we can use in our handwritten segmentation project.
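For readers unfamiliar with thinning, here is a minimal sketch of one classic algorithm, Zhang-Suen, in pure Python. It only illustrates the general technique and is not the algorithm used in this project; the function name, the list-of-lists image format (1 = ink, 0 = background) and the requirement of a one-pixel background border are assumptions made for the example.

```python
def zhang_suen_thin(image):
    """Thin a binary image (rows of 0/1, 1 = ink) with Zhang-Suen.

    Assumes the outermost rows and columns are background; border
    pixels are never examined. Returns a new image, input untouched.
    """
    img = [row[:] for row in image]
    rows, cols = len(img), len(img[0])

    changed = True
    while changed:
        changed = False
        for step in (0, 1):  # the two sub-iterations of the algorithm
            to_clear = []
            for r in range(1, rows - 1):
                for c in range(1, cols - 1):
                    if img[r][c] != 1:
                        continue
                    # Neighbours P2..P9, clockwise from the pixel above.
                    n = [img[r - 1][c], img[r - 1][c + 1], img[r][c + 1],
                         img[r + 1][c + 1], img[r + 1][c], img[r + 1][c - 1],
                         img[r][c - 1], img[r - 1][c - 1]]
                    b = sum(n)  # number of ink neighbours
                    # Number of 0 -> 1 transitions around the pixel.
                    a = sum(1 for i in range(8)
                            if n[i] == 0 and n[(i + 1) % 8] == 1)
                    p2, _, p4, _, p6, _, p8, _ = n
                    if step == 0:
                        cond = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:
                        cond = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if 2 <= b <= 6 and a == 1 and cond:
                        to_clear.append((r, c))
            # Delete in parallel: decisions above used the old image.
            for r, c in to_clear:
                img[r][c] = 0
            if to_clear:
                changed = True
    return img
```

Calling `zhang_suen_thin(binary_image)` repeatedly peels boundary pixels until nothing more can be removed, so a thick stroke shrinks towards its centre line. Artifacts such as small spurs can still remain, which is the kind of problem described above.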

After researching and comparing the different algorithms, we managed to build a working implementation. Essentially, we combined these algorithms while trying to keep their good features, in order to get the most out of them. Of course, many attempts and tests were necessary to refine our algorithm.

Finally, we tested the resulting thinning algorithm on an image of ancient handwritten text relevant to our project, and the result looks fairly good. The images below show the process: first the image is converted to a binary black-and-white image, and then the thinning algorithm is applied.

word3.0

word3

skeleton3
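The binarization step mentioned above can be sketched as follows. This post does not say which thresholding method was used, so the example assumes a global threshold chosen with Otsu's method and dark ink on light paper; the function names and the list-of-lists greyscale format (integers from 0 to 255) are hypothetical.

```python
def otsu_threshold(gray):
    """Return the grey level that maximises the between-class variance."""
    hist = [0] * 256
    for row in gray:
        for value in row:
            hist[value] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg, sum_bg = 0, 0
    for t in range(256):
        w_bg += hist[t]            # pixels with value <= t
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue               # no pixels in the dark class yet
        w_fg = total - w_bg        # pixels with value > t
        if w_fg == 0:
            break                  # every pixel is already in one class
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        between_var = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, t
    return best_t


def binarize(gray):
    """Map dark pixels (assumed to be ink) to 1 and the rest to 0."""
    t = otsu_threshold(gray)
    return [[1 if value <= t else 0 for value in row] for row in gray]
```

The output of `binarize` uses 1 for ink and 0 for background, which is the usual input format for a thinning step. For scans with uneven lighting, a local (adaptive) threshold would probably work better than a single global one.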

As we can see, we eliminated the clusters and the undesired short lines that should not be there. Of course there is still room for improvement, but we feel satisfied with these results and ready to move on to the next step of our project.
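One simple way to remove isolated clusters is to drop every connected group of ink pixels below a minimum size. The sketch below illustrates that idea only and is not necessarily how the cleanup was done here; the function name, the `min_size` parameter and the use of 8-connectivity are assumptions. Short branches that are attached to a real stroke are not handled and would need a separate pruning step.

```python
from collections import deque


def remove_small_components(binary, min_size):
    """Erase 8-connected ink components with fewer than min_size pixels.

    `binary` is a list of rows of 0/1 (1 = ink). Returns a new image.
    """
    rows, cols = len(binary), len(binary[0])
    out = [row[:] for row in binary]
    seen = [[False] * cols for _ in range(rows)]

    for r in range(rows):
        for c in range(cols):
            if binary[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first search collects one connected component.
            component = []
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and binary[ny][nx] == 1
                                and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            # Components that are too small are treated as noise.
            if len(component) < min_size:
                for y, x in component:
                    out[y][x] = 0
    return out
```

A suitable `min_size` depends on the image resolution and has to be tuned by hand, since a value that is too large would also erase small genuine marks such as dots.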

Future Plans

The next step is to implement the algorithm that recovers the drawing order from these skeleton images. The paper we read [1] gave us the theoretical foundations to approach this problem. However, writing the code ourselves would be beyond our capabilities. This is why we contacted the authors of the paper, who state that they successfully implemented their algorithm in Java. We have not received a response yet, so if we do not hear from them by the end of Easter, we will ask our TA for new directions.


[1] Yoshiharu Kato and Makoto Yasuhara, Recovery of Drawing Order from Single-Stroke Handwriting Images, IEEE, 2000