Handwritten Text Recognition using Hidden Markov Models

The goal of this project is to try some of the most recent and promising machine learning algorithms for handwriting recognition and apply them to the datasets of the Venice Time Machine. The algorithms are based on Hidden Markov Models and are already implemented in a framework from the University of Aachen. After taking some time to understand the theory behind Hidden Markov Models, most of the work will consist of tuning the available parameters to get the most out of the algorithm on the dataset. If time remains after working with Hidden Markov Models, we would like to compare the results with other algorithms, such as those based on Recurrent Neural Networks.

The outcome of our work will therefore be the code used to run the algorithms and the results obtained by applying them to the dataset.

Methodology

Introduction: a machine learning project

As presented in the abstract above, the project is essentially focused on machine learning. This means that a machine learning methodology will have to be applied to it. Fortunately, we are quite familiar with it thanks to the courses in Machine Learning and Applied Machine Learning we have both followed at EPFL.

We plan to tackle the project in several steps. The first step will be to understand the method we want to apply and to read different papers about it. Once this is done, it is essential to get used to the dataset we are going to work with. Indeed, if we rush headfirst into the project, we might quickly get lost, because there is no geometrical or other practical representation of what we will be working with. It is therefore mandatory to have at least a good idea of the abstraction.

Only then will we really dive into the project itself, apply the methods to our datasets and get our hands on the code. We will not implement the methods from scratch. Instead, we will focus on applying the code developed by the University of Aachen (RWTH) (here) to our dataset.

We hope that we will have some time to try other methods on our dataset, to make comparisons and to determine which algorithm seems best suited to the problem.

1.      Learning the theory on machine learning with “Hidden Markov Models”

The machine learning method we are going to use is based on hidden Markov models, which we will call HMMs for short.

The difficulty here is twofold, and we will have to proceed in several steps. Not only will we need to master the basic theory of Markov models before applying it to machine learning, but we will also have to understand how the RWTH framework works before implementing our own setup.

This second part might well be more difficult than it looks, depending on the clarity of the code at our disposal.

This is why we think that two to four weeks will be necessary to complete this initial part of the project. It is really important that this part is done carefully, since it is the foundation of everything that will follow.
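To make the basic theory mentioned above a little more concrete, here is a minimal sketch of the forward algorithm, which computes the probability of an observation sequence under a discrete HMM. It is a toy illustration only: the function name, the two-state model and all probabilities are made up for this example and are not taken from the RWTH framework, which works with more elaborate models.

```python
def forward_likelihood(observations, start_prob, trans_prob, emit_prob):
    """Return P(observations) under a discrete HMM (forward algorithm).

    start_prob[i]    : probability of starting in state i
    trans_prob[i][j] : probability of moving from state i to state j
    emit_prob[i][k]  : probability of emitting symbol k in state i
    """
    if not observations:
        raise ValueError("observations must contain at least one symbol")
    n_states = len(start_prob)
    # alpha[i] = P(first t observations, state at time t = i)
    alpha = [start_prob[i] * emit_prob[i][observations[0]]
             for i in range(n_states)]
    for obs in observations[1:]:
        alpha = [
            emit_prob[j][obs]
            * sum(alpha[i] * trans_prob[i][j] for i in range(n_states))
            for j in range(n_states)
        ]
    return sum(alpha)


# Toy two-state, two-symbol model (all values are illustrative assumptions).
start_prob = [0.6, 0.4]
trans_prob = [[0.7, 0.3], [0.4, 0.6]]
emit_prob = [[0.9, 0.1], [0.2, 0.8]]

likelihood = forward_likelihood([0, 1, 0], start_prob, trans_prob, emit_prob)
```

In a recognition setting, one such model would typically be trained per character or word, and the model giving the highest likelihood to an input would be selected. A real implementation would also work in log space to avoid numerical underflow on long sequences.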

2.      Applying the HMM-based methods to our dataset

Once we have understood the method and the code provided by the University of Aachen, we will have to set it up so that it can be applied to our database of handwritten samples. We will have to use the format given to us and apply the HMM method to it.

The first part is to be able to handle the pictures in numerical form. A priori, we will not need to do segmentation of any kind and will only focus on learning the script. What we may have to do, however, is some kind of preprocessing. Indeed, it is necessary to make the representative features stand out before doing any kind of machine learning, since the algorithms usually rely on a numerical representation of these features to classify the samples.
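As an illustration of what such a numerical representation could look like, the sketch below slides a narrow window from left to right over a binarized text-line image and produces one small feature vector per position, which is the kind of sequence an HMM can consume. The two features used here (ink density and vertical centre of mass) and the function name are assumptions chosen for simplicity, not the features used by the RWTH framework.

```python
def sliding_window_features(image, width=3, step=1):
    """Turn a binarized text-line image into a sequence of feature vectors.

    image: list of rows, each a list of 0 (background) or 1 (ink) pixels.
    Returns one (ink_density, vertical_center) pair per window position,
    ordered from left to right.
    """
    height = len(image)
    n_cols = len(image[0]) if height else 0
    features = []
    for left in range(0, n_cols - width + 1, step):
        ink = 0
        weighted_rows = 0
        for row_index, row in enumerate(image):
            row_ink = sum(row[left:left + width])
            ink += row_ink
            weighted_rows += row_index * row_ink
        density = ink / (height * width)
        # Vertical centre of mass scaled to [0, 1]; 0.5 when undefined.
        if ink > 0 and height > 1:
            center = weighted_rows / (ink * (height - 1))
        else:
            center = 0.5
        features.append((density, center))
    return features


# Tiny 3x4 example image (illustrative only).
image = [
    [0, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 0],
]
feats = sliding_window_features(image, width=2, step=1)
```

On this toy image the vertical centre moves from 0.25 to 0.75 as the window slides right, reflecting the downward slant of the stroke.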

Once we are done with this first step, we will have to examine the numerical representation of our database in order to choose wisely the parameters we have to tune in the machine learning algorithm. Indeed, this is often the only feasible way to get an idea of the range in which to look for the parameters. This is vital, because machine learning often involves many dimensions (at least more than three) and human intuition fails when there are too many dimensions. Hence the necessity for numerical abstraction.
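One simple way to explore a parameter range systematically is an exhaustive grid search, sketched below. The parameter names (states_per_character, mixtures_per_state) and the toy error function are hypothetical placeholders; in practice the evaluation function would train and score a model on a validation set, and the actual tunable parameters are those exposed by the RWTH framework.

```python
from itertools import product


def grid_search(param_grid, evaluate):
    """Try every combination in param_grid; return (best_params, best_error).

    param_grid: dict mapping a parameter name to the list of values to try.
    evaluate:   callable taking a dict of parameters and returning a
                validation error (lower is better).
    """
    names = sorted(param_grid)
    best_params, best_error = None, float("inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        error = evaluate(params)
        if error < best_error:
            best_params, best_error = params, error
    return best_params, best_error


def toy_error(params):
    """Stand-in for a real train-and-validate run (illustrative only)."""
    return (abs(params["states_per_character"] - 6)
            + abs(params["mixtures_per_state"] - 8) / 10)


grid = {
    "states_per_character": [4, 6, 8],
    "mixtures_per_state": [4, 8, 16],
}
best_params, best_error = grid_search(grid, toy_error)
```

The cost grows with the product of the grid sizes, so looking at the data first to narrow each range keeps the search tractable.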

The time we will need for this task is hard to quantify because of all the unforeseen difficulties that could arise, but we think it will take another three to six weeks, which should in theory leave us some time.

3.      Comparison with other methods (Recurrent Neural Networks)

This part of the project is not yet well defined and will depend on the remaining time and/or the subject chosen by other groups.

If the corresponding subject ends up being taken by another group, we will be able to compare the different results (which does not require too much work).

If not, it might be interesting to test the RNN ourselves, depending on the time we still have. Part of the work will already have been done, since we will be familiar with the dataset and have some intuition about it.
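Whichever way the comparison is done, both methods need to be scored with the same metric. A common choice for handwriting recognition is the character error rate, based on the edit distance between the recognized text and the reference transcription. The sketch below is one possible implementation; the function names and the sample strings are illustrative and do not come from the project.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_char != hyp_char)
            current.append(min(previous[j] + 1,      # deletion
                               current[j - 1] + 1,   # insertion
                               substitution))
        previous = current
    return previous[-1]


def character_error_rate(references, hypotheses):
    """Total edit distance divided by total reference length."""
    total_chars = sum(len(ref) for ref in references)
    if total_chars == 0:
        raise ValueError("references must contain at least one character")
    total_errors = sum(edit_distance(ref, hyp)
                       for ref, hyp in zip(references, hypotheses))
    return total_errors / total_chars


# Illustrative strings only: one substituted character out of seven.
cer = character_error_rate(["venezia"], ["venexia"])
```

Computing this value on the same test lines for the HMM output and for the RNN output gives a directly comparable number for the two approaches.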

Conclusion and project plan

In conclusion, the final outcome of the project will depend to some extent on what happens with the RNN method. Still, we should be able to deliver at least the results of applying hidden Markov models to the dataset of handwritten script.

ProjectPlan