For the last part of the project, we divided the work into three parts. The first was to finish the segmentation of the register, which provides the data for the training and testing sets we need for classification. The second part is the actual digit recognition; so far, we have tested two main approaches for classification: support vector machines (SVM) and back-propagation neural networks (BPNN). Finally, the last part is the web service, which allows users to submit an image containing a single number and get back its value along with additional useful information.
At the time of the second progress report, we were stuck on the segmentation part of the project. The main issue was to accommodate the distortion present in the scanned images of the register. We finally found a way to adapt our segmentation algorithm to handle most of the pages correctly. Unfortunately, given our limited initial knowledge of computer vision, after several weeks of improving our algorithms we found we are unable to achieve a perfect rate of correctly segmented numbers in the register. Even so, this should not be a big issue: these extracted numbers are only used alongside the MNIST database to create our testing and training sets, so if we miss some of them in an unbiased way, this should not have a significant impact on the digit recognition phase. Moreover, we are using a framework that allows us to manually and efficiently segment numbers to recover the ones we miss if necessary.
After the extraction of the numbers from the register, we will need to manually label them if we want to use them to train our classifiers. To this end, a framework similar to the one used for manual extraction allows us to label the different numbers easily.
As mentioned in the last blog post, we will use the MNIST database as our main training dataset. We will use it as is (i.e., with no transformation or pre-processing, which could otherwise be used to generate even more training data). However, training with MNIST introduces a constraint on the size of the input digits: each extracted digit image must be resized to 28×28 pixels, and for images that are much smaller or larger this resizing may degrade the classifiers' performance.
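As a rough illustration of this resizing step, here is a minimal nearest-neighbour resize in Python. This is only a sketch: our actual pipeline would more likely use a library resampler (e.g. Pillow or OpenCV) with anti-aliasing, plus MNIST-style centring of the digit.

```python
import numpy as np

def resize_to_mnist(digit, size=28):
    """Nearest-neighbour resize of a 2-D grayscale digit image to size x size.

    A minimal sketch only; it does no anti-aliasing and no centring,
    both of which matter when matching the MNIST input format.
    """
    h, w = digit.shape
    rows = np.arange(size) * h // size  # source row for each output row
    cols = np.arange(size) * w // size  # source column for each output column
    return digit[rows][:, cols]

# Example: a 56x70 crop shrinks to the MNIST input size.
img = np.zeros((56, 70), dtype=np.uint8)
print(resize_to_mnist(img).shape)  # (28, 28)
```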
For the SVM part, we will use scikit-learn (a Python library dedicated to machine learning). At the time of this report we had not yet implemented this classifier.
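To give an idea of what this will look like, here is a sketch using scikit-learn's SVC on its small built-in 8×8 digits dataset. This is not our implementation: the real classifier would be trained on 28×28 MNIST vectors, and the kernel parameters shown here would need tuning.

```python
from sklearn import datasets, svm
from sklearn.model_selection import train_test_split

# Load scikit-learn's built-in digits dataset (8x8 images) as a stand-in
# for MNIST, and flatten each image into a feature vector.
digits = datasets.load_digits()
X = digits.images.reshape(len(digits.images), -1)
X_train, X_test, y_train, y_test = train_test_split(
    X, digits.target, test_size=0.25, random_state=0)

# RBF-kernel SVM; gamma is a guess here and would need tuning for MNIST.
clf = svm.SVC(gamma=0.001)
clf.fit(X_train, y_train)
print("accuracy: %.3f" % clf.score(X_test, y_test))
```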
For the neural network, after some research on the Internet, we decided to try one developed by Evert von Nieuwenburg, a PhD student from ETHZ. It is a back-propagation neural network that uses sigmoid neurons and a cross-entropy cost function. Back-propagation is a supervised machine learning algorithm consisting of two phases (output generation and weight update); it allows fast computation of the gradient of a cost function. Sigmoid neurons are neurons whose activation follows a sigmoid curve (instead of an all-or-nothing activation). The cross-entropy cost function averages the cross-entropy of each sample, which is a measure of similarity between the predicted and true label. For now, no optimization has been done on it. Still, it achieved a precision of 97.81% on unseen MNIST test data. While not perfect, this will serve as a basis for now; the performance is likely to drop when using data from the Venetian register. Improvement is possible, though: the book ‘Neural Networks and Deep Learning’ by Michael Nielsen introduces some ideas that could help us achieve precision above 98%.
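To make these two ingredients concrete, here is a small sketch of the sigmoid activation and the averaged cross-entropy cost. The notation is ours, not taken from the network's actual code; it only illustrates why cross-entropy penalizes confident wrong predictions heavily.

```python
import numpy as np

def sigmoid(z):
    """Sigmoid activation: a smooth S-shaped curve between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))

def cross_entropy(a, y):
    """Cross-entropy cost for predicted activations a and true labels y,
    averaged over the samples (rows)."""
    return -np.mean(np.sum(y * np.log(a) + (1 - y) * np.log(1 - a), axis=1))

# A confident correct prediction costs little; a confident wrong one costs a lot.
y = np.array([[1.0, 0.0]])
good = cross_entropy(np.array([[0.9, 0.1]]), y)
bad = cross_entropy(np.array([[0.1, 0.9]]), y)
print(good < bad)  # True
```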
The last part of the project is to implement a web service capable of recognizing handwritten numbers inside a given cropped image. The process is based on the research and the code that we wrote when identifying and recognizing the numbers of the Sommarioni, which makes it reusable for other projects similar to ours. One application could be, for instance, to extract numbers from the Napoleonic Cadaster, send them to the web service to be recognized, and reuse the output for other purposes.
The way the service works for now is the following: on a web page, the user can upload the cropped image containing the number to be recognized; once uploaded, the actual recognition is done on the server. The result is then available for the user to download as a JSON file containing the value and a confidence index.
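The exact schema of that JSON file is not fixed yet. As an illustration only (the field names and values below are our placeholders, not the service's final format), producing it in Python could look like:

```python
import json

# Hypothetical response payload: a recognized value and a confidence index.
# Both field names and the sample values are illustrative only.
result = {"value": 1438, "confidence": 0.97}
print(json.dumps(result))  # {"value": 1438, "confidence": 0.97}
```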
To achieve this goal we decided to develop the service as a Flask application. Flask is a Python micro-framework which is perfect for small web applications like the one we want to create. This last week we managed to set up the skeleton of the web service. At the moment the user can upload an image, a dummy program on the server performs some operations on it, and finally the result is displayed to the user. In the future we will need to turn this into a RESTful API and integrate the code that recognizes numbers.
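A minimal sketch of such a Flask endpoint might look as follows. The route name, the form field, and the dummy result are our assumptions, not the service's final API; the real recognition code would replace the placeholder values.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/recognize", methods=["POST"])
def recognize():
    """Accept an uploaded image and return a (dummy) recognition result."""
    if "image" not in request.files:
        return jsonify(error="no image uploaded"), 400
    request.files["image"].read()  # the classifier would process these bytes
    # Placeholder result; the integrated classifier would fill these in.
    return jsonify(value=0, confidence=0.0)

if __name__ == "__main__":
    app.run()
```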
There is still much work to be done, but it mainly consists of putting the pieces together. Thus, we should be ready for the final presentation, which is due next month.