Sunday, January 31, 2010

Convolutional neural network adventures

One of my past projects was a handwritten math equation editor.
For it I needed a handwriting recognition library that could recognize a wide range of mathematical symbols. I could not find a recognizer like this; the solution was to write my own. I quickly wrote a DTW algorithm (a Viterbi-style matcher) using a couple of samples of my own writing as templates. For the immediate purpose of a demonstration it was enough, but my interest was piqued and I spent a long time looking at various algorithms and their strengths and weaknesses.
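For the curious, the heart of such a matcher is tiny. A minimal sketch of the DTW recurrence in C, on 1-D feature sequences for brevity (real stroke matching uses 2-D point sequences and a smarter local cost):

    #include <math.h>
    #include <float.h>

    /* Dynamic time warping between two feature sequences.
       cost[i][j] = best alignment cost of a[0..i] with b[0..j]. */
    double dtw(const double *a, int n, const double *b, int m)
    {
        double cost[n][m]; /* VLA; fine for short stroke sequences */
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < m; j++) {
                double d = fabs(a[i] - b[j]);        /* local distance */
                double best = DBL_MAX;
                if (i == 0 && j == 0) best = 0.0;
                if (i > 0)            best = fmin(best, cost[i - 1][j]);
                if (j > 0)            best = fmin(best, cost[i][j - 1]);
                if (i > 0 && j > 0)   best = fmin(best, cost[i - 1][j - 1]);
                cost[i][j] = d + best;
            }
        }
        return cost[n - 1][m - 1];
    }

Classification is then just picking the template with the smallest distance to the input.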

After a while I decided to investigate convolutional neural networks (CNNs). They are very roughly inspired by the way animal vision works. The authority on the subject is Yann LeCun, who introduced them in the late '80s. Some years ago, Patrice Simard from Microsoft worked on some newer implementations. I could not find the actual code for any of their articles (a much too common situation in computer science). As luck would have it, there is a nice article on CodeProject by Mike O'Neill about his implementation of CNNs, and the best part is that in this case the code is available. The problem with O'Neill's code is that it is quite slow; it is not written for speed, and it makes heavy use of pointers to describe the CNN structure.
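To make the speed complaint concrete, here is the contrast, schematically (neither O'Neill's actual structures nor mine):

    /* Pointer-heavy: one heap object per connection, traversed
       indirectly. Flexible, but every access is a potential
       cache miss. */
    struct connection { float weight; struct neuron *from; };
    struct neuron     { struct connection *inputs; int n_inputs; };

    /* Flat: one contiguous block, indexed arithmetically as
       weights[out * n_in + in]. The compiler can vectorize it
       and the prefetcher can follow it. */
    struct layer { int n_in, n_out; float *weights; /* n_out * n_in */ };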
I quickly wrote a prototype in C# that was quite simple and fast. Alas, it was not fast enough for my impatience; to properly train a CNN, at least 10-20 epochs are needed, and my prototype was taking 5 minutes per epoch. That was a lot, because I needed to run many experiments to find and correct bugs.
That was when I discovered how difficult it is to debug machine learning programs. If a normal program has a bug, its output is completely fucked up and you know that something is wrong. If a machine learning program has an error, the algorithm will learn around it and still spit out a reasonable answer. The difference is that the capability of the program is reduced and the error rate is higher: the algorithm cannot learn as well as it should.
Fortunately, there are a couple of techniques that can be applied to test the correctness of a NN implementation. Unfortunately, I was too lazy to implement them at the time.
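For the record, the best known of these is numerical gradient checking: compare each analytic backprop gradient against a central finite-difference estimate, and if backprop has a bug, the two diverge immediately. A minimal sketch, with loss() and analytic_grad() as hypothetical stand-ins for the network's forward and backward passes:

    #include <math.h>
    #include <stdio.h>

    extern double loss(const double *w, int n);                 /* forward pass  */
    extern void   analytic_grad(const double *w, int n, double *g); /* backprop */

    int check_gradients(double *w, int n)
    {
        const double eps = 1e-5, tol = 1e-4;
        double grad[n];
        analytic_grad(w, n, grad);
        for (int i = 0; i < n; i++) {
            double saved = w[i];
            w[i] = saved + eps; double lp = loss(w, n);
            w[i] = saved - eps; double lm = loss(w, n);
            w[i] = saved;
            /* central finite difference approximates dL/dw[i] */
            double numeric = (lp - lm) / (2.0 * eps);
            if (fabs(numeric - grad[i]) >
                tol * (fabs(numeric) + fabs(grad[i]) + 1e-8)) {
                printf("gradient mismatch at %d: %g vs %g\n",
                       i, grad[i], numeric);
                return 0;
            }
        }
        return 1;
    }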

After thinking a little about how to optimize the initial C# version, I proceeded to rewrite the library in C, with the intention of adding some SSE fragments later (which I eventually did).
After playing with it a little (quite a pleasant activity), fixing some small bugs, taking some measurements, and optimizing here and there (cache behavior was a big problem), I was satisfied with the results. By then I no longer had enough free time because of a new job, so I stashed the project for later.
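To give an idea of the kind of SSE fragment involved, here is a sketch of the inner dot product that dominates a fully connected layer (illustrative, not the actual fastCNN code):

    #include <xmmintrin.h>

    /* Dot product of weights and inputs, four floats at a time.
       Assumes n is a multiple of 4; the tail is handled elsewhere. */
    float dot_sse(const float *w, const float *x, int n)
    {
        __m128 acc = _mm_setzero_ps();
        for (int i = 0; i < n; i += 4) {
            __m128 a = _mm_loadu_ps(w + i);
            __m128 b = _mm_loadu_ps(x + i);
            acc = _mm_add_ps(acc, _mm_mul_ps(a, b));
        }
        /* horizontal sum of the four partial sums */
        float tmp[4];
        _mm_storeu_ps(tmp, acc);
        return tmp[0] + tmp[1] + tmp[2] + tmp[3];
    }

Much of the win comes not from the intrinsics themselves but from laying the data out contiguously so these loads stream through the cache.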

Time passed, and I started thinking about putting some of my projects on the net; the CNN project looked like a good candidate.
Once again I corrected some bugs, then cleaned up the code and the trainer interface, wrote some documentation, and added the project to CodePlex. I called the project fastCNN.
For interested parties, the whole project (source + binaries + documentation) is at http://fastcnn.codeplex.com/.
During development, I used the MNIST dataset (handwritten digits) to evaluate the network's performance and guess at the presence or absence of bugs.
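The MNIST files use a simple big-endian IDX format, so a loader fits in a few lines of C. A sketch (load_mnist_images is my name, and error handling is mostly omitted):

    #include <stdio.h>
    #include <stdlib.h>
    #include <stdint.h>

    /* Read a big-endian 32-bit integer, as used by the IDX format. */
    static uint32_t read_be32(FILE *f)
    {
        unsigned char b[4];
        if (fread(b, 1, 4, f) != 4) return 0;
        return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16)
             | ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
    }

    /* Load MNIST images (magic 2051): returns count*rows*cols raw
       bytes, one byte per pixel. Caller frees. */
    unsigned char *load_mnist_images(const char *path, uint32_t *count,
                                     uint32_t *rows, uint32_t *cols)
    {
        FILE *f = fopen(path, "rb");
        if (!f) return NULL;
        uint32_t magic = read_be32(f);   /* expect 2051 for images */
        *count = read_be32(f);
        *rows  = read_be32(f);
        *cols  = read_be32(f);
        unsigned char *pixels = NULL;
        if (magic == 2051) {
            size_t total = (size_t)*count * *rows * *cols;
            pixels = malloc(total);
            if (pixels && fread(pixels, 1, total, f) != total) {
                free(pixels);
                pixels = NULL;
            }
        }
        fclose(f);
        return pixels;
    }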
For my other projects I started gathering data for handwritten mathematical symbols.
Another application that I thought about is OCR. I was planning to work on an OCR application for pictures of text taken with digital cameras, but never found the time and energy. CNNs are a great match for this kind of application.

Yann LeCun publications
