Gábor Melis' () blog - December 2009

Older posts

Introduction to MGL (part 3)

In the previous part we went through a trivial example of a backprop network. I said before that the main focus is on Boltzmann Machines so let's kill the suspense here and now by cutting straight to the heart of the matter.

Cottrell's Science article provides a clear and easy to follow description of the spiral problem that we are going to implement. The executive summary is that we want to train an auto-encoder: a network that reproduces its input as output with a small encoding layer somewhere in between. By forcing the information through the bottleneck of the encoding layer the network should pick up a low dimensional code that represents the input, thus performing dimensionality reduction.

The function under consideration is f(x) = [x, sin(x), cos(x)]. It is suprisingly difficult to learn the mapping from x to f(x). A network architecture that is able to represent this transformation has 3 inputs, 10 neurons in the next layer, 1 neuron in the encoding layer, 10 neurons again in the reconstruction part and 3 in the output layer. However, randomly initialized backpropagation fails at learning this; a better solution is to first learn a Deep Belief Network, "unroll" it to a backprop network and use backprop to fine tune the weights.

Read more

Introduction to MGL (part 2)

Having been motivated, today we are going to walk through a small example and touch on the main concepts related to learning within this library.

At the top of food chain is the generic function TRAIN:

 (defgeneric train (sampler trainer learner)
   (:documentation "Train LEARNER with TRAINER on the examples from
 SAMPLER. Before that TRAINER is initialized for LEARNER with
 INITIALIZE-TRAINER. Training continues until SAMPLER is finished."))

Read more

Introduction to MGL (part 1)

This is going to be the start of an introduction series on the MGL Common Lisp machine learning library. MGL focuses mainly on Boltzmann Machines (BMs). In fact, the few seemingly unrelated things it currently offers (gradient descent, conjugate gradient, backprop) are directly needed to implement the learning and fine tuning methods for different kinds of BMs. But before venturing too far into specifics, here is a quick glimpse at the bigger picture and the motivations.

Most of the current learning algorithms are based on shallow architectures: they are fundamentally incapable of basing higher level concepts on other, learned concepts. The most prominent example of succesful shallow learners is Support Vector Machines, for which there is a simple CL wrapper around libsvm, but that's a story for another day.

On the other hand, deep learners are theorethically capable of building abstraction on top of abstraction, the main hurdle in front of their acceptance being that they don't exist or - more precisely - we don't know how to train them.

Read more