<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?>
<feed xmlns="http://www.w3.org/2005/Atom">

  <title type="text">Gábor Melis' blog</title>
  <subtitle type="text">[No_subtitle]</subtitle>
  <updated>2010-03-01T22:38:50+01:00</updated>
  <id>http://quotenil.com/</id>
  <link rel="alternate" type="text/html" hreflang="en" href="http://quotenil.com/" />
  <link rel="self" type="application/atom+xml" href="http://quotenil.com/atom.xml" />
  <rights>Copyright (c) 2010 Gábor Melis</rights>
  <generator uri="http://www.cognition.ens.fr/~guerry/u/blorg.el" version="0.75e">
    Done with blorg 0.75e -- org-mode 6.21b and GNU Emacs 23.1.1
  </generator>

<entry>
  <title>Google AI Challenge 2010 Results</title>
  <link rel="alternate" type="text/html" href="http://quotenil.com/Google-AI-Challange-2010-Results.html"/>
  <id>http://quotenil.com/Google-AI-Challange-2010-Results.html</id>
  <updated>2010-03-01T22:38:50+01:00</updated>
  <published>2010-03-01T00:00:00+01:00</published>
  <author>
    <name>Gábor Melis</name>
    <uri>http://quotenil.com</uri>
    <email>mega@retes.hu</email>
  </author>
  <content type="xhtml" xml:lang="en" xml:base="http://quotenil.com/">
    <div xmlns="http://www.w3.org/1999/xhtml">

<p>
It has been a fun ride, and the official results are now <a href="http://csclub.uwaterloo.ca/contest/rankings.php">available</a>.
In the end, 11th out of 700 is not too bad and it's the highest
ranking non-C++ entry by some margin.
</p>


<p>
I entered the contest a bit late with a rather specific approach in
mind: <a href="http://senseis.xmp.net/?UCT">UCT</a>, an algorithm from the Monte Carlo tree search family. It
has been rather successful in Go (and in Hex too, taking the crown
from <a href="http://six.retes.hu/">Six</a>). So with UCT in mind, to serve as a baseline I implemented a
quick <a href="http://en.wikipedia.org/wiki/Minimax">minimax</a> with a simple territory based evaluation function ...
that everyone else in the competition seems to have invented
independently. Trouble was looming because it was doing too well:
looking ahead only one move (not even considering moves of the
opponent) it played a very nice positional game. That was the first
sign that constructing a good evaluation function may not be as hard
for Tron as it is for Go.
</p>
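
<p>
The post does not spell out the evaluation function. One common
reading of "territory based" is to count the squares each player can
reach first. The sketch below assumes that reading; the board
representation (a 2D array whose true elements are blocked), the
<code>(row . column)</code> positions and both function names are hypothetical and
not taken from the contest entry.
</p>

<p>
<pre>
 (defun flood-distances (walls start)
   "Breadth-first distances from START over the free squares of WALLS.
 Squares that cannot be reached hold NIL."
   (let ((dist (make-array (array-dimensions walls) :initial-element nil))
         (queue (list start)))
     (setf (aref dist (car start) (cdr start)) 0)
     (loop while queue
           do (let* ((pos (pop queue))
                     (d (aref dist (car pos) (cdr pos))))
                (loop for (dr . dc) in '((-1 . 0) (1 . 0) (0 . -1) (0 . 1))
                      for r = (+ (car pos) dr)
                      for c = (+ (cdr pos) dc)
                      do (when (and (array-in-bounds-p walls r c)
                                    (not (aref walls r c))
                                    (null (aref dist r c)))
                           (setf (aref dist r c) (1+ d))
                           ;; Append at the tail to keep FIFO order.
                           (setf queue (append queue (list (cons r c))))))))
     dist))
 
 (defun territory-score (walls me opponent)
   "Squares strictly closer to ME minus squares strictly closer to
 OPPONENT. Squares at equal distance count for neither side."
   (let ((mine (flood-distances walls me))
         (theirs (flood-distances walls opponent))
         (score 0))
     (dotimes (i (array-total-size walls) score)
       (let ((a (row-major-aref mine i))
             (b (row-major-aref theirs i)))
         (cond ((and a (or (null b) (> b a))) (incf score))
               ((and b (or (null a) (> a b))) (decf score)))))))
</pre>
</p>

<p>
A one-move lookahead would then try each legal move, call something
like <code>TERRITORY-SCORE</code> on the resulting position and keep the move with
the highest score.
</p>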


<p>
But with everyone else doing minimax, the plan was to keep silent and
Monte Carlo to victory. As with most plans, it didn't quite work out.
First, to my dismay, some contestants were attempting to do the same
and kept advertising it on #googleai; second, it turned out that UCT is
not suited to the game of Tron. A totally random default policy kept
cornering itself in a big area faster than another player could hit
the wall at the end of a long dead end. That was worrisome, but
fixable. After days of experimentation I finally gave up on it,
deciding that Tron is simply too tactical - or not fuzzy enough, if
you prefer - for MC to work really well.
</p>


<p>
Of course, it can be that the kind of default policies I tried were
biased (a sure thing), misguided and suboptimal. But then again, I was
not alone and watched the UCT based players struggle badly. In the
final standings the highest ranking one is jmcarthur in position 105.
One of them even implemented a number of different default policies
and switched between them randomly with little apparent success. Which
makes me think that including a virtual strategy selection move at
some points in the UCT search tree should be interesting, but I
digress.
</p>


<p>
So I went back to minimax, implemented <a href="http://en.wikipedia.org/wiki/Alpha-beta_pruning">Alpha-beta&nbsp;pruning</a> with
principal variation, and <a href="http://en.wikipedia.org/wiki/Iterative_deepening">iterative&nbsp;deepening</a>. It seemed to do
really well on the then-current maps, whose size was severely reduced
to 15x15 to control the load on the servers. Then I had an idea to
explore how the parities of squares in an area affect the longest
possible path, which was quickly pointed out to me over lunch by a friend.
And those pesky competitors had also found and advertised it in the
contest forum. Bah.
</p>
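
<p>
One way to read the parity idea: color the board like a checkerboard,
and every move flips the color of the square you stand on, so a path
has to alternate colors. That caps the length of any path inside an
area by the counts of the two colors. The sketch below is only an
illustration of that bound; the function names and the
<code>(row . column)</code> representation are hypothetical and this is not
necessarily what the bot did.
</p>

<p>
<pre>
 (defun count-colors (squares first-square)
   "Return two values: how many SQUARES share the checkerboard color
 of FIRST-SQUARE, and how many have the other color."
   (let ((parity (mod (+ (car first-square) (cdr first-square)) 2))
         (same 0))
     (dolist (square squares)
       (when (= parity (mod (+ (car square) (cdr square)) 2))
         (incf same)))
     (values same (- (length squares) same))))
 
 (defun parity-path-bound (n-first-color n-other-color)
   "Upper bound on the number of squares an alternating path can visit
 when its first square has the color counted by N-FIRST-COLOR."
   (if (> n-first-color n-other-color)
       (+ (* 2 n-other-color) 1)
       (* 2 n-first-color)))
 
 ;; A T-shaped area of four squares entered at the junction (0 . 1):
 ;; the junction is the only square of its color, so at most two
 ;; squares can be visited although four are free.
 (multiple-value-call #'parity-path-bound
   (count-colors '((0 . 0) (0 . 1) (0 . 2) (1 . 1)) '(0 . 1)))
</pre>
</p>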


<p>
There were only two days left at this point and I had to pull an
all-nighter to finally implement a graph partitioning idea of mine that
unsurprisingly someone had described pretty closely in the forum. At
that point, I finally had the tool to improve the evaluation function
but neither much time nor energy remained and I settled for using it
only in the end game where the players are separated.
</p>


<p>
The code itself is as ugly as exploratory code can be, but in the
coming days I'll factor the UCT and the Alpha-beta code out.
</p>
    </div>
  </content>
</entry>


<entry>
  <title>Google AI Challenge 2010</title>
  <link rel="alternate" type="text/html" href="http://quotenil.com/Google-AI-Challange-2010.html"/>
  <id>http://quotenil.com/Google-AI-Challange-2010.html</id>
  <updated>2010-03-01T22:38:50+01:00</updated>
  <published>2010-02-11T00:00:00+01:00</published>
  <author>
    <name>Gábor Melis</name>
    <uri>http://quotenil.com</uri>
    <email>mega@retes.hu</email>
  </author>
  <content type="xhtml" xml:lang="en" xml:base="http://quotenil.com/">
    <div xmlns="http://www.w3.org/1999/xhtml">

<p>
Tron is a fun little game of boxing out the opponent and avoiding
crashing into a wall first. The rules are simple so the barrier to
entry into <a href="http://csclub.uwaterloo.ca/contest/index.php">this contest</a> is low. Thanks to <a href="http://www.aerique.net/">aerique</a>, who made the Common
Lisp starter pack, it took as little as a few hours to get a very bare
bones algorithm going. It's doing surprisingly well: it is number 23
on the <a href="http://csclub.uwaterloo.ca/contest/rankings.php">leaderboard</a> at the moment with 43 wins, 2 losses and 9 draws.
</p>
    </div>
  </content>
</entry>


<entry>
  <title>Upgrade Woes 2</title>
  <link rel="alternate" type="text/html" href="http://quotenil.com/Upgrade-Woes-2.html"/>
  <id>http://quotenil.com/Upgrade-Woes-2.html</id>
  <updated>2010-03-01T22:38:50+01:00</updated>
  <published>2010-02-08T00:00:00+01:00</published>
  <author>
    <name>Gábor Melis</name>
    <uri>http://quotenil.com</uri>
    <email>mega@retes.hu</email>
  </author>
  <content type="xhtml" xml:lang="en" xml:base="http://quotenil.com/">
    <div xmlns="http://www.w3.org/1999/xhtml">

<p>
Debian Squeeze finally got Xorg 7.5 instead of the old and dusty 7.4.
The upgrade was as smooth as ever: <a href="http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=568872">DPI&nbsp;is&nbsp;off</a>, keyboard
repeat for the Caps Lock key
<a href="http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=568868">does&nbsp;not&nbsp;survive&nbsp;suspend/resume</a> and the trackpoint
<a href="http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=568873">stopped&nbsp;working</a>. Synaptics click by tapping went away before the
upgrade so that doesn't count.
</p>
    </div>
  </content>
</entry>


<entry>
  <title>Micmac Initial Release</title>
  <link rel="alternate" type="text/html" href="http://quotenil.com/Micmac-Initial-Release.html"/>
  <id>http://quotenil.com/Micmac-Initial-Release.html</id>
  <updated>2010-03-01T22:38:50+01:00</updated>
  <published>2010-02-06T00:00:00+01:00</published>
  <author>
    <name>Gábor Melis</name>
    <uri>http://quotenil.com</uri>
    <email>mega@retes.hu</email>
  </author>
  <content type="xhtml" xml:lang="en" xml:base="http://quotenil.com/">
    <div xmlns="http://www.w3.org/1999/xhtml">
  From a failed experiment today I salvaged <a href="http://cliki.net/micmac">Micmac</a>, a statistical
  library wannabe, that for now only has Metropolis-Hastings MCMC and
  Metropolis Coupled MCMC implemented. The code doesn't weigh much but
  I think it gets the API right. In other news <a href="http://cliki.net/MGL">MGL</a> v0.0.6 was released.
    </div>
  </content>
</entry>


<entry>
  <title>Deep Boltzmann Machine on MNIST</title>
  <link rel="alternate" type="text/html" href="http://quotenil.com/Deep-Boltzmann-Machine-on-MNIST.html"/>
  <id>http://quotenil.com/Deep-Boltzmann-Machine-on-MNIST.html</id>
  <updated>2010-03-01T22:38:50+01:00</updated>
  <published>2010-01-18T00:00:00+01:00</published>
  <author>
    <name>Gábor Melis</name>
    <uri>http://quotenil.com</uri>
    <email>mega@retes.hu</email>
  </author>
  <content type="xhtml" xml:lang="en" xml:base="http://quotenil.com/">
    <div xmlns="http://www.w3.org/1999/xhtml">

<p>
Let me interrupt the flow of the <a href="http://cliki.net/MGL">MGL</a> introduction series with a short
report on what I learnt playing with <a href="http://www.cs.toronto.edu/~hinton/absps/dbm.pdf">Deep Boltzmann Machines</a>. First,
lots of thanks to Ruslan Salakhutdinov, then at <a href="http://www.cs.toronto.edu/~rsalakhu/">University of Toronto</a>,
now at <a href="http://web.mit.edu/~rsalakhu/www/">MIT</a>, for making the Matlab source <a href="http://web.mit.edu/~rsalakhu/www/DBM.html">code</a> for the <a href="http://yann.lecun.com/exdb/mnist/">MNIST</a> digit
classification problem available.
</p>


<p>
The linked <a href="http://www.cs.toronto.edu/~hinton/absps/dbm.pdf">paper</a> claims a record of 99.05% in classification accuracy
on the permutation invariant task (no prior knowledge of geometry). A
previous approach trained a <a href="http://www.scholarpedia.org/article/Deep_belief_networks">DBN</a> in an unsupervised manner and fine
tuned it with backpropagation. Now there is one more step: turning the
DBN into a DBM (Deep Boltzmann Machine) and tuning it further before
handing the baton over to backprop. While in a DBN the constituent
RBMs are trained one by one, the DBM is trained as a whole which, in
theory, allows it to reconcile bottom-up and top-down signals, i.e.
what you see and what you think.
</p>

<img alt="mnist-2-dbm.png" src="images/mnist-2-dbm.png"/>


<p>
In the diagram above, as before, dark gray boxes are constants (to
provide the connected chunks with biases), inputs are colored mid gray
while hidden features are light gray. <code>INPUTS</code> is where the 28x28
pixel image is clamped and <code>LABEL</code> is a softmax chunk for the 10 digit
classes.
</p>


<p>
In the Matlab code, there are a number of prominent features that may
or may not be important to this result:
</p>

<ul>
<li>The second RBM gets the correct label as input which
  conveniently allows tracking classification accuracy during its
  training, but also - more importantly - forces the top-level
  features to be somewhat geared towards reconstruction of labels and
  thus classification.

</li>
<li>A sparsity term is added to the gradient. Sparse representations are
  often better for classification.</li>
</ul>


<p>
Focusing only on what makes DBM learning tick, I tried a few variants
of the basic approach. All of them start with the same DBN whose RBMs
are trained for 100 epochs each:
</p>

<img alt="mnist-2-dbn-training.png" src="images/mnist-2-dbn-training.png"/>


<p>
DBN training finishes with around 97.77%, averaging 97.9% in the last
10 epochs.
</p>


<p>
On to the DBM. As the baseline, the DBM was not trained at all and the
BPN did not get the marginals of the approximate posterior as inputs
as prescribed in the paper, only the normal input. It's as if the DBN
were unrolled into a BPN directly. Surprisingly, this baseline is
already at 99.00% at the end of BPN training (all reported accuracies
are averages from the last 10 epochs of training).
</p>


<p>
The second variant performs DBM training but without any sparsity term
and gets 99.07%. The third is using a sparsity penalty ("normal
sparsity" in the diagram) for units in opposing layers being on at
the same time and nets 99.08%. The fourth is just a translation of the
sparsity penalty from the Matlab code. This one is named "cheating
sparsity" because it - perhaps in an effort to reduce variance of the
gradient - changes weights according to the average activation levels
of units connected by them. Anyway, this last one reaches 99.09%.
</p>

<img alt="mnist-2-dbm-training.png" src="images/mnist-2-dbm-training.png"/>

<img alt="mnist-2-bpn-training.png" src="images/mnist-2-bpn-training.png"/>


<p>
To reduce <a href="http://en.wikipedia.org/wiki/Publication_bias">publication bias</a> a bit, let me mention some experiments that
were found to have no effect:
</p>

<ul>
<li>In an effort to see whether DBM training is held back by high
  variance of the gradient estimates a batch size of 1000 (instead of
  100) was tested for a hundred epochs after the usual 500. There was
  no improvement.

</li>
<li>In the BPN, label weights and biases were initialized from the DBM.
  This initial advantage diminishes gradually and by the end of
  training there is nothing (+0.01%) between the initialized and
  uninitialized variants. Nevertheless, all results and diagrams are
  from runs with label weights initialized.

</li>
<li>The Matlab code goes out of its way to compute negative phase
  statistics from the <span style="text-decoration:underline;">expectations</span> of the units in <code>F1</code> and <code>F2</code>,
  supposedly to help with variance of estimates, and this turned out to
  be very important: with the same calculation based on the sampled
  values DBM classification deteriorated. Using the expectations for
  chunks <code>INPUTS</code> and <code>LABEL</code> did not help, though.</li>
</ul>


<p>
What I take home from these experiments is that from the considerable
edge of DBM over DBN training only a small fraction remains by the end
of BPN training and that the additional sparsity constraint accounts
for very little in this setup.
</p>
    </div>
  </content>
</entry>


<entry>
  <title>Introduction to MGL (part 3)</title>
  <link rel="alternate" type="text/html" href="http://quotenil.com/Introduction-to-MGL-(part-3).html"/>
  <id>http://quotenil.com/Introduction-to-MGL-(part-3).html</id>
  <updated>2010-03-01T22:38:50+01:00</updated>
  <published>2009-12-29T00:00:00+01:00</published>
  <author>
    <name>Gábor Melis</name>
    <uri>http://quotenil.com</uri>
    <email>mega@retes.hu</email>
  </author>
  <content type="xhtml" xml:lang="en" xml:base="http://quotenil.com/">
    <div xmlns="http://www.w3.org/1999/xhtml">

<p>
In the <a href="http://quotenil.com/Introduction-to-MGL-(part-2).html">previous part</a> we went through a trivial example of a backprop
network. I said before that the main focus is on Boltzmann Machines so
let's kill the suspense here and now by cutting straight to the heart
of the matter.
</p>


<p>
<a href="http://cseweb.ucsd.edu/users/gary/pubs/cottrell-science-2006.pdf">Cottrell's&nbsp;Science&nbsp;article</a> provides a clear and easy to
follow description of the spiral problem that we are going to
implement. The executive summary is that we want to train an
auto-encoder: a network that reproduces its input as output with a
small encoding layer somewhere in between. By forcing the information
through the bottleneck of the encoding layer the network should pick
up a low dimensional code that represents the input, thus performing
dimensionality reduction.
</p>


<p>
The function under consideration is <code>f(x) = [x, sin(x), cos(x)]</code>. It
is surprisingly difficult to learn the mapping from <code>x</code> to <code>f(x)</code>. A
network architecture that is able to represent this transformation has
3 inputs, 10 neurons in the next layer, 1 neuron in the encoding
layer, 10 neurons again in the reconstruction part and 3 in the output
layer. However, randomly initialized backpropagation fails at learning
this; a better solution is to first learn a Deep Belief Network,
"unroll" it to a backprop network and use backprop to fine tune the
weights.
</p>


<p>
A <a href="http://www.scholarpedia.org/article/Boltzmann_machine#Learning_deep_networks_by_composing_restricted_Boltzmann_machines">Deep&nbsp;Belief&nbsp;Network</a> is just a stack of
<a href="http://www.scholarpedia.org/article/Boltzmann_machine#Restricted_Boltzmann_machines">Restricted&nbsp;Boltzmann&nbsp;Machines</a>. An RBM is a BM restricted to
be a two layer network with no intralayer connections. The "lower"
layer is called visible and the "higher" layer is called hidden layer,
because from the point of view of a single RBM it is the visible layer
that's connected - maybe indirectly - to external stimuli. In the
upward pass of a DBN, where the low level representations are
subsequently transformed into higher level ones by the constituent
RBMs, the values of the hidden units are clamped onto the visible
units of the next RBM. In other words, an RBM shares its visible and
hidden layers with the hidden and visible layers of the RBM below and
above, respectively.
</p>


<p>
Let's start with a few utility functions:
</p>


<p>
<pre>
 (defun sample-spiral ()
   (random (flt (* 4 pi))))
 
 (defun make-sampler (n)
   (make-instance 'counting-function-sampler
                  :max-n-samples n
                  :sampler #'sample-spiral))
 
 (defun clamp-array (x array start)
   (setf (aref array (+ start 0)) x
         (aref array (+ start 1)) (sin x)
         (aref array (+ start 2)) (cos x)))
 
 (defun clamp-striped-nodes (samples striped)
   (let ((nodes (storage (nodes striped))))
     (loop for sample in samples
           for stripe upfrom 0
           do (with-stripes ((stripe striped start))
                (clamp-array sample nodes start)))))
</pre>
</p>



<p>
Subclass <code>RBM</code> and define <code>SET-INPUT</code> using the above utilities:
 
<pre>
 (defclass spiral-rbm (rbm) ())
 
 (defmethod mgl-train:set-input (samples (rbm spiral-rbm))
   (let ((chunk (find 'inputs (visible-chunks rbm) :key #'name)))
     (when chunk
       (clamp-striped-nodes samples chunk))))
</pre>
</p>



<p>
Define the DBN as a stack of two RBMs: one between the 3 inputs and 10
hidden features, the other between the 10 hidden features and the
encoding layer that unsurprisingly has a single neuron:
 
<pre>
 (defclass spiral-dbn (dbn)
   ()
   (:default-initargs
    :layers (list (list (make-instance 'constant-chunk :name 'c0)
                        (make-instance 'gaussian-chunk :name 'inputs :size 3))
                  (list (make-instance 'constant-chunk :name 'c1)
                        (make-instance 'sigmoid-chunk :name 'f1 :size 10))
                  (list (make-instance 'constant-chunk :name 'c2)
                        (make-instance 'gaussian-chunk :name 'f2 :size 1)))
     :rbm-class 'spiral-rbm))
</pre>
</p>



<p>
Note that by default, each pair of visible and hidden chunks is
connected by a <code>FULL-CLOUD</code>, the simplest kind of connection.
<code>INPUTS</code> via the cloud between <code>INPUTS</code> and <code>F1</code> contributes to the
activation of <code>F1</code>: in the upward pass the values found in <code>INPUTS</code>
are simply multiplied by a matrix of weights and the result is added
to the activation of <code>F1</code>. Downward pass is similar.
</p>
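
<p>
As a rough illustration of that upward pass, here is a minimal sketch
in plain Common Lisp. It does not use MGL: the function name and the
weight layout (one row per <code>F1</code> node, one column per input node) are
assumptions made for the example.
</p>

<p>
<pre>
 (defun add-full-cloud-activation (weights inputs activations)
   "Add the matrix-vector product of WEIGHTS and INPUTS to ACTIVATIONS
 in place and return ACTIVATIONS."
   (dotimes (j (array-dimension weights 0) activations)
     (dotimes (i (array-dimension weights 1))
       (incf (aref activations j)
             (* (aref weights j i) (aref inputs i))))))
 
 ;; Two hidden nodes fed by three inputs that are all 1.
 (add-full-cloud-activation #2A((1 2 3) (4 5 6))
                            #(1 1 1)
                            (vector 0 0))
</pre>
</p>

<p>
Because the product is added rather than assigned, several clouds
(for example the one from the constant chunk that supplies the bias)
can contribute to the same chunk one after the other.
</p>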


<p>
Once the activations are calculated according to what the clouds
prescribe, chunks take over control. Each chunk consists of a number
of nodes and defines a probability distribution over them based on the
activations. For instance, <code>SIGMOID-CHUNK</code> is a binary chunk: each
node can take the value of 0 or 1 and the probability of 1 is <code>1 /
(1 + e^(-x))</code> where <code>X</code> is the activation of the node.
</p>
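
<p>
Spelled out as code, the distribution of a single binary node could
look like the sketch below. The function names are made up for
illustration and are not part of MGL.
</p>

<p>
<pre>
 (defun sigmoid (x)
   "Probability that a binary node with activation X takes the value 1."
   (/ 1 (+ 1 (exp (- x)))))
 
 (defun sample-binary-node (activation)
   "Return 1 with probability (SIGMOID ACTIVATION), 0 otherwise."
   (if (> (sigmoid activation) (random 1.0)) 1 0))
</pre>
</p>

<p>
With an activation of 0 the node is on half of the time; large
positive activations push it towards always being on.
</p>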


<p>
Nodes in a <code>GAUSSIAN-CHUNK</code> are normally distributed with means equal
to their activations and unit variance. In <code>SPIRAL-DBN</code> above the
<code>INPUTS</code> and the final code, <code>F2</code>, are gaussian.
</p>


<p>
Let's check out how it looks:
</p>


<p>
<pre>
 (let* ((dbn (make-instance 'spiral-dbn))
        (dgraph (cl-dot:generate-graph-from-roots dbn (chunks dbn))))
   (cl-dot:dot-graph dgraph "spiral-dbn.png" :format :png))
</pre>
</p>


<img alt="spiral-dbn.png" src="images/spiral-dbn.png"/>


<p>
In a box the first line shows the class of the chunk and the number of
nodes in parens (omitted if 1), while the second line is the name of
the chunk itself. The constant chunks - in case you wonder - provide
the connected chunks with a bias. So far so good. Let's train it RBM
by RBM:
</p>


<p>
<pre>
 (defclass spiral-rbm-trainer (rbm-cd-trainer) ())
 
 (defun train-spiral-dbn (&amp;key (max-n-stripes 1))
   (let ((dbn (make-instance 'spiral-dbn :max-n-stripes max-n-stripes)))
     (dolist (rbm (rbms dbn))
       (train (make-sampler 50000)
              (make-instance 'spiral-rbm-trainer
                             :segmenter
                             (repeatedly (make-instance 'batch-gd-trainer
                                                        :momentum (flt 0.9)
                                                        :batch-size 100)))
              rbm))
     dbn))
</pre>
</p>



<p>
Now we can unroll the DBN to a backprop network and add the sum of the
squared differences between the inputs and the reconstructions as the
error:
</p>


<p>
<pre>
 (defclass spiral-bpn (bpn) ())
 
 (defmethod mgl-train:set-input (samples (bpn spiral-bpn))
   (clamp-striped-nodes samples (find-lump (chunk-lump-name 'inputs nil) bpn)))
 
 (defun unroll-spiral-dbn (dbn &amp;key (max-n-stripes 1))
   (multiple-value-bind (defs inits) (unroll-dbn dbn)
     (let ((bpn-def `(build-bpn (:class 'spiral-bpn
                                        :max-n-stripes ,max-n-stripes)
                       ,@defs
                       (sum-error (->sum-squared-error
                                   :x (lump ',(chunk-lump-name 'inputs nil))
                                   :y (lump ',(chunk-lump-name
                                               'inputs
                                               :reconstruction))))
                       (my-error (error-node :x sum-error)))))
       (let ((bpn (eval bpn-def)))
         (initialize-bpn-from-bm bpn dbn inits)
         bpn))))
</pre>
</p>



<p>
The BPN looks a whole lot more complicated, but it does nothing more
than performing a full upward pass in the DBN and a full downward
pass:
</p>


<p>
<pre>
 (let* ((dbn (make-instance 'spiral-dbn))
        (bpn (unroll-spiral-dbn dbn))
        (dgraph (cl-dot:generate-graph-from-roots bpn (lumps bpn))))
   (cl-dot:dot-graph dgraph "spiral-bpn.png" :format :png))
</pre>
</p>


<img alt="spiral-bpn.png" src="images/spiral-bpn.png"/>


<p>
Training it is as easy as:
</p>


<p>
<pre>
 (defclass spiral-bp-trainer (bp-trainer) ())
  
 (defun train-spiral-bpn (bpn)
   (train (make-sampler 50000)
          (make-instance 'spiral-bp-trainer
                         :segmenter
                         (repeatedly
                           (make-instance 'batch-gd-trainer
                                          :learning-rate (flt 0.01)
                                          :momentum (flt 0.9)
                                          :batch-size 100)))
          bpn)
   bpn)
</pre>
</p>



<p>
I'm tempted to dwell on pesky little details such as tracking errors,
but this entry is long enough already. Instead, load the <code>mgl-example</code>
system and see what <code>example/spiral.lisp</code> has in addition to what was
described. Evaluate the block commented forms at the end of the file
to see how training goes.
</p>
    </div>
  </content>
</entry>


<entry>
  <title>Introduction to MGL (part 2)</title>
  <link rel="alternate" type="text/html" href="http://quotenil.com/Introduction-to-MGL-(part-2).html"/>
  <id>http://quotenil.com/Introduction-to-MGL-(part-2).html</id>
  <updated>2010-03-01T22:38:50+01:00</updated>
  <published>2009-12-17T00:00:00+01:00</published>
  <author>
    <name>Gábor Melis</name>
    <uri>http://quotenil.com</uri>
    <email>mega@retes.hu</email>
  </author>
  <content type="xhtml" xml:lang="en" xml:base="http://quotenil.com/">
    <div xmlns="http://www.w3.org/1999/xhtml">

<p>
Having <a href="http://quotenil.com/Introduction-to-MGL-(part-1).html">been motivated</a>, today we are going to walk through a small
example and touch on the main concepts related to learning within this
library.
</p>


<p>
At the top of the food chain is the generic function <code>TRAIN</code>:
</p>


<p>
<pre>
 (defgeneric train (sampler trainer learner)
   (:documentation "Train LEARNER with TRAINER on the examples from
 SAMPLER. Before that TRAINER is initialized for LEARNER with
 INITIALIZE-TRAINER. Training continues until SAMPLER is finished."))
</pre>
</p>



<p>
A learner is anything that can be taught, which currently means it's
either a <a href="http://en.wikipedia.org/wiki/Backpropagation">backpropagation&nbsp;network</a> (<code>BPN</code>) or some kind of
Boltzmann machine (<code>BM</code>). The method with which a learner is trained
is decoupled from the learner itself and lives in the trainer object.
This makes it cleaner to support multiple learning methods for the
same learner: for instance, either gradient descent (<code>BP-TRAINER</code>) or
conjugate gradients (<code>CG-BP-TRAINER</code>) can be used to train a BPN, and
either contrastive divergence (<code>RBM-CD-TRAINER</code>) or persistent
contrastive divergence (<code>BM-PCD-TRAINER</code>) can be used to train a
restricted Boltzmann machine (<code>RBM</code>).
</p>


<p>
The function <code>TRAIN</code> takes training examples from <code>SAMPLER</code> (observing
the batch size of the trainer, if applicable) and calls <code>TRAIN-BATCH</code>
with the list of examples, the trainer and the learner. This may be as
simple as:
</p>


<p>
<pre>
 (defmethod train (sampler (trainer bp-trainer) (bpn bpn))
   (while (not (finishedp sampler))
     (train-batch (sample-batch sampler (n-inputs-until-update trainer))
                  trainer bpn)))
</pre>
</p>



<p>
Ultimately, <code>TRAIN-BATCH</code> arranges for the training examples to be
given as input to the learner ("clamped" on the input nodes of some
network) by <code>SET-INPUT</code>; exactly how this should be done must be
customized. Then, in the case of <code>BP-TRAINER</code>, the gradients are
calculated and added to the gradient accumulators that live in the
trainer. When the whole batch is processed the weights of the network
are updated according to the gradients.
</p>
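
<p>
The accumulate-then-update cycle can be sketched in a few lines of
plain Common Lisp. This is not MGL code: the function names are
hypothetical, and whether <code>BATCH-GD-TRAINER</code> applies momentum in exactly
this form is an assumption.
</p>

<p>
<pre>
 (defun accumulate-gradient (accumulator example-gradient)
   "Add the gradient of one example to ACCUMULATOR in place."
   (map-into accumulator #'+ accumulator example-gradient))
 
 (defun update-weights (weights accumulator velocities
                        learning-rate momentum)
   "Take one momentum step in place once the batch is processed, then
 clear ACCUMULATOR for the next batch. Returns WEIGHTS."
   (dotimes (i (length weights) weights)
     (setf (aref velocities i)
           (- (* momentum (aref velocities i))
              (* learning-rate (aref accumulator i))))
     (incf (aref weights i) (aref velocities i))
     (setf (aref accumulator i) 0)))
</pre>
</p>

<p>
A batch then amounts to calling <code>ACCUMULATE-GRADIENT</code> once per example
and <code>UPDATE-WEIGHTS</code> once at the end.
</p>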


<p>
Let's put together a toy example:
</p>


<p>
<pre>
 (use-package :mgl-util)
 (use-package :mgl-train)
 (use-package :mgl-gd)
 (use-package :mgl-bp)
 
 (defclass linear-bpn (bpn) ())
 
 (defparameter *matrix*
   (matlisp:make-real-matrix '((1d0 2d0) (3d0 4d0) (5d0 6d0))))
 
 (defparameter *bpn*
   (let ((n-inputs 3)
         (n-outputs 2))
     (build-bpn (:class 'linear-bpn)
       (input (input-lump :size n-inputs))
       (weights (weight-lump :size (* n-inputs n-outputs)))
       (product (activation-lump :weights weights :x input))
       (target (input-lump :size n-outputs))
       (sse (->sum-squared-error :x target :y product))
       (my-error (error-node :x sse)))))
 
 (defmethod set-input (samples (bpn linear-bpn))
   (let* ((input-nodes (nodes (find-lump 'input bpn)))
          (target-nodes (nodes (find-lump 'target bpn)))
          (i-v (storage input-nodes)))
     (assert (= 1 (length samples)))
     (loop for i below (length i-v) do
           (setf (aref i-v i) (elt (first samples) i)))
     ;; TARGET-NODES = INPUT-NODES * *MATRIX*
     (matlisp:gemm! 1d0 (reshape2 input-nodes 1 3) *matrix*
                    0d0 (reshape2 target-nodes 1 2))))
 
 (defun sample-input ()
   (loop repeat 3 collect (random 1d0)))
 
 (train (make-instance 'counting-function-sampler
                       :sampler #'sample-input
                       :max-n-samples 10000)
        (make-instance 'bp-trainer
                       :segmenter
                       (repeatedly
                         (make-instance 'batch-gd-trainer
                                        :learning-rate (flt 0.01)
                                        :momentum (flt 0.9)
                                        :batch-size 10)))
        *bpn*)
</pre>
</p>



<p>
We subclassed <code>BPN</code> as <code>LINEAR-BPN</code> and hung a <code>SET-INPUT</code> method on
it. The <code>SAMPLES</code> argument will be a sequence of samples returned by
the sampler passed to <code>TRAIN</code>, that is, what <code>SAMPLE-INPUT</code> returns.
</p>


<p>
The network multiplies <code>INPUT</code> taken as a 1x3 matrix by <code>WEIGHTS</code>
(initialized randomly) and the training aims to minimize the squared
error as calculated by the lump named <code>SSE</code>. Note that <code>SET-INPUT</code>
clamps both the real input and the target.
</p>


<p>
We instantiate <code>BP-TRAINER</code> that inherits from <code>SEGMENTED-GD-TRAINER</code>.
Now, <code>SEGMENTED-GD-TRAINER</code> itself does precious little: it only
delegates training to child trainers where each child is supposed to
be a <code>GD-TRAINER</code> (with all the usual knobs such as learning rate,
momentum, weight decay, batch size, etc). The mapping from segments
(bpn lumps here) of the learner to gd trainers is provided by the
function in the <code>:SEGMENTER</code> argument. By using <code>REPEATEDLY</code>, for now,
we simply create a distinct child trainer for each weight lump as it
makes a function that on each call evaluates the form in its body (as
opposed to <code>CONSTANTLY</code>).
</p>
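
<p>
The difference from <code>CONSTANTLY</code> is easy to see without MGL. In the
sketch below <code>MAKE-TOY-TRAINER</code> is a made-up stand-in for the
<code>MAKE-INSTANCE</code> form, and the explicit lambda only mimics what the
paragraph above says <code>REPEATEDLY</code> does; it is not its actual definition.
</p>

<p>
<pre>
 (defun make-toy-trainer ()
   "Stand-in for creating a trainer: returns a freshly consed list."
   (list :learning-rate 0.01))
 
 ;; CONSTANTLY evaluates its argument once, so every segment would
 ;; end up sharing a single trainer object.
 (defparameter *shared* (constantly (make-toy-trainer)))
 
 ;; Evaluating the form on each call gives every segment its own.
 (defparameter *fresh*
   (lambda (segment)
     (declare (ignore segment))
     (make-toy-trainer)))
 
 (eq (funcall *shared* 'a) (funcall *shared* 'b)) ; the same object
 (eq (funcall *fresh* 'a) (funcall *fresh* 'b))   ; two distinct objects
</pre>
</p>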


<p>
That's it without any bells and whistles. If all goes well <code>WEIGHTS</code>
should be trained to be equal to <code>*MATRIX*</code>.
Inspect <code>(nodes (find-lump 'weights *bpn*))</code> to verify.
</p>


<p>
Impatience satisfied, examine the <code>BUILD-BPN</code> form in detail. The
<code>:CLASS</code> argument is obvious, and the rest of the forms are a sequence
of bindings like in a <code>LET*</code>. The extra touches are that the name of
the variable to which a lump is bound is going to be supplied as the
<code>:NAME</code> of the lump and an extra <code>MAKE-INSTANCE</code> is added so
</p>


<p>
<pre>
 (input (input-lump :size n-inputs))
</pre>
</p>



<p>
is something like
</p>


<p>
<pre>
 (make-instance 'input-lump :name 'input :size n-inputs)
</pre>
</p>



<p>
One can replicate this with <code>MAKE-INSTANCE</code> and <code>ADD-LUMP</code>, but it's
more work. For ease of comprehension the network can be visualized by
loading the <code>mgl-visuals</code> system and:
</p>


<p>
<pre>
 (let ((dgraph (cl-dot:generate-graph-from-roots *bpn* (lumps *bpn*))))
   (cl-dot:dot-graph dgraph "linear-bpn.png" :format :png))
</pre>
</p>


<img alt="linear-bpn.png" src="images/linear-bpn.png"/>


<p>
That's it for today, thank you for your kind attention.
</p>
    </div>
  </content>
</entry>


<entry>
  <title>Introduction to MGL (part 1)</title>
  <link rel="alternate" type="text/html" href="http://quotenil.com/Introduction-to-MGL-(part-1).html"/>
  <id>http://quotenil.com/Introduction-to-MGL-(part-1).html</id>
  <updated>2010-03-01T22:38:50+01:00</updated>
  <published>2009-12-02T00:00:00+01:00</published>
  <author>
    <name>Gábor Melis</name>
    <uri>http://quotenil.com</uri>
    <email>mega@retes.hu</email>
  </author>
  <content type="xhtml" xml:lang="en" xml:base="http://quotenil.com/">
    <div xmlns="http://www.w3.org/1999/xhtml">

<p>
This is going to be the start of an introduction series on the <a href="http://cliki.net/MGL">MGL</a>
Common Lisp machine learning library. MGL focuses mainly on
<a href="http://en.wikipedia.org/wiki/Boltzmann_machine">Boltzmann&nbsp;Machines</a> (BMs). In fact, the few seemingly unrelated
things it currently offers (gradient descent, conjugate gradient,
backprop) are directly needed to implement the learning and fine
tuning methods for different kinds of BMs. But before venturing too
far into specifics, here is a quick glimpse at the bigger picture and
the motivations.
</p>


<p>
Most of the current learning algorithms are based on shallow
architectures: they are fundamentally incapable of basing higher level
concepts on other, learned concepts. The most prominent example of a
successful shallow learner is the Support Vector Machine, for which there
is a simple <a href="http://cliki.net/cl-libsvm">CL wrapper</a> around <a href="http://www.csie.ntu.edu.tw/~cjlin/libsvm/">libsvm</a>, but that's a story for another
day.
</p>


<p>
On the other hand, deep learners are theoretically capable of
building abstraction on top of abstraction. The main hurdle in front
of their acceptance is that they don't exist or, more precisely,
that we don't know how to train them.
</p>


<p>
A good example of a deep learner is the multi-layer perceptron: with
only three layers it is a <a href="http://en.wikipedia.org/wiki/Universal_approximation_theorem">universal approximator</a>, which is not a
particularly difficult achievement. The practical implications of
this result are not earth-shattering either: the number of required training
examples and hidden units can be very high and generalization can be
bad.
</p>


<p>
Deep architectures mimic the layered organization of the brain and, in
theory, have better abstraction and generalization capability and higher
encoding efficiency. Of course, these qualities are strongly related.
While this has been known or suspected for a long time, it was only
recently that training of deep architectures
<a href="http://www.cs.toronto.edu/~hinton/science.pdf">started&#160;to&#160;become&#160;feasible</a>.
</p>


<p>
Of deep learners, Boltzmann machines deserve special attention as they
have demonstrated very good performance on a number of problems and
have a biologically plausible, local, <a href="http://en.wikipedia.org/wiki/Hebbian_theory">Hebbian</a> learning rule.
</p>


<p>
Now that you are sufficiently motivated, stick around and in the
<a href="http://quotenil.com/Introduction-to-MGL-(part-2).html">later&#160;parts</a> of this series we are going to see real examples.
</p>
    </div>
  </content>
</entry>


<entry>
  <title>Ultimate Fallout 2 Ironman Munchkin</title>
  <link rel="alternate" type="text/html" href="http://quotenil.com/Ultimate-Fallout-2-Ironman-Munchkin.html"/>
  <id>http://quotenil.com/Ultimate-Fallout-2-Ironman-Munchkin.html</id>
  <updated>2010-03-01T22:38:50+01:00</updated>
  <published>2009-11-21T00:00:00+01:00</published>
  <author>
    <name>Gábor Melis</name>
    <uri>http://quotenil.com</uri>
    <email>mega@retes.hu</email>
  </author>
  <content type="xhtml" xml:lang="en" xml:base="http://quotenil.com/">
    <div xmlns="http://www.w3.org/1999/xhtml">

<p>
I'm cleaning up the accumulated junk and found this guide that was
written eons ago.
</p>


<p>
This build is focused on survival. No save/loading, killap's final
patch, hard combat and game difficulty. As it is not only an ironman but
a munchkin too, it must be a sniper, since HtH is a bit underpowered.
</p>


<p>
See <a href="http://faqs.ign.com/articles/777/777224p1.html">this</a> for good insights into ironman survival. The two most
important pieces of advice it has are: Sneak and Outdoorsman. Sneak
does not work as well for me as advertised, that is, I cannot end
combat in all cases if the critter I shot did not die. Even so, Sneak is
incredibly useful so it's tagged and raised to 150% in a hurry.
Small Guns and Speech get tagged as well.
</p>


<p>
A true munchkin takes Gifted and Small Frame. We rely on Sneak to keep
us from being shot at so Fast Shot would not help too much. The main
emphasis is on survival so endurance and agility must be 10. NPCs
cannot be kept alive and are not needed for most of the game, plus
conversation is ruled by Speech, thus charisma is 2. Good minmaxing so
far. Luck is 8 and will be 10 after the Zeta scan. This is important
for Sniper.
</p>


<p>
This leaves us with 18 points for Strength, Perception and
Intelligence. Each point in Perception adds 8% to shooting accuracy.
However, putting the same point into Intelligence gives 66 skill points
(2 per level over 33 levels) to spend by level 34. Small Guns shall go
beyond 200% where the cost of 1% is 3 skill points (it's tagged), so 66
points are worth 22% of shooting accuracy. Clearly, Intelligence seems
the better place.
</p>
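

<p>
The comparison can be checked with a few lines of Python. This is only
a sketch of the arithmetic in the paragraph above: the constant and
function names are made up for illustration, and the figures (8% per
Perception point, 2 skill points per Intelligence point per level, 3
skill points per 1% of tagged Small Guns beyond 200%) are simply the
ones quoted in the text.
</p>


<p>
<pre>
LEVELS_GAINED = 33             # from level 1 to level 34
POINTS_PER_INT_PER_LEVEL = 2   # skill points per Intelligence point, as assumed in the text
COST_PER_PERCENT = 3           # skill points per 1% of tagged Small Guns beyond 200%
ACCURACY_PER_PERCEPTION = 8    # accuracy in percent per Perception point


def accuracy_from_intelligence(points=1):
    """Accuracy (in %) bought with the skill points from extra Intelligence."""
    skill_points = points * POINTS_PER_INT_PER_LEVEL * LEVELS_GAINED
    return skill_points // COST_PER_PERCENT


def accuracy_from_perception(points=1):
    """Accuracy (in %) gained directly from extra Perception."""
    return points * ACCURACY_PER_PERCEPTION


print(accuracy_from_intelligence())  # 22
print(accuracy_from_perception())    # 8
</pre>
</p>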


<p>
However, Perception also determines sequence. Contrary to what <a href="http://faqs.ign.com/articles/777/777224p1.html">the link above</a>
says, I find it important that enemies don't get a double turn. If
combat could be ended dependably it would be less of an issue; alas,
it cannot. This makes Perception pretty important. With the +1
obtainable by surgery, 9 is a good number.
</p>


<p>
Strength allows you to carry more which is a small convenience at the
beginning. Even with Small Frame one can get away with a strength as
low as 2 especially with Sulik in the beginning and later the car. The
other, main use of strength is to avoid the 20% shooting accuracy
penalty per point under the weapon's minimum strength. At high levels
you'll have 6-7 in strength with armor and surgery. At lower levels
the penalty can be (over)compensated for by pouring skill points into
Small Guns (after reading the books of course).
</p>


<p>
What about strength vs perception? Strength is useful in the early
game before Small Guns is high enough. Perception remains useful at
the end due to its accuracy bonus. Intelligence below 6 is out of
the question unless you aim for maximum efficiency at level 50.
</p>


<p>
So what is it that we optimize for? Munchkin combat power, of course.
When? Mostly at the end, say at level 34. That's 33 levels gained and
11 perks. So without further ado, pure munchkin starting stats:
</p>


<p>
<pre>
 S: 2, P: 9, E: 10, C: 2, I: 7, A: 10, L: 8
</pre>
</p>



<p>
Developed stats with armor, surgeries, shades, zeta scan, combat
implants:
</p>


<p>
<pre>
 S: 7, P: 10, E: 10, C: 2, I: 8, A: 10, L: 10
</pre>
</p>



<p>
Skill point needs:
</p>


<p>
<pre>
 93=28+26+39 Sneak (tagged): 45% -> 151%
 
 265=23+26+39+52+65+60 Small Guns (tagged): 55% -> 221%
 
 21 Speech (tagged): 19% -> 76% (+5% expert excrement expeditor, +10% enhancer)
 
 32 Doctor: ~1% -> 55% (+5% VC training, +20% doctor's bag, +10% enhancer)
 
 0 Outdoorsman: ~23% -> 110% (books, +20% motion sensor)
</pre>
</p>
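

<p>
The table can be totalled mechanically. The snippet below is a rough
sketch that mirrors the arithmetic of this post (2 skill points per
Intelligence point per level, 33 levels gained); the variable names are
invented for illustration.
</p>


<p>
<pre>
# Skill points needed per skill, copied from the table above.
SKILL_POINT_NEEDS = {
    "Sneak": 28 + 26 + 39,
    "Small Guns": 23 + 26 + 39 + 52 + 65 + 60,
    "Speech": 21,
    "Doctor": 32,
    "Outdoorsman": 0,
}
LEVELS_GAINED = 33             # from level 1 to level 34
POINTS_PER_INT_PER_LEVEL = 2   # as assumed in this post

total = sum(SKILL_POINT_NEEDS.values())
required_intelligence = total / LEVELS_GAINED / POINTS_PER_INT_PER_LEVEL

print(total)                            # 411
print(round(required_intelligence, 2))  # 6.23
</pre>
</p>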



<p>
That's 411 skill points. Required points in intelligence:
411/33 levels/2 = 6.23. A few more points can be saved by reading Guns
and Bullets magazines (max 18). Still, there is a point in improving
Small Guns over 220%. Take the gauss rifle or pistol, for example, which
has a 20% bonus on accuracy, and try to shoot someone wearing
Advanced Power Armor II in the eye from 30 hexes:
</p>


<p>
<pre>
 (SmallGuns = 220) + (8 * (Perception = 10)) + (Weapon bonus = 20)
 + (Ammo AC modifier = 30)
 - 30 - (AC = 45) - (4 * (N-Hex = 30)) - (Eyes = 60) = 95%
</pre>
</p>
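

<p>
The same sum as a small Python function, in case you want to plug in
other weapons, armors or distances. It is a sketch that follows the
formula above term by term; the function and parameter names are mine,
not the game's.
</p>


<p>
<pre>
def hit_chance(small_guns, perception, weapon_bonus, ammo_ac_modifier,
               armor_class, distance_hexes, aimed_penalty):
    """To-hit chance in percent, following the formula quoted above."""
    return (small_guns
            + 8 * perception
            + weapon_bonus
            + ammo_ac_modifier
            - 30
            - armor_class
            - 4 * distance_hexes
            - aimed_penalty)


# The eye shot from the example: gauss weapon, target 30 hexes away.
chance = hit_chance(small_guns=220, perception=10, weapon_bonus=20,
                    ammo_ac_modifier=30, armor_class=45,
                    distance_hexes=30, aimed_penalty=60)
print(chance)  # 95
</pre>
</p>


<p>
By this formula every extra hex of distance costs 4 points, so the
margin disappears quickly at longer ranges.
</p>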



<p>
Perks:
</p>


<p>
<pre>
 Toughness(3)
 Lifegiver(2)
 Bonus rate of fire
 Better Criticals
 Sniper
 Bonus Move(2)
 Action Boy(2)
 Living Anatomy
</pre>
</p>



<p>
This character is hard to play in the early game. Small Guns takes
some time to develop and the -40% penalty for most guns due to the low
strength is a killer until power armor and/or the Red Ryder.
</p>


<p>
To make the early game less of a struggle Small Guns shall be raised
pretty early, after Sneak reaches 100% and you have read a few Guns
and Bullets from Klamath, the Den, Redding and Modoc. Grab as many
Scout handbooks as you can.
</p>


<p>
Useful links:
</p>


<p>
<a href="http://faqs.ign.com/articles/777/777224p1.html">http://faqs.ign.com/articles/777/777224p1.html</a>
</p>


<p>
<a href="http://user.tninet.se/~jyg699a/fallout2.html#combat">http://user.tninet.se/~jyg699a/fallout2.html#combat</a>
</p>


<p>
<a href="http://www.fanmadefallout.com/fo2-items/">http://www.fanmadefallout.com/fo2-items/</a>
</p>


<p>
<a href="http://www.fanmadefallout.com/procrit/">http://www.fanmadefallout.com/procrit/</a>
</p>


<p>
<a href="http://www.playithardcore.com/pihwiki/index.php?title=Fallout_2">http://www.playithardcore.com/pihwiki/index.php?title=Fallout_2</a>
</p>
    </div>
  </content>
</entry>


<entry>
  <title>Upgrade Woes</title>
  <link rel="alternate" type="text/html" href="http://quotenil.com/Upgrade-Woes.html"/>
  <id>http://quotenil.com/Upgrade-Woes.html</id>
  <updated>2010-03-01T22:38:50+01:00</updated>
  <published>2009-11-06T00:00:00+01:00</published>
  <author>
    <name>Gábor Melis</name>
    <uri>http://quotenil.com</uri>
    <email>mega@retes.hu</email>
  </author>
  <content type="xhtml" xml:lang="en" xml:base="http://quotenil.com/">
    <div xmlns="http://www.w3.org/1999/xhtml">

<p>
Debian Lenny was released back in February. My conservativeness only
lasts about half a year so I decided to upgrade to Squeeze aka Debian
testing. The upgrade itself went rather smoothly with a few notable
exceptions. With KDE 4.3 I should have waited longer. Notes:
</p>

<ul>
<li>Who thought it a grand idea that in the default theme (Oxygen) the
  color of the panel and window title bar cannot be customized?
  Installing the desktop theme called Aya solved the panel issue while
  going back to Keramik helped with the title bar.

</li>
<li>The kmail message list is a train wreck by default with its
  multi-line entries. It took me ages to find how to turn it back
  to the classic theme (hint: it's not under `Configure KMail'), at the
  cost of losing message threading.

</li>
<li>I had customized kwin to use the Super key for all shortcuts. KDE3
  decided to call it the Win key, but hey, I understand that's where
  it's often mapped. After the upgrade my settings were wiped out.

</li>
<li>In org-mode <code>C-u C-c C-t</code> had asked for a new tag. After the upgrade
  and <code>(setq org-use-fast-todo-selection 'prefix)</code> it does so again.

</li>
<li>The X.org upgrade broke my fragile xmodmap config so I wrote
  an <a href="upload/lisp-xkb.tar.gz">xkb based config</a> instead. It's activated with: </li>
</ul>

<p>
<pre>
 xkbcomp -I$HOME/.xkb ~/.xkb/keymap/us_lisp $DISPLAY
</pre>
</p>


<ul>
<li>Upgrading to emacs23 was painless, except for blorg that needed a
  couple of hacks to get this entry out the door.</li>
</ul>
    </div>
  </content>
</entry>

</feed>
