Gábor Melis' blog - TECH


PAX World

The promise of MGL-PAX has always been that it will be easy to generate documentation for different libraries without requiring extensive markup or relying on stable URLs. For example, without PAX, if a docstring in the MGL library wanted to reference the matrix class MGL-MAT:MAT from the MGL-MAT library, it would need to include ugly HTML links in the markdown:

 "Returns a [some-terrible-github-link-to-html][MAT] object."

With PAX however, the uppercase symbol MAT will be automatically linked to the documentation of MAT if its whereabouts are known at documentation generation time, so the above becomes:

Read more

Recurrent Nets

I've been cleaning up and documenting MGL for quite some time now, and while it's nowhere near done, a good portion of the code has been overhauled in the process. There are new additions such as the Adam optimizer and Recurrent Neural Nets. My efforts were mainly on the backprop stuff, and I think the definition of feed-forward:

 (build-fnn (:class 'digit-fnn)
   (input (->input :size *n-inputs*))
   (hidden-activation (->activation input :size n-hiddens))
   (hidden (->relu hidden-activation))
   (output-activation (->activation hidden :size *n-outputs*))
   (output (->softmax-xe-loss :x output-activation)))

and recurrent nets:

Read more

INCLUDE locative for PAX

I'm getting so used to the M-. plus documentation generation hack that's MGL-PAX that I use it for all new code, which highlighted an issue with code examples.

The problem is that the (ideally runnable) examples had to live in docstrings. Small code examples presented as verifiable transcripts within docstrings were great, but developing anything beyond a couple of forms of code in docstrings or copy-pasting them from source files to docstrings is insanity or an OOAO (Once And Only Once) violation, respectively.

In response to this, PAX got the INCLUDE locative (see the linked documentation) and became its own first user at the same time. In a nutshell, the INCLUDE locative can refer to non-lisp files and sections of lisp source files which makes it easy to add code examples and external stuff to the documentation without duplication. As always, M-. works as well.


I've just committed a major feature to MGL-PAX: the ability to include code examples in docstrings. Printed output and return values are marked up with ".." and "=>", respectively.

 (values (princ :hello) (list 1 2))
 .. HELLO
 => :HELLO
 => (1 2)

The extras are:

Read more

Migration to github

Due to the bash security hole that keeps giving, I had to disable gitweb at http://quotenil.com/git/ and move all non-obsolete code over to github. This affects Six the Hex AI, the Planet Wars bot, MiCMaC, FSVD, Lassie and cl-libsvm.

Higgs Boson Machine Learning Challenge Bits and Pieces

The Higgs Boson contest on Kaggle has ended. Sticking to my word at ELS 2014, I released some code that came about during these long four months.

MGL-GPR is no longer a Genetic Programming only library because it got another Evolutionary Algorithm implementation: Differential Evolution. My original plan for this contest was to breed input features that the physicists, in their insistence on comprehensibility, overlooked, but it didn't work as well as I had hoped, for reasons specific to this contest and also because evolutionary algorithms just do not scale to larger problem sizes.

In other news, MGL got cross-validation, bagging and stratification support in the brand new MGL-RESAMPLE package documented with MGL-PAX which all of you will most definitely want to use. My winning submission used bagged cross-validated dropout neural networks with stratified splits so this is where it's coming from.

Read more

Higgs Boson Machine Learning Challenge Post-Mortem

Actually, I'll only link to the post-mortem I wrote in the forum. There is also a model description included in the git repo. A stand-alone distribution with all library dependencies and an x86-64 linux precompiled binary is also available.

This has been the Kaggle competition that attracted the most contestants so it feels really good to come out on top even though there was an element of luck involved due to the choice of evaluation metric and the amount of data available. The organizers did a great job explaining the physics, why there is no more data, motivating the choice of evaluation metric, and being prompt in communication in general.

I hope that the HEP guys will find this useful in their search for more evidence of tau tau decay of the Higgs boson. Note that I didn't go for the 'HEP meets ML Award' so training time is unnecessarily high (one day with a GTX Titan GPU). By switching to single precision floating point and a single neural network, training time could be reduced to about 15 minutes with an expected drop in accuracy from 3.805 to about 3.750. Even with the bagging approach the code logs out-of-bag estimates of the evaluation metric after training each constituent model and the training process can be C-c'ed early. Furthermore, the model can be run on a CPU with BLAS about 10 times slower than on a Titan.

Liblinear Support Added to cl-libsvm

In addition to the cl-libsvm asdf system, there is now another asdf system in the cl-libsvm library: cl-liblinear that, predictably enough, is a wrapper for liblinear. The API is similar to that of cl-libsvm.

Stackoverflow Post-Mortem

After almost two years without a single competition, last September I decided to enter the Stackoverflow contest on Kaggle. It was a straightforward text classification problem with extremely unbalanced classes.

Just as Bocsimackó did the last time around, his lazier sidekick Malacka (on the right) brought success. I would have loved to be lazy and still win, but the leaderboard was too close for comfort.


Read more

Hung Connections

My ISP replaced a Thomson modem with a Cisco EPC3925 modem-router to fix the speed issue I was having. The good news is that the connection operates near its advertised bandwidth; the bad news is that TCP connections started to hang. It didn't take long to find out that this particular router drops "unused" TCP connections after five minutes.

The fix recommended in the linked topic (namely sysctl'ing net.ipv4.tcp_keepalive_time & co) was mostly effective, but I had to lower the keepalive to one minute to keep my ssh sessions alive. The trouble was that OfflineIMAP connections to the U.S. west coast still hung intermittently, while it worked with Gmail just fine.
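For context, the net.ipv4.tcp_keepalive_* sysctls only set defaults, and the kernel sends keepalive probes only on sockets that have SO_KEEPALIVE turned on, so an application that never sets that option is unaffected by them. Below is a minimal Python sketch (OfflineIMAP is written in Python) of per-socket keepalive on Linux. It is not the actual OfflineIMAP patch: the function name enable_keepalive and the probe count of 4 are assumptions made for illustration, and the TCP_KEEP* options are guarded because they are platform-specific.

```python
import socket


def enable_keepalive(sock, idle=15, interval=15, count=4):
    """Enable TCP keepalive on a single socket (hypothetical helper).

    idle     -- seconds of silence before the first probe is sent
    interval -- seconds between unanswered probes
    count    -- unanswered probes before the connection is declared dead
    """
    # Opt this socket in; without this the kernel never sends probes.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # Per-socket overrides of the sysctl defaults. These constants are
    # not available on every platform, hence the hasattr checks.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)
    return sock


# Usage: configure the socket before connecting it.
sock = enable_keepalive(socket.socket(socket.AF_INET, socket.SOCK_STREAM))
sock.close()
```

Setting the options per socket keeps the aggressive 15 second timer confined to the one application that needs it, rather than lowering the system-wide default for every connection.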

In the end, OfflineIMAP had to be patched to use the keepalive, and the keepalive had to be lowered to 15s:

Read more