Gábor Melis' () blog - TECH

Archive

Higgs Boson Machine Learning Challenge Bits and Pieces

The Higgs Boson contest on Kaggle has ended. Sticking to my word at ELS 2014, I released some code that came about during these long four months.

MGL-GPR is no longer a Genetic Programming only library because it got an another Evolutionary Algorithm implementation: Differential Evolution. My original plan for this contest was to breed input features that the physicists in their insistence on comprehensibility overlooked, but it didn't work as well as I had hoped for reasons specific to this specific contest and also because evolutionary algorithms just do not scale to larger problem sizes.

In other news, MGL got cross-validation, bagging and stratification support in the brand new MGL-RESAMPLE package documented with MGL-PAX which all of you will most definitely want to use. My winning submission used bagged cross-validated dropout neural networks with stratified splits so this is where it's coming from.

Read more

Higgs Boson Machine Learning Challenge Post-Mortem

Actually, I'll only link to the post-mortem I wrote in the forum. There is a also a model description included in the git repo. A stand-alone distribution with all library dependencies and an x86-64 linux precompiled binary is also available.

This has been the Kaggle competition that attracted the most contestants so it feels really good to come out on top even though there was an element of luck involved due to the choice of evaluation metric and the amount of data available. The organizers did a great job explaining the physics, why there is no more data, motivating the choice of evaluation metric, and being prompt in communication in general.

I hope that the HEP guys will find this useful in their search for more evidence of tau tau decay of the Higgs boson. Note that I didn't go for the 'HEP meets ML Award' so training time is unnecessarily high (one day with a GTX Titan GPU). By switching to single precision floating point and a single neural network, training time could be reduced to about 15 minutes with an expected drop in accuracy from 3.805 to about 3.750. Even with the bagging approach the code logs out-of-bag estimates of the evaluation metric after training each constituent model and the training process can be C-c'ed early. Furthermore, the model can be run on a CPU with BLAS about 10 times slower than on a Titan.

Liblinear Support Added to cl-libsvm

In addition to the cl-libsvm asdf system, there is now another asdf system in the cl-libsvm library: cl-liblinear that, predictably enough, is a wrapper for liblinear. The API is similar to that of cl-libsvm.

Stackoverflow Post-Mortem

After almost two years without a single competition, last September I decided to enter the Stackoverflow contest on Kaggle. It was a straightforward text classification problem with extremely unbalanced classes.

Just as Bocsimackó did the last time around, his lazier sidekick Malacka (on the right) brought success. I would have loved to be lazy and still win, but the leaderboard was too close for comfort.

Overview

Read more

Hung Connections

My ISP replaced a Thomson modem with a Cisco EPC3925 modem-router to fix the speed issue I was having. The good news is that the connection operates near its advertised bandwidth, the bad news is that tcp connections started to hang. It didn't take long to find out that this particular router drops "unused" tcp connections after five minutes.

The fix recommended in the linked topic (namely sysctl'ing net.ipv4.tcp_keepalive_time & co) was mostly effective but I had to lower the keepalive to one minute to keep my ssh sessions alive. The trouble was that OfflineIMAP connections to the U.S. west coast still hanged intermittently while it could work with Gmail just fine.

In the end, OfflineIMAP had to be patched to use the keepalive and the keepalive be lowered to 15s:

Read more

OfflineIMAP with Encrypted Authinfo

I've moved to an OfflineIMAP + Gnus setup that's outlined at various places. Gnus can be configured to use ~/.authinfo as a netrc style of file to read passwords from and can easily use encrypted authinfo files as well. Offlineimap, on the other hand, offers no such support and passwords to the local and remote imap accounts are normally stored in clear text in .offlineimaprc.

For the local account this can be overcome by not running a dovecot server but making offlineimap spawn a dovecot process when needed:

 [Repository LocalGmail]
 type = IMAP
 preauthtunnel = /usr/sbin/dovecot -c ~/.dovecot.conf --exec-mail imap

Read more

Alpha-beta

It hasn't been a year yet since I first promised that alpha-beta snippet and it is already added to micmac in all its 35 line glory. The good thing about not rushing it out the door is that it saw more a bit more use. For a tutorialish tic-tac-toe example see test/test-game-theory.lisp.

The logging code in the example produces output suitable for cut and pasting into an org-mode buffer and exploring it by TABbing into subtrees to answer the perpetual 'What the hell was it thinking?!' question.

Nash equilibrium finder

While I seem to be unable to make my mind up on a good interface to alpha-beta with a few bells and whistles, I added a Nash equilibrium finder to Micmac that's becoming less statistics oriented. This was one of the many things in Planet Wars that never really made it.

Let's consider the Matching pennies game. The row player wins iff the two pennies show the same side. The payoff matrix is:

 |       | Heads | Tails |
 +-------+-------+-------+
 | Heads |     1 |    -1 |
 | Tails |    -1 |     1 |

Read more

Planet Wars Post-Mortem

I can't believe I won.

I can't believe I won decisively at all.

The lead in the last month or so was an indicator of having good chances, but there was a huge shuffling of ranks in the last week and some last minute casualties.

Read more

Important Update to the Planet Wars Starter Package

First, is it possible to get something as simple as RESOLVE-BATTLE wrong? Apparently, yes. That's what one gets for trying to port Python code that's pretty foreign in the sense of being far from the way I'd write it.

More importantly, I found out the hard way that sbcl 1.0.11 that's still on the official servers has a number of bugs in its timer implementation making WITH-TIMEOUT unreliable. Also, it can trigger timeouts recursively eventually exceeding the maximum interrupt nesting depth. Well, "found out" is not the right way to put it as we did fix most of these bugs ages ago.

In the new starter package (v0.8 in git, latest tarball), you'll find timer.lisp that's simply backported almost verbatim from sbcl 1.0.41 to sbcl 1.0.11. Seems to work for me, but I also had to lower the timeout to 0.8 from 0.98 because the main server is extremely slow.

Read more