Gábor Melis' blog - September 2014

Migration to github

Due to the bash security hole that keeps giving, I had to disable gitweb at http://quotenil.com/git/ and move all non-obsolete code over to github. This affects Six the Hex AI, the Planet Wars bot, MiCMaC, FSVD, Lassie and cl-libsvm.

Higgs Boson Machine Learning Challenge Bits and Pieces

The Higgs Boson contest on Kaggle has ended. Sticking to my word at ELS 2014, I released some code that came about during these long four months.

MGL-GPR is no longer a Genetic Programming only library: it now has a second Evolutionary Algorithm implementation, Differential Evolution. My original plan for this contest was to breed input features that the physicists, in their insistence on comprehensibility, had overlooked, but it didn't work as well as I had hoped, for reasons specific to this contest and also because evolutionary algorithms simply do not scale to larger problem sizes.
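MGL-GPR itself is a Common Lisp library, but to give a flavor of what Differential Evolution does, here is a minimal sketch of the classic DE/rand/1/bin scheme in Python. All names here are illustrative, not the library's interface:

```python
import random

def differential_evolution(objective, bounds, pop_size=20, f=0.8, cr=0.9,
                           generations=100):
    """Minimize `objective` over box `bounds` with DE/rand/1/bin.

    f is the differential weight, cr the crossover probability."""
    dim = len(bounds)
    pop = [[random.uniform(lo, hi) for lo, hi in bounds]
           for _ in range(pop_size)]
    scores = [objective(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct individuals other than the current one.
            a, b, c = random.sample([x for j, x in enumerate(pop) if j != i], 3)
            # Mutate and crossover into a trial vector; j_rand guarantees
            # that at least one coordinate comes from the mutant.
            j_rand = random.randrange(dim)
            trial = []
            for j in range(dim):
                if random.random() < cr or j == j_rand:
                    lo, hi = bounds[j]
                    v = min(max(a[j] + f * (b[j] - c[j]), lo), hi)
                else:
                    v = pop[i][j]
                trial.append(v)
            # Greedy selection: the trial replaces the parent if no worse.
            s = objective(trial)
            if s <= scores[i]:
                pop[i], scores[i] = trial, s
    best = min(range(pop_size), key=lambda i: scores[i])
    return pop[best], scores[best]
```

For feature breeding, the objective would score a candidate feature by how much it helps a downstream model, which is exactly where the scaling problems kick in: each evaluation is a model training run.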

In other news, MGL got cross-validation, bagging and stratification support in the brand new MGL-RESAMPLE package, documented with MGL-PAX, which all of you will most definitely want to use. My winning submission used bagged, cross-validated dropout neural networks with stratified splits, so that's where this code is coming from.
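MGL-RESAMPLE's API is Common Lisp, so the following is only a language-neutral sketch of what stratified splitting means: each fold preserves the class proportions of the whole dataset, which matters when one class (here, signal events) is rare. The function name and shape are hypothetical:

```python
from collections import defaultdict

def stratified_folds(labels, n_folds):
    """Split example indices into n_folds folds, keeping the class
    proportions roughly equal in every fold (stratification)."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        # Deal each class's examples round-robin across the folds.
        for k, i in enumerate(indices):
            folds[k % n_folds].append(i)
    return folds
```

Cross-validation then uses each fold in turn as the validation set and trains on the rest; bagging trains each constituent model on a bootstrap resample of its training split.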

Higgs Boson Machine Learning Challenge Post-Mortem

Actually, I'll only link to the post-mortem I wrote in the forum. There is also a model description included in the git repo. A stand-alone distribution with all library dependencies and an x86-64 Linux precompiled binary is also available.

This has been the Kaggle competition that attracted the most contestants, so it feels really good to come out on top, even though there was an element of luck involved due to the choice of evaluation metric and the amount of data available. The organizers did a great job explaining the physics and why there is no more data, motivating the choice of evaluation metric, and being prompt in communication in general.

I hope that the HEP guys will find this useful in their search for more evidence of tau tau decay of the Higgs boson. Note that I didn't go for the 'HEP meets ML Award', so training time is unnecessarily high (one day with a GTX Titan GPU). By switching to single precision floating point and a single neural network, training time could be reduced to about 15 minutes, with an expected drop in accuracy from 3.805 to about 3.750. Even with the bagging approach, the code logs out-of-bag estimates of the evaluation metric after training each constituent model, so the training process can be C-c'ed early. Furthermore, the model can be run on a CPU with BLAS, about 10 times slower than on a Titan.
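The early-stopping trick above relies on a property of bagging: each bootstrap sample leaves some examples out, so after each constituent model you get a free held-out estimate. A minimal sketch of that loop, with hypothetical `train` and `evaluate` callables standing in for the real training code:

```python
import random

def bag(train, evaluate, examples, n_models=5, seed=1):
    """Train an ensemble on bootstrap samples, logging an out-of-bag
    estimate after each constituent model so training can be stopped
    early once the estimate looks good enough."""
    rng = random.Random(seed)
    models = []
    for m in range(n_models):
        # Bootstrap: sample len(examples) examples with replacement.
        sample = [rng.choice(examples) for _ in examples]
        # Out-of-bag examples were never seen by this model.
        held_out = [e for e in examples if e not in sample]
        models.append(train(sample))
        # The OOB score of the ensemble so far approximates test
        # performance, so interrupting here loses little information.
        print(f"model {m + 1}: oob estimate = "
              f"{evaluate(models, held_out):.4f}")
    return models
```

Since the estimate is printed after every model, interrupting the run keeps all models trained so far as a smaller but usable ensemble.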