Tuesday, September 23 2014 @ 00:00 +0200
Actually, I'll only link to the post-mortem I wrote in the forum. There is a also a model description included in the git repo. A stand-alone distribution with all library dependencies and an x86-64 linux precompiled binary is also available.
This has been the Kaggle competition that attracted the most contestants so it feels really good to come out on top even though there was an element of luck involved due to the choice of evaluation metric and the amount of data available. The organizers did a great job explaining the physics, why there is no more data, motivating the choice of evaluation metric, and being prompt in communication in general.
I hope that the HEP guys will find this useful in their search for more evidence of tau tau decay of the Higgs boson. Note that I didn't go for the 'HEP meets ML Award' so training time is unnecessarily high (one day with a GTX Titan GPU). By switching to single precision floating point and a single neural network, training time could be reduced to about 15 minutes with an expected drop in accuracy from 3.805 to about 3.750. Even with the bagging approach the code logs out-of-bag estimates of the evaluation metric after training each constituent model and the training process can be C-c'ed early. Furthermore, the model can be run on a CPU with BLAS about 10 times slower than on a Titan.