TLDR of Argmin's Summary of Half of the Meehl Lectures
Tags: ai, pompousness, Date: 2024-05-22
Over at argmin.net, Ben Recht is reflecting on Meehl's lectures on the metatheory of science, which is about how science progresses. The original lectures are fascinating but also long and long-winded, and I found Ben's blog series a much better read (especially since the originals are video recordings). Still, at the time of writing, with 13 blog posts covering less than half of the lectures (5/12), no self-respecting 21st-century scientist can risk the time investment (equivalent to publishing 0.25 papers in machine learning) or – even worse – getting slowed down by methodological considerations.
So, here is my TLDR for the busy professional: it's all Bayes and incentives. There is no silver bullet method, and while we do questionable things for all the wrong reasons, time will clean up any mess that we make anyway.
Expanding that for slightly longer attention spans:
Theories can be disproved but cannot be proved.
We can only accumulate evidence that supports a theory.
Evidence is subjective.
All theories are wrong, but some are useful.
The utility of a theory is its only grounding in reality.
At this point, the rest is somewhat predictable; my armchair is like any other. But if you can tolerate examples and spelling out implications, read on.
We want to run convincing experiments, but what is convincing to someone depends on their beliefs about the possible hypotheses, results and what they know about our methodology (never fully specified) and us (e.g. motivations, beliefs, funding).
We choose experiments to rule out large swaths of the hypothesis space weighted by our beliefs, which might bear little resemblance to others' beliefs.
If the hypothesis space is large and we don't have very strong beliefs, it may be that we don't even think in terms of hypotheses. Instead, we may think in terms of probability of results (as if we marginalized out the hypotheses). "Hey, these results hold to 37 decimal places! What do you think the chances of that are if our model was wrong?"
The entire process that produced a result is considered in belief updates. This includes the researcher, the machinery, the funding agency, the organization, etc.
With so many factors, there is always room for different interpretations of results. Eventually, theories die when they are no longer useful (for any purpose).
Classical formal logic has limited use in this setting. It seems to be all Bayes with a bit of decision/game theory thrown in.
In these Bayesian belief updates, perceived incentives play a prominent role: many results are downweighted because we know the twisted academic and applied research incentive structure.
I believe improving the incentives is the most important contribution one can make in today's world. Now, go and read Ben's posts.
Practitioner's Guide to Two-Tailed Averaging
Tags: ai, Date: 2022-12-06
This is a complement to the Two-Tailed Averaging paper, approached from the direction of what I think is a fairly common technique: averaging checkpoints.
We want to speed up training and improve generalization. One way to do that is by averaging weights from optimization, and that's a big win (e.g. 1, 2, 3). For example, while training a language model for the downstream task of summarization, we can save checkpoints periodically and average the model weights from the last 10 or so checkpoints to produce the final solution. This is pretty much what Stochastic Weight Averaging (SWA) does.
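To make the baseline concrete, here is a minimal sketch of averaging the last few checkpoints (the function and argument names are made up, and this is only the plain checkpoint-averaging picture, not the paper's two-tailed algorithm):

;; Average the last K checkpoints elementwise (names made up, not the
;; paper's algorithm). CHECKPOINTS is a list of equal-length weight
;; vectors, most recent first.
(defun average-last-checkpoints (checkpoints k)
  (let* ((tail (subseq checkpoints 0 (min k (length checkpoints))))
         (n (length tail))
         (average (make-array (length (first tail)) :initial-element 0)))
    (dolist (weights tail average)
      ;; Add this checkpoint's contribution to the running mean.
      (map-into average (lambda (sum w) (+ sum (/ w n))) average weights))))

The averaged weights are what gets evaluated and shipped; training itself keeps running on the unaveraged weights.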
... read the rest of Practitioner's Guide to Two-Tailed Averaging.
On the Design of Matrix Libraries
Tags: ai, lisp, Date: 2015-02-26
UPDATE: 2020-05-03 – Things have changed during the last 5 years. This is a non-issue in TensorFlow and possibly in other frameworks as well.
I believe there is one design decision in MGL-MAT that has far-reaching consequences: to make a single matrix object capable of storing multiple representations of the same data and let operations decide which representation to use based on what's the most convenient or efficient, without even having to know about all the possible representations.
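To make the idea concrete, here is a toy sketch (the names are hypothetical, not MGL-MAT's actual API): one object holds several representations, or views, of the same data, and each is created lazily when an operation first asks for it.

;; Toy sketch of the single-object, multiple-representation idea
;; (hypothetical names, not MGL-MAT's actual API).
(defclass toy-mat ()
  ;; Map from representation kind (e.g. :LISP-ARRAY, :FOREIGN, :CUDA)
  ;; to that view of the same underlying data.
  ((views :initform (make-hash-table) :accessor views)))

(defun view (mat kind make-fn)
  "Return MAT's KIND view, creating it with MAKE-FN if it doesn't exist yet."
  (or (gethash kind (views mat))
      (setf (gethash kind (views mat)) (funcall make-fn mat))))

An operation then calls VIEW with the kind it prefers (and a conversion function for when that view is missing) and never needs to know what other representations exist; a real version would also have to track which views are up to date.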
... read the rest of On the Design of Matrix Libraries.
Recurrent Nets
Tags: ai, lisp, Date: 2015-01-19
I've been cleaning up and documenting MGL for quite some time now, and while it's nowhere near done, a good portion of the code has been overhauled in the process. There are new additions such as the Adam optimizer and Recurrent Neural Nets. My efforts were focused mainly on the backprop stuff, and I think the definition of feed-forward:
... read the rest of Recurrent Nets.
Higgs Boson Challenge Bits and Pieces
Tags: ai, lisp, Date: 2014-09-23
The Higgs Boson contest on Kaggle has ended. Sticking to the word I gave at ELS 2014, I released some code that came about during those long four months.
... read the rest of Higgs Boson Challenge Bits and Pieces.
Higgs Boson Challenge Post-Mortem
Tags: ai, lisp, Date: 2014-09-23
Actually, I'll only link to the post-mortem I wrote in the forum. There is also a model description included in the git repo. A stand-alone distribution with all library dependencies and an x86-64 Linux precompiled binary is also available.
This has been the Kaggle competition that attracted the most contestants, so it feels really good to come out on top, although there was an element of luck involved due to the choice of evaluation metric and the amount of data available. The organizers did a great job explaining the physics, why there is no more data, motivating the choice of evaluation metric, and being prompt in communication in general.
... read the rest of Higgs Boson Challenge Post-Mortem.
Liblinear Support Added to cl-libsvm
Tags: ai, lisp, Date: 2013-04-09
In addition to the cl-libsvm asdf system, there is now another asdf system in the cl-libsvm library: cl-liblinear, which, predictably enough, is a wrapper for liblinear. The API is similar to that of cl-libsvm.
Stackoverflow Post-Mortem
Tags: ai, lisp, Date: 2013-04-09
After almost two years without a single competition, last September I decided to enter the Stackoverflow contest on Kaggle. It was a straightforward text classification problem with extremely unbalanced classes.
... read the rest of Stackoverflow Post-Mortem.
Alpha–Beta
Tags: ai, lisp, Date: 2010-12-27
It hasn't even been a year since I first promised that alpha–beta snippet, and it has already been added to Micmac in all its 35-line glory. The good thing about not rushing it out the door is that it saw a bit more use. For a tutorialish tic-tac-toe example, see test/test-game-theory.lisp.
The logging code in the example produces output that is suitable for cutting and pasting into an org-mode buffer and exploring by TABbing into subtrees to answer the perpetual 'What the hell was it thinking?!' question.
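For reference, the core of the algorithm really does fit in that kind of space; here is a generic negamax-style sketch (not Micmac's actual interface, with the game-specific functions supplied by the caller):

;; Generic negamax-style alpha-beta sketch (not Micmac's interface).
;; LIST-MOVES returns the legal moves of STATE, PLAY returns the state
;; after a move, and EVALUATE scores a position for the player to move.
(defun alpha-beta (state depth alpha beta list-moves play evaluate)
  (let ((moves (funcall list-moves state)))
    (if (or (zerop depth) (null moves))
        (funcall evaluate state)
        (dolist (move moves alpha)
          (let ((value (- (alpha-beta (funcall play state move) (1- depth)
                                      (- beta) (- alpha)
                                      list-moves play evaluate))))
            (when (> value alpha)
              (setf alpha value))
            ;; The opponent would never allow this line: prune.
            (when (>= alpha beta)
              (return alpha)))))))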
Nash Equilibrium Finder
Tags: ai, lisp, Date: 2010-12-26
While I seem to be unable to make my mind up on a good interface to alpha–beta with a few bells and whistles, I added a Nash equilibrium finder to Micmac, which is becoming less statistics oriented. This was one of the many things in Planet Wars that never really made it.
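To give a flavour of the simplest case (the textbook closed form, not Micmac's API): in a 2×2 zero-sum game with no pure-strategy saddle point, the row player's equilibrium mix can be computed directly.

;; Row player's equilibrium probability of playing the first row in a
;; 2x2 zero-sum game with payoff matrix ((A B) (C D)), assuming there
;; is no pure-strategy saddle point so the denominator is non-zero.
(defun row-mixing-probability (a b c d)
  (/ (- d c) (+ a d (- b) (- c))))

;; Matching pennies, payoffs 1 -1 / -1 1: mix 50-50, as expected.
(row-mixing-probability 1 -1 -1 1) ; => 1/2

Larger games need an actual solver (fictitious play, linear programming, and the like), which is where a library routine earns its keep.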
... read the rest of Nash Equilibrium Finder.
Planet Wars Post-Mortem
Tags: ai, lisp, Date: 2010-12-01
I can't believe I won.
Even less can I believe that I won decisively.
Holding the lead for the last month or so indicated good chances, but there was a huge shuffling of ranks in the final week and some last-minute casualties.
... read the rest of Planet Wars Post-Mortem.
Important Update to the Planet Wars Starter Package
Tags: ai, lisp, Date: 2010-10-25
First, is it possible to get something as simple as RESOLVE-BATTLE wrong? Apparently, yes. That's what one gets for trying to port Python code that's pretty foreign in the sense of being far from the way I'd write it.
... read the rest of Important Update to the Planet Wars Starter Package.
Planet Wars Common Lisp Starter Package Actually Works
Tags: ai, lisp, Date: 2010-09-21
Released v0.6 (git, latest tarball). The way the server compiles lisp submissions was fixed, and this revealed a problem where MyBot.lisp redirected *STANDARD-OUTPUT* to *ERROR-OUTPUT*, causing the server to think compilation failed.
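The kind of redirection at fault looks like this (an illustration only, not the actual MyBot.lisp code):

;; After this, everything printed during loading lands on the error
;; stream, which the server takes as a sign that compilation failed.
(setf *standard-output* *error-output*)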
Planet Wars Common Lisp Starter Package
Tags: ai, lisp, Date: 2010-09-19
The Google AI Challenge is back with a new game that's supposed to be much harder than Tron was this spring. The branching factor of the game tree is enormous, which only means that straight minimax is out of the question this time around. Whether some cleverness can bring the game within reach of conventional algorithms remains to be seen.
... read the rest of Planet Wars Common Lisp Starter Package.
UCT
Tags: ai, lisp, Date: 2010-03-19
As promised, my UCT implementation is released, albeit somewhat belatedly. It's in Micmac v0.0.1; see test/test-uct.lisp for an example. Now I only owe you alpha–beta.
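If you only want the gist without reading the code: the part that makes UCT tick is the UCB1 child-selection rule. A generic sketch (not Micmac's interface; nodes are represented here as plists):

;; Pick the child maximizing value + c * sqrt(ln(parent visits) / visits),
;; trying unvisited children first. A generic sketch, not Micmac's API.
(defun select-child (children parent-visits &key (c (sqrt 2)))
  (let ((best nil) (best-score nil))
    (dolist (child children best)
      (let ((score (if (zerop (getf child :visits))
                       most-positive-double-float
                       (+ (getf child :value)
                          (* c (sqrt (/ (log parent-visits)
                                        (getf child :visits))))))))
        (when (or (null best-score) (> score best-score))
          (setf best child best-score score))))))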
Google AI Challenge 2010 Results
Tags: ai, lisp, Date: 2010-03-01
For what has been a fun ride, the official results are now available. In the end, 11th out of 700 is not too bad, and it's the highest-ranking non-C++ entry by some margin.
... read the rest of Google AI Challenge 2010 Results.
Google AI Challenge 2010
Tags: ai, lisp, Date: 2010-02-11
Tron is a fun little game of boxing out the opponent and avoiding crashing into a wall first. The rules are simple, so the barrier to entry into this contest is low. Thanks to aeruiqe, who made the Common Lisp starter pack, it took as little as a few hours to get a very bare-bones algorithm going. It's doing surprisingly well: it is number 23 on the leaderboard at the moment with 43 wins, 2 losses and 9 draws.
Micmac Initial Release
Tags: ai, lisp, Date: 2010-02-06
From a failed experiment today, I salvaged Micmac, a statistical library wannabe, which for now only has Metropolis-Hastings MCMC and Metropolis Coupled MCMC implemented. The code doesn't weigh much, but I think it gets the API right. In other news, MGL v0.0.6 was released.
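For the curious, a single Metropolis-Hastings update with a symmetric proposal is tiny (a sketch, not Micmac's API):

;; One Metropolis-Hastings step. LOG-DENSITY is the unnormalized log
;; target, PROPOSE draws a candidate given the current state (assumed
;; symmetric). A sketch, not Micmac's API.
(defun metropolis-step (state log-density propose)
  (let* ((candidate (funcall propose state))
         (log-ratio (- (funcall log-density candidate)
                       (funcall log-density state))))
    ;; Accept with probability min(1, exp(log-ratio)).
    (if (< (random 1.0d0) (exp (min 0.0d0 log-ratio)))
        candidate
        state)))

Metropolis Coupled MCMC runs several such chains at different temperatures and occasionally proposes swapping their states.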
Deep Boltzmann Machine on MNIST
Tags: ai, lisp, Date: 2010-01-18
Let me interrupt the flow of the MGL introduction series with a short report on what I learnt playing with Deep Boltzmann Machines. First, lots of thanks to Ruslan Salakhutdinov, then at the University of Toronto, now at MIT, for making the Matlab source code for the MNIST digit classification problem available.
... read the rest of Deep Boltzmann Machine on MNIST.
Introduction to MGL (part 3)
Tags: ai, lisp, Date: 2009-12-29
UPDATE: This post is out of date with regard to the current MGL. Please refer to the documentation instead.
In Introduction to MGL (part 2), we went through a trivial example of a backprop network. I said before that the main focus is on Boltzmann Machines, so let's kill the suspense here and now by cutting straight to the heart of the matter.
... read the rest of Introduction to MGL (part 3).
Introduction to MGL (part 2)
Tags: ai, lisp, Date: 2009-12-17
UPDATE: This post is out of date with regard to the current MGL. Please refer to the documentation instead.
After Introduction to MGL (part 1), today we are going to walk through a small example and touch on the main concepts related to learning within this library.
... read the rest of Introduction to MGL (part 2).
Introduction to MGL (part 1)
Tags: ai, lisp, Date: 2009-12-02
This is going to be the start of an introduction series on the MGL Common Lisp machine learning library. MGL focuses mainly on Boltzmann Machines (BMs). In fact, the few seemingly unrelated things it currently offers (gradient descent, conjugate gradient, backprop) are directly needed to implement the learning and fine-tuning methods for different kinds of BMs. But before venturing too far into specifics, here is a quick glimpse at the bigger picture and the motivations.
... read the rest of Introduction to MGL (part 1).
Active Learning for cl-libsvm
Tags: ai, lisp, Date: 2009-06-22
Along the lines of active learning with python & libsvm, I added support for calculating the distance of a point from the separating hyperplane to cl-libsvm. In binary classification, there is only one SVM involved and one hyperplane. However, with N-class problems, there is a binary SVM for each of the $N(N-1)/2$ pairs of classes, and there are as many separating hyperplanes, something the linked Python code fails to take into account. As per the libsvm FAQ, the distance is the absolute value of the decision value (see PREDICT-VALUES, a wrapper of svm_predict_values) divided by the norm of the normal vector of the separating hyperplane. PREDICT-VALUES and MODEL-W2S are sufficient to calculate it. Note that among the distributed binaries only the linux-x86 version has been recompiled with the necessary changes, but patched sources are also included for your recompiling pleasure.
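Schematically, the per-hyperplane computation is just this (the exact shapes returned by PREDICT-VALUES and MODEL-W2S are assumptions here):

;; Distance of a point from one pairwise separating hyperplane:
;; |decision value| / ||w||, where W2 is the squared norm of that
;; hyperplane's normal vector.
(defun hyperplane-distance (decision-value w2)
  (/ (abs decision-value) (sqrt w2)))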
2008 Computer Games Olympiad
Tags: ai, Date: 2008-12-11
It seems that the competition has not been standing still (as opposed to Six), and this year marks the end of the golden era. Congratulations to both Wolve and MoHex, who beat Six! Thanks to Ryan Hayward, who once again kindly registered Six for the Olympiad.
As for the future, I don't really plan on resuming work on Hex in general (or Six in particular), although losing does irk me a bit.