Monday, March 01 2010 @ 00:00 +0100
For what has been a fun ride the official results are now available. In the end, 11th out of 700 is not too bad and it's the highest ranking non-C++ entry by some margin.
I entered the contest a bit late with a rather specific approach in mind: UCT, an algorithm from the Monte Carlo tree search family. It has been rather successful in Go (and in Hex too, taking the crown from Six). So with UCT in mind, to serve as a baseline I implemented a quick minimax with a simple territory based evaluation function ... that everyone else in the competition seems to have invented independently. Trouble was looming because it was doing too well: with looking ahead only one move (not even considering moves of the opponent) it played a very nice positional game. That was the first sign that constructing a good evaluation function may not be as hard for Tron as it is for Go.
But with everyone else doing minimax, the plan was to keep silent and Monte Carlo to victory. As with most plans, it didn't quite work out. First, to my dismay, some contestants were attempting to do the same and kept advertising it on #googleai, second it turned out that UCT is not suited to the game of Tron. A totally random default policy kept cornering itself in a big area faster than another player could hit the wall at the end of a long dead end. That was worrisome, but fixable. After days of experimentation I finally gave up on it deciding that Tron is simply too tactical - or not fuzzy enough, if you prefer - for MC to work really well.
Of course, it can be that the kind of default policies I tried were biased (a sure thing), misguided and suboptimal. But then again, I was not alone and watched the UCT based players struggle badly. In the final standings the highest ranking one is jmcarthur in position 105. One of them even implemented a number of different default policies and switched between them randomly with little apparent success. Which makes me think that including a virtual strategy selection move at some points in the UCT search tree should be interesting, but I digress.
So I went back to minimax, implemented Alpha-beta pruning with principal variation, and iterative deepening. It seemed to do really well on the then current maps whose size was severely reduced to 15x15 to control the load on the servers. Then I had an idea to explore how the parities of squares in an area affect the longest path possible which was quickly pointed out to me over lunch by a friend. And those pesky competitors have also found and advertised it in the contest forum. Bah.
There were only two days left at this point and I had to pull an all nighter to finally implement a graph partitioning idea of mine that unsurprisingly someone has described pretty closely in the forum. At that point, I finally had the tool to improve the evaluation function but neither much time or energy remained and I settled for using it only in the end game where the players are separated.
The code itself is as ugly as exploratory code can be, but in the coming days I'll factor the UCT and the Alpha-beta code out.