Wednesday, December 01 2010 @ 00:00 +0100
I can't believe I won.
I can't believe I won decisively at all.
The lead in the last month or so was an indicator of having good chances, but there was a huge shuffling of ranks in the last week and some last minute casualties.
I had promised myself not to enter this one and resisted for about two weeks when my defenses were worn away and I was drawn into the fray.
The game didn't look very exciting at first. I thought that the bots would soon reach a point of near perfect tactics and the rock-paper-scissors scenarios would dominate (more on this later).
That's enough of tribute, let's steer off the trodden path.
Driven by the first virtue of programmers I was going to approach the game in a non-labor-intensive fashion leaving most of the hard work to the machine. The second virtue was kept in check for a week while I was working out how to do that exactly. In the meantime, the third spurred me to take aerique's starter pack and to make myself comfortable with minor modifications to it.
As with tron, UCT was on my mind. However, I was keenly aware of failing to make it work acceptably last time. No matter how cool UCT was, it was hard to miss one important similarity to tron: the fitness function is very jagged, one ship more or less can make all the difference. Clearly, a naive random policy was not going to cut it.
Another problem was the practically unlimited branching factor. Without a similarity function over moves it was hopeless to explore a meaningful portion of the game tree.
At this point I had to start getting my hands dirty. The first thing
was to implement simulating the future (see
FUTURE class) that was
trivial except I screwed battle resolution up and for the longest time
it was holding results back. Think of a future as a vector of owner
and ship count over turns.
By watching some games it became apparent that multi-planet, synchronized attacks are the way to go. The implementation operates on step targets, steps and moves.
A step is a set of orders from the same player targeting the same planet. The constituent orders need not be for the same turn, neither do they need to arrive on the same turn.
A move is a set of orders from the same player without any restriction. That includes future orders too.
Move generation first computes so called step targets. A step target is a ship count vector over turns representing the desired arrivals. The desired arrivals are simply minimal reinforcements for defense and invasion forces for attack.
For each step target a number of steps can be found that produce the desired arrivals. In the current implementation there is a single step generated for a step target.
For a while my bot could only make moves that consisted of a single step, but it quickly became the limiting factor and strength testing of modifications was impossible.
Combining steps into moves turned out to be easy. Not all combinations are valid, but the number of combinations can be huge. To limit the number of moves generated we first evaluate steps one by one, sort them in descending order of evaluation score and try to combine them starting from the first.
Normally futures are calculated taking into account fleets already in
flight in the observable game state that the engine sends. Back when I
was still walking up and down instead of typing away furiously it
occurred to me that if for all planets of player 1 player 2 cannot
take that planet if both players sent all ships to it then player 2
cannot take any planet of player 1 even if he's allowed to attack
multiple planets in any pattern. Clearly, this breaks down at the
edges (simultaneous moves), but it was a useful idea that gave birth
FULL-ATTACK-FUTURE class. The intention was to base position
evaluation on the sum of scores of individual full attack futures (one
Now the problem with full attack future is that sending all available
ships away from a planet can invalidate some orders scheduled from
that planet for the future. Enter the concept (and class) of
The surplus of player P at planet A at time t is the number of ships that can be sent away on that turn from the defending army without:
- making any scheduled order from planet A invalid
- causing the planet to be lost anytime after that (observing only the fleets already in space)
- bringing an imminent loss closer in time
As soon as the full attack based position evaluation function was operational results started to come. But there was a crucial off-by-one bug.
That bug was in the scoring of futures. For player 1 it used the possible arrivals (number of ships) one turn before those of player 2. I made several attempts at fixing it but each time playing strength dropped like a stone.
Finally, a principled solution emerged: when computing the full attack
future from the surpluses constrain the turn of departures. That is,
to roughly duplicate the effect of the off-by-one bug, one could say
that surpluses of player 1 may not leave the planet before turn 1
(turn 0 is current game state) (see
MIN-TURN-TO-DEPART-1 in the
code). This provided a knob to play with. Using 1 for
MIN-TURN-TO-DEPART-1 made the bot actually prefer moves to just
sitting idly, using 2 made it prefer moves that needed no
reinforcement on the next turn.
I believe this is the most important one character change I made so
this gets its own paragraph. Using 2 as
the bot tend towards situations in which the rock-paper-scissors
nature of the game is suppressed. The same bot with 1 beats the one
with 2, but as was often the case, on tcp the results were just the
opposite. By a big margin. TCP is dhartmei's unofficial server is
where most useful testing took place.
Constraints were added for arrivals too (see
that eased scoring planets that started out neutral but were
non-neutral at the horizon by making the evaluation function sniping
Sniping is when one player takes a neutral losing ships in the process and the opponent comes - typically on the next turn - and takes it away. This game is a nice illustration of the concept.
As pointed out by iouri in his post-mortem, redistribution of ships is a major factor. The machinery described so far lends itself to easy implementation of redistribution.
When scoring a full attack future the scoring function gives a very slight positional penalty every simulated turn for every enemy ship. This has the effect of preferring positions where the friendly ships are near the enemy, and positions of influence with multiple enemy planets being threatened.
The move generator was modified to generate steps to each friendly planet from each friendly planet on turn 0 sending all the surplus at that turn. This scheme is rather restrictive, the more flexible solutions had mixed results.
There is a knob, of course, to control how aggressively ships are
redistributed. It's called
POSITIONAL-MIN-TURN-TO-DEPART-1. As its
name implies it's like
MIN-TURN-TO-DEPART-1 but used only when
computing the positional penalty.
How far ahead the bot looks has a very strong effect on its play: too far and it will be blind to tactics, too close and it will miss capturing higher cost neutrals.
Horizon was constant 30 for quite some time. I wanted to raise it but couldn't without seriously hurting close range fighting ability. After much experimentation with a slightly complicated mechanism the horizon was set so that the three earliest breakeven turns of safe to take neutrals are included. A neutral is deemed safe to take if from the initial investment until the breakeven point no friendly planet can be possibly lost in a full attack future.
There are - especially at the very beginning of games - situations were there is no best move, it all depends on what the opponent plays on the same turn.
If one has a number of candidate moves for each player and the score for any pair of them the optimal mixed strategy can be computed that's just a probability assigned to each move.
I tried and tried to make it work but it kept making mistakes that looked easy to exploit and although it did beat 1 ply minimax about 2 to 1 it was too slow to experiment with.
Yes, for the longest time it was a 1 ply search. Opponent moves were never considered and position evaluation was good enough to pick up the slack.
However, there was a problem. The evaluation function did not score planets that were neutral at the end of the normal future, because doing so made the bot just sit there doing nothing, getting high scores for all planets that could be conquered but when it tried to make a move it realized that it can conquer only one. Such is the nature of full attack based evaluation function, it was designed with complete disregard for neutrals.
The late change to the map generator increased the number of planets at an equal distance from the players and emphasized the rock-paper-scissors nature further. Some bots didn't like it, some took this turn of events better. Before this point my bot had a very comfortable lead on the official leaderboard which was greatly reduced.
With the failure of the nash experiment I resurrected previously unsuccessful alpha-beta code in hopes of that considering opponent moves will show the bot the error of it ways and force it to not leave valuable central planets uncovered.
It's tricky to make alpha-beta work with moves that consist of orders at arbitrary times in the future. I had all kinds of funky, correct and less correct ways to execute orders at different depths of the search. In the end what prevailed was the most simple-minded, incorrect variant that simply scheduled all orders that made up the move (yes, even the future ones) and fixed things up when computing the future so that ship counts stayed non-negative and sending enemy ships did not occur.
In local test against older versions of my bot a two ply alpha-beta bot showed very promising results but when it was tested on tcp it fell way short of the expectations and performed worse than the one ply bot. It seemed particularly vulnerable to a number of bots. In retrospect, I think this was because their move generator was sufficiently different that my bot was just blind to a good range of real possibilities.
In the end, I settled for using four ply alpha-beta for the opening phase (until the third planet was captured). This allowed the bot to outwait opponents when needed and win most openings. After the final submission I realized that maybe I was trying to push things the wrong way and even three planets is too many. With six hours left until the deadline in a test against binaries of a few fellow competitors the two planet limit seemed to perform markedly better, but it was too late to properly test it against a bigger population.
Like many fellow contestants I am very happy that the contest is over and I got my life back. I'm sure that many families breathed a collective sigh of relief. But if I were to continue I'd try rethinking the move generator, because that may just be the thing that holds alpha-beta back and maybe nash too.
Dissapointingly, there was no learning, adapting to opponent behaviour, etc. All that made it to the todo list, but had to take second seat to more pressing concerns.
Ah, yes. One more thing. Bocsimackó (pronounced roughly as bo-chee-mats-ko), after whom the bot was named, is the handsome hero of a children's book, pictured on the left: