Before reading this post it would help to understand the rules of go. Hugo-a-go-go follows (an approximation of) the chinese rules. Due to the limited time we don’t yet check for ko and don’t even attempt to detect the end of the game. The code at the moment is incredibly messy and probably very buggy (the version we submitted seems to actually try to lose) so treat it with suspicion.
The best place to start is with the board representation. The most expensive operation for the AI is detecting suicide and death. To make this fast we track connected strings of pieces.
colour is one of
:grey (for the border) or
liberties tracks the number of pseudo-liberties the string has (for black or white strings; for empty or grey strings the
liberties value is never read and exists just to avoid having to branch on the colour).
The board is represented by a 1d array of pointers to strings (this representation is inspired by gnugo rather than pushkin) and a pointer to the empty string (which we use for fast
1 2 3 4 5 6 7 8
To create a board we just have to setup the empty-string and border-string.
1 2 3 4 5 6 7 8 9 10 11 12
A given move is not suicide if, after the move is made, there is at least one neighbour which is either:
the same colour and has more than zero liberties
the opposite colour and has zero liberties (ie would die if the move was carried through)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
Actually making a move is similar but we have to clear out dead strings and join adjacent strings together. Proving that it’s safe to do all this in a single pass is straightforward, if tedious.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
Go branches far too much to exhaustively check all possible futures. Instead we use a heuristic measure of the value of a move – the Monte Carlo estimate of the expected score when both players choose from the set of valid moves uniformly at random. To put it simply, we run large numbers of random games from this board position and take the mean score as our measure of how strong this board position is. Since we don’t have a test for the end of the game we just run until either 100 moves have been made or until both sides have no valid moves remaining.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41
You may notice that the above code actually only runs until one side has no moves – this is the first of many bugs.
The scoring and random-move code was a huge bottleneck so at the last minute I ‘optimised’ it by changing it to:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
I think it is these two changes that are largely responsible for the submitted version playing so poorly – it doesn’t check for eyes in the random playouts, doesn’t allow the other player to keep killing strings when the ai player has no moves and doesn’t count eyes in the final score. This explains why it likes to tightly pack pieces against the edge of the board.
While the monte-carlo estimate gives us a reasonable heuristic for move strength it doesn’t re-use any information between passes. With such a large move space we need to explore more intelligently. The UCT algorithm treats move-selection like a multi-armed bandit problem.
We build a tree of moves where each node in the tree tracks not just the estimated score for all its child nodes but also the upper bound of a confidence interval on that estimate.
colour is the colour making the move at this node.
pos is the position at which it is moving.
nodes is a list of child nodes for which we have estimates.
valids is a list of valid moves which have not yet been converted into nodes.
sum track the mean score for all the children in
On each iteration we pick a path through the tree, choosing some explore/exploit tradeoff using the upper confidence bounds. Given the limited time we had, I decided to just copy a scoring function from a paper without stopping to understand it, so I don’t actually know what explore/exploit tradeoff we are making :S
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
On reaching a leaf we extend it by one more move, estimate the value of that move using monte-carlo simulations and then propagate the value back up the path to the top of tree.
1 2 3 4 5 6 7 8 9 10 11 12 13
Finally, the ai chooses its move by running a number of iterations of this algorithm and returning the value of
best-child at the root (this is probably wrong – at this point we should just exploit, not explore).
1 2 3 4 5 6
Together we spent around 20 man-hours on the competition. I spent the first two thirds of the competition just getting the board representation to work correctly. Part of the delay was that after moving to a cljs-only implementation the feedback loop was much slower. I wasted an hour or two tring to get brepl working without any success and after that had to rely on print statements and pre-compiled test cases. Finding errors in cljs also leaves a lot to be desired (eg a typo in a field name resulted in an
undefined value which, several functions later, became a
NaN which then behaves interestingly inside max/min). I only started on the UCT code an hour or two before the deadline. Tom started on the user input around the same time. We played our first game against the ai about five minutes before the deadline and frantically submitted whatever code we had running.
If we were taking it more seriously we certainly could have done a lot more to prepare – being familiar with the cljs dev tools, actually learning the rules of go, sketching out the board representation and the UCT implementation before the weekend started, not walking a marathon on the same weekend. But winning was not the goal and instead we had a lot of fun and excitement seeing just how much we can hack together in such a short space of time.
Our AI is definitely not correct so it’s difficult to evaluate the project yet. The code is relatively short and simple (especially compared to eg gnugo) but that doesn’t mean much until it actually works. The performance is promising – the current version can simulate around 5k games per second in chrome. Fixing the monte-carlo step and the scoring will eat into that performance but I’ve already spotted plenty of inefficiencies in other places. We haven’t even started experimenting with vertigo or asm.js yet so there is certainly lots of headroom.
I am definitely hoping to come back to this project. To echo Zach Tellman’s motivation, it will be really interesting to see if it is possible to write a competitive go AI in a high-level language. We’ve also thought about distributing the UCT step and have team games pitching the aggregated wisdom of a group of human players voting on their next move against the assembled computing power of their browsing machines.