Is Google’s Go-bot a Breakthrough for Artificial Intelligence?

If you know Betteridge’s Law of Headlines, you immediately know the answer is “no.” The answer, much like kids, is not really the interesting part, however; what is interesting is the process of getting there.

Go, like chess and most other games, has been conceptually solved for many years. So many that I, as a freshman at university 16 years ago, was talking about how a fairly simple distributed effort like the then-popular distributed.net, SETI@Home, and Folding@Home could easily kill chess. I instead went on to drink beer, and shortly thereafter IBM pretty much did just that with Deep Blue (along with its predecessor, Deep Thought).

Solving a game is conceptually very simple. So simple, in fact, that let’s do exactly that. Go and chess are still too complex to solve using simple means, so let’s instead go with a simpler game: tic-tac-toe. Tic-tac-toe should be known to most and is about two players marking fields on a grid; the first to make a line wins. The players can be designated cross and naught or, as we’ll do here, red and blue. That’s just practical because apparently people who don’t deal with graph algorithms find it weird when I talk about the letter K or “cross” as a color, much like they seem to roll their eyes when I talk about uppercase and lowercase red. People are weird. A situation in the game could look like this:

[Image: a tic-tac-toe position mid-game]

If the next move belongs to red, they may place a token as such and win:

[Image: red places a token, completing a line and winning]

The rules are simple: the players alternate picking a blank field and placing their token on that field. The game ends when there are no spots left or when one player has three tokens in a row, thereby winning the game. The rules are so simple, in fact, that we can easily model them on a computer:

[Image: the colored Petri net model of tic-tac-toe in CPN Tools]

We use a simple colored Petri net made in CPN Tools to model the game. At the center we have the board; it contains 9 fields modeled by their position (e.g., (1, 2)) and state (currently all are empty (E)). At the top left we have player behavior. We keep track of the Next player (currently red). Either the Red or Blue player can make a move. A move is performed by picking any field (f) that is empty (E) and instead marking it as their field (P(red) or P(blue)). We can see that the next move can be performed by red, as Red has a green halo. If red places their token in the top-left field, we get this state:

[Image: the model after red’s token is placed in the top-left field]

The Board has been updated and it is now blue’s turn.  The game continues and we get the state from the top image:

[Image: the model in the state from the first board image]

Finally red deals the finishing blow after what has been a really dumb game:

[Image: the model after red’s winning move]

Now, the Win action at the bottom right is enabled. It picks 3 fields at random and checks that the fields are marked by the same player and constitute a winning line. There are many ways of doing this; the way shown here was chosen for readability and is very inefficient. Ignoring that, we just record the Winner and clear the board so no more moves can be made:

[Image: the model with the Winner recorded and the board cleared]

It is important that we have just modeled the rules. The players have no intelligence. If you do a play-through to get to the situation above, you will see that both players would have to make really dumb moves to get there. Except the moves proved not to be dumb for red, as they won.

If you want to play around with the model, get CPN Tools from here and the model here.
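If Petri nets aren’t your thing, the rules translate directly to code. Here is a rough Python sketch of the same model; it is my own translation, not part of the CPN model, and all the names (initial_state, play, winner, and so on) are invented for illustration:

```python
from itertools import product

EMPTY, RED, BLUE = "E", "red", "blue"

# All 8 winning lines: 3 rows, 3 columns, and 2 diagonals.
LINES = (
    [[(r, c) for c in range(1, 4)] for r in range(1, 4)]
    + [[(r, c) for r in range(1, 4)] for c in range(1, 4)]
    + [[(i, i) for i in range(1, 4)], [(i, 4 - i) for i in range(1, 4)]]
)

def initial_state():
    """All 9 fields empty; red moves first (the Next place in the net)."""
    board = {f: EMPTY for f in product(range(1, 4), repeat=2)}
    return board, RED

def legal_moves(board):
    """A move picks any field that is still empty."""
    return [f for f, owner in board.items() if owner == EMPTY]

def play(board, player, field):
    """Mark an empty field for the player and hand over the turn."""
    assert board[field] == EMPTY
    new_board = dict(board)
    new_board[field] = player
    return new_board, (BLUE if player == RED else RED)

def winner(board):
    """Like the Win action: the player owning a full line, or None."""
    for line in LINES:
        owners = {board[f] for f in line}
        if len(owners) == 1 and owners != {EMPTY}:
            return owners.pop()
    return None
```

Note that the win check does the same naive thing as the model: it simply tries all the lines.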

The thing is that we can now use this model to have the computer play out all possible games. It’s not even difficult; it took my computer just under 6 minutes for this inefficient model on a five-year-old laptop.

Solving the game consists of making a graph that contains all states of the game as nodes and all possible moves as arcs. For a simplified version of the game with just a 2 * 2 grid, the state space looks like this:

[Image: the state space of 2 * 2 tic-tac-toe]

The game starts at the left; first we have 4 possible red moves corresponding to red being able to pick any of the initial 4 empty fields. We have only expanded one of them, and we see that blue then has 3 possible moves. After each blue move, red has two possible moves. At the very right, we see the only possible final state, with red as winner. This means that for this initial move, regardless of what blue does, the only possible outcome is that red wins. The same holds for the other 3 possible initial moves for red, so this game is solved, and the solution is: no matter what either player does, red will win.
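In the Python sketch above, playing out all possible games is a short breadth-first exploration. Again a sketch with invented names, not how CPN Tools actually computes its state spaces:

```python
from collections import deque

def key(board, player):
    """Hashable representation of a state; this is the node's identity."""
    return (tuple(sorted(board.items())), player)

def explore():
    """Build the full game graph: all reachable states, all possible moves.

    States where someone has won get no successors, matching the model,
    which clears the board once Win has fired.
    """
    start = initial_state()
    graph = {}                      # node -> list of successor states
    seen = {key(*start)}
    queue = deque([start])
    while queue:
        board, player = queue.popleft()
        successors = []
        if winner(board) is None:
            for field in legal_moves(board):
                succ = play(board, player, field)
                successors.append(succ)
                if key(*succ) not in seen:
                    seen.add(key(*succ))
                    queue.append(succ)
        graph[key(board, player)] = successors
    return graph
```

For the full 3 * 3 game, this should give a node count in the same ballpark as the numbers quoted below; the exact figure depends on how you model the end of the game.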

The reason I didn’t skip directly to regular 3 by 3 tic-tac-toe is that it is a bit too large to display in full. The simplified game has a graph of 32 nodes, while the full game has 5,480 nodes. That is nothing for a computer, but a bit much for a human. Here’s an extract of what happens if red starts by putting a token at (3, 3):

[Image: an extract of the state space after red opens at (3, 3)]

Even that is too big to get much information out of it, but we can see the general structure.  At the top right corner, we see the start of the game.  Then we alternate between red and blue moves.  A filled node indicates that one player has won.  If you like looking at very large graphs, you can get the entire solution to tic-tac-toe right here.

Let us take a closer look at just one part of the graph before we move on; this is the lower left corner of the extract above:

[Image: a close-up of the lower-left corner of the extract]

We see that N5412 at the bottom is winning for red. That actually means that N5032, N5126, N5125, and N5127 are also winning for red: from each of those, red can make a single move leading to a node where it has won. N4155 is not winning for red. Even though it is possible to make a blue move to N5125, which is winning for red, it is also possible to make a blue move to N5137, which is winning for blue. That in fact makes N4155 winning for blue. The state shown on the tic-tac-toe board at the top is winning for red because, by making a particular red move, red can win.

It is possible to extend this to a full algorithm (a state is winning for red if it is possible to make a red move to a state that is winning for red, or if all blue moves lead to states that are winning for red), which can be computed efficiently using such a graph. This corresponds to checking a particular formula of the modal logic CTL (Computation Tree Logic). If we fill in all nodes like that, we always just have to make a choice that leads us to (or, failing that, towards) a winning state. There’s a bit more to it – a state might not be winning for either player, and we can introduce various states in-between and prioritize which ones we want to move towards – but this should suffice for now.
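Here is that labeling as a naive fixpoint over the graph from explore() above; a real model checker computes this far more cleverly, but the idea is the same:

```python
def winning_states(graph, player):
    """Label every node from which `player` can force a win.

    A node is winning for `player` if `player` has already won there,
    or it is `player`'s turn and SOME successor is winning,
    or it is the opponent's turn and ALL successors are winning.
    """
    winning = set()
    changed = True
    while changed:
        changed = False
        for node, successors in graph.items():
            if node in winning:
                continue
            fields, to_move = node
            board = dict(fields)
            if winner(board) == player:
                is_win = True
            elif not successors:
                is_win = False      # a draw, or the opponent won
            elif to_move == player:
                is_win = any(key(*s) in winning for s in successors)
            else:
                is_win = all(key(*s) in winning for s in successors)
            if is_win:
                winning.add(node)
                changed = True
    return winning
```

Calling winning_states(explore(), RED) then tells you, for every position, whether red can still force a win.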

This method is entirely generic and can be applied to many games. You might notice in the graphs above that, e.g., node N127 at the bottom right has multiple nodes leading into it, such as N4128 and N4126. This reflects that it doesn’t matter whether blue, e.g., puts a token in (1, 1) and then (2, 2) or the other way around. This kind of sharing makes the computation more efficient and also allows us to handle some games that are not necessarily finite (like chess without the rules about draws).

The problem is one of size. For tic-tac-toe, we got 32 and 5,480 nodes for the solutions. How big would a solution be for chess or go? Well, we can’t know that without computing it, but we can make an estimate. To illustrate, let us assume we didn’t know the results for tic-tac-toe and make an estimate for them (I did in fact make that very estimate before running the computation, to ensure it wouldn’t take too long).

In tic-tac-toe, each field can have 3 different values: empty, red token, or blue token. The entire board can therefore be in 3 * 3 * 3 * 3 = 3^4 = 81 configurations for 2 * 2 tic-tac-toe. For 3 * 3 tic-tac-toe, the answer is 3^9 = 19,683. There are of course configurations that are not possible (such as an all-blue board or a board where both players have won – think about that last one), which is why the real results are a bit lower at 32 and 5,480.

For chess there are 6 pieces on each side plus an empty spot, giving 13 possible values per square, so an upper bound would be 13^64 = 196,053,476,430,761,073,330,659,760,423,566,015,424,403,280,004,115,787,589,590,963,842,248,961 or roughly 2 followed by 71 zeros. There is a bit more complexity, such as whether castling is still possible, which makes the number larger, but also simplifications, such as the impossibility of having a board with 64 white kings. We can improve the estimate, but we’ll still arrive at a number that’s very, very large.

For go, we have a board of 19 * 19 with 3 possible values per point, for a total of 3^(19 * 19) = 3^361 = 17,408,965,065,903,192,790,718,823,807,056,436,794,660,272,495,026,354,119,482,811,870,680,105,167,618,464,984,116,279,288,988,714,938,612,096,988,816,320,780,613,754,987,181,355,093,129,514,803,369,660,572,893,075,468,180,597,603 or roughly 2 followed by 172 zeros. It is more than one googol times as complex as chess.
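These upper bounds are easy to reproduce; Python’s arbitrary-precision integers make the sanity check trivial:

```python
# Upper bounds on the number of board configurations, as estimated above.
estimates = {
    "2 * 2 tic-tac-toe": 3 ** (2 * 2),    # 81
    "3 * 3 tic-tac-toe": 3 ** (3 * 3),    # 19,683
    "chess": 13 ** 64,                    # 6 piece types per side + empty
    "go": 3 ** (19 * 19),                 # empty, black, or white per point
}
for game, n in estimates.items():
    print(f"{game}: about 10^{len(str(n)) - 1} configurations")
```

This prints roughly 10^1, 10^4, 10^71, and 10^172, matching the numbers above.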

One estimate of the number of atoms in the universe is 10^82, or 1 followed by 82 zeros. We see that the number of states for chess is worryingly close to that, and the number of states for go vastly exceeds the number of atoms in the universe, so we could never hope to solve it using the naive approach above.

For our tic-tac-toe example, one thing we notice is that the board is symmetrical: we can rotate and mirror it. That way, our simplified 2 * 2 tic-tac-toe actually only has one opening move (the others can be reached by rotation), and there are only two possible follow-ups (block linearly or diagonally), each with just two responses. This reduces the number of nodes from 32 down to 9. We can do similar reductions for the full 3 * 3 game; for example, there are really only 3 opening moves (corner, center, or between two corners). This doesn’t help much for chess, but it does reduce go. It’s not enough, though.
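As a sketch of how one might exploit this, here is a canonicalization of the 3 * 3 board from the earlier Python sketch under the 8 symmetries of the square; deduplicating on canonical(board) instead of the raw board during exploration is what buys the reduction:

```python
def symmetries():
    """The 8 symmetries of a 3 * 3 board: 4 rotations, each optionally mirrored."""
    def rotate(f):
        r, c = f
        return (c, 4 - r)           # 90-degree rotation
    def mirror(f):
        r, c = f
        return (r, 4 - c)           # horizontal flip
    ts, t = [], lambda f: f
    for _ in range(4):
        ts.append(t)
        ts.append(lambda f, t=t: mirror(t(f)))
        t = lambda f, t=t: rotate(t(f))
    return ts

def canonical(board):
    """The lexicographically smallest variant; boards that are rotations
    or mirror images of one another get the same representative."""
    return min(tuple(sorted((t(f), owner) for f, owner in board.items()))
               for t in symmetries())
```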

Next, we can simply not compute everything. Just compute “some” moves ahead; there is no reason to store moves after a corner opening if we’re going to open in the center in tic-tac-toe. Modern computers can evaluate millions of positions a second, so they can compute several moves ahead. The hard part is the middle of the game, where there are a lot of positions and every move discards a lot of possible positions. We only compute that when needed, and perhaps store a database of known good opening and ending moves (there are fewer possible moves at the beginning and the end, so it is viable to store all of those – compare with the full graph for tic-tac-toe to see the same tendency).

There is no reason to consider what happens if the opponent makes a stupid mistake, because they likely won’t (and if they do, we can probably beat them easily). The key to succeeding here is how well we recognize which positions look good and which look bad; we can then focus our attention on computing moves for promising positions.

And that’s how almost every chess computer works: use databases of opening and endgame moves, and for the middle part of the game use a fast computer to evaluate millions of positions a second, together with an intelligent evaluation function that focuses the effort on promising positions and rules out stupid ones. That’s also how go computers work, and pretty much any non-cheating computer playing simple games.
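A sketch of that middle part on our tic-tac-toe model: depth-limited minimax (here in negamax form) with a toy evaluation function standing in for the intelligent one. Real engines add alpha-beta pruning and vastly better heuristics:

```python
def evaluate(board, player):
    """Toy heuristic: lines still open to us minus lines still open to them."""
    opponent = BLUE if player == RED else RED
    def open_lines(p):
        return sum(1 for line in LINES
                   if all(board[f] in (EMPTY, p) for f in line))
    return open_lines(player) - open_lines(opponent)

def negamax(board, player, depth):
    """Search `depth` moves ahead; past that, trust the evaluation function."""
    w = winner(board)
    if w is not None:
        return 1000 if w == player else -1000
    moves = legal_moves(board)
    if not moves:
        return 0                    # a draw
    if depth == 0:
        return evaluate(board, player)
    opponent = BLUE if player == RED else RED
    return max(-negamax(play(board, player, f)[0], opponent, depth - 1)
               for f in moves)

def best_move(board, player, depth=4):
    """Pick the move with the highest score when searching `depth` ahead."""
    opponent = BLUE if player == RED else RED
    return max(legal_moves(board),
               key=lambda f: -negamax(play(board, player, f)[0], opponent, depth - 1))
```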

What makes go hard is that on a 19 * 19 board, there are 361 opening moves, 360 moves after that, and 359 after that; 361 * 360 * 359 is almost 47 million evaluations just to compute 3 moves ahead in full (divided by something to take symmetries into account). Relying on raw computation is therefore going to lead to thinking very few steps ahead, so the evaluation function becomes much more important.

The evaluation function can be given hints by an expert player or learned by observing games. Google’s go computer did both: it had one of the world’s top players assist in improving the evaluation function using his knowledge, and it observed millions of games, both historical games between human players and games it played against itself, to see what leads to winning.

The thing with these games is that seeing just one or two moves further ahead than your opponent allows you to see and exploit a mistake before they do. Computers getting faster buys more lookahead over time, but as the complexity of the game grows faster than the speed improvements, that avenue is very slow (technically, computers get exponentially faster over time – speed doubles every 18 months according to a misinterpretation of Moore’s law – but the number of positions you have to evaluate also grows exponentially as you add a move; for go, the number of positions grows by a factor of 200-400 for every move you add). Slight improvements in the evaluation function, on the other hand, allow you to cut off large numbers of positions from evaluation.

So that’s what happened: slight improvements on an existing and extremely well-understood technique.  An extremely well-understood technique applied to an extremely well-understood problem.

That doesn’t mean the feat isn’t impressive.  They did make a computer that can come up with a good solution to an immensely complex problem (in terms of computation).  They did make improvements to pattern recognition.  But the technique is very unlikely to be transferable to other fields or even other games.  It is highly unlikely it will have any direct immediate consequences for artificial intelligence.

After all, beating a human doesn’t necessarily need a very sophisticated being.  Sometimes it just requires a big hammer.
