Today we will analyze the minimax strategy, a system at the base of the simulation carried out in Game Theory and AI.
In a previous post of this series about simulation, we took into consideration the difference between unfair game and zero-sum game, presenting a brief sample of code to show such behaviour when run on a computer. We have noticed how a computer, when properly programmed, can present an apparently intelligent behavior, similar to that of a human being. Finally, a classical data structure has been described for the development of decision algorithms: an archiving system is needed that can “remember” or “calculate” the state of the moves over time, as well as a probabilistic mechanism, generally based on labels with different weights to apply to decisions.
Today we will deepen these concepts further, examining the strategy behind most of the simulations carried out in Game Theory.
The Minimax strategy
First of all we need to contextualize the topic.
The analysis applies to deterministic games (ie games of pure ability in which chance plays no role). We will only consider the following games:
- two players games, where only two contenders are faced, indicated with 0 and 1;
- games in alternating moves, in which the players move in turn, alternating (if a certain move is up to the player x, the next move is up to the player 1-x)
In any game, we can define the concept of state: for example, in the case of chess the state is represented by the arrangement of the pieces on the board. A game configuration is given by the state in which the game is located and by the player whose turn it is to move (called the “turn player”).
Minimax is a method to minimize the maximum (minimax) possible loss; alternatively, to maximize the minimum gain (maximin). It was discovered in game theory in the case of zero-sum game with two players, both in the case of alternative moves (turns) and simultaneous moves, later being extended to more complex games and decision support in the presence of uncertainty.
A simple version of the algorithm can be seen in games such as three of a kind (tic-tac-toe), where both players can win, lose or draw.
- If player A can win with one move, the best move is the winning one.
- If player B knows that a given move will lead to A being able to win with his next move, while another will lead him to draw, player B’s best move is the one that will lead him to the draw.
Towards the end of the game it is easy to understand which are the best moves; the minimax algorithm finds the best move at a given time by searching for it starting from the end of the game and going back to the current situation. At each step, the algorithm assumes that player A tries to maximize his chances of winning, while B tries to minimize A’s chances of winning, for example by maximizing his chance of winning.
How dows the minimax work
The algorithm recursively analyzes the tree starting from the last terminal nodes (leaves), going up the various game paths (branches).
In a game situation two players participate:
- Mini. It is the player who must minimize the game value.
- Max. It is the player who must maximize the value of the game.
The two players have opposite goals. It is therefore a zero-sum (non-cooperative) and sequential game.
The minimax algorithm
The minimax algorithm is a recursive algorithm for finding the best move in a zero-sum game that takes place between two players. The minimax algorithm is used in decision theory when players find themselves in strategic interaction. Strategic interaction is the situation that occurs when two or more decision-makers (players) are influenced by the outcome of their choices and by those of other agents. In these circumstances the agents must formulate a strategy in order to make the best choices to maximize their utility (payoff) also taking into account the strategies adopted by the other agents participating in the same game.
The minimax algorithm allows to identify the best choices of the two players during the game, analyzing the game tree backwards starting from the terminal nodes, ie from the possible situations in which the game can end (end of game), and progressively going up up to the current position of the players. For example in the following game tree a sequential game between two players is represented, the first player (MAX) must maximize the value of the game while the second player must minimize it.
Suppose we have a game in which the two players have only two possible moves each each turn. The algorithm generates the game tree, where the circles represent the moves of the player using the algorithm (maximizing player) and the squares are the moves of the opponent (minimizing player). Due to the limitations of the calculation system, we limit the tree to a depth of four shifts.
What value is actually assigned depends on how the final positions of victory and defeat are evaluated: one method is to assign 1 to the moves that lead to final victory and -1 to those of defeat (which leads to developing the combinatorial theory of games formulated by John Conway), another is to adopt the positive and negative infinite values. The value for player A of each non-immediately winning move is then the minimum value of all possible counter-moves of B. For such reason A is also called “maximizing player” and B “minimizing player”.
This assumes that it is possible for those who compute (not yet talking about computer applications) to evaluate the whole tree of possible moves of the game; in reality this can be done only for very simple games, while for others, such as chess or go this is not possible or only possible in the final stages, and in general you can only calculate an estimate of the probability that a given move will lead to the victory of one of the players.
The calculation of this estimate can be improved if it is possible to provide a heuristic function that evaluates the non-terminal positions of the game without having to know all the subsequent moves: in this case only a certain number of future moves can be considered. This number is called the “depth” of the algorithm and is measured in moves:
The algorithm evaluates each leaf node with a heuristic function and obtains the values shown. The moves that make the maximizing player win have an infinite positive value, while the moves that give the minimizing player victory have an infinite negative value. At the third level the algorithm will choose, for each node, the child node with the lowest value and assign this value to the node under examination; for example the node on the left will take the minimum value between “10” and “+ ∞”, that is “10”. The next step, at level 2, chooses the maximum value among all child nodes (level 3 nodes) and assigns it to the respective parent node of the second level. The algorithm continues to evaluate the maximum and minimum values of the child nodes and assign them to the parent nodes, alternatively, until it reaches the root node of the tree, for which it chooses the child node with the maximum value (represented with a blue arrow in the figure). This move is what the player should do to minimize the maximum possible loss function.
In practice the minimax algorithm’s job is to explore all the possible nodes of the game tree: the average number of child nodes for each node under examination is called a branching factor, and is equal to the average number of possible moves in a generic game position. Therefore the number of nodes to be evaluated grows exponentially with the search depth and is equal to the branching factor raised to the search depth. The computational complexity of the minimax algorithm is therefore NP-complete, which makes it impractical to obtain a definitive solution of less than trivial games (For example, only the simple game of tic-tac-toe with nine boxes has well over 360 thousand terminal nodes (9!) ie game combinations, chess has about 1040 terminal nodes), and useful only for obtaining a “not too bad” game conduct. However the performance of minimax can be drastically improved, maintaining the correctness of the results, adopting the alpha-beta pruning: there are other heuristic methods of pruning the game tree, but this is the only one that does not influence the final result of minimax research .
The alpha-beta pruning is a variant of the minimax algorithm that allows reducing the number of nodes and branches to be analyzed to reach the optimal minimax result. The idea behind the alpha-beta pruning is very simple and consists in interrupting the exploration of the nodes in depth when they are not selected by the players because they are worse than the possible alternatives. This allows you to make a logical pruning of the game tree and reduce the complexity of the algorithm.
Unlike the complete minimax algorithm, the alpha-beta pruning algorithm only analyzes a part of the game tree, still reaching the same optimal strategic solution.
The efficiency of the alpha-beta pruning is strongly conditioned by the order of presentation of the leaf nodes in a branching.
When pruning interrupts the exploration, the algorithm could still have scanned the entire branching of node B. Possible solutions to the problem are an ordering of the successors starting from the current move (alpha-beta pruning with sorting) or identification of the best move using a heuristic function (heuristic alpha-beta pruning).
|Perfect information||Imperfect information|
|Deterministic games||Chess, Go, Checkers, |
Othello, 4 in a row
(A game or a puzzle?)
1) Minimax only works in games with a finite number of turns
The minimax algorithm is not very suitable for solving strategic games with an indefinite number of turns.
It is important to know who has the last move in the game.
2) The complexity of the minimax grows exponentially
If the game has many shifts on the search tree, the search in depth takes a lot of time and memory.
The number of terminal nodes may be excessive.
3) Minimax is not suitable for games with more than two players
When there are more than two decision-makers, strategic choices become much more complex.
The minimax method fails to interpret interactions because other factors come into play.
Example. In games with three or more players, some agents may join forces against another agent: the choices are not individual.
Moreover, the same coalitions are subject to the risk of betrayals.
The reputation of the players is therefore important.
And so on.