Homework Assignment 3: Game Tree Search Solution




Acknowledgements: This project is based on one used in Columbia University’s Artificial Intelligence Course (COMS W4701). Special thanks to Daniel Bauer, who originally developed the starter code.

Othello is a 2-player board game that is played with distinct pieces that are typically black on one side and white on the other side, each side belonging to one player. Our version of the game is played on a chess board of any size, but the typical game is played on an 8×8 board. Players (black and white) take turns placing their pieces on the board.

Placement is dictated by the rules of the game, and can result in the flipping of coloured pieces from white to black or black to white. The rules of the game are explained in detail at https://en.wikipedia.org/wiki/Reversi.

Objective: The player’s goal is to have a majority of their coloured pieces showing at the end of the game.

Game Ending: Our version of the game differs from the standard rules described on Wikipedia in one

minor point: The game ends as soon as one of the players has no legal moves left.

Rules: The game begins with four pieces placed in a square in the middle of the grid, two white pieces and two black pieces (Figure 1, at left). Black makes the first move.

At each player’s turn, the player may place a piece of their own colour on an unoccupied square, if it “brackets” one or more opponent pieces in a straight line along at least one axis (vertical, horizontal, or diagnonal). For example, from the initial state black can achieve this bracketing by placing a black piece in any of the positions indicated by grey pieces in Figure 1 (in the middle). Each of these potential placements would create a Black-White-Black sequence, thus “bracketing” the White piece. Once the piece is placed, all opponent pieces that are bracketed, along any axis, are flipped to become the same colour as the current player’s. Returning to our example, if black places a piece in Position 6-d in the middle panel of Figure 1,

Assignment 3, University of Toronto, CSC384 – Intro to AI, Winter 2020


Figure 2: At right, possible moves of white player are shown in grey. At left, the board state after the move of white player.

the white piece in position 5-d will become bracketed and consequently will be flipped to black, resulting in the board depicted in the right panel of Figure 1.

Now it’s white’s turn to play. All of white’s possibilities at this time are shown as grey pieces in Figure 2 (at left). If white places a piece on 4-c it will cause the black piece in 4-d to be bracketed resulting in the 4-d piece being flipped to white as shown in Figure 2 (at right). To summarize, a legal move for a player is one that results in at least one of its opponents pieces being flipped. Our version of the game ends when one player no longer has any legal moves available.

Getting Started

The starter code contains 5 files:

  1. othello gui.py, which contains a simple graphical user interface (GUI) for Othello.

  1. othello game.py, which contains the game ”manager”. This stores the current game state and communicates with different player AIs.

  1. othello shared.py, which contains functions for computing legal moves, captured disks, and suc-cessor game states. These are shared between the game manager, the GUI and the AI players.

  1. randy ai.py, which specifies an ”AI” player (named Randy) that randomly selects a legal move.

  1. agent.py – The file where you will implement your game agent.

Game State Representation:

Each game state contains two pieces of information: The current player and the current disks on the board. Throughout our implementation, Player 1 (dark) is represented using the integer 1, and Player 2 (light) is represented using the integer 2.

Assignment 3, University of Toronto, CSC384 – Intro to AI, Winter 2020


The board is represented as a a tuple of tuples. The inner tuple represents each row of the board. Each entry in the rows is either an empty square (integer 0), a dark disk (integer 1) or a light disk (integer 2). For example, an 8×8 initial state looks like this:

((0, 0, 0, 0, 0, 0, 0, 0),

(0, 0, 0, 0, 0, 0, 0, 0),

(0, 0, 0, 0, 0, 0, 0, 0),

(0, 0, 0, 2, 1, 0, 0, 0),

(0, 0, 0, 1, 2, 0, 0, 0),

(0, 0, 0, 0, 0, 0, 0, 0),

(0, 0, 0, 0, 0, 0, 0, 0),

(0, 0, 0, 0, 0, 0, 0, 0))

Running the code:

You can run the Othello GUI by typing $python3 othello gui.py -d board size -a agent.py, where the parameter board size is an integer that determines the dimension of the board upon which you will play and agent.py is the game agent you would like to play against. If you type $python3 othello gui.py -d board size -a randy.py, you will play against an agent that selects moves ran-domly, and that is named Randy. Playing a game should bring up a game window. If you play against a game agent, you and the agent will take turns. We recommend that you play against Randy to develop a better understanding of how the game works and what strategies can give you an advantage.

The GUI can take also take two AI programs as command line parameters. When two AIs are speci-fied at the command line you can watch them play against each other. To see Randy play against itself, type $python3 othello gui.py -d board size -a randy.py -b randy.py. You may want to try playing the agents you create against those that are made by your friends.

The GUI is rather minimalistic, so you need to close the window and then restart to play a new game.

Communication between the Game Host and the AI:

This is a technical detail that you can skip if you are not interested. Functions for communicating with the game manager are provided as part of the scaffolding code. Note however that the game manager com-municates with agents via stdout. If you want to print debugging statements, you will therefore want to print to stderr instead. You can do this by using the eprint command that is found int agent.py.

The AI and the Game Manager / GUI will run in different Python interpreters. The Game Manager / GUI will spawn a child process for each AI player. This makes it easier for the game manager to let the AI process time out and also makes sure that, if the AI crashes, the game manager can keep running. To communicate with the child process, the game manager uses pipes. Essentially, the game manager reads from the AI’s standard output and writes to the AI’s standard input. The two programs follow a precise protocol to communicate:

The AI sends a string to identify itself. For example, randy ai sends the string “Randy”. You can come up with a fun name for your AI.

Assignment 3, University of Toronto, CSC384 – Intro to AI, Winter 2020


The game manager sends back “1” or “2”, indicating if the AI plays dark or light. Then the AI sits and waits for input form the game manager.

When it is the AI’s turn, the game manager will send two lines: The current score, for example “SCORE 2 2” and the game board (a Python tuple converted to string). The game manager then waits for the AI to respond with a move, for example “4 3”.

At the end of the game, the game master sends the final score, for example “FINAL 33 31”.

Time Constraints:

Your AI player will be expected to make a move within 10 seconds. If no move has been selected, the AI loses the game. This time constraint does not apply for human players.

You may change the time constraint by editing line 32 in othello game.py: TIMEOUT = 10

However, we will run your AI with a timeout of 10 seconds during grading and for the Othello compe-tition.

Part I. Minimax [25 pts]

You will want to test your Minimax implementations on boards that are only 4×4 in size. This restriction makes the game somewhat trivial: it is easy even for human players to think ahead to the end of the game. When both players play optimally, the player who goes second always wins. However, the default Minimax algorithm, without a depth limit, takes too long even on a 6×6 board.

Write the function compute utility(board, color) that computes the utility of a final game board state (in the format described above). The utility should be calculated as the number of disks of the player’s color minus the number of disks of the opponent. Hint: The function get score(board) returns a tuple (number of dark disks, number of light disks).

Then, implement the method select move minimax(board, color). For the time being, you can ignore the limit and caching parameters that the function will also accept; we will return to these later. Your func-tion should select the action that leads to the state with the highest minimax value. The ’board’ parameter is the current board (in the format described above) and the ’color’ parameter is the color of the AI player. We use an integer 1 for dark and 2 for light. The return value should be a (column, row) tuple, representing the move. Implement minimax recursively by writing two functions minimax max node(board, color) and minimax min node(board, color). Again, just ignore the limit and caching parameters for now.

Hints: Use the get possible moves(board, color) function in othello shared.py, which returns a list of (column, row) tuples representing the legal moves for player color. Use the play move(board, color, move) function in othello shared.py, which computes the successor board state that results from player color playing move (a (column, row) tuple). Pay attention to which player should make a move for min nodes and max nodes.

Once you are done you can run your MINIMAX algorithm via the command line using the flag -m. If you issue the command ”$python3 othello gui.py -d 4 -a agent.py -m” you will play against your

Assignment 3, University of Toronto, CSC384 – Intro to AI, Winter 2020


agent on a 4×4 board using the MINIMAX algorithm. The command ”$python3 othello gui.py -d

  • -a agent.py” will have you play against the same agent using the ALPHA-BETA algorithm, which you will implement next. You can also play your agent against Randy with the command ”$python3 othello gui.py -d 4 -a agent.py -b randy ai.py”

Part II. Alpha-Beta Pruning [25 pts]

The simple minimax approach becomes infeasible for boards larger than 4×4. To ameliorate this we will write the the function select move alphabeta(board, color) to compute the best move using alpha-beta pruning. The parameters and return values will be the same as for minimax. Once again, you can again ignore the limit, caching and ordering parameters that the function will also accept for the time being; we will return to these later. Much like minimax, your alpha-beta implementation should recursively call two helper functions: alphabeta min node(board, color, alpha, beta) and alphabeta max node(board,color, alpha, beta).

Playing with pruning should speed up decisions for the AI, but it may not be enough to be able to play on boards larger than 4×4. Use the command ”$python3 othello gui.py -d 4 -a agent.py” to play against your agent using the ALPHA-BETA algorithm on a 4×4 board.

Part III. Depth Limit [20 pts]

Next we’ll work on speeding up our agents. One way to do this is by setting a depth limit. Your starter code is structured to do this by using the -l flag at the command line. For example, if you type $python3 othello gui.py -d 6 -a agent.py -m -l 5, the game manager will call your agent’s MINIMAX routine with a depth limit of 5. If you type $python3 othello gui.py -d 6 -a agent.py -l 5, the game manager will call your agent’s ALPHA-BETA routine with a depth limit of 5.

Alpha beta will recursively send the ’limit’ parameter to both alphabeta min node and alphabeta max node. Minimax is similar and will recursively sends the ’limit’ parameter to minimax min node and minimax max node. In order to enforce the depth limit in your code, you will want to decrease the limit parameter at each recur-sion. When you arrive at your depth limit (i.e. when the ’limit’ parameter is zero), use a heuristic function

to define the value any non-terminal state. You can call the compute utility function as your heuristic to estimate non-terminal state quality.

Experiment with the depth limit on boards that are larger than 4×4. What is the largest board you can play on without timing out after 10 seconds?

Part IV. Caching States [12 pts]

We can try to speed up the AI even more by caching states we’ve seen before. To do this, we will want to alter your program so that it responds to the -c flag at the command line. To implement state caching you will need to create a dictionary in your AI player (this can just be stored in a global variable on the top level of the file) that maps board states to their minimax value. Modify your minimax and alpha-beta pruning functions to store states in that dictionary after their minimax value is known. Then check the dictionary,

Assignment 3, University of Toronto, CSC384 – Intro to AI, Winter 2020


at the beginning of each function. If a state is already in the dictionary and do not explore it again.

The starter code is structured so that if you type $python3 othello gui.py -d 6 -a agent.py -m -c, the game manager will call your agent’s MINIMAX routines with the ’caching’ flag on. If instead you remove the -m and type $python3 othello gui.py -d 6 -a agent.py -c, the game manager will call your agent’s ALPHA-BETA routines with the ’caching’ flag on.

Part IV. Node Ordering Heuristic [18 pts]

Alpha-beta pruning works better if nodes that lead to a better utility are explored first. To do this, in the Alpha-beta pruning functions, we will want to order successor states according to the following heuristic: the nodes for which the number of the AI player’s disks minus the number of the opponent’s disks is high-est should be explored first. Note that this is the same as the utility function, and it is okay to call the utility function to compute this value. This should provide another small speed-up.

Alter your program so that it executes node ordering when the -o flag is placed on the command line. The starter code is already structured so that if you type $python3 othello gui.py -d 6 -a agent.py -o, the game manager will call your agent’s ALPHA-BETA routines with an ’ordering’ parameter that is equal to 1.

Taken together, the steps above should give you an AI player that is challenging to play against. To play against your agent on an 8×8 board when it is using state caching, alpha-beta pruning and node ordering, type $python3 othello gui.py -d 8 -a agent.py -c -o.

Part VI. Othello Competition (Optional)

If you choose to do this part of the assignment, we will reward you with an extra grace day.

The prior steps should give you a good AI player, but we have only scratched the surface. There are many possible improvements that would create an even better AI. To improve your AI, you may want to create a good game heuristic. We encourage you to use this in place of compute utility in your alpha-beta routines. Some ideas for good heuristics are below.

In the Othello Competition, we will have your AI players compete with each other. The finalists and winner of the competition will receive a prize. We are planning to run most of the competition after the homework assignment is due, and then play out the quarter-finals, semi-finals and finals during the final exam period.

To submit an AI to the competition, simply include a file agent YOURID.py with your homework sub-mission. You can restructure the code in your submission as you please as this file will not be marked. You can also give your AI a funny/informative/unique name (i.e. the first string that is sent to the game manager). We will run the competition anonymously and will only announce the finalists and winners by real name.

Assignment 3, University of Toronto, CSC384 – Intro to AI, Winter 2020


Some Ideas for Heuristic Functions for Othello Game

  1. Consider board locations where pieces are stable, i.e. where they cannot be flipped anymore.

  1. Consider the number of moves you and your opponent can make given the current board configura-tion.

  1. Come up with a better node-ordering function.

  1. Come up with a better evaluation function.

  1. Instead of a fixed depth limit, use iterative deepening and a timer to maximize the number of levels you can explore in 10 seconds.

  1. Use a different strategy in the opening, mid-game, and end-game.

You can also do your own research to find a wide range of other good heuristics (for example, here is a good start: http://www.radagast.se/othello/Help/strategy.html).