Your cart is currently empty!
Problem description GridWorld is a 2D rectangular grid of size (N rows, N columns) with an agent starting off at one grid cell, moving from cell to cell through the grid, and eventually exiting after collecting a reward. This grid environment is described as follows: State space: GridWorld has N rows ×…
Problem description
GridWorld is a 2D rectangular grid of size (N rows, N columns) with an agent starting off at one grid cell, moving from cell to cell through the grid, and eventually exiting after
collecting a reward. This grid environment is described as follows:
Furthermore, action “Run X” moves the agent two cells in the X direction with
probability prun , but with probabilities 0.5 (1 − prun) and 0.5 (1 − prun) moves the agent two cells at angles of 90° and -90° to the direction X, respectively.
If moving in a particular direction causes the agent to bump into a wall, the movement fails, and the agent stays in the same cell “i, j” . We write P (s′|s, a) to denote the probability of reaching state s′ if action a is done in state s . The following examples illustrate the environment dynamics:
Furthermore, if there are K terminal states, the reward in the kth terminal state, s is R(s,Exit) = Rterminal(k) (a constant).
The agent prefers receiving its reward as soon as possible. Therefore, it applies a discount factor, γ , to future rewards. For example, if it runs once and then exits from the
kth terminal state, it will receive a total reward of Rrun + γRterminal(k). If it instead walks twice and then exits from the kth terminal state, it will receive a total reward of
Rwalk + γRwalk + γ2 Rterminal(k).
Your Task
In this assignment, you need to find the action that the agent should take at each state to maximize its expected reward. You can use any algorithm for this homework.
Implementation Notes
“Walk Up” > “Walk Down” > “Walk Left” > “Walk Right” > “Run Up” > “Run Down” > “Run Left” > “Run Right”.
File Formats
Input format:
<GRID SIZE> includes two positive integers separated by a comma “,” where the first one is the number of rows ( N rows ) of the grid, and the second one is the number of
columns ( N columns ) of the grid.
<WALL CELLS NUMBER> is a number (greater than or equal to zero) that specifies the number of wall cells in the grid.
<WALL CELLS POSITION> contains <WALL CELLS NUMBER> lines where each line includes 2 values separated by a comma “,”. The two values in each line indicate the row and column numbers of the gird where a wall is located.
<TERMINAL STATES NUMBER> is a number (greater than or equal to one) that specifies the number of terminal states in the grid.
<TERMINAL STATES POSITION and REWARDS> contains <TERMINAL STATES NUMBER> lines where each line includes 3 values separated by a comma “,”. The first two values in the k -th line indicate the row and column numbers of the grid where the
kthterminal state is located, and the third value (a float) denotes Rterminal(k) . <TRANSITION MODEL> contains two probability values separated by a comma “,”,
where the first value is pwalk and the second value is prun .
<REWARDS> contains two floating-point numbers separated by a comma “,” where the
first value is Rwalk and the second value is Rrun.
<DISCOUNT FACTOR>which is a number in the interval [0,1].
Output format:
<OPTIMAL ACTION GRID> contains N rows lines where each line includes N columns strings separated by a comma “,”. The j -th string at the i -th line should indicate the
optimal action (one of “Walk Up”, “Walk Down”, “Walk Left”, “Walk Right”, “Run Up”, “Run Down”, “Run Left”, or “Run Right” for non-terminal states; “Exit” for terminal states; or “None” for wall states) at state “ N rows − i + 1 , j ”.
Grading
Your homework submission will be tested as follows: Your program should take no command-line arguments. It should read a text file called “input.txt” in the current directory that contains a problem definition. It should write a file “output.txt” with your solution. The formats for files “input.txt” and “output.txt” are specified below. End-of-line convention is Unix (because Vocareum is a Unix system).
During grading, 50 test cases will be used. If your homework submission passes all 50 test cases, you receive 100 points. For each test case that fails, you will lose 2 points.
Note that if your code does not compile, or somehow fails to load and parse input.txt, or writes an incorrectly formatted output.txt, or no output.txt at all, or OuTpUt.TxT, you will get zero points.
Sample Test Cases
You are provided with 3 sample test cases and outputs. Please find them in the Homework 3 folder.
Homework Rules