Your cart is currently empty!
Recall from lecture1 that Sam is either t or un t = f t, un tg and has to decide whether to exercise or relax = fexercise, relaxg on the basis of the following (probability, reward)-matrices (p(s; a; s0); r(s; a; s0)) for row s, column s0 in table with corner a exercise t un…
Recall from lecture1 that Sam is either t or un t
= f t, un tg
and has to decide whether to exercise or relax
= fexercise, relaxg
on the basis of the following (probability, reward)-matrices (p(s; a; s0); r(s; a; s0))
for row s, column s0 in table with corner a
exercise |
t |
un t |
relax |
t |
un t |
||
t |
.99, 8 |
.01, 8 |
t |
.7, 10 |
.3, 10 |
||
un t |
.2, 0 |
.8, 0 |
un t |
0, 5 |
1, 5 |
||
The -discounted value of (s; a) is |
|||||||
lim |
qn(s; a) |
n!1
where
q0(s; a) := p(s; a; t)r(s; a; t) + p(s; a; un t)r(s; a; un t)
Vn(s) := max(qn(s; exercise); qn(s; relax))
qn+1(s; a) := q0(s; a) + p(s; a; t)Vn( t) + p(s; a; un t)Vn(un t) :
In particular, = 0:9 leads to the following qn(s; a) for n = 0; 1; 2
exercise |
relax |
||
t |
8, 16.955, 23.812 |
10, 17.65, 23.685 |
relax, relax, exercise |
un t |
0, 5.4, 10.017 |
5, 9.5, 13.55 |
relax, relax, relax |
Your task is to write a program that given
a positive integer n, a -setting G (0 < G < 1), and a state s
returns the values
qn(s; exercise) and qn(s; relax)
for = G. You may use any of the following programming languages
Prolog, Java, Python
but be prepared to demonstrate your program on Tue, March 6 (noon-1, LG 12, O’Reilly) or Wed, March 7 (10-11, LB04; on your machine).
It may help to read Poole & Mackworth, 9.5 Decision Processes.