\$30.00

Category:

## Description

1. Unsupervised Learning, 30 pts. In this problem, we will experiment with two clus-tering algorithms: (i) k-means, and (ii) EM algorithm for Gaussian mixtures. In what follows, N denotes the number of data points, D denotes the dimension of each data point, xi 2 RD denotes a single data point, K denotes the number of clusters, and xi p(xjz = k) for k = 1; :::; K denotes the data generating distribution, and p(z = k) is the prior on the class (cluster in this case).

1. First, we will generate some data for this problem. Set the number of points N = 400, their dimension D = 2, and the number of clusters K = 2, and generate data from the distribution p(xjz = k) = N ( k; k). Sample 200 data points for k = 1 and 200 for k = 2,

with

 1 = 0:1 , 2 = 0:1 and 1 = 2 = 7 10 0:1 6:0 10 7

Here, N = 400. Since you generated the data, you already know which sample comes from which class. Run the cell in the IPython notebook to generate the data.

Make a scatter plot of the data points showing the true cluster assignment of each point either using di erent color codes or shape or both.

1. Now, we assume that the true class labels are not known. Implement the k-means algorithm for this problem. Write two functions: km_assignment_step, and km_refitting_step as given in the lecture (Here, km_ means k-means). Identify the correct arguments, and the

order to run them. Initialize the algorithm with

 (1.1) ^1 = 0:0 , ^2 = 1:0 0:0 1:0

and run it until convergence. Show the resulting cluster assignments on a scatter plot either using di erent color codes or shape or both. Also plot the cost vs. the number of iterations. Report your misclassi cation error. Hint: you generated the data, you know the correct labels. Your unsupervised learning algorithm doesn’t.

1

1. Next, implement the EM algorithm for Gaussian mixtures. Write three functions: log-likelihood, gm_e_step, and gm_m_step as given in the lecture. Identify the correct arguments, and

the order to run them. Initialize the algorithm with means as in (1.1), covariances with

^ ^

1 = 2 = I, and ^1 = ^2.

In addition to the update equations in the lecture, for the M (Maximization) step, you need to use this following equation to update the covariance matrix k:

^ 1 N (n) (n) ^ (n) ^ >

X

(1.2) k = rk (x k)(x k) ;

Nk n=1

and for the E (Expectation) step, the update rule for rk(n) needs to be modi ed accordingly

N j^ N j^ ^

(simply replace (x k; I) with (x k; k) in the update). Run the algorithm until con-vergence and show the resulting cluster assignments on a scatter plot either using di erent color codes or shape or both. Also plot the log-likelihood vs. the number of iterations. Report your misclassi cation error.

1. Comment on the results:

1. Compare the performance of k-Means and EM based on the resulting cluster assign-ments.

1. Compare the performance of k-Means and EM based on their convergence rate. What is the bottleneck for which method?

1. Experiment with 5 di erent data realizations (generate new data), run your algorithms, and summarize your ndings. Does the algorithm performance depend on di erent realizations of data?

1. Reinforcement Learning, 65 pts. In this portion of the assignment, you will write software to implement a simulated environment and build a reinforcement learning agent that discovers the optimal (shortest) path to a goal. The agent’s environment will look like:

Each cell is a state in the environment. The cell labeled \I” is the initial state of the agent. The cell labeled \G” is the goal state of the agent. The black cells are barriers|states that are inaccessible to the agent. At each time step, the agent may move one cell to the left, right, up, or down. The environment does not wraparound. Thus, if the agent is in the lower left corner and tries to move down, it will remain in the same cell. Likewise, if the agent is in the initial state and tries to move up (into a barrier), it will remain in the same cell.

2.1. Implementation (20 points). You should implement a Q learning algorithm that selects moves for the agent. The algorithm should perform exploration by choosing the action with the maximum Q value 90% of the time, and choosing one of the four actions at random the remaining 10% of the time. We should “break-ties” when the Q-values are zero for all the actions (happens initially) by essentially choosing uniformly from the action. So now you have two conditions to act randomly: for amount of the time, or if the Q values are all zero.

The simulation consist of a series of trials, each of which runs until the agent reaches the goal state, or until it reaches a maximum number of steps, which you can set to 100. The reward at the goal is 10, but at every other state is 0. You can set the parameter to 0.9.

2

G

I

2.2. Experiments.

1. Basic Q learning experiments (10 points).

1. Run your algorithm several times on the given environment.

1. Run your algorithm by passing in a list of 2 goal locations: (1,8) and (5,6). Note: we are using 0-indexing, where (0,0) is top left corner. Report on the results.

1. Experiment with the exploration strategy, in the original environment (20 points).

1. Try di erent values in -greedy exploration: We asked you to use a rate of =0.1, but try also 0.5 and 0.01. Graph the results (for the 3 -values) and discuss the costs and bene ts of higher and lower exploration rates.

1. Try exploring with policy derived from the softmax of Q-values described in the Q learning lecture. Use the values 2 f1; 3; 6g for your experiment, keeping xed throughout the training.

1. Instead of xing the = 0 to the initial value, we will increase the value of as the number of episodes t increase:

(2.1) (t) = 0ekt

That is, the value is xed for a particular episode. Run the training again for di erent values of k 2 f0:05; 0:1; 0:25; 0:5g, keeping 0 = 1:0. Compare the results obtained with this approach to those obtained with a static value.

1. Stochastic environments (15 points).

1. Make the environment stochastic (uncertain), such that the agent only has say a 95% chance of moving in the chosen direction, and a 5% chance of moving in a random direction.

3

1. Change the learning rule to handle the non-determinism, and experiment with di erent values of the probability that the environment performs a random action prand 2 f0:05; 0:1; 0:25; 0:5g in this new rule. How does performance vary as the environment becomes more stochastic?

Use the same parameters as in the rst part, except change the alpha to be less than 1, e.g., = 0:5. Report your results.

2.3. Write-up. Hand in a brief summary of your experiments. For each sub-section, this sum-mary should include a one paragraph overview of the problem and your implementation. It should include a graph showing number of steps required to reach the goal as a function of learning trials (one trial is one run of the agent through the environment until it reaches the goal or maximum number of steps). You should also make a gure showing the policy of your agent for the rst 2.1.1 section. The policy can be summarized by making an array of cells corresponding to the states of the environment, and indicating the direction (up, down, left,right) that the agent is most likely to move if it is in that state.

1. Course Evaluation, 5 pts. You will get the nal 5 points on the assignment if you con rm that you submitted your course evaluation.

4