Name: Solved-Theory -Assignment 4- Solution
SKU: 23826
Price: 35.00 USD
Availability: InStock

Description

5/5 – (2 votes)

Instructions

Submission: Assignment submission will be via courses.uscden.net. By the submission date, there will be a folder named ‘Theory Assignment 4’ set up in which you can submit your files. Please be sure to follow all directions outlined here.

You can submit multiple times, but only the last submission counts. That means if you finish some prob-lems and want to submit something first and update later when you finish, that’s fine. In fact you are encouraged to do this: that way, if you forget to finish the homework on time or something happens (re-member Murphy’s Law), you still get credit for whatever you have turned in.

Problem sets must be typewritten or neatly handwritten when submitted. In both cases, your submis-sion must be a single PDF. It is strongly recommended that you typeset with L^AT_EX. There are many free integrated L^AT_EX editors that are convenient to use (e.g Overleaf, ShareLaTeX). Choose the one(s) you like the most. This tutorial Getting to Grips with LaTeX is a good start if you do not know how to use L^AT_EX yet.

Please also follow the rules below:

The file should be named as firstname lastname USCID.pdfg., Don Quijote de la Mancha 8675309045.pdf).

Do not have any spaces in your file name when uploading it.

Please include your name and USCID in the header of your report as well.

Collaboration: You may discuss with your classmates. However, you need to write your own solutions and submit separately. Also in your report, you need to list with whom you have discussed for each problem. Please consult the syllabus for what is and is not acceptable collaboration. Review the rules on academic conduct in the syllabus: a single instance of plagiarism can adversely affect you significantly more than you could stand to gain.

Notes on notation:

Unless stated otherwise, scalars are denoted by small letter in normal font, vectors are denoted by small letters in bold font and matrices are denoted by capital letters in bold font.

k.k means L2-norm unless specified otherwise i.e. k.k = k.k₂

Please always reduce your solution to the simplest form.

Problem 1 Optimization over the simplex (25 points)

Many machine learning problems, especially clustering problems, involve optimization over a simplex. Thus, in this exercise, you will prove two optimization results over the simplex that we used multiple times in the lectures. Recall that a K 1 dimensional simplex D is defined as

= fq 2 R^K j q_k 0, 8k and å q_k = 1g,

k=1

which means that a K 1 dimensional simplex has K non-negative entries, and the sum of all K entries is

The property that a simplex sums to one coincides with a probability distribution of a discrete random variable of K possible outcomes, and is thus frequently used in clustering problems.

1.1 Given a₁, . . . , a_K 2 R⁺, solve the following optimization over simplex (find the optimal value of q_k.)

(12 points)

arg max å a_k ln q_k,

s.t. q_k 0,

K
å q_k = 1.
k=1
1.2 Given b₁, . . . , b_K 2 R, solve the following optimization problem	(13 points)
K
arg max å (q_kb_k q_k ln q_k) ,
^q2D k=1

s.t. q_k 0,

q_k = 1.

k=1

Problem 2 Gaussian Mixture Model and EM (30 points)

2.1 In the lecture we applied EM to learn Gaussian Mixture Models (GMMs) and showed the M-Step without a proof. Now it is time that you prove it.

Consider a GMM with the following PDF of x_n:

(

^p2_p)^DjS_kj2

k=1

p(x_n) = å w_k

(x_n

m_k, S_k) = å

w_k

exp

(x_n

m_k)^T S_k

¹(x_n m_k)

where K 2 N is the number of Gaussian components, D 2 N is dimension of a data point x_n. This GMM has K tuples of model parameters (m_k, S_k, w_k), which standards for the mean vector, covariance matrix, and component weight of the k-th Gaussian component. jSj denotes the determinant of matrix S.

For simplicity, we further assume that all components are isotropic Gaussian, i.e., S_k = s_k² I. Find the MLE of the expected complete log-likelihood. Equivalently, find the optimal solution to the following optimiza-tion problem.

arg max ååg_nk ln w_k + ååg_nk ln N (x_n j m_k, S_k)

w_k,m_k,S_k n _k n _k

s.t. w_k 0

w_k = 1

k=1

where g_nk is the posterior of latent variables computed from the E-Step. Hint: you can make use of the

result from Problem 1.1. (15 points)

2.2 In the lecture we derived EM through a lower bound of the log-likelihood function. Specifically, we find the tightest lower bound by solving

arg max E_z_n _q_n ln p(x_n, z_n ; q⁽^t⁾) E_z_n _q_n [ln q_n] .

q_n 2D

Find the optimal solution to this optimization problem using the results from Problem 1.2. (The solution was already given in the lecture, but here we require you to derive it.)

(5 points)

2.3 The posterior probability of z in GMM can be seen as a soft assignment to the clusters; In contrast, K-means assign each data point to one cluster at each iteration (hard assignment). Describe how to set the model parameters such that the GMM in the previous question reduces to a K-means; please also derive p(z_n = kjx_n) in this case.

(10 points)

Problem 3 Hidden Markov Model (45 points)

Recall that an HMM is parameterized as follows:

initial state distribution P(X₁ = s) = p_s

transition distribution P(X_t₊₁ = s⁰ j X_t = s) = a_s,s0

emission distribution P(O_t = o j X_t = s) = b_s,o

3.1 Suppose we observe a sequence of outcomes o₁, . . . , o_T and wish to predict the next state X_T₊₁

P(X_T₊₁ = s j O_1:T = o_1:T ).

Denote the forward message as

a_s(T) = P(X_T = s, O_1:T = o_1:T ).

Please derive how P(X_T₊₁ = s j O_1:T = o_1:T ) can be represented using a_s(T).

3.2 Given an HMM with the following probabilities:

Transition probabilities:					Emission probabilities:
Current		Next					Word
Current	Start	A	B	End			Word
Start	0	0.7	0.3	0	State:	‘’	‘ f ight⁰	‘on⁰
A	0	0.2	0.7	0.1	Start	1	0	0
B	0	0.7	0.2	0.1	A	0	0.4	0.6
End	0	0	0	1	B	0	0.7	0.3

We assume that the process stops at state ‘End’.

Suppose that the process starts from state ‘Start’ at t = 0 and that we observe o_1:2 = f ight on , write down the forward messages a_s(2) and determine the most likely state at t = 3 by computing the probability for each state. Round your numbers to 4 decimal points.

(7 points)

Suppose that the process starts from state ‘Start’ at t = 0 and that we observe the whole output se-quence as f ight on on , what is the most likely sequence of states that produce this? Please show your derivation step by step.

(8 points)

3.3 [Unrelated to the above.] Describe how to set the model parameters such that an HMM reduces to the GMM described in Problem 2. In addition, derive the posterior probability P(X₂ = s j O₁ = o₁, O₂ = o₂) using the parameters you set to show that the HMM really reduces to GMM. Hint: compare your result

with the posterior p(zjx) of GMM in Problem 2.2. (20 points)

Solved-Theory -Assignment 4- Solution

Description

Related products

GTLegend: A Role-Playing Game SOlution

Lab 3: “Thanks for All the Fish!!!” SOlution

Lab 5: Vegas Blackjack Solution

Mario Level Generation Solution

Project Nested Circles Solution