Theory Assignment 3


Instructions

 

Submission: Assignment submission will be via courses.uscden.net. By the submission date, there will be a folder named ‘Theory Assignment 3’ set up in which you can submit your files. Please be sure to follow all directions outlined here.

 

You can submit multiple times, but only the last submission counts. That means if you finish some problems and want to submit something first and update later when you finish, that's fine. In fact you are encouraged to do this: that way, if you forget to finish the homework on time or something happens (remember Murphy's Law), you still get credit for whatever you have turned in.

 

Problem sets must be typewritten or neatly handwritten when submitted. In both cases, your submission must be a single PDF. It is strongly recommended that you typeset with LaTeX. There are many free integrated LaTeX editors that are convenient to use (e.g., Overleaf, ShareLaTeX). Choose the one(s) you like the most. The tutorial Getting to Grips with LaTeX is a good start if you do not know how to use LaTeX yet.

Please also follow the rules below:

 

  • The file should be named as firstname lastname USCID.pdf (e.g., Don Quijote de la Mancha 8675309045.pdf).

 

  • Do not have any spaces in your file name when uploading it.

 

  • Please include your name and USCID in the header of your report as well.

 

Collaboration: You may discuss with your classmates. However, you need to write your own solutions and submit separately. Also in your report, you need to list with whom you have discussed for each problem. Please consult the syllabus for what is and is not acceptable collaboration. Review the rules on academic conduct in the syllabus: a single instance of plagiarism can adversely affect you significantly more than you could stand to gain.

 

Notes on notation:

 

  • Unless stated otherwise, scalars are denoted by small letters in normal font, vectors are denoted by small letters in bold font, and matrices are denoted by capital letters in bold font.

 

  • $\|\cdot\|$ means the $L_2$-norm unless specified otherwise, i.e., $\|\cdot\| = \|\cdot\|_2$.

Problem 1  Principal Component Analysis  (25 points)

 

In this problem, we use proof by induction to show that the M-th principal component corresponds to the M-th eigenvector of $X^T X$ sorted by eigenvalue from largest to smallest. Here $X$ is the centered data matrix, and we denote the sorted eigenvalues by $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_d$. In the lecture, the result was proven for $M = 1$. Now suppose the result holds for a value $M$; you are going to show that it holds for $M + 1$. Note that the $(M+1)$-th principal component corresponds to the solution of the following optimization problem:

 

$\max_{v} \; v^T X^T X v$   (1)

s.t. $\|v\|_2 = 1$   (2)

$v^T v_i = 0, \quad i = 1, \ldots, M$   (3)

where $v_i$ is the $i$-th principal component. Write down the Lagrangian of the optimization problem above, and show that the solution $v_{M+1}$ is an eigenvector of $X^T X$. Then show that the quantity in (1) is maximized when $v_{M+1}$ is the eigenvector with eigenvalue $\lambda_{M+1}$.
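
Although not part of the required proof, a quick numerical sanity check of the claim can be reassuring; a minimal numpy sketch (the data, dimensions, and seed are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
X = X - X.mean(axis=0)                    # centered data matrix

S = X.T @ X
eigvals, eigvecs = np.linalg.eigh(S)      # eigh returns ascending order
order = np.argsort(eigvals)[::-1]         # sort largest -> smallest
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

M = 2
V = eigvecs[:, :M]        # first M principal components (induction hypothesis)
v_next = eigvecs[:, M]    # candidate (M+1)-th component

# Any unit v orthogonal to v_1, ..., v_M has Rayleigh quotient v^T S v
# at most lambda_{M+1}, which is attained by v_next.
for _ in range(1000):
    v = rng.normal(size=5)
    v = v - V @ (V.T @ v)                 # enforce v^T v_i = 0
    v = v / np.linalg.norm(v)             # enforce ||v||_2 = 1
    assert v @ S @ v <= v_next @ S @ v_next + 1e-9
```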

Problem 2  Support Vector Regression                                                                                     (30 points)

 

In this problem, we derive an extension of the support vector machine to regression problems, called Support Vector Regression (SVR). Define the regressor $f(x) = w^T \phi(x) + b$, and suppose we are given a dataset $\{(x_n, y_n)\}_{n=1}^{N}$ with $y_n \in \mathbb{R}$. Intuitively, we want to find a regressor that has a small weight $w$ and also ensures a small approximation error on $\{(x_n, y_n)\}_{n=1}^{N}$. This intuition can be formulated as the following optimization problem:

$\min_{w, b} \; \frac{1}{2} \|w\|_2^2$

s.t. $|w^T \phi(x_n) + b - y_n| \leq \epsilon, \quad n = 1, \ldots, N$

 

For an arbitrary dataset, the $\epsilon$-close constraints may not be feasible. Therefore, we optimize the "soft" version of the loss above:

 

$\min_{w, b} \; \frac{1}{2} \|w\|_2^2 + C \sum_{n=1}^{N} E_\epsilon\big(y_n - f(x_n)\big)$   (4)

 

$E_\epsilon$ is the $\epsilon$-insensitive error function, which gives zero error if the difference between the prediction and the ground truth is smaller than $\epsilon$ and incurs a linear penalty otherwise. It is defined as follows:

$E_\epsilon(x) = \begin{cases} 0 & |x| \leq \epsilon \\ |x| - \epsilon & |x| > \epsilon \end{cases}$
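
As a quick illustration, this loss is easy to express in code; a minimal numpy sketch (the function name is illustrative):

```python
import numpy as np

def eps_insensitive(r, eps):
    """E_eps(r): zero inside the eps-tube, linear penalty |r| - eps outside."""
    return np.maximum(np.abs(r) - eps, 0.0)

# e.g., residuals y_n - f(x_n) of [-2, -0.5, 0.3, 1.5] with eps = 1
print(eps_insensitive(np.array([-2.0, -0.5, 0.3, 1.5]), eps=1.0))  # [1. 0. 0. 0.5]
```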

 

Question 1  Reformulate the unconstrained optimization problem in equation (4) as a constrained optimization problem by introducing slack variables for each data point. Hint: For each data point, introduce slack variables $\xi_n, \xi_n' \geq 0$ such that $-\xi_n' - \epsilon \leq y_n - f(x_n) \leq \epsilon + \xi_n$. Then replace $E_\epsilon$ with $\xi_n, \xi_n'$.  (12 points)

Question 2  Write down the Lagrangian of the constrained optimization problem derived in Question 1, then minimize the Lagrangian by taking derivatives w.r.t. $w$, $b$, $\xi_n$, $\xi_n'$, setting the gradients to 0, and simplifying the expressions. Hint: there are no $b$, $\xi_n$, $\xi_n'$ in the final expressions.  (18 points)
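
For reference, one standard way to set up this Lagrangian (a sketch following the usual SVR derivation; the multiplier names $a_n, a_n', \mu_n, \mu_n' \geq 0$ are illustrative, not prescribed by the problem):

$L = \frac{1}{2}\|w\|_2^2 + C \sum_{n=1}^{N} (\xi_n + \xi_n') - \sum_{n=1}^{N} a_n \big(\epsilon + \xi_n - y_n + f(x_n)\big) - \sum_{n=1}^{N} a_n' \big(\epsilon + \xi_n' + y_n - f(x_n)\big) - \sum_{n=1}^{N} (\mu_n \xi_n + \mu_n' \xi_n')$

From here, $\partial L / \partial w = 0$ gives $w = \sum_n (a_n - a_n') \phi(x_n)$, and $\partial L / \partial b = 0$ gives $\sum_n (a_n - a_n') = 0$, which is consistent with the hint that $b$, $\xi_n$, $\xi_n'$ disappear from the final expressions.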

Problem 3  Support Vector Machine   (25 points)
Consider a dataset consisting of points $(x, y)$, where $x$ is a real value and $y \in \{-1, +1\}$ is the class label. There are only three points: $(x_1, y_1) = (0, +1)$, $(x_2, y_2) = (\frac{\pi}{2}, -1)$, and $(x_3, y_3) = (\pi, +1)$. Let the feature mapping be $\phi(x) = [\cos x, \sin x]^T$, corresponding to the kernel function $k(x, y) = \cos(x - y)$.
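
Before writing the formulations, it may help to verify the kernel/feature-map correspondence numerically; a minimal sketch (names are illustrative):

```python
import numpy as np

x = np.array([0.0, np.pi / 2, np.pi])             # the three data points
phi = lambda t: np.array([np.cos(t), np.sin(t)])  # feature mapping

# cos(x - y) = cos x cos y + sin x sin y = phi(x)^T phi(y)
for xi in x:
    for xj in x:
        assert np.isclose(np.cos(xi - xj), phi(xi) @ phi(xj))

K = np.cos(x[:, None] - x[None, :])               # Gram matrix for the dataset
print(K)
```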

 

Question 1  Write down the primal and dual formulations of SVM for this dataset in the transformed two-dimensional feature space based on $\phi(\cdot)$. Note that we assume the data points are separable and set the hyperparameter $C$ to $+\infty$, which forces all slack variables ($\xi$) in the primal formulation to be 0 (and thus they can be removed from the optimization).  (12 points)

Question 2   Next, solve the dual formulation. Based on that, derive the primal solution.              (13 points)
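
Whatever analytic solution you derive can be cross-checked numerically; a hedged sketch using a generic constrained solver (the scipy setup is an assumption for verification only, not the required method):

```python
import numpy as np
from scipy.optimize import minimize

x = np.array([0.0, np.pi / 2, np.pi])
y = np.array([1.0, -1.0, 1.0])
Phi = np.column_stack([np.cos(x), np.sin(x)])    # phi(x_n) as rows
Q = (y[:, None] * y[None, :]) * np.cos(x[:, None] - x[None, :])

def neg_dual(a):
    # negated dual objective: minimize -(sum_n a_n - 1/2 a^T Q a)
    return 0.5 * a @ Q @ a - a.sum()

res = minimize(neg_dual, x0=np.ones(3),
               bounds=[(0, None)] * 3,                        # a_n >= 0 (C = +inf)
               constraints=[{"type": "eq", "fun": lambda a: a @ y}])
a = res.x
w = (a * y) @ Phi                                # w = sum_n a_n y_n phi(x_n)
sv = a > 1e-6                                    # support vectors
b = np.mean(y[sv] - Phi[sv] @ w)                 # recover b from support vectors
print(a, w, b)
```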

Problem 4  Boosting                                                                                                                                   (20 points)

 

Recall the procedure of the AdaBoost algorithm described in class:

 

Algorithm 1: AdaBoost

  • Given: a training set $\{(x_n, y_n \in \{+1, -1\})\}_{n=1}^{N}$, and a set of classifiers $\mathcal{H}$, where each $h \in \mathcal{H}$ takes a feature vector as input and outputs $+1$ or $-1$.

  • Goal: learn $H(x) = \mathrm{sign}\left(\sum_{t=1}^{T} \beta_t h_t(x)\right)$.

  • Initialization: $D_1(n) = \frac{1}{N}, \; \forall n \in [N]$.

  • For $t = 1, 2, \ldots, T$ do:

    – Find $h_t = \arg\min_{h \in \mathcal{H}} \sum_{n: y_n \neq h(x_n)} D_t(n)$.

    – Compute $\epsilon_t = \sum_{n: y_n \neq h_t(x_n)} D_t(n)$ and $\beta_t = \frac{1}{2} \log \frac{1 - \epsilon_t}{\epsilon_t}$.

    – Compute $D_{t+1}(n) = \frac{D_t(n)\, e^{-\beta_t y_n h_t(x_n)}}{\sum_{n'=1}^{N} D_t(n')\, e^{-\beta_t y_{n'} h_t(x_{n'})}}$ for each $n \in [N]$.
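
For concreteness, one round of these updates can be written compactly in code; a minimal numpy sketch (the function and variable names are illustrative):

```python
import numpy as np

def adaboost_round(D, y, h_pred):
    """One AdaBoost round: weighted error, beta_t, and the updated weights.

    D      : current weights D_t(n), shape (N,), nonnegative, sums to 1
    y      : labels in {+1, -1}, shape (N,)
    h_pred : predictions h_t(x_n) in {+1, -1}, shape (N,)
    """
    eps = D[y != h_pred].sum()               # weighted error eps_t
    beta = 0.5 * np.log((1 - eps) / eps)     # beta_t
    D_next = D * np.exp(-beta * y * h_pred)  # unnormalized D_{t+1}
    return eps, beta, D_next / D_next.sum()  # normalize to a distribution
```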

 

Question 1  We discussed in class that AdaBoost minimizes the exponential loss greedily. In particular, AdaBoost seeks the optimal $\beta_t$ that minimizes

$\epsilon_t (e^{\beta_t} - e^{-\beta_t}) + e^{-\beta_t}$,

where $\epsilon_t$ is the weighted classification error of $h_t$ and is fixed. Show that $\beta_t = \frac{1}{2} \ln \frac{1 - \epsilon_t}{\epsilon_t}$ is the minimizer.  (8 points)
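
One quick way to sanity-check the closed form is to compare it against a numerical minimizer; a hedged sketch (the value of $\epsilon_t$ is an arbitrary assumption):

```python
import numpy as np
from scipy.optimize import minimize_scalar

eps_t = 0.3   # any fixed weighted error in (0, 1/2)
loss = lambda b: eps_t * (np.exp(b) - np.exp(-b)) + np.exp(-b)

numeric = minimize_scalar(loss).x
closed_form = 0.5 * np.log((1 - eps_t) / eps_t)
print(numeric, closed_form)   # both approximately 0.4236
```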

Question 2  Recall that at round $t$ of AdaBoost, a classifier $h_t$ is obtained and the weighting over the training set is updated from $D_t$ to $D_{t+1}$. Prove that $h_t$ is only as good as random guessing in terms of classification error weighted by $D_{t+1}$; that is,  (12 points)

$\sum_{n: h_t(x_n) \neq y_n} D_{t+1}(n) = \frac{1}{2}$.

Hint: you can ignore the denominator of $D_{t+1}(n)$ to simplify the calculation.
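
The identity can also be checked numerically before proving it; a minimal sketch with an arbitrary $D_t$ and predictions (all names illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 10
D = rng.random(N)
D /= D.sum()                      # an arbitrary distribution D_t
y = rng.choice([-1, 1], size=N)   # labels
h = y.copy()
h[:3] *= -1                       # h_t misclassifies exactly the first 3 points

eps = D[y != h].sum()             # weighted error under D_t
beta = 0.5 * np.log((1 - eps) / eps)
D_next = D * np.exp(-beta * y * h)
D_next /= D_next.sum()            # normalized D_{t+1}

print(D_next[y != h].sum())       # prints 0.5 (up to floating point)
```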
