Assignment 6


Problem 1 [33%]

In this problem, we will establish some basic properties of vectors and linear functions.

  1. The L2 norm of a vector measures the length (size) of a vector. The norm of a vector x of size n is defined as:

$\|x\|_2 = \sqrt{\sum_{i=1}^{n} x_i^2}$

Show that the squared norm can be expressed as the following quadratic expression (a short sketch appears after this list):

$\|x\|_2^2 = x^T x$

  2. Let a and x be vectors of size n = 3 and consider the linear function $f(x) = a^T x$. Show that the gradient of f is $\nabla_x f(x) = a$.

  3. Let A be a symmetric matrix of size 3 × 3 and consider the quadratic function $f(x) = x^T A x$. Show that the gradient of f is $\nabla_x f(x) = 2Ax$. A matrix is symmetric if $A_{ij} = A_{ji}$ for all i and j.
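
One way to see the identity in the first item, sketched for real-valued vectors (the gradient claims in the later items follow the same componentwise pattern and are left as the exercise asks):

```latex
\|x\|_2^2
  = \Bigl(\sqrt{\sum_{i=1}^{n} x_i^2}\Bigr)^{2}
  = \sum_{i=1}^{n} x_i^2
  = \sum_{i=1}^{n} x_i \, x_i
  = x^T x
```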

Problem 2 [34%]

Hint: You can follow the slides from the March 4th class, or the LAR reference from the class website. See the class website for some recommended linear algebra references.

You will derive the formula used to compute the solution to ridge regression. The objective in ridge regression is:

$f(\beta) = \|y - A\beta\|_2^2 + \lambda \|\beta\|_2^2$

Here, β is the vector of coefficients that we want to optimize, A is the design matrix, y is the target, and λ is the regularization coefficient. The notation $\|\cdot\|_2$ represents the Euclidean (or L2) norm.

Our goal is to find the β that solves:

$\min_\beta f(\beta)$

Follow the next steps to compute it.

  1. Express the ridge regression objective $f(\beta)$ in terms of linear and quadratic terms. Recall that $\|\beta\|_2^2 = \beta^T \beta$. The result should be similar to the objective function of linear regression.


  2. Derive the gradient $\nabla_\beta f(\beta)$ using the linear and quadratic terms above.

  3. Since f is convex, its minimal value is attained when

$\nabla_\beta f(\beta) = 0$

Derive the expression for the β that satisfies the equation above.

  4. Implement the algorithm for computing β and use it on a small dataset of your choice. Do not forget about the intercept. (A minimal implementation sketch appears after this list.)

  5. Compare your solution with glmnet (or another standard implementation) on a small example. Are the results the same? Why, or why not?
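
For reference, a minimal NumPy sketch of the closed-form solution these steps lead to, $\beta = (A^T A + \lambda I)^{-1} A^T y$. The tiny dataset is made up for illustration, and the choice to penalize the intercept is a simplifying assumption; glmnet and sklearn handle the intercept and penalty scaling differently, so their coefficients need not match this sketch exactly.

```python
import numpy as np

def ridge_closed_form(X, y, lam):
    """Solve min_beta ||y - A beta||^2 + lam ||beta||^2 in closed form.

    A is X with a leading column of ones for the intercept. The intercept
    is penalized here for simplicity; standard packages usually leave it
    unpenalized, which is one source of differences from glmnet/sklearn.
    """
    A = np.column_stack([np.ones(len(X)), X])   # design matrix with intercept
    I = np.eye(A.shape[1])
    # Normal equations for ridge: (A^T A + lam I) beta = A^T y
    return np.linalg.solve(A.T @ A + lam * I, A.T @ y)

# Tiny illustrative dataset (made up for this sketch)
X = np.array([[3.0, 7.0], [4.0, 6.0], [6.0, 1.0], [1.0, 1.0]])
y = np.array([20.0, 15.0, 32.0, 5.0])
print(ridge_closed_form(X, y, lam=0.1))
```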

Problem 3 [33%]

Using the MNIST dataset, which we used already in Assignment 2, compare which of boosting, bagging, and random forests works best. You may want to use only a subset of the data.

Optional: Use xgboost (also available for Python) to see whether its results are better than those of the other boosting methods.
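
A minimal sklearn sketch of one way to set up the comparison; the subset size, the hyperparameters, and the use of GradientBoostingClassifier as the boosting method are illustrative assumptions, not the required choices (gradient boosting in particular is slow on image data, hence the small subset).

```python
import numpy as np
from sklearn.datasets import fetch_openml
from sklearn.ensemble import (BaggingClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load MNIST from OpenML and keep a small subset to keep runtimes manageable.
X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
X, y = X[:2000], y[:2000]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "boosting": GradientBoostingClassifier(n_estimators=50),
    "bagging": BaggingClassifier(n_estimators=50),
    "random forest": RandomForestClassifier(n_estimators=50),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(name, accuracy_score(y_te, model.predict(X_te)))
```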

Optional Problem 4 [+10%]

Hint: We did not cover this material in class, but it is important regardless. The slides from March 9th may be helpful when implementing this.

In this problem, you will implement a gradient descent algorithm for solving a linear regression problem.

Recall that the RSS objective in linear regression is:

$f(\beta) = \|y - A\beta\|_2^2$
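
For reference, one possible expansion of this objective into the linear and quadratic terms asked for below (constant terms may be kept or dropped, since they do not affect the minimizer):

```latex
f(\beta) = (y - A\beta)^T (y - A\beta)
         = y^T y - 2\, y^T A \beta + \beta^T A^T A \beta
```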

  1. Consider the problem of predicting revenue as a function of spending on TV and Radio advertising. There are only 4 data points:

Revenue  TV  Radio
  20      3    7
  15      4    6
  32      6    1
   5      1    1

Write down the design matrix of predictors, A, and the response vector, y, for this regression problem. Do not forget about the intercept, which can be modeled as a predictor with a constant value over all data points. The matrix A should be 4 × 3 dimensional.

  2. Express the RSS objective $f(\beta) = \|y - A\beta\|_2^2$ in terms of linear and quadratic terms.

  3. Derive the gradient $\nabla_\beta f(\beta)$ using the linear and quadratic terms above.

  4. Implement a gradient descent method with a fixed step size in R/Python (a minimal sketch appears after this list).

  5. Use your implementation of linear regression to solve the simple problem above and on a small dataset of your choice. Compare the solution with linear regression from R or Python (sklearn). Do not forget about the intercept.
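
A minimal NumPy sketch of fixed-step gradient descent on the RSS objective, using the four-point dataset above; the step size and iteration count are illustrative assumptions (the step must be small enough for the iteration to converge), not tuned values.

```python
import numpy as np

# Design matrix: intercept column, TV, Radio (the four data points above)
A = np.array([[1.0, 3.0, 7.0],
              [1.0, 4.0, 6.0],
              [1.0, 6.0, 1.0],
              [1.0, 1.0, 1.0]])
y = np.array([20.0, 15.0, 32.0, 5.0])

def gradient_descent(A, y, step=0.001, iters=50_000):
    """Minimize f(beta) = ||y - A beta||^2 by fixed-step gradient descent."""
    beta = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = 2 * A.T @ (A @ beta - y)   # gradient of the RSS objective
        beta -= step * grad
    return beta

print(gradient_descent(A, y))
# Sanity check against NumPy's least-squares solver:
print(np.linalg.lstsq(A, y, rcond=None)[0])
```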
