Probability and Calculus.

1.1. Variance and covariance – 15 pts. Let $X, Y$ be two independent random vectors in $\mathbb{R}^m$. Show that their covariance is zero. For a constant matrix $A \in \mathbb{R}^{m \times m}$, show the following two properties:
$$E(X + AY) = E(X) + A\,E(Y)$$
$$\mathrm{Var}(X + AY) = \mathrm{Var}(X) + A\,\mathrm{Var}(Y)\,A^{T}$$
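Both identities can be checked numerically before proving them; a minimal Monte Carlo sketch (the sample size, the two sampling distributions, and the matrix A are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 200_000  # dimension and number of Monte Carlo samples

# Independent random vectors X, Y in R^m (distributions chosen arbitrarily)
X = rng.normal(loc=1.0, scale=2.0, size=(n, m))
Y = rng.exponential(scale=1.5, size=(n, m))
A = rng.normal(size=(m, m))  # arbitrary constant matrix

# Empirical cross-covariance of X and Y: should be near zero by independence
cross_cov = (X - X.mean(0)).T @ (Y - Y.mean(0)) / (n - 1)

Z = X + Y @ A.T  # each row is a sample of X + A Y

# E(X + AY) versus E(X) + A E(Y)
lhs_mean = Z.mean(axis=0)
rhs_mean = X.mean(axis=0) + A @ Y.mean(axis=0)

# Var(X + AY) versus Var(X) + A Var(Y) A^T (valid since X, Y independent)
lhs_var = np.cov(Z.T)
rhs_var = np.cov(X.T) + A @ np.cov(Y.T) @ A.T
```

With independent samples, `cross_cov` and the gap between the left- and right-hand sides shrink at the usual $1/\sqrt{n}$ Monte Carlo rate.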
1.2. Densities – 10 pts. Answer the following questions:
1.3. Calculus – 10 pts. Let $x, y \in \mathbb{R}^m$ and $A \in \mathbb{R}^{m \times m}$. In vector notation, what is
2.1. Linear regression – 15 pts. Suppose that $X \in \mathbb{R}^{n \times m}$ with $n \geq m$ and $Y \in \mathbb{R}^n$, and that
$$Y \mid X, \beta \sim N(X\beta, \sigma^2 I).$$
We know that the maximum likelihood estimate $\hat{\beta}$ of $\beta$ is given by
$$\hat{\beta} = (X^T X)^{-1} X^T Y.$$
(a) Find the distribution of $\hat{\beta}$, its expectation and covariance matrix.
(b) Write the log-likelihood implied by the model above, and compute its gradient w.r.t. $\beta$.
(c) Assuming that $\sigma^2$ is known, what is the probability that an individual parameter $\hat{\beta}_i$ is in the $\varepsilon$-neighborhood of the corresponding entry of the true parameter $\beta_i$, i.e. $P(|\hat{\beta}_i - \beta_i| \leq \varepsilon)$? (Hint: Use the Gaussian CDF $\Phi(t)$.)
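The closed-form estimate can be sanity-checked against a generic least-squares solver, and part (c) reduces to evaluating the Gaussian CDF once the variance of $\hat{\beta}_i$ is known. A small sketch (the data-generating values, the index $i$, and $\varepsilon$ are arbitrary choices for illustration):

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(1)
n, m = 100, 4
sigma = 0.5

# Simulate the model Y | X, beta ~ N(X beta, sigma^2 I)
X = rng.normal(size=(n, m))
beta = np.array([1.0, -2.0, 0.5, 3.0])
Y = X @ beta + sigma * rng.normal(size=n)

# Maximum likelihood estimate: (X^T X)^{-1} X^T Y
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# Agrees with a generic least-squares solver
beta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Conditional on X, beta_hat is Gaussian with covariance sigma^2 (X^T X)^{-1},
# so P(|beta_hat_i - beta_i| <= eps) = 2 * Phi(eps / sd_i) - 1
Phi = lambda t: 0.5 * (1.0 + erf(t / sqrt(2.0)))  # standard Gaussian CDF
i, eps = 0, 0.1
sd_i = sigma * sqrt(np.linalg.inv(X.T @ X)[i, i])
prob = 2 * Phi(eps / sd_i) - 1
```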
2.2. Ridge regression and MAP – 20 pts. Suppose that we have $Y \mid X, \beta \sim N(X\beta, \sigma^2 I)$ and we place a normal prior on $\beta$, i.e., $\beta \sim N(0, \tau^2 I)$.
(a) Show that the MAP estimate of $\beta$ given $Y$ in this context is
$$\hat{\beta}_{MAP} = (X^T X + \lambda I)^{-1} X^T Y$$
where $\lambda = \sigma^2 / \tau^2$.
(b) Show that $\hat{\beta}_{MAP}$ can also be obtained by augmenting $X$ with $m$ additional rows, where the $j$-th additional row has its $j$-th entry equal to $\sqrt{\lambda}$ and all other entries equal to zero, adding $m$ corresponding additional entries to $Y$ that are all 0, and then computing the maximum likelihood estimate of $\beta$ using the modified $X$ and $Y$.
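The two routes to the ridge solution can be compared numerically; a sketch (the shapes, noise level, and $\lambda$ are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 50, 5
lam = 0.7  # lambda = sigma^2 / tau^2

X = rng.normal(size=(n, m))
Y = X @ rng.normal(size=m) + 0.3 * rng.normal(size=n)

# Closed-form MAP / ridge estimate from part (a)
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(m), X.T @ Y)

# Part (b): augment X with sqrt(lambda) * I rows and Y with m zeros, then run OLS
X_aug = np.vstack([X, np.sqrt(lam) * np.eye(m)])
Y_aug = np.concatenate([Y, np.zeros(m)])
beta_aug, *_ = np.linalg.lstsq(X_aug, Y_aug, rcond=None)
```

The augmented least-squares objective is $\|Y - X\beta\|^2 + \lambda\|\beta\|^2$, which is exactly the ridge objective, so the two estimates coincide.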
2.3. Cross validation – 30 pts. In this problem, you will write a function that performs the K-fold cross-validation procedure to tune the penalty parameter $\lambda$ in ridge regression. Your cross_validation function will rely on 6 short functions, which are defined below along with their variables.
data is a variable and refers to a (y, X) pair (can be test, training, or validation), where y is the target (response) vector and X is the feature matrix.
model is a variable and refers to the coefficients of the trained model, i.e. $\hat{\beta}$.
data_shf = shuffle_data(data) is a function that takes data as an argument and returns a version of data whose samples have been permuted uniformly at random. Note that y and X need to be permuted the same way, preserving the target–feature pairs.
data_fold, data_rest = split_data(data, num_folds, fold) is a function that takes data, the number of partitions num_folds, and the selected partition fold as its arguments, and returns the selected partition (block) fold as data_fold and the remaining data as data_rest. If we consider 5-fold cross-validation, num_folds=5, and your function splits the data into 5 blocks and returns block fold ($\in \{1, 2, 3, 4, 5\}$) as the validation fold and the remaining 4 blocks as data_rest. Note that data_rest $\cup$ data_fold = data, and data_rest $\cap$ data_fold = $\emptyset$.
model = train_model(data, lambd) is a function that takes data and lambd as its arguments, and returns the coefficients of ridge regression with penalty level $\lambda$ = lambd. For simplicity, you may ignore the intercept and use the expression in question 2.2.
predictions = predict(data, model) is a function that takes data and model as its arguments, and returns the predictions based on data and model.
error = loss(data, model) is a function which takes data and model as its arguments and returns the average squared error loss based on model. This means if data is composed of $y \in \mathbb{R}^n$ and $X \in \mathbb{R}^{n \times p}$, and model is $\hat{\beta}$, then the return value is $\|y - X\hat{\beta}\|^2 / n$.
cv_error = cross_validation(data, num_folds, lambd_seq) is a function that takes the training data, the number of folds num_folds, and a sequence of $\lambda$'s as lambd_seq as its arguments, and returns the cross-validation error across all $\lambda$'s. Take lambd_seq to be 50 evenly spaced numbers over the interval (0.02, 1.5). This means cv_error will be a vector of 50 errors corresponding to the values of lambd_seq. Your function will look like:
data = shuffle_data(data)
for i = 1,2,…,length(lambd_seq)
    lambd = lambd_seq(i)
    cv_loss_lmd = 0.
    for fold = 1,2,…,num_folds
        val_cv, train_cv = split_data(data, num_folds, fold)
        model = train_model(train_cv, lambd)
        cv_loss_lmd += loss(val_cv, model)
    cv_error(i) = cv_loss_lmd / num_folds
return cv_error
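Under the interfaces described above, one possible NumPy implementation looks like this. The names match the pseudocode, and data is treated as a (y, X) tuple; the internals are one illustrative choice, not the required solution:

```python
import numpy as np

def shuffle_data(data):
    """Permute the samples uniformly at random, keeping (y, X) pairs aligned."""
    y, X = data
    idx = np.random.permutation(len(y))
    return y[idx], X[idx]

def split_data(data, num_folds, fold):
    """Return block `fold` (1-indexed) as data_fold and the rest as data_rest."""
    y, X = data
    blocks = np.array_split(np.arange(len(y)), num_folds)
    val = blocks[fold - 1]
    rest = np.concatenate([blocks[k] for k in range(num_folds) if k != fold - 1])
    return (y[val], X[val]), (y[rest], X[rest])

def train_model(data, lambd):
    """Ridge coefficients (X^T X + lambda I)^{-1} X^T y, ignoring the intercept."""
    y, X = data
    m = X.shape[1]
    return np.linalg.solve(X.T @ X + lambd * np.eye(m), X.T @ y)

def predict(data, model):
    """Predictions X @ beta_hat."""
    _, X = data
    return X @ model

def loss(data, model):
    """Average squared error ||y - X beta_hat||^2 / n."""
    y, _ = data
    return np.mean((y - predict(data, model)) ** 2)

def cross_validation(data, num_folds, lambd_seq):
    """Average validation loss over the folds, one entry per lambda."""
    data = shuffle_data(data)
    cv_error = np.zeros(len(lambd_seq))
    for i, lambd in enumerate(lambd_seq):
        cv_loss_lmd = 0.0
        for fold in range(1, num_folds + 1):
            val_cv, train_cv = split_data(data, num_folds, fold)
            model = train_model(train_cv, lambd)
            cv_loss_lmd += loss(val_cv, model)
        cv_error[i] = cv_loss_lmd / num_folds
    return cv_error
```

The required penalty grid would then be `lambd_seq = np.linspace(0.02, 1.5, 50)`.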
In R:
library(R.matlab)
dataset = readMat('file_path/dataset.mat')
data.train.X = dataset$data.train.X
data.train.y = dataset$data.train.y[1,]
data.test.X = dataset$data.test.X
data.test.y = dataset$data.test.y[1,]
In Python:
import scipy.io as sio
dataset = sio.loadmat('file_path/dataset.mat')
data_train_X = dataset['data_train_X']
data_train_y = dataset['data_train_y'][0]
data_test_X = dataset['data_test_X']
data_test_y = dataset['data_test_y'][0]