HOMEWORK 1 – V2 Solution


  1. Probability and Calculus.

 

1.1. Variance and covariance – 15 pts. Let $X, Y$ be two independent random vectors in $\mathbb{R}^m$.

 

(a) Show that their covariance is zero.

 

(b) For a constant matrix $A \in \mathbb{R}^{m \times m}$, show the following two properties:

 

$E(X + AY) = E(X) + A\,E(Y)$

 

$\mathrm{Var}(X + AY) = \mathrm{Var}(X) + A\,\mathrm{Var}(Y)\,A^T$

 

(c) Using part (b), show that if $X \sim \mathcal{N}(\mu, \Sigma)$, then $AX \sim \mathcal{N}(A\mu, A\Sigma A^T)$. Here, you may use the fact that a linear transformation of a Gaussian random vector is again Gaussian.

 

1.2. Densities – 10 pts. Answer the following questions:

 

  • Can a probability density function (pdf) ever take values greater than 1?

 

  • Let X be a univariate normally distributed random variable with mean 0 and variance $1/100$. What is the pdf of X?

 

  • What is the value of this pdf at 0?

 

  • What is the probability that X = 0?
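As a quick numerical illustration of these questions (a sketch, assuming SciPy is available; not part of the required written answer), scipy.stats can evaluate the density in question directly:

from scipy.stats import norm

# N(0, 1/100) has standard deviation 0.1, so its density at 0 is
# 1 / (0.1 * sqrt(2 * pi)) ≈ 3.99 — greater than 1 — even though
# P(X = 0) is exactly 0 for a continuous random variable.
print(norm.pdf(0, loc=0, scale=0.1))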

 

1.3. Calculus – 10 pts. Let $x, y \in \mathbb{R}^m$ and $A \in \mathbb{R}^{m \times m}$. In vector notation, what is

  • the gradient with respect to x of $x^T y$?
  • the gradient with respect to x of $x^T x$?
  • the gradient with respect to x of $x^T A x$?
  • the gradient with respect to x of $Ax$?
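A minimal finite-difference check of the first three identities (a sketch, assuming NumPy; the expected answers in the comments are the standard results $y$, $2x$, and $(A + A^T)x$, while $Ax$ is vector-valued and its Jacobian is $A$):

import numpy as np

rng = np.random.default_rng(0)
m = 4
x, y = rng.normal(size=m), rng.normal(size=m)
A = rng.normal(size=(m, m))

def num_grad(f, x, h=1e-6):
    # Central finite differences, one coordinate at a time.
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

print(np.allclose(num_grad(lambda x: x @ y, x), y))                  # grad of x^T y is y
print(np.allclose(num_grad(lambda x: x @ x, x), 2 * x))              # grad of x^T x is 2x
print(np.allclose(num_grad(lambda x: x @ A @ x, x), (A + A.T) @ x))  # grad of x^T A x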

 

 

2.1. Linear regression – 15 pts. Suppose that $X \in \mathbb{R}^{n \times m}$ with $n \geq m$ and $Y \in \mathbb{R}^n$, and that $Y \mid X, \beta \sim \mathcal{N}(X\beta, \sigma^2 I)$. We know that the maximum likelihood estimate $\hat{\beta}$ of $\beta$ is given by

$\hat{\beta} = (X^T X)^{-1} X^T Y.$

(a) Find the distribution of $\hat{\beta}$, its expectation and covariance matrix.

(b) Write the log-likelihood implied by the model above, and compute its gradient w.r.t. $\beta$.

(c) Assuming that $\sigma^2$ is known, what is the probability that an individual parameter estimate $\hat{\beta}_i$ is in the $\varepsilon$-neighborhood of the corresponding entry of the true parameter $\beta_i$, i.e. $P(|\hat{\beta}_i - \beta_i| \leq \varepsilon)$? (Hint: use the Gaussian CDF $\Phi(t)$.)
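For intuition on part (a), here is a small Monte Carlo sketch (assuming NumPy; the dimensions and seed are illustrative): the OLS estimates should be centered at $\beta$ with sample covariance close to $\sigma^2 (X^T X)^{-1}$.

import numpy as np

rng = np.random.default_rng(0)
n, m, sigma = 100, 3, 0.5
X = rng.normal(size=(n, m))
beta = rng.normal(size=m)

# Draw many datasets Y ~ N(X beta, sigma^2 I) and compute the OLS
# estimate for each.
XtX_inv = np.linalg.inv(X.T @ X)
estimates = np.array([
    XtX_inv @ (X.T @ (X @ beta + sigma * rng.normal(size=n)))
    for _ in range(20000)
])
print(estimates.mean(axis=0))   # approx. beta (unbiasedness)
print(np.cov(estimates.T))      # approx. sigma^2 (X^T X)^{-1}
print(sigma**2 * XtX_inv)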

 

 


 

 

 

2.2. Ridge regression and MAP – 20 pts. Suppose that we have $Y \mid X, \beta \sim \mathcal{N}(X\beta, \sigma^2 I)$ and we place a normal prior on $\beta$, i.e., $\beta \sim \mathcal{N}(0, \tau^2 I)$.

(a) Show that the MAP estimate of $\beta$ given $Y$ in this context is

$\hat{\beta}_{MAP} = (X^T X + \lambda I)^{-1} X^T Y$

where $\lambda = \sigma^2 / \tau^2$.

(b) Show that ridge regression is equivalent to adding $m$ additional rows to $X$, where the $j$-th additional row has its $j$-th entry equal to $\sqrt{\lambda}$ and all other entries equal to zero, adding $m$ corresponding additional entries to $Y$ that are all 0, and then computing the maximum likelihood estimate of $\beta$ using the modified $X$ and $Y$.
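A quick numerical sanity check of this equivalence (a sketch, assuming NumPy; the data here are random and purely illustrative):

import numpy as np

rng = np.random.default_rng(0)
n, m, lambd = 50, 4, 0.7
X = rng.normal(size=(n, m))
y = rng.normal(size=n)

# Closed-form ridge estimate from part (a).
beta_ridge = np.linalg.solve(X.T @ X + lambd * np.eye(m), X.T @ y)

# Augmented least squares: append sqrt(lambda) * I below X and m zeros
# to y, then solve ordinary least squares on the enlarged system.
X_aug = np.vstack([X, np.sqrt(lambd) * np.eye(m)])
y_aug = np.concatenate([y, np.zeros(m)])
beta_ols, *_ = np.linalg.lstsq(X_aug, y_aug, rcond=None)

print(np.allclose(beta_ridge, beta_ols))  # True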

 

2.3. Cross validation – 30 pts. In this problem, you will write a function that performs a K-fold cross-validation procedure to tune the penalty parameter $\lambda$ in ridge regression. Your cross_validation function will rely on 6 short functions, which are defined below along with their variables.

 

data is a variable and refers to a $(y, X)$ pair (can be test, training, or validation) where $y$ is the target (response) vector, and $X$ is the feature matrix.

model is a variable and refers to the coefficients of the trained model, i.e., $\hat{\beta}$.

data_shf = shuffle_data(data) is a function that takes data as an argument and returns a version of it with the samples uniformly randomly permuted. Note that y and X need to be permuted the same way, preserving the target-feature pairs.
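One possible implementation (a sketch, assuming data is stored as a (y, X) pair of NumPy arrays):

import numpy as np

def shuffle_data(data):
    # Apply the same random permutation to y and to the rows of X,
    # preserving the target-feature pairs.
    y, X = data
    perm = np.random.permutation(len(y))
    return y[perm], X[perm]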

 

data_fold, data_rest = split_data(data, num_folds, fold) is a function that takes data, the number of partitions as num_folds, and the selected partition fold as its arguments, and returns the selected partition (block) fold as data_fold and the remaining data as data_rest. If we consider 5-fold cross validation, num_folds=5, and your function splits the data into 5 blocks and returns block fold ($\in \{1, 2, 3, 4, 5\}$) as the validation fold and the remaining 4 blocks as data_rest. Note that data_rest $\cup$ data_fold = data, and data_rest $\cap$ data_fold = $\emptyset$.
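A matching sketch (same (y, X) representation as above; fold is 1-indexed as in the problem statement):

import numpy as np

def split_data(data, num_folds, fold):
    # Partition the sample indices into num_folds contiguous blocks;
    # block `fold` becomes the validation fold, the rest is concatenated.
    y, X = data
    blocks = np.array_split(np.arange(len(y)), num_folds)
    val = blocks[fold - 1]
    rest = np.concatenate([b for j, b in enumerate(blocks) if j != fold - 1])
    return (y[val], X[val]), (y[rest], X[rest])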

model = train_model(data, lambd) is a function that takes data and lambd as its arguments, and returns the coefficients of ridge regression with penalty level $\lambda$. For simplicity, you may ignore the intercept and use the expression in question 2.2.
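A sketch using the closed-form expression from question 2.2 (same (y, X) representation as above):

import numpy as np

def train_model(data, lambd):
    # Ridge estimate (X^T X + lambda * I)^{-1} X^T y, intercept ignored.
    y, X = data
    m = X.shape[1]
    return np.linalg.solve(X.T @ X + lambd * np.eye(m), X.T @ y)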

 

predictions = predict(data, model) is a function that takes data and model as its arguments, and returns the predictions based on data and model.
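A sketch under the same conventions; the targets inside data are not needed for prediction:

def predict(data, model):
    # Linear predictions X @ beta_hat.
    _, X = data
    return X @ model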

 

error = loss(data, model) is a function which takes data and model as its arguments and returns the average squared error loss based on model. This means that if data is composed of $y \in \mathbb{R}^n$ and $X \in \mathbb{R}^{n \times p}$, and model is $\hat{\beta}$, then the return value is $\|y - X\hat{\beta}\|^2 / n$.
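A sketch of the loss under the same (y, X) conventions:

def loss(data, model):
    # Average squared error ||y - X @ beta_hat||^2 / n.
    y, X = data
    residual = y - X @ model
    return residual @ residual / len(y)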

cv_error = cross_validation(data, num_folds, lambd_seq) is a function that takes the training data, the number of folds num_folds, and a sequence of $\lambda$'s as lambd_seq as its arguments, and returns the cross-validation error across all $\lambda$'s. Take lambd_seq to be 50 evenly spaced numbers over the interval (0.02, 1.5). This means cv_error will be a vector of 50 errors corresponding to the values of lambd_seq. Your function will look like:

 

import numpy as np  # assuming NumPy, as in the loading code below

def cross_validation(data, num_folds, lambd_seq):
    data = shuffle_data(data)
    cv_error = np.zeros(len(lambd_seq))
    for i, lambd in enumerate(lambd_seq):
        cv_loss_lmd = 0.0
        for fold in range(1, num_folds + 1):  # folds are 1-indexed
            val_cv, train_cv = split_data(data, num_folds, fold)
            model = train_model(train_cv, lambd)
            cv_loss_lmd += loss(val_cv, model)
        cv_error[i] = cv_loss_lmd / num_folds
    return cv_error

 

  • Download the dataset dataset.mat from the course webpage and place it in your working directory, or note its location file_path. For example, file_path could be /Users/yourname/Desktop/

 

In R:

library(R.matlab)
dataset = readMat('file_path/dataset.mat')
data.train.X = dataset$data.train.X
data.train.y = dataset$data.train.y[1,]
data.test.X = dataset$data.test.X
data.test.y = dataset$data.test.y[1,]

 

In Python:

 

import scipy.io as sio

dataset = sio.loadmat('file_path/dataset.mat')
data_train_X = dataset['data_train_X']
data_train_y = dataset['data_train_y'][0]
data_test_X = dataset['data_test_X']
data_test_y = dataset['data_test_y'][0]

 

  • Write the above 6 functions, and identify the correct order and arguments to do cross-validation.

 

  • Find the training and test errors corresponding to each $\lambda$ in lambd_seq. This part does not use the cross_validation function, but you may find the other functions helpful.

 

  • Plot the training error, test error, and the 5-fold and 10-fold cross-validation errors on the same plot, for each value of $\lambda$ in lambd_seq. What is the value of $\lambda$ proposed by your cross-validation procedure? Comment on the shapes of the error curves.
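A possible plotting helper (a sketch, assuming Matplotlib; the error arrays are whatever you computed in the previous parts, one entry per $\lambda$ in lambd_seq):

import numpy as np
import matplotlib.pyplot as plt

def plot_errors(lambd_seq, train_errors, test_errors, cv5_errors, cv10_errors):
    # Overlay all four error curves against lambda on one set of axes.
    for errors, label in [(train_errors, "training error"),
                          (test_errors, "test error"),
                          (cv5_errors, "5-fold CV error"),
                          (cv10_errors, "10-fold CV error")]:
        plt.plot(lambd_seq, errors, label=label)
    plt.xlabel("lambda")
    plt.ylabel("average squared error")
    plt.legend()
    plt.show()
    # The lambda proposed by CV is the minimizer of the CV error curve:
    print(lambd_seq[np.argmin(cv10_errors)])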

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 
