Probability and Calculus.

1.1. Variance and covariance – 15 pts. Let $X, Y$ be two independent random vectors in $\mathbb{R}^m$. Show that their covariance is zero. For a constant matrix $A \in \mathbb{R}^{m \times m}$, show the following two properties:
$$E(X + AY) = E(X) + A\,E(Y)$$
$$\mathrm{Var}(X + AY) = \mathrm{Var}(X) + A\,\mathrm{Var}(Y)\,A^{T}$$
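Both identities can be checked numerically before proving them; a minimal Monte Carlo sketch (the sample size, the two sampling distributions, and the matrix A are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 200_000  # dimension and number of Monte Carlo samples

# Independent random vectors X, Y in R^m (distributions chosen arbitrarily)
X = rng.normal(loc=1.0, scale=2.0, size=(n, m))
Y = rng.exponential(scale=1.5, size=(n, m))
A = rng.normal(size=(m, m))  # arbitrary constant matrix

# Empirical cross-covariance of X and Y: should be near zero by independence
cross_cov = (X - X.mean(0)).T @ (Y - Y.mean(0)) / (n - 1)

Z = X + Y @ A.T  # each row is a sample of X + A Y

# E(X + AY) versus E(X) + A E(Y)
lhs_mean = Z.mean(axis=0)
rhs_mean = X.mean(axis=0) + A @ Y.mean(axis=0)

# Var(X + AY) versus Var(X) + A Var(Y) A^T (valid since X, Y independent)
lhs_var = np.cov(Z.T)
rhs_var = np.cov(X.T) + A @ np.cov(Y.T) @ A.T
```

With independent samples, `cross_cov` and the gap between the left- and right-hand sides shrink at the usual $1/\sqrt{n}$ Monte Carlo rate.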
1.2. Densities – 10 pts. Answer the following questions:
1.3. Calculus – 10 pts. Let $x, y \in \mathbb{R}^m$ and $A \in \mathbb{R}^{m \times m}$. In vector notation, what is
2.1. Linear regression – 15 pts. Suppose that $X \in \mathbb{R}^{n \times m}$ with $n \geq m$ and $Y \in \mathbb{R}^n$, and that
$$Y \mid X, \beta \sim N(X\beta, \sigma^2 I).$$
We know that the maximum likelihood estimate $\hat{\beta}$ of $\beta$ is given by
$$\hat{\beta} = (X^T X)^{-1} X^T Y.$$
(a) Find the distribution of $\hat{\beta}$, its expectation and covariance matrix.
(b) Write the log-likelihood implied by the model above, and compute its gradient w.r.t. $\beta$.
(c) Assuming that $\sigma^2$ is known, what is the probability that an individual parameter $\hat{\beta}_i$ is in the $\varepsilon$-neighborhood of the corresponding entry of the true parameter $\beta_i$, i.e. $P(|\hat{\beta}_i - \beta_i| \leq \varepsilon)$? (Hint: Use the Gaussian CDF $\Phi(t)$.)
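The closed-form estimate can be sanity-checked against a generic least-squares solver, and part (c) reduces to evaluating the Gaussian CDF once the variance of $\hat{\beta}_i$ is known. A small sketch (the data-generating values, the index $i$, and $\varepsilon$ are arbitrary choices for illustration):

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(1)
n, m = 100, 4
sigma = 0.5

# Simulate the model Y | X, beta ~ N(X beta, sigma^2 I)
X = rng.normal(size=(n, m))
beta = np.array([1.0, -2.0, 0.5, 3.0])
Y = X @ beta + sigma * rng.normal(size=n)

# Maximum likelihood estimate: (X^T X)^{-1} X^T Y
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# Agrees with a generic least-squares solver
beta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Conditional on X, beta_hat is Gaussian with covariance sigma^2 (X^T X)^{-1},
# so P(|beta_hat_i - beta_i| <= eps) = 2 * Phi(eps / sd_i) - 1
Phi = lambda t: 0.5 * (1.0 + erf(t / sqrt(2.0)))  # standard Gaussian CDF
i, eps = 0, 0.1
sd_i = sigma * sqrt(np.linalg.inv(X.T @ X)[i, i])
prob = 2 * Phi(eps / sd_i) - 1
```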
2.2. Ridge regression and MAP – 20 pts. Suppose that we have $Y \mid X, \beta \sim N(X\beta, \sigma^2 I)$ and we place a normal prior on $\beta$, i.e., $\beta \sim N(0, \tau^2 I)$.
(a) Show that the MAP estimate of $\beta$ given $Y$ in this context is
$$\hat{\beta}_{MAP} = (X^T X + \lambda I)^{-1} X^T Y$$
where $\lambda = \sigma^2 / \tau^2$.
(b) Show that $\hat{\beta}_{MAP}$ can also be obtained by augmenting $X$ with $m$ additional rows, where the $j$-th additional row has its $j$-th entry equal to $\sqrt{\lambda}$ and all other entries equal to zero, adding $m$ corresponding additional entries to $Y$ that are all 0, and then computing the maximum likelihood estimate of $\beta$ using the modified $X$ and $Y$.
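The two routes to the ridge solution can be compared numerically; a sketch (the shapes, noise level, and $\lambda$ are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 50, 5
lam = 0.7  # lambda = sigma^2 / tau^2

X = rng.normal(size=(n, m))
Y = X @ rng.normal(size=m) + 0.3 * rng.normal(size=n)

# Closed-form MAP / ridge estimate from part (a)
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(m), X.T @ Y)

# Part (b): augment X with sqrt(lambda) * I rows and Y with m zeros, then run OLS
X_aug = np.vstack([X, np.sqrt(lam) * np.eye(m)])
Y_aug = np.concatenate([Y, np.zeros(m)])
beta_aug, *_ = np.linalg.lstsq(X_aug, Y_aug, rcond=None)
```

The augmented least-squares objective is $\|Y - X\beta\|^2 + \lambda\|\beta\|^2$, which is exactly the ridge objective, so the two estimates coincide.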
2.3. Cross validation – 30 pts. In this problem, you will write a function that performs the K-fold cross-validation procedure to tune the penalty parameter $\lambda$ in ridge regression. Your cross_validation function will rely on 6 short functions, which are defined below along with their variables.
data is a variable and refers to a (y, X) pair (can be test, training, or validation), where y is the target (response) vector and X is the feature matrix.
model is a variable and refers to the coefficients of the trained model, i.e. $\hat{\beta}$.
data_shf = shuffle_data(data) is a function that takes data as an argument and returns a version of data whose samples have been permuted uniformly at random. Note that y and X need to be permuted the same way, preserving the target–feature pairs.
data_fold, data_rest = split_data(data, num_folds, fold) is a function that takes data, the number of partitions num_folds, and the selected partition fold as its arguments, and returns the selected partition (block) fold as data_fold and the remaining data as data_rest. If we consider 5-fold cross-validation, num_folds=5, and your function splits the data into 5 blocks and returns block fold ($\in \{1, 2, 3, 4, 5\}$) as the validation fold and the remaining 4 blocks as data_rest. Note that data_rest $\cup$ data_fold = data, and data_rest $\cap$ data_fold = $\emptyset$.
model = train_model(data, lambd) is a function that takes data and lambd as its arguments, and returns the coefficients of ridge regression with penalty level $\lambda$ = lambd. For simplicity, you may ignore the intercept and use the expression in question 2.2.
predictions = predict(data, model) is a function that takes data and model as its arguments, and returns the predictions based on data and model.
error = loss(data, model) is a function which takes data and model as its arguments and returns the average squared error loss based on model. This means if data is composed of $y \in \mathbb{R}^n$ and $X \in \mathbb{R}^{n \times p}$, and model is $\hat{\beta}$, then the return value is $\|y - X\hat{\beta}\|^2 / n$.
cv_error = cross_validation(data, num_folds, lambd_seq) is a function that takes the training data, the number of folds num_folds, and a sequence of $\lambda$'s as lambd_seq as its arguments, and returns the cross-validation error across all $\lambda$'s. Take lambd_seq to be 50 evenly spaced numbers over the interval (0.02, 1.5). This means cv_error will be a vector of 50 errors corresponding to the values of lambd_seq. Your function will look like:
data = shuffle_data(data)
for i = 1,2,…,length(lambd_seq)
    lambd = lambd_seq(i)
    cv_loss_lmd = 0.
    for fold = 1,2,…,num_folds
        val_cv, train_cv = split_data(data, num_folds, fold)
        model = train_model(train_cv, lambd)
        cv_loss_lmd += loss(val_cv, model)
    cv_error(i) = cv_loss_lmd / num_folds
return cv_error
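Under the interfaces described above, one possible NumPy implementation looks like this. The names match the pseudocode, and data is treated as a (y, X) tuple; the internals are one illustrative choice, not the required solution:

```python
import numpy as np

def shuffle_data(data):
    """Permute the samples uniformly at random, keeping (y, X) pairs aligned."""
    y, X = data
    idx = np.random.permutation(len(y))
    return y[idx], X[idx]

def split_data(data, num_folds, fold):
    """Return block `fold` (1-indexed) as data_fold and the rest as data_rest."""
    y, X = data
    blocks = np.array_split(np.arange(len(y)), num_folds)
    val = blocks[fold - 1]
    rest = np.concatenate([blocks[k] for k in range(num_folds) if k != fold - 1])
    return (y[val], X[val]), (y[rest], X[rest])

def train_model(data, lambd):
    """Ridge coefficients (X^T X + lambda I)^{-1} X^T y, ignoring the intercept."""
    y, X = data
    m = X.shape[1]
    return np.linalg.solve(X.T @ X + lambd * np.eye(m), X.T @ y)

def predict(data, model):
    """Predictions X @ beta_hat."""
    _, X = data
    return X @ model

def loss(data, model):
    """Average squared error ||y - X beta_hat||^2 / n."""
    y, _ = data
    return np.mean((y - predict(data, model)) ** 2)

def cross_validation(data, num_folds, lambd_seq):
    """Average validation loss over the folds, one entry per lambda."""
    data = shuffle_data(data)
    cv_error = np.zeros(len(lambd_seq))
    for i, lambd in enumerate(lambd_seq):
        cv_loss_lmd = 0.0
        for fold in range(1, num_folds + 1):
            val_cv, train_cv = split_data(data, num_folds, fold)
            model = train_model(train_cv, lambd)
            cv_loss_lmd += loss(val_cv, model)
        cv_error[i] = cv_loss_lmd / num_folds
    return cv_error
```

The required penalty grid would then be `lambd_seq = np.linspace(0.02, 1.5, 50)`.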
In R:
library(R.matlab)
dataset = readMat('file_path/dataset.mat')
data.train.X = dataset$data.train.X
data.train.y = dataset$data.train.y[1,]
data.test.X = dataset$data.test.X
data.test.y = dataset$data.test.y[1,]
In Python:
import scipy.io as sio
dataset = sio.loadmat('file_path/dataset.mat')
data_train_X = dataset['data_train_X']
data_train_y = dataset['data_train_y'][0]
data_test_X = dataset['data_test_X']
data_test_y = dataset['data_test_y'][0]