Assignment 4 – Solution


Instructions: Solutions to problems 1 and 2 are to be submitted on Quercus (PDF files only).

1. Consider the model

$$Y_i = \theta_i + \varepsilon_i \qquad (i = 1, \ldots, n)$$

where $\{\varepsilon_i\}$ is a sequence of random variables with mean 0 and finite variance representing noise. As in Assignment #2, we will assume that $\theta_1, \ldots, \theta_n$ are dependent or "smooth" in the sense that the absolute differences $\{|\theta_i - \theta_{i-1}|\}$ are small for most values of $i$. Rather than penalizing a lack of smoothness by $\sum_i (\theta_i - 2\theta_{i-1} + \theta_{i-2})^2$ (as in Assignment #2), we will consider estimating $\{\theta_i\}$ given data $\{y_i\}$ by minimizing

$$\sum_{i=1}^n (y_i - \theta_i)^2 + \lambda \sum_{i=2}^n |\theta_i - \theta_{i-1}| \tag{1}$$

where $\lambda > 0$ is a tuning parameter. The resulting estimates $\hat\theta_1, \ldots, \hat\theta_n$ are called fusion estimates and are useful if $\{\theta_i\}$ contain "jumps", that is, $\theta_i = g(i/n)$ where $g$ is a smooth function with a small number of discontinuities (i.e. jumps).

The Gauss-Seidel algorithm used in Assignment #2 is effectively a coordinate descent algorithm. However, in computing the fusion estimates minimizing (1), application of the coordinate descent algorithm is frustrated to some extent by the fact that the non-differentiable penalty in (1) is not separable. This can be fixed easily by defining $\phi_i = \theta_i - \theta_{i-1}$ for $i = 2, \ldots, n$ and then minimizing

$$\sum_{i=1}^n (y_i - \theta_i)^2 + \lambda \sum_{i=2}^n |\phi_i| \tag{2}$$

where now each $\theta_i$ (for $i = 2, \ldots, n$) will be a function of $\theta_1, \phi_2, \ldots, \phi_i$.

(a) Show that $\theta_k = \theta_1 + \sum_{i=2}^k \phi_i$ for $k \ge 2$.
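One way to see this (a sketch of the argument, not a full write-up): since $\phi_i = \theta_i - \theta_{i-1}$, the sum telescopes,

$$\theta_1 + \sum_{i=2}^k \phi_i = \theta_1 + \sum_{i=2}^k (\theta_i - \theta_{i-1}) = \theta_1 + (\theta_k - \theta_1) = \theta_k.$$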

(b) For fixed values of $\phi_2, \ldots, \phi_n$, show that the objective function is minimized at

$$\hat\theta_1 = \frac{1}{n} \sum_{i=1}^n \left( y_i - \sum_{j=2}^i \phi_j \right).$$
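A sketch of the reasoning: by part (a), the terms of (2) involving $\theta_1$ are $\sum_{i=1}^n \bigl( y_i - \theta_1 - \sum_{j=2}^i \phi_j \bigr)^2$, so setting the derivative with respect to $\theta_1$ to zero gives

$$-2 \sum_{i=1}^n \left( y_i - \theta_1 - \sum_{j=2}^i \phi_j \right) = 0 \quad\Longrightarrow\quad \hat\theta_1 = \frac{1}{n} \sum_{i=1}^n \left( y_i - \sum_{j=2}^i \phi_j \right).$$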

(c) If we fix the values of $\theta_1$ and $\{\phi_i : 2 \le i \ne j \le n\}$, show that the subgradient of the objective function (2) with respect to $\phi_j$ is

$$-2 \sum_{i=j}^n \left( y_i - \theta_1 - \sum_{k=2}^i \phi_k \right) + \lambda\, \partial|\phi_j| = 2(n - j + 1)\phi_j - 2 \sum_{i=j}^n \left( y_i - \theta_1 - \sum_{k=2,\,k \ne j}^i \phi_k \right) + \lambda\, \partial|\phi_j|$$

where

$$\partial|\phi_j| = \begin{cases} +1 & \text{if } \phi_j > 0 \\ [-1, 1] & \text{if } \phi_j = 0 \\ -1 & \text{if } \phi_j < 0 \end{cases}$$

and hence the objective function is minimized over $\phi_j$, for fixed values of $\theta_1$ and $\{\phi_i : 2 \le i \ne j \le n\}$, at

$$\hat\phi_j = 0 \qquad \text{if } \left| \sum_{i=j}^n \left( y_i - \theta_1 - \sum_{k=2,\,k \ne j}^i \phi_k \right) \right| \le \frac{\lambda}{2}$$

$$\hat\phi_j = \frac{1}{n - j + 1} \left\{ \sum_{i=j}^n \left( y_i - \theta_1 - \sum_{k=2,\,k \ne j}^i \phi_k \right) - \frac{\lambda}{2} \right\} \qquad \text{if } \sum_{i=j}^n \left( y_i - \theta_1 - \sum_{k=2,\,k \ne j}^i \phi_k \right) > \frac{\lambda}{2}$$

$$\hat\phi_j = \frac{1}{n - j + 1} \left\{ \sum_{i=j}^n \left( y_i - \theta_1 - \sum_{k=2,\,k \ne j}^i \phi_k \right) + \frac{\lambda}{2} \right\} \qquad \text{if } \sum_{i=j}^n \left( y_i - \theta_1 - \sum_{k=2,\,k \ne j}^i \phi_k \right) < -\frac{\lambda}{2}.$$

(d) [BONUS – but highly recommended!] Write a function in R implementing the coordinate descent algorithm suggested in parts (b) and (c), and test it (for various tuning parameters) on data generated by the following command:

> y <- c(rep(0,250),rep(1,250),rep(0,50),rep(1,450)) + rnorm(1000,0,0.1)

(Convergence of the coordinate descent algorithm here is very slow but you should be able to produce good estimates of the underlying function.)
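A minimal sketch in R of such a coordinate descent algorithm, using the updates from parts (b) and (c); the function name, arguments and fixed number of sweeps below are illustrative only, not the official solution:

fusion <- function(y, lambda, nsweep = 100) {
  n <- length(y)
  phi <- rep(0, n)            # phi[1] is unused; phi[i] = theta_i - theta_{i-1}
  for (sweep in 1:nsweep) {
    # part (b): update theta_1 for fixed phi_2, ..., phi_n
    theta1 <- mean(y - cumsum(phi))
    # part (c): update each phi_j by soft-thresholding
    for (j in 2:n) {
      phi[j] <- 0
      s <- sum((y - theta1 - cumsum(phi))[j:n])
      if (s > lambda/2) {
        phi[j] <- (s - lambda/2)/(n - j + 1)
      } else if (s < -lambda/2) {
        phi[j] <- (s + lambda/2)/(n - j + 1)
      }
    }
  }
  theta1 + cumsum(phi)        # part (a): theta_k = theta_1 + phi_2 + ... + phi_k
}

For example, thetahat <- fusion(y, lambda = 5) with the y generated above (the value lambda = 5 and the number of sweeps are just starting points to experiment with); plotting thetahat against 1:1000 should show the piecewise constant structure of the underlying function.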

2. On Quercus, there is a file (buffsnow.txt) containing seasonal snowfall totals in Buffalo, NY over the 134 years up to 2017-18.

A possible model for these data is a mixture of normals; that is, the density of the data is

$$f(x) = \sum_{k=1}^m \lambda_k f_k(x)$$

where $f_k$ is the density of a normal distribution with unknown mean and variance $\mu_k$ and $\sigma_k^2$ respectively. The $\{\lambda_k\}$ are non-negative with $\lambda_1 + \cdots + \lambda_m = 1$.
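For concreteness, the mixture density can be evaluated in R along the following lines (a small sketch; the function and argument names are illustrative only):

dmix <- function(x, lambda, mu, sigma) {
  # f(x) = sum_k lambda_k * f_k(x), where f_k is the N(mu_k, sigma_k^2) density
  dens <- 0
  for (k in seq_along(lambda)) dens <- dens + lambda[k] * dnorm(x, mu[k], sigma[k])
  dens
}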

(a) The "complete" data likelihood here is

$$L(\mu_1, \ldots, \mu_m, \sigma_1^2, \ldots, \sigma_m^2, \lambda_1, \ldots, \lambda_m) = \prod_{i=1}^n \prod_{k=1}^m \left\{ f_k(x_i)^{u_{ik}} \lambda_k^{u_{ik}} \right\}$$

where $u_{ik} = 0$ or $1$ with $u_{i1} + \cdots + u_{im} = 1$ for $i = 1, \ldots, n$. Show that the complete data MLEs are

$$\hat\mu_k = \left( \sum_{i=1}^n u_{ik} \right)^{-1} \sum_{i=1}^n u_{ik} x_i$$

$$\hat\sigma_k^2 = \left( \sum_{i=1}^n u_{ik} \right)^{-1} \sum_{i=1}^n u_{ik} (x_i - \hat\mu_k)^2$$

$$\hat\lambda_k = \frac{1}{n} \sum_{i=1}^n u_{ik}.$$
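A sketch of where these come from (not the full argument): the complete-data log-likelihood is

$$\log L = \sum_{i=1}^n \sum_{k=1}^m u_{ik} \left\{ \log f_k(x_i) + \log \lambda_k \right\},$$

so setting the derivatives with respect to $\mu_k$ and $\sigma_k^2$ to zero gives weighted versions of the usual normal MLEs, while maximizing $\sum_{k=1}^m \left( \sum_{i=1}^n u_{ik} \right) \log \lambda_k$ subject to $\lambda_1 + \cdots + \lambda_m = 1$ (for example, with a Lagrange multiplier) gives $\hat\lambda_k$.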

For the EM algorithm, we need to estimate the unobserved $\{u_{ik}\}$; for given $\{\lambda_k\}$ and $\{f_k\}$ (which depend on the means and variances), these estimates are

$$\hat u_{ik} = \frac{\lambda_k f_k(x_i)}{\lambda_1 f_1(x_i) + \cdots + \lambda_m f_m(x_i)}.$$

(You do not need to prove this.)

(b) Write an R function that uses the EM algorithm to compute the MLEs of $\lambda_1, \ldots, \lambda_m$, $\mu_1, \ldots, \mu_m$ and $\sigma_1^2, \ldots, \sigma_m^2$. Your function should take as inputs initial estimates of these unknown parameters. (A template function is provided.)
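A minimal sketch of such a function, assuming the data and initial estimates are passed as numeric vectors and iterating a fixed number of times (this is not the provided template; the name, arguments and stopping rule are illustrative only):

normalmix <- function(x, lambda, mu, sigma, niter = 200) {
  n <- length(x)
  m <- length(lambda)
  for (iter in 1:niter) {
    # E-step: u[i,k] = lambda_k f_k(x_i) / (lambda_1 f_1(x_i) + ... + lambda_m f_m(x_i))
    dens <- sapply(1:m, function(k) lambda[k] * dnorm(x, mu[k], sigma[k]))
    u <- dens / rowSums(dens)
    # M-step: complete-data MLEs from part (a) with the estimated u[i,k] plugged in
    nk <- colSums(u)
    lambda <- nk / n
    mu <- colSums(u * x) / nk
    sigma <- sqrt(colSums(u * (outer(x, mu, "-"))^2) / nk)
  }
  # observed-data log-likelihood at the final estimates (useful for comparing m = 2 and m = 3)
  loglik <- sum(log(rowSums(sapply(1:m, function(k) lambda[k] * dnorm(x, mu[k], sigma[k])))))
  list(lambda = lambda, mu = mu, sigma = sigma, loglik = loglik)
}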

(c) Using your function, estimate the parameters in the normal mixture model with m = 2 and m = 3. The success of the EM algorithm for mixture models depends on having reasonable initial estimates of the parameters. One simple ad hoc approach is to use a kernel density estimate where the bandwidth parameter is varied so that the estimate has the appropriate number of modes (corresponding to the different components). This can be done using the R function density: use the default kernel (which is a Gaussian kernel) and change the bandwidth using the parameter bw. For example, if the data are in snow then plot(density(snow,bw=5)) will give a plot of the density estimate for a bandwidth of 5.

Decreasing the bandwidth will produce more modes in the kernel density estimate.
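For example, assuming the snowfall totals have been read into a vector snow (how exactly depends on the format of buffsnow.txt) and using a function like the sketch in part (b), the choice of initial estimates might proceed along these lines (the bandwidth value is illustrative only):

plot(density(snow, bw = 10))   # decrease bw until 2 (or 3) modes are visible
# read rough centres, spreads and relative sizes of the modes off the plot and
# use them as the initial values of mu, sigma and lambda passed to the EM function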

(d) Which of the two models considered in part (c) do you prefer and why?
