Instructions: Solutions to problems 1 and 2 are to be submitted on Quercus (PDF files only).
1. Consider the model
  Y_i = θ_i + ε_i   (i = 1, ..., n)

where {ε_i} is a sequence of random variables with mean 0 and finite variance representing noise. We will assume that θ_1, ..., θ_n are dependent or "smooth" in the sense that the absolute differences {|θ_i − θ_{i−1}|} are small for most values of i. Rather than penalizing a lack of smoothness by λ Σ_i (θ_{i+1} − 2θ_i + θ_{i−1})^2 (as in Assignment #2), we will consider estimating {θ_i} given data {y_i} by minimizing

  Σ_{i=1}^n (y_i − θ_i)^2 + λ Σ_{i=2}^n |θ_i − θ_{i−1}|        (1)

where λ > 0 is a tuning parameter and Σ_{i=2}^n |θ_i − θ_{i−1}| represents the total variation of {θ_i}.
The resulting estimates θ̂_1, ..., θ̂_n are sometimes called fusion estimates and are useful if {θ_i} contains "jumps", that is, θ_i = g(i/n) where g is a smooth function with a small number of discontinuities (i.e. jumps).
The nondifferentiable part of the objective function in (1) can be made separable by defining φ_i = θ_i − θ_{i−1} for i = 2, ..., n and then minimizing

  Σ_{i=1}^n (y_i − θ_i)^2 + λ Σ_{i=2}^n |φ_i|        (2)
where now each θ_i (for i = 2, ..., n) will be a function of θ_1, φ_2, ..., φ_i. The representation of the objective function in (2) can be used to compute the parameter estimates using coordinate descent, although there is a much faster algorithm. However, (2) is useful for deriving properties of the estimates.
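The coordinate descent updates in the (θ_1, φ_2, ..., φ_n) parametrization have closed forms, which a short sketch can make concrete. The following Python code is illustrative only (it is not the tvsmooth function, and the name fusion_cd is made up here): it updates θ_1 by its exact minimizer and each φ_k by soft-thresholding the mean of the partial residuals y_i − θ_i + φ_k over i ≥ k.

```python
import numpy as np

def soft_threshold(z, t):
    """argmin over phi of (z - phi)^2 + 2*t*|phi| (scalar soft-thresholding)."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def fusion_cd(y, lam, n_sweeps=100):
    """Coordinate descent on objective (2):
        sum_{i=1}^n (y_i - theta_i)^2 + lam * sum_{i=2}^n |phi_i|,
    with theta_k = theta_1 + phi_2 + ... + phi_k.
    0-based arrays: phi[k] = theta_k - theta_{k-1} for k = 1, ..., n-1."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    theta1 = y.mean()
    phi = np.zeros(n)                          # phi[0] stays 0
    for _ in range(n_sweeps):
        theta1 = np.mean(y - np.cumsum(phi))   # exact minimizer in theta_1
        for k in range(1, n):
            theta = theta1 + np.cumsum(phi)
            m = n - k                          # number of squared terms involving phi[k]
            a_bar = np.mean(y[k:] - theta[k:] + phi[k])
            phi[k] = soft_threshold(a_bar, lam / (2.0 * m))
    theta1 = np.mean(y - np.cumsum(phi))       # final theta_1 step: residuals sum to 0
    return theta1 + np.cumsum(phi)
```

With a very large λ the estimates collapse to ȳ, and at any minimizer the residuals sum to zero, so output like this can be used to sanity-check the properties derived below.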

(a) Show that θ_k = θ_1 + Σ_{i=2}^k φ_i for k ≥ 2.

(b) Show that if θ̂_1, ..., θ̂_n minimize (1) (or (2)) then

  Σ_{i=1}^n (y_i − θ̂_i) = 0.

(Hint: Use the representation (2) and compute its partial derivative with respect to θ_1.)
(c) Show that |y_i − θ̂_i| ≤ λ for i = 1, ..., n.

(Hint: Show that the subgradient ∂( λ Σ_{i=2}^n |θ_i − θ_{i−1}| ) is contained in [−2λ, 2λ]^n for any θ_1, ..., θ_n.)
(d) For λ sufficiently large, we will have θ̂_1 = ··· = θ̂_n = ȳ or, equivalently, φ̂_2 = ··· = φ̂_n = 0. How large must λ be in order to have θ̂_1 = ··· = θ̂_n = ȳ? (Hint: Look at the subgradient of (2) with respect to (θ_1, φ_2, ..., φ_n); when is (0, 0, ..., 0) an element of this subgradient at (ȳ, 0, ..., 0)?)
Note: An R function tvsmooth is available on Quercus in a file tvsmooth.txt. You may find it useful to simulate data from a discontinuous function with additive noise and estimate the function using tvsmooth to gain some insight into this method.
2. Suppose that X_1, ..., X_n are sampled from the following truncated Poisson distribution:
  P(X_i = x) = exp(−λ) λ^x / (x! Δ(r))   for x = r+1, r+2, ...

for some integer r ≥ 0 where

  Δ(r) = Σ_{x=r+1}^∞ exp(−λ) λ^x / x! = 1 − Σ_{x=0}^r exp(−λ) λ^x / x!.

Such a sample might arise if we were sampling from a Poisson population but were unable to observe data less than or equal to r.
The EM algorithm can be employed to estimate λ from the observed X_1, ..., X_n. The key is to think of the observed data as a subset of some larger ("complete") data set X_1, ..., X_n, X_{n+1}, ..., X_{n+M} where M ≥ 0 is a random variable and X_{n+1}, ..., X_{n+M} ≤ r; given M = m, this complete data set is now assumed to be m + n independent observations from a Poisson distribution with mean λ. The loglikelihood for the complete data is

  ln L(λ) = ln(λ) Σ_{i=1}^{n+m} x_i − (n + m)λ,

which depends on two unknowns, Σ_{i=n+1}^{n+m} x_i and m. To use the EM algorithm, we need to estimate these two unknowns.
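For intuition, the resulting iteration can be sketched in Python as follows (an illustrative sketch under the setup above, not code supplied with the course; em_truncated_poisson is a made-up name). The E-step evaluates E(M) and the expected hidden total E(M)·E(X_i | X_i ≤ r) at the current λ; the M-step is the completed-data Poisson maximum likelihood estimate.

```python
from math import exp, factorial

def em_truncated_poisson(xs, r, lam0=1.0, n_iter=200):
    """EM estimate of the Poisson mean lam when only values x > r are observed.
    xs : observed values (all > r).  Returns (lam_hat, m_hat)."""
    n, S = len(xs), sum(xs)
    lam = lam0
    for _ in range(n_iter):
        # P(X <= r) and E(X * 1{X <= r}) under the current lam
        p_le = sum(exp(-lam) * lam**x / factorial(x) for x in range(r + 1))
        ex_le = sum(x * exp(-lam) * lam**x / factorial(x) for x in range(r + 1))
        m_hat = n * p_le / (1.0 - p_le)   # E-step: E(M), since Delta(r) = 1 - P(X <= r)
        hidden = m_hat * (ex_le / p_le)   # E-step: E(M) * E(X | X <= r)
        lam = (S + hidden) / (n + m_hat)  # M-step: completed-data Poisson MLE
    return lam, m_hat
```

For r = 0 the hidden total vanishes (the unobserved values are all 0), so the update reduces to λ ← S/(n + m̂).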

(a) The probability distribution of M is

  P(M = m) = ( (n+m−1) choose m ) (1 − Δ(r))^m Δ(r)^n   for m = 0, 1, 2, ...

Show that E(M) = n(1 − Δ(r))/Δ(r).

(b) Show that

  E( Σ_{i=n+1}^{n+M} X_i | X_1 = x_1, ..., X_n = x_n ) = E(M) E(X_i | X_i ≤ r).

(Hint: Note that (a) X_{n+1}, ..., X_{n+M} are independent of X_1, ..., X_n and (b) X_{n+1}, ..., X_{n+M} ≤ r.)

Consider the data given in the table below. They represent the accident claims submitted to La Royale Belge Insurance Company during a single year. A crude model for the number of claims submitted for a given policy is Poisson. However, the data below do not provide the number of policies for which no claims were submitted. We want to estimate λ as well as to impute (estimate) the number M of policies with no claims.

Number of claims   |    1 |   2 |  3 |  4 | 5 | 6 | 7
Number of policies | 1317 | 239 | 42 | 14 | 4 | 4 | 1
Assume a truncated Poisson model for these data taking r = 0 and estimate λ as well as M using the EM algorithm (which in this case has a particularly simple form). Do you think the truncated Poisson model is useful for these data? (For example, do you think your estimate of M is reasonable?)
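As an illustrative sketch of the setup (in Python rather than R, and not a prescribed solution): with r = 0 every unobserved value is 0, so the E-step reduces to m̂ = n·exp(−λ)/(1 − exp(−λ)) and the M-step to λ ← S/(n + m̂), where S is the total number of claims.

```python
from math import exp

# Claims table from the problem: number of claims -> number of policies
counts = {1: 1317, 2: 239, 3: 42, 4: 14, 5: 4, 6: 4, 7: 1}
n = sum(counts.values())                   # observed policies (those with >= 1 claim)
S = sum(k * c for k, c in counts.items())  # total number of claims

lam = 1.0                                  # arbitrary starting value
for _ in range(200):
    m_hat = n * exp(-lam) / (1.0 - exp(-lam))  # E-step: expected zero-claim policies
    lam = S / (n + m_hat)                      # M-step: mean of the completed data
print(lam, m_hat)
```

Comparing m̂ with the observed counts is one way to judge whether the truncated Poisson model, and hence the imputed M, looks plausible for these data.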