$30.00
Description
Instructions: Solutions to problems 1 and 2 are to be submitted on Quercus (PDF les only). You are strongly encouraged to do problems 3{6 but these are not to be submitted for grading.
1. An interesting variation of rejection sampling is the ratio of uniforms method. We start ^{Z} 1
by taking a bounded function h with h(x) 0 for all x and h(x) dx < 1. We then de ne the region
q
C_{h} = (u; v) : 0 u h(v=u)
and generate (U; V ) uniformly distributed on C_{h}. We then de ne the random variable X = V=U.
(a) The joint density of (U; V ) is

f(u; v) =
1
for (u; v) 2 C_{h}
jC_{h}j
where jC_{h}j is the area of C_{h}. Show that the joint density of (U; X) is

u
for 0 u ^{q}
g(u; x) =
h(x)
h
jC
j
and that the density of X is h(x) for some > 0.

The implementation of this method is somewhat complicated by the fact that it is typically di cult to sample (U; V ) from a uniform distribution on C_{h}. However, it is usually
possible to nd a rectangle of the form D_{h} = f(u; v) : 0 u u_{+}; v v v_{+}g such that C_{h} is contained within D_{h}. Thus to draw (U; V ) from a uniform distribution on C_{h}, we can use rejection sampling where we draw proposals (U ; V ) from a uniform distribution on the rectangle D_{h}; note that the proposals U and V are independent random variables with
Unif(0; u_{+}) and Unif(v 
; v_{+}) distributions, respectively. Show that we can de ne u_{+}, v and 

v_{+} as follows: 

+ ^{=} 
x 
q 
x 
q 
+ 
x 
q 

u 
h(x) v 
h(x): 

max 
h(x) v = min x 
= max x 
(Hint: It su ces to show that if (u; v) 2 C_{h} then (u; v) 2 D_{h} where D_{h} is de ned using u_{+},

, and v_{+} above.)
(c) Implement (in R) the method above for the standard normal distribution taking h(x) =
exp( x^{2}=2). In this case, u_{+} = 1, v = 
q 
= 0:8577639, and v_{+} = ^{q} 

2=e 
2=e = 0:8577639. 

What is the probability that proposals are accepted? 
1
2. Suppose we observe y_{1}; ; y_{n} where
y_{i} = _{i} + “_{i} (i = 1; ; n)
where f”_{i}g is a sequence of random variables with mean 0 and nite variance representing noise. We will assume that _{1}; ; _{n} are smooth in the sense that _{i} = g(i) for some continuous and di erentiable function g. The least squares estimates of _{1}; ; _{n} are trivial

b_{i} = y_{i} for all i  but we can modify least squares in a number of ways to accommodate the \smoothness” assumption on f _{i}g. In this problem, we will consider estimating f _{i}g by
minimizing

n 1

(y_{i i})^{2}
+
( _{i+1} 2 _{i} + _{i} _{1})^{2}
X
X_{i}
i=1
=2
where > 0 is a tuning parameter that controls the smoothness of the estimates b_{1}; ; b_{n}. (This method is known as Whittaker graduation in actuarial science and the HodrickPrescott lter in economics.)

Show that if fy_{i}g are exactly linear, i.e. y_{i} = a i + b for all i then b_{i} = y_{i} for all i.

In principal, we could compute b_{1}; ; b_{n} using ordinary least squares estimation. Show
that b = (b_{1}; ; b_{n})^{T} minimizes
ky X k^{2}
where y is a vector of length 2n 2 and X is an (2n 2) 
n matrix. What are y and X? 


When n is large, computing b_{1}; ; b_{n} directly, for example, using the OLS formulation in part (b) is computationally expensive when n is large. Alternatively, we could use the GaussSeidel algorithm but it converges slowly, particularly for larger values of . One possible alternatively is a randomized modi cation of the GaussSeidel algorithm, which at
each stage solves a p( n) variable least squares problem. The basic algorithm is as follows:


Initialize b.



Randomly sample a subset w of size p from the integers 1; ; n. De ne X_{w} to be the submatrix of X with column indices w and X_{w} to be the submatrix of X with column indices in the complement of w; de ne _{w} and _{w} analogously so that X =

X_{w w} + X_{w w}.


De ne b_{w} to minimize


^{X}w
w
X_{w w}
_{y}
b
with respect to _{w}.
2
3. Repeat steps 1 and 2 until convergence.
Show that the objective function is nonincreasing from one iteration to the next.

On Quercus, there is a function HP in a le HP.txt, which implements the randomized block GaussSeidel algorithm outlined in part (c). This function can be used as follows
> r < HP(x,lambda=2000,p=20,niter=100)
where lambda is the value of , p is the value of p, and niter speci es the number of iterations of the algorithm. The output of the function (contained here in r) consists of two components: the estimates of f g in r$xhat and the values of the objective function for each iteration in r$ss.
Use this function (with = 2000) on data on monthly yields on shortterm British securities over a 21 year period (252 months). Try out various values of p between 5 and 50 (using 1000 iterations). Describe how the objective function decreases with each iteration as p varies between 5 and 50.

(Optional) Methods such as the randomized block GaussSeidel algorithm are essential in problems where the number of unknown parameters is extremely large. The goal is not to minimize the objective function but merely to nd a solution that’s close to the minimum. In the context of the randomized block GaussSeidel algorithm, what factors should you consider in choosing p, the numbers of parameters being optimized at each step? Keep in
mind that for least squares, the number of oating point operations needed increases with p like p^{2}.
3
Supplemental problems (not to hand in):

To generate random variables from some distributions, we need to generate two independent two independent random variables Y and V where Y has a uniform distribution over some nite set and V has a uniform distribution on [0; 1]. It turns out that Y and V can be generated from a single Unif(0; 1) random variable U.
(a) Suppose for simplicity that the nite set is f0; 1; ; n 1g for some integer n 2. For U Unif(0; 1), de ne
Y = bnUc and V = nU Y
where bxc is the integer part of x. Show that Y has a uniform distribution on the set f0; 1; ; n 1g, V has a uniform distribution on [0; 1], and Y and V are independent.
(b) What happens to the precision of V de ned in part (a) as n increases? (For example, if

has 16 decimal digits and n = 10^{6}, how many decimal digits will V have?) Is the method in part (a) particularly feasible if n is very large?

One issue with rejection sampling is a lack of e ciency due to the rejection of random variables generated from the proposal density. An alternative is the acceptancecomplement (AC) method described below.
Suppose we want to generate a continuous random variable from a density f(x) and that
f(x) = f_{1}(x) + f_{2}(x) (where both f_{1} and f_{2} are nonnegative) where f_{1}(x) g(x) for some density function g. Then the AC method works as follows:

Generate two independent random variables U Unif(0; 1) and X with density g.

If U f_{1}(X)=g(X) then return X.

Otherwise (that is, if U > f_{1}(X)=g(X)) generate X from the density

f_{2} (x) =
f_{2}(x)
:
^{Z} ^{1} f_{2}(t) dt
Note that we must be able to easily sample from g and f_{2} in order for the AC method to be e cient; in some cases, they can both be taken to be uniform distributions.

Show that the AC method generates a random variable with a density f. What is the probability that the X generated in step 1 (from g) is \rejected” in step 2?
4
(b) Suppose we want to sample from the truncated Cauchy density
f(x) = 
2 
( 1 x 
1) 

(1 + x^{2}) 

2 
x 
2 

using the AC method with f (x) = k, a constant, for 1 
1 (so that f (x) = 1=2 is a 
uniform density on [ 1; 1]) with
f_{1}(x) = f(x) f_{2}(x) = f(x) k ( 1 x 1):
If g(x) is also a uniform density on [ 1; 1] for what range of values of k can the AC method be applied?
(c) De ning f_{1}, f_{2}, and g as in part (b), what value of k minimizes the probability that X generated in step 1 of the AC algorithm is rejected?

Suppose we want to generate a random variable X from the tail of a standard normal distribution, that is, a normal distribution conditioned to be greater than some constant b. The density in question is

exp( x^{2}=2)
^{f(x) =} ^{p}2 (1 (b))
for x b
with f(x) = 0 for x < b where (x) is the standard normal distribution function. Consider rejection sampling using the shifted exponential proposal density
g(x) = b exp( b(x b)) for x b.
(This proposal density is used by the Monty Python algorithm to sample from the tail of the normal distribution.)

De ne Y be an exponential random variable with mean 1 and U be a uniform random variable on [0; 1] independent of Y . Show that the rejection sampling scheme de nes X =
b + Y =b if
Y ^{2}
2 ln(U) _{b}_{2} :
(Hint: Note that b + Y =b has density g.)
(b) Show the probability of acceptance is given by
p
2 b(1 (b))_{:}
exp( b^{2}=2)
What happens to this probability for large values of b? (Hint: You need to evaluate M = max f(x)=g(x).)
5
(c) Suppose we replace the proposal density g de ned above by
g (x) = exp( (x b)) for x b.
(Note that g is also a shifted exponential density.) What value of maximizes the probability of acceptance? (Hint: Note that you are trying to solve the problem
min max ^{f(x)}
>0 x b g (x)
for . Because the density g (x) has heavier tails, the minimax problem above will have the same solution as the maximin problem
max min 
f(x) 

g (x) 

x b >0 

which may be easier to solve.) 

6. (a) Suppose that E_{1}; E_{2}; 
are independent Exponential random variables with density 

f(x) = exp( x) for x 0. 
Then the distribution of T_{k} 
= E_{1} + + E_{k} is a Gamma 

distribution whose density is 

k_{x}k 
^{1} exp( 
x) 
for x 0. 

g_{k}(x) = 

(k 1)! 

Show that 
k 
^{1} ^{j} exp( ) 

^{P} ^{(T}k ^{1) =} ^{Z}_{1}^{1} ^{g}k^{(x)} ^{dx} ^{=} _{j=0} 
j! 

X 
(Hint: Use integration by parts.)

How might you use the result of part (a) to generate random variables from a Poisson distribution with mean ?
6