Instructions: Solutions to problems 1 and 2 are to be submitted via Quercus and only PDF files will be accepted. You are strongly encouraged to do problems 3 and 4, but these are not to be submitted for grading.
1. Suppose that $Z$ is an $m \times n$ pixel image where $m = 2^k$ and $n = 2^\ell$. If $H_m$ and $H_n$ are $m \times m$ and $n \times n$ Hadamard matrices then we can define the Walsh–Hadamard transform of $Z$ by

$$\hat{Z} = H_m Z H_n.$$

In this problem, we will explore using this Walsh–Hadamard transform to denoise an image.

(a) Show that $Z = H_m \hat{Z} H_n/(mn)$. (This gives the inverse Walsh–Hadamard transform.)

To denoise an image using the Walsh–Hadamard transform, we first compute $\hat{Z}$ and then perform hard- or soft-thresholding to obtain a thresholded (or shrunken) transform $\hat{Z}_\lambda$ (which will typically be "sparser" than $\hat{Z}$). Our hope is that the components of $\hat{Z}$ corresponding to noise in the image are eliminated, so that the denoised image can be obtained by applying the inverse transform to $\hat{Z}_\lambda$.

(b) Using an R function fwht2d (available on Quercus) to compute the Walsh–Hadamard transform, write R functions to perform both hard- and soft-thresholding (dependent on a parameter $\lambda$).
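For concreteness, here is a minimal sketch of what these functions might look like, assuming that fwht2d(Z) computes $H_m Z H_n$ so that, by part (a), applying fwht2d a second time and dividing by $mn$ inverts the transform (the function names are illustrative):

hard.thresh <- function(z, lambda) {
  zhat <- fwht2d(z)                       # Walsh-Hadamard transform
  zhat[abs(zhat) <= lambda] <- 0          # hard-thresholding: zero out small coefficients
  fwht2d(zhat)/(nrow(z)*ncol(z))          # inverse transform, using part (a)
}

soft.thresh <- function(z, lambda) {
  zhat <- fwht2d(z)
  zhat <- sign(zhat)*pmax(abs(zhat) - lambda, 0)   # soft-thresholding: shrink toward zero
  fwht2d(zhat)/(nrow(z)*ncol(z))
}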

(c) The file boats.txt contains a noisy $256 \times 256$ pixel grayscale image of sailboats. Its entries are numbers between 0 and 1, where 0 represents black and 1 represents white. The data can be read into R and displayed as an image as follows:

boats <- matrix(scan("boats.txt"), ncol=256, byrow=TRUE)

image(boats, axes=F, col=grey(seq(0,1,length=256)))
Using your functions from part (b), try to denoise the image as best as possible. (This is quite subjective, but try different methods and parameter values.)
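For example, using the sketched functions above, one might compare the noisy and denoised images side by side for a few trial values of $\lambda$ (the value below is purely illustrative):

denoised <- soft.thresh(boats, lambda=0.5)    # trial value of lambda, chosen by eye
par(mfrow=c(1,2))                             # plot the two images side by side
image(boats, axes=F, col=grey(seq(0,1,length=256)))
image(denoised, axes=F, col=grey(seq(0,1,length=256)))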
Note: You should not expect your noise-reduced image to be a drastic improvement over the noisy image; in fact, connoisseurs of pointillism may prefer the noisy image. There are two issues here: First, we are applying noise reduction to the whole image rather than dividing the image into smaller subimages and applying noise reduction to each subimage. Second, even very good noise reduction tends to wash out finer details, thereby rendering the noise-reduced image less visually appealing.

2. Suppose that $X_1, X_2, \ldots$ is an infinite sequence of independent, identically distributed random variables with some distribution function $F$, and $N$ has a Poisson distribution with mean $\lambda$ that is independent of the sequence $\{X_i\}$. Then we can define a compound Poisson random variable $Y$ by

$$Y = \sum_{i=1}^{N} X_i$$

where $Y = 0$ if $N = 0$. This distribution arises naturally in risk theory and insurance; for example, if $N$ represents the number of claims and $X_1, X_2, \ldots$ the amounts paid for each claim, then $Y$ is the total sum paid. For the purposes of risk management, it is useful to know the distribution of $Y$, particularly its tail.

(a) Suppose that $\{X_i\}$ are discrete integer-valued random variables with probability generating function $\phi(s) = E(s^{X_i})$. Show that the probability generating function of $Y$ is $g(\phi(s))$, where $g(s)$ is the probability generating function of $N$, which is given by

$$g(s) = \exp(-\lambda(1 - s)).$$

(b) The distribution of $Y$ can be approximated using the Fast Fourier Transform. Assume that the random variables $\{X_i\}$ have a distribution $p(x)$ on the integers $0, 1, \ldots, \ell$. The complication is that, unlike the distribution of $S = X_1 + \cdots + X_n$, the distribution of $Y$ is not concentrated on a finite set of integers. Therefore, we need to find an integer $M$ such that $P(Y \geq M)$ is smaller than some predetermined threshold $\epsilon$; $M$ will depend on the Poisson mean $\lambda$ as well as the integer $\ell$ (or more precisely, the distribution $p(x)$). (Also, to optimize the FFT algorithm, $M$ should be a power of 2, although this isn't absolutely necessary unless $M$ is very large.) Show that if $P(N \geq m) \leq \epsilon$ then $P(Y \geq m\ell) \leq \epsilon$, and so we can take $M = m\ell$.

(c) The bound $M$ determined in part (b) is typically very conservative and can be decreased substantially. One approach to determining a better bound is based on the probability generating function of $Y$ derived in part (a) and Markov's inequality. Specifically, if $s > 1$ we have

$$P(Y \geq M) = P(s^Y \geq s^M) \leq \frac{E(s^Y)}{s^M} = \frac{\exp(-\lambda(1 - \phi(s)))}{s^M}.$$

Use this fact to show that for $P(Y \geq M) < \epsilon$, we can take

$$M = \inf_{s > 1} \frac{-\ln(\epsilon) - \lambda(1 - \phi(s))}{\ln(s)}.$$

(d) Given $M$ (which depends on $\epsilon$), the algorithm for determining the distribution of $Y$ goes as follows:
1. Evaluate the DFT of $\{p(x) : x = 0, \ldots, M-1\}$:

$$\hat{p}(j) = \sum_{x=0}^{M-1} \exp\left(\frac{2\pi i j x}{M}\right) p(x)$$

where $p(\ell + 1) = \cdots = p(M-1) = 0$.
2. Evaluate $g(\hat{p}(j)) = \exp(-\lambda(1 - \hat{p}(j)))$ for $j = 0, \ldots, M-1$.
3. Evaluate $P(Y = y)$ by computing the inverse FFT:

$$P(Y = y) = \frac{1}{M} \sum_{j=0}^{M-1} \exp\left(-\frac{2\pi i y j}{M}\right) g(\hat{p}(j)).$$
Write an R function to implement this algorithm, where $M$ is determined using the method in part (c) with $\epsilon = 10^{-5}$. Use this function to evaluate the distribution of $Y$ in the case where $\lambda = 3$ and the distribution of $\{X_i\}$ is

$$p(x) = P(X_i = x) = \frac{11 - x}{66} \quad \text{for } x = 0, \ldots, 10.$$
(You do not need to evaluate the bound $M$ from part (c) with great precision; for example, a simple approach is to take a discrete set of points $S = \{1 < s_1 < s_2 < \cdots < s_k\}$ and define

$$M = \min_{s \in S} \frac{-\ln(\epsilon) - \lambda(1 - \phi(s))}{\ln(s)}$$

where $\delta = s_{i+1} - s_i$ and $s_k$ are determined graphically (that is, by plotting the appropriate function) so that you are convinced that the value of $M$ is close to the actual infimum.)
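A minimal sketch of such a function is given below; it minimizes the part (c) bound over a crude grid of points $s > 1$ (the grid endpoints are arbitrary choices), rounds $M$ up to a power of 2, and carries out the three steps with R's built-in fft function (the name compound.pois is illustrative):

compound.pois <- function(p, lambda, eps=1e-5) {
  # p = vector of probabilities (p(0), ..., p(l)) for X_i; lambda = Poisson mean
  phi <- function(s) sum(p*s^(0:(length(p) - 1)))   # pgf of X_i
  s <- seq(1.01, 3, by=0.01)                        # crude grid of points s > 1
  bound <- sapply(s, function(s) (-log(eps) - lambda*(1 - phi(s)))/log(s))
  M <- 2^ceiling(log2(max(min(bound), length(p))))  # round M up to a power of 2
  phat <- fft(c(p, rep(0, M - length(p))))          # step 1: DFT of zero-padded p
  ghat <- exp(-lambda*(1 - phat))                   # step 2: apply the pgf of N
  Re(fft(ghat, inverse=TRUE))/M                     # step 3: inverse FFT
}

probs <- compound.pois((11 - 0:10)/66, lambda=3)    # P(Y = y) for y = 0, ..., M-1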
On Blackboard, there is a paper by Embrechts and Frei, "Panjer recursion versus FFT for compound distributions", which (not surprisingly) compares the FFT approach to an alternative approach (Panjer recursion). This paper includes some R code for the FFT algorithm described above and discusses some modifications to the basic FFT algorithm; feel free to use this R code as a template for your function. (Feel free to explore some of the other aspects of this paper for your own edification.)
Supplemental problems (not to hand in):

3. An alternative to the Quicksort algorithm is Mergesort (see p. 122 of the text). The Mergesort algorithm works by merging pairs of ordered lists. If the two lists contain $r$ and $s$ elements, respectively, then the number of comparisons needed to merge the two lists into a single list lies between $\min(r, s)$ and $r + s - 1$.
Define $C(n)$ to be the (random) number of comparisons in the Mergesort algorithm. It can be shown that the expected number of comparisons $E[C(n)]$ satisfies the recursive formula

$$E[C(n)] = E(Z_n) + E[C(\lfloor n/2 \rfloor)] + E[C(\lceil n/2 \rceil)]$$

where $\lfloor n/2 \rfloor = \lceil n/2 \rceil = n/2$ if $n$ is even, with $\lfloor n/2 \rfloor = (n-1)/2$ and $\lceil n/2 \rceil = (n+1)/2$ if $n$ is odd, and $Z_n$ is a random variable whose distribution is

$$P(Z_n = k) = \frac{\dbinom{k-1}{\lfloor n/2 \rfloor - 1} + \dbinom{k-1}{\lceil n/2 \rceil - 1}}{\dbinom{n}{\lfloor n/2 \rfloor}} \quad \text{for } \lfloor n/2 \rfloor \leq k \leq n-1$$

and whose expected value is

$$E(Z_n) = \lfloor n/2 \rfloor \lceil n/2 \rceil \left( \frac{1}{\lfloor n/2 \rfloor + 1} + \frac{1}{\lceil n/2 \rceil + 1} \right).$$
($Z_n$ represents the number of comparisons needed to merge two ordered lists of lengths $\lfloor n/2 \rfloor$ and $\lceil n/2 \rceil$, respectively.)
(a) In class, we derived the following recursive formula for $E[C(n)]$ for the Quicksort algorithm:

$$E[C(n)] = n - 1 + \frac{2}{n} \sum_{k=1}^{n} E[C(k-1)]$$

where $E[C(0)] = E[C(1)] = 0$. Using R, compute $E[C(n)]$ for $n = 1, \ldots, 1000$ and compare the average performance of Quicksort to that of Mergesort.
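A minimal sketch of this computation, using the Quicksort recursion together with the Mergesort recursion and the formula for $E(Z_n)$ above (the indexing is offset by one since R vectors start at 1):

N <- 1000
qs <- ms <- numeric(N + 1)    # qs[n+1] = E[C(n)] for Quicksort, ms[n+1] for Mergesort
for (n in 2:N) {
  qs[n + 1] <- n - 1 + 2*sum(qs[1:n])/n        # Quicksort recursion
  a <- floor(n/2); b <- ceiling(n/2)
  ez <- a*b*(1/(a + 1) + 1/(b + 1))            # E(Z_n)
  ms[n + 1] <- ez + ms[a + 1] + ms[b + 1]      # Mergesort recursion
}
plot(1:N, qs[-1], type="l", xlab="n", ylab="expected number of comparisons")
lines(1:N, ms[-1], lty=2)    # Mergesort (dashed) versus Quicksort (solid)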
(b) Calculate the worst case behaviour of C(n) for Mergesort.
4. As noted in lecture, catastrophic cancellation in the subtraction $x - y$ can occur when $x$ and $y$ are subject to round-off error. Specifically, if $\mathrm{fl}(x) = x(1 + u)$ and $\mathrm{fl}(y) = y(1 + v)$ then

$$\mathrm{fl}(x) - \mathrm{fl}(y) = x - y + (xu - yv)$$

where the absolute error $|xu - yv|$ can be very large if both $x$ and $y$ are large; in some cases, this error may swamp the object we are trying to compute, namely $x - y$, particularly if $|x - y|$ is relatively small compared to $|x|$ and $|y|$. For example, if we compute the sample variance using the right-hand side of the identity

$$\sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - \frac{1}{n} \left( \sum_{i=1}^{n} x_i \right)^2 \qquad (1)$$
a combination of round-off errors from the summations and catastrophic cancellation in the subtraction may result in the computation of a negative sample variance! (In older versions of Microsoft Excel, certain statistical calculations were prone to this unpleasant phenomenon.) In this problem, we will consider two algorithms for computing the sample variance that avoid this catastrophic cancellation. Both are "one-pass" algorithms, in the sense that we only need to cycle once through the data (as is the case if we use the right-hand side of (1)); to use the left-hand side of (1), we need two passes, since we need to first compute $\bar{x}$ before computing the sum on the left-hand side of (1). In parts (a) and (b) below, define $\bar{x}_k$ to be the sample mean of $x_1, \ldots, x_k$ and note that

$$\bar{x}_{k+1} = \frac{k}{k+1} \bar{x}_k + \frac{1}{k+1} x_{k+1}$$

with $\bar{x} = \bar{x}_n$.
(a) Show that $\sum_{i=1}^{n} (x_i - \bar{x})^2$ can be computed using the recursion

$$\sum_{i=1}^{k+1} (x_i - \bar{x}_{k+1})^2 = \sum_{i=1}^{k} (x_i - \bar{x}_k)^2 + \frac{k}{k+1} (x_{k+1} - \bar{x}_k)^2$$

for $k = 1, \ldots, n-1$. (This is known as West's algorithm.)
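As an illustration, West's recursion leads to a one-pass R function along the following lines (the name west.var is illustrative):

west.var <- function(x) {
  n <- length(x)
  xbar <- x[1]; ss <- 0      # running mean and running sum of squares
  for (k in 1:(n - 1)) {
    ss <- ss + k/(k + 1)*(x[k + 1] - xbar)^2     # update sum of squares
    xbar <- k/(k + 1)*xbar + x[k + 1]/(k + 1)    # update running mean
  }
  ss/(n - 1)                 # the usual sample variance
}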

(b) A somewhat simpler one-pass method replaces $\bar{x}$ by some estimate $x_0$ and then corrects for the error in estimation. Specifically, if $x_0$ is an arbitrary number, show that

$$\sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} (x_i - x_0)^2 - n(x_0 - \bar{x})^2.$$
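In R, the corresponding one-pass computation might look as follows for a given shift x0 (how to choose x0 well is the subject of part (c); the name shifted.var is illustrative):

shifted.var <- function(x, x0) {
  n <- length(x); ss <- 0; dev <- 0
  for (i in 1:n) {                   # a single pass through the data
    ss <- ss + (x[i] - x0)^2         # accumulate squared deviations from x0
    dev <- dev + (x[i] - x0)         # accumulate deviations to recover xbar
  }
  xbar <- x0 + dev/n
  (ss - n*(x0 - xbar)^2)/(n - 1)     # correct using the identity in part (b)
}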

(c) The key in using the formula in part (b) is to choose $x_0$ to avoid catastrophic cancellation; that is, $x_0$ should be close to $\bar{x}$. How might you choose $x_0$ (without first computing $\bar{x}$) to minimize the possibility of catastrophic cancellation? Ideally, $x_0$ should be calculated using $o(n)$ operations.
(An interesting paper on computational algorithms for computing the variance is "Algorithms for computing the sample variance: analysis and recommendations" by Chan, Golub, and LeVeque; this paper is available on Quercus.)