Homework 1 Solution


Category: Tag:


Instructions: Solutions to problems 1 and 2 are to be submitted via Quercus and only PDF les will be accepted. (Ideally, you should submit only one le.) You are strongly encouraged to do problems 3 and 4 but these are not to be submitted for grading.

  1. In this problem, you will use the discrete cosine transform (DCT) to denoise an image. An R function to compute a two-dimensional DCT is available in the R package dtt { this package contains functions to compute a number of \trigonometric” transforms. The R function that will be used in this package is mvdct.

  1. In lecture, we de ned matrix transforms for vectors of length n, focusing on families of matrices fAng satisfying ATn An = Dn where Dn is a diagonal matrix.

Suppose that Z is an m n pixel image, which we can represent as an m n matrix. Then using fAng, we can de ne a transform of Z as follows:

Zb = AmZATn

Given the m n matrix Zb, show that we can reconstruct the original image by Z = Dm1ATmZAbnDn 1.

  1. To denoise an image using the DCT, we rst compute Zb and then perform hard- or soft-thresholding to obtain a thresholded (or shrunken) transform Zb (which will typically be \sparser” than Zb). Our hope is that the components of Zb corresponding to noise in the image are eliminated and so the denoised image can be obtained by applying the inverse transform to Zb .

Using mvdct, write R functions to perform both hard- and soft-thresholding (dependent on a parameter ). Your functions should not threshold the (1; 1) component of the DCT matrix. (A simple R function to do hard thresholding will be provided on Quercus; this can be modi ed to do soft thresholding.)

  1. The le boats.txt contains a noisy 256 256 pixel grayscale image of sailboats. Its entries are numbers between 0 and 1 where 0 represents black and 1 represents white. The data can be read into R and displayed as an image as follows:

  • boats <- matrix(scan(“boats.txt”),ncol=256,byrow=T)

  • image(boats, axes=F, col=grey(seq(0,1,length=256)))

Using your functions from part (b), try to denoise the image as best as possible. (This is quite subjective but try di erent methods and parameter values.)

Note: You should not expect your noise-reduced image to be a drastic improvement over noisy image; in fact, connaisseurs of pointillism may nd prefer the noisy image. There are two issues here: First, we are applying noise reduction to the whole image rather than dividing the image into smaller sub-images and applying noise reduction to each sub-image. Second, even very good noise reduction tends to wash out some details, thereby rendering the noise-reduced image less visually appealing.

  1. Suppose that U and V are independent Poisson random variables with means u and v. We then de ne X = U + 2V , which is said to have a Hermite distribution with parameters

u and v. (The Hermite distribution is the distribution of a sum of two correlated Poisson random variables.)

(a) Show that the probability generating function of X is

h i

g(s) = E(sX ) = exp u(s 1) + v(s2 1) :

  1. The distribution of X can, in theory, be obtained exactly in closed-form with exact computation for given u and v somewhat more di cult. However, we can approximate the distribution very well by combining the exact probability generating with the discrete

Fourier transform. The key to doing this is to nd M such that P (X M) is very small so that we can use the discrete Fourier transform to compute (to a very good approximation)

P (X = x) for x = 0; 1; ; M 1.

One approach to determining M is to use the probability generating function of X and Markov’s inequality. Speci cally, if s > 1 we have

P (X M) = P (sX sM )

E(sX )

exp [ u(s 1) + v(s2 1)]




Use this fact to show that for P (X M) , we can take


1) + v(s2 1)

ln( )

M = inf



  1. Given M (which depends on ), the algorithm for determining the distribution of X goes as follows:

    1. Evaluate the probability generating function g(s) at s = sk = exp( 2 k=M) for k = 0; ; M 1; the values of s can be created in R as follows:

> s <- exp(-2*pi*1i*c(0:(M-1))/M)

2. Evaluate P (X = x) by computing the inverse FFT of the sequence fg(sk) : k =

0; ; M 1g:

1 M 1



P (X = x) =

M k=0 exp

2 M k


Write an R function to implement this algorithm where M is determined using the method in part (b) with = 10 5. Use this function to evaluate the distribution of X for the following two cases:

  1. u = 1 and v = 5;

  1. u = 0:1 and v = 2.

Note that you do not need to evaluate the bound M with great precision; for example, a simple approach is to take a discrete set of points S = f1 < s1 < s2 < < skg and de ne


1) + v(s2 1) ln( )

M = min



where = si+1 si and sk are determined graphically (that is, by plotting the appropriate function) so that you are convinced that the value of M is close to the actual in mum.

Supplemental problems (not to hand in):

3. As noted in lecture, catastrophic cancellation in the subtraction x y can occur when x

and y are subject to round-o error. Speci cally, if (x) = x(1 + u) and (y) = y(1 + v) then

(x) (y) = x y + (xu yv)

where the absolute error jxu yvj can be very large if both x and y are large; in some cases,

this error may swamp the object we are trying to compute, namely x y, particularly if

jx yj is relatively small compared to jxj and jyj. For example, if we compute the sample variance using the right hand side of the identity








(xi x)2 =











a combination of round-o errors from the summations and catastrophic cancellation in the subtraction may result in the computation of a negative sample variance! (In older versions of Microsoft Excel, certain statistical calculations were prone to this unpleasant phenomenon.) In this problem, we will consider two algorithms for computing the sample variances that avoid this catastrophic cancellation. Both are \one pass” algorithms, in the sense that we only need to cycle once through the data (as is the case if we use the right hand side of (1)); to use the left hand side of (1), we need two passes since we need to rst compute x before

computing the sum on the left hand side of (1). In parts (a) and (b) below, de ne xk be the sample mean of x1; ; xk and note that

k 1

x = x + x

with x = xn.



x)2 can be computed using the recursion

(a) Show that (xi








= (xi xk)2




k + 1




for k = 1; ; n 1. (This is known as West’s algorithm.)

  1. A somewhat simpler one-pass method replaces x by some estimate x0 and then corrects for the error in estimation. Speci cally, if x0 is an arbitrary number, show that

n n

X(xi x)2 = X(xi x0)2 n(x0 x)2:

i=1 i=1

  1. The key in using the formula in part (b) is to choose x0 to avoid catastrophic cancellation, that is, x0 should be close to x. How might you choose x0 (without rst computing x) to minimize the possibility of catastrophic cancellation? Ideally, x0 should be calculated using

o(n) operations.

(An interesting paper on computational algorithms for computing the variance is \Algorithms for computing the sample variance: analysis and recommendations” by Chan, Golub, and LeVeque; this paper is available on Quercus.)

  1. (a) Suppose that A, B, C, and D are matrices so that AC and BD are both well-de ned. Show that

(AC) (BD) = (A B)(C D)

(Hint: This is easier than it looks | the key is to start with the right hand side of the identity.)

(b) Use the result of part (a) to show that

(A B)1=A1 B1

for invertible matrices A and B.

(c) Suppose that H2k is a 2k 2k Hadamard matrix. Prove the claim given in lecture:



H2k = (I2j 1 H2 I2k j )


(Hint: Use induction, starting with the fact that the identity holds trivially for k = 1.)