Homework 1


Instructions.

Homework is due Tuesday, February 5, at 11:59pm; no late homework accepted. Everyone must submit individually at gradescope under hw1 and hw1code.

The "written" submission at hw1 must be typed, and submitted in any format gradescope accepts (to be safe, submit a PDF). You may use LaTeX, markdown, google docs, MS word, whatever you like; but it must be typed!

When submitting at hw1, gradescope will ask you to mark out boxes around each of your answers; please do this precisely!

Please make sure your NetID is clear and large on the first page of the homework.

Your solution must be written in your own words. Please see the course webpage for full academic integrity information. Briefly, you may have high-level discussions with at most 3 classmates, whose NetIDs you should place on the first page of your solutions, and you should cite any external reference you use; despite all this, your solution must be written in your own words.

We reserve the right to reduce the auto-graded score for hw1code if we detect funny business (e.g., rather than implementing an algorithm, you keep re-submitting the assignment to the auto-grader, eventually completing a binary search for the answers).

In this assignment, for all code, unless otherwise specified, please return your output as a NumPy array.

1. Decision Trees.

Consider the training data given in Figure 1.

Figure 1: Training data. [Scatter plot with integer axes from 0 to 4; image not reproduced here.]

(a) Describe a decision tree of depth one with integral and axis-aligned decision boundaries for this data set, with training error at most $1/6 = 0.166\ldots$ (a code sketch of such a split appears after Figure 2).

(b) Describe a decision tree with integral and axis-aligned decision boundaries for this data set, with zero training error.

(c) Describe a decision tree with integral and axis-aligned decision boundaries for this data set, with zero training error, such that the testing error rate is large when given the testing set of Figure 2.

Figure 2: Testing data. [Scatter plot with integer axes from 0 to 5; image not reproduced here.]
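For concreteness, here is a minimal sketch of what a depth-one, axis-aligned decision tree (a "stump") looks like in code. The feature index, threshold, and leaf labels below are hypothetical placeholders, not the answer for Figure 1, since the figures themselves are not reproduced here.

```python
import numpy as np

def stump_predict(X, feature=0, threshold=2, left=-1, right=+1):
    """Depth-one decision tree ("stump") with an integral, axis-aligned split.

    X: (n, 2) array of points; the split is x[feature] <= threshold.
    The threshold and leaf labels here are placeholders, not a solution.
    """
    X = np.asarray(X)
    return np.where(X[:, feature] <= threshold, left, right)

# Example: points with x1 <= 2 get label -1, the rest get +1.
print(stump_predict([[1, 3], [4, 0]]))  # -> [-1  1]
```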

Solution. (Your solution here.)

2. Linear Regression.

Recall that the empirical risk in the linear regression method is defined as

$$\hat{R}(w) := \frac{1}{n} \sum_{i=1}^{n} (w^\top x_i - y_i)^2,$$

where $x_i \in \mathbb{R}^d$ is a data point and $y_i$ is an associated label.

(a) Implement the linear regression method using gradient descent in the linear_gd(X, Y, lrate, num_iter) function in hw1.py. You are given a training set X and training labels Y as input, along with a learning rate lrate and a maximum number of iterations num_iter. Using gradient descent, find parameters w that minimize the empirical risk $\hat{R}(w)$. One iteration is equivalent to one full-data gradient update step. Use a learning rate of lrate and run for only num_iter iterations. Use w = 0 as your initial parameters, and return your parameters w as output. (Note: gradient descent will be covered in lecture 5; a sketch of one possible approach to parts (a) and (b) appears after part (c).)

(b) Implement linear regression by setting the gradient to zero and solving for the variables, in the linear_normal(X, Y) function in hw1.py. You are given a training set X and training labels Y as input. (Lectures 3-4 give a few ways to get an answer here.) Return your parameters w as output.

(c) Implement the plot_linear() function in hw1.py. Use the provided function utils.load_reg_data() to generate a training set X and training labels Y. Plot the curve generated by linear_normal() along with the points from the data set. Return the plot as output. Include the plot in your written submission.
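The following sketch shows one possible approach to parts (a) and (b). It assumes X is an (n, d) NumPy array and Y has shape (n,), with any bias column already folded into X; the actual data conventions of hw1.py and utils.load_reg_data(), and the default values shown, may differ.

```python
import numpy as np

def linear_gd(X, Y, lrate=0.01, num_iter=1000):
    """Full-batch gradient descent on R(w) = (1/n) sum_i (w^T x_i - y_i)^2,
    starting from w = 0 as the problem specifies."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(num_iter):
        grad = (2.0 / n) * X.T @ (X @ w - Y)  # gradient of the empirical risk
        w = w - lrate * grad                  # one full-data update step
    return w

def linear_normal(X, Y):
    """Solve the normal equations X^T X w = X^T Y directly; lstsq also
    handles rank-deficient X via the minimum-norm solution."""
    w, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return w
```

For part (c), plotting the fitted values X @ w over the data points with matplotlib suffices; nothing beyond the two functions above is needed for the fit itself.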

Solution. (Your solution here.)

3. Singular Value Decomposition.

Recall, as detailed in lecture 3, that for every matrix $A \in \mathbb{R}^{n \times d}$, there are matrices $U \in \mathbb{R}^{n \times n}$, $S \in \mathbb{R}^{n \times d}$, and $V \in \mathbb{R}^{d \times d}$ such that $A = USV^\top$, where $S$ is a diagonal matrix and $U$ and $V$ are orthonormal matrices, that is, $U^{-1} = U^\top$ and $V^{-1} = V^\top$ (i.e., each inverse is equal to its transpose). (A convenient alternative notation, as discussed in lecture, is $A = \sum_{i=1}^{r} s_i u_i v_i^\top$.)

(a) Let $A \in \mathbb{R}^{n \times n}$ be a square matrix and consider its singular value decomposition $USV^\top$ with values $s_1, \ldots, s_n$ on the diagonal of $S$. Show that $A$ is invertible if and only if $s_i \neq 0$ for all $i \in \{1, \ldots, n\}$.

(b) Show that for all $A \in \mathbb{R}^{n \times d}$ and all positive $\lambda \in \mathbb{R}_{>0}$, the matrix $A^\top A + \lambda I$ is invertible.

(c) Prove that $\lim_{\lambda \downarrow 0} (A^\top A + \lambda I)^{-1} A^\top = A^+$, where $A^+$ is the pseudoinverse of the matrix $A$. That is, show that every entry of the matrix $(A^\top A + \lambda I)^{-1} A^\top$ converges to the corresponding entry of $A^+$ as $\lambda$ vanishes. (A numerical sanity check appears below.)
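The claims in parts (b) and (c) are easy to probe numerically before proving them. The following is a quick NumPy sanity check (not a proof, and not part of the required submission) that the regularized inverse approaches the pseudoinverse entrywise as $\lambda$ shrinks.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))   # any matrix works here
A_pinv = np.linalg.pinv(A)        # SVD-based pseudoinverse A^+

for lam in [1.0, 1e-2, 1e-4, 1e-6]:
    # (A^T A + lam*I)^{-1} A^T, computed via a linear solve
    ridge = np.linalg.solve(A.T @ A + lam * np.eye(3), A.T)
    print(lam, np.abs(ridge - A_pinv).max())  # entrywise gap shrinks with lam
```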

Solution. (Your solution here.)

4. Polynomial Regression.

In problem 2 you constructed a linear model $w^\top x = \sum_{i=1}^{d} x_i w_i$. In this problem you will use the same setup as in the previous problem, but enhance your linear model by doing a quadratic expansion of the features. Namely, you will construct a new linear model $f_w$ with parameters $(w_0, w_{01}, \ldots, w_{0d}, w_{11}, w_{12}, \ldots, w_{1d}, w_{22}, w_{23}, \ldots, w_{2d}, \ldots, w_{dd})$ defined:

$$f_w(x) = w^\top \phi(x) = w_0 + \sum_{i=1}^{d} w_{0i} x_i + \sum_{i \le j} w_{ij} x_i x_j$$
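As a sketch, one way to generate the expanded features in the stated parameter order (bias, then linear terms, then products $x_i x_j$ with $i \le j$) is the helper below; it is illustrative, not necessarily the exact interface hw1.py expects. Once features are expanded this way, parts (b) and (c) reduce to running the gradient descent and normal-equation solvers from problem 2 on the expanded design matrix.

```python
import numpy as np

def quadratic_expand(x):
    """Map x = (x_1, ..., x_d) to phi(x) = (1, x_1, ..., x_d,
    x_1*x_1, x_1*x_2, ..., x_d*x_d), with products ordered by i <= j."""
    x = np.asarray(x, dtype=float)
    d = len(x)
    quad = [x[i] * x[j] for i in range(d) for j in range(i, d)]
    return np.concatenate(([1.0], x, quad))

# With d = 2: phi((2, 3)) = (1, 2, 3, 2*2, 2*3, 3*3).
print(quadratic_expand([2.0, 3.0]))  # -> [1. 2. 3. 4. 6. 9.]
```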

(a) Given a 3-dimensional feature vector $x = (x_1, x_2, x_3)$, completely write out the quadratically expanded feature vector $\phi(x)$.

(b) Implement the poly_gd() function in hw1.py. The input is in the same format as it was in problem 2. Use this training set to implement gradient descent to determine the parameters w. Use w = 0 as your initial parameters. Return your w parameters as output. Please return your parameters in the exact order mentioned here (bias, linear, and then quadratic). For example, if d = 3 then you would return $(w_0, w_{01}, w_{02}, w_{03}, w_{11}, w_{12}, w_{13}, w_{22}, w_{23}, w_{33})$.

(c) Implement the poly_normal() function in hw1.py. You are given the same data set as in part (b), but this time you will determine the w parameters by solving the normal equations. Return your w parameters as output. Again, return these parameters in the same order you returned them for part (b).

(d) Implement the plot_poly() function in hw1.py. Use the provided function utils.load_reg_data() to generate a training set X and training labels Y. Plot the curve generated by poly_normal() along with the points from the data set. Return the plot as output and include it in your written submission. Compare and contrast this plot with the plot from problem 2. Which model appears to approximate the data best? Provide a justification for your answer.

(e) The Minsky-Papert XOR problem is a classification problem with data set:

$$X = \{(-1, +1), (+1, -1), (-1, -1), (+1, +1)\}$$

where the label for a given point $(x_1, x_2)$ is given by its product $x_1 x_2$. For example, the point $(-1, +1)$ would be given label $y = (-1)(1) = -1$. Implement the poly_xor() function in hw1.py. In this function you will load the XOR data set by calling the utils.load_xor_data() function, and then apply the linear_normal() and poly_normal() functions to generate labels for the XOR points. Include a plot of contour lines that show how each model classifies points in your written submission. You may use contour_plot() in hw1_utils.py to help you plot the contour lines. As output, return both the labels for the linear model and the polynomial model, in that order. Do both models correctly classify all points? (A small self-contained sketch follows.)
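Here is a small self-contained sketch of the XOR experiment, with the feature expansion and least-squares fits inlined rather than calling the hw1.py functions; it illustrates why the quadratic model succeeds where the linear one cannot.

```python
import numpy as np

X = np.array([[-1., +1.], [+1., -1.], [-1., -1.], [+1., +1.]])
Y = X[:, 0] * X[:, 1]  # XOR labels: the product of the coordinates

# Purely linear model: least squares on the raw features.
w_lin, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Quadratic model: bias, linear, and product features, ordered as in the problem.
Phi = np.column_stack([np.ones(4), X[:, 0], X[:, 1],
                       X[:, 0] ** 2, X[:, 0] * X[:, 1], X[:, 1] ** 2])
w_poly, *_ = np.linalg.lstsq(Phi, Y, rcond=None)

print(np.sign(X @ w_lin))     # the linear model cannot separate XOR
print(np.sign(Phi @ w_poly))  # the x1*x2 feature recovers the labels exactly
```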

Solution. (Your solution here.)

5. Nearest Neighbor.

(a) Implement the 1-nearest neighbor algorithm in the nn() function in hw1.py. In the starter code you are given three NumPy arrays as input:

X – training set
Y – training labels
X_test – testing set

Use the training set to determine labels for the testing set. Return the labels for the testing set as determined by your nearest neighbor implementation. (A sketch of parts (a) and (c) appears after part (c).)

(b) Plot the Voronoi diagram of your nearest neighbor results. Use the data set returned by utils.load_nn_data() to make this plot. You may use the function utils.voronoi_plot() provided to you in hw1_utils.py to help generate the diagram. There is no need to submit code for this part; only submit the plots in the written portion.

(c) Implement the nn_iris() function in hw1.py. Here you will use your nearest neighbor implementation on the Iris data set provided by the scikit-learn library (which can be installed via "pip3 install scikit-learn"). Use the utils.load_iris_data() function to load the data set, and then split the data set into a testing set and a training set. Take the first 30% of the data set to be the testing set, and the rest of the data set to be the training set. Run your nn() function on this data set and return your classification accuracy as output. Report your classification accuracy in your written submission.
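Below is a sketch of parts (a) and (c) under the usual conventions: Euclidean distance, X of shape (n, d), and Y of shape (n,). sklearn.datasets.load_iris is used here as a stand-in for utils.load_iris_data(), whose exact return format should be checked in hw1_utils.py.

```python
import numpy as np

def nn(X, Y, X_test):
    """1-nearest neighbor: label each test point with the label of its
    closest training point under Euclidean distance."""
    X, Y, X_test = np.asarray(X), np.asarray(Y), np.asarray(X_test)
    # Pairwise squared distances, shape (n_test, n_train).
    d2 = ((X_test[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    return Y[np.argmin(d2, axis=1)]

def nn_iris():
    """First 30% of the data as the test split, the rest as training."""
    from sklearn.datasets import load_iris  # stand-in for utils.load_iris_data()
    X, Y = load_iris(return_X_y=True)
    n_test = int(0.3 * len(X))
    Y_pred = nn(X[n_test:], Y[n_test:], X[:n_test])
    return np.mean(Y_pred == Y[:n_test])
```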

Solution. (Your solution here.)

6. Logistic Regression.

Recall the empirical risk $\hat{R}_{\log}$ for logistic regression (as presented in lectures 5-6):

$$\hat{R}_{\log}(w) = \frac{1}{n} \sum_{i=1}^{n} \ln\left(1 + \exp(-y_i w^\top x_i)\right)$$

Here you will minimize this risk using gradient descent.

(a) In your written submission, derive the gradient descent update rule for this empirical risk by taking the gradient. Write your answer in terms of the learning rate $\eta$, previous parameters $w$, new parameters $w'$, number of examples $n$, and training examples $x_i$. Show all of your steps.

(b) Implement the logistic() function in hw1.py. You are given a training set X and training labels Y as input, along with a learning rate lrate and a maximum number of iterations num_iter. Implement gradient descent in order to find parameters w that minimize the empirical risk $\hat{R}_{\log}(w)$. One iteration is equivalent to one full-data gradient update step. Return your parameters w as output. You may use PyTorch to handle the gradient computation for you. Use w = 0 as your initial parameters. Use a learning rate of lrate and run for only num_iter iterations. (A sketch of one possible approach appears after part (c).)

(c) Now implement the logistic_vs_ols() function in hw1.py. This time you are only given the training set X and training labels Y as input. Run logistic(X, Y) from part (b), taking X and Y as input, to obtain parameters w (use the defaults for num_iter and lrate). Also run linear_gd(X, Y) from problem 2, again taking X and Y as input, to obtain parameters w. Plot both lines generated by logistic regression and least squares along with the data X. Which model appears to classify the data better? Provide an explanation for why you believe your choice is the better classifier for this problem.
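A minimal sketch of part (b), assuming the same (n, d) / (n,) array conventions as problem 2 and leaning on PyTorch's autograd for the gradient, as the problem permits; the defaults shown are illustrative. torch.nn.functional.softplus would be a numerically safer way to evaluate $\ln(1 + e^z)$, but the literal formula is kept here for clarity.

```python
import torch

def logistic(X, Y, lrate=0.01, num_iter=1000):
    """Full-batch gradient descent on
    R_log(w) = (1/n) sum_i ln(1 + exp(-y_i w^T x_i)), starting from w = 0."""
    X = torch.as_tensor(X, dtype=torch.float64)
    Y = torch.as_tensor(Y, dtype=torch.float64)
    w = torch.zeros(X.shape[1], dtype=torch.float64, requires_grad=True)
    for _ in range(num_iter):
        risk = torch.log(1 + torch.exp(-Y * (X @ w))).mean()
        risk.backward()                # autograd computes dR/dw into w.grad
        with torch.no_grad():
            w -= lrate * w.grad        # one full-data update step
            w.grad.zero_()
    return w.detach().numpy()
```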

Solution. (Your solution here.)
