Submit a pdf through ConneX before 11:55pm. I will accept submissions up to 24 hours late for a 10% reduction in your mark. After that point, no submissions will be accepted.
Show your work for all questions.
Different marking schemes will be used for undergrad (SEng 474) and grad (CSc 578D) students. Undergrad students do not have to answer the grad questions.
All code questions use Python with NumPy, PyTorch, and Matplotlib. You may install these libraries on your own computer, or you can work in the lab.
For each question, hand in the attached code skeleton with your implementations included. Include graphs and written answers in your pdf.
1: Neural Networks
(SEng 474; CSc 578D: 20 points)
Use the attached code skeleton circle.py to complete this question.
2: Linear Regression
(SEng 474: 30 points; CSc 578D: 40 points. SEng 474; CSc 578D: 5 bonus points)
In this part of the assignment we will implement a linear regression model, using the Boston houses dataset. We will implement two techniques for minimizing the cost function: batch gradient descent, and stochastic gradient descent.
You are provided with a skeleton Python program linreg.py and you have to implement the missing parts.
First let’s recall some theory and mathematical notation. We express our dataset as a matrix $X \in \mathbb{R}^{n \times m}$, where each row is a training sample and each column represents a feature, and a vector $y \in \mathbb{R}^n$ of target values. To simplify the notation, we define a training sample as the vector $x = [1, x_1, x_2, \dots, x_m]$; in effect, we prepend a column of ones to $X$. The purpose of linear regression is to estimate the vector of parameters $w = [w_0, w_1, \dots, w_m]$ such that the hyperplane $x w^T$ fits our training samples in an optimal way. Once we have obtained $w$, we can predict the value of a test sample $x_{\text{test}}$ simply by plugging it into the equation $\hat{y} = x_{\text{test}} w^T$. We estimate $w$ by minimizing a cost function over the entire dataset, which is defined as follows:
$$E(w) = \frac{1}{2n} \sum_{i=1}^{n} \left(y_i - x_i w^T\right)^2$$
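As a concrete illustration (a sketch only; names such as `add_bias_column` and `cost` are illustrative and not taken from linreg.py), this cost can be computed with NumPy as:

```python
import numpy as np

def add_bias_column(X):
    """Prepend a column of ones to X so that w[0] acts as the intercept."""
    return np.hstack([np.ones((X.shape[0], 1)), X])

def cost(X, y, w):
    """E(w) = 1/(2n) * sum_i (y_i - x_i w^T)^2 for design matrix X."""
    n = X.shape[0]
    residuals = y - X @ w
    return (residuals @ residuals) / (2 * n)

# Tiny example: y = 2 + 3*x fits these points exactly, so the cost is 0.
X = add_bias_column(np.array([[0.0], [1.0], [2.0]]))
y = np.array([2.0, 5.0, 8.0])
w = np.array([2.0, 3.0])
print(cost(X, y, w))  # -> 0.0
```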
We apply gradient descent techniques to find the parameter vector $w$ that minimizes the cost function $E(w)$. This is achieved by an iterative algorithm that starts with an initial guess for $w$ and repeatedly performs the update

$$w_j := w_j - \alpha \frac{\partial}{\partial w_j} E(w), \qquad j = 0, 1, \dots, m$$

where $\alpha$ is the learning rate parameter.
In this question you are asked to implement some gradient descent techniques. We are going to set a maximum number of iterations, max_iter in the code, and analyze the behaviour of the cost function by plotting its value at each iteration.
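The two techniques could be sketched as follows, assuming a design matrix that already includes the column of ones; the function names and default values here are illustrative, not taken from the linreg.py skeleton:

```python
import numpy as np

def grad(X, y, w):
    """Gradient of E(w) = 1/(2n) * sum_i (y_i - x_i w^T)^2 with respect to w."""
    n = X.shape[0]
    return -(X.T @ (y - X @ w)) / n

def batch_gradient_descent(X, y, alpha=0.1, max_iter=100):
    """Update w using the full-dataset gradient; record E(w) each iteration."""
    w = np.zeros(X.shape[1])
    costs = []
    for _ in range(max_iter):
        w -= alpha * grad(X, y, w)
        r = y - X @ w
        costs.append((r @ r) / (2 * X.shape[0]))
    return w, costs

def stochastic_gradient_descent(X, y, alpha=0.01, max_iter=100, seed=0):
    """Update w one (shuffled) sample at a time; record E(w) after each pass."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    costs = []
    for _ in range(max_iter):
        for i in rng.permutation(X.shape[0]):
            # Per-sample gradient of (1/2)(y_i - x_i w^T)^2
            w -= alpha * (X[i] @ w - y[i]) * X[i]
        r = y - X @ w
        costs.append((r @ r) / (2 * X.shape[0]))
    return w, costs
```

The recorded costs can then be plotted with Matplotlib (e.g. `plt.plot(costs)`) to compare how the two variants behave per iteration.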
SEng 474; CSc 578D: 30 pts
CSc 578D only, 10 pts
SEng 474; CSc 578D: Bonus Question, 5 pts
$$E(w) = \frac{1}{2n} \left( \sum_{i=1}^{n} \left(y^{(i)} - x^{(i)} w^T\right)^2 + \lambda \sum_{j=1}^{m} w_j^2 \right)$$

where $\lambda$ is the regularization parameter.
Note that by convention we don’t regularize w0. [5 pts]
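A sketch of how the regularized cost and its gradient might look, leaving the bias term unpenalized (`lam` stands for the regularization parameter λ; the names are illustrative, not from the skeleton):

```python
import numpy as np

def ridge_cost(X, y, w, lam):
    """Regularized cost: 1/(2n) * (sum of squared errors + lam * sum_{j>=1} w_j^2)."""
    n = X.shape[0]
    r = y - X @ w
    return (r @ r + lam * (w[1:] @ w[1:])) / (2 * n)

def ridge_grad(X, y, w, lam):
    """Gradient of the regularized cost; w[0] is not penalized."""
    n = X.shape[0]
    g = -(X.T @ (y - X @ w)) / n
    penalty = lam * w / n
    penalty[0] = 0.0  # by convention, do not regularize w0
    return g + penalty
```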
(SEng 474; CSc 578D: 15 points)
For this question, we will implement logistic regression, and test it using the breast cancer dataset and the code skeleton in logreg.py. For this question, you only need to implement batch gradient descent. This time, plot the log likelihood function for each iteration:
$$L(w) = \sum_i y_i \left(w^T x_i\right) - \sum_i \ln\!\left(1 + e^{w^T x_i}\right) \tag{1}$$
Test it with a few different learning rates, e.g. $\alpha = 0.001$, $0.01$, $0.1$. Explain what is happening. [15 pts]
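One possible sketch of batch gradient ascent on the log likelihood, tracking $L(w)$ at each iteration (names such as `logistic_batch_ascent` are illustrative and not taken from logreg.py):

```python
import numpy as np

def log_likelihood(X, y, w):
    """L(w) = sum_i y_i (w^T x_i) - sum_i ln(1 + exp(w^T x_i))."""
    z = X @ w
    # np.logaddexp(0, z) computes ln(1 + e^z) in a numerically stable way.
    return y @ z - np.sum(np.logaddexp(0.0, z))

def logistic_batch_ascent(X, y, alpha=0.1, max_iter=100):
    """Gradient ascent on L(w); record the log likelihood at each iteration."""
    w = np.zeros(X.shape[1])
    lls = []
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))  # predicted probabilities
        w += alpha * (X.T @ (y - p))        # gradient of L(w) is X^T (y - p)
        lls.append(log_likelihood(X, y, w))
    return w, lls
```

Plotting `lls` for each learning rate typically shows slow but steady growth for small $\alpha$ and faster, possibly oscillating, behaviour for larger $\alpha$.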