Note
A zipped file containing skeleton Python script files and data is provided. Note that for each problem, you need to write code in the specified function within the Python script file. For logistic regression, do not use any Python libraries/toolboxes, built-in functions, or external tools/libraries that directly perform the learning or prediction. Using any external code will result in 0 points for that problem.
Evaluation
We will evaluate your code by executing script.py file, which will internally call the problem specific functions. You must submit an assignment report (pdf file) summarizing your findings. In the problem statements below, the portions under REPORT heading need to be discussed in the assignment report.
Introduction
In this assignment, we will extend the first programming assignment to solve the problem of handwritten digit classification. In particular, your task is to implement Logistic Regression and use the Support Vector Machine tool in sklearn.svm.SVC to classify handwritten digit images and compare the performance of these methods.
To get started with the exercise, you will need to download the supporting files. Unzip their contents to the directory where you want to complete this assignment.
Datasets
In this assignment, we still use the same data set as the first programming assignment – MNIST. In the script file provided to you, we have implemented a function, called preprocess(), with preprocessing steps. This will apply feature selection, apply feature normalization, and divide the dataset into 3 parts: training set, validation set, and testing set.
Your tasks
Implement Logistic Regression and give the prediction results.
Use the Support Vector Machine (SVM) toolbox sklearn.svm.SVC to perform classification.
Write a report to explain the experimental results with these 2 methods.
Extra credit (both 474 and 574): Implement the gradient descent minimization of multiclass Logistic Regression (using softmax function).
Consider x ∈ R^{D} as an input vector. We want to classify x into the correct class C_{1} or C_{2} (denoted by a random variable y). In Logistic Regression, the posterior probability of class C_{1} can be written as follows:

P(y = C_{1} | x) = σ(w^{T} x + w_{0})

where w ∈ R^{D} is the weight vector.
For simplicity, we will denote x = [1, x_{1}, x_{2}, · · · , x_{D}] and w = [w_{0}, w_{1}, w_{2}, · · · , w_{D}]. With this new notation, the posterior probability of class C_{1} can be rewritten as follows:

P(y = C_{1} | x) = σ(w^{T} x)    (1)
And the posterior probability of class C_{2} is:

P(y = C_{2} | x) = 1 − P(y = C_{1} | x)
We now consider the data set {x_{1}, x_{2}, · · · , x_{N}} and corresponding labels {y_{1}, y_{2}, · · · , y_{N}}, where

y_{i} = 1 if x_{i} ∈ C_{1}, and y_{i} = 0 if x_{i} ∈ C_{2},

for i = 1, 2, · · · , N.
With this data set, the likelihood function can be written as follows:

p(y | w) = ∏_{n=1}^{N} θ_{n}^{y_{n}} (1 − θ_{n})^{1−y_{n}}

where θ_{n} = σ(w^{T} x_{n}) for n = 1, 2, · · · , N.
We also define the error function by taking the negative logarithm of the likelihood, which gives the cross-entropy error function of the form:

E(w) = −(1/N) ln p(y | w) = −(1/N) ∑_{n=1}^{N} { y_{n} ln θ_{n} + (1 − y_{n}) ln(1 − θ_{n}) }    (2)
Note that this function is different from the squared loss function that we have used for Neural Networks and Perceptrons.
The gradient of the error function with respect to w can be obtained as follows:

∇E(w) = (1/N) ∑_{n=1}^{N} (θ_{n} − y_{n}) x_{n}    (3)
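Equations (2) and (3) can be computed together in a few NumPy lines. The sketch below is illustrative: the function name and array shapes are assumptions meant to mirror the skeleton's conventions (bias prepended to each feature vector), not the skeleton itself.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def blr_error_and_grad(w, X, y):
    """Sketch of equations (2) and (3) for binary logistic regression.

    w : (D+1,) weight vector, bias weight first
    X : (N, D) data matrix without the bias column
    y : (N,)   labels in {0, 1}
    """
    n = X.shape[0]
    Xb = np.hstack([np.ones((n, 1)), X])   # prepend the bias 1 to each row
    theta = sigmoid(Xb @ w)                # theta_n = sigma(w^T x_n)
    # Equation (2): averaged cross-entropy error
    error = -np.mean(y * np.log(theta) + (1 - y) * np.log(1 - theta))
    # Equation (3): averaged gradient, shape (D+1,)
    grad = (Xb.T @ (theta - y)) / n
    return error, grad
```

As a sanity check, at w = 0 every θ_{n} is 0.5, so the error reduces to ln 2 ≈ 0.693 regardless of the data.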
Up to this point, we can again use gradient descent to find the optimal weight ŵ that minimizes the error function, with the update formula:

w^{new} = w^{old} − η ∇E(w^{old})    (4)
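Update rule (4) amounts to a simple iterative loop. The sketch below uses an illustrative learning rate η and iteration count (the actual skeleton may instead hand the objective function to an off-the-shelf optimizer):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_gradient_descent(X, y, eta=0.5, n_iters=2000):
    """Repeatedly apply w_new = w_old - eta * grad E(w_old), equation (4)."""
    n = X.shape[0]
    Xb = np.hstack([np.ones((n, 1)), X])   # bias at the beginning, as required
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iters):
        theta = sigmoid(Xb @ w)
        grad = (Xb.T @ (theta - y)) / n    # equation (3)
        w = w - eta * grad                 # equation (4)
    return w
```

On a linearly separable toy set, the fitted weights should classify every training point correctly.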
Implementation
You are asked to implement Logistic Regression to classify handwritten digit images into their correct corresponding labels. The data is the same that was used for the first programming assignment. Since the labels associated with each digit can take one out of 10 possible values (multiple classes), we cannot directly use a binary logistic regression classifier. Instead, we employ the one-vs-all strategy. In particular, you have to build 10 binary classifiers (one for each class) to distinguish a given class from all other classes. In order to implement Logistic Regression, you have to complete the function blrObjFunction() provided in the base code (script.py). The input of blrObjFunction() includes 3 parameters:
X is a data matrix where each row contains a feature vector in the original coordinates (not including the bias 1 at the beginning of the vector). In other words, X ∈ R^{N×D}, so you have to add the bias into each feature vector inside this function. To guarantee consistency in the code and enable automatic grading, please add the bias at the beginning of the feature vector instead of the end.
w_{k} is a column vector representing the parameters of Logistic Regression. The size of w_{k} is (D + 1) × 1.
y_{k} is a column vector representing the labels of the corresponding feature vectors in data matrix X. Each entry in this vector is either 1 or 0 to represent whether the feature vector belongs to class C_{k} or not (k = 0, 1, · · · , K − 1). The size of y_{k} is N × 1, where N is the number of rows of X. The creation of y_{k} is already done in the base code.
Function blrObjFunction() has 2 outputs:
error is a scalar value which is the result of computing equation (2).
error_grad is a column vector of size (D + 1) × 1 which represents the gradient of the error function obtained by using equation (3).
For prediction using Logistic Regression, given the 10 weight vectors of the 10 classes, we need to classify a feature vector into a certain class. To do so, given a feature vector x, we compute the posterior probability P(y = C_{k} | x), and the decision rule is to assign x to the class C_{k} that maximizes P(y = C_{k} | x). In particular, you have to complete the function blrPredict(), which returns the predicted label for each feature vector. Concretely, the input of blrPredict() includes 2 parameters:
Similar to function blrObjFunction(), X is also a data matrix where each row contains a feature vector in the original coordinates (not including the bias 1 at the beginning of the vector). In other words, X has size N × D. To guarantee consistency in the code and enable automatic grading, please add the bias at the beginning of the feature vector instead of the end.
W is a matrix where each column is the weight vector (w_{k}) of the classifier for digit k. Concretely, W has size (D + 1) × K, where K = 10 is the number of classifiers.
The output of function blrPredict() is a column vector label which has size N × 1.
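The one-vs-all decision rule above can be sketched as follows: score every class with its weight vector and take the argmax of the posteriors. The function name and shapes mirror the skeleton's conventions but the body here is illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def blr_predict(W, X):
    """Sketch of the one-vs-all prediction rule.

    W : (D+1, K) matrix, one weight column per class
    X : (N, D)   data matrix without the bias column
    Returns an (N, 1) column vector of predicted class indices.
    """
    n = X.shape[0]
    Xb = np.hstack([np.ones((n, 1)), X])   # bias at the beginning
    posteriors = sigmoid(Xb @ W)           # (N, K) matrix of P(y = C_k | x)
    return np.argmax(posteriors, axis=1).reshape(-1, 1)
```

Since the sigmoid is monotonic, taking the argmax of the posteriors is equivalent to taking the argmax of the raw scores Xb @ W.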
(REPORT) In your report, record and discuss classification results and accuracy.
For Extra Credit (474 and 574) – Direct Multiclass Logistic Regression
In this part, you are asked to implement multiclass Logistic Regression. Traditionally, Logistic Regression is used for binary classification. However, Logistic Regression can also be extended to solve the multiclass classification problem. With this method, we don't need to build 10 classifiers as before. Instead, we now only need to build 1 classifier that can classify 10 classes at the same time.
For multiclass Logistic Regression, the posterior probabilities are given by a softmax transformation of linear functions of the feature variables, so that

P(y = C_{k} | x) = exp(w_{k}^{T} x) / ∑_{j} exp(w_{j}^{T} x)    (5)
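Equation (5) can be sketched directly in code. One practical detail worth noting: subtracting the maximum score before exponentiating leaves the ratio unchanged but avoids overflow. The function name and shapes below are illustrative assumptions.

```python
import numpy as np

def softmax_posteriors(W, x):
    """Sketch of equation (5): posterior P(y = C_k | x) for every class.

    W : (D+1, K) weight vectors as columns
    x : (D+1,)   feature vector with the bias 1 prepended
    """
    scores = W.T @ x                 # w_k^T x for k = 1, ..., K
    scores = scores - scores.max()   # stability shift; ratios are unchanged
    exp_scores = np.exp(scores)
    return exp_scores / exp_scores.sum()
```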
Now we write down the likelihood function. This is most easily done using the 1-of-K coding scheme, in which the target vector y_{n} for a feature vector x_{n} belonging to class C_{k} is a binary vector with all elements zero except for element k, which equals one. The likelihood function is then given by

P(Y | w_{1}, · · · , w_{K}) = ∏_{n=1}^{N} ∏_{k=1}^{K} P(y = C_{k} | x_{n})^{y_{nk}} = ∏_{n=1}^{N} ∏_{k=1}^{K} θ_{nk}^{y_{nk}}    (6)
where θ_{nk} is given by (5) and Y is an N × K matrix (obtained using 1-of-K encoding) of target variables with elements y_{nk}. Taking the negative logarithm then gives
E(w_{1}, · · · , w_{K}) = − ln P(Y | w_{1}, · · · , w_{K}) = − ∑_{n=1}^{N} ∑_{k=1}^{K} y_{nk} ln θ_{nk}    (7)
which is known as the cross-entropy error function for the multiclass classification problem.
We now take the gradient of the error function with respect to one of the parameter vectors w_{k}. Making use of the result for the derivatives of the softmax function, we obtain:

∂E(w_{1}, · · · , w_{K}) / ∂w_{k} = ∑_{n=1}^{N} (θ_{nk} − y_{nk}) x_{n}    (8)
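Equations (7) and (8) can be evaluated together for all K classes at once by working on the whole score matrix. The sketch below is illustrative: the function name and the (D+1, K) weight-matrix layout are assumptions, not the skeleton's exact interface.

```python
import numpy as np

def mlr_error_and_grad(W, X, Y):
    """Sketch of equations (7) and (8) for multiclass logistic regression.

    W : (D+1, K) weight vectors as columns
    X : (N, D)   data matrix without the bias column
    Y : (N, K)   1-of-K target matrix with elements y_nk
    """
    n = X.shape[0]
    Xb = np.hstack([np.ones((n, 1)), X])            # prepend the bias 1
    scores = Xb @ W                                 # (N, K) raw scores
    scores = scores - scores.max(axis=1, keepdims=True)   # stable softmax
    exp_scores = np.exp(scores)
    theta = exp_scores / exp_scores.sum(axis=1, keepdims=True)  # theta_nk
    error = -np.sum(Y * np.log(theta))              # equation (7)
    grad = Xb.T @ (theta - Y)                       # equation (8), (D+1, K)
    return error, grad
```

At W = 0 with K = 2, every θ_{nk} is 0.5, so the error is N ln 2, and the gradient columns sum to zero because the rows of θ − Y sum to zero.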
Then we can use the following update rule to obtain the optimal parameter vector w_{k} iteratively:

w_{k}^{new} = w_{k}^{old} − η ∂E(w_{1}, · · · , w_{K}) / ∂w_{k}    (9)
(REPORT) In your report, record and discuss classification results and accuracy.
(REPORT) Discuss the performance of multiclass logistic regression compared to the performance of logistic regression when using the onevsall strategy.
Note: This part is an extra 20 credits for both 474 and 574.
Support Vector Machines
In this part of the assignment you are asked to use the Support Vector Machine tool in sklearn.svm.SVC to perform classification on our data set. The details about the tool are provided here: http://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html.
Your task is to fill in the code in the Support Vector Machine section of script.py to learn the SVM model and compute the accuracy of prediction with respect to training data, validation data, and testing data using the following parameters:
Using a linear kernel (all other parameters kept default).
Using a radial basis function kernel with gamma set to 1 (all other parameters kept default).
Using a radial basis function kernel with gamma set to its default (all other parameters kept default).
Using a radial basis function kernel with gamma set to its default and varying values of C (1, 10, 20, 30, · · · , 100); plot the graph of accuracy with respect to the values of C in the report.
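The four configurations above can be sketched with sklearn.svm.SVC as follows. The synthetic arrays here are a small stand-in for the MNIST splits returned by preprocess(); in your script you would substitute the real training, validation, and test sets and also record validation/test accuracies.

```python
import numpy as np
from sklearn.svm import SVC

# Illustrative stand-in data; replace with the preprocess() outputs.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 2))
y_train = (X_train[:, 0] + X_train[:, 1] > 0).astype(int)

# 1. Linear kernel, all other parameters default
linear_acc = SVC(kernel='linear').fit(X_train, y_train).score(X_train, y_train)

# 2. RBF kernel with gamma = 1
rbf_g1_acc = SVC(kernel='rbf', gamma=1).fit(X_train, y_train).score(X_train, y_train)

# 3. RBF kernel with default gamma
rbf_acc = SVC(kernel='rbf').fit(X_train, y_train).score(X_train, y_train)

# 4. RBF kernel, default gamma, C swept over 1, 10, 20, ..., 100
C_values = [1] + list(range(10, 101, 10))
accs = [SVC(kernel='rbf', C=c).fit(X_train, y_train).score(X_train, y_train)
        for c in C_values]
```

The list of (C, accuracy) pairs from step 4 is what you would plot for the report, e.g. with matplotlib's plot(C_values, accs).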
(REPORT) In your report, provide justification for the above selection of hyperparameters, as well as plots and discussion of your results.
(REPORT) Discuss results, comparing the selections of linear kernel and radial basis function kernel.
Submission
You are required to submit a single file called proj3.zip using UBLearns.
File proj3.zip must contain 3 files: your report (report.pdf), params.pickle, and script.py. The params.pickle file should contain the weight matrix, W, learnt for logistic regression.
Submit your report in PDF format. Please indicate the team members, group number, and your course number at the top of the report.
The code file should contain all implemented functions. Please do not change the name of the file.
Using UBLearns Submission: Continue using the groups that you were in for programming assignment 2. You should submit one solution per group through the groups page. If you want to change the group, contact the instructors.
Project report: A hardcopy of your report will be collected in class on the due date. Your report should include the experimental results you have performed using Logistic Regression and Support Vector Machine.
Grading scheme
Implementation:
blrObjFunction(): 20 points
blrPredict(): 20 points
script.py: 20 points (your code in the SVM section)
Project report: 30 points
Accuracy of classification methods: 10 points
Extra Credit: 20 points.