Name: Solved-Theory -Assignment 1 -Solution
SKU: 23818
Price: 35.00 USD
Availability: InStock

Description

5/5 – (2 votes)

Instructions

Submission: Assignment submission will be via courses.uscden.net. By the submission date, there will be a folder named ‘Theory Assignment 1’ set up in which you can submit your files. Please be sure to follow all directions outlined here.

You can submit multiple times, but only the last submission counts. That means if you finish some prob-lems and want to submit something first and update later when you finish, that’s fine. In fact you are encouraged to do this: that way, if you forget to finish the homework on time or something happens (re-member Murphy’s Law), you still get credit for whatever you have turned in.

Problem sets must be typewritten or neatly handwritten when submitted. In both cases, your submis-sion must be a single PDF. It is strongly recommended that you typeset with L^AT_EX. There are many free integrated L^AT_EX editors that are convenient to use (e.g Overleaf, ShareLaTeX). Choose the one(s) you like the most. This tutorial Getting to Grips with LaTeX is a good start if you do not know how to use L^AT_EX yet.

Please also follow the rules below:

The file should be named as firstname lastname USCID.pdfg., Don Quijote de la Mancha 8675309045.pdf).

Do not have any spaces in your file name when uploading it.

Please include your name and USCID in the header of your report as well.

Collaboration: You may discuss with your classmates. However, you need to write your own solutions and submit separately. Also in your report, you need to list with whom you have discussed for each problem. Please consult the syllabus for what is and is not acceptable collaboration. Review the rules on academic conduct in the syllabus: a single instance of plagiarism can adversely affect you significantly more than you could stand to gain.

Notes on notation:

Unless stated otherwise, scalars are denoted by small letter in normal font, vectors are denoted by small letters in bold font and matrices are denoted by capital letters in bold font.

k.k means L2-norm unless specified otherwise i.e. k.k = k.k₂

Problem 1 Nearest Neighbor Classification (8 points)

In the class, we use the Euclidean distance as the distance metric for the nearest neighbor classification.

Given data x 2 R^D, the Euclidean distance between x_i and x_j is defined as following:

		2	D
E(x_i, x_j) = x_i	x_j	2	= å(x_id x_jd)²	(1)
		2	d=1

In some applications such as information retrieval and neural language processing, the cosine distance is widely applied. It is defined as:

C(x_i, x_j) = 1

where the norm of x is defined as

x_i^T x_j	= 1	å_d^D₌₁(x_id x_jd)	,
kx_ik₂kx_jk₂
		kx_ik₂kx_jk₂

u _D

_kxk ₌ ^u å _x2_.

2 ^t _d

d=1
(2)

(3)

Now you are asked to prove that for any x_i and x_j normalized to the unit norm, i.e. kx_ik₂ = kx_j k₂ = 1, changing the distance metric from the Euclidean distance to the cosine distance will NOT affect the nearest neighbor classification results. Specifically, for any x_i, x_j and x _o, show that, if C(x _i, x_j) C(x_i, x_o), then E(x_i, x_j) E(x_i, x_o), where kx_ik₂ = kx_jk₂ = kx_ok₂ = 1.

Problem 2 Nearest Neighbor Classification and Decision Trees (4 points)

Assume we have a dataset, each data x is a 100 dimensional binary vector, i.e. x 2 f0, 1g ¹⁰⁰. Can we have a decision tree to classify the dataset with zero classification error? Can we specify a 1-NN to result in exactly the same classification as our decision tree? For both questions explain why or why not with (counter) examples. You can assume that all data points are distinct, i.e. 8x_i, x_j in the dataset, x_i 6= x_j.

Problem 3 Nearest Neighbor Classification and Decision Trees (6 points)

Assume we have a decision tree in Fig. 1 which classifies x 2 C², and C = Z n fA, Bg. In other words, A and B are integers and each dimension of x is an integer excluding A and B. Can this decision tree be implemented as a 1-NN? If so, explicitly write down each of the values you use for the 1-NN (you should use the minimal number possible). If not, either explain why or provide a counter example.

Figure 1: A decision tree example.

Problem 4 Decision Tree (8 points)

In this problem, you are given four 2-dimensional data points as shown in Table 1:

^x1	x₂	label y
0	0	0
0	1	1
1	0	1
1	1	0

Table 1: Four 2-dimensional data points and their labels.

Figure 2: The plus sign means label y = 0 and the circle means have y = 1.

4.1 Fig. 3 is a decision tree of the given data with zero training error.

Figure 3: The decision tree with zero training error.

Suppose now you have two test data points:

x₁	x₂	label y
0.2	0.2	0
0.2	0.8	1

What would be your test error based on decision tree in Fig. 3? (Define the test error as the fraction of

mis-classifications made on the testing set.) (2 points)

4.2 Now consider a new decision tree in Fig. 4. Note that the depth of the new decision tree is 1, and it does not have zero training error for the given data anymore.

Figure 4: The decision tree with depth = 1.

Given the two test data points from Question 4.1, what would be the test error using the new decision tree? (2 points)

4.3 Is the decision tree in Fig. 4 a linear or non-linear classifier in terms of (x₁, x₂) (Yes/No)? Can you classify the given data in Table 1 and get zero classification error by drawing a depth-1 decision tree similar to Fig. 4 (Yes/No)? Note that the decision rule should be based on a single variable (x₁ or x₂) in the rectangle

(e.g., x₁ > 1 or x₂ < 2). (2 points)

4.4 If you can put linear combination of variables in the rectangle (e.g.: ax₁ + bx₂ c or ax₁ + bx₂ < c , where a, b, c should be numbers), can you classify the given data in Table ?? and get zero classification error by drawing a depth-1 decision tree similar to Fig. 4 (Yes/No)? Please also briefly justify your answer. (2 points)

Problem 5 Decision Trees (12 points)

Consider a binary dataset with 400 examples, where half of them belongs to class A and the other half belongs to class B.

Next consider two decision stumps (i.e. trees with depth 1) T₁ and T₂, each with two children. For T₁, its left child has 150 examples in class A and 50 examples in class B; for T₂, its left child has 0 example in class A and 100 examples in class B. (You should infer what are in the right child.)

5.1 For each leaf of T₁ and T₂, compute the corresponding classification error, entropy (base e) and Gini impurity, rounding off your answer to 2 decimal places. (Note: the value/prediction of each leaf is the

majority class among all examples that belong to that leaf.) (6 points)

5.2 Compare the quality of T₁ and T₂ (that is, the two different splits of the root) based on classification error, conditional entropy (base e), and weighted Gini impurity respectively, rounding off your answer to 2

decimal places. (6 points)

Problem 6 Naive Bayes (12 points)

In this problem, we will try to predict whether it is suitable for playing tennis or not based on the weather condition, the emotion and the amount of homework, using Naive Bayes Classifier. You can think ‘PlayTen-nis’ is a label and ‘PlayTennis = Yes’ means it is suitable for playing tennis. We assume the probability P(Weather, Emotion, HomeworkjPlayTennis) can be factorized into the product form such that

P(Weather, Emotion, HomeworkjPlayTennis) =

P(WeatherjPlayTennis) P(EmotionjPlayTennis) P(HomeworkjPlayTennis).

The training data is as following. Each data point has three attributes (Weather, Emotion, Homework),

where Weather 2	Sunny, Cloudy , Emotion 2			Happy, Normal, Unhappy , Homework 2			Much, Little .
		Weather	Emotion		Homework	PlayTennis

		Sunny	Happy		Little	Yes
		Sunny	Normal		Little	Yes
		Cloudy	Happy		Much	Yes
		Cloudy	Unhappy		Little	Yes
		Sunny	Unhappy		Little	No
		Cloudy	Normal		Much	No

6.1 What are the probabilities of P(PlayTennis = Yes) and P(PlayTennis = No)? Each of your answer

should be an irreducible fraction. (2 points)

6.2 Write down the following conditional probabilities. Each of your answer should be an irreducible

fraction. (6 points)

P(Weather = SunnyjPlayTennis = Yes) =?

P(Emotion = NormaljPlayTennis = Yes) =?

P(Homework = MuchjPlayTennis = Yes) =?

6.3 Given the new data instance x = (Wheather = Sunny, Emotion = Normal, Homework = Much), which of the following has larger value: P(PlayTennis = Yesjx) or P(PlayTennis = Nojx)? Each of your

answer should be an irreducible fraction. (4 points)

Solved-Theory -Assignment 1 -Solution

Description

Related products

Lab 3: “Thanks for All the Fish!!!” SOlution

[solved]Homework 6: Hero Agents-Solution

Homework 2: Path Network Navigation SOlution

Project 1B: Transformation Matrices Solution

Project 1A: Transformation Matrices Solution