Assignment 3 (Summer)

Misunderstanding of probability may be the greatest of all impediments to scientific literacy.

Stephen Jay Gould

Student Name:

Student Number:

Instructor: George Tzanetakis

Question | Value | Mark
---------+-------+-----
1        |   4   |
2        |   3   |
3        |   3   |
Total    |  10   |

  • Overview

So far in the assignments we have mostly been building things from "scratch" rather than using existing libraries and frameworks. In this assignment you are asked to use existing tools to explore some of the ideas we have learned in class. I suggest software in Python for this purpose, but if you can find similar software in other languages that is fine. The topics covered are CSP, machine learning with Naive Bayes, and discrete Bayesian networks.

Don’t hesitate to contact the instructor via email or use the chat room of the ConneX course website for any questions/clarifications you might need.

The questions are marked as (*) necessary to pass, (**) expected and (***) exceptional.

IMPORTANT: YOU SHOULD ONLY SUBMIT A SINGLE PDF FILE WITH YOUR REPORT THROUGH CONNEX. ANY OTHER FORMAT WILL NOT BE ACCEPTED. THE ANSWERS SHOULD BE LABELED WITH THE CORRESPONDING QUESTION NUMBERS.

  • Constraint-satisfaction problems (4pts)

    1. Using the conventions of the CSP library you are using, solve the cryptarithmetic puzzle provided in Figure 5.2 of the AIMA textbook; a code sketch appears after this list. (3pts (ONLY UGRADS) – **)

    2. Similarly, solve the problem of Waltz filtering described in the following document http://www.cs.cityu.edu.hk/~hwchun/research/PDF/PAJava99Waltz.pdf using the library. There is a lot of information in the paper, and it uses JSolver, a Java library for constraint programming. Your task is to transfer the constraints and problem described in the paper to the CSP library you are using. This is easier than it might seem at first read, but it does require more effort than the simple example of the first question, which is why this question is for graduate students or undergraduates who want an A+. (4pts (GRADS – NO NEED TO IMPLEMENT QUESTION 1), 1pt (UGRAD))
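For the first question, a minimal sketch using the python-constraint library is given below. It assumes that the puzzle shown in Figure 5.2 of AIMA is TWO + TWO = FOUR; if your edition shows a different puzzle, or you prefer another CSP library, adjust the variables and constraints accordingly.

```python
# Minimal sketch with python-constraint (pip install python-constraint).
# Assumes the AIMA Figure 5.2 puzzle is TWO + TWO = FOUR.
from constraint import Problem, AllDifferentConstraint

problem = Problem()
letters = "TWOFUR"
problem.addVariables(letters, list(range(10)))            # each letter is a digit 0-9
problem.addConstraint(AllDifferentConstraint(), letters)  # all letters get distinct digits

# Leading digits cannot be zero.
problem.addConstraint(lambda t: t != 0, "T")
problem.addConstraint(lambda f: f != 0, "F")

# The arithmetic constraint TWO + TWO = FOUR, expressed over the whole words.
def sum_constraint(t, w, o, f, u, r):
    two = 100 * t + 10 * w + o
    four = 1000 * f + 100 * o + 10 * u + r
    return two + two == four

problem.addConstraint(sum_constraint, ("T", "W", "O", "F", "U", "R"))

for solution in problem.getSolutions():
    print(solution)
```

An equivalent formulation that introduces explicit carry variables (as in the AIMA figure) also works and tends to propagate better, but the single word-level constraint above is enough for a puzzle of this size.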

  • Naive Bayes Text Classification using scikit-learn (3pts)

Using the movie review data from Assignment 2, use the scikit-learn library to compute the classification accuracy, under 10-fold cross-validation, of the Bernoulli and Multinomial Naive Bayes classifiers that are part of scikit-learn. Show your code and results. You will need to read the excellent documentation of scikit-learn for the details. (3pts – ***)
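One possible shape for the solution is sketched below. The load_reviews helper is a hypothetical placeholder for however you loaded the review texts and labels in Assignment 2, and the vectorizer settings are only a starting point.

```python
# Sketch: 10-fold cross-validation of BernoulliNB and MultinomialNB on the movie reviews.
# load_reviews() is a hypothetical stand-in for your Assignment 2 loading code; it should
# return a list of review strings and a parallel list of class labels.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB, MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

texts, labels = load_reviews()

for name, clf in [("Bernoulli", BernoulliNB()), ("Multinomial", MultinomialNB())]:
    # Binary presence/absence features suit the Bernoulli model;
    # raw term counts suit the multinomial model.
    vectorizer = CountVectorizer(binary=(name == "Bernoulli"))
    pipeline = make_pipeline(vectorizer, clf)
    scores = cross_val_score(pipeline, texts, labels, cv=10)
    print(f"{name} Naive Bayes: mean 10-fold accuracy = {scores.mean():.3f}")
```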

  • Discrete Bayesian Networks (3pts)

Consider how to compute the probability of a particular state, for example the probability that the candidate has strong musicianship, the pieces played are not that difficult, the rating is 2 stars, the exam score is high, and the recommendation letter is weak. The network structure and conditional probability tables are given below.

The network (from the figure) has five nodes connected as follows: Difficulty -> Rating, Musicianship -> Rating, Musicianship -> Exam, Rating -> Letter. Its conditional probability tables are:

P(Difficulty):      Low 0.6    High 0.4
P(Musicianship):    Weak 0.7   Strong 0.3

P(Rating | Difficulty, Musicianship):

  Difficulty  Musicianship |  *     **    ***
  Low         Weak         |  0.3   0.4   0.3
  High        Weak         |  0.05  0.25  0.7
  Low         Strong       |  0.9   0.08  0.02
  High        Strong       |  0.5   0.3   0.2

P(Exam | Musicianship):

  Musicianship |  Low   High
  Weak         |  0.95  0.05
  Strong       |  0.2   0.8

P(Letter | Rating):

  Rating |  Weak  Strong
  *      |  0.1   0.9
  **     |  0.4   0.6
  ***    |  0.99  0.01
We can use the semantics of Bayesian networks to compute this joint probability as follows:

P(m = strong) P(d = low) P(r = ** | m = strong, d = low) P(e = high | m = strong) P(letter = weak | r = **)
= 0.3 × 0.6 × 0.08 × 0.8 × 0.4
= 0.004608

For this question my suggestion is to use https://github.com/pgmpy/pgmpy, a Python library for probabilistic graphical models. Your deliverable should show all the code you used to solve the problem. Your task is to explore this particular discrete Bayesian network.
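As a starting point, a sketch of how the network above might be encoded in pgmpy is given below. The model class has been renamed across pgmpy versions (BayesianModel in older releases, BayesianNetwork or DiscreteBayesianNetwork in newer ones), so adjust the import to whatever your installed version provides.

```python
# Sketch of the exam/recommendation network in pgmpy.
from pgmpy.models import BayesianNetwork
from pgmpy.factors.discrete import TabularCPD

model = BayesianNetwork([
    ("Difficulty", "Rating"),
    ("Musicianship", "Rating"),
    ("Musicianship", "Exam"),
    ("Rating", "Letter"),
])

cpd_d = TabularCPD("Difficulty", 2, [[0.6], [0.4]],
                   state_names={"Difficulty": ["Low", "High"]})
cpd_m = TabularCPD("Musicianship", 2, [[0.7], [0.3]],
                   state_names={"Musicianship": ["Weak", "Strong"]})
# Columns follow the evidence order (Difficulty, Musicianship), last variable fastest:
# (Low, Weak), (Low, Strong), (High, Weak), (High, Strong)
cpd_r = TabularCPD("Rating", 3,
                   [[0.3, 0.9, 0.05, 0.5],    # *
                    [0.4, 0.08, 0.25, 0.3],   # **
                    [0.3, 0.02, 0.7, 0.2]],   # ***
                   evidence=["Difficulty", "Musicianship"],
                   evidence_card=[2, 2],
                   state_names={"Rating": ["*", "**", "***"],
                                "Difficulty": ["Low", "High"],
                                "Musicianship": ["Weak", "Strong"]})
cpd_e = TabularCPD("Exam", 2,
                   [[0.95, 0.2],   # Low
                    [0.05, 0.8]],  # High
                   evidence=["Musicianship"], evidence_card=[2],
                   state_names={"Exam": ["Low", "High"],
                                "Musicianship": ["Weak", "Strong"]})
cpd_l = TabularCPD("Letter", 2,
                   [[0.1, 0.4, 0.99],   # Weak
                    [0.9, 0.6, 0.01]],  # Strong
                   evidence=["Rating"], evidence_card=[3],
                   state_names={"Letter": ["Weak", "Strong"],
                                "Rating": ["*", "**", "***"]})

model.add_cpds(cpd_d, cpd_m, cpd_r, cpd_e, cpd_l)
assert model.check_model()
```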

  1. Using pgmpy (or some other similar library) show how the query provided as an example can be computed using exact inference. (1pt – **)

  2. Consider a particular musician, George, and suppose we want to know how likely he is to get a strong recommendation letter. Knowing nothing else, this probability is about 50.2%. If we find out that George is not a strong musician, then that probability goes down to around 38.9%. Show how these numbers are derived using the software tool and exact inference. (1pt – **)

  3. Answer the previous two questions using some approximate inference algorithm. Comment on the difference. (1pt)

Just to be clear, you DO NOT need to do the exact or approximate inference by hand; just use the tool with the right conventions to get the results. A sketch of the relevant calls follows.
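Continuing from the model sketch above, the following shows one way the three questions might be approached in pgmpy: VariableElimination for exact inference, and plain forward sampling via BayesianModelSampling as one possible approximate method. The sampled estimates will fluctuate around the exact values, which is exactly the difference question 3 asks you to comment on.

```python
# Sketch of exact and approximate inference, continuing from the model defined above.
from pgmpy.inference import VariableElimination
from pgmpy.sampling import BayesianModelSampling

# --- Exact inference (questions 1 and 2) ---
infer = VariableElimination(model)

# Question 1: the full joint over all five variables; the row for
# (Musicianship=Strong, Difficulty=Low, Rating=**, Exam=High, Letter=Weak)
# should read about 0.004608.
print(infer.query(["Musicianship", "Difficulty", "Rating", "Exam", "Letter"]))

# Question 2: P(Letter) should give roughly 50.2% for Strong, and conditioning on
# weak musicianship should drop that to roughly 38.9%.
print(infer.query(["Letter"]))
print(infer.query(["Letter"], evidence={"Musicianship": "Weak"}))

# --- Approximate inference (question 3), here via forward sampling ---
samples = BayesianModelSampling(model).forward_sample(size=100000)
# Depending on the pgmpy version, sampled values are state names or state indices.
print((samples["Letter"] == "Strong").mean())              # estimate of P(Letter = Strong)
weak = samples[samples["Musicianship"] == "Weak"]
print((weak["Letter"] == "Strong").mean())                 # estimate given weak musicianship
```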

  • Grading

The submission should be a single PDF containing code snippets, text, and figures with your answers, in order. The questions are worth a total of 10 points. Grading will be based on the content of the answers as well as the quality of the report.
