Assignment #2 Solution



Instruction: This assignment should be completed individually. Please make sure your answer is legible (and preferably formatted using MS Words or LATEX/LYX). If a question requires you to fol-low an algorithm, show a clear trace of the algorithm. If the algorithm is iterative, show the details in the first two iterations. For each of the remaining iterations, show the status of the algorithm at the end of the iteration. Please also submit to the BlackBoard a single file that contains a PDF or a Word version of your solutions (no scanned image please), and the source code, the input data, and the output of your program. The following questions are based on this transaction database.


items bought

T100 {M, O, N, K, E}

T200 {D, O, N, K, E, Y}

T300 { M, A, K, E}

T400 {M, U, C, K, Y}

T500 {C, O, K, I ,E}

  1. [40] Let supmin = 60% and confmin = 80%.

    1. Find all frequent itemsets by following the Apriori Algorithm.

    1. Identify closed and maximal frequent itemsets.

    1. Draw the hash tree for candidate 2-itemsets, assuming that the hash function will map A, E, M, Y to the left branch, C, I N to the center branch, and D, K O to the right branch.

    1. Identify the hash tree notes visited when counting the support of 2-itemsets for trans-action T400

  1. [20] Let supmin = 60% and confmin = 80%. Find all frequent itemsets by following the FP-growth algorithm.

    1. Find the F-list and draw the FP-tree.

    1. For each projected database, list all frequent itemsets that can be enumerated from the database.

  1. [20] Find all association rules. Each rule should be written as

r# buys(item1); buys(item2) ) buys(item3) [s; c]

Computer Science 4373 Assignment 2 January 6, 2018

where r# is the id of the rule, s is the support, c is the confidence, and itemi is an item (such as, “M”, “O”,etc.).

    1. Identify all strong association rules (that is, those rules with support s and confidence c higher than the thresholds).

    1. Compute the lift measure for each strong association rule.

  1. [20] Convert the transaction data table into an hwk02.arff file, in which each item is a column with either a value t or ? (that is, with a type { t}). For example:

@relation hwk02 @attribute A { t}

@data ?,t,?,?,t,…

Use Weka Explore to load the data, and run the Aproiri and FPGrowth algorithms to learn interesting association rules. Do it for at least three sets of minimum support and confidence. For each algorithm, report the following items.

    1. The commands used to run the learning tasks

    1. The output from the runs