Assignment 3 Solution

General instructions.

1. The following languages are acceptable: Java, C/C++, Matlab, Python and R.

2. You can work in a team of up to 3 people. Each team only needs to submit one copy of the source code and report. You need to explicitly state each member's contribution in percentages, i.e., for each group member provide a number xx%, the percentage of the total project she/he is responsible for. Note that all team members are expected to contribute equally to the assignment. Individuals whose contribution is significantly lower than expected will receive a penalty on the assignment grade.

3. Your source code and report will be submitted through the TEACH site

https://secure.engr.oregonstate.edu:8000/teach.php?type=want_auth

Please clearly indicate your team members' information.

4. Be sure to answer all the questions in your report. You will be graded based on your code as well as the report. In particular, the clarity and quality of the report will be worth 10 pts. So please write your report in a clear and concise manner. Clearly label your figures, legends, and tables.

5. In your report, the results should always be accompanied by a discussion of the results. Do the results follow your expectation? Any surprises? What kind of explanation can you provide?

Decision trees and Random forest (total points: 50 + 10 pts)

In this assignment you will implement (1) the decision tree learning algorithm; and (2) construction of the random forest (using feature bagging and bootstrapped sampling) with the modified decision tree learning algorithm from part (1) as the base learner. You will test your implementation on the IRIS dataset, which is a three-class classification problem with 4 continuous features. You will train your classifiers using the iris-train.csv file and test on the iris-test.csv file. The first 4 columns provide feature values and the last column provides a class label:

(a) sepal length in cm

(b) sepal width in cm

(c) petal length in cm

(d) petal width in cm

(e) class: Iris Setosa (class 0), Iris Versicolour (class 1), Iris Virginica (class 2).

In particular, you need to do the following:

1. (15 pts) Implement a decision tree learning algorithm.

(a) Implement a decision tree learning algorithm, using the number of instances at a leaf node as a stopping condition. That is, if the number of examples at a node is less than k, one must stop and turn it into a leaf node. Note that, since the features are continuous, for every node of the tree you should compute the threshold θ that gives you the best information gain.

(b) Please report the information gain of each threshold and each feature (recall that you only need to compute information gain when the class label changes) for the root node.

(c) Report (plot) training and testing errors (i.e., the percentage of correctly classified examples) versus parameter k.

(d) What effect does k have on the training and testing accuracy of the tree?
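The threshold search and the k-based stopping rule from part 1 can be sketched roughly as follows, using only the Python standard library. This is a minimal illustration under stated assumptions, not a reference solution: the function names, the tuple-based tree representation, the `<=` split convention, and the choice to stop when no split has positive gain are all hypothetical choices that the assignment does not prescribe.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def candidate_thresholds(values, labels):
    """Midpoints between consecutive distinct values where the label can change."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, set()).add(y)
    distinct = sorted(groups)
    thresholds = []
    for lo, hi in zip(distinct, distinct[1:]):
        # Skip the boundary only if both sides hold the same single class.
        same_pure = len(groups[lo]) == 1 and groups[lo] == groups[hi]
        if not same_pure:
            thresholds.append((lo + hi) / 2)
    return thresholds


def information_gain(values, labels, threshold):
    """Entropy reduction from splitting on value <= threshold."""
    left = [y for v, y in zip(values, labels) if v <= threshold]
    right = [y for v, y in zip(values, labels) if v > threshold]
    n = len(labels)
    remainder = len(left) / n * entropy(left) + len(right) / n * entropy(right)
    return entropy(labels) - remainder


def best_split(rows, labels, features=None):
    """Return (gain, feature_index, threshold) of the best split, or None."""
    if features is None:
        features = range(len(rows[0]))
    best = None
    for f in features:
        column = [row[f] for row in rows]
        for t in candidate_thresholds(column, labels):
            gain = information_gain(column, labels, t)
            if best is None or gain > best[0]:
                best = (gain, f, t)
    return best


def build_tree(rows, labels, k):
    """Leaves are class labels; internal nodes are (feature, threshold, left, right)."""
    majority = Counter(labels).most_common(1)[0][0]
    # Stopping condition from the assignment: fewer than k examples at the node.
    if len(labels) < k or len(set(labels)) == 1:
        return majority
    split = best_split(rows, labels)
    # Assumption: also stop when no split yields positive information gain.
    if split is None or split[0] <= 0:
        return majority
    _, f, t = split
    left = [i for i, row in enumerate(rows) if row[f] <= t]
    right = [i for i, row in enumerate(rows) if row[f] > t]
    return (f, t,
            build_tree([rows[i] for i in left], [labels[i] for i in left], k),
            build_tree([rows[i] for i in right], [labels[i] for i in right], k))


def predict(tree, row):
    """Walk from the root to a leaf and return the leaf's class label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree
```

Calling `best_split(rows, labels)` on the full training set gives the root-node split, and looping over `candidate_thresholds` with `information_gain` for each of the 4 features yields the per-threshold values asked for in (b). Repeating `build_tree` for several values of k and scoring `predict` on both files gives the data for the plot in (c).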

2. (35 pts) Implement a random forest by constructing an ensemble of decision trees. In particular, you should do the following:

(a) Modify the tree learning algorithm (from part 1) such that, at each candidate split in the learning process, it selects a random subset of the features (please select 2 random features out of 4). This process is sometimes called "feature bagging".

(b) Then, using the modified learning algorithm, build a tree on a subset of your training data drawn randomly with replacement (the size of each such randomly drawn subset is equal to the size of your original training data). This technique is called "bootstrap aggregating" or "bagging".

(c) Repeat this procedure L times (thus, you will have L decision trees). Please consider the following values of L: 5, 10, 15, 20, 25 and 30. Note that the prediction for any sample can be made by outputting the class that is the mode of the classes predicted by each of the L trees (i.e., majority vote).

(d) For every value of L (5, 10, 15, 20, 25 and 30), plot the training and testing errors (i.e., the percentage of correctly classified examples) versus values of the parameter k. Note that for bagging, due to its stochastic nature, different random runs may lead to different results. To increase the robustness of the results, please show the average error rates over 10 random runs. Your plots should be clearly labeled with easy-to-read legends.

(e) What trend do you observe in terms of the accuracy on the training and testing data, respectively, as we increase the number of trees in the ensemble? How does random feature selection affect the accuracy of classification? What effect does k have on training and testing accuracy?
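The ensemble steps in part 2 (feature bagging, bootstrapped sampling, majority vote, and averaging over random runs) can be sketched as the small helpers below. This is only an illustration under assumptions: all function names are hypothetical, and `learn_tree` is a placeholder callable standing in for your modified tree learner. It is assumed to take the sampled rows, the sampled labels and a random generator (so that it can call `random_feature_subset` at every split), and to return a function that maps one row to a class label.

```python
import random
from collections import Counter


def bootstrap_sample(rows, labels, rng):
    """Draw len(rows) examples uniformly at random, with replacement."""
    n = len(rows)
    idx = [rng.randrange(n) for _ in range(n)]
    return [rows[i] for i in idx], [labels[i] for i in idx]


def random_feature_subset(n_features, m, rng):
    """Pick m distinct feature indices (e.g. m=2 of 4); call this at each split."""
    return sorted(rng.sample(range(n_features), m))


def majority_vote(predictions):
    """Mode of the per-tree predictions; ties go to the first class encountered."""
    return Counter(predictions).most_common(1)[0][0]


def train_forest(rows, labels, n_trees, learn_tree, seed=0):
    """Train n_trees base learners, each on its own bootstrap sample."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        sample_rows, sample_labels = bootstrap_sample(rows, labels, rng)
        # learn_tree is the hypothetical modified learner described above.
        forest.append(learn_tree(sample_rows, sample_labels, rng))
    return forest


def forest_predict(forest, row):
    """Combine the trees' predictions for one row by majority vote."""
    return majority_vote([tree(row) for tree in forest])


def mean_over_runs(run_once, n_runs=10):
    """Average a scalar result over n_runs runs seeded 0 .. n_runs-1."""
    return sum(run_once(seed) for seed in range(n_runs)) / n_runs
```

One possible use is to wrap "train a forest with seed s and a given k and L, then score it on the test set" in a function and pass it to `mean_over_runs`, once per (k, L) pair, to obtain the averaged points for the plots in (d). Seeding each run explicitly keeps the reported averages reproducible.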

Note: you need to submit the source code of your implementation and your report.

Do not forget to include group members and indicate project contribution for each member in percentages.