Description


1 Aim

Please identify each of a large number of black-and-white rectangular pixel displays as one of the 26 capital letters in the English alphabet. You have to use the various classification models taught in class up to Chapter Five of the textbook. Other than the necessary data preprocessing, such as partitioning the dataset into separate training and test datasets and scaling, you are required to investigate the effectiveness of feature selection/extraction. You may apply new methods or use new packages to improve the classification performance, but if you do so, you have to give a brief introduction of the key concepts and provide the necessary citations, instead of just directly copying, pasting, or importing. However, in this assignment, you are not allowed to use any neural-network-related models (e.g., multilayer perceptron, LeNet, etc.). If any neural-network-related method is applied, you will receive no credit. Once an algorithm package is merged or imported into your code, please list the package link in your references and describe its mathematical concepts in your report, followed by the reason for adoption.

2 Dataset Description

The dataset can be downloaded from the UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/datasets/Letter+Recognition.
There are 20,000 instances in the dataset. Each instance has 16 features and one class label.

The full dataset information is available on that web page.
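For concreteness, a minimal loading, splitting, and scaling sketch is given below. It assumes the raw file letter-recognition.data from the UCI page sits next to the notebook; the 70/30 split ratio and random seed are arbitrary placeholder choices.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# First column is the class letter; the remaining 16 are integer features.
df = pd.read_csv("letter-recognition.data", header=None)
X = df.iloc[:, 1:].to_numpy()
y = df.iloc[:, 0].to_numpy()

# Hold out a test set, then standardize using training statistics only.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y)
scaler = StandardScaler().fit(X_train)
X_train_std = scaler.transform(X_train)
X_test_std = scaler.transform(X_test)
```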

3 Submission Format

You have to submit a compressed file hw1_studentID.zip which contains the following files:

1. hw1_studentID.ipynb: detailed report, Python code, results, discussion, and mathematical descriptions;

2. hw1_studentID.tplx: extra LaTeX-related settings, including the bibliography;

3. hw1_studentID.bib: citations in "BibTeX" format;

4. hw1_studentID.pdf: the PDF version of your report, exported from your ipynb with

(a) %% jupyter nbconvert --to latex --template hw1_studentID.tplx hw1_studentID.ipynb

(b) %% pdflatex hw1_studentID.tex

(c) %% bibtex hw1_studentID

(d) %% pdflatex hw1_studentID.tex

(e) %% pdflatex hw1_studentID.tex

5. Other files or folders in a workable path hierarchy relative to your Jupyter notebook (ipynb).

4 Coding Guidelines

For the purpose of an individual demonstration with the TA, you are required to create a function in your Jupyter notebook, as specified below, to reduce the data dimensionality, learn a classification model, and evaluate the performance of the learned model.

hw1_studentID_handwritten(in_x, in_label, mode, feature_engr, f_para, classification, c_para, other_para)

– in_x: [string] CSV file or a folder path for the handwritten letter image data.

– in_label: [string] CSV file or a folder path, which contains the labels for the corresponding instances in in_x.

– mode: [string] 'featengr' for reducing the data dimensionality by feature engineering; 'training' for building models; 'test' for using the built model to evaluate performance.

– feature_engr: [None or string] described in the Report Requirement.

– f_para: [None or numpy array] default None, declaring the necessary parameter(s) for feature selection/extraction.

– classification: [None or string] described in the Report Requirement.

– c_para: [None or numpy array] default None, declaring the necessary parameter(s) for classification.

– other_para: [None or numpy array] default None, declaring the necessary parameter(s) for your program other than the ones for feature_engr and classification.

When mode = 'test', please dump the results to the following files:

– hw1_studentID_results.csv: one column with header 'label';

– hw1_studentID_performance.txt: showing the performance (accuracy) in '%'. Output only one number of type float, without any extra string words.
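To make the interface concrete, here is a minimal skeleton of the required function, wired up for just one feature_engr keyword ('PCA') and one classification keyword ('SVM'). The module-level _STATE store and the assumption that both CSVs are headerless are illustrative choices, not part of the specification.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.svm import SVC

_STATE = {}  # illustrative module-level store for the fitted transformer/classifier

def hw1_studentID_handwritten(in_x, in_label, mode, feature_engr=None,
                              f_para=None, classification=None,
                              c_para=None, other_para=None):
    # Assumes headerless CSVs: one instance per row in in_x,
    # one label per row in in_label.
    X = pd.read_csv(in_x, header=None).to_numpy()
    y = pd.read_csv(in_label, header=None).to_numpy().ravel()

    if mode == "featengr":
        if feature_engr == "PCA":
            n_components = int(f_para[0]) if f_para is not None else 10
            _STATE["trans"] = PCA(n_components=n_components).fit(X)
    elif mode == "training":
        if "trans" in _STATE:
            X = _STATE["trans"].transform(X)
        if classification == "SVM":
            C = float(c_para[0]) if c_para is not None else 1.0
            _STATE["clf"] = SVC(kernel="linear", C=C).fit(X, y)
    elif mode == "test":
        if "trans" in _STATE:
            X = _STATE["trans"].transform(X)
        y_pred = _STATE["clf"].predict(X)
        pd.DataFrame({"label": y_pred}).to_csv("hw1_studentID_results.csv",
                                               index=False)
        accuracy = 100.0 * float(np.mean(y_pred == y))
        with open("hw1_studentID_performance.txt", "w") as f:
            f.write(str(accuracy))  # a single float, no extra text
```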

 

5 Report Requirement

List names of packages used in your program;

Describe the keywords in the arguments of your function hw1_studentID_handwritten(in_x, in_label, mode, feature_engr, f_para, classification, c_para, other_para); a sketch mapping the example keywords to scikit-learn objects follows the two lists below.

– a list of feature_engr methods, for example:

None: (default) no feature engineering (selection/extraction)

'L1': L1-regularization feature selection

'SFS': sequential feature selection

'Forest': assessing feature importance with a random forest

'PCA': principal component analysis

'GKPCA': Gaussian kernel principal component analysis

'LDA': linear discriminant analysis, and so on;

– a list of classification methods, for example:

None: used when mode = 'featengr'

'SVM': support vector machine

'GKSVM': Gaussian kernel support vector machine

'logReg': logistic regression

'Perceptron': perceptron

'KNN': k-nearest neighbors

'Decision': decision tree

'Forest': random forest, and so on;

For better explanation, draw flowcharts of the methods or procedures used in the program;

Describe the mathematical concepts of any new algorithms or models employed, as well as the roles they play in your feature selection/extraction or classification task, in Markdown cells;

Discuss the performance of the different classifiers with and without feature selection/extraction.

5.1 Basic Requirement

Use the original grayscale image data without any feature selection/extraction to do classification. Then compare the results after feature selection (such as L1 regularization, sequential feature selection, or feature-importance assessment with a random forest) or feature extraction (such as PCA, kernel PCA, or LDA) is applied.
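A minimal sketch of such a comparison, reusing X_train, X_test, y_train, and y_test from the loading snippet in Section 2; the PCA component count is an arbitrary placeholder.

```python
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Same classifier with and without feature extraction, all inside one
# pipeline so the scaler and PCA are fitted on training data only.
baseline = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
with_pca = make_pipeline(StandardScaler(), PCA(n_components=10),
                         SVC(kernel="rbf"))

for name, pipe in [("no feature extraction", baseline),
                   ("with PCA(10)", with_pca)]:
    pipe.fit(X_train, y_train)
    print(f"{name}: accuracy = {100 * pipe.score(X_test, y_test):.2f}%")
```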

All the classifiers taught in class should be investigated and their performance compared. For SVM, you should investigate both linear SVM and kernel SVM. Also, for the perceptron, logistic-regression, and SVM classifiers, you should investigate their stochastic gradient descent (SGD) versions provided in scikit-learn to handle large datasets.
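For the SGD versions, scikit-learn provides a single SGDClassifier whose loss argument selects the underlying linear model. A sketch, again reusing the standardized split from Section 2 (note that loss="log_loss" is the spelling in scikit-learn 1.1 and later; older releases use loss="log"):

```python
from sklearn.linear_model import SGDClassifier

# One estimator class, three linear models selected by the loss function.
sgd_variants = {
    "SGD linear SVM":   SGDClassifier(loss="hinge", random_state=1),
    "SGD logistic reg": SGDClassifier(loss="log_loss", random_state=1),
    "SGD perceptron":   SGDClassifier(loss="perceptron", random_state=1),
}
for name, clf in sgd_variants.items():
    clf.fit(X_train_std, y_train)
    print(f"{name}: accuracy = {100 * clf.score(X_test_std, y_test):.2f}%")
```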

If you apply new methods or use new packages to improve the classification performance, you have to give a brief introduction of the key concepts and provide the necessary citations/links, instead of just directly copying, pasting, or importing.
