Neural Network & Image Classification Solution



1. Question: Neural Network Playground

TensorFlow Playground is an interactive tool for learning neural networks (more specifically, multi-layer perceptron networks). A customized version of TensorFlow Playground is available at http://www.cs.

First, we will get familiar with the interface of TensorFlow Playground. Let's get to it!

You can choose from four different datasets on the left side of the page in the DATA panel: Circle, Exclusive Or, Gaussian, and Spiral. The actual data point locations are available on the right side under the OUTPUT label. Do not change the ratio of training to test data or click the REGENERATE button throughout this question. Batch size indicates how many samples are used in your mini-batch gradient descent. You may change this parameter if you want.

The neural network model is in the middle, between DATA and OUTPUT. This model is a standard "feed-forward" neural network, in which you can vary: (1) the input features, (2) the number of hidden layers, and (3) the number of neurons in each layer. By default, it uses only the raw inputs X1 and X2 as features, and no hidden layers. You will need to change these attributes later.

Several hyper-parameters are tunable at the top of the page, such as the learning rate, the activation (non-linearity) function of each neuron, the regularization norm as well as the regularization rate.

There are three buttons at the top left for you to control the training of a neural network: Reset, Run/Pause and Step (which processes one mini-batch of samples at a time).

1.1 Hand-crafted Feature Engineering

Now that we are familiar with the Playground, we will start to build models to classify samples.

First you are going to gain some experience in hand-crafted feature engineering with a simple perceptron model: no hidden layers, a sigmoid activation, and no regularization. In other words, don't change the model you're given by default when loading the page.

A perceptron (single-layer artificial neural network) with a sigmoid activation function is equivalent to logistic regression. As a linear model, it cannot fit some datasets such as Circle, Spiral, and Exclusive Or. To extend linear models to represent nonlinear functions of x, we can apply the linear model to a transformed input φ(x).

One option is to manually engineer φ. This can easily be done in the Playground, since you are given 7 different features to choose from in the FEATURES panel.
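
To make the idea concrete, here is a minimal sketch outside the Playground (it assumes NumPy, scikit-learn, and a synthetic Circle-style dataset of our own; none of this is part of the assignment) showing how the squared features X1^2 and X2^2 make a circular boundary learnable by a single sigmoid neuron:

    # A minimal sketch, not part of the assignment: why squared features help on "Circle".
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)

    # Synthetic Circle-style data: inner disk is class 1, outer ring is class 0.
    r = np.concatenate([rng.uniform(0.0, 2.0, 500), rng.uniform(3.0, 5.0, 500)])
    theta = rng.uniform(0.0, 2.0 * np.pi, 1000)
    X = np.column_stack([r * np.cos(theta), r * np.sin(theta)])
    y = np.concatenate([np.ones(500), np.zeros(500)])

    # Raw inputs X1, X2: a linear model cannot separate a disk from a ring.
    raw_acc = LogisticRegression(max_iter=1000).fit(X, y).score(X, y)

    # Hand-crafted features phi(x) = (X1^2, X2^2): the boundary
    # w1*X1^2 + w2*X2^2 + b = 0 is a circle/ellipse in the original input space.
    phi = np.column_stack([X[:, 0] ** 2, X[:, 1] ** 2])
    phi_acc = LogisticRegression(max_iter=1000).fit(phi, y).score(phi, y)

    print(f"accuracy with raw features:     {raw_acc:.2f}")  # around chance (~0.5)
    print(f"accuracy with squared features: {phi_acc:.2f}")  # close to 1.0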

Task: You are required to find the best perceptron models for the four datasets (Circle, Exclusive Or, Gaussian and Spiral) by choosing different features. Try to select as few features as possible. For the best model of each dataset, you should report the selected features, iterations and test loss in a table. If you also change other hyper-parameters, e.g. the learning rate, you should include them in your report.

For each dataset (Circle, Exclusive Or, Gaussian and Spiral), please include a web page screenshot of the result in your report and explain why this configuration works.

(Note: Don't worry if you cannot find a good perceptron for the Spiral dataset. For the other three datasets, the test loss of a good model should be lower than 0.001.)

1.2 Regularization

Forget the best features you found in the last question. You should now select all the possible (i.e. all 7) features to test the regularization effect here.

You are required to work on the three datasets: Circle, Exclusive Or and Gaussian.

Task A: Try both L1 and L2 regularization with different (non-zero) regularization rates. In the report, you are required to compare the decision boundary and the test loss across three models trained for a similar number of iterations: no regularization, L1 regularized, and L2 regularized.

For each dataset (Circle, Exclusive Or and Gaussian), please include a web page screenshot of the result in your report and explain why this configuration works.

Task B: We have learned that L1 regularization is good for feature selection. Take a look at the features with significantly higher weights. Are they the same as the ones you selected in the last question? Write down the results you observe in your report. (You can get the feature weights by moving the mouse pointer over the dashed lines.)
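
For a rough preview of what to look for (a sketch using scikit-learn rather than the Playground, with a synthetic Circle-style dataset like the one in the previous sketch, so the numbers will not match your screenshots), L1 regularization tends to push the weights of unhelpful features to exactly zero, while L2 only shrinks them:

    # A minimal sketch, assuming scikit-learn: L1 vs. L2 weights on 7 Playground-style features.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    r = np.concatenate([rng.uniform(0.0, 2.0, 500), rng.uniform(3.0, 5.0, 500)])
    theta = rng.uniform(0.0, 2.0 * np.pi, 1000)
    X = np.column_stack([r * np.cos(theta), r * np.sin(theta)])
    y = np.concatenate([np.ones(500), np.zeros(500)])

    def playground_features(X):
        """The 7 transforms offered in the FEATURES panel: X1, X2, X1^2, X2^2, X1*X2, sin(X1), sin(X2)."""
        x1, x2 = X[:, 0], X[:, 1]
        return np.column_stack([x1, x2, x1**2, x2**2, x1 * x2, np.sin(x1), np.sin(x2)])

    phi = playground_features(X)
    l1 = LogisticRegression(penalty="l1", C=0.1, solver="liblinear").fit(phi, y)
    l2 = LogisticRegression(penalty="l2", C=0.1, max_iter=1000).fit(phi, y)

    names = ["X1", "X2", "X1^2", "X2^2", "X1*X2", "sin(X1)", "sin(X2)"]
    for name, w_l1, w_l2 in zip(names, l1.coef_[0], l2.coef_[0]):
        print(f"{name:8s}  L1: {w_l1:+.3f}   L2: {w_l2:+.3f}")
    # Expect the L1 model to zero out most features other than the useful ones
    # (here X1^2 and X2^2) -- the same comparison Task B asks you to make in the Playground.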

1.3 Automated Feature Engineering with a Neural Network

While we were able to find different parameters that made good predictions, the previous two sections required a lot of hand-engineering and regularization tweaking 🙁 We will now explore a neural network's ability to automatically learn good features 🙂 Let's try it out on two datasets: Circle and Exclusive Or.

Here we should only select X1 and X2 as features (since we are trying to automatically learn all other features from the network). As we have previously seen, a simple perceptron is not going to learn the correct boundaries, since neither dataset is linearly separable. However, a more complex neural network should be able to learn an approximation of the complex features that we selected in the previous experiments.
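
As a sanity check outside the Playground, here is a minimal sketch (scikit-learn's MLPClassifier is assumed as a stand-in for the Playground network, with synthetic data of our own) showing that a single small hidden layer can learn an XOR-style boundary from the raw inputs alone:

    # A minimal sketch, assuming scikit-learn: one hidden layer learns XOR from raw X1, X2.
    import numpy as np
    from sklearn.neural_network import MLPClassifier

    rng = np.random.default_rng(0)
    X = rng.uniform(-5.0, 5.0, size=(1000, 2))
    y = (X[:, 0] * X[:, 1] > 0).astype(int)   # Exclusive-Or style quadrant labels

    # With no hidden layer this would just be logistic regression and could not fit XOR;
    # a single hidden layer of a few tanh units can learn the quadrant structure.
    mlp = MLPClassifier(hidden_layer_sizes=(8,), activation="tanh",
                        max_iter=3000, random_state=0).fit(X, y)
    print(f"training accuracy: {mlp.score(X, y):.2f}")   # should be close to 1.0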

You can click the + button in the middle to add hidden layers to the model. There is a pair of + and − buttons at the top of each hidden layer for you to change the number of hidden units. Note that each hidden unit, or neuron, is the same neuron from Lecture 17 (possibly varying the sigmoid activation function).

Task: Find a set of neural network model parameters which allow the model to find a boundary that correctly separates the testing samples. Report the test loss and iterations of the best model for each dataset. If you modify the other parameters (e.g. the activation function), please report them too. Can it beat or approach the result of your hand-crafted feature engineering?

For each dataset (Circle, Exclusive Or), please include a web page screenshot of the result in your report and explain why the configuration would work.

1.4 Spiral Challenge (0.5 Extra credit)

Congratulations on your level up!

Task: Now try to find a model that achieves a test loss lower than 0.01 on the Spiral dataset. You're free to use other features in the input layer besides X1 and X2, but a simpler model architecture is preferred. Report the input features, network architecture, hyper-parameters, iterations and test loss in a table. For simplicity, please represent your network architecture by its hidden layers a-b-c-…, where a, b, c are the numbers of hidden units in each layer, respectively.
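
For intuition only, here is a sketch with a synthetic two-arm spiral and scikit-learn (it is not a solution to the challenge, and the data generator and hyper-parameters are assumptions); the hidden layers are written in the a-b-c notation used for the report:

    # A minimal sketch, assuming scikit-learn and a synthetic two-arm spiral.
    import numpy as np
    from sklearn.neural_network import MLPClassifier

    rng = np.random.default_rng(0)
    n = 500
    t = rng.uniform(0.5, 3.0 * np.pi, n)

    def arm(t, sign):
        # One spiral arm; sign = +1/-1 gives the two interleaved classes.
        return sign * np.column_stack([t * np.cos(t), t * np.sin(t)])

    X = np.vstack([arm(t, +1.0), arm(t, -1.0)]) + rng.normal(0.0, 0.2, (2 * n, 2))
    y = np.concatenate([np.zeros(n), np.ones(n)])
    X = X / np.abs(X).max()                   # keep inputs roughly in [-1, 1]

    # Architecture 16-16-8 in the a-b-c notation: enough capacity to separate the arms.
    mlp = MLPClassifier(hidden_layer_sizes=(16, 16, 8), activation="tanh",
                        max_iter=5000, random_state=0).fit(X, y)
    print(f"training accuracy: {mlp.score(X, y):.2f}")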

Please include a web page screenshot of the result in your report and explain why this configuration works.

You are now a neural network expert (on the Playground)!

2. Question: Image Classification

An online tutorial might be helpful to you for finishing this assignment.

As part of Homework 3, you will be participating in a class-wide Kaggle-style competition. Given a dataset of images of handwritten digits, you will create a model to classify a given image as the digit it represents. You will be competing with your peers to create the model with the greatest classification accuracy.

The data set is available as zip.train and zip.test. Each example in the train and test data is the digit in question followed by 256 grayscale values representing a 16×16 image. Values have already been normalized. You can read a full description of the data set in the file available in Collab. Visualizations of some samples are provided in the subdirectory "image digits" in the attached data ZIP file.
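
For example, a minimal loading sketch (it assumes NumPy and that each line is the label followed by 256 space-separated grayscale values, as described above):

    # A minimal sketch for loading zip.train / zip.test as described in the handout.
    import numpy as np

    def load_zip(path):
        data = np.loadtxt(path)            # shape: (n_examples, 257)
        labels = data[:, 0].astype(int)    # first column: the digit 0-9
        images = data[:, 1:]               # remaining 256 columns: the 16x16 pixels
        return images, labels

    X_train, y_train = load_zip("zip.train")
    X_test, y_test = load_zip("zip.test")
    print(X_train.shape, y_train.shape)    # e.g. (n_train, 256) (n_train,)

    # To view one example as an image:
    # import matplotlib.pyplot as plt
    # plt.imshow(X_train[0].reshape(16, 16), cmap="gray"); plt.show()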

Your objective is to develop a model with the highest possible classification accuracy for the images.

2.1 Evaluation

The main evaluation metric for this competition is Classification Accuracy (the percentage of correctly classified images). Classification accuracy is given by:

Acc = #correct / #total

on the testing dataset.

Your code should generate predicted labels for the testing dataset. The results file should contain exactly one predicted label per line and nothing else.
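
Putting the pieces together, a minimal end-to-end sketch (an illustrative baseline only; the classifier choice and the output filename predictions.txt are assumptions, not requirements):

    # A minimal sketch: train a simple baseline, report accuracy, and write one label per line.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def load_zip(path):
        data = np.loadtxt(path)
        return data[:, 1:], data[:, 0].astype(int)   # (images, labels)

    X_train, y_train = load_zip("zip.train")
    X_test, y_test = load_zip("zip.test")

    clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    pred = clf.predict(X_test)

    # Classification accuracy: #correct / #total on the testing dataset.
    print(f"test accuracy: {np.mean(pred == y_test):.4f}")

    # Results file: each predicted label on its own line, in the order of the test examples.
    np.savetxt("predictions.txt", pred, fmt="%d")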
2.2 Extra Credit

Top-performing models will receive up to 1 extra credit point on this homework. (The top 20 submissions will receive extra credit. We will run your code to generate the predicted labels.)