Homework 1 – Solution

  1. Hard-Coding a Network. [2pts] In this problem, you need to find a set of weights and biases for a multilayer perceptron which determines if a list of length 4 is in sorted order. More specifically, you receive four inputs $x_1, \ldots, x_4$, where $x_i \in \mathbb{R}$, and the network must output 1 if $x_1 < x_2 < x_3 < x_4$, and 0 otherwise. You will use the following architecture:

All of the hidden units and the output unit use a hard threshold activation function:

$$\phi(z) = \begin{cases} 1 & \text{if } z \ge 0 \\ 0 & \text{if } z < 0 \end{cases}$$

Please give a set of weights and biases for the network which correctly implements this function (including cases where some of the inputs are equal). Your answer should include:

A $3 \times 4$ weight matrix $W^{(1)}$ for the hidden layer

A 3-dimensional vector of biases $b^{(1)}$ for the hidden layer

A 3-dimensional weight vector $w^{(2)}$ for the output layer

A scalar bias $b^{(2)}$ for the output layer

You do not need to show your work.
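For concreteness, here is a minimal NumPy sketch of this architecture together with one possible weight assignment; it is only an illustration of the kind of answer being asked for, not necessarily the distributed solution, and the names hard_threshold, W1, b1, w2, b2, and is_sorted_net are ours.

```python
import numpy as np

def hard_threshold(z):
    # phi(z) = 1 if z >= 0, else 0, applied elementwise
    return (np.asarray(z) >= 0).astype(float)

# One possible assignment (illustrative only).
# Hidden unit i fires exactly when the adjacent pair (x_i, x_{i+1})
# is out of order, i.e. when x_i - x_{i+1} >= 0.
W1 = np.array([[1., -1.,  0.,  0.],
               [0.,  1., -1.,  0.],
               [0.,  0.,  1., -1.]])
b1 = np.zeros(3)
# The output unit fires only when no hidden unit fired.
w2 = np.array([-1., -1., -1.])
b2 = 0.5

def is_sorted_net(x):
    h = hard_threshold(W1 @ x + b1)
    return float(hard_threshold(w2 @ h + b2))

# Quick check against the definition x1 < x2 < x3 < x4, including ties.
for x in [np.array([1., 2., 3., 4.]),
          np.array([1., 1., 2., 3.]),
          np.array([4., 3., 2., 1.]),
          np.array([-2., 0., 0.5, 7.])]:
    assert is_sorted_net(x) == float(all(x[i] < x[i + 1] for i in range(3)))
```

The particular choice above uses each hidden unit to flag one out-of-order adjacent pair, so the output unit only has to check that no flag was raised.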

  2. Backprop. Consider a neural network with $N$ input units, $N$ output units, and $K$ hidden units. The activations are computed as follows:

$$z = W^{(1)} x + b^{(1)}$$

$$h = \sigma(z)$$

$$y = x + W^{(2)} h + b^{(2)},$$

where $\sigma$ denotes the logistic function, applied elementwise. The cost will involve both $h$ and $y$:

$$J = R + S, \qquad R = r^\top h, \qquad S = \frac{1}{2}\|y - s\|^2,$$

for given vectors $r$ and $s$.

[1pt] Draw the computation graph relating $x$, $z$, $h$, $y$, $R$, $S$, and $J$.

[3pts] Derive the backprop equations for computing $\bar{x} = \partial J / \partial x$. You may use $\sigma'$ to denote the derivative of the logistic function (so you don’t need to write it out explicitly).
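The derivation itself is left as the exercise asks; the sketch below only implements the forward pass and cost exactly as defined above and estimates $\partial J / \partial x$ by central finite differences, so a hand-derived $\bar{x}$ can be checked numerically. The shapes, random values, and helper names (sigma, numeric_grad) are illustrative assumptions, not part of the handout.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 5, 3

# Hypothetical random instance; shapes follow the problem statement.
x = rng.normal(size=N)
W1 = rng.normal(size=(K, N)); b1 = rng.normal(size=K)
W2 = rng.normal(size=(N, K)); b2 = rng.normal(size=N)
r = rng.normal(size=K);       s = rng.normal(size=N)

def sigma(z):
    return 1.0 / (1.0 + np.exp(-z))   # logistic function, elementwise

def J(x):
    z = W1 @ x + b1
    h = sigma(z)
    y = x + W2 @ h + b2
    return r @ h + 0.5 * np.sum((y - s) ** 2)   # J = R + S

def numeric_grad(f, x, eps=1e-6):
    # Central finite differences, one coordinate at a time.
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x); e[i] = eps
        g[i] = (f(x + e) - f(x - e)) / (2 * eps)
    return g

x_bar_numeric = numeric_grad(J, x)
print(x_bar_numeric)   # compare your hand-derived x_bar against this
```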

  3. Sparsifying Activation Function. [4pts] One of the interesting features of the ReLU activation function is that it sparsifies the activations and the derivatives, i.e. it sets a large fraction of the values to zero for any given input vector. Consider the following network:

Note that each $w_i$ refers to the weight on a single connection, not the whole layer. Suppose we are trying to minimize a loss function $\mathcal{L}$ which depends only on the activation of the output unit $y$. (For instance, $\mathcal{L}$ could be the squared error loss $\frac{1}{2}(y - t)^2$.) Suppose the unit $h_1$ receives an input of $-1$ on a particular training case, so the ReLU evaluates to 0. Based only on this information, which of the weight derivatives

$$\frac{\partial \mathcal{L}}{\partial w_1}, \qquad \frac{\partial \mathcal{L}}{\partial w_2}, \qquad \frac{\partial \mathcal{L}}{\partial w_3}$$

are guaranteed to be 0 for this training case? Write YES or NO for each. Justify your answers.
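Since the network figure is not reproduced here, the sketch below uses a hypothetical stand-in wiring (two inputs, two ReLU hidden units, a linear output, with $w_1$ on $x_1 \to h_1$, $w_2$ on $h_1 \to y$, and $w_3$ on $h_2 \to y$); the actual figure in the handout may differ. It only demonstrates the general mechanism: a negative ReLU pre-activation zeroes every derivative that must pass through that unit.

```python
def relu(z):
    return max(z, 0.0)

def relu_grad(z):
    return 1.0 if z > 0 else 0.0   # derivative of ReLU; 0 for z <= 0

def weight_grads(x1, x2, w1, w2, w3, t):
    # Forward pass through the hypothetical stand-in network.
    z1 = w1 * x1          # pre-activation of h1
    h1 = relu(z1)
    h2 = relu(x2)         # h2 reads x2 directly in this stand-in
    y = w2 * h1 + w3 * h2
    # Backward pass for L = 0.5 * (y - t)^2 by the chain rule.
    dL_dy = y - t
    dL_dw2 = dL_dy * h1                  # 0 whenever h1 == 0
    dL_dw3 = dL_dy * h2
    dL_dz1 = dL_dy * w2 * relu_grad(z1)  # 0 whenever z1 < 0
    dL_dw1 = dL_dz1 * x1
    return dL_dw1, dL_dw2, dL_dw3

# h1's pre-activation is -1, as in the problem statement.
print(weight_grads(x1=1.0, x2=2.0, w1=-1.0, w2=3.0, w3=0.5, t=0.0))
# -> (0.0, 0.0, 2.0): the first two are exactly 0, the third is not.
```

In this stand-in, $\partial\mathcal{L}/\partial w_1$ is forced to 0 by the zero ReLU derivative at $z_1 = -1$, and $\partial\mathcal{L}/\partial w_2$ by $h_1 = 0$, while nothing forces $\partial\mathcal{L}/\partial w_3$ to vanish; whether the same reasoning carries over depends on the wiring in the handout's figure.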