YouTube videos

8 important questions on YouTube videos

How can weights ensure that a unit in a hidden layer becomes sensitive to a specific feature in the picture?

  • You want the weights of the connections to that unit to take values such that the unit becomes active when that specific feature is present in the picture.
  • the unit must become active when the edge it is sensitive to is present, and stay inactive otherwise (see the sketch below).
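
As a toy illustration (not taken from the videos), here is a minimal numpy sketch with hand-picked weights that make a unit respond to a vertical edge in a made-up 4x4 patch:

    import numpy as np

    # A made-up 4x4 patch with a vertical edge: bright left half, dark right half.
    edge_patch = np.array([
        [1, 1, 0, 0],
        [1, 1, 0, 0],
        [1, 1, 0, 0],
        [1, 1, 0, 0],
    ], dtype=float)
    flat_patch = np.full((4, 4), 0.5)  # a featureless patch for comparison

    # Hand-picked weights: positive where the feature is bright,
    # negative where it is dark.
    weights = np.array([
        [ 1,  1, -1, -1],
        [ 1,  1, -1, -1],
        [ 1,  1, -1, -1],
        [ 1,  1, -1, -1],
    ], dtype=float)

    print(np.sum(weights * edge_patch))  # 8.0 -> large weighted sum, the unit activates
    print(np.sum(weights * flat_patch))  # 0.0 -> the unit stays inactive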

What does a sigmoid activation do?

  • A sigmoid activation ensures that the resulting number is between 0 and 1.
  • the input to the activation can be any real number; the sigmoid squashes this entire range into values between 0 and 1.
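
A minimal sketch of the sigmoid function, showing how inputs from any range end up between 0 and 1:

    import numpy as np

    def sigmoid(z):
        # Squash any real number into the open interval (0, 1).
        return 1.0 / (1.0 + np.exp(-z))

    print(sigmoid(-10.0))  # ~0.00005: very negative input -> near 0
    print(sigmoid(0.0))    # 0.5: the midpoint
    print(sigmoid(10.0))   # ~0.99995: very positive input -> near 1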

How does a bias influence the activation of a unit?

  • Without a bias, the activation of the unit is fully determined by the weighted input.
  • it can be the case, however, that you only want the unit to become active when its weighted input is higher than some threshold.
  • the bias encodes this threshold: it is added to the weighted input to shift it in the desired direction, and the weighted input plus bias is then passed through the activation function (see the sketch below).
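
A minimal sketch, with made-up weights and inputs, of how a negative bias raises the threshold the weighted input must exceed before the unit activates:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    x = np.array([0.2, 0.8, 0.5])   # made-up inputs
    w = np.array([1.0, 2.0, -1.0])  # made-up weights
    z = np.dot(w, x)                # weighted input = 1.3

    print(sigmoid(z))      # ~0.79: without a bias the unit is fairly active

    b = -4.0               # a bias of -4 demands a weighted input above ~4
    print(sigmoid(z + b))  # ~0.06: the unit now stays mostly inactive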

How many weights and biases are there between one input layer of 784 units and a hidden layer of 16 units, assuming these are dense layers?

  • Each unit of the input layer is connected to each unit in the hidden layer, so one unit in the hidden layer has 784 connections and thus 784 weights. Counting all 16 hidden units, that is 784 * 16 = 12,544 weights.
  • each unit in the hidden layer also has a bias, so in addition to the weights there are 16 biases (see the check below).
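
The count worked out as a quick check:

    input_units = 784    # 28 x 28 pixels
    hidden_units = 16

    weights = input_units * hidden_units  # one weight per connection
    biases = hidden_units                 # one bias per hidden unit

    print(weights)           # 12544
    print(biases)            # 16
    print(weights + biases)  # 12560 parameters in total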

How can the cost/loss function be thought of?

  • The cost function can be thought of as a function in its own right, just as the neural network as a whole is one big function.
  • it takes the weights and biases of the neural network as input, and its output is the loss (see the sketch below).
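
A toy sketch of this view, assuming a single-unit "network" and a mean-squared-error cost; the training data is held fixed, so the cost varies only with the parameters:

    import numpy as np

    def network(params, x):
        # A one-unit "network": params is a (weights, bias) pair.
        w, b = params
        return 1.0 / (1.0 + np.exp(-(np.dot(w, x) + b)))

    def cost(params, inputs, targets):
        # Mean squared error as a function of the parameters.
        predictions = np.array([network(params, x) for x in inputs])
        return np.mean((predictions - targets) ** 2)

    # A made-up two-example dataset:
    inputs = [np.array([0.0, 1.0]), np.array([1.0, 0.0])]
    targets = np.array([1.0, 0.0])

    good_params = (np.array([-5.0, 5.0]), 0.0)
    bad_params = (np.array([5.0, -5.0]), 0.0)

    print(cost(good_params, inputs, targets))  # ~0.00004: low loss
    print(cost(bad_params, inputs, targets))   # ~0.99: high loss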

Considering a single weight, how would you know how to update it?

  • The weight's value can be plotted against the loss that results from each possible value
    • weight on the x-axis, loss on the y-axis.
  • at any given point, the derivative (slope) of this curve can be calculated.
  • if the derivative is negative, the weight needs to be increased to move closer to the minimum
  • if the derivative is positive, the weight needs to be decreased to move closer to the minimum.


A useful trick is to decrease the size of the weight adjustments as you approach the local minimum, to prevent overshooting it.
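
A minimal sketch of this for a single weight, assuming a made-up loss curve loss(w) = (w - 3)^2 with its minimum at w = 3. Note that because the derivative itself shrinks near the minimum, a fixed learning rate already yields smaller and smaller steps:

    # Gradient descent on a single weight.
    def loss(w):
        return (w - 3.0) ** 2

    def d_loss(w):
        return 2.0 * (w - 3.0)  # derivative of the loss with respect to w

    w = 0.0             # start far from the minimum
    learning_rate = 0.1

    for step in range(25):
        grad = d_loss(w)
        # Negative derivative -> w increases; positive -> w decreases.
        w = w - learning_rate * grad

    print(w, loss(w))   # w approaches 3, the loss approaches 0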

What information does the gradient vector of gradient descent tell you?

  • The (negative) gradient vector holds the change to make to every weight. It tells you which weights to change, and in which direction, in order to decrease the loss function the quickest.
  • the sign of each entry tells you whether the corresponding weight should increase or decrease
  • the magnitude of each entry tells you how large the adjustment should be
    • the magnitude also tells you how influential that weight is for the loss function: a large entry means that weight plays a big role in reaching the local minimum (see the sketch below).
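
A sketch of this idea with a made-up two-parameter loss and a finite-difference estimate of the gradient vector; the entry with the larger magnitude marks the parameter that currently matters more:

    import numpy as np

    def loss(params):
        # A made-up loss over two parameters, with its minimum at (1, -2).
        return (params[0] - 1.0) ** 2 + 5.0 * (params[1] + 2.0) ** 2

    def numerical_gradient(f, params, eps=1e-6):
        # Estimate each entry of the gradient with a finite difference.
        grad = np.zeros_like(params)
        for i in range(len(params)):
            bumped = params.copy()
            bumped[i] += eps
            grad[i] = (f(bumped) - f(params)) / eps
        return grad

    params = np.array([4.0, 3.0])
    grad = numerical_gradient(loss, params)
    print(grad)  # ~[6, 50]: the second parameter influences the loss far more

    # One gradient-descent step: move against the gradient.
    params = params - 0.05 * grad
    print(params, loss(params))  # the loss drops from 134 to ~38.5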

How can you, for the MNIST data, identify what the first hidden layer is sensitive to?

  • For each neuron you can arrange the weights of its 784 inputs in a 28 by 28 grid (see the plotting sketch below).
  • these weight grids tell you which input pixels matter most to each neuron.
  • apparently the units do not really look for specific features; the weight grids look fairly random.
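
A sketch of how such weight grids could be plotted, assuming a first-layer weight matrix of shape (784, 16); random values stand in here for an actually trained layer:

    import numpy as np
    import matplotlib.pyplot as plt

    # One column of 784 weights per hidden unit; random values stand in
    # for the weights of a trained layer.
    W = np.random.randn(784, 16)

    fig, axes = plt.subplots(4, 4, figsize=(8, 8))
    for unit, ax in enumerate(axes.flat):
        # Reshape the unit's 784 input weights back into the 28 by 28
        # image grid to see which pixels the unit responds to.
        ax.imshow(W[:, unit].reshape(28, 28), cmap="gray")
        ax.set_title(f"unit {unit}")
        ax.axis("off")
    plt.tight_layout()
    plt.show()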
