Training a CNN
15 important questions on Training a CNN
What does the loss describe about a prediction of a label in the training procedure of a CNN?
What does the Mean Square Error (MSE) do in the context of a loss function in the training process of a neural network?
The Mean Square Error loss describes the distance between the prediction of a label and the real label, i.e. how far off the prediction is. It therefore expresses how correct the neural network was at predicting the data, and thus how much the network should adapt its weights.
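A minimal NumPy sketch (the `mse` helper and values are illustrative, not from the study material): the loss is the squared distance between prediction and label, so predictions further from the label produce a larger loss.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean Square Error: average squared distance between prediction and label."""
    return np.mean((y_true - y_pred) ** 2)

# A prediction close to the label gives a small loss, a far-off prediction a large one.
print(mse(np.array([1.0, 0.0]), np.array([0.9, 0.1])))  # small loss
print(mse(np.array([1.0, 0.0]), np.array([0.2, 0.8])))  # large loss
```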
Why do we square the Mean Square Error (MSE)?
What does the sign of the cost of a neuron describe with respect to the prediction value during the iterative training of a convolutional neural network?
How does gradient descent work in the context of backpropagation when training a neural network? What are the mathematical formulae for doing so?
Gradient descent computes the partial derivatives of the error with respect to the weights and moves the weights a step in the direction that reduces the error. The relevant parts of the gradient formula are:
* Error: C = (y^* - y)^2
* Prediction: y^* = \sigma(Z^L)
* Weighted input (pre-activation): Z^L = \sum(w \cdot x)
* Chain rule: \partial C/\partial w = \partial C/\partial y^* \cdot \partial y^*/\partial Z^L \cdot \partial Z^L/\partial w
* Weight update: w \leftarrow w - \eta \cdot \partial C/\partial w
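A minimal sketch of gradient descent for a single sigmoid neuron, assuming the formulas above (variable names and values are illustrative, not from the material):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy single-neuron example.
x = np.array([0.5, -1.2, 0.3])   # inputs
w = np.array([0.1, 0.4, -0.2])   # weights
y = 1.0                          # true label
eta = 0.1                        # learning rate

for _ in range(100):
    z = np.dot(w, x)             # Z^L = sum(w * x)
    y_hat = sigmoid(z)           # y* = sigma(Z^L)
    cost = (y_hat - y) ** 2      # C = (y* - y)^2

    # Chain rule: dC/dw = dC/dy* * dy*/dZ^L * dZ^L/dw
    dC_dy = 2 * (y_hat - y)
    dy_dz = y_hat * (1 - y_hat)  # derivative of the sigmoid
    dz_dw = x
    grad = dC_dy * dy_dz * dz_dw

    w = w - eta * grad           # gradient descent step
```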
What is the default algorithm we have learned to use for backpropagation?
What can we say about the gradient we apply to the input values of a max-pooling layer during backpropagation? What is the semantic outcome of this gradient resolution?
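The material gives no written answer here; as a hedged illustration of the commonly used rule (assumption: the incoming gradient is routed entirely to the input that was the maximum in the forward pass, so all other inputs receive zero gradient and only the "winning" input is updated), a NumPy sketch:

```python
import numpy as np

# One 2x2 max-pooling window and the gradient arriving from the next layer.
window = np.array([[1.0, 3.0],
                   [2.0, 0.5]])
grad_out = 0.7

# Route the whole gradient to the position that held the maximum; zeros elsewhere.
grad_in = np.zeros_like(window)
max_idx = np.unravel_index(np.argmax(window), window.shape)
grad_in[max_idx] = grad_out

print(grad_in)
# [[0.  0.7]
#  [0.  0. ]]
```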
How are one-hot vectors applied in the training of a convolutional neural network?
In what classification task should we use one-hot vectors?
What is a one-hot vector?
The vector has exactly one element equal to one and all others zero.
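A minimal sketch (the `one_hot` helper and class count are illustrative, not from the material):

```python
import numpy as np

def one_hot(label, num_classes):
    """Return a vector with a 1 at the label index and 0 everywhere else."""
    vec = np.zeros(num_classes)
    vec[label] = 1.0
    return vec

print(one_hot(2, 5))  # [0. 0. 1. 0. 0.]
```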
What does the learning rate \eta describe in gradient descent?
It describes the step size, i.e. how far the weights are moved along the gradient in a given iteration.
What are the trade-offs for the size of the learning rate in gradient descent?
A higher learning rate:
- explores the function space better
- has a higher tendency to escape local minima
- moves more quickly to good positions
A lower learning rate:
- moves more stably
- is more likely to converge to a solution
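To illustrate these trade-offs numerically, a toy sketch on the one-dimensional loss C(w) = w^2 (the function, step counts, and rates are assumptions, not from the material):

```python
def run(eta, steps=20, w=5.0):
    """Gradient descent on C(w) = w**2, whose gradient is 2*w."""
    for _ in range(steps):
        w = w - eta * 2 * w
    return w

print(run(eta=0.1))   # small step: moves slowly but stably toward the minimum at 0
print(run(eta=0.9))   # large step: overshoots and oscillates around 0, yet still shrinks
print(run(eta=1.1))   # too large: every step overshoots further, so the weight diverges
```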
Name three weight-initialization methods.
* Normalized Xavier initialization: more suitable for deeper networks; weights are uniformly sampled from [-sqrt(6)/sqrt(ni+no), sqrt(6)/sqrt(ni+no)], so the number of outputs is taken into consideration as well; ni is the number of inputs, no the number of outputs
* Kaiming initialization: developed for ReLU; zero bias; weights drawn from a Gaussian distribution with mean 0 and stddev sqrt(2/n), n the number of inputs
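A minimal NumPy sketch of normalized Xavier initialization using the range above (layer sizes are illustrative assumptions):

```python
import numpy as np

n_in, n_out = 256, 128                        # ni inputs and no outputs of the layer
limit = np.sqrt(6.0) / np.sqrt(n_in + n_out)  # bound of the uniform sampling range
W = np.random.uniform(-limit, limit, size=(n_out, n_in))
```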
How does Kaiming initialization work?
The weights are drawn from a Gaussian distribution with mean 0 and standard deviation sqrt(2/n), with n the number of inputs. The bias is initialized to zero.
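A minimal NumPy sketch of Kaiming initialization as described above (layer sizes are illustrative assumptions):

```python
import numpy as np

n_in, n_out = 256, 128
W = np.random.normal(loc=0.0, scale=np.sqrt(2.0 / n_in), size=(n_out, n_in))
b = np.zeros(n_out)  # zero bias
```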
What is the advantage of Kaiming initialization?