Python course deep learning

19 important questions on Python course deep learning

What model is a single unit comparable to?

The linear model:
the neuron multiplies its input by a coefficient (the weight) and adds a constant (the bias) to obtain its output.

How can a single unit be expanded to work with more features?

  • The unit can be expanded to have more input connections.
  • Each connection has a specific weight associated with it.
  • The unit multiplies each input value by its connection's weight, then sums these products together with the bias to obtain its output.
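The steps above can be sketched in plain Python (the weights, bias, and input values here are hypothetical, chosen only for illustration):

```python
# A single unit with three input connections: multiply each input by its
# connection's weight, sum the products, and add the bias.
def linear_unit(inputs, weights, bias):
    return sum(x * w for x, w in zip(inputs, weights)) + bias

# Hypothetical values: 1.0*0.5 + 2.0*(-1.0) + 3.0*0.25 + 2.0 = 1.25
linear_unit([1.0, 2.0, 3.0], weights=[0.5, -1.0, 0.25], bias=2.0)
```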

How can you fit a linear unit in Keras?

Keras is the Python package that is used to build and train neural networks:

from tensorflow import keras
from tensorflow.keras import layers

# Create a network with 1 linear unit
model = keras.Sequential([
    layers.Dense(units=1, input_shape=[3])])

  • With the first argument, units, we define how many outputs we want.
  • With the second argument, input_shape, we tell Keras the dimensions of the inputs. Setting input_shape=[3] ensures the model will accept three features as input.

What is a dense layer?

  • A dense layer is a layer of units that share a common set of inputs.
  • It is also known as a fully connected layer.
  • Each neuron in the layer is connected to every neuron in the previous layer: all inputs are connected to all outputs.

How many dense layers do we need to be able to model non-linear relationships?

  • Trick question: no number of dense layers alone can model non-linear relationships.
  • Only when non-linear activation functions are used can we model non-linear relationships.
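One way to see why: stacking linear units without activations just produces another linear function. A minimal sketch with made-up weights:

```python
# Two stacked "layers" with hypothetical weights and no activation:
def layer1(x):
    return 2 * x + 1

def layer2(h):
    return 3 * h - 4

def stacked(x):
    # layer2(layer1(x)) = 3*(2x + 1) - 4 = 6x - 1: still a linear model.
    return layer2(layer1(x))

# Inserting a non-linear activation (here ReLU) between the layers breaks
# this collapse, which is what lets the network model non-linear relationships.
def relu(z):
    return max(0.0, z)

def nonlinear(x):
    return layer2(relu(layer1(x)))
```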

What are the characteristics of this neural network?

  • This network is a fully connected network, since it consists of dense layers.
  • It contains two hidden layers, which are called hidden because we never see their outputs directly.
  • This network is suited to a regression task, since the output neuron has no activation function.
    • Currently the output is an arbitrary value; other tasks (like classification) might require an activation function on the output.

What is an alternative way of specifying the activation function?


  • The usual way of attaching an activation function to a Dense layer is to include it in the layer definition with the activation argument.
  • Sometimes, though, you'll want to put some other layer between the Dense layer and its activation function (e.g. when you want to perform batch normalization). In this case, we can define the activation in its own Activation layer, like so:

layers.Dense(units=8),
layers.Activation('relu')

This is completely equivalent to: layers.Dense(units=8, activation='relu')

In addition to the training data, what two things do we need to train a network?


  • A "loss function" that measures how good the network's predictions are.
  • An "optimizer" that can tell the network how to change its weights.
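For instance, mean absolute error (MAE) is a common regression loss; a minimal sketch of what it measures (the target and prediction values are made up; in Keras you would simply pass loss='mae' to model.compile):

```python
# Mean absolute error: the average distance between predictions and targets.
def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical targets and predictions: (0.5 + 0.0 + 1.0) / 3 = 0.5
mae([1.0, 2.0, 3.0], [1.5, 2.0, 2.0])
```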

How does an optimizer work?

  • The optimizer is an algorithm that adjusts the weights to minimize the loss.
  • Virtually all of the optimization algorithms used in deep learning belong to a family called stochastic gradient descent. They are iterative algorithms that train a network in steps.
  • One step of training goes like this:
    • Sample some training data and run it through the network to make predictions.
    • Measure the loss between the predictions and the true values.
    • Finally, adjust the weights in a direction that makes the loss smaller.
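The three steps above can be sketched for a single linear unit trained with a squared-error loss (all numbers are hypothetical, chosen only to show one update):

```python
# One SGD step for the model y = w*x + b on a single sampled example.
def sgd_step(w, b, x, y_true, learning_rate):
    y_pred = w * x + b        # 1. run the data through the network
    error = y_pred - y_true   # 2. measure how far off the prediction is
    grad_w = 2 * error * x    # gradient of (y_pred - y_true)**2 w.r.t. w
    grad_b = 2 * error        # gradient w.r.t. b
    # 3. adjust the weights in the direction that makes the loss smaller
    return w - learning_rate * grad_w, b - learning_rate * grad_b

# The weights move toward values that reduce the error: w = 0.4, b = 0.4
w, b = sgd_step(w=0.0, b=0.0, x=1.0, y_true=2.0, learning_rate=0.1)
```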

What is stochastic gradient descent?

  • The gradient is a vector that tells us in what direction the weights need to go.
  • More precisely, it tells us how to change the weights to make the loss change fastest.
  • We call our process gradient descent because it uses the gradient to descend the loss curve towards a minimum.
  • Stochastic means "determined by chance." Our training is stochastic because the minibatches are random samples from the dataset.

What is the difference between a batch/mini batch and an epoch?

  • A batch or minibatch is a sample of the training data that is passed through the model, after which the weights and biases are updated.
  • An epoch is when all training data has passed through the model (in batches) once.
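A quick arithmetic example (the dataset and batch sizes are made up):

```python
# With 1000 training examples and a minibatch size of 50, one epoch
# consists of 20 batches, i.e. 20 weight updates.
n_samples = 1000
batch_size = 50
updates_per_epoch = n_samples // batch_size

# Over 5 epochs, every example is seen 5 times, for 100 updates in total.
total_updates = 5 * updates_per_epoch
```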

What is the learning rate?

  • The learning rate determines how drastically the weights and biases change after each batch.
  • A smaller learning rate means the network needs to see more minibatches before its weights converge to their best values.
  • The larger the batches and the lower the learning rate, the longer your training will take.
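A toy illustration of the second point, minimizing the one-parameter loss (w - 2)^2 by gradient descent (the learning rates, start value, and tolerance are all hypothetical):

```python
# Count gradient-descent steps until w is within 0.01 of its best value.
def steps_to_converge(learning_rate, w=0.0, target=2.0, tol=0.01):
    steps = 0
    while abs(w - target) > tol:
        w -= learning_rate * 2 * (w - target)  # gradient of (w - target)**2
        steps += 1
    return steps

# A smaller learning rate needs many more steps to reach the same point.
steps_to_converge(0.1)
steps_to_converge(0.01)
```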

How many input nodes should a neural network have?

  • One node for each input variable

In which two ways can you increase the capacity of your model?

  • You can increase the capacity of a network either by making it wider (more units to existing layers) or by making it deeper (adding more layers).
  • Wider networks have an easier time learning more linear relationships
  • deeper networks prefer more nonlinear ones. Which is better just depends on the dataset.
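The two options can be sketched in Keras (the layer widths and counts below are arbitrary choices, not prescriptions):

```python
from tensorflow import keras
from tensorflow.keras import layers

# Wider: more units in the existing hidden layer.
wider = keras.Sequential([
    layers.Dense(32, activation='relu', input_shape=[3]),
    layers.Dense(1),
])

# Deeper: more hidden layers of the same width.
deeper = keras.Sequential([
    layers.Dense(16, activation='relu', input_shape=[3]),
    layers.Dense(16, activation='relu'),
    layers.Dense(16, activation='relu'),
    layers.Dense(1),
])
```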

How do the parameters of the EarlyStopping callback influence the model fit?

  • If min_delta becomes larger, training is stopped earlier.
    • The minimum improvement required is increased, so the moment when improvement no longer meets this requirement occurs earlier.
  • As patience gets larger, training continues for longer.
    • If you set patience to 1 epoch, for example, you require every epoch of training to produce a min_delta improvement in the validation loss.
    • Setting patience larger allows the model to spread the required improvement over multiple epochs, which lets it train longer.
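A sketch of the callback in Keras (the particular min_delta and patience values are arbitrary examples):

```python
from tensorflow.keras.callbacks import EarlyStopping

early_stopping = EarlyStopping(
    min_delta=0.001,            # improvements smaller than this don't count
    patience=5,                 # stop after 5 epochs without a qualifying improvement
    restore_best_weights=True,  # roll back to the best weights seen
)
# Passed to training via: model.fit(..., callbacks=[early_stopping])
```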

What is the use of a dropout layer?

A dropout layer can help prevent overfitting: during training it randomly sets a fraction of the previous layer's outputs to zero, so the network cannot rely too heavily on any single unit.
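A sketch of where a dropout layer sits in a Keras model (the 30% rate and layer sizes are hypothetical):

```python
from tensorflow import keras
from tensorflow.keras import layers

# Dropout randomly zeroes a fraction of the previous layer's outputs during
# training, so the network cannot become overly dependent on any single unit.
model = keras.Sequential([
    layers.Dense(16, activation='relu', input_shape=[3]),
    layers.Dropout(rate=0.3),  # drop 30% of the outputs of the layer above
    layers.Dense(1),
])
```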

How can you include batch normalization in your network?

Apply batch normalization before anything that you want to be normalized.

layers.Dense(16, activation='relu'),
layers.BatchNormalization(),

This normalizes the output of this layer on its way to the next layer.

layers.Dense(16),
layers.BatchNormalization(),
layers.Activation('relu'),

This normalizes the z-values: normalization is now applied within the neuron, before the activation function is applied over the values.

If you add it as the first layer of your network, it can act as a kind of adaptive preprocessor, standing in for something like scikit-learn's StandardScaler.

What is the loss function used in a (binary) classification problem?

  • Classification accuracy is a jumpy function and thus doesn't work well as a loss function.
    • It cannot provide gradient information to facilitate weight tuning.
  • Instead we use cross-entropy.
  • For classification, what we want is a distance between probabilities, and this is what cross-entropy provides: it is a measure of the distance from one probability distribution (the predicted class probabilities) to another (the true labels).
  • Cross-entropy provides gradients to facilitate weight updates and handles imbalanced data better (one class being larger than the other).
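A minimal sketch of binary cross-entropy for one example (the predicted probabilities 0.9 and 0.1 are made up; in Keras this corresponds to the 'binary_crossentropy' loss):

```python
import math

# The loss is small when the predicted probability for the true class is
# high, and grows rapidly as a confident prediction turns out to be wrong.
def binary_cross_entropy(y_true, p_pred):
    return -(y_true * math.log(p_pred) + (1 - y_true) * math.log(1 - p_pred))

binary_cross_entropy(1, 0.9)  # confident and correct: small loss
binary_cross_entropy(1, 0.1)  # confident and wrong: much larger loss
```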

How can you turn inputs into probabilities?

Using the sigmoid activation function on the output neuron.
To get the final class prediction, we define a threshold probability. Typically this will be 0.5, so that rounding gives us the correct class: below 0.5 means the class with label 0, and 0.5 or above means the class with label 1. A 0.5 threshold is what Keras uses by default with its accuracy metric.
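The sigmoid-plus-threshold step can be sketched as (the input z-values are hypothetical):

```python
import math

def sigmoid(z):
    # Squashes any real number into (0, 1), so it can be read as a probability.
    return 1 / (1 + math.exp(-z))

def predict_class(z, threshold=0.5):
    return 1 if sigmoid(z) >= threshold else 0

predict_class(2.0)   # sigmoid(2.0) is about 0.88 -> class 1
predict_class(-1.0)  # sigmoid(-1.0) is about 0.27 -> class 0
```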
