Mixed cards
19 important questions on Mixed cards
What is a central cause of the first AI winter?
The XOR problem: Minsky and Papert showed that a single-layer perceptron cannot represent XOR, which contributed to a loss of confidence and funding.
What should you do in this situation, according to the perceptron learning rule?
"If the perceptron predicts a 0 and it should have predicted a 1...
What distinguishes reinforcement learning from supervised learning?
- In reinforcement learning, the model has to interact with its environment.
- In reinforcement learning, the correctness of an output can often only be determined after a sequence of model outputs, whereas in supervised learning we usually know right away how good a single model output was.
Imagine a neural network with 3 input variables, one hidden layer with 8 nodes, and a single node in the output layer. How many trainable parameters are in that model, assuming a basic, fully-connected feed-forward architecture?
- 41
- the output node also has a bias. Every node that is not an input node has a bias.
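The count above can be checked with a short sketch that sums weights and biases layer by layer (the layer sizes 3, 8, 1 are taken from the question):

```python
# Count trainable parameters in a fully-connected feed-forward network.
def count_parameters(layer_sizes):
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out  # weights between consecutive layers
        total += n_out         # one bias per non-input node
    return total

print(count_parameters([3, 8, 1]))  # 3*8 + 8 + 8*1 + 1 = 41
```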
Suppose you have a neural network with 10 trainable weights that you train for 10 epochs with a mini-batch size of 20, and your dataset has 400 observations.
- How many observations are fed into the model across the entire training procedure?
- How many times are the weights updated across the entire training procedure?
- 4000: one epoch means the full training set is used once, so 10 epochs use 400 * 10 = 4000 observations.
- The number of weights and the batch size are irrelevant here.
- 200: the weights are updated after every mini-batch. 400 / 20 = 20 updates per epoch, and 20 * 10 = 200 updates across 10 epochs.
- All weights are updated at every update, so the number of weights is irrelevant here.
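Both answers reduce to simple arithmetic, sketched here with the numbers from the question:

```python
n_obs, n_epochs, batch_size = 400, 10, 20

# Every epoch passes the full dataset through the model once.
observations_fed = n_obs * n_epochs

# The weights are updated once per mini-batch.
updates = (n_obs // batch_size) * n_epochs

print(observations_fed, updates)  # 4000 200
```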
Neural networks are models that have a high degree of ....... When the outcome appears to be close to linear, a neural network will be ....... efficient to use than a linear regression or Lasso model.
- Variance
- less
Based on the learning curve of the model, there might be a problem with ____, and something one could try next is ____.
- Underfitting; we want the model to become more complex (e.g., more layers or nodes) because the true relationship in the data is currently not captured.
In gradient descent, what does the "gradient" refer to?
- The vector of partial derivatives of the loss function with respect to each trainable weight; it points in the direction of steepest increase of the loss, so the weights are moved in the opposite direction.
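As a toy illustration, the gradient of a simple made-up loss can be approximated numerically (one partial derivative per weight) and used for a single descent step:

```python
import numpy as np

# Toy loss L(w) = (w0 - 1)^2 + (w1 + 2)^2; analytic gradient is [2(w0-1), 2(w1+2)].
def loss(w):
    return (w[0] - 1.0) ** 2 + (w[1] + 2.0) ** 2

def numerical_gradient(f, w, h=1e-6):
    # Central finite differences: perturb one weight at a time.
    g = np.zeros_like(w)
    for i in range(len(w)):
        wp, wm = w.copy(), w.copy()
        wp[i] += h
        wm[i] -= h
        g[i] = (f(wp) - f(wm)) / (2 * h)
    return g

w = np.array([0.0, 0.0])
g = numerical_gradient(loss, w)  # approximately [-2.0, 4.0]
w_new = w - 0.1 * g              # stepping against the gradient lowers the loss
```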
An RGB image of 3 channels/layers is being evaluated by a neural network that has 2 convolutional layers. The 1st layer looks at colors (RGB; i.e., 3 filters are applied). The 2nd layer looks at vertical shapes and horizontal shapes separately (i.e., 2 filters are applied). How many channels/layers will the image have after passing through these 2 convolutional layers?
- 2; a convolutional layer outputs one channel per filter, so the 2 filters of the last layer determine the final number of channels.
What is a typical setup for a convolutional network?
- Convolutional layer
- max pooling layer
- flattening layer
- fully connected layer
You have an image of size 12x12. You are going to feed this image to a network with 2 convolutional layers before flattening. You are only using 4x4 receptive fields with stride 1. What are the dimensions of your final, to-be-flattened, image?
- 6x6; with a 4x4 receptive field and stride 1 (no padding), each convolutional layer reduces each dimension by 3: 12 → 9 → 6.
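The dimension arithmetic follows the standard valid-convolution formula (size - field) / stride + 1, sketched here:

```python
def conv_output_size(size, field, stride=1):
    # Valid convolution (no padding): number of positions the receptive field fits.
    return (size - field) // stride + 1

size = 12
for _ in range(2):  # two convolutional layers with 4x4 fields, stride 1
    size = conv_output_size(size, field=4, stride=1)

print(size)  # 12 -> 9 -> 6
```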
When your data set is rather small, which methods are there to increase the accuracy of your deep convolutional networks in image modeling?
- Data augmentation
- using a pre-trained classifier
Is the following statement true or false?
The difference between a multilayer feedforward neural network and a recurrent neural network lies in the fact that the recurrent neural network re-uses the weights in a hidden layer repeatedly, possibly infinitely often, whereas the multilayer feedforward network has separate weights for each hidden layer.
- True.
Consider the analogy
Tiger : Cat ~ X : Dog
How would X be solved for in terms of the logic attributed to Word2Vec embeddings?
Subtract the two words on the left side from each other to get a vector capturing the essence of their relationship, and add this vector to the right side to obtain the missing word: X = Tiger - Cat + Dog.
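This vector arithmetic can be sketched with toy, hand-made 2-dimensional embeddings (not real Word2Vec vectors), chosen so that one axis encodes "feline vs. canine" and the other "wild vs. domestic":

```python
import numpy as np

# Toy embeddings (assumption: axes are feline-ness and wild-ness).
emb = {
    "tiger": np.array([1.0, 1.0]),   # feline, wild
    "cat":   np.array([1.0, 0.0]),   # feline, domestic
    "wolf":  np.array([0.0, 1.0]),   # canine, wild
    "dog":   np.array([0.0, 0.0]),   # canine, domestic
}

def cosine(a, b):
    return a.dot(b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

# X = Tiger - Cat + Dog: the "wild" offset applied to Dog.
query = emb["tiger"] - emb["cat"] + emb["dog"]
best = max((w for w in emb if w != "dog"), key=lambda w: cosine(query, emb[w]))
print(best)  # -> wolf
```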
Similarities among words' meanings are quantitatively captured by...
- the distances between their embedding vectors (e.g., cosine similarity): words with similar meanings lie close together in the embedding space.
What is a consequence of vanishing gradients?
- The weights in the early layers of a deep network receive vanishingly small updates, so those layers learn very slowly or not at all.
Consider an implementation of Rosenblatt’s algorithm:
import numpy as np

w = np.random.normal(size=3)  # initial weights (one can act as a bias weight)
eta = 0.001                   # learning rate

for _ in range(100000):
    for i in range(len(X)):
        x = X[i]
        d = np.sign(x.dot(w))         # predicted label
        w = w + eta * (y[i] - d) * x  # update the weights when the prediction is wrong
What is the role of eta?
- eta is the learning rate: the larger it is, the more drastically the weights are updated at each step.
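A self-contained sketch of the same learning rule on toy, made-up, linearly separable data (note that the weight update is scaled by the input vector x, and a constant 1 is appended to each observation so the third weight acts as a bias):

```python
import numpy as np

np.random.seed(0)

# Toy data: label is +1 when x1 + x2 > 0, else -1; last column is the bias input.
X = np.array([[2.0, 1.0, 1.0], [1.0, 2.0, 1.0],
              [-1.0, -2.0, 1.0], [-2.0, -1.0, 1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

w = np.random.normal(size=3)
eta = 0.1
for _ in range(100):
    for i in range(len(X)):
        d = np.sign(X[i].dot(w))
        w = w + eta * (y[i] - d) * X[i]  # no change when the prediction is correct

print(np.sign(X.dot(w)))  # matches y once the data are separated
```

Because the data are linearly separable, the perceptron convergence theorem guarantees the loop eventually classifies every observation correctly.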
What are the characteristics of this network?
- The network has two hidden layers.
- The network is an example of a (dense) feedforward network (a so called Multilayer Perceptron).
- The network learns 10 functions, each a function of the P input variables.
What type of variables call for the same loss function as the likelihood used in binomial logistic regression?
- Binary (0/1) outcome variables; their loss is the negative binomial log-likelihood, i.e., binary cross-entropy.
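A minimal sketch of that loss, written directly from the binomial log-likelihood (the probability vectors are made-up examples):

```python
import numpy as np

# Binary cross-entropy: the negative binomial log-likelihood for 0/1 targets.
def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    y_pred = np.clip(y_pred, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

y_true = np.array([1.0, 0.0, 1.0])
good = binary_cross_entropy(y_true, np.array([0.9, 0.1, 0.8]))
bad = binary_cross_entropy(y_true, np.array([0.5, 0.5, 0.5]))
# Confident, correct predictions give a lower loss than uninformative ones.
```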