Mixed cards
19 important questions on Mixed cards
What is a central cause of the first AI winter?
The XOR problem: Minsky and Papert showed that a single-layer perceptron cannot represent XOR, which contributed to a loss of confidence and funding.
What should you do in this situation, according to the perceptron learning rule?
"If the perceptron predicts a 0 and it should have predicted a 1...
What distinguishes reinforcement learning from supervised learning?
- In reinforcement learning, the model has to interact with its environment.
- In reinforcement learning, the correctness of an output can often only be determined after a sequence of model outputs, whereas in supervised learning we usually know right away how good a single model output was.
Imagine a neural network with 3 input variables, one hidden layer with 8 nodes, and a single node in the output layer. How many trainable parameters are in that model, assuming a basic, fully-connected feed-forward architecture?
- 41
- the output node also has a bias. Every node that is not an input node has a bias.
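The count above can be checked with a short sketch that sums weights and biases layer by layer (the layer sizes 3, 8, 1 are taken from the question):

```python
# Count trainable parameters in a fully-connected feed-forward network.
def count_parameters(layer_sizes):
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out  # weights between consecutive layers
        total += n_out         # one bias per non-input node
    return total

print(count_parameters([3, 8, 1]))  # 3*8 + 8 + 8*1 + 1 = 41
```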
Suppose you have a neural network with 10 trainable weights that you train for 10 epochs with a mini-batch size of 20, and your dataset has 400 observations.
- How many observations are fed into the model across the entire training procedure?
- How many times are the weights updated across the entire training procedure?
- 4000: one epoch means the full training set is used once, so 10 epochs use 400 * 10 = 4000 observations.
- The number of weights and the batch size are irrelevant here.
- 200: the weights are updated after every mini-batch. 400 / 20 = 20 updates per epoch, and 20 * 10 = 200 updates across 10 epochs.
- All weights are updated at every update, so the number of weights is irrelevant here.
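Both answers reduce to simple arithmetic, sketched here with the numbers from the question:

```python
n_obs, n_epochs, batch_size = 400, 10, 20

# Every epoch passes the full dataset through the model once.
observations_fed = n_obs * n_epochs

# The weights are updated once per mini-batch.
updates = (n_obs // batch_size) * n_epochs

print(observations_fed, updates)  # 4000 200
```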
Neural networks are models that have a high degree of ....... When the outcome appears to be close to linear, a neural network will be ....... efficient to use than a linear regression or Lasso model.
- Variance
- less
Based on the learning curve of the model, there might be a problem with ____, and something one could try next is ____.
- Underfitting; we want the model to become more complex (e.g., more layers or nodes) because the true relationship in the data is currently not captured.
In gradient descent, what does the "gradient" refer to?
- The vector of partial derivatives of the loss function with respect to each trainable weight; it points in the direction of steepest increase of the loss, so the weights are moved in the opposite direction.
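As a toy illustration, the gradient of a simple made-up loss can be approximated numerically (one partial derivative per weight) and used for a single descent step:

```python
import numpy as np

# Toy loss L(w) = (w0 - 1)^2 + (w1 + 2)^2; analytic gradient is [2(w0-1), 2(w1+2)].
def loss(w):
    return (w[0] - 1.0) ** 2 + (w[1] + 2.0) ** 2

def numerical_gradient(f, w, h=1e-6):
    # Central finite differences: perturb one weight at a time.
    g = np.zeros_like(w)
    for i in range(len(w)):
        wp, wm = w.copy(), w.copy()
        wp[i] += h
        wm[i] -= h
        g[i] = (f(wp) - f(wm)) / (2 * h)
    return g

w = np.array([0.0, 0.0])
g = numerical_gradient(loss, w)  # approximately [-2.0, 4.0]
w_new = w - 0.1 * g              # stepping against the gradient lowers the loss
```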
An RGB image of 3 channels/layers is being evaluated by a neural network that has 2 convolutional layers. The 1st layer looks at colors (RGB; i.e., 3 filters are applied). The 2nd layer looks at vertical shapes and horizontal shapes separately (i.e., 2 filters are applied). How many channels/layers will the image have after passing through these 2 convolutional layers?
- 2; a convolutional layer outputs one channel per filter, so the 2 filters of the last layer determine the final number of channels.
What is a typical setup for a convolutional network?
- Convolutional layer
- max pooling layer
- flattening layer
- fully connected layer
You have an image of size 12x12. You are going to feed this image to a network with 2 convolutional layers before flattening. You are only using 4x4 receptive fields with stride 1. What are the dimensions of your final, to-be-flattened, image?
- 6x6; with a 4x4 receptive field and stride 1 (no padding), each convolutional layer reduces each dimension by 3: 12 → 9 → 6.
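The dimension arithmetic follows the standard valid-convolution formula (size - field) / stride + 1, sketched here:

```python
def conv_output_size(size, field, stride=1):
    # Valid convolution (no padding): number of positions the receptive field fits.
    return (size - field) // stride + 1

size = 12
for _ in range(2):  # two convolutional layers with 4x4 fields, stride 1
    size = conv_output_size(size, field=4, stride=1)

print(size)  # 12 -> 9 -> 6
```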
When your data set is rather small, which methods are there to increase the accuracy of your deep convolutional networks in image modeling?
- Data augmentation
- using a pre-trained classifier
Is the following statement true or false?
The difference between a multilayer feedforward neural network and a recurrent neural network lies in the fact that the recurrent neural network re-uses the weights in a hidden layer repeatedly, possibly infinitely often, whereas the multilayer feedforward network has separate weights for each hidden layer.
- True.
Consider the analogy
Tiger : Cat ~ X : Dog
How would X be solved for in terms of the logic attributed to Word2Vec embeddings?
Subtract the two words on the left side from each other to get a vector capturing the essence of their relationship, and add this vector to the right side to obtain the missing word: X = Tiger - Cat + Dog.
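This vector arithmetic can be sketched with toy, hand-made 2-dimensional embeddings (not real Word2Vec vectors), chosen so that one axis encodes "feline vs. canine" and the other "wild vs. domestic":

```python
import numpy as np

# Toy embeddings (assumption: axes are feline-ness and wild-ness).
emb = {
    "tiger": np.array([1.0, 1.0]),   # feline, wild
    "cat":   np.array([1.0, 0.0]),   # feline, domestic
    "wolf":  np.array([0.0, 1.0]),   # canine, wild
    "dog":   np.array([0.0, 0.0]),   # canine, domestic
}

def cosine(a, b):
    return a.dot(b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

# X = Tiger - Cat + Dog: the "wild" offset applied to Dog.
query = emb["tiger"] - emb["cat"] + emb["dog"]
best = max((w for w in emb if w != "dog"), key=lambda w: cosine(query, emb[w]))
print(best)  # -> wolf
```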
Similarities among words' meanings are quantitatively captured by...
- the distances between their embedding vectors (e.g., cosine similarity): words with similar meanings lie close together in the embedding space.
What is a consequence of vanishing gradients?
- The weights in the early layers of a deep network receive vanishingly small updates, so those layers learn very slowly or not at all.
Consider an implementation of Rosenblatt’s algorithm:
import numpy as np

w = np.random.normal(size=3)  # initial weights (one can act as a bias weight)
eta = 0.001                   # learning rate

for _ in range(100000):
    for i in range(len(X)):
        x = X[i]
        d = np.sign(x.dot(w))         # predicted label
        w = w + eta * (y[i] - d) * x  # update the weights when the prediction is wrong
What is the role of eta?
- eta is the learning rate: the larger it is, the more drastically the weights are updated at each step.
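A self-contained sketch of the same learning rule on toy, made-up, linearly separable data (note that the weight update is scaled by the input vector x, and a constant 1 is appended to each observation so the third weight acts as a bias):

```python
import numpy as np

np.random.seed(0)

# Toy data: label is +1 when x1 + x2 > 0, else -1; last column is the bias input.
X = np.array([[2.0, 1.0, 1.0], [1.0, 2.0, 1.0],
              [-1.0, -2.0, 1.0], [-2.0, -1.0, 1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

w = np.random.normal(size=3)
eta = 0.1
for _ in range(100):
    for i in range(len(X)):
        d = np.sign(X[i].dot(w))
        w = w + eta * (y[i] - d) * X[i]  # no change when the prediction is correct

print(np.sign(X.dot(w)))  # matches y once the data are separated
```

Because the data are linearly separable, the perceptron convergence theorem guarantees the loop eventually classifies every observation correctly.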
What are the characteristics of this network?
- The network has two hidden layers.
- The network is an example of a (dense) feedforward network (a so called Multilayer Perceptron).
- The network learns 10 functions, each a function of the P input variables.
What type of variables call for the same loss function as the likelihood used in binomial logistic regression?
- Binary (0/1) outcome variables; their loss is the negative binomial log-likelihood, i.e., binary cross-entropy.
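A minimal sketch of that loss, written directly from the binomial log-likelihood (the probability vectors are made-up examples):

```python
import numpy as np

# Binary cross-entropy: the negative binomial log-likelihood for 0/1 targets.
def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    y_pred = np.clip(y_pred, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

y_true = np.array([1.0, 0.0, 1.0])
good = binary_cross_entropy(y_true, np.array([0.9, 0.1, 0.8]))
bad = binary_cross_entropy(y_true, np.array([0.5, 0.5, 0.5]))
# Confident, correct predictions give a lower loss than uninformative ones.
```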