Summary: VL LSTM and Recurrent Neural Nets
1 Simple recurrent networks
1.5 Other RNN learning algorithms
Real-Time Recurrent Learning
Computes all contributions to the gradients during the forward pass; the derivative of each unit with respect to each weight is tracked.
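A minimal sketch of this idea for a vanilla RNN h_t = tanh(W_x x_t + W_h h_{t-1}): the sensitivity tensor P holds the derivative of each hidden unit with respect to each recurrent weight and is updated during the forward pass, so no backward pass through time is needed. The weight names and the squared-error readout (reading the prediction directly off the hidden state) are illustrative assumptions, not taken from the script.

```python
import numpy as np

def rtrl_forward(xs, ys, W_x, W_h):
    """Minimal RTRL sketch for h_t = tanh(W_x x_t + W_h h_{t-1}).

    Tracks P[i, j, k] = d h_t[i] / d W_h[j, k] during the forward pass,
    so the gradient is available online, without backpropagation through time.
    """
    n_h = W_h.shape[0]
    h = np.zeros(n_h)                      # a(0) = 0: clean memory
    P = np.zeros((n_h, n_h, n_h))          # sensitivities dh/dW_h
    grad_Wh = np.zeros_like(W_h)

    for x, y in zip(xs, ys):
        z = W_x @ x + W_h @ h
        h_new = np.tanh(z)

        # dz[i]/dW_h[j,k] = delta_ij * h[k] + sum_m W_h[i,m] * P[m,j,k]
        dz_dW = np.einsum('im,mjk->ijk', W_h, P)
        for i in range(n_h):
            dz_dW[i, i, :] += h
        P = (1.0 - h_new ** 2)[:, None, None] * dz_dW

        # illustrative squared-error loss on a readout that is just h itself
        dL_dh = 2.0 * (h_new - y)
        grad_Wh += np.einsum('i,ijk->jk', dL_dh, P)
        h = h_new

    return grad_Wh
```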
4 Transformers and attention
4.1.1 Temporal attention
Temporal attention (for sequences)
Focuses on the relevant elements or intervals of a sequence while processing that sequence.
Playing "soft attention"
It is in range (0,1) -> it can also choose to let through only part of the information.
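A minimal sketch of soft temporal attention over a sequence of hidden states, assuming dot-product scores followed by a softmax, so every weight lies in (0, 1); the names h and query are illustrative assumptions.

```python
import numpy as np

def temporal_soft_attention(h, query):
    """Minimal sketch of soft temporal attention over a sequence.

    h:     (T, d) hidden states of the sequence
    query: (d,)   vector asking "which time steps are relevant?"

    Each weight lies in (0, 1), so every step can contribute only a
    fraction of its information; the weighted sum is the context vector.
    """
    scores = h @ query                  # (T,) relevance of each time step
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()            # softmax: weights in (0,1), sum to 1
    context = weights @ h               # (d,) soft mix over the sequence
    return context, weights
```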
4.2 Attention in sequence-to-sequence models
Bahdanau attention mechanism
Additive attention; allows the model to focus on different parts of the input sequence at each step.
Why the name additive attention
Because of the sum inside the tanh function.
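A minimal sketch of the Bahdanau-style additive scoring e_j = v^T tanh(W_s s_prev + W_h h_j), with the sum inside the tanh that gives the mechanism its name; the matrix names and sizes are illustrative assumptions.

```python
import numpy as np

def additive_attention(s_prev, h_enc, W_s, W_h, v):
    """Minimal sketch of Bahdanau-style additive attention.

    s_prev: (d_dec,)   previous decoder state
    h_enc:  (T, d_enc) encoder hidden states
    Score per encoder step: e_j = v^T tanh(W_s s_prev + W_h h_j)
    """
    e = np.tanh(h_enc @ W_h.T + s_prev @ W_s.T) @ v   # (T,) alignment scores
    a = np.exp(e - e.max())
    a /= a.sum()                                       # attention weights
    context = a @ h_enc                                # (d_enc,) weighted sum
    return context, a
```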
11 Script summary
11.1 Simple recurrent networks
Pred_y = g(x;w)
Feedforward network = a function that maps an input to a prediction using the network parameters.
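A minimal sketch of this view, assuming a single tanh hidden layer and a linear output; the layer sizes and parameter names are illustrative assumptions.

```python
import numpy as np

def g(x, w):
    """Feedforward net as a pure function of input x and parameters w.

    w = (W1, b1, W2, b2); the prediction depends only on the current
    input, with no memory of past inputs.
    """
    W1, b1, W2, b2 = w
    hidden = np.tanh(W1 @ x + b1)
    return W2 @ hidden + b2            # Pred_y = g(x; w)
```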
The idea is Turing complete
Every computer program can be represented by this idea (a recurrent network can, in principle, simulate any computer program).
Jordan network (idea)
- one of the earliest recurrent neural architectures
- 1 hidden layer
- keeps the last output as a form of context for processing the next input (see the sketch after this card)
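A minimal sketch of one Jordan-network step, assuming a single tanh hidden layer and a linear output; the weight names (W_x, W_y, W_out) are illustrative assumptions.

```python
import numpy as np

def jordan_step(x, y_prev, W_x, W_y, W_out):
    """One Jordan-network step: the previous *output* is fed back
    as context for processing the next input."""
    h = np.tanh(W_x @ x + W_y @ y_prev)   # hidden layer sees input + last output
    y = W_out @ h                         # new output, kept as context for t+1
    return y, h
```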
Fully recurrent network
- Elman network + loosen the choice of recurrent parameters
- add the term R^T * a(t-1) to the hidden-layer input
- treat R as a trainable weight matrix
- a hidden unit now depends on itself + its neighboring neurons
- a(0) = 0 <- clean memory (see the sketch after this card)
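A minimal sketch of the fully recurrent forward pass described above; the names W and R and the tanh nonlinearity are assumptions for illustration.

```python
import numpy as np

def fully_recurrent_forward(xs, W, R, n_hidden):
    """Fully recurrent layer:
        a(t) = tanh(W x(t) + R^T a(t-1)),   a(0) = 0  (clean memory)
    R is an ordinary trainable weight matrix, so every hidden unit can
    depend on itself and on all its neighbors from the previous step."""
    a = np.zeros(n_hidden)                 # a(0) = 0
    states = []
    for x in xs:
        a = np.tanh(W @ x + R.T @ a)
        states.append(a)
    return states
```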
11.2 Learning algorithms for RNNs
Empirical risk minimization (gradient descent)
- Input sequence with T elements
- find a parameterization that minimizes the risk: argmin_w Remp(g(.; w))
- gradient descent: w = w_old - eta * grad Remp
- iterate this procedure until only eta = 0 would still lower the risk, i.e. we have converged to a local minimum (see the sketch below)
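A minimal sketch of gradient descent on the empirical risk, assuming a helper grad_risk that returns grad Remp(w) over the training sequence; the function names, fixed step size eta, and stopping threshold are illustrative assumptions.

```python
import numpy as np

def empirical_risk_minimization(xs, ys, w0, grad_risk, eta=0.1, n_steps=1000):
    """Gradient descent on the empirical risk Remp(g(.; w)).

    Update: w <- w - eta * grad Remp(w); iterate until the gradient
    (and hence any useful step) vanishes, i.e. a local minimum is reached.
    """
    w = w0
    for _ in range(n_steps):
        g = grad_risk(w, xs, ys)
        if np.linalg.norm(g) < 1e-8:       # converged: only eta = 0 would help
            break
        w = w - eta * g
    return w
```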