Transformers and attention

4 important questions on Transformers and attention

Temporal attention (for sequences)

Focuses on the relevant elements or intervals of a sequence while processing that sequence

Why "soft" attention?

The weight is in the range (0, 1), since it can also choose to let through only part of the information (e.g., half of it)
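
A minimal NumPy sketch of soft attention (illustrative only; the names scores and values are assumptions, not from the study material). The softmax maps raw scores to weights that all lie strictly between 0 and 1, so each sequence element is only partially let through:

import numpy as np

def soft_attention(scores, values):
    # Softmax: raw scores -> weights in (0, 1) that sum to 1.
    weights = np.exp(scores - scores.max())
    weights = weights / weights.sum()
    # Context vector: weighted sum of the values, i.e. each value
    # is only partially let through.
    return weights @ values, weights

scores = np.array([2.0, 0.5, -1.0])              # one score per sequence element
values = np.array([[1.0, 0.0],
                   [0.0, 1.0],
                   [1.0, 1.0]])                  # one value vector per element
context, weights = soft_attention(scores, values)
print(weights)                                   # roughly [0.79 0.18 0.04]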

Bahdanau attention mechanism

Additive attention; it allows the model to focus on different parts of the input sequence at each time step
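
A minimal NumPy sketch of Bahdanau-style additive scoring (the names W_a, U_a, v_a follow the notation of the original paper; the dimensions and random inputs are assumptions for illustration):

import numpy as np

def additive_scores(s_prev, H, W_a, U_a, v_a):
    # s_prev: previous decoder state, shape (d,)
    # H: encoder hidden states, shape (T, d)
    # Project both, sum them inside tanh ("additive"), then map each
    # encoder position to a scalar alignment score.
    return np.tanh(s_prev @ W_a + H @ U_a) @ v_a   # shape (T,)

d, T = 4, 3
rng = np.random.default_rng(0)
W_a, U_a, v_a = rng.normal(size=(d, d)), rng.normal(size=(d, d)), rng.normal(size=d)
scores = additive_scores(rng.normal(size=d), rng.normal(size=(T, d)), W_a, U_a, v_a)
weights = np.exp(scores) / np.exp(scores).sum()    # softmax -> one weight per input position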

Why the name additive attention

Because of the sum inside the tanh function.
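
In the notation of Bahdanau et al. (2015), the alignment score for decoder step t and encoder position i is

e_{t,i} = v_a^T tanh(W_a s_{t-1} + U_a h_i)

The addition W_a s_{t-1} + U_a h_i inside the tanh is the sum referred to above, in contrast to the dot product used in multiplicative attention.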
