Cruise control or in control

38 important questions on Cruise control or in control

2nd order conditioning

The animal also responds to the light.
It is as if the prediction error has moved back in time.

It is as if the reward has moved from the juice to the bell, and then from the bell to the light
-> sequences of events and sequences of actions that lead to reward

When the bell fully predicts the reward, the prediction error at the time of the reward disappears, and we now see a burst of dopamine neuron firing at the time of the bell. It is as if the prediction error has moved backwards to the bell.
Why?

The bell is positively surprising -> it tells you that there is a reward in your future
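
The backward shift of the prediction error can be illustrated with a small temporal-difference (TD) learning simulation. This is a minimal sketch and not taken from the lecture: the function name simulate_td, the learning rate, the number of time steps and the reward size of 1 are all assumptions chosen for illustration.

```python
def simulate_td(n_trials=200, n_steps=5, alpha=0.2, gamma=1.0):
    """TD(0) learning over repeated bell -> delay -> reward trials.

    States 0..n_steps-1 are the time steps after bell onset; the reward
    arrives on leaving the last state. The pre-bell state has a value
    fixed at 0 because the bell arrives unpredictably (an assumption).
    Returns two lists with one prediction error per trial:
    (error at the bell, error at the reward).
    """
    values = [0.0] * n_steps
    pe_bell, pe_reward = [], []
    for _ in range(n_trials):
        # Bell onset: moving from the pre-bell state (value 0) into state 0.
        pe_bell.append(gamma * values[0] - 0.0)
        for t in range(n_steps):
            last = t == n_steps - 1
            reward = 1.0 if last else 0.0
            next_value = 0.0 if last else values[t + 1]
            # Prediction error: what happened minus what was expected.
            delta = reward + gamma * next_value - values[t]
            values[t] += alpha * delta
            if last:
                pe_reward.append(delta)
    return pe_bell, pe_reward


pe_bell, pe_reward = simulate_td()
print("first trial:", pe_bell[0], pe_reward[0])
print("last trial: ", round(pe_bell[-1], 3), round(pe_reward[-1], 3))
```

On the first trial the error sits entirely at the reward; after training it sits at the bell, because the bell is now the earliest event that signals the reward.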

To solve the p-beauty contest, do you need to build a 'model' of the behaviour of others?

Models of the world in your head -> controversy in early psychology

Backwards moving prediction errors

The ice cream van only comes on sunny days in the summer
-> the positive prediction error to the van leads to conditioning to the sun

What actions will help us to get to the reward?

PE-based learning allows us to understand how we learn which stimuli predict future rewards

Most of the time that is not enough -> getting the reward requires taking actions

We need to learn the whole sequence of actions

Do we learn a map of the world?
Can we learn this without experiencing prediction errors?

Prediction error learning requires you to experience the prediction error in order to learn.
You have to take all these routes multiple times for the prediction errors to propagate backwards

Learning without rewards? Experiment

Rats in a maze

Do rats use cognitive maps?
two groups of rats
one group pre-exposed without food

Simple stimulus-response reinforcement learning theory -> predicts no learning during pre-exposure, because there are no reward prediction errors
Regular training to find the fruit (reward)

Learning without rewards? Experimental results

Blue: not trained -> takes them 7 days
Purple: reward is introduced at a specific point -> takes 4 days
Red: reward is introduced at a specific point -> takes 1 day

-> evidence that there was learning without reward
-> you can build this map

First hallmark of goal-directed behaviour

Taking the task environment into account

Even rats can learn spatial structure, and use it to plan actions

Simple reinforcement learning (with backward-propagating prediction errors) can't do this
Note that spatial tasks are really complicated and hard to control
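
To make the idea of planning with a learned map concrete, here is a minimal sketch. It assumes the map is stored as a dictionary from each place to its neighbouring places; the maze layout, the place names and the function shortest_path are hypothetical and only serve as an illustration. Once the food location is known, a route can be planned with breadth-first search without any further trial and error.

```python
from collections import deque


def shortest_path(maze_map, start, goal):
    """Breadth-first search over a learned map (place -> neighbouring places)."""
    frontier = deque([[start]])
    visited = {start}
    while frontier:
        path = frontier.popleft()
        if path[-1] == goal:
            return path
        for neighbour in maze_map.get(path[-1], []):
            if neighbour not in visited:
                visited.add(neighbour)
                frontier.append(path + [neighbour])
    return None  # the goal cannot be reached with the current map


# Hypothetical maze, learned during exploration without any reward.
maze_map = {
    "start": ["A", "B"],
    "A": ["start", "dead_end"],
    "B": ["start", "C"],
    "C": ["B", "food_box"],
    "dead_end": ["A"],
    "food_box": ["C"],
}

# As soon as food appears in the food box, a route can be planned directly.
route = shortest_path(maze_map, "start", "food_box")
print(route)
```

A purely prediction-error-driven learner has nothing comparable to search through after unrewarded exploration, because none of its cached values have changed.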

Mixture of planning and reinforcement processes
What affects this balance?

Cognitive
Clinical
Neural circuit

Cognitive: capacity limits

Loading the central executive: working memory task
Prefrontal cortical basis of model-based control?

Habits vs. Goals
Same action can arise from two behaviourally and neurally distinct systems

Habitual and goal-directed
Can only be distinguished with a devaluation test

What path in the brain is important for model-based vs. Model-free learning?

Frontal cortex vs. more subcortical areas

Moderately trained behaviour is goal-directed

Link actions to outcomes
devaluation-sensitive
demonstrates animals represent outcome, not just its cached value
reminiscent of Tolman's cognitive map

Overtrained behaviour becomes habitual

Stimulus -> response behaviours
devaluation-insensitive
Represents abstract value, not specific outcome

Devaluation test: will work for food ... You don't want?
Steps:

Step 1: Train the rat to press a lever for a rewarding outcome (e.g. cupcake)

Step 2: Devalue the outcome by

satiation
pairing with an aversive outcome

-> make the animal feel nauseous, e.g. with an injection while it eats the food.
-> put the food in front of the animal and see whether it still wants to eat it.

Step 3: Give the rat the chance to press the lever (without outcomes)
Test whether it still presses, given what it knows about the outcome that would follow, and count the number of lever presses per minute
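
The logic of the three steps above can be sketched in code. This is a simplified illustration under stated assumptions, not the analysis used in any particular study: the function names, the learning rate and the outcome values of 1.0 and 0.0 are hypothetical. The habitual system stores a cached value for the lever press itself, while the goal-directed system looks up which outcome the press produces and what that outcome is currently worth.

```python
def train_cached_value(n_presses, outcome_value=1.0, alpha=0.1):
    """Habitual (model-free): cache the lever's value from experienced rewards."""
    cached = 0.0
    for _ in range(n_presses):
        cached += alpha * (outcome_value - cached)
    return cached


def goal_directed_value(action_outcome, outcome_values, action):
    """Goal-directed (model-based): find the outcome, then its current value."""
    return outcome_values[action_outcome[action]]


# Step 1: training. Pressing the lever yields a cupcake worth 1.0.
action_outcome = {"press_lever": "cupcake"}
outcome_values = {"cupcake": 1.0}
habit_value = train_cached_value(n_presses=100)

# Step 2: devaluation happens away from the lever, so only the
# outcome value changes. The cached lever value is never updated.
outcome_values["cupcake"] = 0.0

# Step 3: test without outcomes, so neither system receives new feedback.
goal_value = goal_directed_value(action_outcome, outcome_values, "press_lever")
print("habitual value of pressing:     ", round(habit_value, 3))
print("goal-directed value of pressing:", goal_value)
```

In this sketch the habitual system still values the lever press after devaluation, whereas the goal-directed system does not, which is the difference the test is designed to reveal.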

What happens if we take out the frontal cortex cognitively by giving a really hard (working memory) task and at the same time they have to watch out for numbers that disappear on the screen?

Compare with the balance between model-free and model-based control when they are only doing the two-step task.
The extra task loads the frontal cortex
Huge drop in the ability to perform the task in a model-based manner
-> there is something about capacity limits

What happens when you train the rat for a decent amount of time, but not quite so long (with devaluation test)?

They still press the lever a lot for cheese, but don't press the lever for cupcake.

Possible to knock out either system with lesion; the other one takes over

Involving areas of cortex and striatum
suggests parallel neural systems -> multiple action systems?

What do the previous two-step task and the reversed one suggest?

There is a prefrontal cortical basis of model-based control

What happens when you give the rat moderate training (with devaluation test)?

They know that pressing the lever gets them the cupcake, but they hate the cupcake.
After some more days, it looks as if the animal has forgotten that it hates the cupcake -> outcome-insensitive -> habitual response.
At some point it seems to forget what happens after pressing the lever

2nd hallmark of goal-directed behaviour:

Outcome sensitivity

Neural circuitry: fronto-striatal system

Neural basis of individual differences: fronto-striatal connectivity (DTI)
Disrupting the right DLPFC disrupts model-based relative to model-free control

Outcome sensitivity -> instrumental actions are of 2 kinds:

Goal-directed
habitual actions

What can they quantify with diffusion tensor imaging and what is the result?

The strength of the connection between the striatum and the frontal cortex
It turned out that the strength of these connections determined the degree to which people were model-based

Habits are the end-product of a long training regime

Does behaviour necessarily become habitual with training?
Distinguish the road from the final destination
The process of getting there is model-based vs. model-free learning

What kind of study is this and what does this study show?

TMS study:

Targeted the DLPFC, following up on the finding that loading working memory, which is known to rely on the DLPFC, takes it out cognitively.
This reduces the degree to which people are able to be model-based

TMS disrupting the right DLPFC

Example of loss of control over repetitive behaviour in a range of disorders:

Obsessive-compulsive disorder (OCD)
Addiction

Hypothesis: Compulsivity is partially due to an imbalance between ....

flexible, goal-directed control and habits -> caused by increased reliance on model-free learning?

Neural basis of habits
Which area is lesioned?
What is the consequence of this lesion?

dorsolateral striatum
rats acquire normally but never form habits: perpetually devaluation sensitive

DLS -> connected to motor cortex
Lesioned rats never lose their sensitivity to devaluation

Neural basis of goal directed action
Which area is lesioned?
What is the consequence of this lesion?

Prefrontal lesion
it produces opposite pattern: even undertrained rats are habitual (devaluation insensitive)

Why multiple decision systems?

Model-based:

learns the structure of the environment.
flexible, accurate, but slow

Model-free:

ignores the structure of the environment
fast, but error-prone and rigid

Clinical: pathological balance?
two step task

Different populations -> the balance differs across populations

Compared the healthy volunteer group to groups with a number of clinical or mental health disorders that are associated with compulsive behaviours
-> model-based control drops

First evidence that compulsive disorders might have something to do with a decrease in this model-based control.

When to favour each decision system?

Cost-benefit analysis?

E.g. not worth deliberating when highly practiced on a stable task
When the outcomes are stable, you can just go with the habit
When the environment is changeable or things have to be exactly right -> spend the computational effort

Related to self-control, impulsivity, compulsion

Evidence for clinical relevance

How do we measure model-based vs. model-free control..?

Correlation of compulsive behaviours across "diagnoses"
3 different factors

Anxious depression
compulsive behaviour
intrusive thoughts

How do we measure model-based vs. Model-free control?

Making a choice between 2 stimuli
Second choice: make a decision and either get or do not get a reward

Tricky part -> 70% or 30% chance of receiving a reward.

If you receive a reward -> you repeat actions that have been rewarded in the past

What does the previous model not take into account? And explain it.

Transition probabilities

Suppose the following happens -> you pick the yellow one, end up at the red stimulus and get a reward. If I want to get to this stimulus again, what should I do next time? I should choose the black one -> 70%. That makes a dissociable prediction. So the model-based learning system takes into account whether the transition you made between step 1 and step 2 was a common (70%) or a rare (30%) transition.
You need to switch away from the first stimulus you picked if it was a rare transition.
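
The dissociable prediction can be written down as a small sketch. The 70%/30% transition probabilities and the yellow, black and red stimuli come from the notes above; the second-stage state called "blue", the function names and the second-stage values of 1.0 and 0.0 are assumptions added for illustration.

```python
P_COMMON = 0.7  # probability of the common transition (from the notes)

# Assumed layout: black commonly leads to red, yellow commonly to "blue".
COMMON_STATE = {"black": "red", "yellow": "blue"}
RARE_STATE = {"black": "blue", "yellow": "red"}


def model_based_values(second_stage_values):
    """Value of each first-stage choice, weighted by transition probabilities."""
    return {
        choice: P_COMMON * second_stage_values[COMMON_STATE[choice]]
        + (1 - P_COMMON) * second_stage_values[RARE_STATE[choice]]
        for choice in COMMON_STATE
    }


def model_free_stays(rewarded, common_transition):
    """Model-free: repeat whatever was rewarded; the transition type is ignored."""
    return rewarded


def model_based_stays(rewarded, common_transition):
    """Model-based: stay after reward + common or after no reward + rare."""
    return rewarded == common_transition


# Example from the notes: yellow was chosen, a rare transition led to red,
# and red was rewarded. Red is now worth more than the other state.
q = model_based_values({"red": 1.0, "blue": 0.0})
print(q)
print("model-free repeats yellow: ", model_free_stays(True, False))
print("model-based repeats yellow:", model_based_stays(True, False))
```

The two systems only disagree after rare transitions, which is why the task needs both transition types to separate them.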

The questions on this page originate from the summary of the following study material:

Cognitive Control & Decision Making
