Cruise control or in control
38 important questions on Cruise control or in control
2nd order conditioning
It is as if the prediction error has moved backwards in time.
It is as if the reward has moved from the juice to the bell, and then from the bell to the light.
-> sequences of events, sequences of actions that lead to reward
When the bell fully predicts the reward, the prediction error does not simply disappear: the response at the time of the reward fades, and a burst of dopamine neuron firing now appears at the time of the bell. It is as if the prediction error has moved backwards to the bell.
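A minimal sketch of how a prediction error can move backwards in time, assuming a standard temporal-difference (TD) learning rule. The function name, the learning rate `alpha`, and the number of time steps are illustrative assumptions, not taken from the study material. The cue (bell) arrives at step 0 and the reward a few steps later; in this sketch the error at the time of the reward shrinks with training while an error at the cue appears.

```python
def td_trial_errors(n_trials, n_steps=5, alpha=0.3, reward_step=4):
    """Run n_trials of cue -> delay -> reward and return the prediction
    errors of the final trial: index 0 is the error at cue onset, the
    last index is the error at the time of the reward."""
    # V[t] is the learned value of time step t after cue onset.
    # V[n_steps] is the end of the trial and always stays 0.
    V = [0.0] * (n_steps + 1)
    deltas = []
    for _ in range(n_trials):
        # Assumption: the cue arrives unpredictably, so the value just
        # before the cue is 0 and the error at cue onset equals V[0].
        deltas = [V[0] - 0.0]
        for t in range(n_steps):
            r = 1.0 if t == reward_step else 0.0
            delta = r + V[t + 1] - V[t]   # TD prediction error
            V[t] += alpha * delta         # learn from the error
            deltas.append(delta)
    return deltas


first = td_trial_errors(1)     # untrained: error sits at the reward
late = td_trial_errors(200)    # trained: error sits at the cue
print("first trial:", [round(d, 2) for d in first])
print("late trial: ", [round(d, 2) for d in late])
```

Chaining a second cue (the light) before the bell would, under the same rule, move the error one step further back, which is one way to read 2nd order conditioning.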
Why?
To solve the p-beauty contest, you need to build a 'model' of behaviour of others?
Backwards moving prediction errors
-> positive prediction error to the van leads to conditioning of the sun
What actions will help us to get to the reward?
- PE based learning allows us to understand how we learn which stimuli predict future rewards
- Most of the time that is not enough: obtaining the reward requires taking actions
- We need to learn the whole sequence of actions
Do we learn a map of the world?
Can we learn this without experiencing prediction errors?
You have to take all these routes multiple times for the prediction errors to propagate backwards
Learning without rewards? Experiment
- Do rats use cognitive maps?
- two groups of rats
- one group pre-exposed without food
- simple stimulus-response reinforcement learning theory predicts no learning during pre-exposure, because there are no reward prediction errors
- Regular training to find the fruit (reward)
Learning without rewards? Experimental results
Purple: reward is introduced at a specific point -> learning takes 4 days
Red: reward is introduced at a specific point -> learning takes 1 day
-> before the reward was introduced there was no visible evidence of learning
-> yet the faster learning afterwards shows that you can build this map without reward
First hallmark of goal-directed behaviour
Even rats can learn spatial structure, and use it to plan actions
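A minimal sketch of planning with a learned map, assuming the maze can be written as a set of places connected by moves. The place names and both function names are hypothetical. The map is built from exploration alone (no reward), and a route is planned only once a goal location is known.

```python
from collections import deque


def learn_map(experienced_moves):
    """Build an adjacency map from (place, next_place) moves seen while
    exploring. No reward is involved in this step."""
    maze_map = {}
    for a, b in experienced_moves:
        maze_map.setdefault(a, set()).add(b)
        maze_map.setdefault(b, set()).add(a)
    return maze_map


def plan_route(maze_map, start, goal):
    """Breadth-first search over the learned map.
    Returns a shortest route as a list of places, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in sorted(maze_map.get(path[-1], ())):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None


# Hypothetical maze explored during pre-exposure without food.
moves = [("start", "A"), ("A", "B"), ("A", "C"),
         ("C", "goal_box"), ("B", "dead_end")]
maze_map = learn_map(moves)

# Once food appears in the goal box, a route is available immediately.
route = plan_route(maze_map, "start", "goal_box")
print(route)
```

A purely prediction-error-driven learner would instead need repeated rewarded runs before the value of early choices is updated.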
note that spatial tasks are really complicated & hard to control
Mixture of planning and reinforcement processes
What affects this balance?
- Cognitive
- Clinical
- Neural circuit
Cognitive: capacity limits
- Loading the central executive: working memory task
- Prefrontal cortical basis of model-based control?
Habits vs. Goals
Same action can arise from two behaviourally and neurally distinct systems
- Habitual and goal-directed
- can only distinguish with devaluation test
What path in the brain is important for model-based vs. Model-free learning?
Moderately trained behaviour is goal-directed
- Link actions to outcomes
- devaluation-sensitive
- demonstrates animals represent outcome, not just its cached value
- reminiscent of Tolman cognitive map
Overtrained behaviour becomes habitual
- Stimulus -> response behaviours
- devaluation-insensitive
- Represents abstract value, not specific outcome
Devaluation test: will work for food ... You don't want?
Steps:
Step 2: Devalue outcome by
- satiation
- pairing with aversive outcome
-> make the animal feel nauseous, e.g. with an injection while it eats the food.
-> put the food in front of the animal and check whether it still wants to eat it.
Step 3: Give the rat the opportunity to press the lever (without outcomes)
Test whether it still presses, given what it knows about the outcome that follows, by counting the number of lever presses per minute
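A minimal sketch of why the devaluation test separates the two systems, assuming the habitual system stores a cached value per action and the goal-directed system looks up the current value of the outcome an action leads to. All names and numbers are hypothetical.

```python
def habitual_value(cached_values, action):
    """Habit: use the value cached for the action during training."""
    return cached_values[action]


def goal_directed_value(action_outcomes, outcome_values, action):
    """Goal-directed: look up which outcome the action produces and
    how much that outcome is worth right now."""
    return outcome_values[action_outcomes[action]]


# After training: pressing the lever produced food worth 1.0.
action_outcomes = {"press_lever": "food"}
outcome_values = {"food": 1.0}
cached_values = {"press_lever": 1.0}

# Devaluation: the food is now aversive. The test is run without
# outcomes, so the cached action value is never updated.
outcome_values["food"] = -1.0

habit = habitual_value(cached_values, "press_lever")
goal = goal_directed_value(action_outcomes, outcome_values, "press_lever")
print("habitual value of pressing:     ", habit)
print("goal-directed value of pressing:", goal)
```

In this sketch the habitual value stays high (devaluation-insensitive) while the goal-directed value drops (devaluation-sensitive), which corresponds to continued versus reduced lever pressing.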
What happens if we take out the frontal cortex cognitively by giving participants a really hard (working memory) task while they also have to watch out for numbers that disappear on the screen?
- Baseline: the balance between model-free and model-based control when they are only doing the two-step task.
- Loaded the frontal cortex with the extra task
- Huge drop in the ability to perform the task in a model-based manner
-> there is something about capacity limits
What happens when you train the rat for a decent amount of time, but not quite so long (with devaluation test)?
Possible to knock out either system with lesion; the other one takes over
- Involving areas of cortex and striatum
- suggests parallel neural systems -> multiple actions systems?
What do the previous two-step task and the reversed one suggest?
What happens when you give the rat moderate training (with devaluation test)?
After some days, it looks as if the animal has forgotten that it hates the cupcake -> outcome insensitive -> habitual response.
At some point it seems to forget what happens after pressing the lever
2nd hallmark of goal-directed behaviour:
Neural circuitry: fronto-striatal system
- Neural basis of individual differences: fronto-striatal connectivity (DTI)
- disrupting the right DLPFC disrupts model-based relative to model-free control
Outcome sensitivity -> instrumental actions are of 2 kinds:
- Goal-directed
- habitual actions
What can they quantify with diffusion tensor imaging and what is the result?
- The strength of the connection between the striatum and the frontal cortex
- it turned out that the strength of the connections between the striatum and frontal cortex determined the degree to which people were model-based
Habits are the end-product of a long training regime
- Does behaviour necessarily become habitual with training?
- Distinguish the road vs. Final destination
- Process to get there is model-based vs. Model-free learning
What kind of study is this and what does this study show?
- TMS study:
- targeted the DLPFC, following the finding that model-based control drops when working memory, which is known to rely on the DLPFC, is taken out cognitively.
- You reduce the degree to which people are able to be model-based
- TMS disrupting the right DLPFC
Example of loss of control over repetitive behaviour in a range of disorders:
- Obsessive-compulsive disorder (OCD)
- Addiction
Hypothesis: Compulsivity is partially due to an imbalance between ....
Neural basis of habits
- Which area is lesioned?
- What is the consequence of this lesion
- dorsolateral striatum
- rats acquire normally but never form habits: perpetually devaluation sensitive
- DLS -> connected to motor cortex
- lack for devaluation
Neural basis of goal directed action
- Which area is lesioned?
- What is the consequence of this lesion?
- Prefrontal lesion
- it produces opposite pattern: even undertrained rats are habitual (devaluation insensitive)
Why multiple decision systems?
Model-based:
- learns the structure of the environment.
- flexible, accurate, but slow
Model-free:
- ignores the structure of the environment
- fast, but error-prone and rigid
Clinical: pathological balance?
two step task
Compared the healthy volunteer group to a number of clinical or mental health disorders that are associated with compulsive behaviours
-> model-based drops
First evidence that there might be something to compulsive disorders to do with a decrease in this model-based control.
When to favour each decision system?
- Cost-benefit analysis?
- E.g. not worth deliberating when highly practiced on stable task
- When the outcomes are stable, you can simply go with the practiced response
- When the environment is changeable or things have to be exactly right -> spend the time and computational effort
- Related to self-control, impulsivity, compulsion
- Evidence for clinical relevance
How do we measure model-based vs. model-free control..?
Correlation of compulsive behaviours across "diagnoses"
3 different factors
- Anxious depression
- compulsive behaviour
- intrusive thoughts
How do we measure model-based vs. Model-free control?
- Making a choice between 2 stimuli
- second choice: make a decision and either receive a reward or not
Tricky part -> 70% or 30% chance of receiving a reward.
If you receive a reward -> you repeat actions that have been rewarded in the past
What does the previous model not take into account? And explain it.
Suppose the following happens -> I pick the yellow stimulus, arrive at the red stimulus, and get a reward. If I want to reach that stimulus again, what should I do next time? I should choose the black one -> it leads there 70% of the time. That makes a dissociable prediction. So the model-based learning system takes into account whether the transition between step one and step two was common (70%) or rare (30%).
You need to switch away from the first stimulus you picked if the rewarded trial involved a rare transition.
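A minimal sketch of the dissociable prediction described above, written as idealised stay/switch rules for the first-stage choice. The two function names are hypothetical, and real participants usually show a mixture of both patterns rather than either pure rule.

```python
def model_free_stays(rewarded, transition):
    """Model-free: repeat the first-stage choice whenever it was
    followed by reward; the transition type is ignored."""
    return rewarded


def model_based_stays(rewarded, transition):
    """Model-based: repeat the first-stage choice only if that is the
    best way to reach (or avoid) the second-stage stimulus again.
    Rewarded + common   -> stay
    Rewarded + rare     -> switch (the other choice leads there 70%)
    Unrewarded + common -> switch
    Unrewarded + rare   -> stay"""
    return rewarded == (transition == "common")


for rewarded in (True, False):
    for transition in ("common", "rare"):
        print(
            "rewarded" if rewarded else "unrewarded",
            transition,
            "| model-free stays:", model_free_stays(rewarded, transition),
            "| model-based stays:", model_based_stays(rewarded, transition),
        )
```

The two rules disagree only on rare-transition trials, which is why the reward-by-transition interaction is typically used as the signature of model-based control.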