Cruise control or in control

38 important questions on Cruise control or in control

2nd order conditioning

The animal also responds to the light (which predicts the bell).
It is as if the prediction error has moved back in time.

It is as if the reward moves from the juice to the bell, and then from the bell to the light: the light itself becomes rewarding.
-> sequences of events, sequences of actions that lead to reward

When the bell fully predicts the reward, the prediction error does not simply disappear: instead of a burst of dopamine neuron firing at the reward, we now see a burst at the time of the bell. It is as if the prediction error has moved backwards to the bell.
Why?

The bell is positively surprising -> it tells you that there is a reward in your future
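A minimal temporal-difference (TD) sketch of this backward shift, assuming standard TD(0) with one state per cue (the notes don't name a model; the cue names, learning rate and number of trials are illustrative). The first cue is assumed to arrive unpredictably from a baseline with value 0, so the error at its onset is just that cue's learned value.

```python
def td_trials(cues, reward=1.0, alpha=0.2, gamma=1.0, n_trials=200):
    """TD(0) on a fixed chain of cues that ends in reward.

    For every trial, returns the prediction errors at: onset of the first cue,
    onset of each following cue, and delivery of the reward.
    """
    V = {c: 0.0 for c in cues}
    baseline = 0.0  # value of the inter-trial period: the first cue is unpredictable
    history = []
    for _ in range(n_trials):
        deltas = [gamma * V[cues[0]] - baseline]  # error when the first cue appears
        for i, c in enumerate(cues):
            last = i == len(cues) - 1
            r = reward if last else 0.0
            v_next = 0.0 if last else V[cues[i + 1]]
            delta = r + gamma * v_next - V[c]  # error when leaving cue c
            V[c] += alpha * delta
            deltas.append(delta)
        history.append(deltas)
    return V, history


V, hist = td_trials(["light", "bell"])  # 2nd-order chain: light -> bell -> juice
print("first trial:", [round(d, 2) for d in hist[0]])
print("last trial: ", [round(d, 2) for d in hist[-1]])
```

The three errors are at light onset, bell onset and juice delivery. On the first trial all of the error sits at the juice ([0.0, 0.0, 1.0]); after training it is ≈1 at light onset and ≈0 at the bell and the juice. On intermediate trials you can watch it pass through the bell on its way back.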

To solve the p-beauty contest, do you need to build a 'model' of the behaviour of others?
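A worked sketch of what "modelling others" means here, using level-k reasoning (a standard way of describing this game, swapped in by us; the notes don't name it). Assumed rules: everyone picks a number from 0 to 100 and the winner is closest to p times the average, with p = 2/3 as an assumption. A level-0 player guesses the midpoint; a level-k player models everyone else as level k-1 and best-responds.

```python
def level_k_guess(k, p=2/3, level0=50.0):
    """Guess of a level-k player in a p-beauty contest (numbers 0-100).

    Simplification: ignores the player's own effect on the average.
    """
    guess = level0
    for _ in range(k):
        guess *= p  # best response to others who guess `guess`
    return guess


for k in range(4):
    print(k, round(level_k_guess(k), 1))  # -> 50.0, 33.3, 22.2, 14.8
```

Each extra level is one more step of modelling what the others will do; with p < 1 the guesses shrink towards 0, the equilibrium.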

Models of the world in your head -> controversy in early psychology
  • Higher grades + faster learning
  • Never study anything twice
  • 100% sure, 100% understanding
Discover Study Smart

Backward-moving prediction errors

The ice cream van only comes on sunny days in the summer
-> the positive prediction error to the van leads to conditioning of the sun

What actions will help us to get to the reward?

  • PE-based learning explains how we learn which stimuli predict future rewards
    • most of the time that is not enough -> getting the reward requires taking actions
  • We need to learn the whole sequence of actions

Do we learn a map of the world?
Can we learn this without experiencing prediction errors?

Prediction error learning requires you to experience prediction errors in order to learn.
You have to take all these routes multiple times for the prediction errors to propagate backwards.
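A toy illustration of why this takes many runs: a corridor of n states with food at the end, traversed once per trial and updated with TD(0) in the order the states are visited (our assumptions; no discounting). Each update uses the next state's value from the previous trial, so value creeps back only one state per trial.

```python
def trials_until_start_valued(n_states, alpha=0.5):
    """Trial on which the start of an n-state corridor first gets nonzero value."""
    V = [0.0] * n_states
    trial = 0
    while V[0] == 0.0:
        trial += 1
        for s in range(n_states):  # one run from start to food
            last = s == n_states - 1
            r = 1.0 if last else 0.0          # food only at the end
            v_next = 0.0 if last else V[s + 1]
            V[s] += alpha * (r + v_next - V[s])  # TD(0) update, gamma = 1
    return trial


print(trials_until_start_valued(5))  # -> 5
```

So the start of a 5-step route only "knows" about the food after 5 runs, and a longer route needs proportionally more runs.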

Learning without rewards? Experiment

Rats in a maze
  • Do rats use cognitive maps?
  • two groups of rats
  • one group pre-exposed without food
    • simple stimulus-response reinforcement learning theory -> predicts no learning during pre-exposure, because there are no reward prediction errors
    • Regular training to find the fruit (reward)

Learning without rewards? Experimental results

Blue: not trained -> takes them 7 days
Purple: reward is introduced at a specific point -> takes 4 days
Red: reward is introduced at a specific point -> takes 1 day

-> without reward there was no visible evidence of learning
-> but once the reward is introduced, the pre-exposed rats learn much faster: you can build this map without reward
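A sketch of the map idea, assuming the "map" is just a graph of which places connect, learned from unrewarded walks, and that planning is a breadth-first search (one simple planning algorithm, chosen by us; the maze layout and place names are hypothetical). Building the graph needs no reward; once food appears at G, a route can be planned straight away, with no prediction errors having to propagate back.

```python
from collections import deque


def explore(walks):
    """Build a map (adjacency sets) from unrewarded walks through the maze."""
    graph = {}
    for walk in walks:
        for a, b in zip(walk, walk[1:]):
            graph.setdefault(a, set()).add(b)
            graph.setdefault(b, set()).add(a)
    return graph


def plan(graph, start, goal):
    """Breadth-first search on the learned map: shortest route from start to goal."""
    parents = {start: None}
    frontier = deque([start])
    while frontier:
        node = frontier.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk back along the parent links
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nxt in sorted(graph.get(node, ())):
            if nxt not in parents:
                parents[nxt] = node
                frontier.append(nxt)
    return None  # goal not on the map


# Hypothetical maze: the rat wanders without food, then food appears at "G".
maze = explore([["S", "A", "B", "C"], ["A", "D", "E"], ["B", "E", "G"], ["S", "D"]])
print(plan(maze, "S", "G"))  # -> ['S', 'D', 'E', 'G']
```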

First hallmark of goal-directed behaviour

Taking the task environment into account

Even rats can learn spatial structure, and use it to plan actions

Simple reinforcement learning (with backward-propagating prediction errors) can't do this
Note: spatial tasks are really complicated and hard to control

Mixture of planning and reinforcement processes
What affects this balance?

  • Cognitive
  • Clinical
  • Neural circuit
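One common way to formalise this balance (used in hybrid models of the two-step task) is a weighted mix of model-based and model-free action values, fed into a softmax choice rule. A sketch with hypothetical numbers: the weight w, the inverse temperature beta and the Q-values are illustrative. Under this sketch, a drop in model-based control (e.g. under working memory load) shows up as a lower w.

```python
import math


def hybrid_choice_probs(q_mb, q_mf, w, beta=3.0):
    """Mix model-based and model-free values with weight w, then apply a softmax."""
    q_net = [w * mb + (1 - w) * mf for mb, mf in zip(q_mb, q_mf)]
    m = max(q_net)  # subtract the max for numerical stability
    exps = [math.exp(beta * (q - m)) for q in q_net]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical values: the planner favours action 0, the cached values favour action 1.
q_mb, q_mf = [0.8, 0.2], [0.3, 0.7]
for w in (1.0, 0.5, 0.0):
    print(w, [round(p, 2) for p in hybrid_choice_probs(q_mb, q_mf, w)])
```

With w = 1 the choice follows the planner, with w = 0 it follows the cached values, and in between it is a blend.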

Cognitive: capacity limits

  • Loading the central executive: working memory task
  • Prefrontal cortical basis of model-based control?

Habits vs. Goals
Same action can arise from two behaviourally and neurally distinct systems

  • Habitual and goal-directed
  • can only be distinguished with a devaluation test

Which pathways in the brain are important for model-based vs. model-free learning?

Frontal cortex vs. more subcortical regions (e.g. striatum)


Moderately trained behaviour is goal-directed

  • Link actions to outcomes
  • devaluation-sensitive
  • demonstrates animals represent outcome, not just its cached value
  • reminiscent of Tolman's cognitive map

Overtrained behaviour becomes habitual

  • Stimulus -> response behaviours
  • devaluation-insensitive
  • Represents abstract value, not specific outcome

Devaluation test: will you work for food ... you don't want?
Steps:

Step 1: Train the rat to press a lever for a rewarding outcome (e.g. a cupcake)

Step 2: Devalue the outcome by
  • satiation
  • pairing with an aversive outcome
    -> e.g. give an injection that makes the animal feel nauseous while it eats the food
-> then put the food in front of the animal and check whether it still wants to eat it

Step 3: Give the rat the lever again (without outcomes)
-> does it still press, knowing which outcome follows? Measure the number of lever presses per minute.
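The three steps as a toy simulation. The two-lever set-up (cheese and cupcake), the values and the function names are hypothetical; the point is only the difference between looking up the current value of a represented outcome (goal-directed) and using a value cached during training (habitual).

```python
# Hypothetical action -> outcome model learned during training (step 1)
model = {"left_lever": "cheese", "right_lever": "cupcake"}
outcome_value = {"cheese": 1.0, "cupcake": 1.0}

# Habitual system: values cached during training (a copy, so later changes don't reach it)
cached = {lever: outcome_value[o] for lever, o in model.items()}


def goal_directed_value(lever):
    # Look up what the lever produces, then how much that outcome is wanted *now*
    return outcome_value[model[lever]]


def habitual_value(lever):
    # Stimulus -> response strength; no representation of the outcome itself
    return cached[lever]


# Step 2: devalue the cupcake (satiation or pairing with nausea)
outcome_value["cupcake"] = 0.0

# Step 3: test without outcomes, so nothing new is learned about the levers
for lever in model:
    print(lever, goal_directed_value(lever), habitual_value(lever))
# -> left_lever 1.0 1.0
# -> right_lever 0.0 1.0
```

The goal-directed value of the cupcake lever drops as soon as the cupcake is devalued; the habitual value does not, so a habitual rat keeps pressing.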

What happens if we take out the frontal cortex cognitively, by giving people a really hard working memory task (watching out for numbers that disappear on the screen) while they do the two-step task?

  • Baseline: balance between model-free and model-based control when they only do the two-step task
  • Then loaded the frontal cortex with the working memory task
  • Huge drop in the ability to perform the task in a model-based manner
    -> model-based control is subject to capacity limits

What happens when you train the rat for a decent amount of time, but not very long (with devaluation test)?

They still press the lever for cheese a lot, but don't press the lever for the (devalued) cupcake.

It is possible to knock out either system with a lesion; the other one takes over

  • Involving areas of cortex and striatum
  • suggests parallel neural systems -> multiple action systems?

What do the previous two-step task and the reversed one suggest?

There is a prefrontal cortical basis of model-based control

What happens when you give the rat moderate training (with devaluation test)?

They know that pressing the lever gets them the cupcake, but they now hate the cupcake, so they stop pressing.
After some more days of training, the animal behaves as if it has forgotten that it hates the cupcake -> outcome-insensitive -> habitual response.
At some point it seems to forget what happens after pressing the lever.

2nd hallmark of goal-directed behaviour:

Outcome sensitivity

Neural circuitry: fronto-striatal system

  • Neural basis of individual differences: fronto-striatal connectivity (DTI)
  • disrupting the right DLPFC disrupts model-based relative to model-free control

Outcome sensitivity -> instrumental actions are of 2 kinds:

  • Goal-directed
  • habitual actions

What can they quantify with diffusion tensor imaging and what is the result?

  • The strength of the connection between the striatum and the frontal cortex
  • it turned out that the strength of the connections between the striatum and frontal cortex predicted the degree to which people were model-based

Habits are the end-product of a long training regime

  • Does behaviour necessarily become habitual with training?
  • Distinguish the road vs. the final destination
  • the process to get there is model-based vs. model-free learning

What kind of study is this and what does this study show?

  • TMS study:
    • targeted the DLPFC, following the same logic as the working memory study: working memory is known to rely on the DLPFC, and taking it out cognitively reduced model-based control
    • Disrupting the DLPFC reduces the degree to which people are able to be model-based
  • TMS disrupting the right DLPFC

Example of loss of control over repetitive behaviour in a range of disorders:

  • Obsessive-compulsive disorder (OCD)
  • Addiction

Hypothesis: Compulsivity is partially due to an imbalance between ....

flexible, goal-directed control and habits -> caused by increased reliance on model-free learning?

Neural basis of habits
  1. Which area is lesioned?
  2. What is the consequence of this lesion

  1. dorsolateral striatum
  2. rats acquire the behaviour normally but never form habits: perpetually devaluation-sensitive
    • DLS -> connected to motor cortex
    • no devaluation insensitivity, even after long training

Neural basis of goal directed action
  1. Which area is lesioned?
  2. What is the consequence of this lesion?

  1. Prefrontal lesion
  2. it produces the opposite pattern: even undertrained rats are habitual (devaluation-insensitive)

Why multiple decision systems?

Model-based:
  • learns the structure of the environment.
  • flexible, accurate, but slow

Model-free:
  • ignores the structure of the environment
  • fast, but error-prone and rigid

Clinical: pathological balance?
two step task

Different populations -> each population has a different balance

Compared the healthy volunteer group to a number of clinical groups with mental health disorders that are associated with compulsive behaviours
-> model-based control drops

First evidence that compulsive disorders might have something to do with a decrease in model-based control.

When to favour each decision system?

  • Cost-benefit analysis?
    • E.g. not worth deliberating when highly practiced on stable task
    • When the outcomes are stable, you can go with the habit
    • When the environment is changeable or things have to be exactly right -> spend time on the computational effort
  • Related to self-control, impulsivity, compulsion
    • Evidence for clinical relevance
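A hypothetical cost-benefit rule for picking a controller, only to make the trade-off above concrete: deliberate when the expected gain from planning (the chance that cached values are out of date, times what is at stake) exceeds the cost of planning. The rule, its parameter names and all numbers are illustrative assumptions, not an established model.

```python
def choose_controller(uncertainty, stakes, planning_cost):
    """Hypothetical arbitration between the two decision systems.

    uncertainty:   chance that the cached (model-free) values are out of date, 0-1
    stakes:        how much is lost by acting on an out-of-date value
    planning_cost: cost of deliberating (time, working memory)
    """
    expected_gain = uncertainty * stakes
    return "model-based" if expected_gain > planning_cost else "model-free"


print(choose_controller(uncertainty=0.05, stakes=1.0, planning_cost=0.2))  # practised, stable task
print(choose_controller(uncertainty=0.6, stakes=1.0, planning_cost=0.2))   # changeable environment
```

In this sketch a higher planning cost (e.g. under working memory load) pushes the choice towards model-free control, and high stakes ("things have to be exactly right") push it towards model-based control.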


How do we measure model-based vs. model-free control..?

Correlation of compulsive behaviours across "diagnoses"
3 different factors

  • Anxious depression
  • compulsive behaviour
  • intrusive thoughts

How do we measure model-based vs. model-free control?

  1. First choice: choose between 2 stimuli
  2. Second choice: make another decision, and either get a reward or not


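Tricky part -> 70% or 30% chance of a transition: each first-stage choice usually (70%, common) leads to one second-stage state and sometimes (30%, rare) to the other.

Model-free learner: if you receive a reward -> you repeat actions that have been rewarded in the past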
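A simulation sketch of a model-free learner on a simplified two-step task. Simplifications (ours): one outcome per second-stage state, both states pay off with the same fixed probability 0.5 (in the real task reward probabilities drift over time), and a softmax first-stage choice with hypothetical alpha and beta. Because the learner credits the first-stage choice directly, its probability of repeating that choice depends on reward but not on whether the transition was common or rare.

```python
import math
import random

COMMON = 0.7  # probability of the common transition; rare = 0.3


def simulate_model_free(n_trials=20000, alpha=0.5, beta=5.0, seed=0):
    """Model-free learner on a simplified two-step task.

    Returns P(stay with the same first-stage choice), split by whether the
    previous trial was rewarded and whether its transition was common.
    """
    rng = random.Random(seed)
    reward_prob = [0.5, 0.5]  # hypothetical: both second-stage states pay off equally
    q = [0.0, 0.0]            # cached values of the two first-stage choices
    counts = {}               # (rewarded, common) -> (stays, total)
    prev = None
    for _ in range(n_trials):
        # Softmax choice between the two first-stage stimuli
        p0 = 1.0 / (1.0 + math.exp(-beta * (q[0] - q[1])))
        choice = 0 if rng.random() < p0 else 1
        if prev is not None:
            prev_choice, prev_rewarded, prev_common = prev
            stays, total = counts.get((prev_rewarded, prev_common), (0, 0))
            counts[(prev_rewarded, prev_common)] = (stays + (choice == prev_choice), total + 1)
        # Transition: choice c usually (70%) leads to state c, sometimes to 1 - c
        common = rng.random() < COMMON
        state = choice if common else 1 - choice
        rewarded = rng.random() < reward_prob[state]
        # Model-free update: credit the first-stage choice directly, ignoring the transition
        q[choice] += alpha * (float(rewarded) - q[choice])
        prev = (choice, rewarded, common)
    return {key: stays / total for key, (stays, total) in counts.items()}


for (rewarded, common), p_stay in sorted(simulate_model_free().items()):
    print(f"rewarded={rewarded}, common={common}: P(stay) = {p_stay:.2f}")
```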

What does the previous model not take into account? And explain it.

Transition probabilities

What if the following happens: you pick the yellow one, end up (via a rare transition) at the red stimulus, and get a reward. If you want to reach this stimulus again, what should you do next time? You should choose the black one -> 70% chance of reaching red. That makes a dissociable prediction: the model-based learning system takes into account whether the transition you made between step 1 and step 2 was common (70%) or rare (30%).
You need to switch away from the first stimulus you picked if it was a rare transition.
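The yellow/black example as a deterministic sketch (real agents choose probabilistically; this keeps only the direction of each prediction). Assumed coding: choice c leads to second-stage state c 70% of the time (common) and to 1 - c 30% of the time (rare); yellow = choice 0, black = choice 1, red = state 1. The model-free agent repeats whatever was rewarded; the model-based agent picks the choice that most likely leads to the state it now wants. The function name is hypothetical.

```python
def next_first_choice(agent, choice, state, rewarded):
    """Next first-stage choice of a hypothetical agent that remembers one trial.

    Choice c leads to second-stage state c with probability 0.7 (common),
    otherwise to state 1 - c (rare).
    """
    if agent == "model-free":
        # Repeat a rewarded choice, switch after no reward; the transition is ignored
        return choice if rewarded else 1 - choice
    if agent == "model-based":
        # Decide which second-stage state to aim for next time...
        target = state if rewarded else 1 - state
        # ...and pick the choice that commonly leads there (choice c -> state c)
        return target
    raise ValueError(f"unknown agent: {agent}")


for agent in ("model-free", "model-based"):
    for rewarded in (True, False):
        for common in (True, False):
            choice = 0  # "yellow"
            state = choice if common else 1 - choice
            stay = next_first_choice(agent, choice, state, rewarded) == choice
            print(agent, "rewarded" if rewarded else "unrewarded",
                  "common" if common else "rare", "stay" if stay else "switch")
```

Model-free: stay after reward and switch after no reward, whatever the transition. Model-based: the stay/switch decision flips after a rare transition. This reward × transition interaction is the signature the two-step task looks for.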

The question on the page originate from the summary of the following study material:

  • A unique study and practice tool
  • Never study anything twice again
  • Get the grades you hope for
  • 100% sure, 100% understanding
Remember faster, study better. Scientifically proven.
Trustpilot Logo