How we perceive and recognise objects. The computational theory of perception

44 important questions on How we perceive and recognise objects. The computational theory of perception

What can be said about the cells in V1 (primary visual cortex)?

They respond to the basic features of the visual image, such as lines of a specific orientation, motion, size etc.

---> these neurons have relatively small and precise receptive fields

(p. 85 perceptual textbook)

What are the 3 theories of perception?

1. Constructivist
2. Ecological
3. Computational

What is the constructivist theory? Who was it proposed by?

Helmholtz
---> Perception is inferred using cues and clues

What is computational theory? Who was it proposed by?

Marr
---> Information-processing; everything is there in the image but needs to be made explicit

What is the definition of middle vision?

A loosely defined stage of visual processing that comes after basic features have been extracted from the image and before object recognition
---> to successfully combine features into objects

(p. 92 perceptual textbook)

What is middle vision? How do we do this?

The goal is to organise the elements of a visual scene into groups that we can then recognise as objects

How we do this

  • Finding Edges, contours, common fate, texture segmentation, similarity, proximity, parallelism, symmetry, synchrony.
  • What you know to be true is also important: TOP-DOWN mechanisms 


Middle vision carves up the retinal image into large-scale objects. This leads to the global superiority effect

What is the global superiority effect?

The properties of the whole object take precedence over the properties of parts of the object

---> this effect is consistent with the assumption that the first goal of middle vision is to carve the retinal image into large-scale objects

(p. 104 perceptual textbook)

What are the 5 principles that middle-vision uses to achieve its goals?

1. Bring together features that make an object
---> using Gestalt principles (similarity, proximity, parallelism, symmetry)

2. Split apart features that are not included in the object
---> using edge-finding processes that divide regions from each other
---> figure-ground mechanisms separate objects from the background

3. Use own knowledge about the object


4. Avoid accidents
---> avoid interpretations that require the assumption of highly specific, accidental combinations of features or accidental viewpoints

5. Seek consensus and avoid ambiguity
---> eliminate all but one of the possibilities, thereby resolving the ambiguity and delivering a single solution to the perceptual problem

(p. 104 - 105 perceptual textbook)

What is meant by naive template theory? What is a limitation of this theory?

The proposal that the visual system recognises objects by matching the neural representation of the image with a stored representation of the same "shape" in the brain

Limitation
- too many templates are required ---> if we needed a new template for every letter (for example) we would run out of brain

(p. 108 & 109 perceptual textbook)
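
The limitation can be illustrated with a minimal sketch. It assumes a hypothetical 5x5 binary image and a simple pixel-by-pixel comparison; both are illustrative choices, not a model taken from the textbook. Shifting the same shape by a single pixel is already enough to break the match with the stored template, so a separate template would be needed for every position, size and orientation.

```python
# Hypothetical stored template: a 5x5 binary image of a vertical bar
# (1 = ink, 0 = background).
TEMPLATE = [
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]


def shift_right(image, n=1):
    """Shift every row n pixels to the right, padding with background."""
    return [[0] * n + row[:-n] for row in image]


def match_score(image, template):
    """Fraction of pixels on which the image and the template agree."""
    total = sum(len(row) for row in template)
    agree = sum(
        1
        for img_row, tpl_row in zip(image, template)
        for a, b in zip(img_row, tpl_row)
        if a == b
    )
    return agree / total


def ink_overlap(image, template):
    """Number of pixels that are ink in both the image and the template."""
    return sum(
        1
        for img_row, tpl_row in zip(image, template)
        for a, b in zip(img_row, tpl_row)
        if a == 1 and b == 1
    )


shifted_image = shift_right(TEMPLATE)
score_same = match_score(TEMPLATE, TEMPLATE)
score_shifted = match_score(shifted_image, TEMPLATE)
overlap_shifted = ink_overlap(shifted_image, TEMPLATE)
```

In this sketch the unshifted image agrees with the template on every pixel, whereas the shifted copy shares no ink pixels with it at all, even though it is the same shape.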

What is a better theory compared to naive template theory in describing how we recognize objects?

Structural description---> using an object's constituent parts and the relationship between these parts to recognise it

i.e. the letter 'A' is being matched to its structural description

(p. 109 perceptual textbook)
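
A minimal sketch of the idea, assuming a hypothetical encoding in which an object is a set of named parts plus a set of relations between those parts. The part names and relation labels are made up for illustration. Because no position or size is stored, one description covers every 'A', while a letter with the same parts in different relations does not match.

```python
# Hypothetical structural description of the letter 'A': named parts plus
# the relations between them. No positions, sizes or pixel values are stored.
STORED_A = {
    "parts": {"left_stroke", "right_stroke", "crossbar"},
    "relations": {
        ("left_stroke", "meets_at_top", "right_stroke"),
        ("crossbar", "connects", "left_stroke"),
        ("crossbar", "connects", "right_stroke"),
    },
}


def matches(candidate, stored):
    """True if the candidate has the same parts in the same relations."""
    return (
        candidate["parts"] == stored["parts"]
        and candidate["relations"] == stored["relations"]
    )


# A large 'A' and a small 'A' yield the same description, because size is
# simply not part of the description.
big_a = {
    "parts": set(STORED_A["parts"]),
    "relations": set(STORED_A["relations"]),
}

# An 'H' has the same three parts but different relations between them.
letter_h = {
    "parts": {"left_stroke", "right_stroke", "crossbar"},
    "relations": {
        ("left_stroke", "parallel_to", "right_stroke"),
        ("crossbar", "connects", "left_stroke"),
        ("crossbar", "connects", "right_stroke"),
    },
}

a_is_recognised = matches(big_a, STORED_A)
h_is_mistaken_for_a = matches(letter_h, STORED_A)
```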

What is a key component of structural theory?

How object parts are represented in the descriptions:
  • Marr = generalised cylinders ---> these cylinders could be scaled to represent differently shaped parts

  • Biederman = geometric ions (geons)

These structural descriptions should be viewpoint-invariant
---> they should be equally recognizable from many different vantage points

(p. 109 & 110 perceptual textbook)

What is meant by viewpoint invariance?

A property of an object that does not change when the observer viewpoint changes

(p. 110 perceptual textbook)

What is a problem with structural description theories?

Not all objects are viewpoint invariant

(p. 110 perceptual textbook)

What is meant by object constancy?

Our ability to recognize an object despite the variability in sensory information

Regarding object constancy, what have mental rotation studies found?

The further something is rotated from normal, the longer it takes to recognise it
---> suggests the participants had to mentally rotate the objects back to the upright views they had stored in their memory

(Tarr & Pinker, 1990) - more info p. 110 perceptual textbook

What are the pathways for higher level visual processing?

1. Ventral stream - involved in identifying what an object is
2. Dorsal stream - involved in making appropriate movements to that object

As we move down to the temporal lobe (through the ventral stream) what happens to the receptive fields?

They get much bigger

(p. 88 perceptual textbook)

What do lesions to the dorsal stream result in?

Optic ataxia
---> a deficit in making visually guided movements towards objects

What do lesions to the ventral stream result in?

Agnosia
---> patients are unable to recognise objects

Who investigated the relationship between the temporal lobe and object recognition?

Kluver and Bucy (1938, 1939) - provided early evidence
---> large sections of the temporal lobe were lesioned in monkeys

Results  
The monkeys behaved as though they could see but did not know what they were seeing

(later work found that the inferotemporal cortex of the temporal lobe is particularly important in the visual problems of these monkeys)

(p. 88 perceptual textbook)

What did Gross et al investigate?

They recorded the activity of single cells in the inferotemporal cortex

Results
Cells in the inferotemporal cortex were discovered to have receptive fields that could spread over half or more of the monkey's field of view

Activating these cells:
- usual stimuli e.g. spots and lines didn't work well
- silhouette of a monkey hand worked well for some cells
- monkey faces excited other cells

---> are some cells specialised for certain objects?

(p. 88 perceptual textbook)

After the findings of Gross et al, what did Barlow (1972) propose?

A hierarchical model of visual perception

---> small receptive fields and simple features of visual cortex are combined with greater complexity as one moves from striate cortex to inferotemporal cortex, eventually culminating in a cell that fires when you see a specific object

(p. 88 perceptual textbook)

What did Quiroga et al (2005) discover?

Cells that respond to highly specific objects

(p. 106 perceptual textbook)

Are the receptive fields of neurons in the inferior temporal lobe large or small?

Very large
---> they can see a lot of the world = gives a more global view

What do neurons in the inferior temporal lobe do?

  • They respond to complex forms
e.g. a cell was found to respond vigorously to a model of an apple. It was found that this cell responds best to a circular disc with a thin bar protruding from it


  • Different shapes elicit strong responses from other inferior temporal lobe neurons
--> Although the stimuli that these neurons respond to do not correspond directly with any recognizable object, it is easy to see how combinations of neurons could represent most objects we see in a hierarchical fashion

What is something that we need to consider when asking how objects are represented?

How are we able to identify objects even when we see them at different distances, or at different locations in the visual field or when we view them from different angles

How does the visual system achieve object constancy?

  • Many neurons in the temporal lobe are responsive to one view of an object

  • However, there are some neurons that respond to many different views (view-invariant). Presumably these neurons receive inputs from neurons that are only selective for one view

  • Size invariant neurons respond to an object no matter what size it is on the retina. Location invariant neurons respond regardless of the location of an object in the visual field
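
One way to picture how a view-invariant neuron could be built from view-selective inputs is sketched below. Everything in it is an assumption made for illustration: the Gaussian tuning curve, the 30-degree tuning width, the eight preferred views and the use of a maximum to pool the inputs. It is not a model taken from the lecture or the textbook.

```python
import math

# Hypothetical preferred viewing angles (in degrees) of eight view-tuned units.
PREFERRED_VIEWS = [0, 45, 90, 135, 180, 225, 270, 315]


def view_tuned_response(view_deg, preferred_deg, width_deg=30.0):
    """Response of a unit that fires most for one view and less for others.

    The tuning curve is an assumed Gaussian over the smallest angular
    difference between the current view and the preferred view.
    """
    diff = abs((view_deg - preferred_deg + 180.0) % 360.0 - 180.0)
    return math.exp(-(diff ** 2) / (2 * width_deg ** 2))


def view_invariant_response(view_deg):
    """Pool over all view-tuned units by taking the strongest input."""
    return max(view_tuned_response(view_deg, p) for p in PREFERRED_VIEWS)


# A single view-tuned unit responds strongly to its own view only.
front_unit_at_front = view_tuned_response(0, 0)
front_unit_at_back = view_tuned_response(180, 0)

# The pooled unit responds strongly whatever the view.
weakest_pooled_response = min(view_invariant_response(v) for v in range(360))
```

With these assumed settings the single unit is almost silent for the opposite view, while the pooled unit stays well above 0.7 of its maximum response at every whole-degree view.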

What hemisphere are neurons that respond to faces found in?

Right

Are specific neurons responsible for each object we can recognize (e.g. do we have a neuron that responds specifically to our grandmother)? Or are specific stimuli represented by the pattern of firing of many neurons?

It is possibly mediated by memory in the adjacent hippocampus. You can train inferior temporal lobe neurons to respond to objects (that were previously novel), but this plasticity is view dependent

---> It would seem that cells fire in synchrony in response to all aspects of that person, their names and memories of that person

(see slides 2 & 3 lecture notes)

The inferotemporal cortex maintains close connections with brain parts involved in memory formation (notably the hippocampus). Why is this important?

The inferotemporal cells need to learn their receptive-field properties

(p. 88 perceptual textbook)

Who demonstrated that cells in the inferotemporal cortex have plasticity?

Logothetis et al (1995)

---> trained monkeys to recognise novel objects
---> they found that inferotemporal cortex neurons responded with high firing rates to those objects
---> however, this only happened when the objects were seen from viewpoints similar to those from which they had been learned

  • Human experiment on p. 89 - 90 perceptual textbook

Are different regions of the temporal lobe specialized?

YES

  • fMRI studies have revealed that pictures of faces selectively activate the fusiform gyrus region (FFA) of the temporal lobe.

---> This area shows a higher response to faces than to other objects.

  • Other regions of the temporal lobe show selectivity for inanimate objects (LOC), buildings (PPA) and human body parts (EBA).

  • However, the idea that regions of visual cortex are specialized has been challenged.

---> An alternative hypothesis is that all regions of the temporal lobe contribute to our perception of any object or face.

What may some prosopagnosia patients also suffer from?

Abnormalities in object perception (agnosia)

- There are some cases of prosopagnosia where intact object recognition remains and vice versa (double dissociation)
e.g. patient W.J., who had prosopagnosia, could still name his own sheep!

Do we perceive faces and objects in a different way?

---> In contrast to faces, our perception of objects is based on an analysis of parts

  • When subjects are presented with images of objects in a study phase, they are just as good at recognising the objects in the test phase whether the whole object or only part of the object is presented

  • With faces, by contrast, subjects are less good at recognising the component parts of the face when these are presented on their own

Are there other aspects of facial processing?

Yes

Some brain lesions can selectively affect emotional aspects of facial processing, but leave recognition intact.

Fregoli’s delusion = a condition in which sufferers assume that strangers are familiar
Capgras syndrome = the emotional recognition system is underactive rather than overactive. Capgras patients can recognize faces they know, but feel no emotional attachment. They also have difficulty decoding facial emotion

What is the argument regarding how neurons become specialized?

The fact that there are neurons specialized for the perception of different categories of objects raises the question of whether we are born with these neurons (nature) or whether they are generated by the experience of looking at and identifying particular objects (nurture)

What evidence favours the argument that neurons become specialized due to nature (as opposed to nurture)?

An evolutionary argument is apparent in the behaviour of babies
---> At only a few weeks of age they have a preference to view faces longer than jumbled faces or other objects

What can be said about humans and recognizing faces from their own/different races?

Humans are better at recognizing faces from their own race than faces from other races

  • Recent brain imaging experiments have found that same-race faces elicit more activity in brain regions linked to face recognition such as the fusiform gyrus region.

  • However, this effect was stronger in European Americans compared to African Americans. This suggests that our perception of faces has a learned component

What experiment investigated whether the fusiform gyrus region is specialized for faces?

  • A recent fMRI study looked at the effect of recognising a certain type of non-face stimulus known as Greebles.

  • Before training, the response in the fusiform gyrus to different types of Greeble is much smaller than the response to faces. However, after a period of training in which the subjects have to name and identify different Greebles, the response is similar to that elicited by faces.

  • Other studies have shown that the fusiform gyrus also responds to cars and birds in people who are experts in recognizing these categories of object

---> there is a learning component

What are the 5 stages of Marr's model of vision?

1. Grey Level Representation (low level vision)
2. Raw Primal Sketch (low level vision)
3. Primal Sketch (middle vision)
4. 2 1/2-D Sketch (middle vision)
5. 3-D Model (object recognition)

Regarding Marr's model of vision, describe the Grey Level Representation (1st stage).

Goal: To make intensity of individual points/pixels explicit

Procedure:
- Measure light intensity in each of a large number of small regions of the image called pixels
- This results in a 2-D array of light intensity values

Mechanism: Each pixel and its intensity value corresponds to a photoreceptor and its receptor potential
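
The procedure above can be sketched as follows, assuming a hypothetical scene that is dark on the left and bright on the right and an arbitrary intensity scale. The function names and values are illustrative only.

```python
def scene_luminance(x, y):
    """Hypothetical scene: dark on the left half, bright on the right half.

    x and y are positions in the image, each running from 0 to 1.
    """
    return 200 if x >= 0.5 else 10


def grey_level_representation(luminance, rows, cols):
    """Measure light intensity at the centre of each pixel of a grid.

    The result is a 2-D array (list of rows) of intensity values.
    """
    return [
        [luminance((c + 0.5) / cols, (r + 0.5) / rows) for c in range(cols)]
        for r in range(rows)
    ]


image = grey_level_representation(scene_luminance, rows=3, cols=4)
```

Each of the three rows of `image` is `[10, 10, 200, 200]`: the intensity of every pixel is explicit, but nothing yet says that there is an edge between the second and third columns.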

Regarding Marr's model of vision, describe the Raw Primal Sketch (2nd stage).

Goal: To make locations of intensity changes explicit (i.e. putting pixels together to detect some form of image)

Procedure: Look for patterns in the light intensity changes to denote segments of lines, line ends, edges, circles and ellipses

Mechanism: Compatible with the function of the variety of orientation specific cells of V1
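
A minimal sketch of making intensity changes explicit. It assumes a small hypothetical grey-level array and swaps in a plain difference between horizontally neighbouring pixels with an arbitrary threshold. This is a simplification for illustration and is not Marr's own edge operator.

```python
def horizontal_intensity_changes(image, threshold=50):
    """Return (row, col) positions where the intensity changes sharply
    between pixel (row, col) and its right-hand neighbour (row, col + 1)."""
    changes = []
    for r, row in enumerate(image):
        for c in range(len(row) - 1):
            if abs(row[c + 1] - row[c]) > threshold:
                changes.append((r, c))
    return changes


# Hypothetical grey-level representation: dark left, bright right,
# with a little noise in the middle row.
image = [
    [10, 10, 200, 200],
    [10, 12, 198, 200],
    [10, 10, 200, 200],
]

edge_segments = horizontal_intensity_changes(image)
```

Here `edge_segments` is `[(0, 1), (1, 1), (2, 1)]`: the small noisy differences are ignored and the location of the dark-to-bright boundary is now explicit in every row.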

Regarding Marr's model of vision, describe the Primal Sketch (3rd stage).

Goal: To make contours and boundaries explicit

Procedure: Further grouping of edge segments, bars, terminations and blobs from the raw primal sketch using Gestalt laws of organisation (similarity, common fate, good continuation, closure, relative size, surroundedness, orientation, symmetry and proximity)

Mechanism: Complex cells in V2 may be organised to link line ends
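
As an illustration of the grouping step, the sketch below applies just one Gestalt law, proximity, to a hypothetical list of edge-segment positions. The distance limit of 1.5 pixels and the grouping rule (a point joins a group if it is close to any member) are assumptions chosen for the example.

```python
import math


def group_by_proximity(points, max_gap=1.5):
    """Group points so that every point lies within max_gap of at least one
    other point in its group (groups of one are allowed)."""
    groups = []
    for p in points:
        near, far = [], []
        for group in groups:
            is_close = any(math.dist(p, q) <= max_gap for q in group)
            (near if is_close else far).append(group)
        # Merge the new point with every group it is close to.
        merged = [p] + [q for group in near for q in group]
        groups = far + [merged]
    return sorted(sorted(group) for group in groups)


# Hypothetical edge segments from a raw primal sketch, as (row, col):
# three form a vertical run in column 1, two form a run in column 5.
edge_segments = [(0, 1), (1, 1), (2, 1), (0, 5), (1, 5)]

contours = group_by_proximity(edge_segments)
```

The five separate edge segments end up as two groups, `[(0, 1), (1, 1), (2, 1)]` and `[(0, 5), (1, 5)]`, which is the sense in which contours become explicit at this stage.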

Regarding Marr's model of vision, describe the 2 1/2-D Sketch (4th stage).

Goal: To make local depth and surface orientation explicit

Procedure:
- Integrates multiple depth cues (not unlike constructivist)
- Stereopsis, motion analysis, contour, texture, shading information is used

Mechanism: Top down, optic array processing


The 2½-D Sketch is from the viewer's perspective, so the 3-D image is implicit at this stage. Marr argued that, in order to recognise objects, they must be transformed into object-centred representations.
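
A minimal sketch of what "local depth and surface orientation" could look like in viewer-centred terms. It assumes a hypothetical depth map (distance from the viewer at each pixel, in arbitrary units with pixel spacing of 1) and estimates the slant of the surface from how quickly depth changes across the image. The depth values and function names are illustrative assumptions.

```python
import math


def depth_gradient(depth, row, col):
    """Central-difference estimate, at an interior pixel, of how depth
    changes across the image (left to right) and down the image."""
    across = (depth[row][col + 1] - depth[row][col - 1]) / 2.0
    down = (depth[row + 1][col] - depth[row - 1][col]) / 2.0
    return across, down


def slant_degrees(depth, row, col):
    """Angle between the surface and the image plane:
    0 for a surface facing the viewer, larger as it tilts away."""
    across, down = depth_gradient(depth, row, col)
    return math.degrees(math.atan(math.hypot(across, down)))


# Hypothetical depth map: a flat surface receding to the right.
receding = [
    [1.0, 2.0, 3.0],
    [1.0, 2.0, 3.0],
    [1.0, 2.0, 3.0],
]

# Hypothetical depth map: a flat surface facing the viewer head-on.
frontal = [
    [2.0, 2.0, 2.0],
    [2.0, 2.0, 2.0],
    [2.0, 2.0, 2.0],
]

gradient_receding = depth_gradient(receding, 1, 1)
slant_receding = slant_degrees(receding, 1, 1)
slant_frontal = slant_degrees(frontal, 1, 1)
```

Both values are tied to the viewer: if the viewer moved, the same surface would produce a different depth map and a different slant, which is why a further object-centred stage is needed.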
