Cleaning up data

5 important questions on Cleaning up data

What is something to watch out for when considering removal of apparent outliers?

  • Extreme measurements can be outliers because they were caused by factors that are not part of the experiment or of the cognitive process you want to measure
    • e.g. someone's phone goes off, and the distraction causes a slow reaction time on that trial
  • but extreme measurements can also occur because they are part of the cognitive function you want to measure
    • e.g. someone's reaction time is structurally slower because of their age, or because of an underlying attentional deficit such as ADHD
  • in the first case you clearly want to remove the measurement, but do you also want to remove it in the second case?

What are ways to deal with outliers?

  • Do nothing: accept them as part of the measurement
  • use a hard, pre-defined cut-off
    • e.g. remove anything that deviates more than x standard deviations from the mean
  • a good way to do this is per participant
    • take each participant's mean and remove any trial that deviates more than x standard deviations from that participant's mean
    • this way you only discard measurements that are outlying within one participant, instead of discarding extreme measurements in general, since someone might structurally score very high
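The per-participant cut-off can be sketched in a few lines of standard-library Python. This is a minimal illustration, not a prescribed procedure: it assumes the data are (participant_id, reaction_time) tuples, uses the sample standard deviation, and picks 2.5 as a placeholder for "x". The function name and the example data are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev


def remove_outliers_per_participant(trials, cutoff=2.5):
    """Drop trials lying more than `cutoff` SDs from that participant's own mean.

    trials: list of (participant_id, reaction_time) tuples.
    cutoff: the "x" from the notes; 2.5 is only an illustrative default.
    """
    # Group reaction times by participant.
    by_participant = defaultdict(list)
    for pid, rt in trials:
        by_participant[pid].append(rt)

    # Mean and sample SD are computed separately for every participant.
    limits = {}
    for pid, rts in by_participant.items():
        sd = stdev(rts) if len(rts) > 1 else 0.0
        limits[pid] = (mean(rts), sd)

    kept = []
    for pid, rt in trials:
        m, sd = limits[pid]
        # With no spread there is nothing to flag, so the trial is kept.
        if sd == 0 or abs(rt - m) <= cutoff * sd:
            kept.append((pid, rt))
    return kept


# Hypothetical data: participant A has one distracted trial (2000 ms);
# participant B is structurally slow but consistent.
a_rts = [500, 510, 490, 505, 495, 500, 510, 490, 505, 495, 500, 2000]
b_rts = [900, 910, 890, 905, 895, 900, 910, 890, 905, 895, 900, 910]
data = [("A", rt) for rt in a_rts] + [("B", rt) for rt in b_rts]

kept = remove_outliers_per_participant(data)
print(len(data), "->", len(kept))  # prints "24 -> 23": only A's 2000 ms trial is dropped
```

Participant B's trials all survive even though they are far slower than A's, because each trial is only compared with B's own mean and SD.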

How do you deal with subjects whose responses are generally outside the normal range?

  • In general, you do nothing
  • only exclude them when they:
    • have experienced technical difficulties
    • have not finished the experiment
    • did not follow the instructions

What is one situation where you can consider removing subjects who cause extreme outliers?

  • When calculating correlations
  • a single outlying subject can heavily influence a correlation
  • when this is the case, don't simply remove the subject; instead, calculate the correlation both with and without that subject in the sample

On which form of the data are statistical tests performed?

  • Tests are performed on the subject means, not on each individual trial
  • if you see extremely high degrees of freedom, you have probably used the individual trials instead of the subject averages
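The aggregation step, and why it keeps the degrees of freedom small, can be sketched as follows. This assumes trial data as (subject, condition, rt) tuples, a two-condition within-subject design, and a paired t-test on the subject means; the paired t-test is an illustrative choice, since the notes do not name a specific test. Function names, condition labels, and data are hypothetical.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, stdev


def subject_means(trials):
    """Average trial-level RTs into one mean per (subject, condition) cell.

    trials: list of (subject_id, condition, rt) tuples.
    """
    cells = defaultdict(list)
    for subject, condition, rt in trials:
        cells[(subject, condition)].append(rt)
    return {key: mean(rts) for key, rts in cells.items()}


def paired_t_on_subject_means(trials, cond_a, cond_b):
    """Paired t-test on per-subject means; returns (t, df).

    Assumes every subject has trials in both conditions.
    """
    means = subject_means(trials)
    subjects = sorted({s for s, _ in means})
    diffs = [means[(s, cond_a)] - means[(s, cond_b)] for s in subjects]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1  # df = number of subjects - 1, not number of trials - 1


# Hypothetical data: 4 subjects x 2 conditions x 3 trials = 24 trials.
data = []
for s, effect in zip(["s1", "s2", "s3", "s4"], [10, 20, 30, 40]):
    for rt in (480, 500, 520):
        data.append((s, "congruent", rt))
        data.append((s, "incongruent", rt + effect))

t, df = paired_t_on_subject_means(data, "incongruent", "congruent")
print(f"t({df}) = {t:.2f}")
```

Although there are 24 trials, the test has only 3 degrees of freedom because it is based on 4 subject means; running it on the raw trials would inflate the degrees of freedom, which is the warning sign mentioned above.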
