A: Data properties, preprocessing, p-value

16 important questions on A: Data properties, preprocessing, p-value

After normalization, we are left with random/measurement error in the data. It is important to know about the measurement error. How to proceed?

Statistical testing.

What is a random variable?

  • A variable X whose value is determined by random processes
    • E.g. counts
    • Can be discrete or continuous
    • Can take real values, positive or negative

What is the event space?

Omega: All possible events for a random variable.

What is a distribution function?

A function that yields the probability of an event or a set of events from the event space.
  • binomial
  • poisson
  • normal
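
A minimal sketch of two of the discrete distribution functions listed above, written out with the standard library only. The function names and the example parameters (lam = 2, n = 10, p = 0.3) are made up for illustration.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson variable with mean lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def binomial_pmf(k, n, p):
    """P(X = k) successes in n independent trials with success probability p."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

# Probability of a single event: exactly 3 counts when the mean is 2
p_three = poisson_pmf(3, lam=2.0)

# Probability of a set of events: at most 3 counts (events 0, 1, 2, 3)
p_at_most_three = sum(poisson_pmf(k, lam=2.0) for k in range(4))

# Summed over the whole event space, the probabilities add up to 1
total = sum(binomial_pmf(k, n=10, p=0.3) for k in range(11))

print(p_three, p_at_most_three, total)
```

The normal distribution is continuous, so its function gives a density; probabilities then come from integrating over an interval rather than summing.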

What is a null hypothesis?

Simplest assumption about the distribution function(s) of one or more variables.

When do we call variation in expression significant?

Subjective criteria:
  • More than 2-fold difference
  • Top 20 of differences in expression


Objective criteria:
  • 2-sample t-test
  • With H0: mu1 = mu2 (so no difference)
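
As a rough illustration of the objective criterion, the sketch below computes the two-sample t statistic for one gene. It assumes equal variances in both groups (Student's version), and the expression values are invented. The p-value then follows from the t distribution with n1 + n2 - 2 degrees of freedom, for example via `scipy.stats.ttest_ind` if SciPy is available.

```python
import math
import statistics

def two_sample_t(a, b):
    """Student's two-sample t statistic (equal variances assumed) and its degrees of freedom."""
    n1, n2 = len(a), len(b)
    dof = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances
    pooled_var = ((n1 - 1) * statistics.variance(a)
                  + (n2 - 1) * statistics.variance(b)) / dof
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (statistics.mean(a) - statistics.mean(b)) / std_err
    return t, dof

# Invented log-expression values for one gene under two conditions
cond_a = [5.1, 4.9, 5.3, 5.0]
cond_b = [6.0, 6.2, 5.8, 6.1]

t_stat, dof = two_sample_t(cond_a, cond_b)
print(t_stat, dof)
```

A large |t| relative to the t distribution under H0 (mu1 = mu2) gives a small p-value.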

What is proportional error?

Most common instrumental error: standard deviation depends on the mean

--> The data are heteroscedastic, so don't apply a t-test to the raw values.
But you can transform the values: take the log of the concentration.

What is variance stabilisation?

  • Transform the data in such a way that the variance becomes independent of the mean
    • log transform when error is proportional to mean
    • square root transformation for Poisson-distributed (count) data
  • Transformed data are homoscedastic
  • But may not have normal-distributed error! Should be investigated
  • If this assumption is also met, you can apply t-test, ANOVA etc
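
The two transformations above can be checked on simulated data. This is only a sketch: the noise level (about 10%), the means, the sample size and the helper `poisson_sample` are all assumptions chosen for illustration, not values from the course.

```python
import math
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

def poisson_sample(lam):
    """One Poisson draw via Knuth's multiplication method (fine for moderate lam)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

n = 4000

# Proportional error: multiplicative noise of about 10% around each mean
sd_raw, sd_log = {}, {}
for mean in (10.0, 1000.0):
    values = [mean * math.exp(rng.gauss(0.0, 0.1)) for _ in range(n)]
    sd_raw[mean] = statistics.stdev(values)
    sd_log[mean] = statistics.stdev([math.log(v) for v in values])

# Count data: Poisson draws with two different means
var_raw, var_sqrt = {}, {}
for lam in (20.0, 80.0):
    counts = [poisson_sample(lam) for _ in range(n)]
    var_raw[lam] = statistics.variance(counts)
    var_sqrt[lam] = statistics.variance([math.sqrt(c) for c in counts])

print("sd raw:", sd_raw, "sd after log:", sd_log)
print("var raw:", var_raw, "var after sqrt:", var_sqrt)
```

On the raw scale the spread grows with the mean; after the log (proportional error) or square root (counts) it is roughly the same for both means. For Poisson counts the variance after the square root is close to 1/4.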

What is lambda in the Poisson distribution?

The parameter of the Poisson distribution; it is equal to the mean!

What is the simplest assumption you can make when looking at a number of counts?

They originate from a Poisson distribution with lambda = mean

What is an important characteristic of the Poisson distribution?

Variance equals the mean! (and thus lambda)

We can use this to find out whether a variable is Poisson distributed.
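
A quick way to use this property is the variance-to-mean ratio (index of dispersion), which should be close to 1 for Poisson counts. The two count lists below are invented for illustration.

```python
import statistics

def dispersion_index(counts):
    """Sample variance divided by sample mean; about 1 for Poisson data."""
    return statistics.variance(counts) / statistics.mean(counts)

# Invented replicate counts for two genes, both with mean 4
poisson_like = [2, 7, 4, 1, 6, 4, 3, 5]    # variance 4, ratio 1
overdispersed = [0, 12, 1, 9, 0, 2]        # variance 26.8, ratio 6.7

print(dispersion_index(poisson_like))
print(dispersion_index(overdispersed))
```

With only a handful of replicates the ratio is noisy, so a value somewhat off 1 is not yet proof against Poisson.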

How is the Poisson distribution applied to RNA seq?

  • We can't use it directly: we no longer have counts but fractions
  • But the mean-variance relation is still conserved, so use that!
  • See slide

How would you proceed (RNAseq) with Poisson testing?

Asking whether a gene has equal expression under different conditions amounts to asking whether the rate parameter is equal under those conditions (= H0), i.e. whether the ratio f1/f2 = 1.

Use a Poisson test --> p-value
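
One common form of a two-sample Poisson test is the exact conditional test sketched below; whether this is the variant meant in the course slides is an assumption. Conditional on the total k1 + k2, the count k1 is binomial under H0. The `depth1`/`depth2` arguments (relative sequencing depth per condition) and the example counts are hypothetical.

```python
import math

def poisson_two_sample_p(k1, k2, depth1=1.0, depth2=1.0):
    """Two-sided exact p-value for H0: equal rate in both conditions.

    Conditional on n = k1 + k2, k1 is Binomial(n, p0) under H0,
    with p0 = depth1 / (depth1 + depth2).
    """
    n = k1 + k2
    p0 = depth1 / (depth1 + depth2)
    pmf = [math.comb(n, k) * p0**k * (1 - p0)**(n - k) for k in range(n + 1)]
    observed = pmf[k1]
    # Add up all outcomes that are at most as likely as the observed one
    tail = sum(p for p in pmf if p <= observed * (1 + 1e-9))
    return min(1.0, tail)

print(poisson_two_sample_p(10, 10))   # equal counts: no evidence against H0
print(poisson_two_sample_p(30, 5))    # very different counts: small p-value
```

A small p-value means the observed split of reads between the two conditions is unlikely if the rate ratio is 1.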

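In an RNA-seq experiment, the reads per gene are not Poisson distributed. What do we observe?

Slope higher than 1/2 (the value expected for Poisson data when log standard deviation is plotted against log mean) = overdispersion: the variance is larger than the mean.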

Check slide 39

How do we solve the problem of overdispersion?

  • Sophisticated: Model the random error with a Negative Binomial (NB) distribution and use a suitable test statistic
    • NB distribution = for discrete variables, and has two parameters instead of one
    • In NB, the variance is not fixed by the mean: the second (dispersion) parameter lets the variance exceed mu
    • Poisson distribution is a special case of NB
  • Simple: Perform a variance stabilisation, approximate the random error by a Normal distribution and use a t-test
    • pretends that we have a continuous variable
    • Can be performed in a spreadsheet
    • Uses common test statistic

Multiplex technologies lead to multiple hypotheses. Explain.

E.g. in a transcriptome experiment, you test thousands of H0's.
One for each gene!
Each H0 reads: expression of gene X is equal under conditions A and B.
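
Why this matters can be shown with a small simulation. It assumes a per-test threshold of 0.05 and 20000 genes (both hypothetical), and that H0 is true for every gene, in which case each p-value is uniform between 0 and 1.

```python
import random

rng = random.Random(1)   # fixed seed so the run is repeatable
n_genes = 20000          # hypothetical number of genes, one H0 each
alpha = 0.05             # assumed per-test significance threshold

# Under H0 for every gene, p-values are uniform on [0, 1]
p_values = [rng.random() for _ in range(n_genes)]

false_positives = sum(p < alpha for p in p_values)
expected = n_genes * alpha

print(false_positives, expected)
```

Even with no real differences at all, about 1000 genes come out as "significant", purely because so many hypotheses are tested.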
