A: Data properties, preprocessing, p-value
After normalization, we are left with random/measurement error in the data. It is important to know about the measurement error. How to proceed?
What is a random variable?
- A variable X whose value is determined by random processes
- E.g. counts
- Can be discrete or continuous
- Can take real values, positive or negative
What is the event space?
What is a distribution function?
- binomial
- Poisson
- normal
What is a null hypothesis?
When do we call variation in expression significant?
- More than 2-fold difference
- Top 20 of differences in expression
Objective criteria:
- 2-sample t-test
- With H0: mu1 = mu2 (so no difference)
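As a minimal sketch of the 2-sample t-test above, the following computes the pooled-variance t statistic and its degrees of freedom for one gene. The function name `two_sample_t` and the expression values are made up for illustration, and equal variances in both groups are assumed. The p-value would then be looked up from the t distribution with `df` degrees of freedom.

```python
import math
import statistics


def two_sample_t(a, b):
    """Return the pooled-variance two-sample t statistic and degrees of freedom.

    Tests H0: mu1 = mu2, assuming both groups share the same variance.
    """
    n1, n2 = len(a), len(b)
    mean1, mean2 = statistics.mean(a), statistics.mean(b)
    var1, var2 = statistics.variance(a), statistics.variance(b)  # sample variances
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df


# Hypothetical log2 expression values for one gene under conditions A and B.
cond_a = [7.1, 6.9, 7.3, 7.0]
cond_b = [8.0, 8.2, 7.9, 8.1]
t_stat, df = two_sample_t(cond_a, cond_b)
print(f"t = {t_stat:.2f}, df = {df}")
```

A large absolute value of t relative to the t distribution with `df` degrees of freedom argues against H0.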
What is proportional error?
- The error is proportional to the mean value
- So the data are heteroscedastic: don't apply a t-test to the raw values
- But you can transform the values: take the log of the concentration
What is variance stabilisation?
- Transform the data in such a way that the variance becomes independent of the mean
- log transform when error is proportional to mean
- square root transformation for Poisson-distributed (count) data
- Transformed data are homoscedastic
- But may not have normal-distributed error! Should be investigated
- If this assumption is also met, you can apply t-test, ANOVA etc
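The effect of both transformations can be sketched numerically. The example below is only an illustration under stated assumptions: the proportional-error data are simulated with a 10% relative error (an arbitrary choice), and the Poisson case is computed from the probability mass function instead of from real counts. All function names are hypothetical.

```python
import math
import random
import statistics


def sd_raw_and_log(mean, rel_error=0.1, n=2000, seed=1):
    """Simulate values with error proportional to the mean.

    Returns the standard deviation before and after a log transform.
    """
    rng = random.Random(seed)
    values = [mean * (1 + rng.gauss(0, rel_error)) for _ in range(n)]
    logs = [math.log(v) for v in values]
    return statistics.stdev(values), statistics.stdev(logs)


def poisson_pmf(k, lam):
    """Poisson probability P(X = k), computed in log space for stability."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def poisson_sd_raw_and_sqrt(lam, k_max=1000):
    """Standard deviation of X and of sqrt(X) for X ~ Poisson(lam)."""
    probs = [poisson_pmf(k, lam) for k in range(k_max + 1)]
    mean_x = sum(k * p for k, p in enumerate(probs))
    mean_x2 = sum(k * k * p for k, p in enumerate(probs))
    mean_sqrt = sum(math.sqrt(k) * p for k, p in enumerate(probs))
    sd_raw = math.sqrt(mean_x2 - mean_x ** 2)
    sd_sqrt = math.sqrt(mean_x - mean_sqrt ** 2)  # E[sqrt(X)^2] = E[X]
    return sd_raw, sd_sqrt


for mean in (10, 1000):
    raw, logged = sd_raw_and_log(mean)
    print(f"proportional error, mean {mean}: sd raw {raw:.3f}, sd log {logged:.3f}")

for lam in (20, 100):
    raw, rooted = poisson_sd_raw_and_sqrt(lam)
    print(f"Poisson, lambda {lam}: sd raw {raw:.3f}, sd sqrt {rooted:.3f}")
```

In both cases the raw standard deviation grows with the mean, while the transformed standard deviation stays roughly constant. Whether the transformed error is also normally distributed still has to be checked separately.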
What is lambda in the Poisson distribution?
What is the simplest assumption you can make when looking at a number of counts?
What is an important characteristic of the Poisson distribution?
The variance equals the mean (both equal lambda). We can use this to find out whether a variable is Poisson distributed.
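For a Poisson distribution the variance equals the mean, so the ratio variance/mean should be close to 1. A minimal sketch of that check follows; the helper names and the counts are made up for illustration, and a ratio near 1 is only an indication, not a formal test.

```python
import math
import statistics


def poisson_pmf(k, lam):
    """Poisson probability P(X = k), computed in log space for stability."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def poisson_mean_var(lam, k_max=500):
    """Mean and variance of Poisson(lam), summed numerically from the pmf."""
    probs = [poisson_pmf(k, lam) for k in range(k_max + 1)]
    mean = sum(k * p for k, p in enumerate(probs))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
    return mean, var


def dispersion_index(counts):
    """Sample variance divided by sample mean; about 1 for Poisson data."""
    return statistics.variance(counts) / statistics.mean(counts)


mean, var = poisson_mean_var(4.0)
print(f"theoretical mean {mean:.4f}, variance {var:.4f}")

# Hypothetical replicate counts for one gene.
counts = [3, 5, 4, 6, 2, 4, 5, 3]
print(f"variance/mean of the sample: {dispersion_index(counts):.2f}")
```

A ratio well above 1 points to more variation than a Poisson process would produce.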
How is the Poisson distribution applied to RNA-seq?
- We can't use it directly: we no longer have counts but fractions
- But the mean-variance relation is still conserved, so use that!
- See slide
How would you proceed (RNA-seq) with Poisson testing?
Use a Poisson test to obtain a p-value.
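One simple form of such a test can be sketched as follows: given an expected count under H0, the p-value is the probability of seeing the observed count or more. This one-sided setup, the function names and the numbers are assumptions for illustration; the test used in the course material may be set up differently.

```python
import math


def poisson_pmf(k, lam):
    """Poisson probability P(X = k), computed in log space for stability."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def poisson_upper_p(observed, expected):
    """One-sided p-value P(X >= observed) for X ~ Poisson(expected)."""
    return 1.0 - sum(poisson_pmf(k, expected) for k in range(observed))


# Hypothetical gene: 10 reads expected under H0, 20 reads observed.
p_value = poisson_upper_p(20, 10.0)
print(f"p = {p_value:.4f}")
```

A small p-value means the observed count is unlikely under H0 if the Poisson assumption holds.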
In an RNA-seq experiment, the reads per gene are not Poisson distributed. What do we observe?
Check slide 39
How do we solve the problem of overdispersion?
- Sophisticated: model the random error with a Negative Binomial (NB) distribution and use a suitable test statistic
  - The NB distribution is for discrete variables and has two parameters instead of one
  - In NB, sigma is not fixed by mu: the second parameter lets the variance exceed the mean
  - The Poisson distribution is a special case of NB
- Simple: perform a variance stabilisation, approximate the random error by a Normal distribution and use a t-test
  - Pretends that we have a continuous variable
  - Can be performed in a spreadsheet
  - Uses a common test statistic
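To illustrate the extra parameter of the NB distribution, the sketch below uses the common mean/dispersion parametrisation, in which the variance is mu + alpha * mu^2. The symbol `alpha` and the function names are assumptions, not taken from the course material. As alpha approaches 0 the variance approaches mu, which is the Poisson special case.

```python
import math


def nb_pmf(k, mu, alpha):
    """Negative Binomial probability P(X = k) with mean mu and dispersion alpha."""
    r = 1.0 / alpha          # "size" parameter
    p = r / (r + mu)         # success probability
    log_pmf = (math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
               + r * math.log(p) + k * math.log(1 - p))
    return math.exp(log_pmf)


def nb_mean_var(mu, alpha, k_max=5000):
    """Mean and variance of the NB distribution, summed numerically from the pmf."""
    probs = [nb_pmf(k, mu, alpha) for k in range(k_max + 1)]
    mean = sum(k * p for k, p in enumerate(probs))
    var = sum(k * k * p for k, p in enumerate(probs)) - mean ** 2
    return mean, var


for alpha in (0.5, 0.1, 1e-6):
    mean, var = nb_mean_var(10.0, alpha)
    print(f"alpha {alpha}: mean {mean:.3f}, variance {var:.3f}")
```

With the same mean, a larger alpha gives a larger variance, which is how the NB model accommodates overdispersed read counts.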
Multiplex technologies lead to multiple hypotheses. Explain.
One null hypothesis for each gene!
E.g. H0: expression of gene X is equal under conditions A and B.