Lecture bioinformatic databases - Linear regression

14 important questions on Lecture bioinformatic databases - Linear regression

What is the basic equation of a straight line?

Y(x) = ß0 + ß1*x

--> ß0 = intercept and ß1 = slope (compare y = b + ax)

If ß0 and ß1 fit perfectly, does this say something about the population with 100% reliability?

No, because ß0 and ß1 are still estimated from the data, and the data come from a sample. The estimates therefore contain sampling error.
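A minimal sketch of why a perfect fit to one sample is not a statement about the population. It assumes a hypothetical population y = 100 + 0.5*x + e and fits ß0 and ß1 by ordinary least squares to several random samples. The helper names `fit_line` and `draw_sample` and all numbers are made up for illustration.

```python
import random

def fit_line(xs, ys):
    """Ordinary least-squares estimates (b0, b1) for y = b0 + b1*x."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

def draw_sample(n, rng):
    """Hypothetical population: y = 100 + 0.5*x + e, with e ~ N(0, 10^2)."""
    xs = [rng.uniform(20, 80) for _ in range(n)]
    ys = [100 + 0.5 * x + rng.gauss(0, 10) for x in xs]
    return xs, ys

rng = random.Random(1)
for i in range(3):
    b0, b1 = fit_line(*draw_sample(30, rng))
    # Each sample gives different estimates of the same population values
    print(f"sample {i + 1}: b0 = {b0:.2f}, b1 = {b1:.3f}")
```

The three printed estimates differ from each other and from the true values 100 and 0.5, even though each line is the best fit to its own sample.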

Is a linear regression model always a good idea?

No, because not all relations are linear

Why is an error term (e) often included in the equation of linear regression?

Because measurements vary even at the same value of x (e.g. blood pressure (BP) differs between people who are all 65 years old). This variability is caused by individual differences and measurement error. These factors are combined in an error term: y_i = ß0 + ß1*x_i + e_i (for observation i)
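A small simulation can make the role of e concrete. This sketch assumes hypothetical values ß0 = 100, ß1 = 0.6 and an error SD of 12 mmHg (none of these come from the notes) and draws BP values for five people who are all 65.

```python
import random

# Hypothetical coefficients, chosen for illustration only
BETA0 = 100.0   # intercept (mmHg)
BETA1 = 0.6     # slope (mmHg per year of age)
SIGMA = 12.0    # SD of the error term e (mmHg)

def simulate_bp(age, rng):
    """One observed blood pressure: systematic part ß0 + ß1*age plus random error e."""
    e = rng.gauss(0, SIGMA)  # individual differences + measurement error
    return BETA0 + BETA1 * age + e

rng = random.Random(42)
expected = BETA0 + BETA1 * 65  # the model value y(65), identical for everyone aged 65
observed = [round(simulate_bp(65, rng), 1) for _ in range(5)]
print("model value y(65):", expected)
print("five people aged 65:", observed)
```

All five people share the same model value, yet their observed BP values differ; that spread is exactly what e represents.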

What is the purpose of residuals?

They indicate how much variation in the outcome is still left after fitting the regression model, i.e. which part of the outcome is not associated with the predictor. A residual is the observed y minus the value predicted by the model.
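A minimal sketch of computing residuals as observed minus fitted values and summarising how much variation they leave, using made-up data. The ratio of residual to total sum of squares used here is one common way to express "variation left over" (it equals 1 - R^2).

```python
def fit_line(xs, ys):
    """Ordinary least-squares estimates (b0, b1) for y = b0 + b1*x."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]  # made-up data
b0, b1 = fit_line(x, y)
fitted = [b0 + b1 * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]  # observed minus fitted

y_bar = sum(y) / len(y)
ss_total = sum((yi - y_bar) ** 2 for yi in y)  # variation before fitting
ss_resid = sum(r ** 2 for r in residuals)      # variation left after fitting
print("residuals:", [round(r, 2) for r in residuals])
print("share of variation left after fitting:", round(ss_resid / ss_total, 3))
```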

What is the null hypothesis (H0) for linear regression?

No slope --> H0: ß1 = 0
Reject the null hypothesis if P < 0.05 (significance level alpha = 5%)
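A minimal sketch of testing H0: ß1 = 0, assuming a large sample so that the test statistic b1 / SE(b1) can be compared with the standard normal distribution (the same approximation behind the 1.96 used for confidence intervals). Statistical software normally uses a t distribution with n - 2 degrees of freedom instead; the datasets and the name `slope_test` are hypothetical.

```python
import math
import random
from statistics import NormalDist

def slope_test(xs, ys):
    """Estimate ß1, its standard error, and a two-sided P for H0: ß1 = 0.

    Normal approximation (large n); an exact test would use t with n - 2 df.
    """
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))  # residual sum of squares
    se_b1 = math.sqrt(sse / (n - 2) / sxx)
    z = b1 / se_b1
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return b1, se_b1, p

rng = random.Random(7)
xs = [rng.uniform(0, 10) for _ in range(200)]
datasets = {
    "true slope 0.8": [1 + 0.8 * x + rng.gauss(0, 2) for x in xs],
    "true slope 0": [1 + rng.gauss(0, 2) for _ in xs],
}
results = {}
for label, ys in datasets.items():
    b1, se, p = slope_test(xs, ys)
    results[label] = p
    decision = "reject H0" if p < 0.05 else "do not reject H0"
    print(f"{label}: b1 = {b1:.3f}, SE = {se:.3f}, P = {p:.4f} -> {decision}")
```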

When do you use a linear regression model and when a t-test?

t-test: to compare the mean values of a continuous variable between 2 groups (no covariates)
Linear regression model: to assess the effect of a continuous predictor on a continuous outcome (covariates can be included)
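One way to see how the two methods relate: if the only predictor is a 0/1 group indicator and there are no covariates, the regression slope equals the difference between the two group means, and its t statistic equals the pooled (equal-variance) two-sample t statistic. This sketch checks that numerically on made-up data; the function names are hypothetical.

```python
import math
from statistics import mean

def pooled_t(g0, g1):
    """Two-sample t statistic assuming equal variances in both groups."""
    n0, n1 = len(g0), len(g1)
    m0, m1 = mean(g0), mean(g1)
    ss_within = sum((v - m0) ** 2 for v in g0) + sum((v - m1) ** 2 for v in g1)
    sp = math.sqrt(ss_within / (n0 + n1 - 2))  # pooled SD
    return (m1 - m0) / (sp * math.sqrt(1 / n0 + 1 / n1))

def regression_t(xs, ys):
    """Least-squares slope b1 and its t statistic b1 / SE(b1)."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))
    return b1, b1 / math.sqrt(sse / (n - 2) / sxx)

group0 = [5.1, 4.8, 5.6, 5.0, 4.9]  # made-up outcomes, group coded 0
group1 = [5.9, 6.3, 5.7, 6.1, 6.4]  # made-up outcomes, group coded 1
xs = [0] * len(group0) + [1] * len(group1)
ys = group0 + group1
slope, t_reg = regression_t(xs, ys)
print("difference in means:", mean(group1) - mean(group0))
print("regression slope:  ", slope)
print("pooled t:", pooled_t(group0, group1), "regression t:", t_reg)
```

The advantage of the regression form is that further predictors (covariates) can simply be added, which the t-test cannot do.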

How can the 95% confidence interval of a regression coefficient be calculated?

coefficient ± 1.96 * Std.Err

--> -1.96 is the 2.5th percentile of the standard normal distribution and 1.96 is the 97.5th percentile
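A short sketch of the formula above, using Python's `statistics.NormalDist` to recover the 2.5th and 97.5th percentiles instead of hard-coding 1.96. The coefficient 0.52 and standard error 0.10 are hypothetical. For small samples, a t quantile with n - 2 degrees of freedom is commonly used in place of 1.96.

```python
from statistics import NormalDist

def confidence_interval(coef, std_err, level=0.95):
    """coef ± z * std_err, where z is the standard normal quantile for the level."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 when level = 0.95
    return coef - z * std_err, coef + z * std_err

# 2.5th and 97.5th percentiles of the standard normal distribution
print(round(NormalDist().inv_cdf(0.025), 2), round(NormalDist().inv_cdf(0.975), 2))

lo, hi = confidence_interval(0.52, 0.10)  # hypothetical coefficient and Std.Err
print(f"95% CI: {lo:.3f} to {hi:.3f}")
```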

Which assumptions does the linear regression model make?

1. Linear relationship between x and y
2. Observations in the sample are independent
3. For each x, the outcome y is normally distributed in the population with mean y(x) and variance SD^2
4. The variance SD^2 is constant, i.e. the same for every x
5. x can be measured without error

How can you check assumption 1 of linear regression (linear relationship between x and y)?

Make a plot of the residuals (e) against x
--> there should be no pattern in the residuals, only random scatter around 0 (on average 0)
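Without a plotting library, the same check can be sketched by listing the sign of each residual in order of x. The data are hypothetical and noise-free: one set is truly linear, the other has a quadratic term that a straight line cannot capture.

```python
def fit_line(xs, ys):
    """Ordinary least-squares estimates (b0, b1) for y = b0 + b1*x."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def residuals(xs, ys):
    """Residuals of a straight-line fit, in the order of xs."""
    b0, b1 = fit_line(xs, ys)
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

xs = [float(x) for x in range(-5, 6)]
y_linear = [2 + 1.5 * x for x in xs]                 # truly linear in x
y_curved = [2 + 1.5 * x + 0.5 * x ** 2 for x in xs]  # not linear in x

for label, ys in [("linear", y_linear), ("curved", y_curved)]:
    res = residuals(xs, ys)
    signs = "".join("+" if r > 1e-9 else "-" if r < -1e-9 else "0" for r in res)
    print(f"{label:7s} residual signs along x: {signs}")
```

For the curved data the residuals are positive at both ends and negative in the middle: a systematic pattern, so assumption 1 is violated.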

How can you check assumption 3 of linear regression (for each x, the outcome y is normally distributed in the population)?

Draw a histogram of the residuals (for given x values) --> it should look roughly normal (bell-shaped) and be centred at 0
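A minimal text-only sketch of this check, assuming simulated data whose errors are normal by construction. In practice a plotting tool would be used; the helper `text_histogram`, the bin count and the data are all hypothetical.

```python
import random

def fit_line(xs, ys):
    """Ordinary least-squares estimates (b0, b1) for y = b0 + b1*x."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def text_histogram(values, n_bins=8, width=40):
    """Count values in equal-width bins and draw one '#' bar per bin."""
    lo, hi = min(values), max(values)
    bin_w = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        i = min(int((v - lo) / bin_w), n_bins - 1)  # the maximum goes in the last bin
        counts[i] += 1
    top = max(counts)
    lines = [f"{lo + i * bin_w:7.2f} | {'#' * round(width * c / top)}"
             for i, c in enumerate(counts)]
    return counts, "\n".join(lines)

rng = random.Random(3)
xs = [rng.uniform(0, 10) for _ in range(500)]
ys = [1 + 2 * x + rng.gauss(0, 1) for x in xs]  # normal errors by construction
b0, b1 = fit_line(xs, ys)
res = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
counts, chart = text_histogram(res)
print(chart)  # should look roughly bell-shaped around 0
```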

How can you check assumptions 2 and 5 of linear regression (observations are independent and x can be measured without error)?

2: ensure it through the design of the experiment
5: choose a proper measuring device

When do you use a multiple linear regression model?

If more than one predictor is assessed at the same time
--> e.g. length depends on age and sex; crop yield depends on fertilizer and amount of rain

What is the general equation of multiple linear regression?

Y(x) = ß0 + ß1*x1 + ß2*x2 + ...

ß0 = constant (intercept); ß1, ß2, ... = regression coefficients of the predictors x1, x2, ...
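A minimal sketch of estimating ß0, ß1, ß2 by least squares, solving the normal equations (X'X)b = X'y with plain Gaussian elimination so no external library is needed. The data are hypothetical and noise-free (length = 80 + 6*age + 3*sex, with sex coded 0/1), so the estimates should recover those values; real software would use a numerically more robust solver.

```python
def solve(a, b):
    """Solve the small linear system a @ x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_multiple(rows, ys):
    """Least-squares [ß0, ß1, ..., ßk] for y = ß0 + ß1*x1 + ... + ßk*xk."""
    X = [[1.0] + list(r) for r in rows]  # leading 1 gives the constant ß0
    p = len(X[0])
    xtx = [[sum(xi[a] * xi[b] for xi in X) for b in range(p)] for a in range(p)]
    xty = [sum(xi[a] * y for xi, y in zip(X, ys)) for a in range(p)]
    return solve(xtx, xty)

# Hypothetical data: length (cm) from age (years) and sex (0 = female, 1 = male)
rows = [(age, sex) for age in range(2, 12) for sex in (0, 1)]
ys = [80 + 6 * age + 3 * sex for age, sex in rows]
b = fit_multiple(rows, ys)
print([round(v, 3) for v in b])  # estimates of ß0, ß1 (age), ß2 (sex)
```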

