General notes
What are three important guidelines for presenting statistical results?
- Understandability: report statistics in a form that most people can easily understand.
- Interpretability: report the statistics in units of measurement or in plain language, or at least in terms that require the least statistical knowledge from your audience.
- "Confidence": report a confidence interval (most often a 95% CI) to indicate how confident you are in the reported statistic.
Confidence interval = a region generated by a procedure that, under repeated sampling, contains the true value of the parameter of interest with a specified probability.
What does the width of a confidence interval depend on?
- The standard error of the parameter estimate -> the larger the sample, the smaller the standard error and the narrower the interval.
- The level of confidence -> a 95% CI is wider than a 90% CI. A 100% CI is (-∞, +∞): it contains all possible values of the parameter and is therefore not informative.
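Both effects can be seen in a minimal sketch that assumes the simplest case: a z-based interval for a mean with a known standard deviation (the function name `ci_width` and the numbers are illustrative only). The width is 2 · z · SE, with SE = sd / √n.

```python
from statistics import NormalDist

def ci_width(sd: float, n: int, confidence: float) -> float:
    """Width of a z-based CI for a mean, assuming the standard deviation sd is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided critical value
    se = sd / n ** 0.5                                   # standard error shrinks with sqrt(n)
    return 2 * z * se

for n in (25, 100):
    for conf in (0.90, 0.95):
        print(n, conf, round(ci_width(10.0, n, conf), 2))
```

Quadrupling n halves the width; raising the confidence level from 90% to 95% widens it.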
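What is:
1. the risk difference
2. the relative risk (risk ratio)
3. the odds ratio
- Risk difference: the difference of two proportions.
- Relative risk: the ratio of two proportions.
- Odds ratio: the ratio of the odds in the two groups.
All three deal with categorical data, displayed in a two-by-two contingency table containing counts per category.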
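A minimal sketch of the three measures, assuming a 2x2 table laid out as rows = groups and columns = (event, no event); the cell names a, b, c, d and the counts are illustrative, not from the study material.

```python
def two_by_two_measures(a: int, b: int, c: int, d: int):
    """a, b = events / non-events in group 1; c, d = events / non-events in group 2."""
    p1 = a / (a + b)                 # proportion (risk) in group 1
    p2 = c / (c + d)                 # proportion (risk) in group 2
    risk_difference = p1 - p2        # difference of two proportions
    relative_risk = p1 / p2          # ratio of two proportions
    odds_ratio = (a * d) / (b * c)   # (a/b) / (c/d): ratio of the two odds
    return risk_difference, relative_risk, odds_ratio

rd, rr, or_ = two_by_two_measures(30, 70, 15, 85)
print(rd, rr, or_)
```

With these counts the risks are 0.30 and 0.15, so the risk difference is 0.15 and the relative risk is 2.0, while the odds ratio (about 2.43) is further from 1 than the relative risk.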
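Calculation:
- standard error
- adjusted confidence interval
[Figure: formulas for the standard error and the adjusted confidence interval; the bottom formula is the adjusted confidence interval]
The standard (Wald) confidence interval does not perform well when the sample size is small, so it is better to use an adjusted Wald confidence interval.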
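A minimal sketch comparing the Wald interval and an adjusted Wald interval for a difference of two proportions. It assumes the adjustment is the Agresti-Caffo version (add one success and one failure to each group); the formula in the study material's figure may differ. Function names and counts are illustrative.

```python
from math import sqrt
from statistics import NormalDist

def wald_ci(x1: int, n1: int, x2: int, n2: int, conf: float = 0.95):
    """Wald CI for p1 - p2 (x = number of successes, n = group size)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    p1, p2 = x1 / n1, x2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)  # standard error of the difference
    diff = p1 - p2
    return diff - z * se, diff + z * se

def adjusted_wald_ci(x1: int, n1: int, x2: int, n2: int, conf: float = 0.95):
    """Assumed Agresti-Caffo adjustment: +1 success and +1 failure per group."""
    return wald_ci(x1 + 1, n1 + 2, x2 + 1, n2 + 2, conf)

print(wald_ci(0, 10, 0, 10))           # degenerate zero-width interval
print(adjusted_wald_ci(0, 10, 0, 10))  # sensible interval around 0
```

With no events in two small groups, the Wald standard error is 0 and the interval collapses to a single point, which illustrates why it performs poorly for small samples.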
What does a 95% Confidence interval mean?
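Following the definition of a confidence interval given earlier, a 95% CI comes from a procedure that captures the true parameter in about 95% of repeated samples. A minimal simulation sketch, assuming normally distributed data with a known sigma (all parameter values are illustrative):

```python
import random
from statistics import NormalDist, fmean

def coverage(mu=50.0, sigma=10.0, n=30, reps=2000, conf=0.95, seed=1):
    """Fraction of simulated z-intervals for the mean that contain the true mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(reps):
        m = fmean(rng.gauss(mu, sigma) for _ in range(n))  # one repeated sample
        if m - half_width <= mu <= m + half_width:
            hits += 1
    return hits / reps

print(coverage())  # close to 0.95
```

Any single interval either contains mu or not; the 95% describes the long-run behaviour of the procedure.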
Cohen's w effect size index (p. 31)
It reflects the differences between the observed frequencies in the cells of a contingency table and those expected under the null hypothesis. It can be calculated from the observed and expected proportions (relative frequencies) in each cell, or from the test statistic χ².
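Both routes should give the same value: w = sqrt(Σ (p_obs - p_exp)² / p_exp) over all cells, and w = sqrt(χ² / N). A minimal sketch, assuming the null hypothesis is independence of rows and columns so that expected counts are row total × column total / N (table values are illustrative):

```python
from math import sqrt

def cohens_w(table):
    """Return (w from proportions, w from chi-square, chi-square) for a contingency table."""
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    w_squared = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_tot[i] * col_tot[j] / n           # expected count under independence
            chi2 += (obs - exp) ** 2 / exp
            p_obs, p_exp = obs / n, exp / n             # relative frequencies
            w_squared += (p_obs - p_exp) ** 2 / p_exp
    return sqrt(w_squared), sqrt(chi2 / n), chi2

w_prop, w_chi, chi2 = cohens_w([[30, 70], [15, 85]])
print(round(w_prop, 4), round(w_chi, 4), round(chi2, 4))
```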
ANOVA + variance calculation
- Analyses the variance in a set of observations.
- It assigns chunks of the total variance to the independent variables and their interactions in the general linear model (equation 4.1, p. 41).
- The remaining residual variance, or error, is not explained by the model's factors.
- To calculate variance -> use the sum of squared errors, SS: the sum of the squared differences between each individual measurement and the overall mean, X̄.
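For the simplest case, a one-way ANOVA with a single factor, the partition can be sketched as SS_total = SS_between + SS_within. This is an illustrative sketch with made-up data, not the full general linear model with interactions.

```python
from statistics import fmean

def ss_partition(groups):
    """Split the total sum of squares of a one-way layout into between- and within-group parts."""
    all_obs = [x for g in groups for x in g]
    grand_mean = fmean(all_obs)                                             # X-bar
    ss_total = sum((x - grand_mean) ** 2 for x in all_obs)
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)  # explained by the factor
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)         # residual / error
    return ss_total, ss_between, ss_within

print(ss_partition([[4, 5, 6], [6, 7, 8], [9, 10, 11]]))  # approx. (44, 38, 6)
```

Here the factor accounts for 38 of the 44 units of total variation, and the remaining 6 is residual.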
Why use contrasts (and not a post hoc test)?
Degrees of freedom
- The main advantage of a contrast is that it is based on all observations in the ANOVA.
- So you have more degrees of freedom than in a post hoc test, where you only compare two samples.
- More degrees of freedom means a larger effective sample size and hence higher statistical power.
- So a planned comparison is hypothesis-driven and formulated a priori. Also, unlike a post hoc test, a planned comparison has more power because it uses the entire data set instead of a subset.
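A minimal sketch of a planned contrast in a one-way design, assuming the usual formula t = Σ c_i·mean_i / sqrt(MS_within · Σ c_i²/n_i) with df = N - k, where the weights c_i sum to zero. The function name and data are illustrative.

```python
from math import sqrt
from statistics import fmean

def planned_contrast(groups, weights):
    """t statistic and degrees of freedom for a contrast that uses the pooled error of all groups."""
    if abs(sum(weights)) > 1e-12:
        raise ValueError("contrast weights must sum to zero")
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    df = n_total - k                                  # error df based on ALL observations
    ms_within = ss_within / df
    estimate = sum(c * fmean(g) for c, g in zip(weights, groups))
    se = sqrt(ms_within * sum(c ** 2 / len(g) for c, g in zip(weights, groups)))
    return estimate / se, df

t, df = planned_contrast([[4, 5, 6], [6, 7, 8], [9, 10, 11]], [-1, 0, 1])
print(round(t, 3), df)
```

Comparing groups 1 and 3 this way gives df = 9 - 3 = 6, whereas a two-sample test on just those two groups would have df = 3 + 3 - 2 = 4.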
Why multiple linear regression? (p. 52)
E.g.:
- Which factors (predictors/independent variables) explain or account for the variation in the dependent variable? -> could lead to the identification of causal factors.
- What effect does one independent variable have on the dependent variable when you correct for another independent variable?
- When I have information on the independent variables, can I predict the dependent variable?
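A minimal pure-Python sketch of fitting a multiple linear regression by ordinary least squares, solving the normal equations (X'X)b = X'y. Each slope is the effect of one predictor while correcting for the others, and `predict` answers the prediction question. Function names and data are illustrative; real analyses would use a statistics package.

```python
def fit_ols(x_rows, y):
    """Least-squares coefficients [b0, b1, ..., bp] via the normal equations."""
    X = [[1.0, *row] for row in x_rows]  # prepend intercept column
    p = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(p)]
    # Gaussian elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, p):
            factor = A[r][col] / A[col][col]
            for c in range(col, p + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution
    b = [0.0] * p
    for i in reversed(range(p)):
        b[i] = (A[i][p] - sum(A[i][j] * b[j] for j in range(i + 1, p))) / A[i][i]
    return b

def predict(b, row):
    """Predicted value of the dependent variable for one row of predictor values."""
    return b[0] + sum(bj * xj for bj, xj in zip(b[1:], row))

x_rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
y = [9, 8, 19, 18, 26]  # generated exactly from y = 1 + 2*x1 + 3*x2
b = fit_ols(x_rows, y)
print([round(v, 6) for v in b], predict(b, (6, 6)))
```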
There are two different ways to construct a multiple linear regression model (p. 59).
Sequential regression -> the order in which predictors are added to or removed from the model under construction is decided by the statistical software, using forward selection or backward elimination.
RMSE (root mean square error) (p. 61)
- A measure of the average deviation, or error, of the data points (y_i) from the values calculated by the regression model (ŷ_i).
- The smaller the RMSE, the better -> the smaller the differences between the measured values and their predicted or calculated values, the better the model fits the data.
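A minimal sketch of RMSE = sqrt((1/n) Σ (y_i - ŷ_i)²). It assumes division by n; some texts divide by the residual degrees of freedom (n - p - 1) instead, which gives a slightly larger value. The data are illustrative.

```python
from math import sqrt

def rmse(observed, predicted):
    """Root mean square error between measured values y_i and model values y-hat_i (divides by n)."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have equal length")
    squared_errors = [(y - y_hat) ** 2 for y, y_hat in zip(observed, predicted)]
    return sqrt(sum(squared_errors) / len(observed))

print(rmse([3, 5, 7], [2, 5, 9]))  # residuals 1, 0, -2 -> sqrt(5/3)
```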
What is the parsimony principle? (p. 67)
Statistical power or sensitivity
Smallest effect size of interest (p. 79)
Type II error - false negative (p. 82)
Type I error - false positive (p. 82)
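Both error types can be estimated by simulation. A minimal sketch, assuming a two-sided one-sample z-test of H0: mu = 0 with known sigma (a simplification of the t-test): when H0 is true the rejection rate estimates the Type I error rate (about alpha); when a specific Ha is true, the non-rejection rate estimates the Type II error rate beta. All parameter values are illustrative.

```python
import random
from statistics import NormalDist, fmean

def rejection_rate(true_mu, n=20, sigma=1.0, alpha=0.05, reps=4000, seed=7):
    """Fraction of simulated samples in which H0: mu = 0 is rejected (two-sided z-test)."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(reps):
        sample_mean = fmean(rng.gauss(true_mu, sigma) for _ in range(n))
        z = sample_mean / (sigma / n ** 0.5)  # test statistic under H0: mu = 0
        if abs(z) > z_crit:
            rejections += 1
    return rejections / reps

type1 = rejection_rate(true_mu=0.0)       # false-positive rate, near alpha = 0.05
type2 = 1 - rejection_rate(true_mu=0.5)   # false-negative rate beta when mu = 0.5
print(type1, type2)
```

With mu = 0.5, sigma = 1 and n = 20, beta comes out near 0.39, so power (1 - beta) is only about 0.61.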
Noncentrality parameter, ncp (p. 82)
- It is a measure of how far the peak of the t-distribution under Ha has shifted from that under H0.
- The same effect size d will give a larger ncp when the sample size n increases.
- When Cohen's effect size d = 0, then ncp = 0 and we have a central t-distribution that is symmetric and centered around 0.
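A minimal sketch assuming a one-sample (or paired) t-test, where ncp = d·√n. The power function uses a normal approximation to the noncentral t-distribution, which is reasonable for moderate to large n; an exact calculation would use the noncentral t (for example SciPy's `scipy.stats.nct`). Function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

def ncp_one_sample(d: float, n: int) -> float:
    """Noncentrality parameter for a one-sample / paired t-test (assumed form d * sqrt(n))."""
    return d * sqrt(n)

def approx_power(d: float, n: int, alpha: float = 0.05) -> float:
    """Two-sided power, approximating the noncentral t by a shifted standard normal."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    delta = ncp_one_sample(d, n)
    return (1 - z.cdf(z_crit - delta)) + z.cdf(-z_crit - delta)

print(ncp_one_sample(0.5, 16), ncp_one_sample(0.5, 64))  # same d, larger n -> larger ncp
print(round(approx_power(0.5, 20), 3), round(approx_power(0.5, 40), 3))
```

With d = 0 the ncp is 0 and the "power" equals alpha, matching the central distribution under H0.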
How to increase the effect size d:
- by decreasing variability (standard deviation)
- by increasing the smallest effect size of interest
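Cohen's d is a difference divided by a standard deviation, so a smaller SD or a larger difference both increase d. A minimal sketch, assuming two independent groups and the pooled standard deviation (data are illustrative):

```python
from math import sqrt
from statistics import fmean, variance

def pooled_sd(g1, g2):
    """Pooled sample standard deviation of two independent groups."""
    n1, n2 = len(g1), len(g2)
    return sqrt(((n1 - 1) * variance(g1) + (n2 - 1) * variance(g2)) / (n1 + n2 - 2))

def cohens_d(g1, g2):
    """Standardized mean difference: (mean1 - mean2) / pooled SD."""
    return (fmean(g1) - fmean(g2)) / pooled_sd(g1, g2)

print(cohens_d([5, 6, 7], [3, 4, 5]))              # difference 2, SD 1   -> d = 2
print(cohens_d([5.5, 6, 6.5], [3.5, 4, 4.5]))      # difference 2, SD 0.5 -> d = 4
```

Halving the variability with the same mean difference doubles d.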
The two consequences of two-tailed testing are:
- There are now two critical t-values: one negative, to the left of the central value t = 0, and one positive, to the right. H0 is rejected when the significance test gives t > t_crit or t < -t_crit.
- The critical t-values are more extreme, because each tail has to enclose a smaller area (2.5% instead of 5%, for α = 0.05) under the distribution when H0 is true.
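A minimal sketch of one- versus two-tailed critical values. It uses the standard normal distribution as a large-sample approximation to the t-distribution; exact t critical values depend on the degrees of freedom and would come from a t quantile function (for example SciPy's `scipy.stats.t.ppf`).

```python
from statistics import NormalDist

def critical_values(alpha: float = 0.05, two_tailed: bool = True):
    """Critical value(s) from the standard normal (large-df approximation of t)."""
    z = NormalDist()
    if two_tailed:
        c = z.inv_cdf(1 - alpha / 2)  # alpha/2 in each tail
        return (-c, c)
    return (z.inv_cdf(1 - alpha),)    # all of alpha in the upper tail

print(critical_values(two_tailed=False))  # about (1.645,)
print(critical_values())                  # about (-1.960, 1.960)
```

Splitting 5% over two tails pushes the critical value out from about 1.645 to about 1.960.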
What is a post hoc test and when is it used?