MANOVA and LDA
18 important questions on MANOVA and LDA
What is linear discriminant analysis used for?
- LDA finds the linear combinations of variables that best separate groups; it is used to interpret multivariate group differences (e.g. after a significant MANOVA) and to classify observations into groups.
What are the statistical advantages of MANOVA?
- MANOVA can detect differences that are not detected in univariate ANOVAs, i.e. it can have more power.
- You can test for between-group differences in profiles, time courses, etc.
- You can test multiple correlated DVs of interest jointly.
- You use only one "omnibus" test, which protects you against multiple-testing problems.
- Follow up with Bonferroni-corrected univariate ANOVAs,
- or with a discriminant analysis to interpret the multivariate group differences.
What is a disadvantage of MANOVA?
- Depending on the particular application, power may also decrease.
What test statistic does an ANOVA use?
- The H0 of ANOVA is that the means of the different groups are the same.
- This is tested with an F statistic: F = variance of the group means / pooled variance within groups.
- Under the null hypothesis the F statistic is expected to be around one, because the variability between the group means is then fully accounted for by the variability within groups.
- If F is very large, there is more variance between the groups than the within-group variance can explain, so it is likely that the groups come from different populations (see the sketch below).
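A minimal sketch of this computation in R, on simulated data (the group means and sizes are made up for illustration); the by-hand F matches what the built-in ANOVA reports.

    set.seed(1)
    g <- factor(rep(c("a", "b", "c"), each = 20))        # three groups, n = 20 each
    y <- rnorm(60, mean = c(0, 0.5, 1)[as.integer(g)])   # true group means differ

    k <- nlevels(g); n <- length(y)
    m <- tapply(y, g, mean)                              # group means
    ssb <- sum(20 * (m - mean(y))^2)                     # between-group sum of squares
    ssw <- sum((y - m[as.integer(g)])^2)                 # within-group sum of squares
    Fstat <- (ssb / (k - 1)) / (ssw / (n - k))           # between / pooled within variance
    Fstat
    anova(lm(y ~ g))                                     # same F value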
What test statistic does a MANOVA use?
- The H0 of a MANOVA is that the centroids (vectors of means) of the groups are the same.
- This is tested with Lawley-Hotelling's trace: U = tr(W^-1 * B).
- W is the within-group covariance matrix, the multivariate equivalent of the within-group variance in ANOVA; W^-1 is its inverse.
- B is the between-group covariance matrix, the equivalent of the between-group variance in ANOVA.
- The trace is the sum of the diagonal elements of a matrix.
What are alternatives to Lawley-Hotelling's trace?
- Wilks' lambda: Λ = |W| / |W+B| = |(B+W)^-1 * W|
- |W| is the determinant of W.
- The sum of the between-group and within-group (co)variance is the (co)variance of the outcome variables.
- Pillai's trace: V = tr((B+W)^-1 * B) (= p - tr((B+W)^-1 * W))
- Pillai's trace is the most commonly reported statistic in research; Wilks' Λ is popular because it gives the effect size η²p and corresponds to a likelihood-ratio test. A sketch computing all three statistics from W and B follows.
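A minimal sketch in R, using the built-in iris data as a stand-in; here W and B are the within- and between-group sums-of-squares-and-cross-products matrices.

    X <- as.matrix(iris[, 1:4])                  # four dependent variables
    g <- iris$Species
    E <- resid(lm(X ~ g))                        # residuals around the group centroids
    W <- crossprod(E)                            # within-group SSCP matrix
    Tot <- crossprod(scale(X, scale = FALSE))    # total SSCP around the grand mean
    B <- Tot - W                                 # between-group SSCP matrix
    sum(diag(solve(W) %*% B))                    # Lawley-Hotelling's trace U
    det(W) / det(W + B)                          # Wilks' lambda
    sum(diag(solve(B + W) %*% B))                # Pillai's trace V
    summary(manova(X ~ g), test = "Wilks")       # agrees with the hand-computed value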
How do you perform a MANOVA in R, and what does the output tell you?
- car::Manova(my.model), where my.model is a multivariate linear model (see the example below).
- The sum of squares for the error gives the within-group variance matrix, adjusted for the degrees of freedom.
- The sum of squares for the hypothesis gives the between-group variance matrix, adjusted for the degrees of freedom and the number of observations.
- The test statistics are shown as well.
- These test statistics show whether the groups of the independent variable differ significantly on the dependent variables taken together.
- If Wilks' statistic is significant, there is a group difference on the collective of dependent variables.
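A runnable example, again using iris as a hypothetical stand-in for your own data:

    # multivariate linear model: several DVs on the left, a grouping factor on the right
    my.model <- lm(cbind(Sepal.Length, Sepal.Width, Petal.Length) ~ Species,
                   data = iris)
    car::Manova(my.model)            # Pillai's trace by default
    summary(car::Manova(my.model))   # SSP matrices plus all four test statistics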
What does the effect size of a MANOVA depend on?
A combination of the correlation between the dependent variables and the effect size to be detected.
What are the assumptions of MANOVA?
- y_ik = μ_k + ε_ik
- This means that the dependent variables must deviate around a group mean.
- Homoscedasticity of the covariance matrices: Var(ε_i.) = Σ, the same for all groups.
- That is, the covariance matrices of the dependent variables are equal across groups.
- The errors ε_i. = (ε_i1, ε_i2, ..., ε_ik)′ are multivariate normal: z_i = p_1·ε_i1 + p_2·ε_i2 + ... + p_k·ε_ik is normal for all conceivable p_1, ..., p_k.
- In other words, the errors are multivariate normally distributed:
- any linear combination of the residuals must have a normal distribution.
How can you check MANOVA's assumption of homoscedasticity?
- Box's M test (sketched below):
- biotools::boxM(data[, c("dependent.variable.1", "dependent.variable.2")], data$group)
- make sure not to include the grouping variable in the data frame of DVs you provide
- if the p-value is significant, the assumption of homogeneity of covariance matrices is violated
- Ellipse plot of the residuals:
- car::scatterplotMatrix(resid(my.model), groups = data$group, ellipse = TRUE)
- if the ellipses have the same shape and direction across groups, the assumption is met.
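Both checks as a runnable sketch on the iris stand-in (the variable names below come from that data set, not from the course example):

    dvs <- iris[, c("Sepal.Length", "Sepal.Width")]   # DVs only, no grouping column
    biotools::boxM(dvs, iris$Species)                 # small p-value = covariances differ
    my.model <- lm(cbind(Sepal.Length, Sepal.Width) ~ Species, data = iris)
    car::scatterplotMatrix(as.data.frame(resid(my.model)),
                           groups = iris$Species, ellipse = TRUE)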
How can you check the assumption of multivariate normality?
- It states that every linear combination of the dependent variables (including any pair of them together) must have a normal distribution.
- When it is violated, the power of the MANOVA decreases.
It can be checked with (sketch below):
- a multivariate Q-Q plot of the residuals, e.g. as produced alongside Mardia's test;
- a Shapiro test on the transpose of the residuals:
- mvnormtest::mshapiro.test(t(resid(my.model)))
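Continuing the iris sketch; the chi-square Q-Q plot below is one hand-rolled way to do the multivariate Q-Q check.

    my.model <- lm(cbind(Sepal.Length, Sepal.Width) ~ Species, data = iris)
    mvnormtest::mshapiro.test(t(resid(my.model)))    # variables in rows, hence t()
    # chi-square Q-Q plot of squared Mahalanobis distances of the residuals
    E <- resid(my.model)
    d2 <- mahalanobis(E, colMeans(E), cov(E))
    qqplot(qchisq(ppoints(nrow(E)), df = ncol(E)), d2,
           xlab = "chi-square quantiles", ylab = "squared Mahalanobis distance")
    abline(0, 1)    # points near this line support multivariate normality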
How can you test for outliers in a MANOVA?
- Look for outliers on all dependent variables simultaneously, using Mahalanobis distances (sketched below):
- E = residuals(fit)
- d2 = mahalanobis(E, rep(0, ncol(E)), cov(E))
- values greater than 2 * mean(d2) are suspicious
- do a sensitivity analysis: check whether the result changes when you remove the outlier
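The same recipe as a runnable sketch on iris; note that the 2 * mean(d2) cutoff is the rule of thumb from the text, not a formal test.

    my.model <- lm(cbind(Sepal.Length, Sepal.Width) ~ Species, data = iris)
    E <- residuals(my.model)
    d2 <- mahalanobis(E, rep(0, ncol(E)), cov(E))   # distances from the centroid (0)
    suspect <- which(d2 > 2 * mean(d2))             # rule-of-thumb flag
    suspect
    # sensitivity analysis: refit without the flagged rows and compare the results
    refit <- update(my.model, data = iris[-suspect, ])   # assumes suspect is non-empty
    car::Manova(refit)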
When your MANOVA has turned out significant, this means there is a significant difference between the groups on the multivariate dependent variable. What is a logical next step?
- You can run univariate ANOVAs on the individual dependent variables to assess which variables drive the group differences.
- You can do these univariate ANOVAs with this code (see also the sketch below):
- summary(car::Manova(my.model), univariate=TRUE, p.adjust.method="bonferroni")
- Bonferroni is the most commonly used correction, but Holm is a more powerful one.
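If you prefer to see the correction explicitly, the same follow-up can be done by hand on per-DV p-values (a sketch on the iris stand-in):

    fit <- manova(cbind(Sepal.Length, Sepal.Width, Petal.Length) ~ Species, data = iris)
    us <- summary.aov(fit)                             # one univariate ANOVA per DV
    p <- sapply(us, function(s) s[[1]][["Pr(>F)"]][1]) # group-effect p-value per DV
    p.adjust(p, method = "holm")                       # or method = "bonferroni"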
How does a linear discriminant analysis work?
- Each data point is transformed to a position on the new (discriminant) axes.
- This position is found with formulas of the following form, where each axis has its own intercept and its own coefficient per variable:
- position on LD axis 1 = intercept.1 + coefficient.1.science * science + coefficient.1.math * math
- position on LD axis 2 = intercept.2 + coefficient.2.science * science + coefficient.2.math * math
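A minimal sketch with MASS::lda on iris, where Sepal.Length and Petal.Length stand in for the science and math scores; the centering step plays the role of the intercept.

    library(MASS)
    fit <- lda(Species ~ Sepal.Length + Petal.Length, data = iris)
    fit$scaling                       # one coefficient per variable per LD axis
    scores <- predict(fit)$x          # positions of all points on LD1 and LD2
    # by hand: subtract the prior-weighted mean of the group means (the
    # "intercept" step used by predict.lda), then apply the coefficients
    X <- as.matrix(iris[, c("Sepal.Length", "Petal.Length")])
    manual <- sweep(X, 2, colSums(fit$prior * fit$means)) %*% fit$scaling
    max(abs(manual - scores))         # ~ 0: identical positions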
How can LDA be used for classification, like logistic regression?
- LDA can predict outcomes with more than two groups, whereas (standard) logistic regression only predicts binary outcomes.
- Using the predict function on your LDA model returns a list of components (see below):
- one component (x) contains the new coordinates of the data points;
- posterior is a matrix that contains, for each data point, how likely it is to belong to each of the classes;
- class is a vector that predicts, for each data point, which class it belongs to on the basis of the posterior probabilities.
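Continuing the iris sketch:

    pr <- predict(fit)       # fit is the lda model from the previous sketch
    head(pr$x)               # coordinates on the LD axes
    head(pr$posterior)       # class membership probabilities, rows sum to 1
    head(pr$class)           # predicted class = the class with the highest posterior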
How can you assess the performance of the LDA model by means of a table?
- You can create a true-class vs predicted-class table:
- table(truth = data$group, predicted = pred$class)
- The resulting table shows how many data points were predicted accurately (the diagonal) and which classes get confused with each other (the off-diagonal cells).
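For example, with the iris fit from the earlier sketches:

    tab <- table(truth = iris$Species, predicted = predict(fit)$class)
    tab                           # off-diagonal cells are misclassifications
    sum(diag(tab)) / sum(tab)     # overall proportion classified correctly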
In the output of an LDA model you see two values under "Proportion of trace" for LD1 and LD2. What do these values indicate?
The axis with the highest value is the most informative about the group differences.
In the example this is LD1, because the groups actually separate along that axis.
What does plot(lda_model) do?
- It plots the positions of the data points of the different groups on the LD axes.
- This helps to show which groups are well separated and which groups overlap more; overlap will result in more confusion when predicting the class of data points belonging to those groups.
- If the groups are clearly separated in the plot, you can expect high predictive accuracy / discriminatory power.