MANOVA and LDA
18 important questions on MANOVA and LDA
What is linear discriminant analysis used for?
- LDA finds the linear combinations of variables that best separate groups; it is used to interpret multivariate group differences (e.g. after a significant MANOVA) and to classify observations into groups.
What are the statistical advantages of MANOVA?
- MANOVA can detect differences that are not detected in univariate ANOVAs, i.e. it can have more power.
- You can test for between-group differences in profiles, time courses, etc.
- You can test multiple correlated DVs of interest jointly.
- You use only one "omnibus" test, which protects you against multiple-testing problems.
- Follow up with Bonferroni-corrected univariate ANOVAs,
- or with a discriminant analysis to interpret the multivariate group differences.
What is a disadvantage of MANOVA?
- Depending on the particular application, power may also decrease.
What test statistic does an ANOVA use?
- The H0 of ANOVA is that the means of the different groups are the same.
- This is tested with an F statistic: F = variance of the group means / pooled variance within groups.
- Under the null hypothesis the F statistic is expected to be around one, because the variability between the group means is then fully accounted for by the variability within groups.
- If F is very large, there is more variance between the groups than the within-group variance can explain, so it is likely that the groups come from different populations (see the sketch below).
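A minimal sketch of this computation in R, on simulated data (the group means and sizes are made up for illustration); the by-hand F matches what the built-in ANOVA reports.

    set.seed(1)
    g <- factor(rep(c("a", "b", "c"), each = 20))        # three groups, n = 20 each
    y <- rnorm(60, mean = c(0, 0.5, 1)[as.integer(g)])   # true group means differ

    k <- nlevels(g); n <- length(y)
    m <- tapply(y, g, mean)                              # group means
    ssb <- sum(20 * (m - mean(y))^2)                     # between-group sum of squares
    ssw <- sum((y - m[as.integer(g)])^2)                 # within-group sum of squares
    Fstat <- (ssb / (k - 1)) / (ssw / (n - k))           # between / pooled within variance
    Fstat
    anova(lm(y ~ g))                                     # same F value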
What test statistic does a MANOVA use?
- The H0 of a MANOVA is that the centroids (vectors of means) of the groups are the same.
- This is tested with Lawley-Hotelling's trace: U = tr(W^-1 * B).
- W is the within-group covariance matrix, the multivariate equivalent of the within-group variance in ANOVA; W^-1 is its inverse.
- B is the between-group covariance matrix, the equivalent of the between-group variance in ANOVA.
- The trace is the sum of the diagonal elements of a matrix.
What are alternatives to Lawley-Hotelling's trace?
- Wilks' lambda: Λ = |W| / |W+B| = |(B+W)^-1 * W|
- |W| is the determinant of W.
- The sum of the between-group and within-group (co)variance is the (co)variance of the outcome variables.
- Pillai's trace: V = tr((B+W)^-1 * B) (= p - tr((B+W)^-1 * W))
- Pillai's trace is the most commonly reported statistic in research; Wilks' Λ is popular because it gives the effect size η²p and corresponds to a likelihood-ratio test. A sketch computing all three statistics from W and B follows.
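A minimal sketch in R, using the built-in iris data as a stand-in; here W and B are the within- and between-group sums-of-squares-and-cross-products matrices.

    X <- as.matrix(iris[, 1:4])                  # four dependent variables
    g <- iris$Species
    E <- resid(lm(X ~ g))                        # residuals around the group centroids
    W <- crossprod(E)                            # within-group SSCP matrix
    Tot <- crossprod(scale(X, scale = FALSE))    # total SSCP around the grand mean
    B <- Tot - W                                 # between-group SSCP matrix
    sum(diag(solve(W) %*% B))                    # Lawley-Hotelling's trace U
    det(W) / det(W + B)                          # Wilks' lambda
    sum(diag(solve(B + W) %*% B))                # Pillai's trace V
    summary(manova(X ~ g), test = "Wilks")       # agrees with the hand-computed value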
How do you perform a MANOVA in R, and what does the output tell you?
- car::Manova(my.model), where my.model is a multivariate linear model (see the example below).
- The sum of squares for the error gives the within-group variance matrix, adjusted for the degrees of freedom.
- The sum of squares for the hypothesis gives the between-group variance matrix, adjusted for the degrees of freedom and the number of observations.
- The test statistics are shown as well.
- These test statistics show whether the groups of the independent variable differ significantly on the dependent variables taken together.
- If Wilks' statistic is significant, there is a group difference on the collective of dependent variables.
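A runnable example, again using iris as a hypothetical stand-in for your own data:

    # multivariate linear model: several DVs on the left, a grouping factor on the right
    my.model <- lm(cbind(Sepal.Length, Sepal.Width, Petal.Length) ~ Species,
                   data = iris)
    car::Manova(my.model)            # Pillai's trace by default
    summary(car::Manova(my.model))   # SSP matrices plus all four test statistics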
What does the effect size of a MANOVA depend on?
A combination of the correlation between the dependent variables and the effect size to be detected.
What are the assumptions of MANOVA?
- y_ik = μ_k + ε_ik
- This means that the dependent variables must deviate around a group mean.
- Homoscedasticity of the covariance matrices: Var(ε_i.) = Σ, the same for all groups.
- That is, the covariance matrices of the dependent variables are equal across groups.
- The errors ε_i. = (ε_i1, ε_i2, ..., ε_ik)′ are multivariate normal: z_i = p_1·ε_i1 + p_2·ε_i2 + ... + p_k·ε_ik is normal for all conceivable p_1, ..., p_k.
- In other words, the errors are multivariate normally distributed:
- any linear combination of the residuals must have a normal distribution.
How can you check MANOVA's assumption of homoscedasticity?
- Box's M test (sketched below):
- biotools::boxM(data[, c("dependent.variable.1", "dependent.variable.2")], data$group)
- make sure not to include the grouping variable in the data frame of DVs you provide
- if the p-value is significant, the assumption of homogeneity of covariance matrices is violated
- Ellipse plot of the residuals:
- car::scatterplotMatrix(resid(my.model), groups = data$group, ellipse = TRUE)
- if the ellipses have the same shape and direction across groups, the assumption is met.
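Both checks as a runnable sketch on the iris stand-in (the variable names below come from that data set, not from the course example):

    dvs <- iris[, c("Sepal.Length", "Sepal.Width")]   # DVs only, no grouping column
    biotools::boxM(dvs, iris$Species)                 # small p-value = covariances differ
    my.model <- lm(cbind(Sepal.Length, Sepal.Width) ~ Species, data = iris)
    car::scatterplotMatrix(as.data.frame(resid(my.model)),
                           groups = iris$Species, ellipse = TRUE)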
How can you check the assumption of multivariate normality?
- It states that every linear combination of the dependent variables (including any pair of them together) must have a normal distribution.
- When it is violated, the power of the MANOVA decreases.
It can be checked with (sketch below):
- a multivariate Q-Q plot of the residuals, e.g. as produced alongside Mardia's test;
- a Shapiro test on the transpose of the residuals:
- mvnormtest::mshapiro.test(t(resid(my.model)))
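Continuing the iris sketch; the chi-square Q-Q plot below is one hand-rolled way to do the multivariate Q-Q check.

    my.model <- lm(cbind(Sepal.Length, Sepal.Width) ~ Species, data = iris)
    mvnormtest::mshapiro.test(t(resid(my.model)))    # variables in rows, hence t()
    # chi-square Q-Q plot of squared Mahalanobis distances of the residuals
    E <- resid(my.model)
    d2 <- mahalanobis(E, colMeans(E), cov(E))
    qqplot(qchisq(ppoints(nrow(E)), df = ncol(E)), d2,
           xlab = "chi-square quantiles", ylab = "squared Mahalanobis distance")
    abline(0, 1)    # points near this line support multivariate normality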
How can you test for outliers in a MANOVA?
- Look for outliers on all dependent variables simultaneously, using Mahalanobis distances (sketched below):
- E = residuals(fit)
- d2 = mahalanobis(E, rep(0, ncol(E)), cov(E))
- values greater than 2 * mean(d2) are suspicious
- do a sensitivity analysis: check whether the result changes when you remove the outlier
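The same recipe as a runnable sketch on iris; note that the 2 * mean(d2) cutoff is the rule of thumb from the text, not a formal test.

    my.model <- lm(cbind(Sepal.Length, Sepal.Width) ~ Species, data = iris)
    E <- residuals(my.model)
    d2 <- mahalanobis(E, rep(0, ncol(E)), cov(E))   # distances from the centroid (0)
    suspect <- which(d2 > 2 * mean(d2))             # rule-of-thumb flag
    suspect
    # sensitivity analysis: refit without the flagged rows and compare the results
    refit <- update(my.model, data = iris[-suspect, ])   # assumes suspect is non-empty
    car::Manova(refit)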
When your MANOVA has turned out significant, this means there is a significant difference between the groups on the multivariate dependent variable. What is a logical next step?
- You can run univariate ANOVAs on the individual dependent variables to assess which variables drive the group differences.
- You can do these univariate ANOVAs with this code (see also the sketch below):
- summary(car::Manova(my.model), univariate=TRUE, p.adjust.method="bonferroni")
- Bonferroni is the most commonly used correction, but Holm is a more powerful one.
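If you prefer to see the correction explicitly, the same follow-up can be done by hand on per-DV p-values (a sketch on the iris stand-in):

    fit <- manova(cbind(Sepal.Length, Sepal.Width, Petal.Length) ~ Species, data = iris)
    us <- summary.aov(fit)                             # one univariate ANOVA per DV
    p <- sapply(us, function(s) s[[1]][["Pr(>F)"]][1]) # group-effect p-value per DV
    p.adjust(p, method = "holm")                       # or method = "bonferroni"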
How does a linear discriminant analysis work?
- Each data point is transformed to a position on the new (discriminant) axes.
- This position is found with formulas of the following form, where each axis has its own intercept and its own coefficient per variable:
- position on LD axis 1 = intercept.1 + coefficient.1.science * science + coefficient.1.math * math
- position on LD axis 2 = intercept.2 + coefficient.2.science * science + coefficient.2.math * math
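A minimal sketch with MASS::lda on iris, where Sepal.Length and Petal.Length stand in for the science and math scores; the centering step plays the role of the intercept.

    library(MASS)
    fit <- lda(Species ~ Sepal.Length + Petal.Length, data = iris)
    fit$scaling                       # one coefficient per variable per LD axis
    scores <- predict(fit)$x          # positions of all points on LD1 and LD2
    # by hand: subtract the prior-weighted mean of the group means (the
    # "intercept" step used by predict.lda), then apply the coefficients
    X <- as.matrix(iris[, c("Sepal.Length", "Petal.Length")])
    manual <- sweep(X, 2, colSums(fit$prior * fit$means)) %*% fit$scaling
    max(abs(manual - scores))         # ~ 0: identical positions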
How can LDA be used for classification, like logistic regression?
- LDA can predict outcomes with more than two groups, whereas (standard) logistic regression only predicts binary outcomes.
- Using the predict function on your LDA model returns a list of components (see below):
- one component (x) contains the new coordinates of the data points;
- posterior is a matrix that contains, for each data point, how likely it is to belong to each of the classes;
- class is a vector that predicts, for each data point, which class it belongs to on the basis of the posterior probabilities.
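Continuing the iris sketch:

    pr <- predict(fit)       # fit is the lda model from the previous sketch
    head(pr$x)               # coordinates on the LD axes
    head(pr$posterior)       # class membership probabilities, rows sum to 1
    head(pr$class)           # predicted class = the class with the highest posterior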
How can you assess the performance of the LDA model by means of a table?
- You can create a true-class vs predicted-class table:
- table(truth = data$group, predicted = pred$class)
- The resulting table shows how many data points were predicted accurately (the diagonal) and which classes get confused with each other (the off-diagonal cells).
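For example, with the iris fit from the earlier sketches:

    tab <- table(truth = iris$Species, predicted = predict(fit)$class)
    tab                           # off-diagonal cells are misclassifications
    sum(diag(tab)) / sum(tab)     # overall proportion classified correctly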
In the output of an LDA model you see two values under "Proportion of trace" for LD1 and LD2. What do these values indicate?
The axis with the highest value is the most informative about the group differences.
In the example this is LD1, because the groups actually separate along that axis.
What does plot(lda_model) do?
- It plots the positions of the data points of the different groups on the LD axes.
- This helps to show which groups are well separated and which groups overlap more; overlap will result in more confusion when predicting the class of data points belonging to those groups.
- If the groups are clearly separated in the plot, you can expect high predictive accuracy / discriminatory power.