Categorical Latent Variables - R assignment
6 important questions on Categorical Latent Variables - R assignment
How can you fit a normal mixture model for 2 to 12 potential clusters, and plot the BIC values for each model?
library(mclust)
clustbic <- mclustBIC(data, G = 2:12) # fit models with 2 to 12 clusters
clustbic # print the BIC table
plot(clustbic) # plot the BIC values
The printed output lists the three best models.
The mclust package defines BIC so that larger values are better, so the model with the highest BIC is the best fit.
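A minimal sketch of why higher is better here, assuming the built-in `faithful` data as a stand-in for `data` and a smaller range of clusters to keep it quick. It recomputes the BIC from the fitted model's log-likelihood, number of parameters, and sample size.

```r
library(mclust)

# Assumption: the built-in `faithful` data stands in for `data`
bic <- mclustBIC(faithful, G = 1:5)
summary(bic)                      # lists the three best models by BIC

fit <- Mclust(faithful, x = bic)  # refit the top-ranked model

# mclust's BIC is 2 * log-likelihood minus the complexity penalty,
# so a larger value means a better trade-off between fit and complexity
manual_bic <- 2 * fit$loglik - fit$df * log(fit$n)
all.equal(manual_bic, fit$bic)
```

This is the negative of the more common "-2 log-likelihood plus penalty" convention, where lower is better.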
After your exploratory fitting to find the number of clusters, how can you fit the best-fitting model to your data?
clusfit <- Mclust(data, x = clustbic) # fits the best model from the BIC table (R is case-sensitive: clustbic, not Clustbic)
When you've fitted your model, how can you obtain some interesting statistics from the model?
clusfit$parameters$pro # class probabilities (mixing proportions)
clusfit$z # posterior probabilities of each individual belonging to each cluster
clusfit$classification # the cluster each individual is assigned to
clusfit$parameters$mean # mean score per question (rows) per cluster (columns)
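A short sketch of how these pieces relate to each other, assuming the built-in `faithful` data and a 2-cluster model as stand-ins for `data` and `clusfit`.

```r
library(mclust)

# Assumption: the built-in `faithful` data stands in for `data`
fit <- Mclust(faithful, G = 2)

# Each row of z is one individual's posterior distribution over clusters
range(rowSums(fit$z))             # every row sums to 1

# The hard classification is the cluster with the largest posterior
hard <- apply(fit$z, 1, which.max)
all(hard == fit$classification)

# Class probabilities are close to the column averages of z
rbind(pro = fit$parameters$pro, mean_z = colMeans(fit$z))
```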
How can you identify whether the clusters are well separated?
# perform dimensionality reduction to plot the clusters
clustred <- MclustDR(clusfit)
plot(clustred, what = "boundaries", ngrid = 200)
plot(clustred, what = "density", dimens = 1)
What is a strategy to identify what the effects of individual characteristics are on the clustering?
- In a data frame that contains characteristics for every individual (characteristics as columns, individuals as rows), add the clustering as a column.
- Then you can filter on certain characteristics and calculate what percentage of people in a cluster has each characteristic.
- This way you can examine main, 2-way, 3-way, etc. effects.
library(dplyr)
raters %>%
  group_by(cluster) %>%
  count(age_group) %>%
  mutate(clust_tot = sum(n)) %>%
  mutate(clust_prop = n / clust_tot) %>%
  arrange(desc(clust_prop))
- This shows which characteristics are over-represented in each cluster, and so which ones are associated with being assigned to it.
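A sketch of the same idea for a 2-way effect, assuming a small made-up `raters` data frame. The `gender` column and all values are hypothetical and only illustrate the mechanics.

```r
library(dplyr)

# Hypothetical toy data: `gender` and all values are made up for illustration
raters <- data.frame(
  cluster   = c(1, 1, 1, 1, 2, 2, 2, 2),
  age_group = c("young", "young", "old", "old", "young", "old", "old", "old"),
  gender    = c("f", "m", "f", "f", "m", "m", "f", "m")
)

# Counting two characteristics at once gives the 2-way breakdown per cluster
two_way <- raters %>%
  group_by(cluster) %>%
  count(age_group, gender) %>%
  mutate(clust_prop = n / sum(n)) %>%
  arrange(desc(clust_prop))

two_way
```

Within each cluster the proportions sum to 1, so a combination with a large `clust_prop` in one cluster and a small one in another is a candidate 2-way effect.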
What is the effect of assigning a cluster only to individuals whose highest posterior (conditional) probability is .8 or more, and NA otherwise?
- This makes your clusters better separated.
- Otherwise you also assign individuals who do not clearly belong to any cluster (because their posterior probabilities are similar across clusters).
- Assigning these people dilutes the essence of each cluster.
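A minimal base R sketch of this thresholding, assuming a small made-up posterior matrix `z` in place of `clusfit$z`.

```r
# Hypothetical posterior probabilities: 4 individuals (rows), 3 clusters (columns)
z <- matrix(c(0.95, 0.03, 0.02,
              0.40, 0.35, 0.25,
              0.10, 0.85, 0.05,
              0.34, 0.33, 0.33),
            ncol = 3, byrow = TRUE)

hard     <- apply(z, 1, which.max)  # most likely cluster per individual
max_post <- apply(z, 1, max)        # how confident that assignment is

# Keep the assignment only when the highest posterior is at least .8
confident <- ifelse(max_post >= 0.8, hard, NA)
confident  # 1 NA 2 NA
```

Individuals 2 and 4 have no dominant cluster, so they are set to NA instead of being forced into one.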