Categorical Latent Variables - R assignment

6 important questions on Categorical Latent Variables - R assignment

How can you fit normal (Gaussian) mixture models with 2 to 12 potential clusters and plot the BIC value of each model?

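library("mclust")

# `data` is a numeric data frame or matrix (individuals as rows, items as columns)
clustbic <- mclustBIC(data, G = 2:12)  # fit every covariance model with 2 to 12 clusters
clustbic                               # printing lists the top 3 models by BIC
plot(clustbic)                         # plot the BIC values per model and number of clusters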

Printing the BIC object lists the three best models (covariance structure and number of clusters).

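mclust defines BIC as 2 log L - k log n (L the likelihood, k the number of estimated parameters, n the number of observations). This is the negative of the usual definition, so in mclust a higher BIC indicates a better fit.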
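To make the sign convention concrete, the sketch below computes both versions for two hypothetical fits. The log-likelihoods, parameter counts and n are made-up values for illustration, not mclust output. The model with the highest mclust-style BIC is the same one with the lowest conventional BIC (the form returned by R's stats::BIC()).

```r
# mclust convention: BIC = 2 * logLik - npar * log(n); larger is better
mclust_bic <- function(loglik, npar, n) 2 * loglik - npar * log(n)

# Conventional form (as in stats::BIC()): -2 * logLik + npar * log(n); smaller is better
conventional_bic <- function(loglik, npar, n) -2 * loglik + npar * log(n)

# Hypothetical fits of two models to the same n = 200 observations
loglik <- c(two_clusters = -510, three_clusters = -495)
npar   <- c(two_clusters = 11,   three_clusters = 17)
n <- 200

mclust_bic(loglik, npar, n)        # choose the maximum
conventional_bic(loglik, npar, n)  # choose the minimum: same model, opposite sign
```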

After the exploratory fitting has identified the number of clusters, how can you fit the best model to your data?

clustbic <- mclustBIC(data, G = 2:12)  # BIC table from the exploratory step

clusfit <- Mclust(data, x = clustbic)  # fit the model with the highest BIC
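As a sketch of these two steps, the example below uses R's built-in faithful data in place of `data` and a smaller range G = 2:6 to keep it quick (both are assumptions for illustration). Running Mclust() directly with a G range performs the same BIC search internally, so both routes should end on the same model.

```r
library(mclust)

# faithful (built into R) stands in for `data`; G = 2:6 keeps the demo fast
clustbic <- mclustBIC(faithful, G = 2:6)

# Two-step route: reuse the BIC table computed above
fit_two_step <- Mclust(faithful, x = clustbic)

# One-step route: Mclust runs the BIC search itself
fit_one_step <- Mclust(faithful, G = 2:6)

# Compare the selected covariance model and number of clusters
c(fit_two_step$modelName, fit_two_step$G)
c(fit_one_step$modelName, fit_one_step$G)
```

Passing `x = clustbic` avoids refitting models that were already computed in the exploratory step.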

Once the model is fitted, how can you extract its key statistics?

  • clusfit$parameters$pro # the mixing proportions (class probabilities)
  • clusfit$z # the posterior probability of each individual (rows) belonging to each cluster (columns)
  • clusfit$classification # the cluster each individual is assigned to (the one with the highest posterior probability)
  • clusfit$parameters$mean # the mean score on each question (rows) for each cluster (columns)
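A minimal sketch of extracting these components, with the built-in faithful data standing in for your own data (an assumption for illustration):

```r
library(mclust)

# faithful (built into R) stands in for your own data
clusfit <- Mclust(faithful, G = 2:6)

clusfit$parameters$pro          # mixing proportions; they sum to 1
head(clusfit$z)                 # posterior probabilities; each row sums to 1
table(clusfit$classification)   # number of individuals assigned to each cluster
clusfit$parameters$mean         # one row per variable, one column per cluster

# The classification is the cluster with the highest posterior probability
all(clusfit$classification == apply(clusfit$z, 1, which.max))
```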

How can you identify whether the clusters are well separated?


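# perform dimensionality reduction to plot the clusters
clustred <- MclustDR(clusfit)

plot(clustred, what = "boundaries", ngrid = 200)  # cluster boundaries in the reduced space; ngrid sets the grid resolution
plot(clustred, what = "density", dimens = 1)      # cluster densities along the first direction; little overlap means good separation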
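Alongside the plots, a numeric check is each individual's uncertainty, 1 minus their largest posterior probability, which mclust also stores as clusfit$uncertainty. The sketch below computes it from a small hypothetical posterior matrix standing in for clusfit$z; an average close to 0 means most individuals sit clearly in one cluster.

```r
# Hypothetical posterior matrix (rows = individuals, columns = clusters),
# standing in for clusfit$z
z <- rbind(
  c(0.98, 0.02),
  c(0.60, 0.40),
  c(0.05, 0.95)
)

# Uncertainty per individual: 1 minus the largest posterior probability
uncertainty <- 1 - apply(z, 1, max)

# Average uncertainty across individuals
mean_uncertainty <- mean(uncertainty)
```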

What is a strategy to identify how individual characteristics relate to the clustering?

  • In a data frame that contains the characteristics of every individual (characteristics as columns, individuals as rows), add the cluster assignment as a column, e.g. raters$cluster <- clusfit$classification (rows in the same order as the data used for fitting).
  • Then, within each cluster, compute what percentage of its members has a given characteristic.
  • Counting combinations of characteristics, e.g. count(age_group, gender), gives 2-way, 3-way, etc. effects in addition to main effects.

    library(dplyr)

    raters %>%
      group_by(cluster) %>%
      count(age_group) %>%                    # counts per age group; result stays grouped by cluster
      mutate(clust_tot = sum(n)) %>%          # total number of individuals in the cluster
      mutate(clust_prop = n / clust_tot) %>%  # proportion of the cluster in each age group
      arrange(desc(clust_prop))

  • This shows which characteristics are associated with membership of each cluster.
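The same within-cluster proportions can also be sketched in base R with table() and prop.table(). The data frame below, including its age_group and gender columns, is hypothetical toy data standing in for your own data frame with the clusfit$classification column added.

```r
# Hypothetical toy data: cluster would come from clusfit$classification
raters <- data.frame(
  cluster   = c(1, 1, 1, 1, 2, 2, 2, 2),
  age_group = c("young", "young", "young", "old", "old", "old", "young", "old"),
  gender    = c("f", "m", "f", "f", "m", "m", "f", "m")
)

# Main effect: share of each age group within each cluster (each row sums to 1)
main_eff <- prop.table(table(raters$cluster, raters$age_group), margin = 1)

# 2-way effect: share of each age group x gender combination within each cluster
two_way <- prop.table(
  table(raters$cluster, interaction(raters$age_group, raters$gender)),
  margin = 1
)

main_eff
two_way
```

`margin = 1` divides each count by its row (cluster) total, which matches `clust_prop` in the dplyr pipeline.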

What is the effect of assigning individuals to a cluster only when their conditional (posterior) probability for that cluster is .8 or more, and NA otherwise?

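  • The individuals who remain assigned belong to their cluster with high certainty, so the clusters become more clearly separated. Only the hard classification changes; the fitted model stays the same.
  • Otherwise you also assign individuals without a clear cluster, whose posterior probabilities are spread over several clusters (e.g. .55 vs .45).
  • Assigning these individuals dilutes the profile of each cluster.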
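A minimal base-R sketch of this rule, written as a hypothetical helper threshold_classify() and applied to a toy posterior matrix standing in for clusfit$z:

```r
# Hypothetical helper: assign the most likely cluster only when its
# posterior probability reaches the threshold, otherwise return NA
threshold_classify <- function(z, threshold = 0.8) {
  best <- apply(z, 1, which.max)  # most likely cluster per individual
  maxp <- apply(z, 1, max)        # its posterior probability
  ifelse(maxp >= threshold, best, NA_integer_)
}

# Toy posterior matrix: rows = individuals, columns = clusters
z <- rbind(
  c(0.95, 0.05),
  c(0.55, 0.45),
  c(0.10, 0.90),
  c(0.30, 0.70)
)

threshold_classify(z)
```

With a fitted model this would be used as raters$cluster <- threshold_classify(clusfit$z); individuals left as NA then drop out of the cluster profiles.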
