Cluster sampling

18 important questions on Cluster sampling

What is a cluster sample?

A probability sample in which each sampling unit is a collection, or cluster, of elements.

In cluster sampling, the population is divided into subpopulations (clusters), from which a random sample of clusters is drawn: a subset of all possible clusters. All elements of each selected cluster are included in the sample. Clusters can be selected with or without replacement, and with equal probabilities (typically when the clusters are of equal size) or with unequal probabilities (typically when the clusters are of unequal size).
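
As a minimal sketch of this procedure, assuming the population is stored as a mapping from cluster id to its elements (the function name, the data layout and the school scores are all made up for illustration):

```python
import random

def one_stage_cluster_sample(clusters, m, seed=None):
    """Draw m clusters by simple random sampling without replacement
    and keep every element of each selected cluster.

    `clusters` maps a cluster id to the list of its elements
    (a hypothetical layout chosen for this example).
    """
    rng = random.Random(seed)
    chosen_ids = rng.sample(sorted(clusters), m)  # random subset of cluster ids
    # All elements of a selected cluster enter the sample.
    return {cid: list(clusters[cid]) for cid in chosen_ids}

# Made-up population: three schools (clusters) with pupil scores (elements).
schools = {"A": [6, 7, 8], "B": [5, 5, 6, 7], "C": [8, 9]}
sample = one_stage_cluster_sample(schools, m=2, seed=1)
print(sample)
```

Which two schools are drawn depends on the seed, but each selected school always contributes all of its pupils.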

Why are the variances of cluster sampling estimators usually larger?

Including all elements of a cluster in the sample often means observing a number of more or less similar elements. The sample therefore carries less information than one in which the elements were selected independently of one another.
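
The effect can be checked exactly on a tiny, made-up population by enumerating every possible sample. The sketch below assumes four clusters of two similar elements each and compares a one-stage cluster sample of 2 clusters with an SRS of 4 elements, so both designs observe the same number of elements:

```python
from itertools import combinations
from statistics import fmean, pvariance

# Made-up population: elements within a cluster are similar.
clusters = [[1, 2], [4, 5], [7, 8], [10, 11]]
elements = [y for cluster in clusters for y in cluster]

# Sample mean for every possible one-stage cluster sample of 2 clusters.
cluster_sample_means = [
    fmean(y for cluster in chosen for y in cluster)
    for chosen in combinations(clusters, 2)
]

# Sample mean for every possible SRS (without replacement) of 4 elements.
srs_sample_means = [fmean(s) for s in combinations(elements, 4)]

# All samples within a design are equally likely, so the population
# variance of these means is the exact sampling variance of the estimator.
var_cluster = pvariance(cluster_sample_means)
var_srs = pvariance(srs_sample_means)
print(var_cluster, var_srs)
```

In this example the cluster design has the larger variance, because each selected cluster adds a second observation that is almost a copy of the first.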

What are two designs for cluster sampling?

1. Selecting clusters with equal probabilities.
2. Selecting clusters with probabilities proportional to size.

What is an advantage of selecting clusters with probabilities proportional to size?

Selecting clusters with unequal probabilities (proportional to size) reduces the effect of differences in cluster size on the variance of the estimators.
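
A minimal sketch of drawing clusters with probabilities proportional to size, assuming selection with replacement and known cluster sizes (the function name and the sizes are hypothetical):

```python
import random
from fractions import Fraction

def pps_with_replacement(sizes, m, seed=None):
    """Draw m cluster ids with replacement, each draw selecting a cluster
    with probability proportional to its size.

    `sizes` maps a cluster id to its number of elements (assumed known).
    """
    rng = random.Random(seed)
    ids = sorted(sizes)
    weights = [sizes[cid] for cid in ids]
    return rng.choices(ids, weights=weights, k=m)

# Made-up cluster sizes.
sizes = {"A": 10, "B": 20, "C": 70}
total_size = sum(sizes.values())

# Per-draw selection probability of each cluster: N_i / N.
draw_probs = {cid: Fraction(n, total_size) for cid, n in sizes.items()}
draws = pps_with_replacement(sizes, m=2, seed=1)
print(draw_probs, draws)
```

Because sampling is with replacement, the same cluster can appear more than once in `draws`.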

What is two-stage cluster sampling?

In two-stage cluster sampling, primary sampling units (PSUs) are selected first. Then, within each selected PSU, a sample of secondary sampling units (SSUs) is drawn.
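
The two stages can be sketched as follows, assuming simple random sampling without replacement at both stages and the same number of SSUs per selected PSU (the function name, parameters and data are made up):

```python
import random

def two_stage_sample(psus, m, n_per_psu, seed=None):
    """Stage 1: draw m PSUs. Stage 2: draw n_per_psu SSUs within each
    selected PSU. Both stages use SRS without replacement.

    `psus` maps a PSU id to the list of its SSUs (hypothetical layout).
    """
    rng = random.Random(seed)
    chosen_psus = rng.sample(sorted(psus), m)  # stage 1
    return {
        psu: rng.sample(psus[psu], min(n_per_psu, len(psus[psu])))  # stage 2
        for psu in chosen_psus
    }

# Made-up population: schools (PSUs) with pupil ids (SSUs).
schools = {
    "A": ["a1", "a2", "a3", "a4"],
    "B": ["b1", "b2", "b3"],
    "C": ["c1", "c2", "c3", "c4", "c5"],
}
sample = two_stage_sample(schools, m=2, n_per_psu=2, seed=3)
print(sample)
```

Unlike the one-stage design, only part of each selected PSU is observed.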

What are two advantages of cluster sampling?

1. For some populations, it is not possible to construct an explicit sampling frame of individual elements, while a sampling frame of the clusters is easier to construct.
2. Cluster sampling can be used when selecting individuals directly (SRS) is too expensive (e.g. an SRS of children may require visiting many schools, whereas a cluster sample needs visits to only a few schools, each contributing many children).

What are two disadvantages of cluster sampling?

1. Cluster sampling generally provides less precision than SRS or stratified sampling.
2. Statistical analysis is more complex than for SRS.

Since cluster sampling generally provides less precision than SRS, it is advised to use cluster sampling only in two situations. Which two situations?

Use cluster sampling only when...
1. ... it is economically justified (the cost savings outweigh, or necessitate, the loss in precision).
2. ... constructing a complete list-based sampling frame is difficult, expensive or impossible and the target population is located in natural clusters (e.g. regions, schools, cities).

What is a consequence of cluster sampling for independence of observations?

In cluster sampling, observations are not independent. Units within a cluster are more likely to be alike (e.g. students from the same class), so scores within a cluster are correlated.

What is the sampling fraction for a one-stage cluster sample?

The sampling fraction for a one-stage cluster sample is f = m/M,
where m is the number of sampled clusters and M is the number of clusters in the population.

When is a cluster sample self-weighting?

When the cluster sizes are equal and clusters are selected with equal probability. Then, the inclusion probability is m/M and the weight is the inverse: M/m.
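
A small worked example of the sampling fraction and the resulting weight, using made-up numbers (M = 20 clusters, m = 4 sampled, and invented cluster totals):

```python
from fractions import Fraction

M = 20  # clusters in the population (made-up)
m = 4   # clusters in the sample (made-up)

f = Fraction(m, M)  # sampling fraction = inclusion probability of each element
weight = 1 / f      # every sampled element gets the same weight M/m

# Hypothetical totals of the target variable in the sampled clusters.
sampled_cluster_totals = [12, 15, 9, 14]

# With equal weights, the estimated population total is simply
# the weight times the sample total.
estimated_total = weight * sum(sampled_cluster_totals)
print(f, weight, estimated_total)
```

Here every element has inclusion probability 1/5 and weight 5, so the sample total of 50 is scaled up to an estimated population total of 250.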

From the formula for the variance of the mean estimator for cluster sampling it can already be seen that cluster sampling is less efficient than SRS. How?

Cluster sampling is less efficient because it only uses between-cluster information. Within-cluster information is not used.
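
The page does not reproduce the formula itself. As a sketch, assuming clusters of equal size and simple random sampling of m out of M clusters without replacement, the usual estimated variance of the mean estimator is:

```latex
\[
\widehat{\operatorname{Var}}(\bar{y})
  = \left(1 - \frac{m}{M}\right)\frac{s_b^2}{m},
\qquad
s_b^2 = \frac{1}{m-1}\sum_{i=1}^{m}\left(\bar{y}_i - \bar{y}\right)^2
\]
```

Here \(\bar{y}_i\) is the mean of sampled cluster \(i\) and \(\bar{y}\) is the mean of the \(m\) sampled cluster means. Only the variation between cluster means appears; the effective sample size is the number of clusters \(m\), not the number of observed elements.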

How are clusters of unequal size dealt with?

When clusters are of unequal size, ratio estimation is used. The cluster sizes are treated as random and determine the weights.
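
A minimal sketch of the ratio estimator of the mean per element, assuming a one-stage sample in which each selected cluster reports its total and its size (the function name and the numbers are made up):

```python
def ratio_mean(cluster_totals, cluster_sizes):
    """Ratio estimator of the mean per element: the sum of the sampled
    cluster totals divided by the sum of the sampled cluster sizes."""
    return sum(cluster_totals) / sum(cluster_sizes)

# Hypothetical sampled clusters: total score and number of pupils per school.
totals = [120, 300, 80]
sizes = [10, 30, 8]

ratio_estimate = ratio_mean(totals, sizes)

# For comparison: the unweighted mean of the cluster means, which treats
# a school of 8 pupils the same as a school of 30.
unweighted_mean = sum(t / n for t, n in zip(totals, sizes)) / len(totals)
print(ratio_estimate, unweighted_mean)
```

In the ratio estimator, larger clusters automatically carry more weight because both numerator and denominator grow with cluster size.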

What is the inclusion probability (sampling fraction) of a two-stage cluster sample?

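P(unit i in cluster h is selected) = P(cluster h is selected) × P(unit i is selected, given cluster h) = (m/M) × (n_h/N_h) = f_h × f_i,
where m of the M clusters are sampled and n_h of the N_h units in cluster h are sampled.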
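
A worked instance of this product with made-up numbers (5 of 50 clusters sampled, then 10 of the 40 units in the selected cluster); the function name is hypothetical:

```python
from fractions import Fraction

def two_stage_inclusion_probability(m, M, n_h, N_h):
    """P(cluster h selected) * P(unit selected | cluster h selected)."""
    return Fraction(m, M) * Fraction(n_h, N_h)

pi = two_stage_inclusion_probability(m=5, M=50, n_h=10, N_h=40)
weight = 1 / pi  # sampling weight is the inverse inclusion probability
print(pi, weight)
```

With these numbers the inclusion probability is 1/10 × 1/4 = 1/40, so each sampled unit represents 40 units of the population.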

Which two components make up the formula for the variance of the mean estimator in two-stage sampling without replacement?

The variance formula consists of one part that estimates the variance between clusters and one part that estimates the variance within clusters.
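
The formula is not shown on the page. As a sketch for the simplest case, assuming M clusters of equal size N, with m clusters and n elements per selected cluster drawn by simple random sampling without replacement at both stages, the variance of the mean estimator can be written as:

```latex
\[
\operatorname{Var}(\bar{y})
  = \underbrace{\left(1 - \frac{m}{M}\right)\frac{S_b^2}{m}}_{\text{between clusters}}
  + \underbrace{\left(1 - \frac{n}{N}\right)\frac{S_w^2}{mn}}_{\text{within clusters}}
\]
\[
S_b^2 = \frac{1}{M-1}\sum_{h=1}^{M}\left(\bar{Y}_h - \bar{Y}\right)^2,
\qquad
S_w^2 = \frac{1}{M(N-1)}\sum_{h=1}^{M}\sum_{i=1}^{N}\left(y_{hi} - \bar{Y}_h\right)^2
\]
```

This is the population form; in practice \(S_b^2\) and \(S_w^2\) are replaced by sample estimates. If every element of each selected cluster is observed (\(n = N\)), the within-cluster term vanishes and the one-stage case remains.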

Why do clusters of unequal size have to be treated differently from clusters of equal size?

Unequal cluster sizes affect the variability of the estimator. When the clusters differ in size, it is not accurate to say that they contribute equally to the final estimate: large clusters have more impact (e.g. schools with different numbers of students have different total scores).

What is meant by the 'cluster effect'?

Elements within a cluster are very similar, so observing more elements in the same cluster does not provide much more information.

What primarily determines the precision of the estimator when primary units are selected with replacement?

The variance of the estimator is determined to a large extent by the differences between the primary units in their totals of the target variable. In particular, if the means of the primary units are more or less the same, differences in the sizes of the primary units still produce different totals and may therefore lead to larger variances.
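
This can be illustrated exactly with made-up numbers. The sketch below assumes m independent draws with replacement and the estimator of the population total (1/m) Σ t_i / p_i (commonly called the Hansen-Hurwitz estimator, a name not used on this page), where t_i is the total and p_i the per-draw selection probability of the drawn unit. All primary units share the same mean but differ in size:

```python
from fractions import Fraction

def with_replacement_variance(totals, probs, m):
    """Exact variance of (1/m) * sum(t_i / p_i) over m independent draws,
    where unit i is drawn with probability p_i on each draw."""
    T = sum(totals)
    return sum(p * (t / p - T) ** 2 for t, p in zip(totals, probs)) / m

# Made-up primary units: same mean per element, very different sizes.
sizes = [10, 20, 70]
common_mean = 5
totals = [common_mean * n for n in sizes]  # totals differ only through size
m = 2

equal_probs = [Fraction(1, len(sizes))] * len(sizes)   # equal probabilities
pps_probs = [Fraction(n, sum(sizes)) for n in sizes]   # proportional to size

var_equal = with_replacement_variance(totals, equal_probs, m)
var_pps = with_replacement_variance(totals, pps_probs, m)
print(var_equal, var_pps)
```

With equal probabilities the size differences alone produce a large variance. With probabilities proportional to size, every t_i / p_i equals the population total in this constructed example, so the variance is zero.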
