Regression in survey analysis

17 important questions on Regression in survey analysis

Why is regression used in survey analysis?

Regression is used to learn about relationships between variables. In addition, a regression model that uses auxiliary variables can produce more accurate estimates of population means and totals.

What is a descriptive model in survey analysis?

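A descriptive model describes the value $Y_k$ of the target variable for element $k$ as

$Y_k = F(X_k) + E_k$,

where $F$ is a function that depends on the auxiliary variable(s) $X$ and the model parameters, and $E_k$ is a residual term.

The mean of the residuals is equal to zero.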
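A minimal sketch of such a descriptive model, assuming $F$ is a straight line, $F(X) = a + bX$, fitted by least squares to made-up data. Because the model includes an intercept, the fitted residuals average to zero (up to floating-point rounding).

```python
from statistics import fmean

def fit_linear_model(x, y):
    """Least-squares fit of the descriptive model Y = a + b*X + E (assumes F is linear)."""
    x_bar, y_bar = fmean(x), fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx            # slope
    a = y_bar - b * x_bar    # intercept
    return a, b

# Hypothetical values of the auxiliary variable X and the target variable Y.
x = [2.0, 4.0, 5.0, 7.0, 9.0]
y = [3.1, 5.2, 6.8, 8.9, 11.5]

a, b = fit_linear_model(x, y)
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
print(f"a = {a:.3f}, b = {b:.3f}, mean residual = {fmean(residuals):.1e}")
```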

What is a direct estimator?

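A direct estimator estimates the population mean from the observed values of the target variable alone, without using auxiliary information. Under simple random sampling, the direct estimator of the population mean is the sample mean $\bar{y}$.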
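A minimal sketch of the direct estimator, assuming simple random sampling without replacement: the sample mean, its usual variance estimate $\frac{1-f}{n}s_y^2$ with sampling fraction $f = n/N$, and the corresponding estimate of the population total. The sample values and the population size are hypothetical.

```python
from statistics import fmean, variance

def direct_estimate(sample, population_size):
    """Sample mean and its estimated variance under simple random sampling without replacement."""
    n = len(sample)
    f = n / population_size                     # sampling fraction
    y_bar = fmean(sample)
    var_hat = (1 - f) / n * variance(sample)    # variance() uses the n - 1 denominator
    return y_bar, var_hat

# Hypothetical sample of n = 5 elements from a population of N = 100 elements.
sample = [12.0, 15.0, 9.0, 14.0, 10.0]
y_bar, var_hat = direct_estimate(sample, population_size=100)
total_hat = 100 * y_bar   # direct estimate of the population total
print(y_bar, round(var_hat, 3), total_hat)
```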


What is ratio estimation?

In ratio estimation, a ratio $R$ is used to assist in estimation. The ratio relates the target variable $Y$ to an auxiliary variable $X$ that is correlated with it, for example $R = \bar{Y} / \bar{X}$. The ratio may be of direct interest itself, or it may be used to estimate population means or totals, or to construct subpopulation estimates.

Why is ratio estimation used?

Ratio estimation is used because it can produce more accurate estimates of population means and totals: the ratios $Y_k / X_k$ are less variable than the values $Y_k$ themselves.

When is a ratio estimator effective?

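A ratio estimator is effective when the ratios $Y_k / X_k$ vary less than the values $Y_k$ of the target variable themselves. The ratio estimator has a smaller variance than the direct estimator if the correlation between $X$ and $Y$ is sufficiently large; to a first-order approximation, this holds when $\rho_{XY} > \frac{1}{2} \cdot \frac{CV_X}{CV_Y}$, where $CV$ denotes the coefficient of variation.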
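A minimal sketch of the ratio estimator of the population mean, $\bar{y}_R = (\bar{y} / \bar{x}) \cdot \bar{X}$, assuming simple random sampling without replacement and a known population mean $\bar{X}$ of the auxiliary variable. Its approximate variance estimate uses the residuals $y_i - r x_i$ instead of the deviations $y_i - \bar{y}$, which is why it is small when the ratios $y_i / x_i$ are nearly constant. The data, $\bar{X} = 28$ and $N = 200$ are hypothetical.

```python
from statistics import fmean

def ratio_estimate(y, x, x_pop_mean, population_size):
    """Ratio estimator of the population mean of Y and its approximate variance (SRS without replacement)."""
    n = len(y)
    f = n / population_size
    r = fmean(y) / fmean(x)                  # estimated ratio R
    y_ratio = r * x_pop_mean                 # ratio estimate of the population mean
    # Approximate variance: based on residuals y_i - r * x_i rather than deviations y_i - y_bar.
    resid_ss = sum((yi - r * xi) ** 2 for xi, yi in zip(x, y))
    var_hat = (1 - f) / n * resid_ss / (n - 1)
    return y_ratio, var_hat

# Hypothetical data in which Y is roughly proportional to X.
x = [10.0, 20.0, 30.0, 40.0]
y = [21.0, 39.0, 62.0, 80.0]
y_ratio, var_ratio = ratio_estimate(y, x, x_pop_mean=28.0, population_size=200)

# For comparison: estimated variance of the direct estimator (the sample mean).
n, y_bar = len(y), fmean(y)
var_direct = (1 - n / 200) / n * sum((yi - y_bar) ** 2 for yi in y) / (n - 1)
print(round(y_ratio, 2), round(var_ratio, 4), round(var_direct, 2))
```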

What is a regression estimator?

A regression estimator is an estimator of a population mean or total (or of relationships between variables) based on a regression model with one or more auxiliary variables whose population values are known.

What advantage has linear regression over ratio estimation?

Linear regression does not require a straight line through the origin, because an intercept is added to the model.
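A minimal sketch contrasting the two estimators, assuming simple random sampling and a known population mean $\bar{X} = 28$ (hypothetical). The linear regression estimator is $\bar{y}_{LR} = \bar{y} + b(\bar{X} - \bar{x})$, with $b$ the least-squares slope of a model that includes an intercept. The made-up data lie close to the line $Y = 50.5 + 2X$, far from the origin, so the regression estimate (106.5) follows the fitted line, whereas the ratio estimate, which implicitly uses a line through the origin, comes out higher (112.56).

```python
from statistics import fmean

def regression_estimate(y, x, x_pop_mean):
    """Linear regression estimator of the population mean: y_bar + b * (X_bar - x_bar)."""
    x_bar, y_bar = fmean(x), fmean(y)
    b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))          # least-squares slope (model with intercept)
    return y_bar + b * (x_pop_mean - x_bar)

def ratio_estimate(y, x, x_pop_mean):
    """Ratio estimator: implicitly uses a line through the origin."""
    return fmean(y) / fmean(x) * x_pop_mean

# Hypothetical data lying close to the line Y = 50.5 + 2X.
x = [10.0, 20.0, 30.0, 40.0]
y = [71.0, 89.0, 112.0, 130.0]
reg = regression_estimate(y, x, 28.0)
rat = ratio_estimate(y, x, 28.0)
print(reg, round(rat, 2))
```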

Is the linear regression estimator unbiased?

The linear regression estimator is asymptotically design unbiased (ADU). The bias vanishes for large samples.

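Under simple random sampling without replacement, the variance of the linear regression estimator is approximately $V(\bar{y}_{LR}) = \frac{1-f}{n} S_Y^2 (1 - \rho^2)$, where $f$ is the sampling fraction, $S_Y^2$ is the population variance of $Y$ and $\rho$ is the correlation between $X$ and $Y$. What can be concluded from this formula about when the precision of the estimator is high?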

A stronger relationship between $X$ and $Y$ in the sample results in a smaller mean sum of squares of the residuals, and thus in a smaller variance.
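A minimal sketch of this idea, assuming simple random sampling without replacement and a single auxiliary variable: the estimated variance of the regression estimator is taken as $\frac{1-f}{n}$ times the mean sum of squares of the residuals. The divisor $n - 2$ is one common convention (some texts use $n - 1$). With the hypothetical data below, which follow a straight line closely, the result is far smaller than the variance estimate of the direct estimator.

```python
from statistics import fmean, variance

def regression_variance(y, x, population_size):
    """Estimated variance of the regression estimator under SRS without replacement (large-sample approximation)."""
    n = len(y)
    f = n / population_size
    x_bar, y_bar = fmean(x), fmean(y)
    b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
    a = y_bar - b * x_bar
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    mse = sum(e ** 2 for e in residuals) / (n - 2)   # mean sum of squares of the residuals
    return (1 - f) / n * mse

# Hypothetical sample of n = 4 from a population of N = 200 elements.
x = [10.0, 20.0, 30.0, 40.0]
y = [71.0, 89.0, 112.0, 130.0]
N = 200
v_regression = regression_variance(y, x, N)
v_direct = (1 - len(y) / N) / len(y) * variance(y)   # direct estimator (sample mean)
print(round(v_regression, 4), round(v_direct, 2))
```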

More complex sampling schemes can complicate the computation of regression parameters. What do the estimated regression parameters need to minimize?

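The residual sum of squares. Under complex sampling designs, each squared residual is weighted by the inverse of its inclusion probability, so it is the weighted residual sum of squares that is minimized.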
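A minimal sketch for one auxiliary variable plus an intercept, assuming the design weights are the inverse inclusion probabilities $1/\pi_i$ (as in the generalized regression estimator); the function name and data are hypothetical. The intercept and slope minimize $\sum_i (y_i - a - b x_i)^2 / \pi_i$, which the last lines check through the weighted normal equations.

```python
def weighted_regression(x, y, pi):
    """Intercept and slope minimizing sum_i (y_i - a - b*x_i)**2 / pi_i."""
    w = [1.0 / p for p in pi]                     # design weights = inverse inclusion probabilities
    sw = sum(w)
    x_w = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_w = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - x_w) * (yi - y_w) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_w) ** 2 for wi, xi in zip(w, x))
    b = sxy / sxx
    a = y_w - b * x_w
    return a, b

# Hypothetical sample with unequal inclusion probabilities.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.3, 4.1, 6.4, 7.9, 10.2]
pi = [0.1, 0.2, 0.2, 0.5, 0.5]
a, b = weighted_regression(x, y, pi)

# Weighted normal equations: weighted residuals sum to zero and are orthogonal to x.
w = [1.0 / p for p in pi]
e = [yi - (a + b * xi) for xi, yi in zip(x, y)]
print(abs(sum(wi * ei for wi, ei in zip(w, e))) < 1e-9,
      abs(sum(wi * xi * ei for wi, xi, ei in zip(w, x, e))) < 1e-9)
```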

What is a poststratification estimator?

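The key idea of a poststratification estimator is that stratification is applied after the data have been collected. Poststratification is used to improve the efficiency of the estimator. Qualitative variables (or quantitative variables divided into classes) define the poststrata and are used in a descriptive model to construct the model-based estimator: the weighted average of the stratum sample means, with the known population proportions of the strata as weights.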
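A minimal sketch of the poststratification estimator of the population mean, $\bar{y}_{PS} = \sum_h (N_h / N)\, \bar{y}_h$, assuming the population stratum sizes $N_h$ are known and every poststratum contains at least one sample element. The strata and values are hypothetical: the sample over-represents stratum "B", and poststratification reweights it to the known population shares.

```python
from collections import defaultdict
from statistics import fmean

def poststratified_mean(sample, stratum_sizes):
    """sample: list of (stratum_label, y) pairs; stratum_sizes: known population counts N_h."""
    groups = defaultdict(list)
    for label, y in sample:
        groups[label].append(y)
    missing = set(stratum_sizes) - set(groups)
    if missing:
        raise ValueError(f"no sample observations in strata: {sorted(missing)}")
    N = sum(stratum_sizes.values())
    # Weighted average of stratum sample means, weights N_h / N.
    return sum(N_h / N * fmean(groups[h]) for h, N_h in stratum_sizes.items())

# Hypothetical population: 600 elements in stratum A, 400 in stratum B.
sample = [("A", 10.0), ("A", 12.0), ("B", 20.0), ("B", 22.0), ("B", 24.0), ("B", 18.0)]
ps = poststratified_mean(sample, {"A": 600, "B": 400})
unweighted = fmean(y for _, y in sample)
print(round(ps, 2), round(unweighted, 2))
```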

What are three differences between model-based and design-based estimation?

Model-based
1. Residual analysis is important (to obtain correct standard errors).
2. Results are valid only when the model fits the data (the model must also hold for elements that were not observed).
3. Observations are usually not weighted (e.g., in linear regression).

Design-based
1. Residual analysis is not important (SE is design-based).
2. Valid results regardless of model fit.
3. Inclusion probabilities (weights) will influence the estimates.

What is the definition of a variance for a model-based estimator and for a design-based estimator?

Model-based estimator: the variance is the average squared deviation of the estimate from its expected value, averaged over all possible realizations of the data under the model.

Design-based estimator: the variance is the average squared deviation of the estimate from its expected value, averaged over all possible samples under the design.
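A minimal sketch of the design-based definition, assuming a tiny hypothetical population of $N = 4$ values and simple random sampling without replacement of $n = 2$. The population is held fixed, all six possible samples are enumerated, and the average squared deviation of the sample mean from its expected value is computed; it matches the textbook formula $\frac{1-f}{n} S_Y^2$.

```python
from itertools import combinations
from statistics import fmean, variance

population = [3.0, 7.0, 8.0, 12.0]   # hypothetical fixed population
N, n = len(population), 2

# The estimate (sample mean) for every possible sample under the design.
estimates = [fmean(s) for s in combinations(population, n)]
expected = fmean(estimates)                                   # expected value over the design
design_var = fmean([(e - expected) ** 2 for e in estimates])  # average squared deviation

# Closed-form variance of the sample mean under SRS without replacement.
formula_var = (1 - n / N) / n * variance(population)
print(len(estimates), expected, round(design_var, 4), round(formula_var, 4))
```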

What are three kinds of model variables in linear regression?

1. Exposure of interest: the predictor whose relationship with the response is the object of the analysis.
2. Confounding variables: included to account for associations that are not of interest, so that the relationship of interest is estimated without bias.
3. Precision variables: to be included to reduce standard errors. They explain variability in Y without affecting the interpretation of the regression parameters.

What is an auxiliary variable?

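An auxiliary variable is an available variable, other than the target variable, that is used to support the analysis, for example a variable whose population mean or distribution is known.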

How is the variance of the regression estimator related to the variance of the direct estimator?

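The variance of the regression estimator is never larger than the variance of the direct estimator, at least approximately in large samples: under simple random sampling, the variance of the regression estimator $\bar{y}_{LR}$ is approximately $(1 - \rho^2)\, V(\bar{y})$, where $\bar{y}$ is the direct estimator (the sample mean), $\rho$ is the correlation between $X$ and $Y$, and $1 - \rho^2 \le 1$.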
