
Violations of the regression model

20 important questions on Violations of the regression model

Can we detect whether there is an indication of outliers from the information provided in Table 2?

No, because the table does not report the minimum and maximum.

1. Have a look at Table 6. Can you explain the difference between the columns of the results table?
2. Looking at the results in Panel A, interpret the coefficient of the variable TSTATUS in the model with dependent variable SCAR(-5,+5).
3. Can you select the best model specification with the information provided in Table 6?

1. The columns use different event windows, so each column has a different dependent variable.

2. The regression coefficient is significant at the 1% level (99% confidence). TSTATUS is a dummy variable: when it switches from 0 to 1, SCAR(-5,+5) is 0.0135 higher, with the other variables held constant.

3. No, because the table does not report R2.

How do you check for the presence of outliers in this analysis?

Flag observations outside the interval [mean - 2 (or 3) standard deviations, mean + 2 (or 3) standard deviations]. Important note: dummy variables do not usually have outliers, as every value is either 0 or 1!
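
A minimal sketch of this rule in Python, assuming the raw values of a variable are available as a list. The function name `flag_outliers` and the sample values are hypothetical and are not taken from the tables.

```python
from statistics import mean, stdev

def flag_outliers(values, k=3.0):
    """Return the values lying outside [mean - k*sd, mean + k*sd]."""
    m = mean(values)
    sd = stdev(values)  # sample standard deviation
    lower, upper = m - k * sd, m + k * sd
    return [v for v in values if v < lower or v > upper]

# Hypothetical experience values, for illustration only
exper = [10, 12, 11, 13, 9, 12, 10, 11, 13, 12, 95]
print(flag_outliers(exper, k=2))
```

Note that a single extreme value also inflates the standard deviation, so in small samples the 2 standard deviation band is often more informative than the 3 standard deviation band.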

How do the authors check for potential multicollinearity in their model? Is there any other statistic that could have been used?

By looking at the correlation matrix. Another technique is the Variance Inflation Factor (VIF).
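
As a rough illustration, the VIF of a predictor is 1 / (1 - R2), where R2 comes from regressing that predictor on the other predictors. The sketch below covers only the special case of two predictors, where that R2 equals their squared correlation. The function names and data are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equally long lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def vif_two_predictors(x1, x2):
    """VIF = 1 / (1 - R^2); with two predictors R^2 is the squared correlation."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)

# Hypothetical predictors with a correlation of 0.8
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]
print(vif_two_predictors(x1, x2))
```

A common rule of thumb treats a VIF above 10 (some authors use 5) as a sign of problematic multicollinearity. With more than two predictors the auxiliary regression has to be run in full.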

1. Is there any indication of multicollinearity in this analysis?

2. Which type of correction has been applied to the models to correct for potential heteroscedasticity?

1. No, there is none; see, for example, the fifth row for the sixth variable (headquarters country directors) in the correlation matrix.

2. Robust standard errors

1. Take a look at the variables. There are some dummies. You will see the variable “female” and also the variables “Hispanic” and “Black”. Why are there two variables for “Hispanic” and “Black” and only one for “Female” (and not also a variable for “Male”)?

2. How many females are in the sample?

3. How many individuals have a degree?

4. Take a look at the variable exper and try to determine whether there is any indication of extreme values (outliers) in this variable.

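1. Once you know whether someone is female, you automatically know whether that person is male, so one dummy is enough. This is not the case for Black and Hispanic: someone who is not Black is not necessarily Hispanic, so each needs its own dummy.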
2. Around 52% of the sample
3. BA: .3065208 + .0440633 = .3505841, so around 35%
4. Yes, there are extreme values.

1. Describe the distribution of the variable experience in terms of its symmetry, using the information provided in Table 3.
2. Describe the distribution of the variable experience in terms of how peaked it is compared with the normal distribution, using the information provided in Table 3.
3. What is the range of the variable experience?

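1. It is negatively skewed (skewness = -.9165169). This also shows when comparing the mean and the median, and the location of the outliers says something about the skewness as well.
2. Kurtosis is 3.37197, which is above 3 (the value for a normal distribution), so the distribution is leptokurtic (Table 3).
3. 166 - 3 = 163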
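
The skewness and kurtosis figures can be reproduced from raw data with the moment-based definitions. This is a sketch under the assumption that kurtosis is reported on the scale where the normal distribution equals 3, which is what the answer above compares against. The function name and sample values are hypothetical.

```python
def skewness_kurtosis(values):
    """Moment-based skewness and kurtosis (normal distribution: 0 and 3)."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - m) ** 3 for v in values) / n  # third central moment
    m4 = sum((v - m) ** 4 for v in values) / n  # fourth central moment
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    return skew, kurt

# Hypothetical sample with a long left tail, so skewness is negative
sample = [1, 8, 9, 9, 10, 10, 10]
skew, kurt = skewness_kurtosis(sample)
print(skew, kurt)
print(max(sample) - min(sample))  # range = maximum - minimum
```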

Keeping in mind that category 1 is female, can you conclude by looking at figure 1 that the distribution of the variable experience is different among genders?

Yes, it is different: males are more experienced.

Select the best model in table 4. Explain the statistics that you have chosen to select between these models. Justify your answer.

Model 3, because its (adjusted) R2 is the highest.

***EXTRA TABLE 4***
AIC = Akaike information criterion and BIC = Bayesian information criterion. Both are goodness-of-fit measures that penalize the number of parameters; the lower, the better. For the root mean square error, lower is also better.
***EXTRA TABLE 4***
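
For reference, the standard definitions are AIC = 2k - 2 ln(L) and BIC = k ln(n) - 2 ln(L), with k the number of estimated parameters, n the number of observations and L the maximized likelihood. A small sketch, with hypothetical function names and made-up log-likelihood values:

```python
from math import log

def aic(log_likelihood, k):
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * k - 2 * log_likelihood

def bic(log_likelihood, k, n):
    """Bayesian information criterion: k ln(n) - 2 ln(L)."""
    return k * log(n) - 2 * log_likelihood

# Two hypothetical models fitted on the same 100 observations
print(aic(-100.0, 3), bic(-100.0, 3, 100))
print(aic(-98.5, 6), bic(-98.5, 6, 100))
```

The model with the lower value is preferred. BIC penalizes extra parameters more heavily than AIC once n exceeds roughly 8 observations, because ln(n) is then larger than 2.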

The survey was held in a specific year; all variables have been measured at the same point in time.
a) Given that the data are collected at the same point in time, what might be a principal problem?
b) 'Wage' is taken as the dependent variable. Is that reasonable or not?

a) You have cross-sectional data instead of longitudinal data. Longitudinal data give a better understanding of how factors influence each other over time and of the direction of the relationship between the variables, which helps to address, for example, reverse causality.

b) Yes, it is a logical relationship.

What does multivariate regression tell you?

› To what extent a set of variables is able to explain the outcome variable (e.g., R2)
› Which variable(s) in the set are the best predictors of the outcome (significance and size of the β’s)
› Whether a variable still helps predict the outcome if other variables are also used as predictors (significance of the corresponding β)

What are marginal effects?

The effect on Y of a change in one X while the other X variables are held constant, i.e. the influence of that X on Y independently of the other X's.

What is the difference between bivariate regression and multivariate regression?

Look at the picture.

What happens if we do OLS with multicollinearity (distorted results)?

› Variances / standard errors could be inflated
  → t-ratio (= b/sd(b)) deflated
  → could imply that the parameter is not significant
  → could imply failing to reject H0
› Size of individual coefficients (b’s) could be inflated
  → t-ratio (= b/sd(b)) inflated
  → could imply that the parameter becomes significant
  → could imply rejection of H0
› The signs of the coefficients could change

We cannot trust the results!

What is 'mean centering' against multicollinearity, and does it work?

› Centering: subtracting a constant from every value of a variable
  → redefines the 0 point for that predictor to whatever value you subtract
  → shifts the scale over, but retains the units
› Mean centering: subtracting the mean from every value
› Common ‘solution’ for multicollinearity, but for a linear or multiplicative model this is just an algebraic transformation
  → different coefficients and standard errors
  → but not a better model!
  → note: interpretation of marginal effects changes
› For a polynomial model (e.g. with a quadratic term) this may help
  → interpretation of marginal effects changes
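
The polynomial case can be illustrated with a small sketch: the correlation between a variable and its square is high on the raw scale and drops after mean centering. The helper names and the data (the integers 1 to 10) are hypothetical; with symmetric data like these the centered correlation is exactly zero, which will not hold in general.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equally long lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def center(x):
    """Mean centering: subtract the mean from every value."""
    m = sum(x) / len(x)
    return [v - m for v in x]

def corr_with_square(x):
    """Correlation between a variable and its own quadratic term."""
    return pearson_r(x, [v * v for v in x])

x = list(range(1, 11))
print(corr_with_square(x))          # raw x versus x squared
print(corr_with_square(center(x)))  # centered x versus its square
```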

What is the problem with heteroskedasticity?

› Uneven distribution of errors in the scatterplot
› A few more large errors of the same sign in the area with large errors would tilt the regression line substantially

Causes:
1. different size of observations may result in different size of error terms
  → e.g. distance travelled of a rocket from take-off (measurement error)
2. groups of observations are different
  → follow different processes
  → with different error terms
  → e.g. poorer people always buy the same food; wealthier people occasionally buy expensive food

What happens if we do OLS with heteroskedasticity?

› OLS no longer produces the estimates with minimum variance (it is not efficient)
› OLS underestimates the variance / standard errors of the estimated coefficients
  → too high t-values
  → may lead to an erroneous conclusion of significance, i.e. wrongly rejecting H0 (that the variable has no effect)
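
To make the standard-error problem concrete, the sketch below compares the classical OLS standard error of a slope with a heteroskedasticity-robust one (White's HC0 form) for a regression with one predictor. The function name and the data are hypothetical; the errors in the sample are constructed so that their spread grows with x.

```python
from math import sqrt

def slope_with_standard_errors(x, y):
    """Slope of y = a + b*x with classical and HC0 robust standard errors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [xi - mx for xi in x]
    sxx = sum(d * d for d in dx)
    b = sum(d * (yi - my) for d, yi in zip(dx, y)) / sxx
    a = my - b * mx
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    # Classical: one common error variance for all observations
    s2 = sum(e * e for e in resid) / (n - 2)
    se_classical = sqrt(s2 / sxx)
    # HC0: each observation contributes its own squared residual
    se_robust = sqrt(sum((d * e) ** 2 for d, e in zip(dx, resid))) / sxx
    return b, se_classical, se_robust

# Hypothetical data: y = 2x plus an error whose size grows with x
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.5, 5.0, 4.5, 10.0, 7.5, 15.0, 10.5, 20.0]
print(slope_with_standard_errors(x, y))
```

The slope estimate is the same in both cases; only the standard error, and therefore the t-value, changes.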

How can we check whether we have this problem?
› Several tests
› Graphical:
  → the eyeball test: scatterplot of residuals
  → normal probability plot

What is the scatterplot of residuals?

› You want to see most of the scores concentrated in the center (around 0); no systematic patterns
› For each independent variable!

How can we solve the problem of heteroskedasticity?

1. Weighted least squares
  → more precise observations (with less variability) are given greater weight in determining the regression coefficients
2. Refine the variable
  → transform into a form that does not suffer from heteroskedasticity
  → e.g. rather than national income, use per capita income
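
A minimal sketch of weighted least squares for a single predictor, assuming the weights are supplied by the analyst (a common choice is the inverse of each observation's error variance). The function name, the data and the weights are hypothetical.

```python
def wls_line(x, y, w):
    """Weighted least squares fit of y = a + b*x; returns (a, b)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw  # weighted mean of x
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw  # weighted mean of y
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    b = sxy / sxx
    a = my - b * mx
    return a, b

# Hypothetical data: the last observation is very noisy
x = [1, 2, 3, 4]
y = [3, 5, 7, 20]
print(wls_line(x, y, [1, 1, 1, 1]))     # equal weights reproduce OLS
print(wls_line(x, y, [1, 1, 1, 0.01]))  # the noisy point gets little weight
```

With equal weights the noisy observation pulls the line upward; giving it a small weight moves the fit back towards the pattern in the more precise observations.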

What is reverse causality?

› We usually assume that changes in the dependent variable are caused by changes in the independent variable(s)
› But: we only find a statistical relationship
  → says nothing about causality
  → says nothing about the direction of causality
› In some analyses, it could be that Y (also) causes X: reverse causality
  → endogeneity, week 1
› We can test whether changes in X precede changes in Y (‘Granger causality’)

The questions on this page originate from the summary of the following study material:

Empirical Research Project
