Logistic regression
21 important questions on Logistic regression
What is the difference between OLS and logistic regression?
Which methods let you estimate the model for logistic regression?
› Maximum likelihood estimation (MLE)
- statistical method for estimating the coefficients of a model
- selects coefficients that make the observed values most likely to have occurred
› Likelihood function (L)
- L measures the probability of observing the particular set of dependent variable values that occur in the sample
- the higher the L, the higher the probability of observing the values in the sample
- MLE involves finding the coefficients β that maximize the (log) likelihood (note that often LL < 0)
› Log-likelihood statistic (LL<0)
- in OLS, you minimize the sum of squared residuals; here you maximize the likelihood
- indicator of how much explained information there is after the model has been fitted; small values indicate poorly fitting statistical models
- deviance statistic: -2LL; has a chi-squared distribution
› Note
- we cannot interpret values of L (or LL) directly
- higher is better, but the critical value of the chi-squared distribution depends on the number of degrees of freedom, so look at the significance of the test
- small values indicate poorly fitting statistical models
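The ideas above (likelihood, log-likelihood, deviance) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the toy data are made up, the model has one regressor, and plain gradient ascent stands in for the faster iterative routines that statistical software normally uses. The function names are hypothetical.

```python
import math

def sigmoid(z):
    """Logistic function: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(b0, b1, xs, ys):
    """LL = sum over observations of y*ln(p) + (1 - y)*ln(1 - p)."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(b0 + b1 * x)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

def fit_by_gradient_ascent(xs, ys, learning_rate=0.1, steps=20000):
    """Maximize LL by repeatedly stepping in the direction of its gradient."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        residuals = [y - sigmoid(b0 + b1 * x) for x, y in zip(xs, ys)]
        b0 += learning_rate * sum(residuals) / n
        b1 += learning_rate * sum(r * x for r, x in zip(residuals, xs)) / n
    return b0, b1

# Made-up toy sample (assumption): the outcomes overlap, so the MLE is finite.
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [0, 0, 1, 0, 1, 0, 1, 1]

b0, b1 = fit_by_gradient_ascent(xs, ys)
ll_fitted = log_likelihood(b0, b1, xs, ys)   # negative, as noted above
ll_zero = log_likelihood(0.0, 0.0, xs, ys)   # all coefficients set to zero
deviance = -2.0 * ll_fitted                  # the -2LL statistic

print("coefficients:", b0, b1)
print("LL fitted:", ll_fitted, "LL with b = 0:", ll_zero, "deviance:", deviance)
```

The fitted LL is still negative, but it is closer to zero than the LL with all coefficients at zero, which is what "higher is better" means here.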
What is the Wald statistic (z)?
- similar to the t-statistic in normal regression
- tests the null hypothesis that b = 0
- biased when b is large
- better to look at likelihood-ratio statistics (i.e. compare the specifications with and without this variable)
- or: correction
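As a rough sketch of the two tests mentioned above: the Wald statistic divides a coefficient by its standard error, while the likelihood-ratio statistic compares the log-likelihoods of the specifications without and with the variable. All numbers below are hypothetical and only illustrate the arithmetic; the p-value helpers use the standard normal and the chi-squared distribution with 1 degree of freedom.

```python
import math

def wald_z(b, standard_error):
    """Wald statistic for H0: b = 0."""
    return b / standard_error

def two_sided_normal_p(z):
    """Two-sided p-value of z under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))

def likelihood_ratio(ll_without, ll_with):
    """LR = -2 * (LL of restricted model - LL of full model)."""
    return -2.0 * (ll_without - ll_with)

def chi2_p_1df(statistic):
    """Upper-tail p-value of a chi-squared variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2.0))

# Hypothetical estimates (assumptions, not taken from any table)
z = wald_z(b=0.8, standard_error=0.4)
p_wald = two_sided_normal_p(z)

lr = likelihood_ratio(ll_without=-60.0, ll_with=-57.5)
p_lr = chi2_p_1df(lr)

print("Wald z:", z, "p:", p_wald)
print("LR statistic:", lr, "p:", p_lr)
```

The 1-degree-of-freedom helper only applies when the two specifications differ by a single variable; with more restrictions the degrees of freedom change.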
Read these two slides
What are the potential problems with logistic regression?
› As in normal regression:
- linearity: logistic regression assumes a linear relationship between the regressors and the logit of the dependent variable
- independence of errors (no correlation between errors)
- multicollinearity (inflates standard errors)
› Unique problems:
- statistical software: iterative procedure fails to converge
- two reasons:
- incomplete information...
- complete separation...
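To illustrate why complete separation stops the iterative procedure from converging, here is a small sketch under simplifying assumptions: made-up data, a single regressor, no intercept, and plain gradient ascent. When the regressor perfectly splits the outcomes, the likelihood keeps improving as the slope grows, so the estimate never settles. With overlapping outcomes it does settle.

```python
import math

def sigmoid(z):
    """Logistic function: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def slope_after(steps, xs, ys, learning_rate=0.1):
    """Slope of a no-intercept logit model after a given number of ascent steps."""
    b = 0.0
    for _ in range(steps):
        b += learning_rate * sum((y - sigmoid(b * x)) * x for x, y in zip(xs, ys))
    return b

xs = [-2.5, -1.5, -0.5, 0.5, 1.5, 2.5]
ys_separated = [0, 0, 0, 1, 1, 1]   # x < 0 always gives 0, x > 0 always gives 1
ys_overlap = [0, 0, 1, 0, 1, 1]     # two observations break the perfect split

# Separated data: the slope is still growing after ten times as many steps.
separated_slopes = [slope_after(s, xs, ys_separated) for s in (100, 1000, 10000)]

# Overlapping data: extra steps no longer change the estimate.
overlap_slopes = [slope_after(s, xs, ys_overlap) for s in (1000, 10000)]

print("separated:", separated_slopes)
print("overlap:  ", overlap_slopes)
```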
What is multinomial logistic regression?
› Predict membership of more than two categories
› Breaks the dependent variable down to a series of comparisons
› Example: with three categories A, B, C the analysis consists of two comparisons
- select a baseline category, e.g. A
- comparisons: A vs. B, A vs. C
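The baseline-category idea can be sketched as follows. Each non-baseline category gets its own log-odds against the baseline A, and the probabilities are recovered by normalizing. The log-odds values are hypothetical and the function name is an assumption made for this illustration.

```python
import math

def category_probabilities(logits_vs_baseline, baseline="A"):
    """Turn the log-odds of each category versus the baseline into probabilities."""
    exp_terms = {baseline: 1.0}  # the baseline's log-odds against itself is 0
    for category, logit in logits_vs_baseline.items():
        exp_terms[category] = math.exp(logit)
    denominator = sum(exp_terms.values())
    return {category: value / denominator for category, value in exp_terms.items()}

# Hypothetical linear predictors for one observation: B vs. A and C vs. A
probs = category_probabilities({"B": 0.5, "C": -1.0})
print(probs)
```

Only two sets of coefficients are estimated for three categories; the baseline's probability follows from the requirement that the probabilities sum to one.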
Give your insights from this regression output
- Note: estimate the probability of direct network entry (dependent variable = 1 if dyad)
- Interpretation of parameter estimate:
- positive (significant): this variable increases the probability of a dyad (direct network entry)
- negative (significant): ...decreases...
› No evidence for H1 and H4
› Evidence for other hypotheses
› Some controls matter, but not all
Looking at the information provided in table 1, explain the main characteristics of the sampled firms. Note that in table 1 (descriptive statistics in the sampled firms) there is a mistake. The title of the third column instead of Minimum should be Maximum.
You can derive the foreign sales and foreign markets as % of the total sales and markets. It appears that there are also outliers (e.g. employees, foreign sales and markets).
The authors collected data 'on site'. What does it mean and what are advantages/disadvantages?
How did authors check for the potential sample bias in the data?
Market bias: how representative is this sample in terms of the different host market presences: new EU markets (92), Russia (61) and China (5)?
Do you think the results will be influenced by the financial crisis of 2008?
Explain why Sandberg decided to use logistic regression rather than OLS. Explain the dependent variable of this analysis
What does the model estimate given the dependent variable? From that point of view: how good are the models?
Model = B0 + B1 firm size + B2 export share + B3 host market experience + … + E
What are the requirements of a logistic regression?
What does it mean: hierarchical? (table 4 title)
What types of knowledge do the authors consider to explain the propensity to choose a network entry configuration?
General internationalization knowledge (-> international experience of the firm) (H1, direct effect; H4, moderator; H6, explanatory variable)
Market-specific knowledge (H2, direct effect; H4, explanatory variable; H5, moderator)
Customer-specific knowledge (H3, direct effect; H5, explanatory variable; H6, moderator)
What is the assumption behind the 'host market' variable? Can that assumption be verified? The assumption is that host markets affect the international strategy of the firms.
E.g. China = 1, if the respondent has more experience in China than in Baltic, Poland or Russia.
= 0, otherwise
What is the baseline and group for the models? Is that a logical choice? The control variables.
- Baltic & Poland
- Russia
- China
We cannot include them all in the model, because we want to avoid multicollinearity. We exclude one of them (the base category); China is used as the base category.
It is written in the text why they chose China
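A minimal sketch of this dummy coding, assuming the three host-market labels listed above and a hypothetical helper name: the base category gets no dummy of its own, so an observation from China is represented by zeros on both remaining dummies. Including all three dummies together with an intercept would make them perfectly collinear.

```python
def host_market_dummies(market, baseline="China",
                        categories=("Baltic & Poland", "Russia", "China")):
    """Return one 0/1 dummy per category, leaving out the base category."""
    if market not in categories:
        raise ValueError(f"unknown host market: {market}")
    return {c: int(market == c) for c in categories if c != baseline}

print(host_market_dummies("Russia"))
print(host_market_dummies("China"))
```

The coefficients on the remaining dummies are then read relative to the omitted category.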
Which is the statistic that is used in the logistic regression model to capture the individual significance of the explanatory variables?
Interpret the coefficient on Russia in model 1 table 4.
- Other things being equal
- When the host market is Russia
- The probability of having a dyad is lower than if the host market is China.
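The direction of this interpretation can be made concrete with an odds ratio. The sketch below uses a hypothetical negative coefficient and a hypothetical baseline probability. Neither number is taken from table 4, and both only show how a negative coefficient on a dummy lowers the probability relative to the base category.

```python
import math

def odds_ratio(coefficient):
    """exp(b): multiplicative change in the odds when the dummy switches from 0 to 1."""
    return math.exp(coefficient)

def probability_from_odds(odds):
    """Convert odds back to a probability."""
    return odds / (1.0 + odds)

b_russia = -1.2   # hypothetical coefficient, NOT the value reported in table 4
p_china = 0.6     # hypothetical probability of a dyad in the base category

odds_china = p_china / (1.0 - p_china)
odds_russia = odds_china * odds_ratio(b_russia)
p_russia = probability_from_odds(odds_russia)

print("odds ratio:", odds_ratio(b_russia))
print("P(dyad | China):", p_china, "P(dyad | Russia):", p_russia)
```

An odds ratio below 1 corresponds to a negative coefficient; the size of the drop in probability depends on the starting probability, which is why the coefficient itself is not a change in probability.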
Based on the statistics provided in table 4 can you choose which is the best specification?
*** Extra: degrees of freedom, sample size, number of variables -> use the deviance (-2LL) to interpret the reliability of the model. If not, use the Nagelkerke R2 and the correct classification rate. ***
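For reference, the two fallback measures can be computed as sketched below. The log-likelihoods, sample size, predicted probabilities and outcomes are all hypothetical; the 0.5 cut-off for classification is a common default and an assumption here.

```python
import math

def cox_snell_r2(ll_null, ll_model, n):
    """Cox and Snell R2 = 1 - exp(2 * (LL_null - LL_model) / n)."""
    return 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)

def nagelkerke_r2(ll_null, ll_model, n):
    """Cox and Snell R2 rescaled by its maximum so that it can reach 1."""
    maximum = 1.0 - math.exp(2.0 * ll_null / n)
    return cox_snell_r2(ll_null, ll_model, n) / maximum

def correct_classification_rate(probabilities, outcomes, cutoff=0.5):
    """Share of observations whose predicted class matches the observed outcome."""
    hits = sum(int(p >= cutoff) == y for p, y in zip(probabilities, outcomes))
    return hits / len(outcomes)

# Hypothetical values (assumptions for illustration only)
cs = cox_snell_r2(ll_null=-60.0, ll_model=-45.0, n=100)
nk = nagelkerke_r2(ll_null=-60.0, ll_model=-45.0, n=100)
rate = correct_classification_rate([0.9, 0.2, 0.6, 0.4], [1, 0, 0, 1])

print("Cox-Snell R2:", cs, "Nagelkerke R2:", nk, "correct classification:", rate)
```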