ECONOMETRICS
Sponsored by a Grant TÁMOP-4.1.2-08/2/A/KMR-2009-0041 Course Material Developed by Department of Economics,
Faculty of Social Sciences, Eötvös Loránd University Budapest (ELTE) Department of Economics, Eötvös Loránd University Budapest
Institute of Economics, Hungarian Academy of Sciences Balassi Kiadó, Budapest
Authors: Péter Elek, Anikó Bíró Supervised by: Péter Elek
June 2010
ELTE Faculty of Social Sciences, Department of Economics
Week 8.
Heteroscedasticity, multicollinearity
Concept of heteroscedasticity
Tests
Consequences
Solutions
Multicollinearity – definition, consequences
Briefly: endogeneity
Heteroscedasticity
Assumption of the basic model:
$\mathrm{Var}(u_i) = \sigma^2$ for all $i$ – homoscedasticity
If the variance of the error terms is not constant:
$\mathrm{Var}(u_i) = \sigma_i^2$ for all $i$ – heteroscedasticity
Heteroscedasticity – example
Consumption model (data: SHARE, 2004, Germany – food expenditures)
$$C_i = \beta_0 + \beta_1\,\mathit{Inc}_i + \beta_2\,\mathit{Wealth\_Th}_i + u_i$$
$$\hat{C}_i = 379.6 + 0.02\,\mathit{Inc}_i + 0.007\,\mathit{Wealth\_Th}_i$$
Distribution of residuals as a function of income
[Figure: scatter plot of the residuals (RESID01) against income (INC)]
Example, cont.
Alternative model
$$\log C_i = \beta_0 + \beta_1 \log \mathit{Inc}_i + \beta_2 \log \mathit{Wealth\_Th}_i + u_i$$
$$\widehat{\log C}_i = 4.63 + 0.15 \log \mathit{Inc}_i + 0.05 \log \mathit{Wealth\_Th}_i$$
Distribution of residuals as a function of income
[Figure: scatter plot of the residuals (RESID02) against income (INC)]
Tests I: White-test
Question: Is there a systematic factor in the variance of residuals?
White-test: regression of $\hat{u}_i^2$ on the explanatory variables, their squares, and their cross products
→ F- or chi-squared test of the joint significance of the coefficients
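The auxiliary regression behind the White test can be sketched in a few lines. This is a minimal illustration using NumPy only, with simulated data; the function name `white_test_stat` and the data-generating process are assumptions made for the example, not part of the course material. It returns the chi-squared form of the statistic ($nR^2$) and its degrees of freedom.

```python
import numpy as np

def white_test_stat(y, X):
    """Return (n * R^2, degrees of freedom) of the White auxiliary regression.

    X holds the explanatory variables WITHOUT a constant column.
    """
    n, k = X.shape
    Xc = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    u2 = (y - Xc @ beta) ** 2                      # squared OLS residuals

    # Auxiliary regressors: levels, then squares and cross products.
    cols = [X[:, j] for j in range(k)]
    cols += [X[:, j] * X[:, l] for j in range(k) for l in range(j, k)]
    Z = np.column_stack([np.ones(n)] + cols)

    gamma, *_ = np.linalg.lstsq(Z, u2, rcond=None)
    resid = u2 - Z @ gamma
    r2 = 1.0 - (resid @ resid) / np.sum((u2 - u2.mean()) ** 2)
    return n * r2, Z.shape[1] - 1

# Simulated example (assumed data): the error s.d. grows with the first regressor.
rng = np.random.default_rng(0)
n = 2000
X = rng.uniform(1.0, 5.0, size=(n, 2))
u = rng.normal(size=n) * X[:, 0]
y = 1.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + u
stat, df = white_test_stat(y, X)
print(stat, df)
```

With two regressors the auxiliary regression has five slope coefficients, so the statistic is compared with a chi-squared critical value with 5 degrees of freedom (11.07 at the 5% level). Dummy regressors would need extra care, because their squares duplicate the levels.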
Tests II: Breusch–Pagan test
Auxiliary regression: regression of $\hat{u}_i^2$ on $z_1, \dots, z_k$, the explanatory variables that are thought to influence the variance
$S_0$: explained sum of squares in the auxiliary regression (ESS)
$R^2$: coefficient of determination in the auxiliary regression
$\hat{\sigma}^2$: estimated variance of the (original!) error terms
LM-test of the applicability of the auxiliary regression, which has an alternative formula if u is normally distributed. Asymptotically, under the null hypothesis of homoscedasticity:
$$nR^2 \sim \chi^2_k \quad \text{and (if } u \text{ is normal)} \quad \frac{S_0}{2\hat{\sigma}^4} \sim \chi^2_k$$
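Both versions of the Breusch–Pagan statistic can be computed from one auxiliary regression. The following is a rough sketch with NumPy; the function name `breusch_pagan`, the choice $\hat{\sigma}^2 = \frac{1}{n}\sum \hat{u}_i^2$, and the simulated data are assumptions made for illustration.

```python
import numpy as np

def breusch_pagan(y, X, Z):
    """Return (S0 / (2 * sigma_hat^4), n * R^2, k) for the regression of u_hat^2 on Z.

    X and Z are passed WITHOUT a constant column; a constant is added to both.
    """
    n = len(y)
    Xc = np.column_stack([np.ones(n), X])
    Zc = np.column_stack([np.ones(n), Z])
    beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    u2 = (y - Xc @ beta) ** 2                       # squared OLS residuals
    sigma2_hat = u2.mean()                          # assumed estimator: divides by n
    gamma, *_ = np.linalg.lstsq(Zc, u2, rcond=None)
    fitted = Zc @ gamma
    s0 = np.sum((fitted - u2.mean()) ** 2)          # explained sum of squares (ESS)
    tss = np.sum((u2 - u2.mean()) ** 2)
    lm_normal = s0 / (2.0 * sigma2_hat ** 2)        # version valid under normal errors
    lm_nr2 = n * s0 / tss                           # n * R^2 version
    return lm_normal, lm_nr2, Zc.shape[1] - 1

# Simulated example (assumed data): error s.d. proportional to x, and z = x.
rng = np.random.default_rng(1)
n = 2000
x = rng.uniform(1.0, 5.0, size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n) * x
lm_normal, lm_nr2, k = breusch_pagan(y, x, x)
print(lm_normal, lm_nr2, k)
```

Here $k = 1$, so both statistics are compared with the chi-squared critical value with 1 degree of freedom (3.84 at the 5% level).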
Consequence I: usual standard error estimate is not valid
Univariate model
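$y_i = \alpha + \beta x_i + u_i$, $\mathrm{Var}(u_i) = \sigma_i^2$
$$\hat\beta = \frac{\sum (x_i - \bar x)(y_i - \bar y)}{\sum (x_i - \bar x)^2} = \beta + \frac{\sum (x_i - \bar x) u_i}{\sum (x_i - \bar x)^2}$$
→ Unbiased ($E(u_i) = 0$; $x_i$, $u_i$ independent)
$$\mathrm{Var}(\hat\beta) = \frac{\mathrm{Var}\left(\sum (x_i - \bar x) u_i\right)}{\left(\sum (x_i - \bar x)^2\right)^2} = \frac{\sum (x_i - \bar x)^2 \sigma_i^2}{\left(\sum (x_i - \bar x)^2\right)^2}$$
In case of homoscedasticity:
$$\mathrm{Var}(\hat\beta) = \frac{\sigma^2}{\sum (x_i - \bar x)^2}$$
The usual formula gives a biased variance estimate if heteroscedasticity is present!
– Usual tests are not applicable.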
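The size of the bias can be checked without simulation, because both the true variance of the slope and the expected value of the usual estimate $\hat{\sigma}^2 / \sum (x_i - \bar x)^2$ have closed forms for fixed $x_i$ and given $\sigma_i^2$. The sketch below assumes hypothetical values ($x_i$ on a grid, $\sigma_i^2 = 0.5 x_i^2$) and the usual $\hat{\sigma}^2 = RSS/(n-2)$; the function name is made up for the example.

```python
import numpy as np

def slope_variances(x, sigma2_i):
    """Return (true Var(beta_hat), expected value of the usual variance estimate)."""
    n = len(x)
    dx = x - x.mean()
    sxx = np.sum(dx ** 2)

    # True sampling variance of the OLS slope under heteroscedasticity.
    true_var = np.sum(dx ** 2 * sigma2_i) / sxx ** 2

    # E[RSS] = sum_i M_ii * sigma_i^2, where M is the residual-maker matrix.
    X = np.column_stack([np.ones(n), x])
    M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)
    expected_rss = np.sum(np.diag(M) * sigma2_i)
    usual_var_expected = expected_rss / (n - 2) / sxx
    return true_var, usual_var_expected

x = np.linspace(1.0, 10.0, 50)
sigma2_i = 0.5 * x ** 2            # assumed error variances, growing with x
true_var, usual_var = slope_variances(x, sigma2_i)
print(true_var, usual_var)
```

In this particular setup the usual formula understates the true variance on average; with a constant $\sigma_i^2$ the two quantities coincide.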
Consequence II: OLS not efficient
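Example: $\sigma_i^2 = \sigma^2 z_i^2$
Original model (without intercept): $y_i = \beta x_i + u_i$
Weighted (homoscedastic) model:
$$\frac{y_i}{z_i} = \beta \frac{x_i}{z_i} + v_i, \qquad v_i = \frac{u_i}{z_i}, \quad V(v_i) = \sigma^2$$
New model OLS (WLS):
$$V(\hat\beta^*) = \frac{\sigma^2}{\sum (x_i / z_i)^2}$$
Original model OLS:
$$V(\hat\beta) = \frac{\sigma^2 \sum x_i^2 z_i^2}{\left(\sum x_i^2\right)^2}$$
Cauchy–Schwarz: $\left(\sum a_i b_i\right)^2 \le \sum a_i^2 \sum b_i^2$, applied with $a_i = x_i z_i$ and $b_i = x_i / z_i$:
$$\frac{V(\hat\beta^*)}{V(\hat\beta)} = \frac{\left(\sum x_i^2\right)^2}{\sum x_i^2 z_i^2 \, \sum (x_i / z_i)^2} \le 1$$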
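The inequality can be verified numerically for any positive $z_i$. A minimal sketch for the no-intercept model $y_i = \beta x_i + u_i$ with $\mathrm{Var}(u_i) = \sigma^2 z_i^2$; the function name and the random inputs are assumptions for illustration.

```python
import numpy as np

def slope_variances_no_intercept(x, z, sigma2=1.0):
    """Return (Var of OLS slope, Var of WLS slope) when Var(u_i) = sigma2 * z_i^2."""
    var_ols = sigma2 * np.sum(x ** 2 * z ** 2) / np.sum(x ** 2) ** 2
    var_wls = sigma2 / np.sum((x / z) ** 2)
    return var_ols, var_wls

rng = np.random.default_rng(2)
x = rng.uniform(1.0, 5.0, size=100)
z = rng.uniform(0.5, 3.0, size=100)       # assumed variance drivers
var_ols, var_wls = slope_variances_no_intercept(x, z)
print(var_ols, var_wls)
```

When $z_i$ is constant the two variances are equal, which is the equality case of the Cauchy–Schwarz inequality.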
Solution 1 – White SE
Heteroscedasticity robust estimator of the variance of estimated coefficients:
Univariate model:
$$\widehat{\mathrm{Var}}(\hat\beta) = \frac{\sum (x_i - \bar x)^2 \hat{u}_i^2}{S_{xx}^2}, \qquad S_{xx} = \sum (x_i - \bar x)^2$$
Multivariate model:
$$\widehat{\mathrm{Var}}(\hat\beta_j) = \frac{\sum \hat{r}_{ij}^2 \hat{u}_i^2}{RSS_j^2}$$
$\hat{r}_{ij}$: residual from the regression of $x_j$ on the other explanatory variables
$RSS_j$: residual sum of squares from the same regression
t-test:
$$t = \frac{\hat\beta_j - \beta_{j,0}}{\text{robust SE}(\hat\beta_j)}$$
($\beta_{j,0}$: value of the coefficient under the null hypothesis)
Asymptotic t-distribution: applicable only for large samples
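The multivariate formula can be checked against the matrix ("sandwich") form $(X'X)^{-1} X' \mathrm{diag}(\hat{u}_i^2) X (X'X)^{-1}$, which gives the same numbers. The sketch below is an illustration with simulated data; the function names are hypothetical, and no small-sample correction is applied.

```python
import numpy as np

def white_se(y, X):
    """Robust SEs via residuals r_ij of x_j on the other regressors.

    X must already contain the constant column.
    """
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    u = y - X @ beta
    se = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)
        g, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        r = X[:, j] - others @ g                # residual of x_j on the other regressors
        rss_j = r @ r
        se[j] = np.sqrt(np.sum(r ** 2 * u ** 2)) / rss_j
    return beta, se

def sandwich_se(y, X):
    """The same robust SEs from the matrix formula."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    u = y - X @ beta
    bread = np.linalg.inv(X.T @ X)
    meat = X.T @ (X * (u ** 2)[:, None])
    return np.sqrt(np.diag(bread @ meat @ bread))

# Simulated heteroscedastic data (assumed for the example).
rng = np.random.default_rng(6)
n = 300
x1 = rng.uniform(1.0, 5.0, size=n)
x2 = rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])
y = 1.0 + 0.5 * x1 - 1.0 * x2 + rng.normal(size=n) * x1
beta_hat, se_robust = white_se(y, X)
se_matrix = sandwich_se(y, X)
print(se_robust, se_matrix)
```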
Solution 2 – WLS
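Weighted least squares (WLS)
$$y_i = \alpha + \beta x_i + u_i, \qquad V(u_i) = \sigma^2 z_i^2$$
$$\frac{y_i}{z_i} = \alpha \frac{1}{z_i} + \beta \frac{x_i}{z_i} + v_i, \qquad v_i = \frac{u_i}{z_i}, \quad V(v_i) = \sigma^2$$
The latter equation can be estimated with OLS, which is equivalent to minimizing the weighted sum
$$\min_{\hat\alpha, \hat\beta} \sum_{i=1}^{n} \frac{1}{z_i^2} \left(y_i - \hat\alpha - \hat\beta x_i\right)^2$$
If the variance is well specified, then
– WLS is more efficient than simple OLS (moreover, it is BLUE),
– and the tests have t- and F-distributions also for small samples (given normally distributed errors).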
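The equivalence between OLS on the transformed equation and minimizing the weighted sum of squares can be confirmed numerically. A minimal sketch, assuming simulated data with $z_i = x_i$; both function names are made up for the example.

```python
import numpy as np

def wls_by_transformation(y, x, z):
    """OLS on the equation divided by z_i: regress y/z on 1/z and x/z."""
    Xt = np.column_stack([1.0 / z, x / z])
    coef, *_ = np.linalg.lstsq(Xt, y / z, rcond=None)
    return coef                                  # (alpha_hat, beta_hat)

def wls_by_weighted_normal_equations(y, x, z):
    """Minimize sum (1/z_i^2) (y_i - a - b x_i)^2 via X'WX b = X'Wy."""
    X = np.column_stack([np.ones_like(x), x])
    w = 1.0 / z ** 2
    return np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (w * y))

rng = np.random.default_rng(7)
n = 200
x = rng.uniform(1.0, 5.0, size=n)
z = x                                            # assumed: Var(u_i) = sigma^2 * x_i^2
y = 1.0 + 2.0 * x + rng.normal(size=n) * z
coef_transformed = wls_by_transformation(y, x, z)
coef_weighted = wls_by_weighted_normal_equations(y, x, z)
print(coef_transformed, coef_weighted)
```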
Examples: WLS
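$y_i = \alpha + \beta x_i + u_i$
1st common case: $\mathrm{Var}(u_i) = \sigma^2 x_i^2$
$y_i / x_i = \alpha / x_i + \beta + u_i / x_i$ can be estimated with OLS
2nd common case: $\mathrm{Var}(u_i) = \sigma^2 x_i$
$y_i / x_i^{1/2} = \alpha / x_i^{1/2} + \beta x_i^{1/2} + u_i / x_i^{1/2}$ can be estimated with OLS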
Transforming the explanatory variable (e.g. taking logarithm) often solves the problem of
heteroscedasticity.
Solution 3: TWLS, FGLS
TWLS: two-step weighted least squares
FGLS: feasible generalized least squares
Steps
1. Estimate $y_i = \beta_0 + \beta_1 x_{1i} + \dots + \beta_k x_{ki} + u_i$ with OLS.
2. Generate the estimated error terms $\hat{u}_i$.
3. Specify the variance, e.g. $\sigma^2(u_i) = \exp(\delta_0 + \delta_1 x_{1i} + \dots)$.
4. Regress $\log \hat{u}_i^2$ on a constant and the variables $x_{1i}, \dots, x_{ki}$.
5. $\hat{g}_i$ are the fitted values, $\hat{h}_i = \exp(\hat{g}_i)$.
6. Estimate the original equation with WLS and weights $1/\hat{h}_i$ (that is, divide the equation by $\hat{h}_i^{1/2}$).
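The six steps can be sketched compactly with NumPy. This is an illustration under assumed simulated data with an exponential variance function, not the course's own implementation; the helper names `ols` and `fgls` are hypothetical.

```python
import numpy as np

def ols(y, X):
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def fgls(y, X):
    """Two-step FGLS with an exponential variance function; X contains the constant."""
    beta_ols = ols(y, X)                          # step 1: OLS
    u_hat = y - X @ beta_ols                      # step 2: estimated error terms
    log_u2 = np.log(u_hat ** 2)                   # steps 3-4: log-variance regression
    g_hat = X @ ols(log_u2, X)                    # step 5: fitted values g_hat
    h_hat = np.exp(g_hat)                         #         h_hat = exp(g_hat)
    w = 1.0 / np.sqrt(h_hat)                      # step 6: divide each row by sqrt(h_hat)
    beta_fgls = ols(y * w, X * w[:, None])
    return beta_ols, beta_fgls, h_hat

# Simulated data (assumed): Var(u | x) = exp(x), true coefficients (1, 2).
rng = np.random.default_rng(3)
n = 5000
x = rng.uniform(0.0, 3.0, size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(size=n) * np.exp(0.5 * x)
beta_ols, beta_fgls, h_hat = fgls(y, X)
print(beta_ols, beta_fgls)
```

The log regression recovers the variance function only up to a multiplicative constant, which does not change the WLS coefficient estimates.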
Properties of FGLS
Since the weights are estimated, the estimator is not unbiased.
But it is consistent and asymptotically more efficient than OLS.
If we think that the variance is not specified perfectly, then we should use the White standard errors.
Example: analysis of the determinants of smoking
Data (source: Wooldridge)
CIGS: number of cigarettes smoked daily
INCOME: annual income
CIGPRIC: price of a pack of cigarettes (cents)
EDUC: number of years in education
AGE
RESTAURN: dummy, 1 if the given state has laws that restrict smoking in restaurants
OLS with usual and robust standard errors
Tests
FGLS estimation
Eviews program
' OLS with usual standard errors, then the two heteroscedasticity tests
equation eq_ols
equation eq_olsrob
eq_ols.ls cigs c lincome lcigpric educ age age^2 restaurn
delete white
delete breuschpagan
freeze(white) eq_ols.hettest(type=white)
freeze(breuschpagan) eq_ols.hettest(BPG) @regs
' OLS with robust (White) standard errors
eq_olsrob.ls(h) cigs c lincome lcigpric educ age age^2 restaurn
' FGLS: regress log squared residuals on the regressors, h = exp(fitted value)
forecast olsf
genr olsres=cigs-olsf
equation eq_logu2
genr logu2=log(olsres^2)
eq_logu2.ls logu2 c lincome lcigpric educ age age^2 restaurn
forecast logu2f
genr h=exp(logu2f)
genr sqrth=h^(1/2)
equation eq_fgls
equation eq_fgls2
' weighted estimation, and the same model written out on transformed variables
eq_fgls.ls(w=1/(h)^(1/2)) cigs c lincome lcigpric educ age age^2 restaurn
eq_fgls2.ls cigs/sqrth 1/sqrth lincome/sqrth lcigpric/sqrth educ/sqrth age/sqrth age^2/sqrth restaurn/sqrth
Multicollinearity
Strong correlation among the regressors:
Individual effect is difficult to determine
Does not contradict the basic assumptions of linear regression
Perfect collinearity: functional relationship
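E.g. $y = \beta_1 x_1 + \beta_2 x_2 + u$, $x_2 = a x_1$
$y = (\beta_1 + a\beta_2) x_1 + u$, so only $\beta_1 + a\beta_2$ can be estimated, not $\beta_1$ and $\beta_2$ separately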
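A tiny numerical illustration of perfect collinearity, with made-up values ($a = 3$, $x_1 = 1, \dots, 10$): the regressor matrix has rank 1, and different coefficient pairs with the same $\beta_1 + a\beta_2$ produce identical fitted values.

```python
import numpy as np

x1 = np.arange(1.0, 11.0)
a = 3.0                        # assumed proportionality factor
x2 = a * x1                    # exact functional relationship
X = np.column_stack([x1, x2])

rank = np.linalg.matrix_rank(X)    # 1 instead of 2, so X'X is singular

def fitted(b1, b2):
    return b1 * x1 + b2 * x2

# Both pairs satisfy b1 + a * b2 = 5, so the data cannot tell them apart.
same_fit = np.allclose(fitted(2.0, 1.0), fitted(5.0, 0.0))
print(rank, same_fit)
```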
Consequences, solutions
The estimated coefficient is sensitive to the inclusion and exclusion of variables
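The variance of the estimated coefficient might increase:
$$\mathrm{Var}(\hat\beta_i) = \frac{\sigma^2}{RSS_i} = \frac{\sigma^2}{S_{ii}(1 - R_i^2)}$$
$S_{ii}$: total sum of squares of $x_i$ around its mean; $R_i^2$ and $RSS_i$: coefficient of determination and residual sum of squares from the regression of $x_i$ on the other explanatory variables
Large if the variance of the error term is large, or $S_{ii}$ is small, or $R_i^2$ is large (multicollinearity is neither necessary nor sufficient for a large variance)
Potential solutions:
Exclude variables: the variance decreases, but the estimator may become biased!
Collect additional data (greater variance in x)
“Merge” variables (e.g. ratio)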
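The variance formula can be compared with the diagonal of $\sigma^2 (X'X)^{-1}$, and the effect of a large $R_i^2$ can be seen by making one regressor nearly a copy of another. A sketch with simulated regressors; the function name and the data are assumptions for illustration.

```python
import numpy as np

def coef_variances(X, sigma2):
    """Var(beta_i) = sigma2 / (S_ii * (1 - R_i^2)) for each column of X.

    X holds the regressors WITHOUT the constant column.
    """
    n, k = X.shape
    out = np.empty(k)
    for i in range(k):
        xi = X[:, i]
        others = np.column_stack([np.ones(n), np.delete(X, i, axis=1)])
        g, *_ = np.linalg.lstsq(others, xi, rcond=None)
        rss_i = np.sum((xi - others @ g) ** 2)    # RSS of x_i on the other regressors
        s_ii = np.sum((xi - xi.mean()) ** 2)
        r2_i = 1.0 - rss_i / s_ii
        out[i] = sigma2 / (s_ii * (1.0 - r2_i))
    return out

rng = np.random.default_rng(4)
n = 500
x1 = rng.normal(size=n)
x2_indep = rng.normal(size=n)                     # unrelated to x1
x2_coll = x1 + 0.1 * rng.normal(size=n)           # nearly a copy of x1
var_indep = coef_variances(np.column_stack([x1, x2_indep]), sigma2=1.0)
var_coll = coef_variances(np.column_stack([x1, x2_coll]), sigma2=1.0)
print(var_indep, var_coll)
```

In this setup the variance of the coefficient of `x1` is far larger in the collinear design, although the error variance and the sample size are unchanged.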
Endogeneity
Endogeneity: the error term is correlated with the regressor:
$Y_i = \alpha + \beta X_i + u_i$, $E(u_i \mid X_i) \neq 0$
Consequence: the OLS estimator of β is biased and inconsistent
Possible causes of endogeneity
Omitted variable (u includes something that is correlated with X)
Simultaneity (not only X influences Y, but also Y influences X: Y changes due to u, which also
affects X)
E.g. supply-demand models
Self selection in treatment analysis: “treatment”
(e.g. inclusion in a program) is not independent of the error term
Effect of support to firms on their profitability
Etc.
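The omitted-variable case can be illustrated with a small simulation. All numbers below are assumptions chosen for the example: the omitted variable $w$ enters the error term with coefficient $\gamma = 1$, and $x = w + \text{noise}$, so $\mathrm{Cov}(x, w) / \mathrm{Var}(x) = 1/2$ and the OLS slope converges to $\beta + \gamma/2$ instead of $\beta$.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100_000
alpha, beta, gamma = 1.0, 2.0, 1.0

w = rng.normal(size=n)                  # omitted variable
x = w + rng.normal(size=n)              # regressor correlated with w
u = gamma * w + rng.normal(size=n)      # error term absorbs the omitted variable
y = alpha + beta * x + u

X = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
beta_hat = coef[1]

# Probability limit of the OLS slope: beta + gamma * Cov(x, w) / Var(x).
plim = beta + gamma * 1.0 / 2.0
print(beta_hat, plim)
```

Even with a very large sample the estimate stays near the biased limit rather than near the true $\beta$, which is what inconsistency means here.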
Summary
Homeworks
Exam task types
Interpret outputs of a regression
Theoretical questions
How can you identify the outliers, what do you do if the sample includes outliers?
Gauss–Markov theorem
What does the standard error of the forecast depend on in the simple regression case?
Questions with brief answers
True/false statements
Seminar
Heteroscedasticity, multicollinearity
Maddala: 5/7, 5/8, 7/1, 7/3
Wooldridge: 8.1, 8.2, 8.3, 8.7, 8.9, (3.7, 3.11)
Discussion
Testing and handling heteroscedasticity
Multicollinearity – is it really a “problem”?
Data
Model of health expenditures (HRS or SHARE subsample):
Test heteroscedasticity
Multicollinearity: different income or asset indicators jointly included in the regression