Authors: Péter Elek, Anikó Bíró Supervised by: Péter Elek

ECONOMETRICS

Sponsored by a Grant TÁMOP-4.1.2-08/2/A/KMR-2009-0041

Course Material Developed by Department of Economics, Faculty of Social Sciences, Eötvös Loránd University Budapest (ELTE)

Department of Economics, Eötvös Loránd University Budapest

Institute of Economics, Hungarian Academy of Sciences

Balassi Kiadó, Budapest

June 2010

ELTE Faculty of Social Sciences, Department of Economics

Week 8.

Heteroscedasticity, multicollinearity

Péter Elek, Anikó Bíró

Concept of heteroscedasticity

Tests

Consequences

Solutions

Multicollinearity – definition, consequences

Briefly: endogeneity

Heteroscedasticity

Assumption of basic model

Var(u_i) = σ² for all i – homoscedasticity

Variance of error terms not constant

Var(u_i) = σ_i² for all i – heteroscedasticity

Heteroscedasticity – example

Consumption model (data: SHARE, 2004, Germany – food expenditures)

$$C_i = \beta_0 + \beta_1 \, \mathit{Inc}_i + \beta_2 \, \mathit{Th\_Wealth}_i + u_i$$

$$\hat{C}_i = 379.6 + 0.02 \, \mathit{Inc}_i + 0.007 \, \mathit{Th\_Wealth}_i$$

Distribution of residuals as a function of income

[Figure: residuals (RESID01, vertical axis) plotted against income (INC, horizontal axis)]

Example, cont.

Alternative model

$$\log C_i = \beta_0 + \beta_1 \log \mathit{Inc}_i + \beta_2 \log \mathit{Th\_Wealth}_i + u_i$$

$$\widehat{\log C_i} = 4.63 + 0.15 \log \mathit{Inc}_i + 0.05 \log \mathit{Th\_Wealth}_i$$

Distribution of residuals as a function of income

[Figure: residuals (RESID02, vertical axis) plotted against income (INC, horizontal axis)]

Tests I: White-test

Question: Is there a systematic factor in the variance of residuals?

White-test: regression of û_i² on the explanatory variables, their squares, and cross products

→ F- or chi-squared test of the significance of the coefficients
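As a minimal sketch of the auxiliary-regression idea, assuming a univariate model and simulated data, the following computes the nR² (chi-squared) form of the test statistic. The helper names `ols`, `r_squared` and `white_lm` are hypothetical and not part of the course material. With one regressor the auxiliary regression contains x and x², so the statistic is compared with a chi-squared critical value with 2 degrees of freedom (5.99 at the 5% level).

```python
import random

def ols(X, y):
    """OLS coefficients from the normal equations (Gauss-Jordan elimination).
    X is a list of rows; include a column of ones for the constant."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(row[a] * row[b] for row in X) for b in range(k)]
         + [sum(row[a] * yi for row, yi in zip(X, y))] for a in range(k)]
    for col in range(k):
        piv = max(range(col, k), key=lambda i: abs(A[i][col]))
        A[col], A[piv] = A[piv], A[col]
        for i in range(k):
            if i != col:
                f = A[i][col] / A[col][col]
                A[i] = [vi - f * vc for vi, vc in zip(A[i], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

def r_squared(X, y, beta):
    """Coefficient of determination of a fitted regression."""
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in X]
    ybar = sum(y) / len(y)
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    tss = sum((yi - ybar) ** 2 for yi in y)
    return 1 - rss / tss

def white_lm(x, y):
    """n * R^2 from regressing squared OLS residuals on 1, x and x^2."""
    n = len(x)
    X = [[1.0, xi] for xi in x]
    a, b = ols(X, y)
    u2 = [(yi - a - b * xi) ** 2 for xi, yi in zip(x, y)]
    Z = [[1.0, xi, xi * xi] for xi in x]
    return n * r_squared(Z, u2, ols(Z, u2))

# Simulated data (an assumption made for illustration only)
random.seed(0)
x = [random.uniform(1, 10) for _ in range(500)]
y_het = [1 + 2 * xi + random.gauss(0, xi) for xi in x]  # error s.d. grows with x
y_hom = [1 + 2 * xi + random.gauss(0, 3) for xi in x]   # constant error s.d.
lm_het = white_lm(x, y_het)
lm_hom = white_lm(x, y_hom)
```

In the heteroscedastic sample the statistic should lie far above the critical value, while in the homoscedastic sample it behaves like a draw from the chi-squared distribution with 2 degrees of freedom.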

Tests II: Breusch–Pagan test

Auxiliary regression: regression of û_i² on z₁, ..., z_k, the explanatory variables that are thought to influence the variance

S₀: explained sum of squares (ESS) in the auxiliary regression

R²: coefficient of determination in the auxiliary regression

σ̂²: estimated variance of the (original!) error terms

LM-test of the joint significance of the auxiliary regression, which has an alternative formula if u is normally distributed. Under homoscedasticity, asymptotically:

$$nR^2 \sim \chi^2_k \quad \left(\text{and } \frac{S_0}{2\hat{\sigma}^4} \sim \chi^2_k \text{ if } u \text{ is normal}\right)$$

Consequence I: usual standard error estimate is not valid

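Univariate model

y_i = α + βx_i + u_i, Var(u_i) = σ_i²

$$\hat{\beta} = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2} = \beta + \frac{\sum_i (x_i - \bar{x}) u_i}{\sum_i (x_i - \bar{x})^2}$$

→ Unbiased (E(u_i) = 0; x_i, u_i independent)

$$\operatorname{Var}(\hat{\beta}) = \operatorname{Var}\left(\frac{\sum_i (x_i - \bar{x}) u_i}{\sum_i (x_i - \bar{x})^2}\right) = \frac{\sum_i (x_i - \bar{x})^2 \sigma_i^2}{\left(\sum_i (x_i - \bar{x})^2\right)^2}$$

In case of homoscedasticity:

$$\operatorname{Var}(\hat{\beta}) = \frac{\sigma^2}{\sum_i (x_i - \bar{x})^2}$$

The homoscedastic formula gives a biased variance estimate if heteroscedasticity is present!

– Usual tests are not applicable.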

Consequence II: OLS not efficient

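Example: y_i = βx_i + u_i with σ_i² = σ²z_i²

Weighted (homoscedastic) model:

$$\frac{y_i}{z_i} = \beta \frac{x_i}{z_i} + v_i$$

New model OLS (WLS):

$$V(\beta^*) = V\left(\frac{\sum_i (x_i/z_i)\, v_i}{\sum_i (x_i/z_i)^2}\right) = \frac{\sigma^2}{\sum_i (x_i/z_i)^2}$$

Original model OLS:

$$V(\hat{\beta}) = V\left(\frac{\sum_i x_i u_i}{\sum_i x_i^2}\right) = \frac{\sigma^2 \sum_i x_i^2 z_i^2}{\left(\sum_i x_i^2\right)^2}$$

$$\frac{V(\beta^*)}{V(\hat{\beta})} = \frac{\left(\sum_i x_i^2\right)^2}{\sum_i (x_i/z_i)^2 \, \sum_i x_i^2 z_i^2} \le 1$$

Cauchy–Schwarz:

$$\left(\sum_i a_i b_i\right)^2 \le \sum_i a_i^2 \sum_i b_i^2, \quad a_i = x_i z_i, \; b_i = x_i / z_i$$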
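The efficiency comparison on this slide can be checked numerically. The sketch below assumes the no-intercept model y_i = βx_i + u_i with Var(u_i) = σ²z_i² and evaluates the two variance formulas, V(β̂) = σ²Σx_i²z_i² / (Σx_i²)² for OLS and V(β*) = σ² / Σ(x_i/z_i)² for WLS. The function names and the data are made up for illustration.

```python
def var_ols(x, z, sigma2=1.0):
    # V(beta_hat) = sigma^2 * sum(x_i^2 z_i^2) / (sum(x_i^2))^2
    return (sigma2 * sum((xi * zi) ** 2 for xi, zi in zip(x, z))
            / sum(xi ** 2 for xi in x) ** 2)

def var_wls(x, z, sigma2=1.0):
    # V(beta_star) = sigma^2 / sum((x_i / z_i)^2)
    return sigma2 / sum((xi / zi) ** 2 for xi, zi in zip(x, z))

x = [1.0, 2.0, 3.0, 4.0, 5.0]

# Error standard deviation proportional to x (z_i = x_i): WLS is strictly better
z_het = [1.0, 2.0, 3.0, 4.0, 5.0]
ratio_het = var_wls(x, z_het) / var_ols(x, z_het)

# Constant z (homoscedasticity): the two variances coincide
z_hom = [2.0] * 5
ratio_hom = var_wls(x, z_hom) / var_ols(x, z_hom)
```

By the Cauchy–Schwarz inequality the ratio can never exceed 1, and it equals 1 only when z_i² is constant across observations.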

Solution 1 – White SE

Heteroscedasticity robust estimator of the variance of estimated coefficients:

Univariate model:

$$\widehat{\operatorname{Var}}(\hat{\beta}) = \frac{\sum_i (x_i - \bar{x})^2 \hat{u}_i^2}{S_{xx}^2}, \quad S_{xx} = \sum_i (x_i - \bar{x})^2$$

Multivariate model:

$$\widehat{\operatorname{Var}}(\hat{\beta}_j) = \frac{\sum_i \hat{r}_{ij}^2 \hat{u}_i^2}{RSS_j^2}$$

r̂_ij: residual from the regression of x_j on the other explanatory variables

RSS_j: residual sum of squares from the same regression

t-test:

$$t = \frac{\hat{\beta} - \beta_0}{\text{robust SE}}$$

Asymptotic t-distribution: applicable only for large samples
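For the univariate case, the usual and the heteroscedasticity-robust standard errors can be computed side by side. This is a sketch under the assumption of a simple regression with a constant; the function name `ols_with_se` and the five data points are invented for illustration.

```python
import math

def ols_with_se(x, y):
    """Slope estimate with the usual and the White standard error."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    beta = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    alpha = ybar - beta * xbar
    resid = [yi - alpha - beta * xi for xi, yi in zip(x, y)]
    # Usual estimate: sigma_hat^2 / S_xx, with sigma_hat^2 = RSS / (n - 2)
    se_usual = math.sqrt(sum(u ** 2 for u in resid) / (n - 2) / sxx)
    # White estimate: sum (x_i - xbar)^2 * u_hat_i^2 / S_xx^2
    se_white = math.sqrt(
        sum((xi - xbar) ** 2 * u ** 2 for xi, u in zip(x, resid)) / sxx ** 2
    )
    return beta, se_usual, se_white

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [4.0, 3.0, 9.0, 7.0, 12.0]
beta_hat, se_usual, se_white = ols_with_se(x, y)
```

The two standard errors generally differ, and the robust one can be smaller or larger than the usual one depending on how the squared residuals line up with (x_i − x̄)².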

Solution 2 – WLS

Weighted least squares (WLS)

$$y_i = \alpha + \beta x_i + u_i, \quad V(u_i) = \sigma^2 z_i^2$$

$$\frac{y_i}{z_i} = \alpha \frac{1}{z_i} + \beta \frac{x_i}{z_i} + v_i, \quad v_i = \frac{u_i}{z_i}, \quad V(v_i) = \sigma^2$$

The latter equation can be estimated with OLS, which is equivalent to minimizing the weighted sum

$$\min_{\hat{\alpha}, \hat{\beta}} \sum_{i=1}^{n} \frac{1}{z_i^2} \left(y_i - \hat{\alpha} - \hat{\beta} x_i\right)^2$$

If the variance is well specified then

– WLS is more efficient than the simple OLS (moreover, it is BLUE),

– and the tests have t- and F-distributions also for small samples.

Examples: WLS

y_i = α + βx_i + u_i

1st common case: Var(u_i) = σ²x_i²

y_i/x_i = α/x_i + β + u_i/x_i can be estimated with OLS

2nd common case: Var(u_i) = σ²x_i

y_i/x_i^(1/2) = α/x_i^(1/2) + βx_i^(1/2) + u_i/x_i^(1/2) can be estimated with OLS

Transforming the explanatory variable (e.g. taking its logarithm) often solves the problem of heteroscedasticity.
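The first common case, Var(u_i) = σ²x_i², can be sketched as follows. Dividing the equation by x_i turns it into a simple regression of y_i/x_i on 1/x_i, in which the intercept estimates β and the slope estimates α. The function names and the noise-free example data are hypothetical and chosen only so that the result is easy to check.

```python
def simple_ols(x, y):
    """Return (intercept, slope) of a simple regression of y on x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    slope = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
             / sum((xi - xbar) ** 2 for xi in x))
    return ybar - slope * xbar, slope

def wls_var_prop_x2(x, y):
    """WLS for y = alpha + beta*x + u when Var(u) is proportional to x^2."""
    y_star = [yi / xi for xi, yi in zip(x, y)]  # y_i / x_i
    x_star = [1.0 / xi for xi in x]             # 1 / x_i
    # In the transformed equation the roles swap:
    # the intercept estimates beta, the slope estimates alpha.
    beta, alpha = simple_ols(x_star, y_star)
    return alpha, beta

x = [1.0, 2.0, 4.0, 5.0, 10.0]
y = [2 + 3 * xi for xi in x]  # noise-free data with alpha = 2, beta = 3
alpha_hat, beta_hat = wls_var_prop_x2(x, y)
```

With noise-free data the transformed regression recovers α = 2 and β = 3 up to rounding, which confirms that the coefficients were mapped back correctly.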

Solution 3: TWLS, FGLS

TWLS: two-step weighted least squares

FGLS: feasible generalized least squares

Steps

1. $y_i = \beta_0 + \beta_1 x_{1i} + \dots + \beta_k x_{ki} + u_i$ is estimated with OLS

2. Generate $\hat{u}_i$, the estimated error terms

3. Specify the variance, e.g. $\sigma^2(u_i) = \exp(\delta_0 + \delta_1 x_{1i} + \dots)$

4. Regress $\log \hat{u}_i^2$ on a constant and the variables $x_{1i}, \dots, x_{ki}$

5. $\hat{g}_i$ are the fitted values, $\hat{h}_i = \exp(\hat{g}_i)$

6. Estimate the original equation with WLS and weights $1/\hat{h}_i$

Properties of FGLS

Since we are estimating the weights, the estimator is not unbiased.

But it is consistent and asymptotically more efficient than OLS.

If we think that we did not specify the variance perfectly, then we should use the White standard errors.
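The FGLS steps (OLS, log squared residuals regressed on the regressors, exponentiated fitted values used as variance estimates, then WLS) can be sketched for a single regressor as follows. The data are simulated with Var(u|x) = exp(0.3x), and the names `wls` and `fgls` are hypothetical; this is an illustration under those assumptions, not the course's own implementation.

```python
import math
import random

def wls(x, y, w):
    """Minimise sum w_i * (y_i - a - b*x_i)^2; w_i = 1 for all i gives OLS."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    b = (sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
         / sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x)))
    return ybar - b * xbar, b

def fgls(x, y):
    ones = [1.0] * len(x)
    # Step 1: OLS on the original equation
    a, b = wls(x, y, ones)
    # Steps 2-4: log of squared residuals regressed on a constant and x
    log_u2 = [math.log((yi - a - b * xi) ** 2) for xi, yi in zip(x, y)]
    d0, d1 = wls(x, log_u2, ones)
    # Step 5: h_i = exp(fitted value)
    h = [math.exp(d0 + d1 * xi) for xi in x]
    # Step 6: WLS on the original equation with weights 1 / h_i
    return wls(x, y, [1.0 / hi for hi in h])

# Simulated data (assumption): y = 1 + 2x + u, s.d.(u | x) = exp(0.15 x)
random.seed(1)
x = [random.uniform(1, 10) for _ in range(2000)]
y = [1 + 2 * xi + random.gauss(0, math.exp(0.15 * xi)) for xi in x]
alpha_fgls, beta_fgls = fgls(x, y)
```

The weights here multiply the squared residuals, so 1/ĥ_i is used directly. Software that instead multiplies the variables of the equation by the weight series expects 1/√ĥ_i.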

Example: analysis of the determinants of smoking

Data (source: Wooldridge)

CIGS: number of cigarettes smoked daily

INCOME: annual income

CIGPRIC: price of a pack of cigarettes (cent)

EDUC: number of years in education

AGE

RESTAURN: whether there are laws in the given state that restrict smoking in restaurants

OLS with usual and robust standard errors

Tests

FGLS estimation

Eviews program

' OLS estimation, then White and Breusch-Pagan tests saved as tables
equation eq_ols
equation eq_olsrob
eq_ols.ls cigs c lincome lcigpric educ age age^2 restaurn
delete white
delete breuschpagan
freeze(white) eq_ols.hettest(type=white)
freeze(breuschpagan) eq_ols.hettest(BPG) @regs

' OLS with heteroscedasticity-robust (White) standard errors
eq_olsrob.ls(h) cigs c lincome lcigpric educ age age^2 restaurn
forecast olsf
genr olsres=cigs-olsf

' Regression of the log squared residuals, fitted values give the variance estimate h
equation eq_logu2
genr logu2=log(olsres^2)
eq_logu2.ls logu2 c lincome lcigpric educ age age^2 restaurn
forecast logu2f
genr h=exp(logu2f)
genr sqrth=h^(1/2)

' FGLS with the weight option, and the same regression with manually transformed variables
equation eq_fgls
equation eq_fgls2
eq_fgls.ls(w=1/(h)^(1/2)) cigs c lincome lcigpric educ age age^2 restaurn
eq_fgls2.ls cigs/sqrth 1/sqrth lincome/sqrth lcigpric/sqrth educ/sqrth age/sqrth age^2/sqrth restaurn/sqrth

Multicollinearity

Strong correlation among the regressors:

Individual effect is difficult to determine

Does not contradict the basic assumptions of linear regression

Perfect collinearity: functional relationship

E.g. y = β₁x₁+ β₂x₂+ u, x₂= ax₁

y = (β₁+ aβ₂) x₁+ u
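A short numerical illustration of why β₁ and β₂ cannot be separated under perfect collinearity: when x₂ = ax₁, the matrix X'X of the normal equations is singular, so the OLS solution is not unique. The sketch assumes the two-regressor model without a constant shown above; the function name and the data are hypothetical.

```python
def normal_matrix_det(x1, x2):
    """Determinant of X'X for the model y = b1*x1 + b2*x2 + u (no constant)."""
    s11 = sum(v * v for v in x1)
    s22 = sum(v * v for v in x2)
    s12 = sum(u * v for u, v in zip(x1, x2))
    return s11 * s22 - s12 * s12

x1 = [1, 2, 3, 4, 5]
a = 3
x2_collinear = [a * v for v in x1]  # exact functional relationship x2 = a * x1
x2_free = [2, 1, 4, 3, 6]           # correlated with x1, but not perfectly

det_collinear = normal_matrix_det(x1, x2_collinear)
det_free = normal_matrix_det(x1, x2_free)
```

A zero determinant means the normal equations have infinitely many solutions; only the combination β₁ + aβ₂ is identified.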

Consequences, solutions

The estimated coefficient is sensitive to the inclusion and exclusion of variables

The variance of the estimated coefficient might increase:

$$\operatorname{Var}(\hat{\beta}_i) = \frac{\sigma^2}{RSS_i} = \frac{\sigma^2}{S_{ii}(1 - R_i^2)}$$

Large if the variance of the error term is large, or S_ii is small, or R_i² is large (multicollinearity is neither necessary nor sufficient for a large variance)

Potential solutions:

Exclude variables: the variance decreases, but the estimator may become biased!

Collect additional data (greater variance in x)

“Merge” variables (e.g. ratio)
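The variance formula Var(β̂_i) = σ² / (S_ii(1 − R_i²)) can be evaluated directly to see how the three ingredients trade off. The function name and the numbers below are assumptions chosen for illustration.

```python
def coef_variance(sigma2, s_ii, r2_i):
    """Var(beta_hat_i) = sigma^2 / (S_ii * (1 - R_i^2)).

    sigma2: variance of the error term
    s_ii:   sum of squared deviations of x_i from its mean
    r2_i:   R^2 from regressing x_i on the other regressors
    """
    if not 0 <= r2_i < 1:
        raise ValueError("R_i^2 must lie in [0, 1)")
    return sigma2 / (s_ii * (1.0 - r2_i))

v_no_collinearity = coef_variance(4.0, 100.0, 0.0)   # uncorrelated regressors
v_collinear = coef_variance(4.0, 100.0, 0.9)         # strong multicollinearity
v_collinear_more_data = coef_variance(4.0, 1000.0, 0.9)  # same R_i^2, larger S_ii
```

In this example a tenfold larger S_ii fully offsets R_i² = 0.9, which illustrates why a high R_i² alone does not imply an imprecise estimate.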

Endogeneity

Endogeneity: the error term is correlated with the regressor:

Y_i = α + βX_i + u_i

E(u_i | X_i) ≠ 0

Consequence: OLS estimator of β is biased and inconsistent
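A small simulation can make the inconsistency concrete. In the sketch below, a variable w enters both the regressor and the error term, so Cov(X, u) = 1 and Var(X) = 2, and the OLS slope converges to β + Cov(X, u)/Var(X) = 2.5 rather than to the true β = 2. The data-generating process and the names are assumptions made for illustration.

```python
import random

def ols_slope(x, y):
    """Slope of a simple regression of y on x (with a constant)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    return (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
            / sum((xi - xbar) ** 2 for xi in x))

random.seed(2)
n = 5000
w = [random.gauss(0, 1) for _ in range(n)]   # common factor
x = [wi + random.gauss(0, 1) for wi in w]    # regressor depends on w
u = [wi + random.gauss(0, 1) for wi in w]    # error term also contains w
y = [1 + 2 * xi + ui for xi, ui in zip(x, u)]  # true beta = 2

beta_hat = ols_slope(x, y)
```

Increasing n does not help here: the estimate settles near 2.5, not near 2.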

Possible causes of endogeneity

Omitted variable (u includes something that is correlated with X)

Simultaneity (not only X influences Y, but also Y influences X: Y changes due to u, which also affects X)

E.g. supply-demand models

Self selection in treatment analysis: “treatment” (e.g. inclusion in a program) is not independent of the error term

E.g. effect of support to firms on their profitability

Etc.

Summary

Homeworks

Exam task types

Interpret outputs of a regression

Theoretical questions

How can you identify the outliers, what do you do if the sample includes outliers?

Gauss–Markov theorem

What does the standard error of the forecast depend on in the simple regression case?

Questions with brief answers

True/false statements

Seminar

Heteroscedasticity, multicollinearity

Maddala: 5/7, 5/8, 7/1, 7/3

Wooldridge: 8.1, 8.2, 8.3, 8.7, 8.9, (3.7, 3.11)

Discussion

Testing and handling heteroscedasticity

Multicollinearity – is it really a “problem”?

Data

Model of health expenditures (HRS or SHARE subsample):

Test heteroscedasticity

Multicollinearity: different income or asset indicators jointly included in the regression
