
Heteroscedasticity, multicollinearity


Academic year: 2022


ECONOMETRICS

Sponsored by a Grant TÁMOP-4.1.2-08/2/A/KMR-2009-0041

Course Material Developed by Department of Economics, Faculty of Social Sciences, Eötvös Loránd University Budapest (ELTE)

Department of Economics, Eötvös Loránd University Budapest

Institute of Economics, Hungarian Academy of Sciences

Balassi Kiadó, Budapest

Authors: Péter Elek, Anikó Bíró

Supervised by Péter Elek

June 2010

Week 8

Heteroscedasticity, multicollinearity

Concept of heteroscedasticity

Tests

Consequences

Solutions

Multicollinearity – definition, consequences

Briefly: endogeneity

Heteroscedasticity

Assumption of basic model

$\mathrm{Var}(u_i) = \sigma^2$ for all $i$ – homoscedasticity

Variance of error terms not constant

$\mathrm{Var}(u_i) = \sigma_i^2$ for all $i$ – heteroscedasticity

Heteroscedasticity – example

Consumption model (data: SHARE, 2004, Germany – food expenditures)

$C_i = \beta_0 + \beta_1 \mathit{Inc}_i + \beta_2 \mathit{Th\_Wealth}_i + u_i$

$\hat{C}_i = 379.6 + 0.02\,\mathit{Inc}_i + 0.007\,\mathit{Th\_Wealth}_i$

Distribution of residuals as a function of income

[Figure: residuals RESID01 (vertical axis, -1,000 to 2,000) plotted against INC (horizontal axis, 0 to 40,000)]

Example, cont.

Alternative model

$\log C_i = \beta_0 + \beta_1 \log \mathit{Inc}_i + \beta_2 \log \mathit{Th\_Wealth}_i + u_i$

$\log \hat{C}_i = 4.63 + 0.15 \log \mathit{Inc}_i + 0.05 \log \mathit{Th\_Wealth}_i$

Distribution of residuals as a function of income

[Figure: residuals RESID02 (vertical axis, -2.0 to 1.6) plotted against INC (horizontal axis, 0 to 40,000)]

Tests I: White-test

Question: Is there a systematic factor in the variance of residuals?

White test: regression of $\hat{u}_i^2$ on the explanatory variables, their squares, and cross products

→ F- or chi-squared test of the joint significance of the coefficients
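The White test described above can be sketched in a few lines. This is only an illustration in Python with NumPy (the course itself uses EViews), for a single regressor and simulated data; the helper names `ols` and `white_lm`, the sample size and the data-generating process are all assumptions made for the example. The chi-squared form $nR^2$ of the test is used.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients and residuals for a design matrix X that contains a constant."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta

def white_lm(x, y):
    """n * R^2 from regressing squared OLS residuals on x and x^2 (one regressor)."""
    n = len(y)
    _, u = ols(np.column_stack([np.ones(n), x]), y)
    u2 = u ** 2
    Z = np.column_stack([np.ones(n), x, x ** 2])   # regressor and its square
    _, e = ols(Z, u2)
    r2 = 1.0 - (e @ e) / np.sum((u2 - u2.mean()) ** 2)
    return n * r2

rng = np.random.default_rng(0)
n = 2000
x = rng.uniform(1.0, 10.0, n)
y_het = 1.0 + 2.0 * x + x * rng.normal(size=n)   # error s.d. grows with x
y_hom = 1.0 + 2.0 * x + rng.normal(size=n)       # constant error variance

lm_het = white_lm(x, y_het)
lm_hom = white_lm(x, y_hom)
CHI2_2_CRIT_5PCT = 5.991   # 5% critical value, chi-squared with 2 degrees of freedom
print(lm_het, lm_hom)
```

With two auxiliary regressors besides the constant, the statistic is compared with a chi-squared distribution with 2 degrees of freedom; the heteroscedastic sample should exceed the critical value by a wide margin.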

Tests II: Breusch–Pagan test

Auxiliary regression: regression of $\hat{u}_i^2$ on the explanatory variables $z_1, \dots, z_k$ (which are thought to influence the variance)

$S_0$: explained sum of squares (ESS) in the auxiliary regression

$R^2$: coefficient of determination in the auxiliary regression

$\hat{\sigma}^2$: estimated variance of the (original!) error terms

LM test of the applicability of the auxiliary regression, which has an alternative formula if $u$ is normally distributed:

$nR^2 \sim \chi^2_k$ and (if $u$ is normal) $\dfrac{S_0}{2\hat{\sigma}^4} \sim \chi^2_k$
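Both forms of the Breusch–Pagan statistic can be computed from the same auxiliary regression. The sketch below uses Python with NumPy and simulated data; the function name `breusch_pagan`, the choice $z = x$ and the use of $\hat{\sigma}^2 = \frac{1}{n}\sum \hat{u}_i^2$ are assumptions made for the illustration.

```python
import numpy as np

def breusch_pagan(y, X, Z):
    """Return (n*R^2, S0 / (2*sigma_hat^4)) for the auxiliary regression of u_hat^2 on Z.

    X and Z are design matrices that both include a constant column.
    """
    n = len(y)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    u2 = (y - X @ beta) ** 2                      # squared OLS residuals
    gamma, *_ = np.linalg.lstsq(Z, u2, rcond=None)
    fitted = Z @ gamma
    tss = np.sum((u2 - u2.mean()) ** 2)
    s0 = np.sum((fitted - u2.mean()) ** 2)        # explained sum of squares (ESS)
    sigma2_hat = u2.mean()                        # estimated variance of the original errors
    return n * s0 / tss, s0 / (2.0 * sigma2_hat ** 2)

rng = np.random.default_rng(1)
n = 2000
x = rng.uniform(1.0, 10.0, n)
y = 1.0 + 2.0 * x + x * rng.normal(size=n)        # error s.d. proportional to x
X = np.column_stack([np.ones(n), x])

lm_nr2, lm_normal = breusch_pagan(y, X, X)        # here z = x, so k = 1
CHI2_1_CRIT_5PCT = 3.841   # 5% critical value, chi-squared with 1 degree of freedom
print(lm_nr2, lm_normal)
```

Under homoscedastic normal errors the two statistics are close to each other, because the total variation of $\hat{u}_i^2$ is then roughly $2n\hat{\sigma}^4$.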

Consequence I: usual standard error estimate is not valid

Univariate model

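$y_i = \alpha + \beta x_i + u_i$, $\mathrm{Var}(u_i) = \sigma_i^2$

$\hat{\beta} = \dfrac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \beta + \dfrac{\sum (x_i - \bar{x}) u_i}{\sum (x_i - \bar{x})^2}$ → Unbiased ($E(u_i) = 0$; $x_i$, $u_i$ independent)

$\mathrm{Var}(\hat{\beta}) = \dfrac{\mathrm{Var}\left(\sum (x_i - \bar{x}) u_i\right)}{\left(\sum (x_i - \bar{x})^2\right)^2} = \dfrac{\sum (x_i - \bar{x})^2 \sigma_i^2}{\left(\sum (x_i - \bar{x})^2\right)^2}$

In case of homoscedasticity:

$\mathrm{Var}(\hat{\beta}) = \dfrac{\sigma^2}{\sum (x_i - \bar{x})^2}$

The homoscedastic formula gives a biased variance estimate if heteroscedasticity is present! – Usual tests are not applicable.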

Consequence II: OLS not efficient

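Example: $\sigma_i^2 = \sigma^2 z_i^2$

Weighted (homoscedastic) model: $\dfrac{y_i}{z_i} = \beta \dfrac{x_i}{z_i} + v_i$, where $v_i = u_i / z_i$ (for simplicity, the original model is taken without intercept: $y_i = \beta x_i + u_i$)

New model, OLS (WLS): $V(\beta^*) = \dfrac{\sigma^2}{\sum (x_i / z_i)^2}$

Original model, OLS: $V(\hat{\beta}) = \dfrac{\sigma^2 \sum x_i^2 z_i^2}{\left(\sum x_i^2\right)^2}$

$\dfrac{V(\beta^*)}{V(\hat{\beta})} = \dfrac{\left(\sum x_i^2\right)^2}{\sum (x_i / z_i)^2 \, \sum x_i^2 z_i^2} \le 1$

Cauchy–Schwarz: $\left(\sum a_i b_i\right)^2 \le \sum a_i^2 \sum b_i^2$ (applied with $a_i = x_i z_i$, $b_i = x_i / z_i$)

Solution 1 – White SE

Heteroscedasticity robust estimator of the variance of estimated coefficients:

Univariate model:

$\widehat{\mathrm{Var}}(\hat{\beta}) = \dfrac{\sum (x_i - \bar{x})^2 \hat{u}_i^2}{S_{xx}^2}$, where $S_{xx} = \sum (x_i - \bar{x})^2$

Multivariate model:

$\widehat{\mathrm{Var}}(\hat{\beta}_j) = \dfrac{\sum \hat{r}_{ij}^2 \hat{u}_i^2}{RSS_j^2}$

$\hat{r}_{ij}$: residual from the regression of $x_j$ on the other explanatory variables

$RSS_j$: residual sum of squares from the same regression

t-test:

$t = \dfrac{\hat{\beta} - \beta_0}{\text{robust SE}}$

Asymptotic t-distribution: applicable only for large samples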
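The univariate robust formula can be compared with the usual standard error on simulated data. The following Python/NumPy sketch is an illustration only: the data-generating process (error standard deviation equal to $x_i^2$) and all variable names are assumptions chosen so that the difference is clearly visible.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20_000
x = rng.uniform(1.0, 10.0, n)
sigma_i = x ** 2                                  # assumed error s.d., strongly increasing in x
y = 1.0 + 2.0 * x + sigma_i * rng.normal(size=n)

dx = x - x.mean()
s_xx = dx @ dx
beta_hat = (dx @ (y - y.mean())) / s_xx           # OLS slope
alpha_hat = y.mean() - beta_hat * x.mean()        # OLS intercept
u_hat = y - alpha_hat - beta_hat * x              # OLS residuals

# Usual standard error: valid only under homoscedasticity
se_usual = np.sqrt((u_hat @ u_hat) / (n - 2) / s_xx)
# White (heteroscedasticity-robust) standard error: sqrt(sum (x_i - xbar)^2 u_hat_i^2) / S_xx
se_robust = np.sqrt(np.sum(dx ** 2 * u_hat ** 2)) / s_xx
# Standard error implied by the true variances, which are known only because the data are simulated
se_true = np.sqrt(np.sum(dx ** 2 * sigma_i ** 2)) / s_xx

print(se_usual, se_robust, se_true)
```

In this design the usual formula understates the sampling variability of the slope, while the robust estimate stays close to the value implied by the true variances.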

Solution 2 – WLS

Weighted least squares (WLS)

$y_i = \alpha + \beta x_i + u_i$, $V(u_i) = \sigma^2 z_i^2$

$\dfrac{y_i}{z_i} = \alpha \dfrac{1}{z_i} + \beta \dfrac{x_i}{z_i} + v_i$, $V(v_i) = \sigma^2$

The latter equation can be estimated with OLS, which is equivalent to minimizing the weighted sum

$\min_{\hat{\alpha}, \hat{\beta}} \sum_{i=1}^{n} \dfrac{1}{z_i^2} \left(y_i - \hat{\alpha} - \hat{\beta} x_i\right)^2$

If the variance is well specified then

– WLS is more efficient than the simple OLS (moreover, it is BLUE),

– and the tests have t- and F-distribution also for small samples.

Examples: WLS

$y_i = \alpha + \beta x_i + u_i$

1st common case: $\mathrm{Var}(u_i) = \sigma^2 x_i^2$

$y_i / x_i = \alpha / x_i + \beta + u_i / x_i$ can be estimated with OLS

2nd common case: $\mathrm{Var}(u_i) = \sigma^2 x_i$

$y_i / x_i^{1/2} = \alpha / x_i^{1/2} + \beta x_i^{1/2} + u_i / x_i^{1/2}$ can be estimated with OLS

Transforming the explanatory variable (e.g. taking its logarithm) often solves the problem of heteroscedasticity.
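The equivalence between OLS on the transformed equation and the weighted minimization can be checked numerically. The sketch below, in Python with NumPy, uses the first common case $\mathrm{Var}(u_i) = \sigma^2 x_i^2$ (so $z_i = x_i$); the simulated data and variable names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
x = rng.uniform(1.0, 10.0, n)
y = 3.0 + 0.5 * x + x * rng.normal(size=n)        # Var(u_i) = sigma^2 * x_i^2, so z_i = x_i

# (a) OLS on the transformed equation: y/x = alpha * (1/x) + beta + u/x
X_transformed = np.column_stack([1.0 / x, np.ones(n)])
coef_transformed, *_ = np.linalg.lstsq(X_transformed, y / x, rcond=None)   # [alpha, beta]

# (b) Direct minimization of sum (1/z_i^2) * (y_i - alpha - beta * x_i)^2
#     through the weighted normal equations (X' W X) b = X' W y
X = np.column_stack([np.ones(n), x])
w = 1.0 / x ** 2
coef_weighted = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))     # [alpha, beta]

print(coef_transformed, coef_weighted)
```

Both routes give the same estimates of $\alpha$ and $\beta$ up to rounding error, since the transformed regression is just the weighted problem written with each observation divided by $z_i$.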

Solution 3: TWLS, FGLS

TWLS: two-step weighted least squares

FGLS: feasible generalized least squares

Steps

1. Estimate $y_i = \beta_0 + \beta_1 x_{1i} + \dots + \beta_k x_{ki} + u_i$ with OLS

2. Generate the estimated error terms $\hat{u}_i$

3. Specify the variance, e.g. $\sigma^2(u_i) = \exp(\delta_0 + \delta_1 x_{1i} + \dots)$

4. Regress $\log \hat{u}_i^2$ on a constant and the variables $x_{1i}, \dots, x_{ki}$

5. $\hat{g}_i$ are the fitted values, $\hat{h}_i = \exp(\hat{g}_i)$

6. Estimate the original equation with WLS and weights $1 / \hat{h}_i$

Properties of FGLS

Since the weights are estimated, the estimator is not unbiased.

But it is consistent and asymptotically more efficient than OLS.

If we think that we did not specify the variance perfectly, then we should use the White standard errors.
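The six steps above can be sketched as follows in Python with NumPy. This is a minimal illustration under an assumed exponential variance function and simulated data; the function name `fgls` and the true parameter values are hypothetical.

```python
import numpy as np

def fgls(y, X):
    """Two-step FGLS with variance function exp(X @ delta); X includes a constant."""
    b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)               # step 1: OLS
    u_hat = y - X @ b_ols                                       # step 2: residuals
    delta, *_ = np.linalg.lstsq(X, np.log(u_hat ** 2), rcond=None)   # steps 3-4
    h_hat = np.exp(X @ delta)                                   # step 5: fitted variances
    sw = 1.0 / np.sqrt(h_hat)                                   # divide each row by sqrt(h_hat)
    b_fgls, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)   # step 6: WLS
    return b_ols, b_fgls, delta, h_hat

rng = np.random.default_rng(4)
n = 5000
x = rng.uniform(0.0, 5.0, n)
y = 1.0 + 2.0 * x + np.exp(0.4 * x) * rng.normal(size=n)       # Var(u_i) = exp(0.8 * x_i)
X = np.column_stack([np.ones(n), x])

b_ols, b_fgls, delta, h_hat = fgls(y, X)
print(b_ols, b_fgls, delta)
```

Dividing each observation by $\hat{h}_i^{1/2}$ is the same as weighting the squared residuals by $1/\hat{h}_i$. The slope of the auxiliary regression should be near 0.8 here, while its intercept is shifted because $E[\log \varepsilon^2] \neq 0$; this only rescales all weights by a common factor and does not affect the WLS estimates.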

Example: analysis of the determinants of smoking

Data (source: Wooldridge)

CIGS: number of cigarettes smoked daily

INCOME: annual income

CIGPRIC: price of a pack of cigarettes (cents)

EDUC: number of years in education

AGE

RESTAURN: if there are laws in the given state that restrict smoking in restaurants

OLS with usual and robust standard errors

Tests

FGLS estimation


Eviews program

' OLS estimation and heteroscedasticity tests
equation eq_ols
equation eq_olsrob
eq_ols.ls cigs c lincome lcigpric educ age age^2 restaurn
delete white
delete breuschpagan
freeze(white) eq_ols.hettest(type=white)
freeze(breuschpagan) eq_ols.hettest(BPG) @regs

' OLS with heteroscedasticity-robust (White) standard errors
eq_olsrob.ls(h) cigs c lincome lcigpric educ age age^2 restaurn
forecast olsf
genr olsres=cigs-olsf

' Auxiliary regression of log squared residuals, fitted variances h
equation eq_logu2
genr logu2=log(olsres^2)
eq_logu2.ls logu2 c lincome lcigpric educ age age^2 restaurn
forecast logu2f
genr h=exp(logu2f)
genr sqrth=h^(1/2)

' FGLS: weighted estimation, then the same regression on transformed variables
equation eq_fgls
equation eq_fgls2
eq_fgls.ls(w=1/(h)^(1/2)) cigs c lincome lcigpric educ age age^2 restaurn
eq_fgls2.ls cigs/sqrth 1/sqrth lincome/sqrth lcigpric/sqrth educ/sqrth age/sqrth age^2/sqrth restaurn/sqrth

Multicollinearity

Strong correlation among the regressors:

Individual effect is difficult to determine

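Does not contradict the basic assumptions of linear regression

Perfect collinearity: functional relationship

E.g. $y = \beta_1 x_1 + \beta_2 x_2 + u$, $x_2 = a x_1$

$y = (\beta_1 + a\beta_2) x_1 + u$, so only the combination $\beta_1 + a\beta_2$ can be estimated, not $\beta_1$ and $\beta_2$ separately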
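The perfect-collinearity example can be reproduced numerically. The sketch below uses Python with NumPy on simulated data; the values $a = 2$, $\beta_1 = 1$, $\beta_2 = 0.5$ and the noise level are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200
a, beta1, beta2 = 2.0, 1.0, 0.5
x1 = rng.normal(size=n)
x2 = a * x1                                       # exact functional relationship
y = beta1 * x1 + beta2 * x2 + rng.normal(scale=0.1, size=n)

# The two-regressor design matrix has rank 1, so beta1 and beta2 are not separately identified
X = np.column_stack([x1, x2])
rank = np.linalg.matrix_rank(X)

# Regressing y on x1 alone recovers the combination beta1 + a * beta2
slope, *_ = np.linalg.lstsq(x1[:, None], y, rcond=None)
combined = beta1 + a * beta2
print(rank, slope[0], combined)
```

Any pair $(\beta_1, \beta_2)$ with the same value of $\beta_1 + a\beta_2$ fits the data equally well, which is why the normal equations have no unique solution in this case.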

Consequences, solutions

The estimated coefficient is sensitive to the inclusion and exclusion of variables

The variance of the estimated coefficient might increase:

$\mathrm{Var}(\hat{\beta}_i) = \dfrac{\sigma^2}{RSS_i} = \dfrac{\sigma^2}{S_{ii}(1 - R_i^2)}$

($S_{ii}$: sum of squared deviations of $x_i$ from its mean; $R_i^2$ and $RSS_i$: coefficient of determination and residual sum of squares from the regression of $x_i$ on the other explanatory variables)

Large if the variance of the error term is large or $S_{ii}$ is small or $R_i^2$ is large (multicollinearity is neither necessary nor sufficient)

Potential solutions:

Exclude variables: variance decreases, but the estimator becomes biased!

Collect additional data (greater variance in x)

“Merge” variables (e.g. ratio)

Endogeneity

Endogeneity: the error term is correlated with the regressor:

$Y_i = \alpha + \beta X_i + u_i$, $E(u_i \mid X_i) \neq 0$

Consequence: OLS estimator of β is biased and inconsistent

Possible causes of endogeneity

Omitted variable (u includes something that is correlated with X)

Simultaneity (not only X influences Y, but also Y influences X: Y changes due to u, which also affects X)

E.g. supply-demand models

Self selection in treatment analysis: “treatment” (e.g. inclusion in a program) is not independent of the error term

E.g. effect of support to firms on their profitability

Etc.
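The omitted-variable case of endogeneity can be illustrated with a short simulation in Python with NumPy. The data-generating process, the variable name `w` for the omitted variable and all coefficient values are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 100_000
w = rng.normal(size=n)                            # omitted variable
x = 0.8 * w + rng.normal(size=n)                  # regressor correlated with w
y = 1.0 + 2.0 * x + 1.5 * w + rng.normal(size=n)  # true effect of x is 2.0

# Short regression: w is left in the error term, so the error is correlated with x
X_short = np.column_stack([np.ones(n), x])
b_short, *_ = np.linalg.lstsq(X_short, y, rcond=None)

# Long regression: w is included, so the error is uncorrelated with the regressors
X_long = np.column_stack([np.ones(n), x, w])
b_long, *_ = np.linalg.lstsq(X_long, y, rcond=None)

# Probability limit of the short-regression slope: 2.0 + 1.5 * Cov(x, w) / Var(x)
plim_short = 2.0 + 1.5 * 0.8 / (0.8 ** 2 + 1.0)
print(b_short[1], b_long[1], plim_short)
```

Even with a very large sample the short-regression slope stays near 2.73 instead of 2.0, which is what inconsistency means here: more data does not remove the bias.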

Summary

Homework

Exam task types

Interpret outputs of a regression

Theoretical questions

– How can you identify the outliers, and what do you do if the sample includes outliers?

– Gauss–Markov theorem

– What does the standard error of the forecast depend on in the simple regression case?

Questions with brief answers

True/false statements

Seminar

Heteroscedasticity, multicollinearity

Maddala: 5/7, 5/8, 7/1, 7/3

Wooldridge: 8.1, 8.2, 8.3, 8.7, 8.9, (3.7, 3.11)

Discussion

Testing and handling heteroscedasticity

Multicollinearity – is it really a “problem”?

Data

Model of health expenditures (HRS or SHARE subsample):

Test heteroscedasticity

Multicollinearity: different income or asset indicators jointly included in the regression
