Authors: Péter Elek, Anikó Bíró Supervised by: Péter Elek

(1)

ECONOMETRICS

(2)

Sponsored by a Grant TÁMOP-4.1.2-08/2/A/KMR-2009-0041

Course Material Developed by Department of Economics, Faculty of Social Sciences, Eötvös Loránd University Budapest (ELTE)

Department of Economics, Eötvös Loránd University Budapest

Institute of Economics, Hungarian Academy of Sciences

Balassi Kiadó, Budapest

(3)

(4)

June 2010

ELTE Faculty of Social Sciences, Department of Economics

(5)

Week 6.

Multivariate regression III

Péter Elek, Anikó Bíró

(6)

Content

F-test (cont.), Stability tests

Adjusted R², model selection

Dummy variables

(7)

F-test more generally

Joint test of r constraints in a regression with k explanatory variables

Nested hypotheses: the parameter set of the restricted model is a subset of that of the unrestricted one

Example

U (unrestricted): y = α + β₁x₁ + β₂x₂ + β₃x₃ + u

H₀: β₂ = 0, β₃ = 0

R (restricted): y = α + β₁x₁ + v

(8)

F-test, test statistic

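Sum of squares decomposition (RESS: explained sum of squares of the restricted model; RRSS, URSS: residual sums of squares of the restricted and the unrestricted model)

$$TSS = RESS + RRSS = RESS + (RRSS - URSS) + URSS$$

Degrees of freedom

$$n - 1 = (k - r) + (n - k + r - 1) = (k - r) + r + (n - k - 1)$$

$$F = \frac{(RRSS - URSS)/r}{URSS/(n - k - 1)} \sim F_{r,\,n-k-1}$$

(in large samples approximately $\chi^2_r / r$, the Wald test)

Since $RRSS = S_{yy}(1 - R_R^2)$ and $URSS = S_{yy}(1 - R_U^2)$:

$$F = \frac{(R_U^2 - R_R^2)/r}{(1 - R_U^2)/(n - k - 1)} \sim F_{r,\,n-k-1}$$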
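The equivalence of the two forms of the F statistic (sums of squares and R²) can be checked numerically. The following is a minimal sketch on simulated data, not part of the course material; the helper name `ols_rss` and all numbers are illustrative assumptions.

```python
import numpy as np

def ols_rss(X, y):
    """Residual sum of squares of an OLS fit (X must contain the constant column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

rng = np.random.default_rng(0)            # simulated data, purely illustrative
n, k, r = 200, 3, 2                       # k regressors, r restrictions (beta2 = beta3 = 0)
x = rng.normal(size=(n, k))
y = 1.0 + 0.5 * x[:, 0] + rng.normal(size=n)

const = np.ones((n, 1))
X_u = np.hstack([const, x])               # unrestricted model: x1, x2, x3
X_r = np.hstack([const, x[:, :1]])        # restricted model: x1 only

urss, rrss = ols_rss(X_u, y), ols_rss(X_r, y)
s_yy = float(((y - y.mean()) ** 2).sum())
r2_u, r2_r = 1 - urss / s_yy, 1 - rrss / s_yy

# F statistic from the residual sums of squares and from the R² values
f_rss = ((rrss - urss) / r) / (urss / (n - k - 1))
f_r2 = ((r2_u - r2_r) / r) / ((1 - r2_u) / (n - k - 1))
print(f_rss, f_r2)                        # the two forms agree up to rounding error
```

The R² form is convenient when only the regression output of the two models is available.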

(9)

Testing a linear function of the parameters

Example: Cobb-Douglas production function

logX = α + β₁logL + β₂logK + u

H₀: β₁ + β₂ = 1

t-test after transformation: θ = β₁ + β₂, so β₂ = θ – β₁ and

logX = α + β₁(logL – logK) + θ logK + u, H₀: θ = 1

t-test directly on β₁ + β₂, using that the variance of the sum is:

Var(β̂₁ + β̂₂) = Var(β̂₁) + Var(β̂₂) + 2cov(β̂₁, β̂₂)

F-test: β₂ = 1 – β₁ gives the restricted model

R: logX – logK = α + β₁(logL – logK) + u
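The three routes to testing β₁ + β₂ = 1 give the same answer: the two t statistics coincide and the F statistic is their square. A minimal sketch on simulated Cobb-Douglas data follows; the data-generating values and the helper `ols` are assumptions made for illustration only.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients, their estimated covariance matrix and the RSS."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    rss = float(resid @ resid)
    sigma2 = rss / (len(y) - X.shape[1])
    return beta, sigma2 * np.linalg.inv(X.T @ X), rss

rng = np.random.default_rng(1)            # simulated data, purely illustrative
n = 150
logL, logK = rng.normal(size=n), rng.normal(size=n)
logX = 0.3 + 0.6 * logL + 0.5 * logK + 0.2 * rng.normal(size=n)
one = np.ones(n)

# (a) t-test on theta after the transformation
b_a, cov_a, _ = ols(np.column_stack([one, logL - logK, logK]), logX)
t_transformed = (b_a[2] - 1.0) / np.sqrt(cov_a[2, 2])

# (b) t-test directly on beta1 + beta2, using the variance of the sum
b_u, cov_u, urss = ols(np.column_stack([one, logL, logK]), logX)
var_sum = cov_u[1, 1] + cov_u[2, 2] + 2 * cov_u[1, 2]
t_direct = (b_u[1] + b_u[2] - 1.0) / np.sqrt(var_sum)

# (c) F-test: restricted model logX - logK on (logL - logK), one restriction
_, _, rrss = ols(np.column_stack([one, logL - logK]), logX - logK)
f_stat = (rrss - urss) / (urss / (n - 3))

print(t_transformed, t_direct, f_stat)    # t values coincide, F equals t squared
```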

(10)

Stability test: two independent data sets (sometimes referred to as Chow-test)

1. y_i = α₁ + β₁₁x_1i + β₂₁x_2i + … + β_k1x_ki + u_i, i = 1…n₁

2. y_i = α₂ + β₁₂x_1i + β₂₂x_2i + … + β_k2x_ki + v_i, i = 1…n₂

H₀: α₁ = α₂, β₁₁ = β₁₂, …, β_k1 = β_k2

RRSS: from the merged data set

RSS₁, RSS₂: from the separate regressions

$$F = \frac{(RRSS - RSS_1 - RSS_2)/(k + 1)}{(RSS_1 + RSS_2)/(n_1 + n_2 - 2k - 2)} \sim F_{k+1,\,n_1+n_2-2k-2}$$
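A minimal sketch of this stability test on two simulated data sets follows. The function name `chow_f` and the simulated break in one slope are assumptions for illustration; a p-value would additionally need an F distribution function (for example from SciPy), which is left out here.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares of an OLS fit (X must contain the constant column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def chow_f(X1, y1, X2, y2):
    """Stability F statistic; X1 and X2 hold the k regressors without the constant."""
    n1, k = X1.shape
    n2 = X2.shape[0]
    add_const = lambda X: np.column_stack([np.ones(len(X)), X])
    rss1 = rss(add_const(X1), y1)                       # separate regressions
    rss2 = rss(add_const(X2), y2)
    rrss = rss(add_const(np.vstack([X1, X2])),          # merged data set
               np.concatenate([y1, y2]))
    df1, df2 = k + 1, n1 + n2 - 2 * k - 2
    f = ((rrss - rss1 - rss2) / df1) / ((rss1 + rss2) / df2)
    return f, df1, df2

rng = np.random.default_rng(2)            # simulated data with a break in one slope
X1, X2 = rng.normal(size=(100, 2)), rng.normal(size=(100, 2))
y1 = 1.0 + 1.0 * X1[:, 0] + 0.5 * X1[:, 1] + rng.normal(size=100)
y2 = 1.0 + 3.0 * X2[:, 0] + 0.5 * X2[:, 1] + rng.normal(size=100)

f, df1, df2 = chow_f(X1, y1, X2, y2)
print(f, df1, df2)
```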

(11)

Stability test – Chow-test (predictive)

1. y_i = α₁ + β₁₁x_1i + β₂₁x_2i + … + β_k1x_ki + u_i, i = 1…n₁

2. y_i = α₂ + β₁₂x_1i + β₂₂x_2i + … + β_k2x_ki + v_i, i = n₁ + 1…n

n – n₁ < k + 1 is possible (in contrast to the previous test)

RSS₁: residual sum of squares based on the first n₁ observations

RRSS: residual sum of squares based on the model estimated from all (n = n₁ + n₂) observations

$$F = \frac{(RRSS - RSS_1)/(n - n_1)}{RSS_1/(n_1 - k - 1)} \sim F_{n-n_1,\,n_1-k-1}$$
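The predictive version can be sketched in the same way. In the illustrative example below only two observations fall into the second period, fewer than the k + 1 = 4 parameters, so a separate second regression would not be feasible; the function name `predictive_chow_f` and the simulated data are assumptions.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares of an OLS fit (X must contain the constant column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def predictive_chow_f(X, y, n1):
    """X holds the k regressors without the constant; the first n1 rows are period 1."""
    n, k = X.shape
    Xc = np.column_stack([np.ones(n), X])
    rss1 = rss(Xc[:n1], y[:n1])           # first n1 observations only
    rrss = rss(Xc, y)                     # all n observations
    df1, df2 = n - n1, n1 - k - 1
    f = ((rrss - rss1) / df1) / (rss1 / df2)
    return f, df1, df2

rng = np.random.default_rng(3)            # simulated data, purely illustrative
n, k, n1 = 60, 3, 58
X = rng.normal(size=(n, k))
y = 2.0 + X @ np.array([0.5, -0.3, 0.8]) + rng.normal(size=n)

f, df1, df2 = predictive_chow_f(X, y, n1)
print(f, df1, df2)
```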

(12)

Adjusted R²

Including new variables: RSS and the degrees of freedom both decrease (the number of normal equations increases)

$$\hat{\sigma}^2 = \frac{RSS}{df}$$

Adjusted R²:

$$\bar{R}^2 = 1 - \frac{n - 1}{n - k - 1}\,(1 - R^2)$$

|t| < 1: omitting the variable increases $\bar{R}^2$

F < 1: omitting the group of variables increases $\bar{R}^2$

Possible: different conclusions based on t and F (e.g. multicollinearity)
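The rule that omitting a variable raises the adjusted R² exactly when its |t| is below 1 can be illustrated numerically. This is a minimal sketch on simulated data; the function `fit` and the irrelevant regressor `x2` are assumptions for illustration.

```python
import numpy as np

def fit(X, y):
    """R², adjusted R² and t statistics of an OLS fit (X contains the constant)."""
    n, p = X.shape                                   # p = k + 1
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    rss = float(resid @ resid)
    tss = float(((y - y.mean()) ** 2).sum())
    r2 = 1 - rss / tss
    adj_r2 = 1 - (n - 1) / (n - p) * (1 - r2)
    se = np.sqrt(rss / (n - p) * np.diag(np.linalg.inv(X.T @ X)))
    return r2, adj_r2, beta / se

rng = np.random.default_rng(4)                       # simulated data, purely illustrative
n = 80
x1, x2 = rng.normal(size=n), rng.normal(size=n)      # x2 is irrelevant by construction
y = 1.0 + 0.8 * x1 + rng.normal(size=n)
one = np.ones(n)

r2_full, adj_full, t_full = fit(np.column_stack([one, x1, x2]), y)
r2_small, adj_small, _ = fit(np.column_stack([one, x1]), y)

print("t statistic of x2:", t_full[2])
print("R2 with and without x2:", r2_full, r2_small)          # R² never rises when x2 is dropped
print("adjusted R2 with and without x2:", adj_full, adj_small)
```

The ordinary R² can only fall when a variable is dropped, while the adjusted R² rises if and only if the dropped variable has |t| < 1.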

(13)

Model selection

Nested hypotheses: t- and F-test

Non-nested hypotheses with the same dependent variable, e.g.:

R&D = α + β log(revenue) + u

R&D = α + β₁ revenue + β₂ revenue² + u

Selection based on adjusted R² or information criteria (e.g. AIC)

AIC (Akaike information criterion): RSS∙exp(2(k + 1)/n); the model with the smaller value is preferred
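A minimal sketch of comparing the two non-nested specifications with this form of the AIC follows. The revenue and R&D data are simulated (generated here from the logarithmic model), and the function name `aic` is an assumption for illustration.

```python
import numpy as np

def aic(X, y):
    """AIC in the form RSS * exp(2(k + 1)/n); X contains the constant, so X.shape[1] = k + 1."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid) * float(np.exp(2 * p / n))

rng = np.random.default_rng(5)                       # hypothetical firm data
n = 120
revenue = rng.uniform(1.0, 50.0, size=n)
rd = 2.0 + 1.5 * np.log(revenue) + 0.5 * rng.normal(size=n)
one = np.ones(n)

aic_log = aic(np.column_stack([one, np.log(revenue)]), rd)          # k = 1
aic_quad = aic(np.column_stack([one, revenue, revenue ** 2]), rd)   # k = 2
print(aic_log, aic_quad)                             # the smaller value is preferred
```

Both models have the same dependent variable, which is what makes the RSS-based comparison meaningful.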

(14)

Adjusted R², example

Wage survey (2003): does the experience or the age explain more in the wage equation?

(15)

Logarithmic forms

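Log-log (loglinear) – elasticity

$$\ln(y) = \alpha + \beta \ln(x) + u, \qquad \hat{\beta} \approx \frac{\%\Delta y}{\%\Delta x}$$

Partly logarithmic forms

$$\ln(y) = \alpha + \beta x + u, \qquad \%\Delta \hat{y} \approx 100\,\hat{\beta}\,\Delta x$$

$$y = \alpha + \beta \ln(x) + u, \qquad \Delta \hat{y} \approx \frac{\hat{\beta}}{100}\,\%\Delta x$$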

(16)

Quadratic form

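Increasing or decreasing partial effect

$$y = \alpha + \beta_1 x_1 + \beta_2 x_1^2 + u, \qquad \frac{\Delta \hat{y}}{\Delta x_1} \approx \hat{\beta}_1 + 2\hat{\beta}_2 x_1$$

Example: wage survey (2003), quadratic function of experience, estimated equations:

log(Ker) = 9.83 + 0.135 ISKVEG9 + 0.0082 EXP

log(Ker) = 9.83 + 0.135 ISKVEG9 + 0.022 EXP – 0.00029 EXP²

positive (but decreasing) partial effect up to the turning point 0.022/(2∙0.00029) ≈ 38 years of experience (with the rounded coefficients shown; 39 years is reported, presumably from unrounded estimates)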
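The partial effect of experience and its turning point can be computed directly from the printed coefficients. The sketch below uses the rounded values 0.022 and 0.00029 from the second estimated equation, so the result (roughly 37.9 years) reflects that rounding; the function name is illustrative.

```python
b1, b2 = 0.022, -0.00029        # rounded EXP and EXP² coefficients from the estimated equation

def partial_effect(exp_years):
    """Approximate effect of one more year of experience on log earnings."""
    return b1 + 2 * b2 * exp_years

# Experience level at which the partial effect changes sign
turning_point = -b1 / (2 * b2)

for years in (0, 10, 20, 30, 40):
    print(years, partial_effect(years))   # positive but decreasing, negative at 40 years
print(turning_point)
```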







 







(17)

Interactions

Partial effect depends on other explanatory variables as well:

$$y = \alpha + \beta_1 x_1 + \beta_2 x_1 x_2 + u, \qquad \frac{\Delta \hat{y}}{\Delta x_1} \approx \hat{\beta}_1 + \hat{\beta}_2 x_2$$

Example: the wage and the education premium depend on the profitability of the firm (net sales revenue – material costs)

Log(wage) = 10.304 + 0.139 Educ9 + 0.092 Log(Profit)

Log(wage) = 10.597 + 0.079 Educ9 + 0.043 Log(Profit) + 0.010 (Educ9*Log(Profit))
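In the estimated equation with the interaction term, the education premium is a linear function of Log(Profit). A minimal sketch using the printed coefficients follows; the Log(Profit) values at which the premium is evaluated are hypothetical, since the units of Profit are not given here.

```python
b_educ, b_inter = 0.079, 0.010   # coefficients of Educ9 and Educ9*Log(Profit)

def educ_premium(log_profit):
    """Approximate effect of Educ9 on Log(wage) at a given value of Log(Profit)."""
    return b_educ + b_inter * log_profit

for log_profit in (0.0, 3.0, 6.0, 9.0):        # hypothetical Log(Profit) values
    print(log_profit, educ_premium(log_profit))
```

At Log(Profit) = 6 the premium equals 0.139, the Educ9 coefficient of the equation without the interaction term.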







 







(18)

Dummy variables on the right hand side

So far: mainly continuous variables

(quantitative information) – e.g. wage, consumption, wealth, education (?)

Binary / dummy variables

Qualitative information

Examples: gender, employed, country dummy…

(19)

Different intercepts – 2 groups

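Example:

$$\log(wage_i) = \begin{cases} \alpha_1 + \beta\, Educ_i + u_i & \text{if from Budapest,} \\ \alpha_2 + \beta\, Educ_i + u_i & \text{otherwise} \end{cases}$$

$$\log(wage_i) = \alpha_1 + (\alpha_2 - \alpha_1) D_i + \beta\, Educ_i + u_i, \qquad D_i = \begin{cases} 0 & \text{if from Budapest,} \\ 1 & \text{otherwise} \end{cases}$$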

(20)

Different intercepts, example

Based on the 2003 wage survey

(21)

Log dependent variable

Estimated equation:

$$\log(Wage_i) = 10.93 - 0.16\, Countryside_i + 0.15\, Educ_i$$

Countryside: lower wage by approx. 16% (ceteris paribus)

Exact difference ("log" is the natural logarithm in EViews):

$$\log(Wage_1) - \log(Wage_0) = -0.16 \;\Rightarrow\; \frac{Wage_1}{Wage_0} = e^{-0.16}$$

$$\%\text{ wage difference: } 100\,(e^{-0.16} - 1) = -14.79$$
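The gap between the approximate and the exact percentage difference implied by a dummy coefficient in a log-wage equation can be computed as follows. This is a minimal sketch; only the coefficient -0.16 comes from the estimated equation, the function name is illustrative.

```python
import math

def exact_pct_difference(coef):
    """Exact percentage difference implied by a dummy coefficient when the dependent variable is in logs."""
    return 100 * (math.exp(coef) - 1)

coef_countryside = -0.16
approx = 100 * coef_countryside                   # approximation: -16%
exact = exact_pct_difference(coef_countryside)    # exact difference
print(approx, round(exact, 2))                    # -16.0 -14.79
```

The approximation is close for small coefficients and deteriorates as the coefficient grows in absolute value.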

(22)

More than two groups

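N groups (e.g. regions instead of Budapest / countryside)

N-1 dummies in the regression (if there are N groups); the group without a dummy (Group 1 in the formulas here) is the benchmark group

$$y_i = \begin{cases} \alpha_1 & \text{in Group 1} \\ \alpha_2 & \text{in Group 2} \\ \dots \\ \alpha_N & \text{in Group N} \end{cases} \; + \beta x_i + u_i$$

$$y_i = \alpha_1 + (\alpha_2 - \alpha_1) D_{2i} + \dots + (\alpha_N - \alpha_1) D_{Ni} + \beta x_i + u_i$$

$$D_{2i} = 1 \text{ in Group 2 (0 otherwise)}, \;\dots,\; D_{Ni} = 1 \text{ in Group N (0 otherwise)}$$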

(23)

Interactions between binary variables

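Example: the male / female wage gap is not the same in Budapest and in the countryside

Four categories: (female, male) × (Budapest, countryside)

Benchmark group: females in Budapest

Two equivalent models (coefficient symbols chosen here for readability):

$$\log(wage_i) = \beta_0 + \beta_1 Male\_Bp_i + \beta_2 Fem\_Countryside_i + \beta_3 Male\_Countryside_i + \gamma\, Educ_i + u_i$$

$$\log(wage_i) = \delta_0 + \delta_1 Male_i + \delta_2 Countryside_i + \delta_3 Male_i \cdot Countryside_i + \gamma\, Educ_i + u_i$$

The coefficients are linked: $\beta_1 = \delta_1$, $\beta_2 = \delta_2$, $\beta_3 = \delta_1 + \delta_2 + \delta_3$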

(24)

Interactions, example

Wage survey estimates (benchmark: females in Bp.):

–0.1026 = –0.1726 + 0.0540 + 0.0165

(25)

Non-constant slope parameters

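In case of two groups

$$y_i = \begin{cases} \alpha + \beta_{11} x_{1i} + \beta_2 x_{2i} + u_i & \text{for Group 1} \\ \alpha + \beta_{12} x_{1i} + \beta_2 x_{2i} + u_i & \text{for Group 2} \end{cases}$$

→ Interaction of a dummy variable and an explanatory variable

$$y_i = \alpha + \beta_{11} x_{1i} + (\beta_{12} - \beta_{11}) D_i x_{1i} + \beta_2 x_{2i} + u_i, \qquad D_i = \begin{cases} 0 & \text{for Group 1,} \\ 1 & \text{for Group 2} \end{cases}$$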

(26)

Example

Effect of education is gender-dependent but the effect of age is not

$$\log(Wage_i) = \beta_0 + \beta_1 Educ_i + \beta_2 Educ_i \cdot Male_i + \beta_3 Age_i + u_i$$

(27)

Examining the stability of the coefficients

Example: is the wage model the same for males and females?

Cross-sectional analysis (time series: stability of the coefficients over time)

$$\log(Wage_i) = \beta_0 + \beta_1 Male_i + \beta_2 Educ_i + \beta_3 Educ_i \cdot Male_i + \beta_4 Age_i + \beta_5 Age_i \cdot Male_i + u_i$$

$$H_0: \beta_1 = 0,\; \beta_3 = 0,\; \beta_5 = 0$$

F-test (also possible for a subset of the restrictions)

Problem if N₂ < k → Chow-test (predictive) can be used (see week 5)
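Testing the Male dummy and all of its interactions jointly is numerically the same as the stability test based on separate regressions for the two groups. The sketch below illustrates this on simulated wage data; all variable values and coefficients are assumptions, not survey estimates.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares of an OLS fit (X must contain the constant column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

rng = np.random.default_rng(6)                        # simulated data, purely illustrative
n = 300
male = (rng.uniform(size=n) < 0.5).astype(float)
educ = rng.integers(8, 18, size=n).astype(float)
age = rng.uniform(20, 60, size=n)
log_wage = (10 + 0.1 * male + 0.08 * educ + 0.01 * educ * male
            + 0.005 * age + 0.3 * rng.normal(size=n))
one = np.ones(n)

# Restricted model (same equation for both genders) and model with full interactions
X_r = np.column_stack([one, educ, age])
X_u = np.column_stack([one, male, educ, educ * male, age, age * male])
rrss, urss = rss(X_r, log_wage), rss(X_u, log_wage)

r, df = 3, n - X_u.shape[1]                           # 3 restrictions, n - 6 degrees of freedom
f_dummy = ((rrss - urss) / r) / (urss / df)

# Same statistic from separate regressions for males and females
m = male == 1
rss_m, rss_f = rss(X_r[m], log_wage[m]), rss(X_r[~m], log_wage[~m])
f_chow = ((rrss - rss_m - rss_f) / r) / ((rss_m + rss_f) / df)

print(f_dummy, f_chow)                                # identical up to rounding error
```

The dummy formulation has the advantage that a subset of the restrictions can be tested as well.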

(28)

Examining the stability of the coefficients, cont.

Estimation results and test statistic

(29)

Dummy dependent variable

Examples:

Labour market: employed or not

Consumption: real estate owner or not

Finance: bankruptcy of the borrower or not

Binary dependent variable ↔ linear model?

(30)

Linear probability model I.

y binary variable

$$y_i = \alpha + \beta x_i + u_i$$

$$E(y \mid x) = P(y = 1 \mid x) = \alpha + \beta x$$

Estimated model: $\hat{y}$ is the estimated probability of $y = 1$

Nonlinear model:

$$P(y = 1 \mid x) = F(\alpha + \beta x)$$

If F is the Gaussian distribution function then the probit, if F(z) = e^z/(1 + e^z) then the logit model is obtained.

(31)

Linear probability model II.

Problem 1: the estimated probability may lie outside the [0,1] interval

(Figure; axis range: -0.2 to 1.2)
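A minimal sketch of this problem on simulated data follows: a binary outcome is generated from a logistic probability, a linear probability model is fitted by OLS, and the share of fitted values outside [0,1] is reported. The data-generating process is an assumption chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)                    # simulated data, purely illustrative
n = 500
x = rng.normal(size=n)
p_true = 1 / (1 + np.exp(-2.5 * x))               # true probability (logistic in x)
y = (rng.uniform(size=n) < p_true).astype(float)  # binary dependent variable

# Linear probability model: OLS of y on a constant and x
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta                                  # estimated probabilities

share_outside = float(np.mean((y_hat < 0) | (y_hat > 1)))
print(y_hat.min(), y_hat.max(), share_outside)
```

Fitted values outside the unit interval arise for observations with extreme values of x, because the fitted line is unbounded.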

(32)

Linear probability model III.

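Problem 2: heteroscedasticity

$$y_i = \alpha + \beta x_i + u_i, \qquad E(u_i \mid x_i) = 0$$

$$u_i = \begin{cases} 1 - \alpha - \beta x_i & \text{with prob. } \alpha + \beta x_i \\ -(\alpha + \beta x_i) & \text{with prob. } 1 - \alpha - \beta x_i \end{cases}$$

$$Var(u_i \mid x_i) = (1 - \alpha - \beta x_i)^2 (\alpha + \beta x_i) + (\alpha + \beta x_i)^2 (1 - \alpha - \beta x_i) = (\alpha + \beta x_i)(1 - \alpha - \beta x_i)$$

estimated by $\hat{y}_i (1 - \hat{y}_i)$

Solution:

Using robust SE

Weighted LS: $w_i = \hat{y}_i (1 - \hat{y}_i)$ (the estimated variance of $u_i$; observations are weighted inversely to it)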
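The weighted least squares step can be sketched as follows on simulated data. Clipping the first-stage fitted values to (0.01, 0.99) is an assumption of this sketch, needed so that the estimated variance stays positive; it is not taken from the course material.

```python
import numpy as np

rng = np.random.default_rng(8)                    # simulated data, purely illustrative
n = 400
x = rng.uniform(-1, 1, size=n)
y = (rng.uniform(size=n) < 0.5 + 0.3 * x).astype(float)
X = np.column_stack([np.ones(n), x])

# Step 1: OLS to obtain the estimated probabilities
b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = np.clip(X @ b_ols, 0.01, 0.99)            # clipping is an assumption of this sketch

# Step 2: estimated error variance y_hat * (1 - y_hat)
var_hat = y_hat * (1 - y_hat)

# Step 3: WLS, i.e. OLS after dividing each observation by its estimated std. deviation
sw = 1 / np.sqrt(var_hat)
b_wls, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)

print(b_ols, b_wls)
```

Robust standard errors are the alternative that leaves the OLS coefficients unchanged and corrects only the inference.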

(33)

Linear probability model, example

Dependent variable: whether the person has a private health insurance or not (SHARE database)

Explanatory variables: wealth, income, age, education, country dummies

Figure: histogram of predicted probabilities (horizontal axis: INSF, from -0.2 to 1.4; vertical axis: frequency, from 0 to 6,000)

(34)

Seminar

Multivariate regression III.

(35)

Exercise: estimation of a wage equation based on a small sample from the wage survey

Variables

Educ (years of education)

Exp (experience)

Wage

Typ (type of settlement – qualitative variable)

Bp (Budapest dummy)

Male (male dummy)

(36)

Estimation of the wage equation I.

Model 1: modelling log(wage) in the private sector with the educ, exp, exp², bp, male variables and with the interactions of educ and exp with male

Does the equation for males differ significantly from the equation for females?

Joint test of Male, Educ*Male and Exp*Male

Experience-profile for Budapest males with 12 years of education

Where is the maximum?

Graphical presentation of the experience-profile with confidence interval

(37)

Estimation of the wage equation II.

Model 2: previous model + dummies for “chief town of the county” and for “other town”

Testing the equality of the two new coefficients with three methods

Directly

By a t-test after transformation

By comparing the R² of the restricted and unrestricted models

Testing heteroscedasticity with the White- and Breusch–Pagan-tests

Calculating robust standard errors and comparing them with the non-robust ones
