Multivariate regression III.


ECONOMETRICS

Sponsored by a Grant TÁMOP-4.1.2-08/2/A/KMR-2009-0041

Course Material Developed by Department of Economics, Faculty of Social Sciences, Eötvös Loránd University Budapest (ELTE)

Department of Economics, Eötvös Loránd University Budapest

Institute of Economics, Hungarian Academy of Sciences

Balassi Kiadó, Budapest


Authors: Péter Elek, Anikó Bíró

Supervised by Péter Elek

June 2010

Week 6


Content

F-test (cont.)

Stability tests

Adjusted R², model selection

Dummy variables

F-test more generally

Joint test of r constraints in a regression with k explanatory variables

Nested hypotheses: the parameter set of the restricted model is a subset of that of the unrestricted (original) one

Example

U: y = α + β1x1 + β2x2 + β3x3 + u

H0: β2 = 0, β3 = 0

R: y = α + β1x1 + v


F-test, test statistic

Testing a linear function of the parameters

Example: Cobb-Douglas production function

logX = α + β1logL + β2logK + u

H0: β1 + β2 = 1

t-test after reparameterisation: θ = β1 + β2, so β2 = θ – β1 and

logX = α + β1(logL – logK) + θlogK + u

H0: θ = 1
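The reparameterisation can be checked numerically. The sketch below is only an illustration on simulated data: the coefficient values, the sample size and the helper `ols` are assumptions, not part of the course material. It shows that the coefficient on logK in the transformed regression equals the sum of the two original slope estimates, so its t-statistic tests H0: θ = 1 directly.

```python
import numpy as np

# Simulated data under hypothetical coefficients (0.6 + 0.4 = 1, so H0 holds).
rng = np.random.default_rng(0)
n = 200
log_l = rng.normal(size=n)
log_k = rng.normal(size=n)
log_x = 0.5 + 0.6 * log_l + 0.4 * log_k + rng.normal(scale=0.1, size=n)


def ols(y, X):
    """OLS coefficients and conventional standard errors; X contains the constant."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta, se


const = np.ones(n)
# Original form: logX on logL and logK.
b_orig, _ = ols(log_x, np.column_stack([const, log_l, log_k]))
# Reparameterised form: logX on (logL - logK) and logK.
b_rep, se_rep = ols(log_x, np.column_stack([const, log_l - log_k, log_k]))

theta_hat = b_rep[2]                    # estimate of theta = beta1 + beta2
t_stat = (theta_hat - 1.0) / se_rep[2]  # t-statistic for H0: theta = 1
print(theta_hat, b_orig[1] + b_orig[2], t_stat)
```

The t-statistic is compared with the critical values of the t-distribution with n – k – 1 degrees of freedom.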

Sum of squares decomposition

$$TSS = RESS + RRSS = RESS + (RRSS - URSS) + URSS$$

Degrees of freedom

$$n - 1 = (k - r) + (n - k + r - 1) = (k - r) + r + (n - k - 1)$$

$$F = \frac{(RRSS - URSS)/r}{URSS/(n - k - 1)} \sim F_{r,\,n-k-1}$$

Under H0 the numerator is proportional to a $\chi^2_r / r$ variable.

$$RRSS = S_{yy}(1 - R_R^2), \qquad URSS = S_{yy}(1 - R_U^2)$$

$$F = \frac{(R_U^2 - R_R^2)/r}{(1 - R_U^2)/(n - k - 1)} \sim F_{r,\,n-k-1}$$
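The two forms of the F-statistic (based on residual sums of squares and on R²) give the same number. A minimal sketch with made-up inputs follows; the values of S_yy, the two R² values, n, k and r are hypothetical and serve only to show the arithmetic.

```python
def f_from_rss(rrss, urss, r, n, k):
    """F-statistic from restricted and unrestricted residual sums of squares."""
    return ((rrss - urss) / r) / (urss / (n - k - 1))


def f_from_r2(r2_u, r2_r, r, n, k):
    """The same F-statistic from the unrestricted and restricted R-squared."""
    return ((r2_u - r2_r) / r) / ((1 - r2_u) / (n - k - 1))


# Hypothetical inputs: n observations, k regressors, r restrictions.
s_yy = 100.0
r2_u, r2_r = 0.40, 0.30
n, k, r = 50, 3, 2

urss = s_yy * (1 - r2_u)   # URSS = S_yy (1 - R_U^2)
rrss = s_yy * (1 - r2_r)   # RRSS = S_yy (1 - R_R^2)

f1 = f_from_rss(rrss, urss, r, n, k)
f2 = f_from_r2(r2_u, r2_r, r, n, k)
print(f1, f2)
```

The equality holds only when both regressions have the same dependent variable, because S_yy then cancels from the ratio.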


Stability test: two independent data sets (sometimes referred to as Chow-test)

1. yi = α1 + β11x1i + β21x2i +…+ βk1xki + ui, i = 1…n1

2. yi = α2 + β12x1i + β22x2i +…+ βk2xki + vi, i = 1…n2

H0: α1 = α2, β11 = β12, …, βk1 = βk2

RRSS: from the merged data set, RSS1, RSS2: from separate regressions

Stability test – Chow-test (predictive)

1. yi = α1 + β11x1i + β21x2i +…+ βk1xki + ui, i = 1…n1

2. yi = α2 + β12x1i + β22x2i +…+ βk2xki + vi, i = 1…n

n – n1 < k + 1 is possible (in contrast to the previous one)

RSS1: res. sum of squares based on the first n1 observations

RRSS: res. sum of squares based on the model estimated from all (n = n1 + n2) observations

Test statistic for two independent data sets:

$$F = \frac{(RRSS - RSS_1 - RSS_2)/(k + 1)}{(RSS_1 + RSS_2)/(n_1 + n_2 - 2k - 2)} \sim F_{k+1,\; n_1+n_2-2k-2}$$

Test statistic for the predictive Chow-test:

$$F = \frac{(RRSS - RSS_1)/(n - n_1)}{RSS_1/(n_1 - k - 1)} \sim F_{n-n_1,\; n_1-k-1}$$
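Both stability tests need only residual sums of squares from OLS fits. The following is a sketch on simulated data, not the course's own implementation: the function names, the data-generating values and the sample sizes are assumptions. It returns the F-statistic together with its degrees of freedom.

```python
import numpy as np


def rss(y, X):
    """Residual sum of squares of an OLS fit; X contains the constant column."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def chow_two_samples(y1, X1, y2, X2):
    """Stability test with two data sets; H0: identical coefficients."""
    p = X1.shape[1]                      # p = k + 1 parameters per regression
    n1, n2 = len(y1), len(y2)
    rrss = rss(np.concatenate([y1, y2]), np.vstack([X1, X2]))
    rss1, rss2 = rss(y1, X1), rss(y2, X2)
    df2 = n1 + n2 - 2 * p
    f_stat = ((rrss - rss1 - rss2) / p) / ((rss1 + rss2) / df2)
    return f_stat, (p, df2)


def chow_predictive(y1, X1, y2, X2):
    """Predictive version; the second sample may have fewer than k + 1 rows."""
    p = X1.shape[1]
    n1, n2 = len(y1), len(y2)
    rrss = rss(np.concatenate([y1, y2]), np.vstack([X1, X2]))
    rss1 = rss(y1, X1)
    f_stat = ((rrss - rss1) / n2) / (rss1 / (n1 - p))
    return f_stat, (n2, n1 - p)


rng = np.random.default_rng(42)


def simulate(n, alpha, beta):
    """Hypothetical one-regressor data set with a constant column."""
    x = rng.normal(size=n)
    y = alpha + beta * x + rng.normal(size=n)
    return y, np.column_stack([np.ones(n), x])


y1, X1 = simulate(50, 1.0, 2.0)
y2, X2 = simulate(30, 1.0, 2.0)

f_a, df_a = chow_two_samples(y1, X1, y2, X2)
f_b, df_b = chow_predictive(y1, X1, y2[:1], X2[:1])  # one extra observation
print(f_a, df_a, f_b, df_b)
```

Each statistic is compared with the critical value of the F-distribution with the returned degrees of freedom.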


Adjusted R²

Including new variables: RSS and the degrees of freedom are both decreasing (the number of normal equations is increasing)

$$\hat\sigma^2 = \frac{RSS}{df}$$

Adjusted R²:

$$\bar R^2 = 1 - \frac{n - 1}{n - k - 1}\,(1 - R^2)$$

|t| < 1: omitting the variable increases $\bar R^2$

F < 1: omitting the group of variables increases $\bar R^2$

Possible: different conclusions based on t and F (e.g. multicollinearity)

Model selection

Nested hypotheses: t- and F-test

Non-nested hypotheses, dependent variable is the same, e.g.:

R&D = α + β log(revenue) + u

R&D = α + β1 revenue + β2 revenue² + u

Selection based on adjusted R² or information criteria (e.g. AIC)

AIC (Akaike information criterion), the model with the smaller value is preferred:

$$AIC = RSS \cdot \exp\!\left(\frac{2(k + 1)}{n}\right)$$
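As an illustration of how the two selection measures penalise extra regressors, here is a small sketch. All numbers are hypothetical; the AIC is used in the RSS·exp(2(k + 1)/n) form given in this section.

```python
import math


def adjusted_r2(r2, n, k):
    """Adjusted R-squared for n observations and k explanatory variables."""
    return 1 - (n - 1) / (n - k - 1) * (1 - r2)


def aic(rss, n, k):
    """AIC in the form RSS * exp(2(k + 1) / n); smaller is better."""
    return rss * math.exp(2 * (k + 1) / n)


# Hypothetical comparison: adding a third regressor raises R-squared only
# slightly (0.500 -> 0.505) with n = 30 and S_yy = 100.
n, s_yy = 30, 100.0
small = {"k": 2, "r2": 0.500}
large = {"k": 3, "r2": 0.505}

adj_small = adjusted_r2(small["r2"], n, small["k"])
adj_large = adjusted_r2(large["r2"], n, large["k"])
aic_small = aic(s_yy * (1 - small["r2"]), n, small["k"])
aic_large = aic(s_yy * (1 - large["r2"]), n, large["k"])
print(adj_small, adj_large, aic_small, aic_large)
```

In this made-up case both criteria favour the smaller model: the gain in fit does not compensate for the lost degree of freedom.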

Adjusted R², example

Wage survey (2003): does the experience or the age explain more in the wage equation?

Logarithmic forms

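Log-log (loglinear) – elasticity

$$\ln(y) = \alpha + \beta \ln(x) + u, \qquad \%\Delta\hat y \approx \hat\beta \cdot \%\Delta x$$

Partly logarithmic forms

$$\ln(y) = \alpha + \beta x + u, \qquad \%\Delta\hat y \approx 100\,\hat\beta\,\Delta x$$

$$y = \alpha + \beta \ln(x) + u, \qquad \Delta\hat y \approx \frac{\hat\beta}{100}\,\%\Delta x$$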


Quadratic form

$$y = \alpha + \beta_1 x_1 + \beta_2 x_1^2 + u, \qquad \frac{\Delta\hat y}{\Delta x_1} \approx \hat\beta_1 + 2\hat\beta_2 x_1$$

Increasing or decreasing partial effect

Example: wage survey (2003), quadratic function of experience, estimated equations:

log(Ker) = 9.83 + 0.135 ISKVEG9 + 0.0082 EXP

log(Ker) = 9.83 + 0.135 ISKVEG9 + 0.022 EXP – 0.00029 EXP²

Positive (but decreasing) partial effect of experience up to about 0.022/(2·0.00029) ≈ 38 years with the rounded coefficients shown (39 years in the source)

Interactions

Partial effect depends on other explanatory variables as well:

$$y = \alpha + \beta_1 x_1 + \beta_2 x_1 x_2 + u, \qquad \frac{\Delta\hat y}{\Delta x_1} = \hat\beta_1 + \hat\beta_2 x_2$$

Example: wage and education premium depend on the profitability of the firm (net sales revenue – material costs)

Log(wage) = 10.304 + 0.139 Educ9 + 0.092 Log(Profit)

Log(wage) = 10.597 + 0.079 Educ9 + 0.043 Log(Profit) + 0.010 (Educ9*Log(Profit))
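The partial effects in the quadratic and the interaction examples can be evaluated directly from the rounded coefficients quoted above. This is a sketch of the arithmetic only; the function names and the evaluation points (10 years of experience, log profit of 6) are chosen for illustration.

```python
# Rounded coefficients from the estimated wage equations quoted in the text.
B_EXP, B_EXP2 = 0.022, -0.00029          # EXP and EXP^2
B_EDUC, B_EDUC_X_PROFIT = 0.079, 0.010   # Educ9 and Educ9*Log(Profit)


def partial_effect_of_experience(exp_years):
    """Approximate effect on log wage of one more year of experience."""
    return B_EXP + 2 * B_EXP2 * exp_years


def education_premium(log_profit):
    """Effect of Educ9 on log wage at a given value of Log(Profit)."""
    return B_EDUC + B_EDUC_X_PROFIT * log_profit


# Experience level at which the quadratic profile peaks.
turning_point = -B_EXP / (2 * B_EXP2)

print(turning_point)
print(partial_effect_of_experience(10))
print(education_premium(6.0))
```

With these rounded values the turning point comes out near 37.9 years; a result computed from unrounded estimates can differ by about a year.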

Dummy variables on the right hand side

So far: mainly continuous variables (quantitative information) – e.g. wage, consumption, wealth, education (?)

Binary / dummy variables: qualitative information

Examples: gender, employed, country dummy…

Different intercepts – 2 groups

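Example:

$$\log(wage_i) = \begin{cases} \alpha_1 + \beta\,Educ_i + u_i, & \text{if from Budapest} \\ \alpha_2 + \beta\,Educ_i + u_i, & \text{otherwise} \end{cases}$$

$$\log(wage_i) = \alpha_1 + (\alpha_2 - \alpha_1) D_i + \beta\,Educ_i + u_i, \qquad D_i = \begin{cases} 0, & \text{if from Budapest} \\ 1, & \text{otherwise} \end{cases}$$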


Different intercepts, example

Based on the 2003 wage survey

Log dependent variable

Estimated equation:

$$\log(Wage_i) = 10.93 - 0.16\,Countryside_i + 0.15\,Educ_i$$

Countryside: lower wage by approx. 16% (ceteris paribus)

Exact difference ("log" is the natural logarithm in EViews), with index 1 for the countryside and 0 for Budapest:

$$\log(Wage_1) - \log(Wage_0) = -0.16 \;\Rightarrow\; \frac{Wage_1}{Wage_0} = e^{-0.16}$$

$$\%\ \text{wage difference: } 100\,(e^{-0.16} - 1) = -14.79$$
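The gap between the approximate and the exact percentage difference grows with the size of the dummy coefficient. A short sketch of the conversion follows; the helper name is an assumption, and –0.16 is the countryside coefficient discussed in this example.

```python
import math


def exact_pct_difference(coef):
    """Exact percentage difference implied by a dummy coefficient in a log model."""
    return 100 * (math.exp(coef) - 1)


approx = 100 * -0.16                  # reading the coefficient directly
exact = exact_pct_difference(-0.16)   # 100 * (e^(-0.16) - 1)
print(approx, round(exact, 2))
```

For coefficients close to zero the two numbers nearly coincide, which is why the direct reading is a common shortcut.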


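More than two groups

N groups (e.g. regions instead of Budapest / countryside)

$$y_i = \begin{cases} \alpha_1 + \beta x_i + u_i, & \text{in Group 1} \\ \alpha_2 + \beta x_i + u_i, & \text{in Group 2} \\ \quad\vdots \\ \alpha_N + \beta x_i + u_i, & \text{in Group } N \end{cases}$$

$$y_i = \alpha_1 + (\alpha_2 - \alpha_1) D_{2i} + \dots + (\alpha_N - \alpha_1) D_{Ni} + \beta x_i + u_i$$

$$D_{2i} = 1 \text{ in Group 2 (0 otherwise)}, \;\dots,\; D_{Ni} = 1 \text{ in Group } N \text{ (0 otherwise)}$$

N – 1 dummies in the regression (if there are N groups); the group without a dummy (Group 1 in the equation above) is the benchmark group

Interactions between binary variables

Example: male / female wage gap is not the same in Budapest and in the countryside

Four categories: female in Budapest, female in the countryside, male in Budapest, male in the countryside

Benchmark group: females in Budapest

Two equivalent models: one dummy for each non-benchmark category (shown here), or Male, Countryside and their interaction

$$\log(wage_i) = \beta_0 + \beta_1\,Countryside\_Fem_i + \beta_2\,Bp\_Male_i + \beta_3\,Countryside\_Male_i$$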


Interactions, example

Wage survey estimates (benchmark: females in Bp.):

–0.1026 ≈ –0.1726 + 0.0540 + 0.0165
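The identity above reflects the fact that a model with one dummy per category and a model with two dummies plus their interaction fit the same four group means. The sketch below checks this on simulated data; the coefficient values, the variable names and the sample size are hypothetical and unrelated to the wage survey.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400
male = rng.integers(0, 2, size=n).astype(float)
country = rng.integers(0, 2, size=n).astype(float)
# Hypothetical data-generating process for log wages.
log_wage = (11.0 + 0.05 * male - 0.17 * country
            + 0.02 * male * country + rng.normal(scale=0.3, size=n))

const = np.ones(n)
# Model A: one dummy per non-benchmark category (benchmark: female, Budapest).
X_a = np.column_stack([const,
                       country * (1 - male),    # countryside female
                       (1 - country) * male,    # Budapest male
                       country * male])         # countryside male
# Model B: Male, Countryside and their interaction.
X_b = np.column_stack([const, male, country, male * country])

b_a, *_ = np.linalg.lstsq(X_a, log_wage, rcond=None)
b_b, *_ = np.linalg.lstsq(X_b, log_wage, rcond=None)

# The countryside-male coefficient of model A equals the sum of the
# three non-constant coefficients of model B.
print(b_a[3], b_b[1] + b_b[2] + b_b[3])
```

The two sets of estimates are exact linear transformations of each other, so any difference in reported figures comes from rounding.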

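Non-constant slope parameters

In case of two groups

$$y_i = \begin{cases} \alpha + \beta_{11} x_{1i} + \beta_2 x_{2i} + u_i, & \text{for Group 1} \\ \alpha + \beta_{12} x_{1i} + \beta_2 x_{2i} + u_i, & \text{for Group 2} \end{cases}$$

$$y_i = \alpha + \beta_{11} x_{1i} + (\beta_{12} - \beta_{11}) D_i x_{1i} + \beta_2 x_{2i} + u_i, \qquad D_i = 1 \text{ for Group 2 (0 for Group 1)}$$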


Example

Effect of education gender-dependent but the effect of age is not

$$\log(Wage_i) = \beta_0 + \beta_1 Educ_i + \beta_2\,Educ_i \cdot Male_i + \beta_3 Age_i + u_i$$

Examining the stability of the coefficients

Example: is the wage model the same for males and females?

Cross sectional analysis (time series: stability of the coefficient in time)

$$\log(Wage_i) = \beta_0 + \beta_1 Male_i + \beta_2 Educ_i + \beta_3\,Educ_i \cdot Male_i + \beta_4 Age_i + \beta_5\,Age_i \cdot Male_i + u_i$$

$$H_0: \beta_1 = 0,\ \beta_3 = 0,\ \beta_5 = 0$$

F-test (also possible for a subset of the restrictions)


Estimation results and test statistic

Dummy dependent variable

Examples:

Labour market: employed or not

Consumption: real estate owner or not

Finance: bankruptcy of the borrower or not

Binary dependent variable ↔ linear model?


If the probability of y = 1 is modelled as F(α + βx) and F is the Gaussian distribution function, the probit model is obtained; if F(z) = e^z/(1 + e^z), the logit model is obtained.

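Linear probability model II.

Problem 1: estimated probability may lie outside the [0,1] interval

[Figure: linear probability model, axis ticks from –0.2 to 1.2]

Linear probability model III.

Problem 2: heteroscedasticity

$$y_i = \alpha + \beta x_i + u_i, \qquad E(u_i) = 0$$

$$u_i = \begin{cases} 1 - \alpha - \beta x_i, & \text{with prob. } \alpha + \beta x_i \\ -(\alpha + \beta x_i), & \text{with prob. } 1 - (\alpha + \beta x_i) \end{cases}$$

The error variance therefore depends on x:

$$Var(u_i) = (\alpha + \beta x_i)(1 - \alpha - \beta x_i)$$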
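Both problems of the linear probability model show up readily in a simulation. The sketch below uses an invented data-generating process (uniform x, a clipped linear probability), so the numbers carry no empirical meaning; it only illustrates fitted probabilities outside [0,1] and the x-dependent variance p(1 – p).

```python
import numpy as np

rng = np.random.default_rng(7)
n = 2000
x = rng.uniform(-3.0, 3.0, size=n)
p_true = np.clip(0.5 + 0.25 * x, 0.0, 1.0)   # hypothetical P(y = 1 | x)
y = (rng.uniform(size=n) < p_true).astype(float)

# OLS fit of the binary outcome on a constant and x.
X = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
p_hat = X @ coef

# Problem 1: some fitted probabilities fall outside [0, 1].
outside = (p_hat < 0.0) | (p_hat > 1.0)

# Problem 2: the implied error variance p(1 - p) changes with x
# (and is negative wherever the fitted probability is outside [0, 1]).
var_u = p_hat * (1.0 - p_hat)
print(int(outside.sum()), var_u.min(), var_u.max())
```

The negative variance estimates are one reason why weighted estimation of this model needs the fitted probabilities to be trimmed first.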

Linear probability model, example

Dependent variable: whether the person has a private health insurance or not (SHARE database)

Explanatory variables: wealth, income, age, education, country dummies

[Figure: histogram of predicted probabilities (INSF), horizontal axis from –0.2 to 1.4; vertical axis: frequency, 0 to 6,000]

Seminar

Multivariate regression III.

Exercise: estimation of a wage equation based


Male (male dummy)

Estimation of the wage equation I.

Model 1: modelling log(wage) in the private sector with the educ, exp, exp², bp and male variables and with the interactions of educ and exp with male

Does the equation for males differ significantly from the equation for females?

Joint test of Male, Educ*Male and Exp*Male

Experience-profile for Budapest males with 12 years of education

Where is the maximum?

Graphical presentation of the experience-profile with confidence interval

Estimation of the wage equation II.

Model 2: previous model + dummies for “chief town of the county” and for “other town”

Testing the equality of the two new coefficients with three methods:

Directly

By a t-test after transformation

By comparing the R² of the restricted and unrestricted model

Testing heteroscedasticity with the White and Breusch–Pagan tests


Exercise: estimation of labour market participation with a linear probability model

Dependent variable: economically active or not

Explanatory variables: education, experience, age, kid below / over 6 years

OLS estimation with usual and with robust standard errors, WLS estimation

Forecasting probabilities
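The steps of this exercise can be sketched on simulated data. Everything below is an assumption made for illustration: the single regressor `educ`, the data-generating process, the White-type (HC0) form of the robust standard errors and the trimming of fitted probabilities to [0.01, 0.99] before weighting. It is not the seminar's data set or prescribed procedure.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1000
educ = rng.integers(8, 18, size=n).astype(float)    # hypothetical regressor
p_true = np.clip(0.1 + 0.05 * educ, 0.02, 0.98)     # hypothetical P(active)
active = (rng.uniform(size=n) < p_true).astype(float)

X = np.column_stack([np.ones(n), educ])
xtx_inv = np.linalg.inv(X.T @ X)
b_ols = xtx_inv @ X.T @ active
resid = active - X @ b_ols

# Usual OLS standard errors (assume a constant error variance).
se_usual = np.sqrt(np.diag(xtx_inv) * (resid @ resid) / (n - X.shape[1]))

# Heteroscedasticity-robust (HC0) standard errors: sandwich formula.
meat = X.T @ (X * resid[:, None] ** 2)
se_robust = np.sqrt(np.diag(xtx_inv @ meat @ xtx_inv))

# WLS: weight each observation by 1 / sqrt(p_hat (1 - p_hat)),
# with fitted probabilities trimmed so the weights stay finite.
p_hat = np.clip(X @ b_ols, 0.01, 0.99)
w = 1.0 / np.sqrt(p_hat * (1.0 - p_hat))
b_wls, *_ = np.linalg.lstsq(X * w[:, None], active * w, rcond=None)

# Forecast probability for a person with 12 years of education.
forecast = float(np.array([1.0, 12.0]) @ b_wls)
print(se_usual, se_robust, b_wls, forecast)
```

Comparing `se_usual` with `se_robust` shows how much the conventional formula is affected by the heteroscedasticity inherent in the linear probability model.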