Authors: Péter Elek, Anikó Bíró Supervised by: Péter Elek

(1)

ECONOMETRICS

(2)


Sponsored by grant TÁMOP-4.1.2-08/2/A/KMR-2009-0041

Course material developed by the Department of Economics, Faculty of Social Sciences, Eötvös Loránd University Budapest (ELTE)

Department of Economics, Eötvös Loránd University Budapest

Institute of Economics, Hungarian Academy of Sciences

Balassi Kiadó, Budapest

(3)

(4)


June 2010

ELTE Faculty of Social Sciences, Department of Economics

(5)

ECONOMETRICS

Week 3.

Simple regression II.

Péter Elek, Anikó Bíró

(6)

Plan

Estimation of standard deviation

Hypothesis testing, confidence interval

Forecasting

Outliers, alternative functional forms

(7)

Reminder I

y_i = α + βx_i + u_i

Assumptions:

1. E(u_i) = 0

2. Var(u_i) = σ² for all i

3. u_i, u_j independent for all i ≠ j

4. x_i, u_j independent for all i, j

5. u_i normally distributed for all i: N(0, σ²)

(8)

Reminder II

y_i = α + βx_i + u_i

Estimation

Method of moments

OLS

Maximum likelihood

Unbiased estimator – normality and homoscedasticity not needed!
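The unbiasedness claim can be checked by simulation. The following is a minimal sketch, not part of the course material: the true values α = 1 and β = 2, the fixed design x = 1, …, 20, the uniform errors whose spread grows with x (so they are neither normal nor homoscedastic) and the helper name `ols` are all illustrative assumptions.

```python
import random

def ols(x, y):
    """Return (alpha_hat, beta_hat), with beta_hat = S_xy / S_xx."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    beta_hat = s_xy / s_xx
    alpha_hat = y_bar - beta_hat * x_bar
    return alpha_hat, beta_hat

random.seed(0)
ALPHA, BETA = 1.0, 2.0                    # assumed "true" parameters
x = [float(i) for i in range(1, 21)]      # fixed regressor values

estimates = []
for _ in range(5000):
    # Errors have mean 0, but are uniform and heteroscedastic (spread grows with x)
    y = [ALPHA + BETA * xi + random.uniform(-xi, xi) for xi in x]
    estimates.append(ols(x, y)[1])

mean_beta_hat = sum(estimates) / len(estimates)
print(f"average beta_hat over replications: {mean_beta_hat:.3f} (true beta = {BETA})")
```

The average of the estimates should lie close to the assumed β even though assumptions 2 and 5 are violated. Only the zero-mean errors and their independence from x are used.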

β̂ = S_xy / S_xx, where S_xy = Σ(x_i – x̄)(y_i – ȳ) and S_xx = Σ(x_i – x̄)²

(9)

Spurious regression

“Regression to the mean” for normally distributed variables with same standard deviation:

E(Y|X = x) – m_y = ρ(x – m_x), ρ < 1

Coefficient of the regression: β̂ = S_xy / S_xx

Statistical consequence: coefficient less than 1!

Examples: height of parents and children, scores of first and second exams
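Regression to the mean can be reproduced with simulated exam scores. This is a hedged sketch: the correlation ρ = 0.6, the common mean 50 and the common standard deviation 10 are assumed values chosen only for illustration.

```python
import math
import random

random.seed(1)
RHO = 0.6               # assumed correlation between first and second exam
M, SD = 50.0, 10.0      # assumed common mean and standard deviation
N = 20000

first, second = [], []
for _ in range(N):
    z1 = random.gauss(0.0, 1.0)
    z2 = random.gauss(0.0, 1.0)
    first.append(M + SD * z1)
    # Second score has the same mean and sd, and correlation RHO with the first
    second.append(M + SD * (RHO * z1 + math.sqrt(1 - RHO ** 2) * z2))

x_bar = sum(first) / N
y_bar = sum(second) / N
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(first, second))
s_xx = sum((x - x_bar) ** 2 for x in first)
slope = s_xy / s_xx
print(f"estimated slope: {slope:.3f} (rho = {RHO})")
```

With equal standard deviations the slope estimates ρ, so it falls below 1 without any causal mechanism pulling scores toward the mean.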


(10)

Sampling distribution of coefficient estimates

β̂ = S_xy / S_xx = Σ(x_i – x̄)y_i / S_xx

Var(β̂) = Σ(x_i – x̄)² Var(y_i) / S_xx² = σ² / S_xx

β̂ ~ N(β, Var(β̂))
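The variance formula Var(β̂) = σ²/S_xx can be compared with a Monte Carlo estimate. A minimal sketch under assumed values (α = 1, β = 0.5, σ = 2, x = 0, …, 9), none of which come from the slides:

```python
import random

random.seed(2)
ALPHA, BETA, SIGMA = 1.0, 0.5, 2.0        # assumed true values
x = [float(i) for i in range(10)]         # fixed regressor values
n = len(x)
x_bar = sum(x) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)

def beta_hat(y):
    # S_xy / S_xx written as sum((x_i - x_bar) * y_i) / S_xx
    return sum((xi - x_bar) * yi for xi, yi in zip(x, y)) / s_xx

draws = []
for _ in range(20000):
    y = [ALPHA + BETA * xi + random.gauss(0.0, SIGMA) for xi in x]
    draws.append(beta_hat(y))

mean = sum(draws) / len(draws)
var = sum((b - mean) ** 2 for b in draws) / (len(draws) - 1)
theoretical_var = SIGMA ** 2 / s_xx
print(f"simulated Var(beta_hat) = {var:.4f}, sigma^2 / S_xx = {theoretical_var:.4f}")
```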













(11)

Estimation of variance

û_i = y_i – α̂ – β̂x_i

Σû_i = 0, Σx_iû_i = 0 (from normal equations)

RSS = Σû_i²

Q = RSS / σ² ~ χ_{n – 2}²

σ̂² = RSS / (n – 2) = σ²Q / (n – 2)

E(σ̂²) = σ²
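Both the normal equations and the unbiasedness of σ̂² = RSS/(n – 2) can be checked numerically. The sketch below assumes σ = 1.5, n = 8 and the line 2 + 0.7x purely for illustration; `fit` is a hypothetical helper name.

```python
import random

def fit(x, y):
    """OLS fit; returns (alpha_hat, beta_hat, residuals)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    resid = [yi - a - b * xi for xi, yi in zip(x, y)]
    return a, b, resid

random.seed(3)
SIGMA = 1.5                               # assumed error standard deviation
x = [float(i) for i in range(1, 9)]       # n = 8 observations
n = len(x)

sigma2_hats = []
for _ in range(20000):
    y = [2.0 + 0.7 * xi + random.gauss(0.0, SIGMA) for xi in x]
    _, _, resid = fit(x, y)
    rss = sum(u * u for u in resid)
    sigma2_hats.append(rss / (n - 2))     # divide by n - 2, not by n

# Normal equations for the last replication: both sums vanish up to rounding
sum_u = sum(resid)
sum_xu = sum(xi * u for xi, u in zip(x, resid))
mean_sigma2_hat = sum(sigma2_hats) / len(sigma2_hats)
print(f"sum u = {sum_u:.1e}, sum x*u = {sum_xu:.1e}")
print(f"average sigma2_hat = {mean_sigma2_hat:.3f} (sigma^2 = {SIGMA ** 2})")
```

Dividing RSS by n instead of n – 2 would give an average below σ², which is why two degrees of freedom are subtracted for the two estimated coefficients.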

(12)

Chi-squared, t-distribution

x₁, x₂, …, x_n independent variables with standard normal distribution:

Z = x₁² + x₂² + … + x_n² ~ χ_n²

x ~ N(0, 1), y ~ χ_n² independent:

x / √(y/n) ~ t_n

Estimation of variance – two normal equations: the number of degrees of freedom is n – 2!
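The two constructions can be imitated directly with random draws. This is only a sketch: n = 10 degrees of freedom and the number of draws are arbitrary choices, and the comparison uses the known moments E(χ_n²) = n and Var(t_n) = n/(n – 2).

```python
import math
import random

random.seed(5)
N_DOF = 10        # degrees of freedom, chosen for illustration
DRAWS = 20000

chi2_draws, t_draws = [], []
for _ in range(DRAWS):
    # Sum of n squared independent standard normals: chi-squared with n d.o.f.
    y = sum(random.gauss(0.0, 1.0) ** 2 for _ in range(N_DOF))
    # Independent standard normal divided by sqrt(y / n): t with n d.o.f.
    x = random.gauss(0.0, 1.0)
    chi2_draws.append(y)
    t_draws.append(x / math.sqrt(y / N_DOF))

chi2_mean = sum(chi2_draws) / DRAWS
t_mean = sum(t_draws) / DRAWS
t_var = sum((t - t_mean) ** 2 for t in t_draws) / (DRAWS - 1)
print(f"mean of chi-squared draws: {chi2_mean:.2f} (theory: {N_DOF})")
print(f"variance of t draws: {t_var:.3f} (theory: {N_DOF / (N_DOF - 2):.3f})")
```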

(13)

Hypothesis testing, confidence interval

(β̂ – β) / √(σ²/S_xx) ~ N(0, 1)

(β̂ – β) / √(σ̂²/S_xx) ~ t_{n – 2}

(α̂ – α) / √(σ²(1/n + x̄²/S_xx)) ~ N(0, 1)

(α̂ – α) / √(σ̂²(1/n + x̄²/S_xx)) ~ t_{n – 2}

(n – 2)σ̂²/σ² ~ χ_{n – 2}², independent of α̂, β̂ (no proof)

Confidence interval, with SE(β̂) = √(σ̂²/S_xx) (in the quantile subscripts α is the significance level, not the intercept):

P(–t_{n – 2, 1 – α/2} < (β̂ – β)/SE(β̂) < t_{n – 2, 1 – α/2}) = 1 – α
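The t-statistic and the confidence interval for β can be computed in a few lines. The data below are hypothetical and serve only as an illustration; the critical value 2.306 is the 0.975 quantile of the t-distribution with 8 degrees of freedom, taken from a standard t-table.

```python
import math

# Hypothetical data, for illustration only (n = 10)
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 19.9]
T_CRIT = 2.306    # 0.975 quantile of t with n - 2 = 8 d.o.f. (t-table value)

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

beta_hat = s_xy / s_xx
alpha_hat = y_bar - beta_hat * x_bar
rss = sum((yi - alpha_hat - beta_hat * xi) ** 2 for xi, yi in zip(x, y))
sigma2_hat = rss / (n - 2)

se_beta = math.sqrt(sigma2_hat / s_xx)
se_alpha = math.sqrt(sigma2_hat * (1 / n + x_bar ** 2 / s_xx))

t_beta_zero = beta_hat / se_beta              # statistic for H0: beta = 0
t_beta_two = (beta_hat - 2.0) / se_beta       # statistic for H0: beta = 2
ci = (beta_hat - T_CRIT * se_beta, beta_hat + T_CRIT * se_beta)

print(f"beta_hat = {beta_hat:.4f}, SE = {se_beta:.4f}")
print(f"t (beta = 0) = {t_beta_zero:.2f}, t (beta = 2) = {t_beta_two:.2f}")
print(f"95% confidence interval for beta: ({ci[0]:.4f}, {ci[1]:.4f})")
```

A hypothesized value of β is rejected at the 5% level exactly when it falls outside the 95% confidence interval, so the test and the interval carry the same information.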

(14)

Analysis of variance

Previous slide: RSS ~ σ²χ_{n – 2}²

If β = 0, thus y_i independent N(α, σ²) variables, then

TSS ~ σ²χ_{n – 1}² (Fisher-Bartlett theorem)

ESS ~ σ²χ₁²

RSS and ESS independent

TSS = ESS + RSS:

Σ(y_i – ȳ)² = Σ(ŷ_i – ȳ)² + Σ(y_i – ŷ_i)²

(15)

Analysis of variance (cont.)

β = 0 hypothesis

Source of var. | Sum of squares | D. of freedom | Mean squares

Regr. | ESS = r²S_yy = β̂S_xy = β̂²S_xx | 1 | MS₁ = ESS/1 ~ σ²χ₁²/1

Residual | RSS = (1 – r²)S_yy = (n – 2)σ̂² | n – 2 | MS₂ = RSS/(n – 2) ~ σ²χ_{n – 2}²/(n – 2)

Total | S_yy | n – 1 |

F = MS₁/MS₂ = (n – 2)r²/(1 – r²) ~ F_{1,n – 2}

F = β̂² / (σ̂²/S_xx) ~ t_{n – 2}²
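The identities in the analysis of variance table can be verified on any data set. The sketch below uses hypothetical numbers and checks that TSS = ESS + RSS and that the F-statistic equals both (n – 2)r²/(1 – r²) and the square of the t-statistic for β = 0.

```python
import math

# Hypothetical data, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [1.2, 2.9, 2.1, 4.8, 4.1, 6.5, 5.4, 7.9]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

beta_hat = s_xy / s_xx
alpha_hat = y_bar - beta_hat * x_bar
fitted = [alpha_hat + beta_hat * xi for xi in x]

tss = s_yy
ess = sum((fi - y_bar) ** 2 for fi in fitted)             # explained sum of squares
rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))    # residual sum of squares
r2 = ess / tss

f_stat = (ess / 1) / (rss / (n - 2))                      # MS1 / MS2
t_stat = beta_hat / math.sqrt((rss / (n - 2)) / s_xx)     # t-statistic for beta = 0
f_from_r2 = (n - 2) * r2 / (1 - r2)

print(f"TSS = {tss:.3f}, ESS + RSS = {ess + rss:.3f}")
print(f"F = {f_stat:.3f}, t^2 = {t_stat ** 2:.3f}, (n-2)r^2/(1-r^2) = {f_from_r2:.3f}")
```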

(16)

Forecasting

ŷ₀ = α̂ + β̂x₀

y₀ = α + βx₀ + u₀

E(ŷ₀ – y₀) = 0 (unbiased)

Var(ŷ₀ – y₀) = Var(α̂) + x₀² Var(β̂) + 2x₀ cov(α̂, β̂) + Var(u₀) = σ²(1 + 1/n + (x₀ – x̄)²/S_xx)

If x₀ = x̄ then it is minimal.
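The dependence of the forecast error variance on x₀ is easy to tabulate. A minimal sketch, assuming a known error variance σ² = 4 and hypothetical regressor values; the function name `forecast_error_variance` is ours.

```python
def forecast_error_variance(x0, x, sigma2):
    """sigma^2 * (1 + 1/n + (x0 - x_bar)^2 / S_xx) for an individual y0."""
    n = len(x)
    x_bar = sum(x) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return sigma2 * (1 + 1 / n + (x0 - x_bar) ** 2 / s_xx)

x = [5.0, 10.0, 15.0, 20.0, 25.0, 30.0]   # hypothetical regressor values
sigma2 = 4.0                               # assumed error variance
x_bar = sum(x) / len(x)

at_mean = forecast_error_variance(x_bar, x, sigma2)
inside = forecast_error_variance(5.0, x, sigma2)
outside = forecast_error_variance(40.0, x, sigma2)
print(f"x0 = x_bar: {at_mean:.3f}, x0 = 5: {inside:.3f}, x0 = 40: {outside:.3f}")
```

The variance grows with the squared distance of x₀ from x̄, so forecasts far outside the observed range of x are the least precise.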

(17)

Confidence interval of forecasts

[Figure: confidence interval of forecasts]

(18)

Forecasting expected value

E(y₀) = α + βx₀

Ê(y₀) = ŷ₀ = α̂ + β̂x₀

Var(ŷ₀ – E(y₀)) = Var(α̂) + x₀² Var(β̂) + 2x₀ cov(α̂, β̂) = σ²(1/n + (x₀ – x̄)²/S_xx)
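Forecasting the expected value is more precise than forecasting an individual observation, because the term Var(u₀) = σ² drops out. The sketch below compares the half-widths of the two 95% intervals; the regressor values, the variance estimate 4.0 and the point x₀ = 22 are assumptions, and 2.776 is the 0.975 quantile of the t-distribution with 4 degrees of freedom from a standard t-table.

```python
import math

def mean_forecast_variance(x0, x, sigma2):
    """Variance of y0_hat as an estimate of E(y0)."""
    n = len(x)
    x_bar = sum(x) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return sigma2 * (1 / n + (x0 - x_bar) ** 2 / s_xx)

def individual_forecast_variance(x0, x, sigma2):
    """Variance of the forecast error for an individual y0: adds Var(u0)."""
    return sigma2 + mean_forecast_variance(x0, x, sigma2)

x = [5.0, 10.0, 15.0, 20.0, 25.0, 30.0]   # hypothetical regressor values
sigma2_hat = 4.0                           # assumed estimate of the error variance
T_CRIT = 2.776                             # 0.975 quantile of t with n - 2 = 4 d.o.f.
x0 = 22.0

half_mean = T_CRIT * math.sqrt(mean_forecast_variance(x0, x, sigma2_hat))
half_indiv = T_CRIT * math.sqrt(individual_forecast_variance(x0, x, sigma2_hat))
print(f"half-width for E(y0): {half_mean:.3f}")
print(f"half-width for y0:    {half_indiv:.3f}")
```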





















































(19)

Outliers

Outlier: lies far from the other observations

Can change the regression line

Reasons and handling:

Data error (omit the data)

Special case (individual analysis)

Same mechanisms, but outlier data (analyze with the other observations)

[Figure: scatter plot of y against Z with regression lines fitted without the outlier and with the outlier]

(20)

Outliers (cont.): same regression lines, but different relationships

[Figure: four scatter plots with the same regression line]

(21)

Outliers (cont.): analysis of the residuals (will be important in multivariate case)

[Figure: four residual plots: U1 against X1, U2 against X1, U3 against X1, U4 against X4]

(22)

Alternative functional forms

y = Ae^{βx}

log(y) = log(A) + βx

Form of error term matters:

y = Ae^{βx}e^u

log(y) = log(A) + βx + u

E(e^u) ≠ e^{E(u)} = 1, thus E(y) ≠ Ae^{βx}

Other examples

y = Ax^β: log(y) = log(A) + βlog(x)

y = A + B/x (here only the explanatory variable has to be transformed)
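The inequality E(e^u) ≠ e^{E(u)} can be seen numerically. A hedged sketch, assuming normally distributed errors with σ = 0.5 (an illustrative value); for normal u with mean 0 the exact expectation is E(e^u) = e^{σ²/2}.

```python
import math
import random

random.seed(4)
SIGMA = 0.5       # assumed standard deviation of u
N = 200000

# u has mean 0, so exp(E(u)) = 1, but the average of exp(u) is larger
mean_exp_u = sum(math.exp(random.gauss(0.0, SIGMA)) for _ in range(N)) / N
theoretical = math.exp(SIGMA ** 2 / 2)
print(f"simulated E(exp(u)) = {mean_exp_u:.4f}, exp(sigma^2 / 2) = {theoretical:.4f}")
```

This is why exponentiating a forecast of log(y) understates the expected value of y when the error enters multiplicatively.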

(23)

Example: relationship between earnings and education

log(earn_i) = α + β₁educ_i + u_i, 2003 Wage Tariff

(F-test is the square of t-test in univariate case)

(24)

Example (cont.): Forecasting

What earnings do we expect with 15 years of education?

ŷ₀ = log(earn) = 10.12 + 15 · 0.122 = 11.95

earn = exp(11.95) ≈ 154,800 Ft (in 2003, but it is not unbiased)

Var(ŷ₀ – y₀) = Var(α̂) + 15² Var(β̂) + 2 · 15 · cov(α̂, β̂) + Var(u₀) = 0.4937²

Assuming normally distributed error terms,

P(11.95 – 1.96 · 0.4937 < y₀ < 11.95 + 1.96 · 0.4937) = 0.95

P(58,800 Ft < earn < 407,400 Ft) = 0.95

Uncertainty is relatively large.
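The point forecast and the 95% interval can be recomputed from the figures quoted on the slide (intercept 10.12, slope 0.122, forecast error standard deviation 0.4937, normal quantile 1.96). The variable names below are ours; the numbers are taken from the slide.

```python
import math

ALPHA_HAT, BETA_HAT = 10.12, 0.122   # coefficient estimates quoted on the slide
EDUC = 15                            # years of education
SD_FORECAST = 0.4937                 # standard deviation of the forecast error
Z_975 = 1.96                         # 0.975 quantile of the standard normal

log_earn = ALPHA_HAT + BETA_HAT * EDUC
point = math.exp(log_earn)
lower = math.exp(log_earn - Z_975 * SD_FORECAST)
upper = math.exp(log_earn + Z_975 * SD_FORECAST)

print(f"log(earn) forecast: {log_earn:.2f}")
print(f"earnings forecast: {point:,.0f} Ft")
print(f"95% interval: {lower:,.0f} Ft to {upper:,.0f} Ft")
```

The interval is symmetric on the log scale but not in forints: the upper bound lies much further from the point forecast than the lower bound.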

(25)

Normality of error terms: slight deviation from normal distribution

(26)

Simple regression, summary

Assumptions

Estimation and its properties (unbiased), interpretation of estimated coefficients

Hypothesis testing

Problem of outliers

(27)

Seminar

Simple regression II

(28)

Estimating the marginal propensity to consume

FOGYJOV file

CONS = α + β∙INC + u

Sample size: 900

Interpretation of coefficients, calculation of marginal and average propensity to consume

Interpretation of t-statistic, p-value, R², RSS

Testing the β = 1 hypothesis

95% and 99% confidence intervals for β

Analysis of significance for a subsample of 30 observations

Forecasting for 1.5 m Ft annual income
