Authors: Péter Elek, Anikó Bíró
Supervised by: Péter Elek

(1)

ECONOMETRICS

(2)


Sponsored by a Grant TÁMOP-4.1.2-08/2/A/KMR-2009-0041
Course Material Developed by Department of Economics, Faculty of Social Sciences, Eötvös Loránd University Budapest (ELTE)
Department of Economics, Eötvös Loránd University Budapest
Institute of Economics, Hungarian Academy of Sciences
Balassi Kiadó, Budapest

(3)

(4)


June 2010

ELTE Faculty of Social Sciences, Department of Economics

(5)

ECONOMETRICS

Week 2.

Simple regression I.

Péter Elek, Anikó Bíró

(6)

Plan

Basics, examples
Assumptions of the regression model
Interpretation of the parameters

Estimation methods
Ordinary least squares (OLS)
Method of moments
(Maximum likelihood method)

Properties of the estimation, sampling distribution

(7)

Introduction

Simple regression

y = sales

x = expenditure on advertising

Multivariate regression

y = wage of employee
x1 = education
x2 = work experience
x3 = living area etc.

Aims

Analyse how decisions that change the x variables affect y

Forecast y with the help of x

Decide whether any x has a significant effect on y

(8)

Simple (linear) regression: basics I.

y_i = α + βx_i + u_i (stochastic relationship)

y                       x
Forecasted variable     Forecasting variable
Explained variable      Explanatory variable
Dependent variable      Independent variable
                        Causal variable

u: error term
Random human reactions cannot be forecasted
Effect of omitted variables
Measurement error in y

(9)

Simple regression: basics II.

Regression parameters:

Intercept (α)
Slope (β)

Origin of regression: Francis Galton

Relationship between the height of children (y) and of their parents (x)

y = m + βx, where β < 1 was found: “regression to the mean”

(10)

Assumptions

1. E(u_i) = 0

2. u_i, u_j independent for all i ≠ j

3. x_i, u_j independent for all i, j
Surely satisfied if the x_i variables are not random

4. Var(u_i) = σ² for all i (homoscedasticity)

5. u_i normally distributed for all i: N(0, σ²)

(11)

Interpretation

Consequence of (1) and (3): exogeneity, i.e.

E(u_i | x_k) = 0 for all i, k

So E(y_i | x_i) = α + βx_i

Thus β can be interpreted as a partial effect:

\[ \frac{\partial E(y_i \mid x_i)}{\partial x_i} = \beta \]

Interpretation of α: α = E(y_i | x_i = 0)

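The step from exogeneity to the conditional mean can be written out in one line. The following is only a sketch of that reasoning, using the model y_i = α + βx_i + u_i and E(u_i | x_i) = 0 from the slides; no new symbols are introduced.

```latex
\begin{align*}
E(y_i \mid x_i) &= E(\alpha + \beta x_i + u_i \mid x_i) \\
                &= \alpha + \beta x_i + E(u_i \mid x_i) \\
                &= \alpha + \beta x_i .
\end{align*}
% Differentiating with respect to x_i gives the slope \beta,
% and setting x_i = 0 gives the intercept \alpha.
```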
(12)

Estimation I

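Ordinary least squares (OLS)

\[ \min_{\hat\alpha, \hat\beta} Q = \sum_i (y_i - \hat\alpha - \hat\beta x_i)^2 \]

Two normal equations:

\[ \frac{\partial Q}{\partial \hat\alpha} = 0 \;\Rightarrow\; \sum_i 2 (y_i - \hat\alpha - \hat\beta x_i)(-1) = 0 \]
\[ \frac{\partial Q}{\partial \hat\beta} = 0 \;\Rightarrow\; \sum_i 2 (y_i - \hat\alpha - \hat\beta x_i)(-x_i) = 0 \]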

(13)

Estimation II

Method of moments (MM)

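Method of moments: theoretical moments are equated to observed moments (e.g. expected value to sample mean, variance to sample variance)

Theoretical moment conditions:

E(u) = 0, cov(u, x) = 0

Sample counterparts:

\[ \sum_i \hat u_i = 0, \qquad \sum_i x_i \hat u_i = 0, \qquad \text{where } \hat u_i = y_i - \hat\alpha - \hat\beta x_i \]

System of equations:

\[ \sum_i (y_i - \hat\alpha - \hat\beta x_i) = 0 \]
\[ \sum_i x_i (y_i - \hat\alpha - \hat\beta x_i) = 0 \]

Normal equations (same as before)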
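The two moment conditions form a linear system in the two unknown estimates, so they can be solved directly. The sketch below is one possible way to do this in plain Python; the function name `solve_moment_equations` and the five data points are invented for illustration and do not come from the course material.

```python
def solve_moment_equations(x, y):
    """Solve the two sample moment (normal) equations for (alpha_hat, beta_hat).

    Rearranged as a linear system in the unknowns a and b:
        n * a      + sum(x)   * b = sum(y)
        sum(x) * a + sum(x^2) * b = sum(x*y)
    and solved with Cramer's rule.
    """
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xx = sum(xi * xi for xi in x)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    det = n * sum_xx - sum_x ** 2
    if det == 0:
        raise ValueError("x has no variation, so the slope is not identified")
    alpha_hat = (sum_y * sum_xx - sum_x * sum_xy) / det
    beta_hat = (n * sum_xy - sum_x * sum_y) / det
    return alpha_hat, beta_hat


# Hypothetical toy data, invented for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

alpha_hat, beta_hat = solve_moment_equations(x, y)
residuals = [yi - alpha_hat - beta_hat * xi for xi, yi in zip(x, y)]

# Both sample moments should vanish up to floating-point error.
moment_1 = sum(residuals)
moment_2 = sum(xi * ui for xi, ui in zip(x, residuals))
print(alpha_hat, beta_hat, moment_1, moment_2)
```

Solving the system directly makes it visible that the method of moments and OLS lead to the same pair of equations, and therefore to the same estimates.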









(14)

Estimation III

Maximum likelihood (ML) method

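Reminder: based on the sample observations (y_i), we search for the parameter θ under which the probability of observing the given sample is the highest

\[ L(\theta) = \prod_{i=1}^{n} f(y_i; \theta) \to \max_\theta \]
\[ l(\theta) = \log L(\theta) = \sum_{i=1}^{n} \log f(y_i; \theta) \to \max_\theta \]
\[ \frac{\partial l(\theta)}{\partial \theta} = 0 \]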

(15)

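\[ L(\alpha, \beta, \sigma^2) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(y_i - \alpha - \beta x_i)^2}{2\sigma^2} \right) \]
\[ l(\alpha, \beta, \sigma^2) = \log L(\alpha, \beta, \sigma^2) = C - \frac{n}{2} \log \sigma^2 - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - \alpha - \beta x_i)^2 \]
\[ \frac{\partial l}{\partial \alpha} = 0 \;\Rightarrow\; \sum_{i=1}^{n} (y_i - \alpha - \beta x_i) = 0 \]
\[ \frac{\partial l}{\partial \beta} = 0 \;\Rightarrow\; \sum_{i=1}^{n} x_i (y_i - \alpha - \beta x_i) = 0 \]

Results: same equations as under OLS (if the error terms are normally distributed)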
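That the normal log-likelihood peaks at the OLS estimates can be checked numerically. The sketch below is only an illustration under stated assumptions: the data are invented, the helper names `ols` and `log_likelihood` are hypothetical, and σ² is fixed at RSS/n, which is the standard ML estimate of the error variance (a result not stated on the slide).

```python
import math


def ols(x, y):
    """Closed-form simple-regression estimates (alpha_hat, beta_hat)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta_hat = s_xy / s_xx
    return y_bar - beta_hat * x_bar, beta_hat


def log_likelihood(alpha, beta, sigma2, x, y):
    """Log-likelihood of the sample under normal errors with variance sigma2."""
    n = len(x)
    rss = sum((yi - alpha - beta * xi) ** 2 for xi, yi in zip(x, y))
    return -0.5 * n * math.log(2 * math.pi * sigma2) - rss / (2 * sigma2)


# Hypothetical toy data, invented for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

alpha_hat, beta_hat = ols(x, y)
rss_hat = sum((yi - alpha_hat - beta_hat * xi) ** 2 for xi, yi in zip(x, y))
sigma2_ml = rss_hat / len(x)  # assumption: standard ML estimate of sigma^2

best = log_likelihood(alpha_hat, beta_hat, sigma2_ml, x, y)

# Moving alpha or beta away from the OLS values lowers the log-likelihood.
perturbed = [
    log_likelihood(alpha_hat + da, beta_hat + db, sigma2_ml, x, y)
    for da, db in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.05), (0.0, -0.05)]
]
print(best, perturbed)
```

For a fixed σ², the log-likelihood depends on α and β only through the sum of squared residuals, which is why maximising it is the same problem as minimising Q.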

(16)

Estimators

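From the first normal equation:

\[ \sum_i y_i = n \hat\alpha + \hat\beta \sum_i x_i \;\Rightarrow\; \bar y = \hat\alpha + \hat\beta \bar x \]

From the second normal equation:

\[ \sum_i x_i y_i = \hat\alpha \sum_i x_i + \hat\beta \sum_i x_i^2 \;\Rightarrow\; \sum_i x_i y_i - n \bar x \bar y = \hat\beta \left( \sum_i x_i^2 - n \bar x^2 \right) \]

Therefore:

\[ \hat\beta = \frac{S_{xy}}{S_{xx}}, \qquad \hat\alpha = \bar y - \hat\beta \bar x \]
\[ \hat y_i = \hat\alpha + \hat\beta x_i, \qquad \hat u_i = y_i - \hat y_i \]

where

\[ S_{xx} = \sum_i (x_i - \bar x)^2 = \sum_i x_i^2 - n \bar x^2 \]
\[ S_{xy} = \sum_i (x_i - \bar x)(y_i - \bar y) = \sum_i x_i y_i - n \bar x \bar y \]
\[ S_{yy} = \sum_i (y_i - \bar y)^2 = \sum_i y_i^2 - n \bar y^2 \]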



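The closed-form estimators β̂ = S_xy / S_xx and α̂ = ȳ − β̂x̄ translate almost literally into code. The following sketch uses invented data and the hypothetical helper name `sums_of_squares`; it also checks that the deviation form and the shortcut form of S_xx and S_xy agree.

```python
def sums_of_squares(x, y):
    """Return (S_xx, S_xy) computed from deviations around the sample means."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return s_xx, s_xy


# Hypothetical toy data, invented for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

s_xx, s_xy = sums_of_squares(x, y)

# Shortcut forms: sum(x^2) - n*x_bar^2 and sum(x*y) - n*x_bar*y_bar.
s_xx_short = sum(xi * xi for xi in x) - n * x_bar ** 2
s_xy_short = sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar

beta_hat = s_xy / s_xx
alpha_hat = y_bar - beta_hat * x_bar
fitted = [alpha_hat + beta_hat * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]
print(alpha_hat, beta_hat)
```

The deviation form is usually the safer one numerically, because the shortcut form subtracts two large numbers when the mean is far from zero.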
(17)

“Orthogonality” conditions

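Normal equations in modified form:

\[ \sum_i \hat u_i = 0, \qquad \sum_i x_i \hat u_i = 0 \]

Therefore:

\[ \bar{\hat u} = 0 \]
\[ \sum_i \hat y_i \hat u_i = \hat\alpha \sum_i \hat u_i + \hat\beta \sum_i x_i \hat u_i = 0 \]
\[ \bar{\hat y} = \bar y \]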
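The orthogonality conditions are easy to verify on any fitted regression. The sketch below does so on invented data; the variable names are hypothetical, and the sums are compared with zero only up to floating-point error.

```python
# Hypothetical toy data, invented for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
beta_hat = s_xy / s_xx
alpha_hat = y_bar - beta_hat * x_bar

fitted = [alpha_hat + beta_hat * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

sum_u = sum(residuals)                                      # sum of residuals
sum_xu = sum(xi * ui for xi, ui in zip(x, residuals))       # residuals vs. x
sum_fu = sum(fi * ui for fi, ui in zip(fitted, residuals))  # residuals vs. fitted values
mean_fitted = sum(fitted) / n                               # should equal y_bar
print(sum_u, sum_xu, sum_fu, mean_fitted, y_bar)
```

These identities hold for every sample once a constant is included in the regression; they are algebraic properties of the estimates and say nothing about the true error terms.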





(18)

Decomposition of the total sum of squares

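\[ \sum_i (y_i - \bar y)^2 = \sum_i (\hat y_i - \bar y)^2 + \sum_i (y_i - \hat y_i)^2 \]

Total = Explained + Residual

TSS = ESS + RSS

In some textbooks the abbreviations are used the other way round (RSS for the “regression” and ESS for the “error” sum of squares)

\[ TSS = \sum_i (y_i - \bar y)^2 = S_{yy} \]
\[ ESS = \sum_i (\hat y_i - \bar y)^2 = \hat\beta^2 \sum_i (x_i - \bar x)^2 = \hat\beta^2 S_{xx} = \hat\beta S_{xy} \]
\[ RSS = \sum_i (y_i - \hat y_i)^2 = TSS - ESS = S_{yy} - \hat\beta S_{xy} \]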
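The decomposition relies on the cross term vanishing. A short sketch of that step, using only the residual û_i = y_i − ŷ_i and the two orthogonality conditions Σ û_i = 0 and Σ ŷ_i û_i = 0:

```latex
\begin{align*}
\sum_i (y_i - \bar y)^2
  &= \sum_i \bigl( (\hat y_i - \bar y) + \hat u_i \bigr)^2 \\
  &= \sum_i (\hat y_i - \bar y)^2 + \sum_i \hat u_i^2
     + 2 \sum_i (\hat y_i - \bar y)\, \hat u_i , \\
\sum_i (\hat y_i - \bar y)\, \hat u_i
  &= \sum_i \hat y_i \hat u_i - \bar y \sum_i \hat u_i = 0 .
\end{align*}
```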

(19)

Correlation, coefficient of determination

r_xy: observed correlation between x_i and y_i

r_xy²: coefficient of determination

\[ r_{xy} = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}, \qquad r_{xy}^2 = \frac{S_{xy}^2}{S_{xx} S_{yy}} = \frac{ESS}{TSS} \]
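The sum-of-squares decomposition and the identity r_xy² = ESS/TSS can both be confirmed numerically. This is a minimal sketch on invented data with hypothetical variable names, not output from the course datasets.

```python
import math

# Hypothetical toy data, invented for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

beta_hat = s_xy / s_xx
alpha_hat = y_bar - beta_hat * x_bar
fitted = [alpha_hat + beta_hat * xi for xi in x]

tss = s_yy                                              # total sum of squares
ess = sum((fi - y_bar) ** 2 for fi in fitted)           # explained sum of squares
rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # residual sum of squares

r_xy = s_xy / math.sqrt(s_xx * s_yy)  # sample correlation of x and y
r_squared = ess / tss                 # coefficient of determination
print(tss, ess, rss, r_xy ** 2, r_squared)
```

In simple regression the coefficient of determination is just the squared sample correlation, so it carries no information beyond r_xy apart from dropping the sign.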

(20)

Unbiasedness of the estimators

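\[ E(\hat\beta) = E\left( \frac{\sum_i (x_i - \bar x)(y_i - \bar y)}{\sum_i (x_i - \bar x)^2} \right) = \frac{\sum_i (x_i - \bar x)\, E(y_i - \bar y)}{\sum_i (x_i - \bar x)^2} = \frac{\sum_i (x_i - \bar x)\, \beta (x_i - \bar x)}{\sum_i (x_i - \bar x)^2} = \beta \]
\[ E(\hat\alpha) = E(\bar y) - E(\hat\beta)\, \bar x = (\alpha + \beta \bar x) - \beta \bar x = \alpha \]

We assumed here that the x_i variables are fixed, but the results also hold if they are random variables

Not needed: normal distribution of the error term, homoscedasticity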
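Unbiasedness is a statement about the average of the estimates over repeated samples, which a small Monte Carlo experiment can illustrate. In the sketch below everything is an assumption made for illustration: the true values α = 1 and β = 2, the fixed x values, the number of replications, and the error distribution, which is deliberately non-normal and heteroscedastic to reflect the remark that neither property is needed.

```python
import random


def ols(x, y):
    """Closed-form simple-regression estimates (alpha_hat, beta_hat)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta_hat = s_xy / s_xx
    return y_bar - beta_hat * x_bar, beta_hat


random.seed(0)
ALPHA, BETA = 1.0, 2.0                 # hypothetical true parameters
x = [float(i) for i in range(1, 11)]   # fixed regressors
n_rep = 5000

alpha_draws, beta_draws = [], []
for _ in range(n_rep):
    # Mean-zero errors, uniform on [-0.5*x_i, 0.5*x_i]:
    # non-normal, and the variance grows with x_i.
    u = [random.uniform(-1.0, 1.0) * 0.5 * xi for xi in x]
    y = [ALPHA + BETA * xi + ui for xi, ui in zip(x, u)]
    a_hat, b_hat = ols(x, y)
    alpha_draws.append(a_hat)
    beta_draws.append(b_hat)

mean_alpha = sum(alpha_draws) / n_rep
mean_beta = sum(beta_draws) / n_rep
print(mean_alpha, mean_beta)
```

The averages should be close to the true values, although any single estimate may be far off; unbiasedness says nothing about the spread of the estimates.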

(21)

Optimality properties of the estimators

BLUE (best linear unbiased estimator): if homoscedasticity is assumed, then the OLS estimator has the smallest variance among the linear unbiased estimators (more details: multivariate case)

If the error term is also normally distributed, then OLS is the best estimator among ALL unbiased estimators

(22)

Example

2003 Wage Tariff, simple regression

log(Wage_i) = α + β₁Edu_i + u_i

(23)

Interpretation

1 more year of education increases log(wage) by 0.12

Thus the wage is increased by approximately 12%

Can be used for forecasting purposes

But: causality (exogeneity)?

Not sure, because of omitted variables, e.g.
Work experience (observed)
Abilities (difficult to observe)
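The 12% figure is the usual log approximation; the exact percentage change implied by a log-coefficient b is exp(b) − 1. A quick sketch of the arithmetic, using the coefficient 0.12 quoted above (the variable names are illustrative):

```python
import math

beta_hat = 0.12  # estimated coefficient on education, as quoted on the slide

approx_change = beta_hat                # log approximation: about 12%
exact_change = math.exp(beta_hat) - 1   # exact implied relative change in the wage

print(f"approximate: {approx_change:.1%}, exact: {exact_change:.1%}")
```

The two numbers are close for small coefficients and drift apart as the coefficient grows, so the approximation is best reserved for effects of roughly this size or smaller.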

(24)

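Distribution of coefficient estimates (fixed x_i variables)

In case of homoscedasticity:

\[ \mathrm{Var}(\hat\beta) = \mathrm{Var}\left( \frac{S_{xy}}{S_{xx}} \right) = \mathrm{Var}\left( \frac{\sum_i (x_i - \bar x)\, y_i}{S_{xx}} \right) = \frac{\sum_i (x_i - \bar x)^2 \sigma^2}{S_{xx}^2} = \frac{\sigma^2}{S_{xx}} \]
\[ \mathrm{Var}(\hat\alpha) = \sigma^2 \left( \frac{1}{n} + \frac{\bar x^2}{S_{xx}} \right) \]
\[ \mathrm{cov}(\hat\alpha, \hat\beta) = -\frac{\sigma^2 \bar x}{S_{xx}} \]

If the error terms are normal:

\[ \hat\alpha \sim N(\alpha, \mathrm{Var}(\hat\alpha)), \qquad \hat\beta \sim N(\beta, \mathrm{Var}(\hat\beta)) \]

but σ² also has to be estimated…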
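Once σ² is replaced by an estimate, the variance formulas give standard errors for the coefficients. The sketch below assumes the usual unbiased estimator σ̂² = RSS/(n − 2), which is not stated on this slide, and uses invented data; the function name `ols_with_standard_errors` is hypothetical.

```python
import math


def ols_with_standard_errors(x, y):
    """Simple-regression estimates with homoscedastic standard errors.

    Assumes sigma^2 is estimated by RSS / (n - 2).
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta_hat = s_xy / s_xx
    alpha_hat = y_bar - beta_hat * x_bar
    rss = sum((yi - alpha_hat - beta_hat * xi) ** 2 for xi, yi in zip(x, y))
    sigma2_hat = rss / (n - 2)
    return {
        "alpha_hat": alpha_hat,
        "beta_hat": beta_hat,
        "sigma2_hat": sigma2_hat,
        "se_alpha": math.sqrt(sigma2_hat * (1.0 / n + x_bar ** 2 / s_xx)),
        "se_beta": math.sqrt(sigma2_hat / s_xx),
    }


# Hypothetical toy data, invented for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
result = ols_with_standard_errors(x, y)
print(result)
```

The standard error of the slope shrinks as S_xx grows, so spreading out the x values makes the slope estimate more precise for a given error variance.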

(25)

Seminar

EViews, simple regression

(26)

EViews

Software for statistics – econometrics

User friendly, especially suitable for time series analysis
Help files (User’s guide)

Stata

Has more built-in procedures, easier to program
Better for cross-sectional and panel analysis

Gretl

Free to download, appropriate at BA level

Deficiencies: panel and multivariate time series models

Statistical software: SPSS, R

(27)

Loading data I.

File/new/workfile - undated

Objects/new object/series - edit
Copy – paste

Name

(28)

Loading data II.

File/new/workfile – undated

Procs/Import/Read text-lotus-excel

Source file should be closed!

Excel sheet name …

Names for series: e.g. hours tax (reads both series)

(29)

Manipulation of variables

Open, descriptive statistics, graphs

View/Descriptive statistics
View/Graph

Multiple series can be selected (open as group)

Generate variables (genr)
Sample: smpl

smpl 1 20
smpl @all

Or: quick/sample

(30)

Regression

Quick/estimate equation …

Include constant! (c)

Method: OLS is the default

Or:

equation name.ls …

(31)

Example 1 – public expenditures, GDP

Eurostat data

Why might the two variables be related? Direction of causality?

Graph (scatterplot)

Regression estimation

Interpretation of the coefficients
Significance – t-test, F-test (Wald)

Residuals: View/Resid.tests/histogram

Problems?

(32)

Example 2 – hours worked, marginal tax rate

OECD data

Why might the two variables be related? Direction of causality?

Expected sign of the slope coefficient?

Graph (scatterplot)

Estimation of the regression

Interpretation of the coefficients
Significance – t-test

Interpretation of R-squared

Estimated value: forecast
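A forecast from a simple regression is the fitted line evaluated at a new x value. The sketch below shows the calculation outside EViews; the data are invented (they are not the OECD series used in the seminar), and the helper names `fit_line` and `forecast` are hypothetical.

```python
def fit_line(x, y):
    """Return (alpha_hat, beta_hat) from the closed-form OLS formulas."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta_hat = s_xy / s_xx
    return y_bar - beta_hat * x_bar, beta_hat


def forecast(alpha_hat, beta_hat, x_new):
    """Point forecast: the estimated conditional mean of y at x_new."""
    return alpha_hat + beta_hat * x_new


# Hypothetical toy data, invented for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

alpha_hat, beta_hat = fit_line(x, y)
y_forecast = forecast(alpha_hat, beta_hat, 6.0)
print(y_forecast)
```

A point forecast like this is most trustworthy near the range of the observed x values; far outside that range the linear form itself becomes the main source of uncertainty.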