ECONOMETRICS
Sponsored by Grant TÁMOP-4.1.2-08/2/A/KMR-2009-0041
Course material developed by the Department of Economics, Faculty of Social Sciences, Eötvös Loránd University Budapest (ELTE)
Department of Economics, Eötvös Loránd University Budapest
Institute of Economics, Hungarian Academy of Sciences
Balassi Kiadó, Budapest
Authors: Péter Elek, Anikó Bíró
Supervised by Péter Elek
June 2010
Week 7
Summary of estimation methods and large sample theory
Regression model
$y_i = \alpha + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \beta_k x_{ki} + u_i$, $i = 1, \dots, n$
Assumptions
1. $E(u_i) = 0$
2. $u_i$, $u_j$ independent for all $i \neq j$
3. $x_i$, $u_j$ independent for all $i, j$ (exogeneity)
4. No perfect collinearity
5. $\operatorname{Var}(u_i) = \sigma^2$ for all $i$
6. $u_i$ has a normal distribution
1–5: Gauss–Markov conditions
1–6: Conditions of the classical linear model
Assumptions restated (for large sample theory, with stochastic explanatory variables)
1. Population model: $y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + u$.
2. $\{(x_{1i}, x_{2i}, \dots, x_{ki}, y_i),\ i = 1, \dots, n\}$ is a random, independent sample from the model.
3. None of the regressors is constant, and there is no perfect collinearity among the regressors.
4. Exogeneity: $E(u \mid x_1, \dots, x_k) = 0$
5. Homoscedasticity: $\operatorname{Var}(u \mid x_1, \dots, x_k) = \sigma^2$
6. $u$ is independent of the regressors and normally distributed.
1–5: Gauss–Markov conditions
1–6: Conditions of the classical linear model
Multivariate regression model
Estimation: method of moments or OLS (also ML estimation if error term is normal)
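OLS minimizes the residual sum of squares:
\[ \min_{\hat\alpha, \hat\beta_1, \dots, \hat\beta_k} Q = \sum_{i=1}^{n} (y_i - \hat\alpha - \hat\beta_1 x_{1i} - \hat\beta_2 x_{2i} - \dots - \hat\beta_k x_{ki})^2 \]
Matrix form:
\[ y = X\beta + u, \qquad \hat\beta = (X'X)^{-1} X'y \]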
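A minimal sketch of the matrix formula, assuming plain Python without external libraries. The helper names `solve_linear_system` and `ols` and the toy data are illustrative assumptions, not part of the course material. The normal equations $(X'X)\hat\beta = X'y$ are solved by Gaussian elimination instead of forming the inverse explicitly.

```python
# OLS through the normal equations (X'X) b = X'y, standard library only.
# All names and the toy data below are illustrative assumptions.

def solve_linear_system(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Use the row with the largest pivot to limit rounding error.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular: perfect collinearity")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (m[r][n] - s) / m[r][r]
    return z


def ols(x_rows, y):
    """Return [alpha_hat, beta_1_hat, ..., beta_k_hat]."""
    X = [[1.0] + list(row) for row in x_rows]  # prepend the constant
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Toy data generated exactly from y = 1 + 2*x1 + 3*x2 (no error term),
# so the estimates should reproduce the coefficients up to rounding.
x_rows = [(1, 2), (2, 1), (3, 5), (4, 3), (5, 8)]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in x_rows]
coefficients = ols(x_rows, y)
print([round(c, 6) for c in coefficients])
```

The singularity check corresponds to the "no perfect collinearity" assumption: if one regressor is a linear combination of the others, $X'X$ cannot be inverted.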
Simple regression
Interpretation of multivariate model
Interpretation of coefficients
Partial effect (“ceteris paribus”): effect of a given regressor on the dependent variable, holding the other regressors fixed
Coefficient of determination: $R^2$
$\mathrm{RSS} = S_{yy}(1 - R^2)$
Simple regression formulas:
\[ \hat\beta = \frac{S_{xy}}{S_{xx}}, \qquad \hat\alpha = \bar y - \hat\beta \bar x \]
\[ \hat y_i = \hat\alpha + \hat\beta x_i, \qquad \hat u_i = y_i - \hat y_i \]
\[ S_{xx} = \sum_i (x_i - \bar x)^2 = \sum_i x_i^2 - n \bar x^2 \]
\[ S_{xy} = \sum_i (x_i - \bar x)(y_i - \bar y) = \sum_i x_i y_i - n \bar x \bar y \]
\[ S_{yy} = \sum_i (y_i - \bar y)^2 = \sum_i y_i^2 - n \bar y^2 \]
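The simple regression formulas can be checked numerically. The sketch below uses made-up data (an assumption for illustration only) and computes $R^2$ as $S_{xy}^2 / (S_{xx} S_{yy})$, which holds in simple regression, to confirm the identity $\mathrm{RSS} = S_{yy}(1 - R^2)$.

```python
# Simple regression from S_xx, S_xy and S_yy. The data are illustrative.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Centered sums of squares and cross products.
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_yy = sum((yi - y_bar) ** 2 for yi in y)

# The shortcut forms give the same numbers.
s_xx_short = sum(xi ** 2 for xi in x) - n * x_bar ** 2
s_xy_short = sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar

beta_hat = s_xy / s_xx
alpha_hat = y_bar - beta_hat * x_bar

# Residuals and residual sum of squares.
residuals = [yi - (alpha_hat + beta_hat * xi) for xi, yi in zip(x, y)]
rss = sum(u ** 2 for u in residuals)

# In simple regression R^2 is the squared sample correlation of x and y.
r_squared = s_xy ** 2 / (s_xx * s_yy)

print("beta_hat =", round(beta_hat, 4), "alpha_hat =", round(alpha_hat, 4))
print("RSS =", round(rss, 6), "S_yy(1 - R^2) =", round(s_yy * (1 - r_squared), 6))
```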
Small sample properties of estimation
If assumptions 1–4 hold: OLS is unbiased.
If assumptions 1–5 (Gauss–Markov) hold: OLS is BLUE (best linear unbiased estimator), and the usual variance formula is valid.
If assumptions 1–6 (classical linear model) hold: the t- and F-statistics have exact t- and F-distributions, respectively, for any sample size.
Multivariate regression, t-test
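Two-sided test, e.g. $H_0: \beta_i = 0$, $H_1: \beta_i \neq 0$
One-sided test, e.g. $H_0: \beta_i = 0$, $H_1: \beta_i > 0$
Variance of the OLS estimator and its estimate:
\[ \operatorname{Var}(\hat\beta_i) = \frac{\sigma^2}{\mathrm{RSS}_i}, \qquad \widehat{\operatorname{Var}}(\hat\beta_i) = \frac{\hat\sigma^2}{\mathrm{RSS}_i}, \qquad \hat\sigma^2 = \frac{\mathrm{RSS}}{n - k - 1} \]
where $\mathrm{RSS}_i$ is the residual sum of squares from regressing $x_i$ on the other regressors (including the constant).
In case of a normal error term, under $H_0$:
\[ \frac{\hat\beta_i}{\mathrm{SE}(\hat\beta_i)} \sim t_{n-k-1} \]
Simple regression
\[ \frac{\hat\beta}{\hat\sigma / \sqrt{S_{xx}}} \sim t_{n-2}, \qquad \frac{\hat\alpha}{\hat\sigma \sqrt{1/n + \bar x^2 / S_{xx}}} \sim t_{n-2} \]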
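A short sketch of the simple regression t-statistics, under the classical assumptions. The data are made up for illustration, and the critical value 3.182 (two-sided 5% level, 3 degrees of freedom) is taken from standard t-tables and hard-coded here rather than computed.

```python
import math

# Illustrative data (an assumption, not from the course material).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
beta_hat = s_xy / s_xx
alpha_hat = y_bar - beta_hat * x_bar

# sigma_hat^2 = RSS / (n - k - 1) with k = 1 regressor.
rss = sum((yi - alpha_hat - beta_hat * xi) ** 2 for xi, yi in zip(x, y))
sigma_hat = math.sqrt(rss / (n - 2))

# Standard errors of the slope and the intercept.
se_beta = sigma_hat / math.sqrt(s_xx)
se_alpha = sigma_hat * math.sqrt(1.0 / n + x_bar ** 2 / s_xx)

# t-statistics for H0: beta = 0 and H0: alpha = 0.
t_beta = beta_hat / se_beta
t_alpha = alpha_hat / se_alpha

# Two-sided 5% critical value of t with n - 2 = 3 degrees of freedom,
# copied from a standard table (hard-coded assumption).
T_CRITICAL = 3.182
reject_beta = abs(t_beta) > T_CRITICAL
reject_alpha = abs(t_alpha) > T_CRITICAL

print("t_beta =", round(t_beta, 2), "reject H0:", reject_beta)
print("t_alpha =", round(t_alpha, 2), "reject H0:", reject_alpha)
```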
Multivariate regression, F-test
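Testing nested hypotheses
Testing multiple restrictions ($r$ restrictions; $R$: restricted model, $U$: unrestricted model):
\[ F = \frac{(\mathrm{RRSS} - \mathrm{URSS})/r}{\mathrm{URSS}/(n-k-1)} = \frac{(R_U^2 - R_R^2)/r}{(1 - R_U^2)/(n-k-1)} \sim F_{r,\,n-k-1} \]
where $\mathrm{URSS} = S_{yy}(1 - R_U^2)$ and $\mathrm{RRSS} = S_{yy}(1 - R_R^2)$.
Test of overall significance ("the regression cannot be used"):
\[ H_0: \beta_1 = \beta_2 = \dots = \beta_k = 0, \qquad F = \frac{R^2/k}{(1 - R^2)/(n-k-1)} \sim F_{k,\,n-k-1} \]
Analysis of variance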
Source of variance | Sum of squares | Degrees of freedom | Mean sum of squares | F
Regression | $R^2 S_{yy}$ | $k$ | $R^2 S_{yy}/k = MS_1$ | $F = MS_1/MS_2$
Residual | $(1 - R^2) S_{yy}$ | $n - k - 1$ | $(1 - R^2) S_{yy}/(n - k - 1) = MS_2$ |
Total | $S_{yy}$ | $n - 1$ | |
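The F-statistic can be computed either from residual sums of squares or from $R^2$ values, and the analysis-of-variance decomposition yields the overall-significance F. The sketch below checks both equivalences on hypothetical numbers ($S_{yy} = 200$, $n = 50$, $k = 3$, $r = 2$); the function names are assumptions for illustration.

```python
def f_from_rss(rrss, urss, r, n, k):
    """F = ((RRSS - URSS) / r) / (URSS / (n - k - 1))."""
    return ((rrss - urss) / r) / (urss / (n - k - 1))


def f_from_r2(r2_u, r2_r, r, n, k):
    """Same statistic expressed with the unrestricted and restricted R^2."""
    return ((r2_u - r2_r) / r) / ((1.0 - r2_u) / (n - k - 1))


def anova_table(r2, s_yy, n, k):
    """Rows of the analysis-of-variance table for a model with k regressors."""
    ms1 = r2 * s_yy / k                      # regression mean square
    ms2 = (1.0 - r2) * s_yy / (n - k - 1)    # residual mean square
    return {
        "regression": (r2 * s_yy, k, ms1),
        "residual": ((1.0 - r2) * s_yy, n - k - 1, ms2),
        "total": (s_yy, n - 1),
        "F": ms1 / ms2,
    }


# Hypothetical inputs, chosen only to illustrate the arithmetic.
s_yy, n, k, r = 200.0, 50, 3, 2
r2_u, r2_r = 0.40, 0.30
urss = s_yy * (1.0 - r2_u)
rrss = s_yy * (1.0 - r2_r)

f_restrictions_rss = f_from_rss(rrss, urss, r, n, k)
f_restrictions_r2 = f_from_r2(r2_u, r2_r, r, n, k)

# Overall significance: the restricted model has only a constant, so
# R_R^2 = 0 and r = k.
f_overall = f_from_r2(r2_u, 0.0, k, n, k)
table = anova_table(r2_u, s_yy, n, k)

print(round(f_restrictions_rss, 4), round(f_restrictions_r2, 4))
print(round(f_overall, 4), round(table["F"], 4))
```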
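Large sample properties I: consistency
If assumptions 1–4 hold: OLS is consistent.
Proof for simple regression:
\[ \operatorname{plim} \hat\beta = \operatorname{plim} \frac{S_{xy}}{S_{xx}} = \frac{\operatorname{Cov}(x, y)}{\operatorname{Var}(x)} = \frac{\operatorname{Cov}(x, \alpha + \beta x + u)}{\operatorname{Var}(x)} = \beta + \frac{\operatorname{Cov}(x, u)}{\operatorname{Var}(x)} = \beta \]
The last step uses $\operatorname{Cov}(x, u) = 0$, which follows from exogeneity (assumption 4).
Large sample properties II: asymptotic normality
If assumptions 1–5 (Gauss–Markov) hold: the OLS estimator is asymptotically normal:
\[ \sqrt{n}\,(\hat\beta_i - \beta_i) \overset{asympt}{\sim} N(0, c_i^2), \qquad \operatorname{Var}(\hat\beta_i) = \frac{\sigma^2}{\mathrm{TSS}_i (1 - R_i^2)} \approx \frac{c_i^2}{n}, \qquad c_i^2 = \frac{\sigma^2}{\sigma_{x_i}^2 (1 - R_i^2)} \]
where $\mathrm{TSS}_i$ is the total sum of squares of $x_i$, $\sigma_{x_i}^2$ is the variance of $x_i$, and $R_i^2$ is the $R^2$ from regressing $x_i$ on the other regressors.
Thus the standard deviation of the estimator goes to zero at rate $n^{-1/2}$.
The usual estimator of $\sigma^2$ is consistent, therefore the usual t-test is asymptotically valid, even if assumption 6 (normality) does not hold.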
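Both large sample properties can be illustrated by simulation. The sketch below is a Monte Carlo experiment under assumed settings: $\alpha = 1$, $\beta = 2$, standard normal $x$, and a deliberately non-normal error term (uniform with variance 1), so that $c^2 = \sigma^2 / \sigma_x^2 = 1$. All function names, seeds and sample sizes are illustrative choices.

```python
import random


def ols_slope(x, y):
    """Simple regression slope S_xy / S_xx."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return s_xy / s_xx


def draw_sample(rng, n, alpha=1.0, beta=2.0):
    """Draw (x, y) with u uniform on [-sqrt(3), sqrt(3)], so Var(u) = 1."""
    half_width = 3 ** 0.5
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    u = [rng.uniform(-half_width, half_width) for _ in range(n)]  # independent of x
    y = [alpha + beta * xi + ui for xi, ui in zip(x, u)]
    return x, y


def slope_estimate(n, seed=0):
    """OLS slope from one simulated sample of size n."""
    rng = random.Random(seed)
    return ols_slope(*draw_sample(rng, n))


def scaled_error_variance(n=200, replications=500, beta=2.0, seed=1):
    """Sample variance of sqrt(n) * (beta_hat - beta) across replications."""
    rng = random.Random(seed)
    z = [
        n ** 0.5 * (ols_slope(*draw_sample(rng, n, beta=beta)) - beta)
        for _ in range(replications)
    ]
    mean_z = sum(z) / len(z)
    return sum((zi - mean_z) ** 2 for zi in z) / (len(z) - 1)


# Consistency: the estimate should settle near beta = 2 as n grows.
for n in (10, 100, 1000, 10000):
    print(n, round(slope_estimate(n), 4))

# Asymptotic normality: the variance of the scaled error should be near c^2 = 1.
print("variance of sqrt(n)*(beta_hat - beta):", round(scaled_error_variance(), 3))
```

Since the error term here is uniform rather than normal, the experiment also illustrates that assumption 6 is not needed for the large sample results.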
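Large sample properties III: F-test and others
If assumptions 1–5 hold (assumption 6 (normality) is not needed): the F-test is asymptotically valid.
Other large sample tests (only asymptotically valid):
Wald test: $n(\mathrm{RRSS} - \mathrm{URSS})/\mathrm{URSS} \sim \chi^2_r$
Test of overall significance ("the regression cannot be used"), Wald form: $nR^2/(1 - R^2) \sim \chi^2_k$
Lagrange multiplier (LM) test: $n(\mathrm{RRSS} - \mathrm{URSS})/\mathrm{RRSS} \sim \chi^2_r$
Test of overall significance ("the regression cannot be used"), LM form: $nR^2 \sim \chi^2_k$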
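As a sketch of how the Wald and LM statistics relate, the example below evaluates both for the overall-significance test on hypothetical numbers ($n = 50$, $S_{yy} = 200$, $R^2 = 0.4$, $k = 3$). The 5% critical value 7.815 of $\chi^2_3$ is copied from standard tables, and the function names are assumptions.

```python
def wald_statistic(rrss, urss, n):
    """W = n (RRSS - URSS) / URSS."""
    return n * (rrss - urss) / urss


def lm_statistic(rrss, urss, n):
    """LM = n (RRSS - URSS) / RRSS."""
    return n * (rrss - urss) / rrss


# Hypothetical inputs for the test H0: beta_1 = ... = beta_k = 0.
n, k = 50, 3
s_yy, r2 = 200.0, 0.4
urss = s_yy * (1.0 - r2)  # unrestricted model
rrss = s_yy               # restricted model contains only the constant

wald = wald_statistic(rrss, urss, n)
lm = lm_statistic(rrss, urss, n)

# Shortcut forms for the overall-significance test.
wald_shortcut = n * r2 / (1.0 - r2)
lm_shortcut = n * r2

# 5% critical value of chi-square with 3 degrees of freedom, from a
# standard table (hard-coded assumption).
CHI2_CRITICAL = 7.815
print("Wald =", round(wald, 3), "reject H0:", wald > CHI2_CRITICAL)
print("LM   =", round(lm, 3), "reject H0:", lm > CHI2_CRITICAL)
```

Because $\mathrm{RRSS} \geq \mathrm{URSS}$, the LM statistic is never larger than the Wald statistic in a given sample, although both have the same asymptotic distribution under $H_0$.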
Model selection
Adjusted $R^2$:
\[ \bar R^2 = 1 - \frac{n-1}{n-k-1}\,(1 - R^2) \]
Nested hypotheses: t- and F-test
Non-nested hypotheses, same dependent variable: adjusted $R^2$, information criteria (AIC, BIC; both based on the log-likelihood)
Omitting relevant variables
If the omitted variable is correlated with the included regressors: biased estimation (endogeneity)
Simple regression
True model: $y = \beta_1 x_1 + \beta_2 x_2 + u$
Estimated model: $y = \gamma_1 x_1 + u$
Sign of the bias of $\hat\gamma_1$ as an estimate of $\beta_1$:
Bias | $\operatorname{Corr}(x_1, x_2) > 0$ | $\operatorname{Corr}(x_1, x_2) < 0$
$\beta_2 > 0$ | + | −
$\beta_2 < 0$ | − | +
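The sign pattern of the omitted variable bias can be reproduced by simulation. The sketch below assumes $\beta_1 = 1$, zero-mean regressors, and $x_2 = \text{link} \cdot x_1 + \text{noise}$, so the sign of `link` is the sign of $\operatorname{Corr}(x_1, x_2)$. It relies on the standard result $\operatorname{plim} \hat\gamma_1 = \beta_1 + \beta_2 \operatorname{Cov}(x_1, x_2)/\operatorname{Var}(x_1)$, which is not stated in the slides; all names and parameter values are illustrative.

```python
import random


def no_intercept_slope(x, y):
    """OLS slope in y = gamma * x + error (no constant, as in the slide)."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)


def short_regression_slope(link, beta2, n=20000, beta1=1.0, seed=0):
    """Estimate gamma_1 when x2 is omitted from the regression."""
    rng = random.Random(seed)
    x1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # x2 is correlated with x1 whenever link != 0.
    x2 = [link * a + rng.gauss(0.0, 1.0) for a in x1]
    y = [beta1 * a + beta2 * b + rng.gauss(0.0, 1.0) for a, b in zip(x1, x2)]
    return no_intercept_slope(x1, y)


# Each case should be compared with the true beta_1 = 1.
cases = {
    "beta2 > 0, corr > 0": short_regression_slope(link=0.5, beta2=2.0),
    "beta2 > 0, corr < 0": short_regression_slope(link=-0.5, beta2=2.0),
    "beta2 < 0, corr > 0": short_regression_slope(link=0.5, beta2=-2.0),
    "beta2 < 0, corr < 0": short_regression_slope(link=-0.5, beta2=-2.0),
}
for label, gamma_hat in cases.items():
    sign = "+" if gamma_hat > 1.0 else "-"
    print(label, "gamma_hat =", round(gamma_hat, 3), "bias sign:", sign)
```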
Including irrelevant variables
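True model: $y = \beta_1 x_1 + \beta_2 x_2 + u$
Estimated model: $y = \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + u$, with $\beta_3 = 0$
Does not affect unbiasedness (no endogeneity).
Variance increases: $\operatorname{Var}(\hat\beta_i) = \sigma^2 / \mathrm{RSS}_i$, where $\mathrm{RSS}_i$ is the residual sum of squares from regressing $x_i$ on the other regressors, and $\mathrm{RSS}_i$ cannot rise when a regressor is added.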
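A small numerical sketch of the variance increase, in a simplified setting that is an assumption of this example: one relevant regressor $x_1$ plus a constant, to which an irrelevant but correlated $x_3$ is added. Without $x_3$, $\mathrm{RSS}_1$ equals the total sum of squares of $x_1$; with $x_3$, it shrinks to $\mathrm{TSS}_1 (1 - R_1^2)$, so $\sigma^2 / \mathrm{RSS}_1$ grows. The data are made up.

```python
# Illustrative data: x3 is irrelevant for y but correlated with x1.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x3 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
n = len(x1)

x1_bar = sum(x1) / n
x3_bar = sum(x3) / n

tss_1 = sum((a - x1_bar) ** 2 for a in x1)
tss_3 = sum((b - x3_bar) ** 2 for b in x3)
s_13 = sum((a - x1_bar) * (b - x3_bar) for a, b in zip(x1, x3))

# R^2 of the auxiliary regression of x1 on a constant and x3.
r2_aux = s_13 ** 2 / (tss_1 * tss_3)

# RSS_1 before and after adding x3 to the model.
rss_1_without_x3 = tss_1
rss_1_with_x3 = tss_1 * (1.0 - r2_aux)

# Var(beta_1_hat) = sigma^2 / RSS_1, so the variance is multiplied by:
variance_ratio = rss_1_without_x3 / rss_1_with_x3

print("auxiliary R^2 =", round(r2_aux, 4))
print("variance of beta_1_hat is multiplied by", round(variance_ratio, 3))
```

If $x_3$ were uncorrelated with $x_1$ in the sample, the auxiliary $R^2$ would be zero and the variance would not change.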
Other topics
Forecasting
Outliers
Alternative functional forms
Tests of stability
Dummy regressors
Quadratic terms, interactions
Heteroscedasticity, etc.
Seminar
First exam