ECONOMETRICS
Sponsored by a Grant TÁMOP-4.1.2-08/2/A/KMR-2009-0041
Course Material Developed by Department of Economics, Faculty of Social Sciences, Eötvös Loránd University Budapest (ELTE)
Department of Economics, Eötvös Loránd University Budapest
Institute of Economics, Hungarian Academy of Sciences
Balassi Kiadó, Budapest
Authors: Péter Elek, Anikó Bíró
Supervised by Péter Elek
June 2010
Week 3
Simple regression II.
Plan
Estimation of standard deviation
Hypothesis testing, confidence interval
Forecasting
Outliers, alternative functional forms
Reminder I
$y_i = \alpha + \beta x_i + u_i$
Assumptions:
1. $E(u_i) = 0$
2. $\mathrm{Var}(u_i) = \sigma^2$ for all $i$
3. $u_i$, $u_j$ independent for all $i \ne j$
4. $x_i$, $u_j$ independent for all $i$, $j$
5. $u_i$ normally distributed for all $i$: $N(0, \sigma^2)$
Reminder II
$y_i = \alpha + \beta x_i + u_i$
Estimation
Method of moments
OLS
Maximum likelihood
Unbiased estimator – normality and homoscedasticity not needed!
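The unbiasedness claim can be checked by simulation. The following is a minimal sketch and not part of the course material: the parameter values, the error distribution and the helper name `ols` are assumptions chosen for illustration. The errors are uniform and heteroscedastic, yet the average of the OLS slope estimates should stay close to the true β.

```python
import random

def ols(x, y):
    """OLS estimates (alpha_hat, beta_hat) for y = alpha + beta * x + u."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta_hat = s_xy / s_xx
    return y_bar - beta_hat * x_bar, beta_hat

random.seed(1)
alpha, beta = 2.0, 0.5                 # illustrative "true" values (assumed)
x = [i / 2 for i in range(1, 41)]      # fixed regressor: 0.5, 1.0, ..., 20.0

slopes = []
for _ in range(2000):
    # Mean-zero errors that are uniform (not normal) and whose spread grows with x
    y = [alpha + beta * xi + random.uniform(-1, 1) * (0.5 + 0.2 * xi) for xi in x]
    slopes.append(ols(x, y)[1])

mean_slope = sum(slopes) / len(slopes)
print(f"average beta_hat = {mean_slope:.4f}, true beta = {beta}")
```

Normality and homoscedasticity matter for the sampling distribution and the standard errors, not for the expected value of the estimator.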
Spurious regression
“Regression to the mean” for normally distributed variables with the same standard deviation:
$E(Y \mid X = x) - m_y = \rho(x - m_x)$, $\rho < 1$
Coefficient of the regression: $\rho$
Statistical consequence: coefficient less than 1!
Examples: height of parents and children, scores of first and second exams
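Regression to the mean can be illustrated with simulated data. This is a sketch under assumed values: the correlation 0.6, the mean 170 and the standard deviation 7 are hypothetical numbers, not taken from the course. Both variables have the same mean and standard deviation, and the fitted slope comes out close to ρ, hence below 1.

```python
import math
import random

random.seed(2)
rho, mean, sd = 0.6, 170.0, 7.0   # hypothetical correlation, mean and sd (e.g. heights in cm)
n = 5000

parents, children = [], []
for _ in range(n):
    z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
    parents.append(mean + sd * z1)
    # Same mean and sd as the parents, correlation rho with them
    children.append(mean + sd * (rho * z1 + math.sqrt(1 - rho ** 2) * z2))

p_bar = sum(parents) / n
c_bar = sum(children) / n
s_xx = sum((p - p_bar) ** 2 for p in parents)
s_xy = sum((p - p_bar) * (c - c_bar) for p, c in zip(parents, children))
slope = s_xy / s_xx

print(f"estimated slope = {slope:.3f} (rho = {rho})")
```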
Sampling distribution of coefficient estimates
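$\hat\beta = S_{xy}/S_{xx}$, where $S_{xx} = \sum_i (x_i - \bar x)^2$ and $S_{xy} = \sum_i (x_i - \bar x)(y_i - \bar y) = \sum_i (x_i - \bar x) y_i$
$\mathrm{Var}(\hat\beta) = \mathrm{Var}(S_{xy}/S_{xx}) = \mathrm{Var}\left(\sum_i (x_i - \bar x) y_i\right)/S_{xx}^2 = \sigma^2 S_{xx}/S_{xx}^2 = \sigma^2/S_{xx}$
$\hat\beta \sim N(\beta, \sigma^2/S_{xx})$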
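The result that the slope estimate has mean β and variance σ²/S_xx can be compared with a Monte Carlo experiment. The sketch below uses assumed parameter values and a fixed regressor 1, ..., 20. It repeatedly draws normal errors, re-estimates the slope, and compares the empirical variance with the theoretical one.

```python
import random

random.seed(3)
alpha, beta, sigma = 1.0, 2.0, 1.5       # illustrative values (assumed)
x = [float(i) for i in range(1, 21)]     # fixed regressor values
n = len(x)
x_bar = sum(x) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)

def beta_hat(y):
    """OLS slope S_xy / S_xx for the fixed regressor x."""
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return s_xy / s_xx

draws = []
for _ in range(5000):
    y = [alpha + beta * xi + random.gauss(0, sigma) for xi in x]
    draws.append(beta_hat(y))

mc_mean = sum(draws) / len(draws)
mc_var = sum((b - mc_mean) ** 2 for b in draws) / (len(draws) - 1)
theory_var = sigma ** 2 / s_xx           # sigma^2 / S_xx

print(f"mean of beta_hat: {mc_mean:.4f} (beta = {beta})")
print(f"variance of beta_hat: simulated {mc_var:.5f}, theoretical {theory_var:.5f}")
```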
Estimation of variance
$\hat u_i = y_i - \hat\alpha - \hat\beta x_i$
From the normal equations: $\sum_i \hat u_i = 0$, $\sum_i x_i \hat u_i = 0$
$RSS = \sum_i \hat u_i^2$, $Q = RSS/\sigma^2 \sim \chi^2_{n-2}$
$\hat\sigma^2 = RSS/(n-2)$, $E(\hat\sigma^2) = \sigma^2$
Chi-squared, t-distribution
$x_1, x_2, \ldots, x_n$ independent variables with standard normal distribution: $Z = \sum_{i=1}^{n} x_i^2 \sim \chi^2_n$
$x \sim N(0, 1)$, $y \sim \chi^2_n$ independent: $x/\sqrt{y/n} \sim t_n$
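The variance estimator RSS/(n − 2) and the two normal equations for the residuals can be checked numerically. The following sketch uses assumed parameter values and a small sample (n = 10), so that the difference between dividing by n − 2 and dividing by n is visible.

```python
import random

random.seed(4)
alpha, beta, sigma = 1.0, 2.0, 1.5      # illustrative values (assumed)
x = [float(i) for i in range(1, 11)]    # n = 10 fixed regressor values
n = len(x)
x_bar = sum(x) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)

def residuals(y):
    """OLS residuals u_hat_i = y_i - alpha_hat - beta_hat * x_i."""
    y_bar = sum(y) / n
    beta_hat = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    alpha_hat = y_bar - beta_hat * x_bar
    return [yi - alpha_hat - beta_hat * xi for xi, yi in zip(x, y)]

unbiased, naive = [], []
for _ in range(4000):
    y = [alpha + beta * xi + random.gauss(0, sigma) for xi in x]
    u_hat = residuals(y)
    rss = sum(u ** 2 for u in u_hat)
    unbiased.append(rss / (n - 2))   # sigma_hat^2 = RSS / (n - 2)
    naive.append(rss / n)            # dividing by n instead is biased downwards

# Normal equations, checked on the last simulated sample
sum_u = sum(u_hat)
sum_xu = sum(xi * u for xi, u in zip(x, u_hat))

mean_unbiased = sum(unbiased) / len(unbiased)
mean_naive = sum(naive) / len(naive)
print(f"sum of residuals = {sum_u:.2e}, sum of x * residuals = {sum_xu:.2e}")
print(f"sigma^2 = {sigma ** 2}, mean RSS/(n-2) = {mean_unbiased:.3f}, mean RSS/n = {mean_naive:.3f}")
```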
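Hypothesis testing, confidence interval
Confidence interval, hypothesis testing
$\hat\beta \sim N(\beta, \sigma^2/S_{xx})$, thus $(\hat\beta - \beta)/(\sigma/\sqrt{S_{xx}}) \sim N(0, 1)$
$\hat\alpha \sim N\left(\alpha, \sigma^2(1/n + \bar x^2/S_{xx})\right)$, thus $(\hat\alpha - \alpha)/\left(\sigma\sqrt{1/n + \bar x^2/S_{xx}}\right) \sim N(0, 1)$
$(n-2)\hat\sigma^2/\sigma^2 \sim \chi^2_{n-2}$, independent of $\hat\alpha$, $\hat\beta$ (no proof)
$(\hat\beta - \beta)/(\hat\sigma/\sqrt{S_{xx}}) \sim t_{n-2}$
$(\hat\alpha - \alpha)/\left(\hat\sigma\sqrt{1/n + \bar x^2/S_{xx}}\right) \sim t_{n-2}$
Confidence interval (confidence level $1-\varepsilon$): $P\left(-t_{n-2,\,1-\varepsilon/2} < (\hat\beta - \beta)/SE(\hat\beta) < t_{n-2,\,1-\varepsilon/2}\right) = 1 - \varepsilon$
Analysis of variance
$TSS = \sum_i (y_i - \bar y)^2 = \sum_i (y_i - \hat y_i)^2 + \sum_i (\hat y_i - \bar y)^2 = RSS + ESS$
Previous slide: $RSS \sim \sigma^2\chi^2_{n-2}$
If $\beta = 0$, so that the $y_i$ are independent $N(\alpha, \sigma^2)$ variables, then $TSS \sim \sigma^2\chi^2_{n-1}$
(Fisher-Bartlett theorem) $ESS \sim \sigma^2\chi^2_1$
RSS and ESS independent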
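The t-based confidence interval for β can be checked by simulation: if the assumptions hold, about 95% of the intervals should contain the true slope. The sketch below uses assumed parameter values and n = 30. The critical value 2.048 is the 97.5% quantile of the t-distribution with 28 degrees of freedom, hard-coded from a t-table to keep the example free of external libraries.

```python
import math
import random

random.seed(5)
alpha, beta, sigma = 1.0, 0.8, 2.0       # illustrative values (assumed)
x = [float(i) for i in range(1, 31)]     # n = 30 fixed regressor values
n = len(x)
x_bar = sum(x) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
T_CRIT = 2.048   # 97.5% quantile of t with n - 2 = 28 d.o.f. (from a t-table)

def beta_interval(y):
    """Return (beta_hat, se, lower, upper) for the 95% confidence interval of beta."""
    y_bar = sum(y) / n
    beta_hat = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    alpha_hat = y_bar - beta_hat * x_bar
    rss = sum((yi - alpha_hat - beta_hat * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / s_xx)   # SE(beta_hat) = sigma_hat / sqrt(S_xx)
    return beta_hat, se, beta_hat - T_CRIT * se, beta_hat + T_CRIT * se

reps, covered = 2000, 0
for _ in range(reps):
    y = [alpha + beta * xi + random.gauss(0, sigma) for xi in x]
    b, se, lo, hi = beta_interval(y)
    covered += lo < beta < hi            # True counts as 1

coverage = covered / reps
t_stat = b / se                          # t-statistic for H0: beta = 0, last sample
print(f"share of intervals containing beta: {coverage:.3f}")
print(f"last sample: beta_hat = {b:.3f}, 95% CI = ({lo:.3f}, {hi:.3f}), t = {t_stat:.1f}")
```

The hypothesis β = 0 is rejected at the 5% level when the absolute value of the t-statistic exceeds the same critical value, which is equivalent to 0 lying outside the interval.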
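Analysis of variance (cont.)
β = 0 hypothesis
Source of var. | Sum of squares | D. of freedom | Mean squares | F
Regr. | $ESS = r^2 S_{yy} = \hat\beta S_{xy} = \hat\beta^2 S_{xx}$ | $1$ | $MS_1 = ESS/1 \sim \sigma^2\chi^2_1/1$ | $F = MS_1/MS_2 = (n-2)r^2/(1-r^2) \sim F_{1,n-2}$; $F = \hat\beta^2/(\hat\sigma^2/S_{xx}) \sim t^2_{n-2}$
Residual | $RSS = (1-r^2)S_{yy} = (n-2)\hat\sigma^2$ | $n-2$ | $MS_2 = RSS/(n-2) \sim \sigma^2\chi^2_{n-2}/(n-2)$ |
Total | $S_{yy}$ | $n-1$ | |
Forecasting
$y_0 = \alpha + \beta x_0 + u_0$, $\hat y_0 = \hat\alpha + \hat\beta x_0$
$E(\hat y_0 - y_0) = 0$ (unbiased)
$\mathrm{Var}(\hat y_0 - y_0) = \mathrm{Var}(\hat\alpha) + x_0^2\,\mathrm{Var}(\hat\beta) + 2x_0\,\mathrm{cov}(\hat\alpha, \hat\beta) + \mathrm{Var}(u_0) = \sigma^2\left(1 + 1/n + (x_0 - \bar x)^2/S_{xx}\right)$
If $x_0 = \bar x$ then it is minimal.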
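The forecast error variance σ²(1 + 1/n + (x₀ − x̄)²/S_xx) can be compared with simulated forecast errors. This is a sketch with assumed parameter values: the forecast point x₀ = 25 is deliberately outside the sample range 1, ..., 20, so that the distance term is not negligible.

```python
import random

random.seed(6)
alpha, beta, sigma = 1.0, 2.0, 1.0       # illustrative values (assumed)
x = [float(i) for i in range(1, 21)]     # fixed regressor values
n = len(x)
x_bar = sum(x) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
x0 = 25.0                                # forecast point (assumed)

errors = []
for _ in range(5000):
    y = [alpha + beta * xi + random.gauss(0, sigma) for xi in x]
    y_bar = sum(y) / n
    beta_hat = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    alpha_hat = y_bar - beta_hat * x_bar
    y0 = alpha + beta * x0 + random.gauss(0, sigma)   # the new observation
    errors.append(alpha_hat + beta_hat * x0 - y0)     # forecast error y0_hat - y0

mean_err = sum(errors) / len(errors)
var_err = sum((e - mean_err) ** 2 for e in errors) / (len(errors) - 1)
theory = sigma ** 2 * (1 + 1 / n + (x0 - x_bar) ** 2 / s_xx)

print(f"mean forecast error = {mean_err:.4f}")
print(f"forecast error variance: simulated {var_err:.4f}, theoretical {theory:.4f}")
```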
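Confidence interval of forecasts
[Figure: confidence interval of forecasts]
Forecasting expected value
$E(y_0) = \alpha + \beta x_0$, $\hat y_0 = \hat\alpha + \hat\beta x_0$
$E(\hat y_0 - E(y_0)) = 0$
$\mathrm{Var}(\hat y_0 - E(y_0)) = \mathrm{Var}(\hat y_0) = \mathrm{Var}(\hat\alpha) + x_0^2\,\mathrm{Var}(\hat\beta) + 2x_0\,\mathrm{cov}(\hat\alpha, \hat\beta) = \sigma^2\left(1/n + (x_0 - \bar x)^2/S_{xx}\right)$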
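The two forecast variances differ only by the variance of the new error term, and both grow with the distance of x₀ from the sample mean. The sketch below computes both standard errors for a hypothetical regressor 1, ..., 20 and a hypothetical estimate of σ equal to 1. The function name `forecast_se` is an assumption made for illustration.

```python
import math

def forecast_se(x, x0, sigma_hat):
    """Return (se_individual, se_expected) of the forecast at x0.

    se_expected refers to forecasting E(y0), se_individual to forecasting y0 itself.
    """
    n = len(x)
    x_bar = sum(x) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    var_expected = sigma_hat ** 2 * (1 / n + (x0 - x_bar) ** 2 / s_xx)
    var_individual = sigma_hat ** 2 + var_expected   # adds Var(u0)
    return math.sqrt(var_individual), math.sqrt(var_expected)

x = [float(i) for i in range(1, 21)]   # hypothetical regressor values, mean 10.5
sigma_hat = 1.0                        # hypothetical estimate of sigma
for x0 in (10.5, 15.0, 25.0):
    se_ind, se_exp = forecast_se(x, x0, sigma_hat)
    print(f"x0 = {x0:5.1f}: SE for y0 = {se_ind:.3f}, SE for E(y0) = {se_exp:.3f}")
```

Both standard errors are smallest at x₀ = x̄, and the interval for an individual outcome is always wider than the interval for the expected value.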
Outliers
Outlier: lies far from the other observations
Can change the regression line
Reasons and handling:
Data error (omit the data)
Special case (individual analysis)
Same mechanisms, but outlier data (analyze with the other observations)
Outliers (cont.): same regression lines, but different relationships
[Figure: regression lines fitted without the outlier and with the outlier (plot labels: Z, y)]
[Figure: four scatter plots with the same fitted regression line]
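A classic data set with this property is Anscombe's quartet. That the four panels here show exactly this data set is an assumption, since the slides do not name their data. The sketch below fits a simple regression to each of the four published Anscombe samples. The fitted lines are practically identical even though the underlying relationships differ (linear, curved, linear with one outlier, and a single influential point).

```python
def ols(x, y):
    """OLS estimates (intercept, slope) of a simple regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope

# Anscombe's quartet: samples I-III share the same x values
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

fits = {name: ols(x, y) for name, (x, y) in quartet.items()}
for name, (intercept, slope) in fits.items():
    print(f"{name}: intercept = {intercept:.2f}, slope = {slope:.3f}")
```

The coefficient estimates alone cannot distinguish these cases, which is why plotting the data and the residuals is part of the analysis.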
Outliers (cont.): analysis of the residuals (will be important in multivariate case)
Alternative functional forms
$y = Ae^{\beta x}$, $\log(y) = \log(A) + \beta x$
Form of error term matters:
$y = Ae^{\beta x}e^{u}$, $\log(y) = \log(A) + \beta x + u$
$E(e^{u}) \ne e^{E(u)} = 1$, thus $E(y) \ne Ae^{\beta x}$
Other examples
$y = Ax^{\beta}$, $\log(y) = \log(A) + \beta\log(x)$
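The role of the error term in the exponential model can be seen in a small simulation. The sketch below assumes a multiplicative error with normally distributed u and hypothetical parameter values. Regressing log(y) on x recovers β, while the average of e^u is clearly above 1; for normal u it equals exp(σ²/2).

```python
import math
import random

random.seed(7)
A, beta, sigma = 3.0, 0.4, 0.5     # hypothetical parameter values
n = 5000
x = [random.uniform(0, 2) for _ in range(n)]
u = [random.gauss(0, sigma) for _ in range(n)]
y = [A * math.exp(beta * xi) * math.exp(ui) for xi, ui in zip(x, u)]

# OLS of log(y) on x estimates log(A) and beta
log_y = [math.log(yi) for yi in y]
x_bar = sum(x) / n
ly_bar = sum(log_y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (li - ly_bar) for xi, li in zip(x, log_y))
beta_hat = s_xy / s_xx
log_a_hat = ly_bar - beta_hat * x_bar

# E(e^u) differs from e^{E(u)} = 1
mean_exp_u = sum(math.exp(ui) for ui in u) / n

print(f"beta_hat = {beta_hat:.3f}, log(A)_hat = {log_a_hat:.3f} (log(A) = {math.log(A):.3f})")
print(f"mean of e^u = {mean_exp_u:.3f}, exp(sigma^2 / 2) = {math.exp(sigma ** 2 / 2):.3f}")
```

This is why exponentiating a forecast of log(y) does not give an unbiased forecast of y.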
[Figure: residual plots for the four outlier examples: U1, U2 and U3 against X1, U4 against X4]
Example: relationship between earnings and education
$\log(earn_i) = \alpha + \beta_1 educ_i + u_i$, 2003 Wage Tariff
(The F-test is the square of the t-test in the univariate case.)
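The statement that the F-test is the square of the t-test in the univariate case can be verified numerically. The sketch below uses simulated data with assumed coefficients as a stand-in, since the Wage Tariff data are not reproduced here. It computes the t-statistic of the slope, the F-statistic from the analysis of variance, and the F-statistic from R².

```python
import math
import random

random.seed(8)
n = 50
# Simulated stand-in data (assumed coefficients, not the Wage Tariff sample)
x = [random.uniform(8, 20) for _ in range(n)]               # "years of education"
y = [10.0 + 0.1 * xi + random.gauss(0, 0.5) for xi in x]    # "log earnings"

x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
beta_hat = s_xy / s_xx
alpha_hat = y_bar - beta_hat * x_bar
y_hat = [alpha_hat + beta_hat * xi for xi in x]

tss = sum((yi - y_bar) ** 2 for yi in y)
rss = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
ess = sum((fi - y_bar) ** 2 for fi in y_hat)

sigma2_hat = rss / (n - 2)
t_stat = beta_hat / math.sqrt(sigma2_hat / s_xx)   # t-test of beta = 0
f_stat = (ess / 1) / (rss / (n - 2))               # F = MS1 / MS2
r2 = ess / tss
f_from_r2 = (n - 2) * r2 / (1 - r2)

print(f"t = {t_stat:.4f}, t^2 = {t_stat ** 2:.4f}, F = {f_stat:.4f}, F from R^2 = {f_from_r2:.4f}")
```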
Example (cont.): Forecasting
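What earnings do we expect with 15 years of education?
$\widehat{\log(earn_0)} = \hat y_0 = 10.12 + 15 \cdot 0.122 = 11.95$
$\widehat{earn}_0 = e^{11.95} \approx 154{,}800$ Ft (in 2003, but it is not unbiased)
$\mathrm{Var}(\hat y_0 - y_0) = \mathrm{Var}(\hat\alpha) + 15^2\,\mathrm{Var}(\hat\beta) + 2 \cdot 15\,\mathrm{cov}(\hat\alpha, \hat\beta) + \mathrm{Var}(u_0) = 0.4937^2$
Assuming normally distributed error terms:
$P(11.95 - 1.96 \cdot 0.4937 < y_0 < 11.95 + 1.96 \cdot 0.4937) = 0.95$
$P(58{,}800\ \text{Ft} < earn_0 < 407{,}400\ \text{Ft}) = 0.95$
Uncertainty is relatively large.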
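The arithmetic of this forecast can be retraced in a few lines. The sketch uses only the numbers quoted in the example (intercept 10.12, slope 0.122, forecast standard error 0.4937, normal quantile 1.96). The variable names are chosen for illustration.

```python
import math

alpha_hat, beta_hat = 10.12, 0.122   # estimated coefficients quoted in the example
educ = 15                            # years of education
se_forecast = 0.4937                 # standard error of the forecast of log earnings
z = 1.96                             # 97.5% quantile of the standard normal distribution

log_earn_hat = alpha_hat + beta_hat * educ           # forecast of log earnings
point = math.exp(log_earn_hat)                       # back-transformed point forecast
lower = math.exp(log_earn_hat - z * se_forecast)     # interval bounds in Ft
upper = math.exp(log_earn_hat + z * se_forecast)

print(f"log forecast = {log_earn_hat:.2f}")
print(f"point forecast = {point:,.0f} Ft, 95% interval = ({lower:,.0f} Ft, {upper:,.0f} Ft)")
```

The interval is symmetric on the log scale but strongly asymmetric in Ft, with the upper bound roughly seven times the lower one.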
Normality of error terms: slight deviation from normal distribution
Simple regression, summary
Assumptions
Estimation and its properties (unbiased), interpretation of estimated coefficients
Hypothesis testing
Problem of outliers
Seminar
Simple regression II
Estimating the marginal propensity to consume
FOGYJOV file
CONS = α + β∙INC + u
Sample size: 900
Interpretation of coefficients, calculation of marginal and average propensity to consume
Interpretation of t-statistic, p-value, R², RSS
Testing the β = 1 hypothesis
95% and 99% confidence intervals for β
Analysis of significance for a subsample of 30 observations
Forecasting for an annual income of 1.5 million Ft
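The seminar steps can be rehearsed on simulated data before working with the real file. The sketch below does not use the FOGYJOV data: the sample is generated with assumed coefficients (α = 0.2, β = 0.7) and income measured in million Ft, and the normal quantile is used as a large-sample approximation to the t quantile with 898 degrees of freedom.

```python
import math
import random
from statistics import NormalDist

random.seed(9)
n = 900
# Simulated stand-in for the seminar data (assumed coefficients and scale)
inc = [random.uniform(0.5, 3.0) for _ in range(n)]             # income, million Ft
cons = [0.2 + 0.7 * v + random.gauss(0, 0.15) for v in inc]    # consumption, million Ft

inc_bar = sum(inc) / n
cons_bar = sum(cons) / n
s_xx = sum((v - inc_bar) ** 2 for v in inc)
s_xy = sum((v - inc_bar) * (c - cons_bar) for v, c in zip(inc, cons))
beta_hat = s_xy / s_xx
alpha_hat = cons_bar - beta_hat * inc_bar
rss = sum((c - alpha_hat - beta_hat * v) ** 2 for v, c in zip(inc, cons))
se_beta = math.sqrt(rss / (n - 2) / s_xx)

mpc = beta_hat                        # marginal propensity to consume
apc_at_mean = cons_bar / inc_bar      # average propensity to consume at the sample means
t_beta_1 = (beta_hat - 1.0) / se_beta # t-statistic for H0: beta = 1

def conf_int(level):
    """Confidence interval for beta using the normal approximation."""
    q = NormalDist().inv_cdf(0.5 + level / 2)
    return beta_hat - q * se_beta, beta_hat + q * se_beta

ci95, ci99 = conf_int(0.95), conf_int(0.99)
forecast = alpha_hat + beta_hat * 1.5 # forecast consumption at 1.5 million Ft income

print(f"MPC = {mpc:.3f}, APC at means = {apc_at_mean:.3f}, t(beta = 1) = {t_beta_1:.1f}")
print(f"95% CI = ({ci95[0]:.3f}, {ci95[1]:.3f}), 99% CI = ({ci99[0]:.3f}, {ci99[1]:.3f})")
print(f"forecast at INC = 1.5: {forecast:.3f} million Ft")
```

With a positive intercept the average propensity exceeds the marginal propensity. For the subsample of 30 observations the normal approximation is no longer adequate and the t quantile with 28 degrees of freedom should be used.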