ECONOMETRICS
Sponsored by a Grant TÁMOP-4.1.2-08/2/A/KMR-2009-0041
Course Material Developed by Department of Economics, Faculty of Social Sciences, Eötvös Loránd University Budapest (ELTE)
Department of Economics, Eötvös Loránd University Budapest
Institute of Economics, Hungarian Academy of Sciences
Balassi Kiadó, Budapest
Authors: Péter Elek, Anikó Bíró
Supervised by: Péter Elek
June 2010
ELTE Faculty of Social Sciences, Department of Economics
Week 3.
Simple regression II.
Péter Elek, Anikó Bíró
Plan
Estimation of standard deviation
Hypothesis testing, confidence interval
Forecasting
Outliers, alternative functional forms
Reminder I
$y_i = \alpha + \beta x_i + u_i$
Assumptions:
1. $E(u_i) = 0$
2. $\mathrm{Var}(u_i) = \sigma^2$ for all $i$
3. $u_i$, $u_j$ independent for all $i \neq j$
4. $x_i$, $u_j$ independent for all $i$, $j$
5. $u_i$ normally distributed for all $i$: $N(0, \sigma^2)$
Reminder II
$y_i = \alpha + \beta x_i + u_i$
Estimation
Method of moments
OLS
Maximum likelihood
Unbiased estimator – normality and homoscedasticity not needed!
$\hat{\beta} = S_{xy} / S_{xx}$
Spurious regression
“Regression to the mean” for normally distributed variables with the same standard deviation:
$E(Y \mid X = x) - m_y = \rho (x - m_x)$, $\rho < 1$
Coefficient of the regression: $\hat{\beta} = S_{xy} / S_{xx}$
Statistical consequence: coefficient less than 1!
Examples: height of parents and children, scores of first and second exams
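A minimal simulation sketch of regression to the mean, not part of the original slides. It assumes standard normal X and Y with an illustrative correlation of 0.6; the sample size and seed are arbitrary choices. The OLS slope $S_{xy}/S_{xx}$ should come out close to the correlation, hence below 1.

```python
import random

random.seed(0)
rho, n = 0.6, 20000  # assumed correlation and sample size (illustrative)

# X and Y are both standard normal (same standard deviation), correlation rho
xs = [random.gauss(0, 1) for _ in range(n)]
ys = [rho * x + (1 - rho**2) ** 0.5 * random.gauss(0, 1) for x in xs]

x_bar = sum(xs) / n
y_bar = sum(ys) / n
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

beta_hat = s_xy / s_xx  # OLS slope, expected to be near rho
print("estimated slope:", round(beta_hat, 3))
```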
Sampling distribution of coefficient estimates
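$\hat{\beta} = \frac{S_{xy}}{S_{xx}} = \frac{\sum_i (x_i - \bar{x}) y_i}{S_{xx}}$
$\mathrm{Var}(\hat{\beta}) = \mathrm{Var}(S_{xy} / S_{xx}) = \frac{\sum_i (x_i - \bar{x})^2 \, \mathrm{Var}(y_i)}{S_{xx}^2} = \frac{\sigma^2 S_{xx}}{S_{xx}^2} = \frac{\sigma^2}{S_{xx}}$
$\hat{\beta} \sim N(\beta, \sigma^2 / S_{xx})$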
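The sampling distribution of the slope can be checked by a small Monte Carlo experiment. The sketch below is illustrative only: the true parameters, the fixed regressor values 1, ..., 20 and the number of replications are assumptions. It compares the simulated mean and variance of $\hat{\beta}$ with $\beta$ and $\sigma^2 / S_{xx}$.

```python
import random

random.seed(1)
alpha, beta, sigma = 1.0, 2.0, 3.0      # assumed true parameters
xs = [float(i) for i in range(1, 21)]   # fixed regressor values (assumed)
n = len(xs)
x_bar = sum(xs) / n
s_xx = sum((x - x_bar) ** 2 for x in xs)

def beta_hat_once():
    """Draw one sample of y and return the OLS slope S_xy / S_xx."""
    ys = [alpha + beta * x + random.gauss(0, sigma) for x in xs]
    s_xy = sum((x - x_bar) * y for x, y in zip(xs, ys))
    return s_xy / s_xx

draws = [beta_hat_once() for _ in range(5000)]
mean_b = sum(draws) / len(draws)
var_b = sum((b - mean_b) ** 2 for b in draws) / (len(draws) - 1)
theory_var = sigma**2 / s_xx

print("mean of slope estimates:", round(mean_b, 4))
print("simulated variance:", round(var_b, 5), "theory:", round(theory_var, 5))
```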
Estimation of variance
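$\hat{u}_i = y_i - \hat{\alpha} - \hat{\beta} x_i$
$\sum_i \hat{u}_i = 0$, $\sum_i x_i \hat{u}_i = 0$ (from normal equations)
$RSS = Q = \sum_i \hat{u}_i^2$ (Q: the minimized sum of squares)
$RSS / \sigma^2 \sim \chi^2_{n-2}$
$\hat{\sigma}^2 = RSS / (n - 2)$, $E(\hat{\sigma}^2) = \sigma^2$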
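A short sketch of the variance estimate on simulated data. The data-generating values (intercept 1, slope 0.5, $\sigma = 2$, n = 50) are assumptions for illustration. It checks that the residuals satisfy the two normal equations and computes $\hat{\sigma}^2 = RSS / (n - 2)$.

```python
import random

random.seed(2)
n = 50
xs = [random.uniform(0, 10) for _ in range(n)]
ys = [1.0 + 0.5 * x + random.gauss(0, 2.0) for x in xs]  # assumed sigma = 2

x_bar, y_bar = sum(xs) / n, sum(ys) / n
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
beta_hat = s_xy / s_xx
alpha_hat = y_bar - beta_hat * x_bar

resid = [y - alpha_hat - beta_hat * x for x, y in zip(xs, ys)]
sum_u = sum(resid)                               # first normal equation: ~0
sum_xu = sum(x * u for x, u in zip(xs, resid))   # second normal equation: ~0
rss = sum(u * u for u in resid)
sigma2_hat = rss / (n - 2)                       # two restrictions, so n - 2

print("sum of residuals:", sum_u)
print("sum of x * residual:", sum_xu)
print("sigma^2 estimate:", round(sigma2_hat, 3))
```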
Chi-squared, t-distribution
$x_1, x_2, \ldots, x_n$ independent variables with standard normal distribution: $\sum_{i=1}^{n} x_i^2 \sim \chi^2_n$
$Z \sim N(0, 1)$ and $y \sim \chi^2_n$ independent: $Z / \sqrt{y / n} \sim t_n$
Estimation of variance – two normal equations: degree of freedom is n – 2!
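The two constructions can be illustrated by simulation. In this sketch the degrees of freedom (10) and the number of replications are arbitrary assumptions. It builds chi-squared draws from squared standard normals and t draws as $Z / \sqrt{y/n}$, then compares the moments with the known values $E(\chi^2_n) = n$ and $\mathrm{Var}(t_n) = n / (n - 2)$.

```python
import random

random.seed(3)
n, reps = 10, 20000  # degrees of freedom and replications (assumed)

def chi2_draw(df):
    """Sum of df squared independent standard normal variables."""
    return sum(random.gauss(0, 1) ** 2 for _ in range(df))

chi2_draws = [chi2_draw(n) for _ in range(reps)]
# Z and the chi-squared variable are drawn independently of each other
t_draws = [random.gauss(0, 1) / (chi2_draw(n) / n) ** 0.5 for _ in range(reps)]

chi2_mean = sum(chi2_draws) / reps
t_var = sum(t * t for t in t_draws) / reps            # the mean of t is 0
tail_share = sum(abs(t) > 1.96 for t in t_draws) / reps

print("chi2 mean:", round(chi2_mean, 3), "theory:", n)
print("t variance:", round(t_var, 3), "theory:", n / (n - 2))
print("share of |t| > 1.96:", round(tail_share, 4))
```

The share of draws beyond 1.96 is noticeably above 5%, which is why t-quantiles rather than normal quantiles are used in small samples.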
Hypothesis testing, confidence interval
Confidence interval, hypothesis testing
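$\frac{\hat{\beta} - \beta}{\sigma / \sqrt{S_{xx}}} \sim N(0, 1)$, $\frac{\hat{\alpha} - \alpha}{\sigma \sqrt{1/n + \bar{x}^2 / S_{xx}}} \sim N(0, 1)$
$(n - 2) \hat{\sigma}^2 / \sigma^2 \sim \chi^2_{n-2}$, independent of $\hat{\alpha}$, $\hat{\beta}$ (no proof)
$\frac{\hat{\beta} - \beta}{\hat{\sigma} / \sqrt{S_{xx}}} \sim t_{n-2}$, $\frac{\hat{\alpha} - \alpha}{\hat{\sigma} \sqrt{1/n + \bar{x}^2 / S_{xx}}} \sim t_{n-2}$
$P\left( -t_{n-2}^{1-\alpha/2} < \frac{\hat{\beta} - \beta}{SE(\hat{\beta})} < t_{n-2}^{1-\alpha/2} \right) = 1 - \alpha$
(in the last line $\alpha$ denotes the significance level, not the intercept)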
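A worked sketch of the t-test and confidence interval for the slope. The ten data points are hypothetical and serve only as an illustration; the critical value 2.306 is the 97.5% quantile of the t-distribution with 8 degrees of freedom, taken from a standard t-table.

```python
import math

# Hypothetical data, for illustration only
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [2.1, 2.9, 4.2, 4.8, 6.1, 6.9, 8.2, 8.8, 10.1, 10.9]
T_CRIT = 2.306  # 97.5% quantile of t with n - 2 = 8 d.f. (from a t-table)

n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
beta_hat = s_xy / s_xx
alpha_hat = y_bar - beta_hat * x_bar

rss = sum((y - alpha_hat - beta_hat * x) ** 2 for x, y in zip(xs, ys))
sigma2_hat = rss / (n - 2)
se_beta = math.sqrt(sigma2_hat / s_xx)

t_stat_zero = beta_hat / se_beta          # tests H0: beta = 0
t_stat_one = (beta_hat - 1.0) / se_beta   # tests H0: beta = 1
ci = (beta_hat - T_CRIT * se_beta, beta_hat + T_CRIT * se_beta)
reject_beta_one = abs(t_stat_one) > T_CRIT

print("slope:", round(beta_hat, 4), "SE:", round(se_beta, 4))
print("95% confidence interval:", tuple(round(c, 3) for c in ci))
print("reject beta = 1 at 5%:", reject_beta_one)
```

Testing a hypothesis at the 5% level is equivalent to checking whether the hypothesized value lies outside the 95% confidence interval.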
Analysis of variance
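Previous slide: $RSS \sim \sigma^2 \chi^2_{n-2}$
If $\beta = 0$, then the $y_i$ are independent $N(\alpha, \sigma^2)$ variables, thus
$TSS \sim \sigma^2 \chi^2_{n-1}$ (Fisher-Bartlett theorem)
$ESS \sim \sigma^2 \chi^2_1$
RSS and ESS independent
$\sum_i (y_i - \bar{y})^2 = \sum_i (y_i - \hat{y}_i)^2 + \sum_i (\hat{y}_i - \bar{y})^2$, that is, TSS = RSS + ESS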
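Analysis of variance (cont.)
β = 0 hypothesis
Source of var. | Sum of squares | D. of freedom | Mean squares
Regr. | $ESS = r^2 S_{yy} = \hat{\beta} S_{xy} = \hat{\beta}^2 S_{xx}$ | 1 | $MS_1 = ESS / 1 \sim \sigma^2 \chi^2_1 / 1$
Residual | $RSS = (1 - r^2) S_{yy} = (n - 2) \hat{\sigma}^2$ | $n - 2$ | $MS_2 = RSS / (n - 2) \sim \sigma^2 \chi^2_{n-2} / (n - 2)$
Total | $S_{yy}$ | $n - 1$ |
$F = MS_1 / MS_2 = (n - 2) r^2 / (1 - r^2) \sim F_{1, n-2}$
$F = \hat{\beta}^2 / (\hat{\sigma}^2 / S_{xx}) \sim t^2_{n-2}$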
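The identities in the analysis of variance table can be verified numerically. The sketch below uses simulated data (the parameters and sample size are assumptions) and checks that TSS = ESS + RSS and that the F-statistic equals both $(n - 2) r^2 / (1 - r^2)$ and the square of the t-statistic of the slope.

```python
import math
import random

random.seed(4)
n = 30
xs = [random.uniform(0, 10) for _ in range(n)]
ys = [2.0 + 0.3 * x + random.gauss(0, 1.0) for x in xs]  # assumed model

x_bar, y_bar = sum(xs) / n, sum(ys) / n
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
beta_hat = s_xy / s_xx
alpha_hat = y_bar - beta_hat * x_bar
fitted = [alpha_hat + beta_hat * x for x in xs]

tss = sum((y - y_bar) ** 2 for y in ys)               # total, S_yy
ess = sum((f - y_bar) ** 2 for f in fitted)           # explained
rss = sum((y - f) ** 2 for y, f in zip(ys, fitted))   # residual
r2 = ess / tss

sigma2_hat = rss / (n - 2)
f_stat = (ess / 1) / (rss / (n - 2))                  # MS1 / MS2
t_stat = beta_hat / math.sqrt(sigma2_hat / s_xx)

print("TSS:", round(tss, 4), "ESS + RSS:", round(ess + rss, 4))
print("F:", round(f_stat, 4), "t^2:", round(t_stat**2, 4))
```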
Forecasting
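$y_0 = \alpha + \beta x_0 + u_0$
$\hat{y}_0 = \hat{\alpha} + \hat{\beta} x_0$
$E(\hat{y}_0 - y_0) = 0$ (unbiased)
$\mathrm{Var}(\hat{y}_0 - y_0) = \mathrm{Var}(\hat{\alpha}) + x_0^2 \mathrm{Var}(\hat{\beta}) + 2 x_0 \mathrm{cov}(\hat{\alpha}, \hat{\beta}) + \mathrm{Var}(u_0) = \sigma^2 \left( 1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}} \right)$
If $x_0 = \bar{x}$ then it is minimal.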
Confidence interval of forecasts
[Figure: plot of the confidence interval of forecasts]
Forecasting expected value
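$E(y_0) = \alpha + \beta x_0$
$\hat{y}_0 = \hat{\alpha} + \hat{\beta} x_0$, $E(\hat{y}_0) = E(y_0)$
$\mathrm{Var}(\hat{y}_0 - E(y_0)) = \mathrm{Var}(\hat{y}_0) = \mathrm{Var}(\hat{\alpha}) + x_0^2 \mathrm{Var}(\hat{\beta}) + 2 x_0 \mathrm{cov}(\hat{\alpha}, \hat{\beta}) = \sigma^2 \left( \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}} \right)$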
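A sketch comparing the two forecast variances on simulated data (the model parameters and the forecast point $x_0 = 12$ are assumptions). It also checks the decomposition into $\mathrm{Var}(\hat{\alpha})$, $\mathrm{Var}(\hat{\beta})$ and the covariance term, using the standard OLS results $\mathrm{Var}(\hat{\alpha}) = \sigma^2 (1/n + \bar{x}^2 / S_{xx})$ and $\mathrm{cov}(\hat{\alpha}, \hat{\beta}) = -\bar{x} \sigma^2 / S_{xx}$, with $\sigma^2$ replaced by its estimate.

```python
import math
import random

random.seed(5)
n = 40
xs = [random.uniform(0, 10) for _ in range(n)]
ys = [1.0 + 0.8 * x + random.gauss(0, 1.5) for x in xs]  # assumed model

x_bar, y_bar = sum(xs) / n, sum(ys) / n
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
beta_hat = s_xy / s_xx
alpha_hat = y_bar - beta_hat * x_bar
rss = sum((y - alpha_hat - beta_hat * x) ** 2 for x, y in zip(xs, ys))
sigma2_hat = rss / (n - 2)

def var_mean_forecast(x0):
    """Estimated variance when forecasting the expected value E(y0)."""
    return sigma2_hat * (1 / n + (x0 - x_bar) ** 2 / s_xx)

def var_individual_forecast(x0):
    """Estimated variance when forecasting y0 itself (adds Var(u0))."""
    return sigma2_hat + var_mean_forecast(x0)

# Decomposition into coefficient variances and their covariance
x0 = 12.0
var_alpha = sigma2_hat * (1 / n + x_bar**2 / s_xx)
var_beta = sigma2_hat / s_xx
cov_ab = -x_bar * sigma2_hat / s_xx
decomposed = var_alpha + x0**2 * var_beta + 2 * x0 * cov_ab

print("SE of expected-value forecast:", round(math.sqrt(var_mean_forecast(x0)), 4))
print("SE of individual forecast:", round(math.sqrt(var_individual_forecast(x0)), 4))
```

Both variances are smallest at $x_0 = \bar{x}$ and grow as the forecast point moves away from the sample mean.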
Outliers
Outlier: lies far from the other observations
Can change the regression line
Reasons and handling:
Data error (omit the data)
Special case (individual analysis)
Same mechanisms, but outlier data (analyze with the other observations)
[Figure: scatter plot of y against Z with fitted regression lines, without the outlier and with the outlier]
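A small sketch of how a single outlier can change the regression line. The data are hypothetical: ten points close to a line with slope 1, plus one added observation far from the rest.

```python
def ols(xs, ys):
    """Return (alpha_hat, beta_hat) of the simple OLS regression of y on x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    beta_hat = s_xy / s_xx
    return y_bar - beta_hat * x_bar, beta_hat

# Hypothetical data, for illustration only
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [2.1, 2.9, 4.2, 4.8, 6.1, 6.9, 8.2, 8.8, 10.1, 10.9]
outlier = (20, 5.0)  # hypothetical observation far from the others

_, slope_without = ols(xs, ys)
_, slope_with = ols(xs + [outlier[0]], ys + [outlier[1]])

print("slope without outlier:", round(slope_without, 3))
print("slope with outlier:", round(slope_with, 3))
```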
Outliers (cont.): same regression lines, but different relationships
[Figure: four scatter plots with the same fitted regression line]
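The four panels appear to show Anscombe's quartet; that identification is an assumption based on the axis ranges, and the slides do not name the data. The sketch below fits the simple regression to each of the four classic data sets and reports the largest absolute residual, which already hints at the different relationships behind nearly identical lines.

```python
# Anscombe's quartet (assumed to be the data behind the four panels)
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
datasets = {
    1: (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    2: (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    3: (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    4: (x4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

def ols(xs, ys):
    """Return (alpha_hat, beta_hat) of the simple OLS regression of y on x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    beta_hat = s_xy / s_xx
    return y_bar - beta_hat * x_bar, beta_hat

fits = {}
for key, (xs, ys) in datasets.items():
    a, b = ols(xs, ys)
    resid = [y - a - b * x for x, y in zip(xs, ys)]
    fits[key] = (a, b, max(abs(u) for u in resid))
    print(key, "intercept:", round(a, 2), "slope:", round(b, 3),
          "max |residual|:", round(fits[key][2], 2))
```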
Outliers (cont.): analysis of the residuals (will be important in multivariate case)
[Figure: four residual plots: U1, U2 and U3 against X1, and U4 against X4]
Alternative functional forms
$y = A e^{\beta x}$, so $\log(y) = \log(A) + \beta x$
Form of error term matters:
$y = A e^{\beta x} e^{u}$, so $\log(y) = \log(A) + \beta x + u$
$E(e^{u}) \neq e^{E(u)} = 1$, thus $E(y) \neq A e^{\beta x}$
Other examples
$y = A x^{\beta}$, so $\log(y) = \log(A) + \beta \log(x)$
$y = A + B/x$ (here only the explanatory variable has to be transformed)
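The point that $E(e^{u}) \neq e^{E(u)}$ can be illustrated numerically. The sketch assumes normally distributed $u$ with mean 0 and an illustrative standard deviation of 0.5; for a normal error term the known result is $E(e^{u}) = e^{\sigma^2 / 2}$, which exceeds 1.

```python
import math
import random

random.seed(6)
sigma, reps = 0.5, 100000  # assumed error standard deviation and replications

us = [random.gauss(0, sigma) for _ in range(reps)]
mean_u = sum(us) / reps
mean_exp_u = sum(math.exp(u) for u in us) / reps
theory = math.exp(sigma**2 / 2)  # E(e^u) for normal u with mean 0

print("exp of mean of u:", round(math.exp(mean_u), 4))   # close to 1
print("mean of exp(u):", round(mean_exp_u, 4))           # clearly above 1
print("theoretical E(e^u):", round(theory, 4))
```

Exponentiating a forecast of $\log(y)$ therefore tends to understate $E(y)$ when the error term is normal.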
Example: relationship between earnings and education
$\log(earn_i) = \alpha + \beta_1 educ_i + u_i$, data: 2003 Wage Tariff
(F-test is the square of t-test in univariate case)
Example (cont.): Forecasting
What earnings do we expect with 15 years of education?
$\hat{y}_0 = \log(earn) = 10.12 + 15 \cdot 0.122 = 11.95$
$earn = e^{11.95} = 154,800$ Ft (in 2003, but it is not unbiased)
$\mathrm{Var}(\hat{y}_0 - y_0) = \mathrm{Var}(u_0) + \mathrm{Var}(\hat{\alpha}) + 15^2 \mathrm{Var}(\hat{\beta}) + 2 \cdot 15 \cdot \mathrm{cov}(\hat{\alpha}, \hat{\beta}) = 0.4937^2$
Assuming normally distributed error terms,
$P(11.95 - 1.96 \cdot 0.4937 < y_0 < 11.95 + 1.96 \cdot 0.4937) = 0.95$
$P(58,800 \text{ Ft} < earn < 407,400 \text{ Ft}) = 0.95$
Uncertainty is relatively large.
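The arithmetic of this forecast can be reproduced in a few lines. The coefficient estimates (10.12 and 0.122), the forecast standard error (0.4937) and the normal quantile (1.96) are the values quoted on the slide; the variable names are illustrative.

```python
import math

alpha_hat, beta_hat = 10.12, 0.122  # estimates quoted on the slide
educ = 15                           # years of education
se_forecast = 0.4937                # forecast standard error quoted on the slide
z = 1.96                            # 97.5% quantile of the standard normal

log_earn = alpha_hat + beta_hat * educ
point = math.exp(log_earn)
lower = math.exp(log_earn - z * se_forecast)
upper = math.exp(log_earn + z * se_forecast)

print("forecast of log(earn):", round(log_earn, 2))
print("point forecast (Ft):", round(point, -2))
print("95% interval (Ft):", round(lower, -2), "to", round(upper, -2))
```

The interval is symmetric on the log scale but strongly asymmetric in forints.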
Normality of error terms: slight deviation from normal distribution
Simple regression, summary
Assumptions
Estimation and its properties (unbiased), interpretation of estimated coefficients
Hypothesis testing
Problem of outliers
Seminar
Simple regression II
Estimating the marginal propensity to consume
FOGYJOV file
CONS = α + β∙INC + u
Sample size: 900
Interpretation of coefficients, calculation of the marginal and average propensity to consume
Interpretation of t-statistic, p-value, R², RSS
Testing the β = 1 hypothesis
95% and 99% confidence intervals for β
Analysis of significance for a subsample of 30 observations
Forecasting for 1.5 million Ft annual income