ECONOMETRICS
Sponsored by a Grant TÁMOP-4.1.2-08/2/A/KMR-2009-0041 Course Material Developed by Department of Economics,
Faculty of Social Sciences, Eötvös Loránd University Budapest (ELTE) Department of Economics, Eötvös Loránd University Budapest
Institute of Economics, Hungarian Academy of Sciences Balassi Kiadó, Budapest
Authors: Péter Elek, Anikó Bíró Supervised by Péter Elek
June 2010
Week 6
Multivariate regression III.
Content
F-test (cont.)
Stability tests
Adjusted R², model selection
Dummy variables
F-test more generally
Joint test of r constraints in a regression with k explanatory variables
Nested hypotheses: the parameter set of the restricted model is a subset of that of the unrestricted (original) model
Example
U (unrestricted): y = α + β1x1 + β2x2 + β3x3 + u
H0: β2 = 0, β3 = 0
R (restricted): y = α + β1x1 + v
F-test, test statistic
Testing a linear function of the parameters
Example: Cobb-Douglas production function
logX = α + β1logL + β2logK + u
H0: β1 + β2 = 1 (constant returns to scale)
t-test after reparameterization: let θ = β1 + β2, so that β2 = θ – β1 and
logX = α + β1(logL – logK) + θlogK + u
H0: θ = 1
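The reparameterization above can be checked numerically. The following is a minimal sketch on simulated data (the data, the coefficient values and the small `ols` helper are assumptions made for illustration, not part of the course material): the coefficient of logK in the transformed regression equals the sum of the two original coefficients, so an ordinary t-test on it tests H0: θ = 1.

```python
import random


def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y,
    solved by Gauss-Jordan elimination with partial pivoting."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        pivot = A[c][c]
        A[c] = [v / pivot for v in A[c]]
        for r in range(k):
            if r != c:
                factor = A[r][c]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[c])]
    return [A[i][k] for i in range(k)]


# Simulated (hypothetical) data: logX = 0.5 + 0.6 logL + 0.3 logK + noise
rng = random.Random(0)
log_lab = [rng.uniform(0, 5) for _ in range(50)]
log_cap = [rng.uniform(0, 5) for _ in range(50)]
log_out = [0.5 + 0.6 * lab + 0.3 * cap + rng.gauss(0, 0.1)
           for lab, cap in zip(log_lab, log_cap)]

# Original parameterization: regressors 1, logL, logK
a, b1, b2 = ols([[1.0, lab, cap] for lab, cap in zip(log_lab, log_cap)], log_out)

# Reparameterized model: regressors 1, (logL - logK), logK
a_r, b1_r, theta = ols([[1.0, lab - cap, cap]
                        for lab, cap in zip(log_lab, log_cap)], log_out)

print(f"b1 + b2 = {b1 + b2:.6f}, theta = {theta:.6f}")
```

The two regressions have the same fit; only the meaning of the coefficient of logK changes, which is why the standard error reported for it can be used directly in the t-test.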
Sum of squares decomposition
$$ TSS = RESS + RRSS = RESS + (RRSS - URSS) + URSS $$
(RESS: explained sum of squares of the restricted model; RRSS and URSS: residual sum of squares of the restricted and of the unrestricted model)
Degrees of freedom
$$ n - 1 = (k - r) + (n - k + r - 1) = (k - r) + r + (n - k - 1) $$
Test statistic (under H0)
$$ F = \frac{(RRSS - URSS)/r}{URSS/(n - k - 1)} \sim F_{r,\,n-k-1} $$
which in large samples is approximately distributed as $\chi^2_r / r$.
Since $RRSS = S_{yy}(1 - R_R^2)$ and $URSS = S_{yy}(1 - R_U^2)$, equivalently
$$ F = \frac{(R_U^2 - R_R^2)/r}{(1 - R_U^2)/(n - k - 1)} \sim F_{r,\,n-k-1} $$
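The two forms of the F statistic (based on residual sums of squares and based on R²) can be compared in a short sketch. The numbers below are hypothetical and only serve to show that the two formulas give the same value when RRSS = S_yy(1 – R_R²) and URSS = S_yy(1 – R_U²).

```python
def f_from_rss(rrss, urss, r, n, k):
    """F = [(RRSS - URSS) / r] / [URSS / (n - k - 1)]."""
    return ((rrss - urss) / r) / (urss / (n - k - 1))


def f_from_r2(r2_u, r2_r, r, n, k):
    """F = [(R_U^2 - R_R^2) / r] / [(1 - R_U^2) / (n - k - 1)]."""
    return ((r2_u - r2_r) / r) / ((1.0 - r2_u) / (n - k - 1))


# Hypothetical example: k = 3 regressors, r = 2 restrictions, n = 104
s_yy = 200.0               # total sum of squares of y
r2_u, r2_r = 0.40, 0.30    # R^2 of the unrestricted / restricted model
n, k, r = 104, 3, 2

rrss = s_yy * (1.0 - r2_r)  # 140
urss = s_yy * (1.0 - r2_u)  # 120

f_rss = f_from_rss(rrss, urss, r, n, k)
f_r2 = f_from_r2(r2_u, r2_r, r, n, k)
print(f"F from RSS: {f_rss:.4f}, F from R2: {f_r2:.4f}")
```

The value is compared with the critical value of the F distribution with r and n – k – 1 degrees of freedom.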
Stability test: two independent data sets (sometimes referred to as Chow-test)
1. yi = α1 + β11x1i + β21x2i +…+ βk1xki + ui, i = 1…n1
2. yi = α2 + β12x1i + β22x2i +…+ βk2xki + vi, i = 1…n2
H0: α1 = α2, β11 = β12, …, βk1 = βk2
RRSS: from the merged data set, RSS1, RSS2: from separate regressions
Stability test – Chow-test (predictive)
1. yi = α1 + β11x1i + β21x2i +…+ βk1xki + ui, i = 1…n1
2. yi = α2 + β12x1i + β22x2i +…+ βk2xki + vi, i = 1…n
n – n1 < k + 1 is possible (in contrast to the previous one)
RSS1: res. sum of squares based on the first n1 observations
RRSS: res. sum of squares based on the model estimated from all (n = n1 + n2) observations
Test statistic of the stability test with two independent data sets:
$$ F = \frac{(RRSS - RSS_1 - RSS_2)/(k + 1)}{(RSS_1 + RSS_2)/(n_1 + n_2 - 2k - 2)} \sim F_{k+1,\; n_1+n_2-2k-2} $$
Test statistic of the predictive Chow test:
$$ F = \frac{(RRSS - RSS_1)/(n - n_1)}{RSS_1/(n_1 - k - 1)} \sim F_{n-n_1,\; n_1-k-1} $$
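Both stability test statistics are simple functions of residual sums of squares, as the following sketch shows. The function names and all numerical inputs are hypothetical; in practice the RSS values come from the pooled and the separate regressions.

```python
def chow_f(rrss, rss1, rss2, n1, n2, k):
    """Stability test with two data sets.
    Returns the F statistic and its degrees of freedom."""
    df_num = k + 1
    df_den = n1 + n2 - 2 * k - 2
    f = ((rrss - rss1 - rss2) / df_num) / ((rss1 + rss2) / df_den)
    return f, (df_num, df_den)


def predictive_chow_f(rrss, rss1, n, n1, k):
    """Predictive Chow test: model estimated on the first n1 and on all n
    observations. Usable even if n - n1 < k + 1."""
    df_num = n - n1
    df_den = n1 - k - 1
    f = ((rrss - rss1) / df_num) / (rss1 / df_den)
    return f, (df_num, df_den)


# Hypothetical residual sums of squares, k = 2 explanatory variables
f_chow, df_chow = chow_f(rrss=150.0, rss1=60.0, rss2=70.0, n1=40, n2=50, k=2)
f_pred, df_pred = predictive_chow_f(rrss=75.0, rss1=60.0, n=45, n1=40, k=2)

print(f"Chow: F = {f_chow:.4f} with df {df_chow}")
print(f"Predictive Chow: F = {f_pred:.4f} with df {df_pred}")
```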
Adjusted R²
Including new variables: RSS and the degrees of freedom both decrease (the number of normal equations increases)
Estimated error variance:
$$ \hat{\sigma}^2 = \frac{RSS}{df}, \qquad df = n - k - 1 $$
Adjusted R²:
$$ 1 - \bar{R}^2 = \frac{n - 1}{n - k - 1}\,(1 - R^2) $$
|t| < 1: omitting the variable increases $\bar{R}^2$
F < 1: omitting the group of variables increases $\bar{R}^2$
Possible: different conclusions based on t and F (e.g. multicollinearity)
Model selection
Nested hypotheses: t- and F-test
Non-nested hypotheses with the same dependent variable, e.g.:
R&D = α + β log(revenue) + u
R&D = α + β1 revenue + β2 revenue² + u
Selection based on adjusted R² or on information criteria (e.g. AIC)
AIC (Akaike information criterion), to be minimized:
$$ AIC = RSS \cdot \exp\!\left(\frac{2(k + 1)}{n}\right) $$
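A small sketch of the two selection criteria, using the adjusted R² formula and the RSS·exp(2(k + 1)/n) form of the AIC given above. The two candidate models and their residual sums of squares are hypothetical.

```python
import math


def adjusted_r2(r2, n, k):
    """Adjusted R^2 from 1 - Rbar^2 = (n - 1) / (n - k - 1) * (1 - R^2)."""
    return 1.0 - (n - 1) / (n - k - 1) * (1.0 - r2)


def aic(rss, n, k):
    """AIC in the form RSS * exp(2 (k + 1) / n); smaller is better."""
    return rss * math.exp(2.0 * (k + 1) / n)


# Hypothetical comparison: same dependent variable, n = 100, TSS = 100
n, tss = 100, 100.0
rss_a, k_a = 50.0, 1   # model A: one explanatory variable
rss_b, k_b = 49.8, 2   # model B: two explanatory variables, slightly lower RSS

adj_a = adjusted_r2(1.0 - rss_a / tss, n, k_a)
adj_b = adjusted_r2(1.0 - rss_b / tss, n, k_b)
aic_a = aic(rss_a, n, k_a)
aic_b = aic(rss_b, n, k_b)

print(f"Model A: adjusted R2 = {adj_a:.4f}, AIC = {aic_a:.3f}")
print(f"Model B: adjusted R2 = {adj_b:.4f}, AIC = {aic_b:.3f}")
```

In this example the extra variable lowers RSS only marginally, so both criteria prefer the smaller model even though its unadjusted R² is lower.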
Adjusted R², example
Wage survey (2003): does the experience or the age explain more in the wage equation?
Logarithmic forms
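Log-log (loglinear) form: β is the elasticity
$$ \ln(y) = \alpha + \beta \ln(x) + u, \qquad \hat{\beta} \approx \frac{\%\Delta \hat{y}}{\%\Delta x} $$
Partly logarithmic forms
Log-lin:
$$ \ln(y) = \alpha + \beta x + u, \qquad \%\Delta \hat{y} \approx 100\,\hat{\beta}\,\Delta x, \qquad \text{exactly: } \%\Delta \hat{y} = 100\,(e^{\hat{\beta}\Delta x} - 1) $$
Lin-log:
$$ y = \alpha + \beta \ln(x) + u, \qquad \Delta \hat{y} \approx \frac{\hat{\beta}}{100}\,\%\Delta x $$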
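The percentage interpretations of the logarithmic forms are approximations that work well for small changes. The sketch below compares them with the exact changes for a hypothetical coefficient β = 0.8 (the coefficient and the function names are assumptions made for illustration).

```python
import math


def loglog_exact_pct_change(beta, pct_dx):
    """Exact % change of y in ln(y) = a + b ln(x) when x changes by pct_dx %."""
    return 100.0 * ((1.0 + pct_dx / 100.0) ** beta - 1.0)


def linlog_exact_change(beta, pct_dx):
    """Exact change of y in y = a + b ln(x) when x changes by pct_dx %."""
    return beta * math.log(1.0 + pct_dx / 100.0)


beta = 0.8  # hypothetical estimated coefficient
for pct in (1.0, 10.0, 50.0):
    print(f"x +{pct:.0f}%: "
          f"log-log approx {beta * pct:.2f}%, "
          f"exact {loglog_exact_pct_change(beta, pct):.2f}%; "
          f"lin-log approx {beta / 100.0 * pct:.4f}, "
          f"exact {linlog_exact_change(beta, pct):.4f}")
```

For a 1% change in x the approximate and the exact values are practically identical; for large changes the exact formulas should be used.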
Quadratic form
Increasing or decreasing partial effect
Example: wage survey (2003), quadratic function of experience, estimated equations:
log(Ker) = 9.83 + 0.135 ISKVEG9 + 0.0082 EXP
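log(Ker) = 9.83 + 0.135 ISKVEG9 + 0.022 EXP – 0.00029 EXP²
Positive (but decreasing) partial effect of experience up to the turning point 0.022/(2·0.00029) ≈ 38 years (reported as 39 years, presumably computed from the unrounded coefficients)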
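The partial effect and the turning point of the quadratic experience profile can be computed directly from the estimated coefficients. The sketch uses the rounded coefficients printed in the estimated equation, so the turning point may differ slightly from a value based on unrounded estimates; the function names are chosen for this illustration.

```python
def partial_effect(b1, b2, x):
    """Approximate partial effect of x in y = a + b1 x + b2 x^2: b1 + 2 b2 x."""
    return b1 + 2.0 * b2 * x


def turning_point(b1, b2):
    """Value of x at which the partial effect changes sign: -b1 / (2 b2)."""
    return -b1 / (2.0 * b2)


# Rounded coefficients of EXP and EXP^2 from the estimated wage equation
b_exp, b_exp2 = 0.022, -0.00029

tp = turning_point(b_exp, b_exp2)
print(f"Turning point: {tp:.2f} years of experience")
for years in (10, 30, 45):
    print(f"Partial effect at {years} years: {partial_effect(b_exp, b_exp2, years):.4f}")
```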
Interactions
Partial effect depends on other explanatory variables as well:
Example: wage and education premium depend on the profitability of the firm (net sales revenue – material costs)
Log(wage) = 10.304 + 0.139 Educ9 + 0.092 Log(Profit)
Log(wage) = 10.597 + 0.079 Educ9 + 0.043 Log(Profit) + 0.010 (Educ9*Log(Profit))
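Quadratic form:
$$ y = \alpha + \beta_1 x_1 + \beta_2 x_1^2 + u, \qquad \frac{\Delta \hat{y}}{\Delta x_1} \approx \hat{\beta}_1 + 2\hat{\beta}_2 x_1 $$
Interaction:
$$ y = \alpha + \beta_1 x_1 + \beta_2 x_1 x_2 + u, \qquad \frac{\Delta \hat{y}}{\Delta x_1} = \hat{\beta}_1 + \hat{\beta}_2 x_2 $$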
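With an interaction term the education premium is no longer a single number. The sketch below evaluates it at a few values of Log(Profit), using the coefficients of the estimated interaction equation; the chosen values of Log(Profit) are hypothetical, since the units of Profit are not given here.

```python
def educ_premium(log_profit, b_educ=0.079, b_inter=0.010):
    """Partial effect of Educ9 on Log(wage): b_educ + b_inter * Log(Profit)."""
    return b_educ + b_inter * log_profit


# Hypothetical values of Log(Profit)
for log_profit in (2.0, 6.0, 10.0):
    print(f"Log(Profit) = {log_profit:4.1f}: "
          f"education premium = {educ_premium(log_profit):.3f}")
```

At Log(Profit) = 6 the premium equals 0.139, the coefficient of Educ9 in the equation without the interaction term.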
Dummy variables on the right hand side
So far: mainly continuous variables (quantitative information) – e.g. wage, consumption, wealth, education (?)
Binary / dummy variables: qualitative information
Examples: gender, employed, country dummy…
Different intercepts – 2 groups
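Example:
$$ \log(wage_i) = \begin{cases} \alpha_1 + \beta\,Educ_i + u_i & \text{if from Budapest} \\ \alpha_2 + \beta\,Educ_i + u_i & \text{otherwise} \end{cases} $$
Equivalently, with a dummy variable:
$$ \log(wage_i) = \alpha_1 + (\alpha_2 - \alpha_1) D_i + \beta\,Educ_i + u_i, \qquad D_i = \begin{cases} 0 & \text{if from Budapest} \\ 1 & \text{otherwise} \end{cases} $$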
Different intercepts, example
Based on the 2003 wage survey
Log dependent variable
Estimated equation:
$$ \log(Wage_i) = 10.93 - 0.16\,Countryside_i + 0.15\,Educ_i $$
Countryside: lower wage by approx. 16% (ceteris paribus)
Exact difference ("log" is the natural logarithm in EViews):
$$ \log(Wage_1) - \log(Wage_0) = -0.16 \;\Rightarrow\; \%\ \text{wage difference} = 100 \cdot \frac{Wage_1 - Wage_0}{Wage_0} = 100\,(e^{-0.16} - 1) = -14.79 $$
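The exact percentage difference implied by a dummy coefficient in a log-wage equation can be reproduced in a few lines. The function name is chosen for this illustration; the coefficient –0.16 is the estimated countryside effect from the example.

```python
import math


def exact_pct_difference(coef):
    """Exact % difference in the level of y implied by a dummy coefficient
    in a regression with log(y) as dependent variable: 100 (e^coef - 1)."""
    return 100.0 * (math.exp(coef) - 1.0)


coef_countryside = -0.16
approx = 100.0 * coef_countryside
exact = exact_pct_difference(coef_countryside)
print(f"Approximate difference: {approx:.2f}%, exact difference: {exact:.2f}%")
```

The approximation 100·coefficient overstates a negative effect and understates a positive one, and the gap grows with the size of the coefficient.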
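More than two groups
N groups (e.g. regions instead of Budapest / countryside)
$$ y_i = \begin{cases} \alpha_1 + \beta x_i + u_i & \text{in Group 1} \\ \alpha_2 + \beta x_i + u_i & \text{in Group 2} \\ \quad\vdots \\ \alpha_N + \beta x_i + u_i & \text{in Group } N \end{cases} $$
Equivalently:
$$ y_i = \alpha_1 + (\alpha_2 - \alpha_1) D_{2i} + \dots + (\alpha_N - \alpha_1) D_{Ni} + \beta x_i + u_i $$
$$ D_{2i} = 1 \text{ in Group 2 (0 otherwise)}, \;\dots,\; D_{Ni} = 1 \text{ in Group } N \text{ (0 otherwise)} $$
N – 1 dummies in the regression (if there are N groups); the group without a dummy (here Group 1) is the benchmark group!
Interactions between binary variables
Example: male / female wage gap is not the same in Budapest and in the countryside
Four categories: females in Budapest, males in Budapest, females in the countryside, males in the countryside
Benchmark group: females in Budapest
Two equivalent models:
(1) a separate dummy for each non-benchmark category,
$$ \log(wage_i) = \beta_0 + \beta_1\,Countryside\_Fem_i + \beta_2\,Bp\_Male_i + \beta_3\,Countryside\_Male_i + u_i $$
(2) the Countryside and the Male dummies together with their interaction Countryside·Male.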
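For the N-group case, the dummy coding with N – 1 dummies can be sketched as follows. The helper `make_dummies` and the region labels are hypothetical; the point is only that the benchmark group gets no dummy of its own, which avoids perfect collinearity with the constant.

```python
def make_dummies(groups, benchmark):
    """Return one 0/1 dummy per group except the benchmark group."""
    levels = sorted(set(groups) - {benchmark})
    return {level: [1 if g == level else 0 for g in groups] for level in levels}


# Hypothetical region labels for five observations
regions = ["Budapest", "West", "East", "Budapest", "East"]
dummies = make_dummies(regions, benchmark="Budapest")

for name, column in dummies.items():
    print(name, column)
```

Observations of the benchmark group have zeros in every dummy column, so the intercept of the regression is the intercept of the benchmark group.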
Interactions, example
Wage survey estimates (benchmark: females in Bp.):
–0.1026 ≈ –0.1726 + 0.0540 + 0.0165
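The equivalence of the two dummy specifications can be illustrated with cell means. The sketch assumes a regression with no explanatory variables other than the dummies, in which case the OLS fitted values are the category means; the log wage numbers are made up for illustration and are not survey data.

```python
from statistics import mean

# Hypothetical log wages by category (not survey data)
data = {
    ("Budapest", "female"): [11.9, 12.1, 12.0],
    ("Budapest", "male"): [12.0, 12.2, 12.1],
    ("countryside", "female"): [11.7, 11.9, 11.8],
    ("countryside", "male"): [11.9, 12.0, 11.95],
}
m = {category: mean(values) for category, values in data.items()}
bench = m[("Budapest", "female")]  # benchmark: females in Budapest

# Model 1: a separate dummy for each non-benchmark category
b_countryside_fem = m[("countryside", "female")] - bench
b_bp_male = m[("Budapest", "male")] - bench
b_countryside_male = m[("countryside", "male")] - bench

# Model 2: Countryside and Male dummies plus their interaction
g_countryside = m[("countryside", "female")] - bench
g_male = m[("Budapest", "male")] - bench
g_interaction = (m[("countryside", "male")] - m[("countryside", "female")]
                 - m[("Budapest", "male")] + bench)

print(f"Countryside male dummy (model 1): {b_countryside_male:.4f}")
print(f"Sum of main effects and interaction (model 2): "
      f"{g_countryside + g_male + g_interaction:.4f}")
```

With additional explanatory variables the coefficients are no longer simple differences of means, but the same identity between the two sets of coefficients holds.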
Non-constant slope parameters
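In case of two groups:
$$ y_i = \begin{cases} \alpha + \beta_{11} x_{1i} + \beta_2 x_{2i} + u_i & \text{for Group 1} \\ \alpha + \beta_{12} x_{1i} + \beta_2 x_{2i} + u_i & \text{for Group 2} \end{cases} $$
With a dummy variable $D_i$ (1 for Group 2, 0 for Group 1):
$$ y_i = \alpha + \beta_{11} x_{1i} + (\beta_{12} - \beta_{11}) D_i x_{1i} + \beta_2 x_{2i} + u_i $$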
Example
Effect of education gender-dependent but the effect of age is not
Examining the stability of the coefficients
Example: is the wage model the same for males and females?
Cross sectional analysis (time series: stability of the coefficient in time)
F-test (also possible for a subset of the restrictions)
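Model for the example with a gender-dependent effect of education:
$$ \log(Wage_i) = \beta_0 + \beta_1\,Educ_i + \beta_2\,Educ_i \cdot Male_i + \beta_3\,Age_i + u_i $$
Model for the stability test (every coefficient may differ by gender):
$$ \log(Wage_i) = \beta_0 + \delta_0\,Male_i + \beta_1\,Educ_i + \delta_1\,Educ_i \cdot Male_i + \beta_2\,Age_i + \delta_2\,Age_i \cdot Male_i + u_i $$
$$ H_0: \delta_0 = 0,\ \delta_1 = 0,\ \delta_2 = 0 $$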
Estimation results and test statistic
Dummy dependent variable
Examples:
Labour market: employed or not
Consumption: real estate owner or not
Finance: bankruptcy of the borrower or not
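Binary dependent variable ↔ linear model?
General form: P(y = 1 | x) = F(α + βx). If F is the standard normal (Gaussian) distribution function, the probit model is obtained; if F(z) = e^z/(1 + e^z), the logit model; F(z) = z gives the linear probability model.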
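The two distribution functions can be written with the Python standard library alone. In this sketch the function names are chosen for illustration; it shows that both transformations keep the probability strictly between 0 and 1 for any value of the linear index, unlike the linear model.

```python
import math


def probit_cdf(z):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def logit_cdf(z):
    """Logistic distribution function e^z / (1 + e^z), written as 1 / (1 + e^-z)."""
    return 1.0 / (1.0 + math.exp(-z))


for z in (-5.0, -1.0, 0.0, 1.0, 5.0):
    print(f"z = {z:5.1f}: linear = {z:5.1f}, "
          f"probit = {probit_cdf(z):.4f}, logit = {logit_cdf(z):.4f}")
```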
Linear probability model II.
Problem 1: estimated probability may lie outside the [0,1] interval
Linear probability model III.
Problem 2: heteroscedasticity
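$$ y_i = \alpha + \beta x_i + u_i, \qquad E(u_i \mid x_i) = 0 $$
$$ u_i = \begin{cases} 1 - \alpha - \beta x_i & \text{with prob. } \alpha + \beta x_i \\ -(\alpha + \beta x_i) & \text{with prob. } 1 - (\alpha + \beta x_i) \end{cases} $$
so the error variance depends on $x_i$: $Var(u_i \mid x_i) = (\alpha + \beta x_i)(1 - \alpha - \beta x_i)$.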
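Both problems of the linear probability model show up in a small numerical sketch. The intercept and slope below are hypothetical values, not estimates from the course data: the implied probability leaves the [0, 1] interval for large x, and the error variance p(1 – p) changes with x.

```python
def lpm_prob(alpha, beta, x):
    """Probability implied by the linear probability model: alpha + beta * x."""
    return alpha + beta * x


def lpm_error_variance(p):
    """Variance of the error term when P(y = 1 | x) = p: p (1 - p)."""
    return p * (1.0 - p)


alpha, beta = 0.1, 0.2  # hypothetical coefficients
for x in (0.0, 1.0, 2.0, 3.0, 5.0):
    p = lpm_prob(alpha, beta, x)
    if 0.0 < p < 1.0:
        print(f"x = {x}: p = {p:.2f}, Var(u|x) = {lpm_error_variance(p):.4f}")
    else:
        print(f"x = {x}: p = {p:.2f} lies outside the unit interval")
```

Weighted least squares uses weights proportional to 1/(p(1 – p)), which is only possible for observations whose predicted probability lies strictly between 0 and 1.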
Linear probability model, example
Dependent variable: whether the person has a private health insurance or not (SHARE database)
Explanatory variables: wealth, income, age, education, country dummies
Histogram of predicted probabilities (figure; horizontal axis: INSF, vertical axis: Frequency)
Seminar
Multivariate regression III.
Exercise: estimation of a wage equation based
Male (male dummy)
Estimation of the wage equation I.
Model 1: modelling log(wage) in the private sector with the educ, exp, exp2, bp, male variables and with the interaction of educ, exp with male
Does the equation for males differ significantly from the equation for females?
Joint test of Male, Educ*Male and Exp*Male
Experience profile for Budapest males with 12 years of education
Where is the maximum?
Graphical presentation of the experience-profile with confidence interval
Estimation of the wage equation II.
Model 2: previous model + dummies for “chief town of the county” and for “other town”
Testing the equality of the two new coefficients with three methods:
Directly
By a t-test after transformation
By comparing the R² of the restricted and unrestricted model
Testing heteroscedasticity with the White and Breusch–Pagan tests
Exercise: estimation of labour market participation with a linear probability model
Dependent variable: economically active or not
Explanatory variables: education, experience, age, kid below / over 6 years
OLS estimation with usual and with robust standard errors, WLS estimation
Forecasting probabilities