
Econometrics Applications

In document ECONOMETRICS with MACHINE LEARNING (pages 44-50)

1 Linear Econometric Models with Machine Learning

Table 1.4: Power of a Simple t-test for the Unpenalized Parameter: β₁ = 0.1

ρ      N     OLS     LASSO   adaLASSO   adaLASSO2   Ridge   Elastic Net   SCAD
0      30    12.54   11.60   11.68      11.60       11.56   12.22         12.16
0      50    16.22   15.56   15.86      15.88       15.44   15.90         16.18
0      100   28.48   27.88   28.10      27.92       27.24   28.06         28.08
0      500   87.64   87.40   87.40      87.38       86.30   87.42         87.18
0      1000  99.42   99.44   99.46      99.46       99.16   99.46         99.46
0.45   30    11.18   12.54   12.26      12.40       13.54   12.48         12.00
0.45   50    14.90   17.58   17.78      17.42       19.62   17.40         16.70
0.45   100   23.90   28.60   28.54      28.42       33.94   27.86         28.58
0.45   500   77.30   82.26   81.94      81.96       93.02   81.20         87.60
0.45   1000  97.12   98.20   98.14      98.08       99.88   98.00         99.38
0.9    30    7.80    7.74    7.94       7.78        5.58    7.32          7.68
0.9    50    6.78    7.42    7.44       7.24        6.46    6.96          6.80
0.9    100   7.84    9.10    9.24       9.28        10.14   8.70          8.54
0.9    500   17.56   22.62   22.36      22.38       42.82   20.80         25.50
0.9    1000  31.12   38.32   38.10      38.12       74.44   35.64         48.08

The signal-to-noise ratio, in the form of the uniform signal strength condition, clearly plays an extremely important role in variable selection using shrinkage estimators. Thus, caution and further research in this area seem warranted.

Table 1.5: Percentage of Replications Excluding log x₂

ρ      N     OLS    LASSO   adaLASSO   adaLASSO2   Ridge   Elastic Net   SCAD
0      30    0.00   52.24   52.22      52.26       0.00    46.82         79.58
0      50    0.00   56.44   56.44      56.44       0.00    50.56         86.48
0      100   0.00   64.30   64.30      64.30       0.00    62.98         94.16
0      500   0.00   88.08   88.08      88.08       0.00    89.16         99.98
0      1000  0.00   95.02   95.02      95.02       0.00    96.26         100.00
0.45   30    0.00   50.08   50.18      50.22       0.00    47.90         79.74
0.45   50    0.00   56.00   55.96      56.00       0.00    53.70         85.80
0.45   100   0.00   64.08   64.08      64.08       0.00    63.72         94.34
0.45   500   0.00   89.88   89.88      89.88       0.00    91.28         99.96
0.45   1000  0.00   96.64   96.64      96.64       0.00    96.70         100.00
0.9    30    0.00   47.46   47.68      47.66       0.00    47.06         73.68
0.9    50    0.00   53.06   53.34      53.12       0.00    54.16         79.42
0.9    100   0.00   65.44   65.50      65.42       0.00    64.94         89.76
0.9    500   0.00   93.98   93.98      93.98       0.00    92.38         99.62
0.9    1000  0.00   98.82   98.82      98.82       0.00    97.62         99.94

1.6.1 Distributed Lag Models

The origin of the distributed lag model can be traced back to Tinbergen (1939).

While there have been studies focusing on lag selection in an Autoregressive-Moving Average (ARMA) setting (for example, see Wang et al., 2007; Hsu et al., 2008 and Huang et al., 2008), the application of the partially penalized estimator discussed in Section 1.4.3 has not been considered in this context.

Consider the following DGP

\[
y_i = \mathbf{x}_i'\boldsymbol{\beta} + \sum_{j=1}^{L} \mathbf{x}_{i-j}'\boldsymbol{\alpha}_j + u_i, \tag{1.33}
\]

where L < N. Choosing the appropriate lag order for a particular variable is a challenging task. Perhaps more importantly, as the number of observations increases, the number of potential (lag) variables also increases, which creates additional difficulties in identifying a satisfactory model. If one is only interested in statistical inference on the estimates of β, then the results from Section 1.4.3 may be useful. In this case, one can apply a Partially Penalized Estimator and obtain the parameter

Table 1.6: Percentage of Replications Excluding x₂²

ρ      N     OLS    LASSO   adaLASSO   adaLASSO2   Ridge   Elastic Net   SCAD
0      30    0.00   3.32    3.28       3.30        0.00    2.08          7.72
0      50    0.00   2.08    2.06       2.06        0.00    1.18          5.16
0      100   0.00   1.16    1.16       1.16        0.00    0.68          3.58
0      500   0.00   0.64    0.62       0.62        0.00    0.18          1.88
0      1000  0.00   0.50    0.50       0.50        0.00    0.10          1.88
0.45   30    0.00   4.22    4.22       4.24        0.00    3.40          7.96
0.45   50    0.00   2.40    2.42       2.40        0.00    1.52          5.08
0.45   100   0.00   1.22    1.20       1.20        0.00    0.50          4.02
0.45   500   0.00   0.70    0.70       0.70        0.00    0.12          1.94
0.45   1000  0.00   0.34    0.34       0.34        0.00    0.10          1.28
0.9    30    0.00   14.28   14.30      14.32       0.00    13.60         19.36
0.9    50    0.00   9.60    9.52       9.50        0.00    8.46          14.32
0.9    100   0.00   5.62    5.60       5.62        0.00    5.30          9.52
0.9    500   0.00   2.54    2.54       2.58        0.00    2.00          5.24
0.9    1000  0.00   1.90    1.84       1.88        0.00    1.30          4.24

estimates as follows

\[
(\hat{\boldsymbol{\beta}}, \hat{\boldsymbol{\alpha}}) = \arg\min_{\boldsymbol{\beta},\, \boldsymbol{\alpha}} \sum_{i=1}^{N} \left( y_i - \mathbf{x}_i'\boldsymbol{\beta} - \sum_{j=1}^{L} \mathbf{x}_{i-j}'\boldsymbol{\alpha}_j \right)^2 + \lambda\, p(\boldsymbol{\alpha}), \tag{1.34}
\]

where α = (α₁, ..., α_L) and p(α) is a regularizer applied only to α. Since β̂ is not part of the shrinkage, under the Bridge regularizer and the assumptions in Proposition 1.1, β̂ has an asymptotically normal distribution, which facilitates valid inference on β. Obviously, β does not have to be the vector of coefficients associated with the covariates at the same time period as the response variable; the argument above applies to any coefficients of interest. The main idea is that if a researcher is interested in conducting statistical inference on a particular set of coefficients in the presence of a potentially large number of possible control variables, then, as long as the coefficients of interest are not part of the shrinkage, valid inference on these coefficients may still be possible.
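The partially penalized estimator in Equation (1.34) can be sketched with a small proximal gradient (ISTA) routine in which the soft-thresholding step is applied only to the lag coefficients α, leaving β unpenalized. The solver, the simulated data and the tuning value λ = 5 below are illustrative assumptions, not part of the original text.

```python
import numpy as np

def partially_penalized_lasso(X, Z, y, lam, n_iter=5000, tol=1e-8):
    """Minimize ||y - X b - Z a||^2 + lam * ||a||_1 by proximal gradient,
    shrinking only the coefficients a on Z (the lag terms)."""
    W = np.hstack([X, Z])
    p = X.shape[1]
    theta = np.zeros(W.shape[1])
    # step size from the Lipschitz constant of the gradient, 2 * lambda_max(W'W)
    L = 2 * np.linalg.eigvalsh(W.T @ W).max()
    for _ in range(n_iter):
        new = theta + 2 * W.T @ (y - W @ theta) / L
        # soft-threshold only the penalized block a; b stays unpenalized
        new[p:] = np.sign(new[p:]) * np.maximum(np.abs(new[p:]) - lam / L, 0.0)
        if np.max(np.abs(new - theta)) < tol:
            theta = new
            break
        theta = new
    return theta[:p], theta[p:]

# small simulated distributed-lag example (assumed design)
rng = np.random.default_rng(0)
N, p, L_lags = 200, 2, 5
x = rng.standard_normal((N + L_lags, p))
X = x[L_lags:]                                         # contemporaneous covariates
Z = np.hstack([x[L_lags - j:-j] for j in range(1, L_lags + 1)])  # lagged covariates
beta = np.array([1.0, -0.5])
alpha = np.zeros(p * L_lags)
alpha[0] = 0.8                                         # only the first lag matters
y = X @ beta + Z @ alpha + 0.1 * rng.standard_normal(N)

b_hat, a_hat = partially_penalized_lasso(X, Z, y, lam=5.0)
```

The unpenalized block b_hat stays close to the true β, while the L1 step zeroes out the irrelevant lag coefficients, which is exactly the selection-plus-inference split the text describes.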

1.6.2 Panel Data Models

Following the idea above, another potentially useful application is the panel data model with fixed effects. Consider

\[
y_{it} = \mathbf{x}_{it}'\boldsymbol{\beta} + \alpha_i + u_{it}, \qquad i = 1, \ldots, N, \quad t = 1, \ldots, T. \tag{1.35}
\]

The parameter vector β is typically estimated by the fixed effect estimator

\[
\hat{\boldsymbol{\beta}}_{FE} = \arg\min_{\boldsymbol{\beta}} \sum_{i=1}^{N} \sum_{t=1}^{T} \left( \dot{y}_{it} - \dot{\mathbf{x}}_{it}'\boldsymbol{\beta} \right)^2, \tag{1.36}
\]

where $\dot{y}_{it} = y_{it} - \bar{y}_i$ with $\bar{y}_i = T^{-1}\sum_{t=1}^{T} y_{it}$, and $\dot{\mathbf{x}}_{it} = \mathbf{x}_{it} - \bar{\mathbf{x}}_i$ with $\bar{\mathbf{x}}_i = T^{-1}\sum_{t=1}^{T} \mathbf{x}_{it}$. In practice, the estimator is typically computed using the dummy variable approach.
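As a quick numerical check of the equivalence just mentioned, the within (demeaning) estimator of Equation (1.36) and the dummy variable (LSDV) estimator yield the same β̂; the simulated balanced panel below is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T, p = 50, 6, 3
X = rng.standard_normal((N * T, p))
alpha = rng.standard_normal(N)                  # one fixed effect per unit
beta = np.array([0.5, -1.0, 2.0])
y = X @ beta + np.repeat(alpha, T) + 0.1 * rng.standard_normal(N * T)

# (a) within estimator: demean y and x within each unit, then run OLS (Eq. 1.36)
Xw = (X.reshape(N, T, p) - X.reshape(N, T, p).mean(axis=1, keepdims=True)).reshape(N * T, p)
yw = (y.reshape(N, T) - y.reshape(N, T).mean(axis=1, keepdims=True)).ravel()
beta_within, *_ = np.linalg.lstsq(Xw, yw, rcond=None)

# (b) dummy variable (LSDV) estimator: one dummy column per unit
D = np.kron(np.eye(N), np.ones((T, 1)))         # N*T x N block of unit dummies
coef, *_ = np.linalg.lstsq(np.hstack([X, D]), y, rcond=None)
beta_lsdv = coef[:p]
```

By the Frisch-Waugh-Lovell theorem the two estimates agree to machine precision, while the dummy design already has N extra columns, which is the dimensionality problem the next paragraph raises.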

However, when N is large, the number of fixed effects α_i is also large. Since it is common to have N ≫ T in panel data, the number of dummy variables required for the fixed effect estimator can be unacceptably large. Since the main focus is β, and under the assumption that the α_i are constants for i = 1, ..., N, it also seems possible to apply the methodology from the previous example and consider the following estimator

\[
(\hat{\boldsymbol{\beta}}, \hat{\boldsymbol{\alpha}}) = \arg\min_{\boldsymbol{\beta},\, \boldsymbol{\alpha}} \sum_{i=1}^{N} \sum_{t=1}^{T} \left( y_{it} - \mathbf{x}_{it}'\boldsymbol{\beta} - \alpha_i \right)^2 + \lambda\, p(\boldsymbol{\alpha}), \tag{1.37}
\]

where α = (α₁, ..., α_N) and p(α) denotes a regularizer applied only to the coefficients of the fixed effect dummies. Proposition 1.1 should apply in this case without any modification.
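The estimator in Equation (1.37) can be sketched in the same proximal gradient style as the distributed-lag example: run the regression on the dummy variable design and soft-threshold only the fixed effect coefficients α, never β. The solver and the simulated design with sparse fixed effects are illustrative assumptions.

```python
import numpy as np

def pgd_partial_l1(W, y, p_free, lam, n_iter=5000):
    """Proximal gradient for ||y - W t||^2 + lam * ||t[p_free:]||_1,
    where the first p_free coefficients (beta) are never shrunk."""
    t = np.zeros(W.shape[1])
    step = 1.0 / (2 * np.linalg.eigvalsh(W.T @ W).max())
    for _ in range(n_iter):
        t = t + 2 * step * W.T @ (y - W @ t)
        # soft-threshold only the fixed effect block
        t[p_free:] = np.sign(t[p_free:]) * np.maximum(np.abs(t[p_free:]) - lam * step, 0.0)
    return t

rng = np.random.default_rng(4)
N, T, p = 40, 5, 2
X = rng.standard_normal((N * T, p))
# sparse fixed effects: most alpha_i are exactly zero (assumed DGP)
alpha = np.where(rng.random(N) < 0.3, rng.standard_normal(N), 0.0)
beta = np.array([1.0, -2.0])
y = X @ beta + np.repeat(alpha, T) + 0.1 * rng.standard_normal(N * T)

D = np.kron(np.eye(N), np.ones((T, 1)))          # fixed-effect dummies
theta = pgd_partial_l1(np.hstack([X, D]), y, p_free=p, lam=1.0)
beta_hat, alpha_hat = theta[:p], theta[p:]
```

The penalty prunes dummies with negligible effects while leaving β̂ unshrunk, which is what allows the asymptotic normality argument of Proposition 1.1 to carry over.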

This can be extended to a higher dimensional panel with more than two indexes, in which case the number of dummies required grows exponentially. While it is possible to obtain fixed effect estimators in a higher dimensional panel through the various transformations proposed by Balazsi, Mátyás and Wansbeek (2018), these transformations are not always straightforward to derive, and the dummy variable approach can be more convenient in practice. The dummy variable approach, however, suffers from the curse of dimensionality, and the method proposed here seems to be a feasible way to resolve this issue. Another potential application is to incorporate interactive fixed effects of the form α_it into model (1.35). This is, of course, not possible in a usual two-dimensional panel data setting, but it is feasible with this approach.

Another possible application, which has been proposed in the literature, is to incorporate a regularizer into Equation (1.36) and thereby define a shrinkage estimator in a standard fixed effects framework. Specifically, fixed effects with shrinkage can be defined as

\[
\hat{\boldsymbol{\beta}}_{FE} = \arg\min_{\boldsymbol{\beta}} \sum_{i=1}^{N} \sum_{t=1}^{T} \left( \dot{y}_{it} - \dot{\mathbf{x}}_{it}'\boldsymbol{\beta} \right)^2 + \lambda\, p(\boldsymbol{\beta}), \tag{1.38}
\]

where p(β) denotes the regularizer, which, in principle, can be any of the regularizers introduced in Section 1.2. Given the similarity between Equation (1.38) and the other shrinkage estimators considered so far, it seems reasonable to expect that the results in Knight and Fu (2000) and Proposition 1.1 would apply, possibly with only minor modifications. This also means that fixed effect models with a shrinkage estimator are not immune to the general shortfalls of shrinkage estimators.

The observations and issues highlighted in this chapter would apply equally in this case.

1.6.3 Structural Breaks

Another econometric example where shrinkage type estimators could be helpful is testing for structural breaks with unknown breakpoints. Consider the following DGP

\[
y_i = \mathbf{x}_i'\boldsymbol{\beta}_0 + \mathbf{x}_i'\boldsymbol{\delta}_0\, I(i > t_1) + u_i, \qquad u_i \sim D(0, \sigma_u^2), \tag{1.39}
\]

where the break point t₁ is unknown. Equation (1.39) implies that the parameter vector is β₀ when i ≤ t₁ and β₀ + δ₀ when i > t₁. In other words, a structural break occurs at i = t₁, and δ₀ denotes the shift in the parameter vector between the periods before and after the break point.

Such models have a long history in econometrics; see, for example, Andrews (1993) and Andrews (2003) as well as the references therein. However, the existing tests are bound by the p < N restriction; that is, the number of variables must be less than the number of observations. Given that these tests are mostly residual-based, it is possible to obtain post-shrinkage (or post-selection) residuals and use these residuals in the existing tests. To illustrate the idea, consider the simple case where t₁ is known. In this case, a typical approach is the following F-test statistic as proposed by Chow (1960)

\[
F = \frac{RSS_R - RSS_{UR1} - RSS_{UR2}}{RSS_{UR1} + RSS_{UR2}} \cdot \frac{N - 2p}{p}, \tag{1.40}
\]

where RSS_R denotes the residual sum-of-squares from the restricted model (δ = 0), while RSS_UR1 and RSS_UR2 denote the unrestricted residual sums-of-squares before and after the break, respectively. Specifically, RSS_UR1 is computed from the residuals û_t = y_t − x_t′β̂ for t ≤ t₁, and RSS_UR2 from the residuals û_t = y_t − x_t′(β̂ + δ̂) for t > t₁.

It is well known that under the null hypothesis H₀: δ = 0, the F-test statistic in Equation (1.40) follows an F distribution under the usual regularity conditions. When t₁ is not known, Andrews (1993) derived the asymptotic distribution of

\[
F = \sup_{s} F(s), \tag{1.41}
\]

where F(s) denotes the F-statistic defined in Equation (1.40) under the assumption that s is the breakpoint, for s = p+1, ..., N−p−1. The idea is to select the breakpoint s at which the test has the highest chance of rejecting the null H₀: δ = 0. As shown by Andrews (1993), the distribution based on this approach is non-standard and must therefore be tabulated or simulated.
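A minimal sketch of the Chow statistic of Equation (1.40) and the sup-F search of Equation (1.41) follows, using a simulated single-break DGP; the candidate breakpoints run over s = p+1, ..., N−p−1 as in the text, and the data generating values are illustrative assumptions.

```python
import numpy as np

def rss(X, y):
    """Residual sum-of-squares from an OLS fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(((y - X @ beta) ** 2).sum())

def chow_f(X, y, s):
    """Chow (1960) F statistic of Eq. (1.40) for a break after observation s."""
    N, p = X.shape
    rss_r = rss(X, y)              # restricted model: no break
    rss_1 = rss(X[:s], y[:s])      # unrestricted fit before the break
    rss_2 = rss(X[s:], y[s:])      # unrestricted fit after the break
    return (rss_r - rss_1 - rss_2) / (rss_1 + rss_2) * (N - 2 * p) / p

def sup_f(X, y):
    """Andrews (1993) sup-F: maximize F(s) over candidate breakpoints."""
    N, p = X.shape
    return max((chow_f(X, y, s), s) for s in range(p + 1, N - p))

# simulated example with a break at t1 = 60 (assumed DGP)
rng = np.random.default_rng(2)
N, p = 120, 2
X = rng.standard_normal((N, p))
beta, delta, t1 = np.array([1.0, 1.0]), np.array([0.0, 1.5]), 60
y = X @ beta + (np.arange(N) >= t1) * (X @ delta) + 0.5 * rng.standard_normal(N)

f_max, s_hat = sup_f(X, y)
```

With a break of this size the maximizing s lands near the true breakpoint; critical values for f_max must of course come from the Andrews (1993) tabulations, not the standard F table.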

Note that the statistic in Equation (1.40) is based on the residuals rather than the individual coefficient estimates, so it is possible to use the arguments of Belloni et al. (2012) and construct the statistic as follows:

Step 1. Estimate the model y_i = x_i′β + u_i using a LASSO-type estimator; call the estimate β̂_LASSO.

Step 2. Obtain the Post-Selection OLS estimate β̂_OLS; that is, estimate the linear regression model by OLS using the covariates selected in the previous step.

Step 3. Construct the restricted residuals û_{R,i} = y_i − ŷ_i, where ŷ_i = x_i′β̂_OLS.

Step 4. Compute RSS_R = Σ_{i=1}^{N} û²_{R,i}.

Step 5. Estimate the following model using a LASSO-type estimator

\[
y_i = \mathbf{x}_i'\boldsymbol{\beta} + \sum_{j=2}^{N-1} \mathbf{x}_i'\boldsymbol{\delta}_j\, I(i \le j) + u_i \tag{1.42}
\]

and denote the estimates of β, δ and j by β̂_LASSO-UR, δ̂_LASSO and ĵ, respectively. Under the assumption that there is only one break, δ_j = 0 for all j except j = t₁.

Step 6. Obtain the Post-Selection OLS estimate for the pre-break unrestricted model, β̂_{UR1-OLS}; that is, estimate the linear regression model by OLS using the covariates selected in Step 5 for i ≤ ĵ.

Step 7. Obtain the Post-Selection OLS estimate for the post-break unrestricted model, β̂_{UR2-OLS}; that is, estimate the linear regression model by OLS using the covariates selected in Step 5 for i > ĵ.

Step 8. Construct the pre-break residuals û_{UR1,i} = y_i − ŷ_i, where ŷ_i = x_i′β̂_{UR1-OLS} for i ≤ ĵ.

Step 9. Construct the post-break residuals û_{UR2,i} = y_i − ŷ_i, where ŷ_i = x_i′β̂_{UR2-OLS} for i > ĵ.

Step 10. Compute RSS_UR1 = Σ_{i=1}^{ĵ} û²_{UR1,i} and RSS_UR2 = Σ_{i=ĵ+1}^{N} û²_{UR2,i}.

Step 11. Compute the test statistic as defined in Equation (1.40).
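The eleven steps above can be sketched as follows, using scikit-learn's Lasso as the LASSO-type estimator. For simplicity this sketch parameterizes the shift in Step 5 on the post-break observations (the mirror image of the pre-break indicator in the text), and it takes ĵ to be the candidate breakpoint with the largest estimated shift; the simulated data and penalty level are illustrative assumptions, not choices made in the original text.

```python
import numpy as np
from sklearn.linear_model import Lasso

def post_selection_rss(X, y, support):
    """Refit by OLS on the selected columns and return the residual RSS."""
    Xs = X[:, support] if support.any() else np.ones((len(y), 1))
    beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    return float(((y - Xs @ beta) ** 2).sum())

rng = np.random.default_rng(3)
N, p, t1 = 150, 3, 90
X = rng.standard_normal((N, p))
beta, delta = np.array([1.0, 0.5, 0.0]), np.array([2.0, 0.0, 0.0])
y = X @ beta + (np.arange(N) >= t1) * (X @ delta) + 0.3 * rng.standard_normal(N)

# Steps 1-4: restricted LASSO, post-selection OLS, restricted RSS
lasso_r = Lasso(alpha=0.05, max_iter=100000).fit(X, y)
rss_r = post_selection_rss(X, y, np.abs(lasso_r.coef_) > 1e-8)

# Step 5: break-augmented design, one block of shifted covariates per candidate j
cand = np.arange(p + 1, N - p)
Z = np.hstack([X * (np.arange(N) >= j)[:, None] for j in cand])
lasso_ur = Lasso(alpha=0.05, max_iter=100000).fit(np.hstack([X, Z]), y)
gamma = lasso_ur.coef_[p:].reshape(len(cand), p)
j_hat = int(cand[np.argmax(np.abs(gamma).sum(axis=1))])  # largest estimated shift

# Steps 6-10: post-selection OLS before and after the estimated break
support = np.abs(lasso_ur.coef_[:p]) > 1e-8
rss_1 = post_selection_rss(X[:j_hat], y[:j_hat], support)
rss_2 = post_selection_rss(X[j_hat:], y[j_hat:], support)

# Step 11: the F statistic of Eq. (1.40), with q the number of selected covariates
q = max(int(support.sum()), 1)
F = (rss_r - rss_1 - rss_2) / (rss_1 + rss_2) * (N - 2 * q) / q
```

Note that the break-augmented design in Step 5 has far more columns than observations, which is exactly the p > N regime the proposal is meant to handle; LASSO serves as both the variable selector and the breakpoint identifier.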

Essentially, the proposal above uses LASSO both as a variable selector and as a breakpoint identifier, and then generates the residual sums-of-squares using OLS based on the selection given by LASSO. This approach can potentially be justified by the results of Belloni et al. (2012) and Belloni and Chernozhukov (2013). Unlike the conventional approaches for unknown breakpoints, such as those studied by Andrews (1993), whose test statistics have non-standard distributions, the test statistic proposed here is likely to follow the F distribution, similar to the original test statistic of Chow (1960), and it can accommodate the case where p > N. To the best of the authors' knowledge, this approach is novel, with both its theoretical properties and its finite sample performance yet to be evaluated. However, given the results of Belloni et al. (2012), it seems a plausible way to tackle the problem of detecting structural breaks with unknown breakpoints.
