4.6 Concluding Remarks

In this chapter we review the most recent advances in using Machine Learning models and methods to forecast time-series data in a high-dimensional setup, where the number of variables used as potential predictors is much larger than the sample size available to estimate the forecasting models.

We start the chapter by discussing how to construct and compare forecasts from different models. More specifically, we discuss the Diebold-Mariano test of equal predictive ability and the Li-Liao-Quaedvlieg test of conditional superior predictive ability. Finally, we illustrate how to construct model confidence sets.
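
As an illustration, below is a minimal sketch of the Diebold-Mariano statistic under squared-error loss in Python. This is not code from the chapter: the function name, the rule-of-thumb truncation lag, and the simple Newey-West long-run variance estimator are assumptions made here for illustration. In practice the small-sample correction of Harvey, Leybourne and Newbold (1997) is often applied to this statistic.

import numpy as np
from scipy import stats

def diebold_mariano(e1, e2, lag=None):
    # Loss differential between two sets of forecast errors (squared-error loss)
    d = np.asarray(e1) ** 2 - np.asarray(e2) ** 2
    T = d.shape[0]
    dbar = d.mean()
    if lag is None:
        lag = int(np.floor(T ** (1 / 3)))  # rule-of-thumb truncation lag (assumption)
    # Newey-West (HAC) estimate of the long-run variance of the loss differential
    lrv = np.mean((d - dbar) ** 2)
    for k in range(1, lag + 1):
        gamma_k = np.mean((d[k:] - dbar) * (d[:-k] - dbar))
        lrv += 2.0 * (1.0 - k / (lag + 1)) * gamma_k
    dm = dbar / np.sqrt(lrv / T)
    pval = 2.0 * (1.0 - stats.norm.cdf(abs(dm)))  # two-sided, asymptotic N(0,1)
    return dm, pval

# Usage: e1 and e2 would be out-of-sample forecast errors from two competing models
# dm_stat, p_value = diebold_mariano(e1, e2)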

In terms of linear ML models, we complement the techniques described in Chapter 1 by focusing on factor-based regression, the combination of factors and penalized regressions, and ensemble methods.
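
As a rough illustration of the factor-plus-penalization idea, the following Python sketch extracts principal-component factors from a large predictor matrix and then fits a cross-validated LASSO on the factors together with the original predictors. The toy data, the choice of five factors, and the scikit-learn estimators are assumptions made here for illustration; in practice the number of factors would be chosen by criteria such as those of Bai and Ng (2002) or Ahn and Horenstein (2013).

import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

# Toy data: X is a (T x p) matrix of candidate predictors, y the target series
# (in practice X would hold lags of a large macroeconomic panel)
rng = np.random.default_rng(0)
T, p = 200, 100
X = rng.standard_normal((T, p))
y = X[:, :3].sum(axis=1) + rng.standard_normal(T)

X_std = StandardScaler().fit_transform(X)

# Step 1: extract a small number of principal-component factors
factors = PCA(n_components=5).fit_transform(X_std)

# Step 2: cross-validated LASSO on the factors plus the original predictors
Z = np.hstack([factors, X_std])
model = LassoCV(cv=5).fit(Z, y)
print(model.alpha_, np.sum(model.coef_ != 0))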

After presenting the linear models, we review neural network methods. We discuss both shallow and deep networks, as well as long short-term memory (LSTM) and convolutional neural networks.
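
As a rough illustration, a minimal one-layer LSTM forecaster in PyTorch is sketched below. The window length, hidden size, and toy data are assumptions made for illustration only; the model maps a window of past observations to a one-step-ahead forecast.

import torch
import torch.nn as nn

class LSTMForecaster(nn.Module):
    def __init__(self, n_inputs=1, hidden=16):
        super().__init__()
        self.lstm = nn.LSTM(input_size=n_inputs, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):
        # x has shape (batch, window, n_inputs)
        out, _ = self.lstm(x)
        return self.head(out[:, -1, :])  # forecast from the last hidden state

# Toy training loop on windows built from a univariate series (illustrative data)
model = LSTMForecaster()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()
x = torch.randn(64, 12, 1)  # 64 windows of 12 lagged observations
y = torch.randn(64, 1)      # one-step-ahead targets
for _ in range(100):
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()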

We end the chapter by discussing some hybrid methods and new proposals in the forecasting literature.

References

Ahn, S. & Horenstein, A. (2013). Eigenvalue ratio test for the number of factors. Econometrica, 81, 1203–1227.

Bai, J. & Ng, S. (2002). Determining the number of factors in approximate factor models. Econometrica, 70, 191–221.

Barron, A. (1993). Universal approximation bounds for superpositions of a sigmoidal function. IEEE Transactions on Information Theory, 39, 930–945.

Breiman, L. (1996). Bagging predictors. Machine Learning, 24, 123–140.

Chen, X. (2007). Large sample sieve estimation of semi-nonparametric models. In J. Heckman & E. Leamer (Eds.), Handbook of econometrics. Elsevier.

Clark, T. & McCracken, M. (2013). Advances in forecast evaluation. In G. Elliott & A. Timmermann (Eds.), Handbook of economic forecasting (Vol. 2, pp. 1107–1201). Elsevier.

Cybenko, G. (1989). Approximation by superposition of sigmoidal functions. Mathematics of Control, Signals, and Systems, 2, 303–314.

Diebold, F. (2015). Comparing predictive accuracy, twenty years later: A personal perspective on the use and abuse of Diebold-Mariano tests. Journal of Business and Economic Statistics, 33, 1–9.

Diebold, F. & Mariano, R. (1995). Comparing predictive accuracy. Journal of Business and Economic Statistics, 13, 253–263.

Elliott, G., Gargano, A. & Timmermann, A. (2013). Complete subset regressions. Journal of Econometrics, 177(2), 357–373.

Elliott, G., Gargano, A. & Timmermann, A. (2015). Complete subset regressions with large-dimensional sets of predictors. Journal of Economic Dynamics and Control, 54, 86–110.

Fan, J., Masini, R. & Medeiros, M. (2021). Bridging factor and sparse models (Tech. Rep. No. 2102.11341). arXiv.

Fava, B. & Lopes, H. (2020). The illusion of the illusion of sparsity. Brazilian Journal of Probability and Statistics. (forthcoming)

Foresee, F. D. & Hagan, M. T. (1997). Gauss-Newton approximation to Bayesian regularization. In IEEE international conference on neural networks (Vol. 3, pp. 1930–1935). New York: IEEE.

Funahashi, K. (1989). On the approximate realization of continuous mappings by neural networks. Neural Networks, 2, 183–192.

Garcia, M., Medeiros, M. & Vasconcelos, G. (2017). Real-time inflation forecasting with high-dimensional models: The case of Brazil. International Journal of Forecasting, 33(3), 679–693.

Genre, V., Kenny, G., Meyler, A. & Timmermann, A. (2013). Combining expert forecasts: Can anything beat the simple average? International Journal of Forecasting, 29, 108–121.

Giacomini, R. & White, H. (2006). Tests of conditional predictive ability. Econometrica, 74, 1545–1578.

Giannone, D., Lenza, M. & Primiceri, G. (2021). Economic predictions with big data: The illusion of sparsity. Econometrica, 89, 2409–2437.

Grenander, U. (1981). Abstract inference. New York, USA: Wiley.

Hansen, P., Lunde, A. & Nason, J. (2011). The model confidence set. Econometrica, 79, 453–497.

Harvey, D., Leybourne, S. & Newbold, P. (1997). Testing the equality of prediction mean squared errors. International Journal of Forecasting, 13, 281–291.

Hochreiter, S. & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9, 1735–1780.

Hornik, K., Stinchcombe, M. & White, H. (1989). Multilayer feedforward networks are universal approximators. Neural Networks, 2, 359–366.

Inoue, A. & Kilian, L. (2008). How useful is bagging in forecasting economic time series? A case study of U.S. consumer price inflation. Journal of the American Statistical Association, 103, 511–522.

Kock, A. & Teräsvirta, T. (2014). Forecasting performance of three automated modelling techniques during the economic crisis 2007–2009. International Journal of Forecasting, 30, 616–631.

Kock, A. & Teräsvirta, T. (2015). Forecasting macroeconomic variables using neural network models and three automated model selection techniques. Econometric Reviews, 35, 1753–1779.

Li, J., Liao, Z. & Quaedvlieg, R. (2021). Conditional superior predictive ability. Review of Economic Studies. (forthcoming)

MacKay, D. J. C. (1992). Bayesian interpolation. Neural Computation, 4, 415–447.

MacKay, D. J. C. (1992). A practical Bayesian framework for backpropagation networks. Neural Computation, 4, 448–472.

McAleer, M. & Medeiros, M. (2008). A multiple regime smooth transition heterogeneous autoregressive model for long memory and asymmetries. Journal of Econometrics, 147, 104–119.

McCracken, M. (2020). Diverging tests of equal predictive ability. Econometrica, 88, 1753–1754.

Medeiros, M. & Mendes, E. (2013). Penalized estimation of semi-parametric additive time-series models. In N. Haldrup, M. Meitz & P. Saikkonen (Eds.), Essays in nonlinear time series econometrics. Oxford University Press.

Medeiros, M., Teräsvirta, T. & Rech, G. (2006). Building neural network models for time series: A statistical approach. Journal of Forecasting, 25, 49–75.

Medeiros, M., Vasconcelos, G., Veiga, A. & Zilberman, E. (2021). Forecasting inflation in a data-rich environment: The benefits of machine learning methods. Journal of Business and Economic Statistics, 39, 98–119.

Mhaskar, H., Liao, Q. & Poggio, T. (2017). When and why are deep networks better than shallow ones? In Proceedings of the thirty-first AAAI conference on artificial intelligence (AAAI-17) (pp. 2343–2349).

Onatski, A. (2010). Determining the number of factors from empirical distribution of eigenvalues. Review of Economics and Statistics, 92, 1004–1016.

Park, J. & Sandberg, I. (1991). Universal approximation using radial-basis-function networks. Neural Computation, 3, 246–257.

Patton, A. (2015). Comment. Journal of Business & Economic Statistics, 33, 22–24.

Samuel, A. (1959). Some studies in machine learning using the game of checkers. IBM Journal of Research and Development, 3(3), 210–229.

Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I. & Salakhutdinov, R. (2014). Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15, 1929–1958.

Stinchcombe, M. & White, H. (1989). Universal approximation using feedforward neural networks with non-sigmoid hidden layer activation functions. In Proceedings of the international joint conference on neural networks (pp. 613–617). Washington, DC: IEEE Press.

Stock, J. & Watson, M. (2002a). Forecasting using principal components from a large number of predictors. Journal of the American Statistical Association, 97, 1167–1179.

Stock, J. & Watson, M. (2002b). Macroeconomic forecasting using diffusion indexes. Journal of Business & Economic Statistics, 20, 147–162.

Suarez-Fariñas, M., Pedreira, C. & Medeiros, M. (2004). Local-global neural networks: A new approach for nonlinear time series modelling. Journal of the American Statistical Association, 99, 1092–1107.

Teräsvirta, T. (1994). Specification, estimation, and evaluation of smooth transition autoregressive models. Journal of the American Statistical Association, 89, 208–218.

Teräsvirta, T. (2006). Forecasting economic variables with nonlinear models. In G. Elliott, C. Granger & A. Timmermann (Eds.), Handbook of economic forecasting (Vol. 1, pp. 413–457). Elsevier.

Teräsvirta, T., Tjøstheim, D. & Granger, C. (2010). Modelling nonlinear economic time series. Oxford, UK: Oxford University Press.

Trapletti, A., Leisch, F. & Hornik, K. (2000). Stationary and integrated autoregressive neural network processes. Neural Computation, 12, 2427–2450.

West, K. (2006). Forecast evaluation. In G. Elliott, C. Granger & A. Timmermann (Eds.), Handbook of economic forecasting (Vol. 1, pp. 99–134). Elsevier.

Yarotsky, D. (2017). Error bounds for approximations with deep ReLU networks. Neural Networks, 94, 103–114.

Chapter 5

Causal Estimation of Treatment Effects From
