In document ECONOMETRICS with MACHINE LEARNING (Pages 122-129)

The Use of Machine Learning in Treatment Effect Estimation

3.6 Conclusion

[Figure 3.1 here: a causal tree diagram. Internal nodes split on hypertension (hyperpr = 1) and maternal age thresholds (mage < 15, mage < 19, mage < 22); the leaves report estimated effects with standard errors and group shares, ranging from −221.7 [9.94] (64%) to 85.2 [60.1] (3%).]

Fig. 3.1: A causal tree for the effect of smoking on birth weight (first-time black mothers).

Notes: standard errors are in brackets and the percentages in parentheses denote the share of each group in the sample. The total number of observations is N = 157,989 and the covariates used are X1, except for the polynomial and interaction terms. To obtain a simpler model, we choose the pruning parameter using the 1-SE rule rather than the minimum cross-validated MSE.

negative with age (compare the two largest leaves on the mage ≥ 28 and mage < 28 branches). Hypertension appears again as a potentially relevant variable, but the share of young women affected by this problem is small. We also note that the results are obtained over a subsample of N = 150,000, and robustness checks show sensitivity to the choice of the subsample. Nonetheless, the age pattern is qualitatively stable.

The preceding results illustrate both the strengths and weaknesses of using a causal tree for heterogeneity analysis. On the one hand, letting the data speak for itself is philosophically attractive and can certainly be useful. On the other hand, the estimation results may appear to be too complex or arbitrary and can be challenging to interpret. This problem is exacerbated by the fact that trees grown on different subsamples can show different patterns — suggesting that in practice it is prudent to construct a causal forest, i.e., use the average of multiple trees.
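To make the averaging idea concrete, the following sketch (our own illustration, not the chapter's code) grows many trees on random half-samples and averages their predictions into a crude "forest". The simulated design, the transformed-outcome trick, and all variable names are assumptions; the trick is valid here because treatment is randomized with P(D = 1) = 0.5, so the transformed outcome has conditional mean equal to the CATE.

```python
# Sketch: average transformed-outcome trees grown on random half-samples.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
n = 4000
X = rng.normal(size=(n, 3))
D = rng.integers(0, 2, size=n)               # randomized treatment, p = 0.5
tau = np.where(X[:, 0] > 0, 2.0, 0.0)        # true heterogeneous effect
Y = tau * D + X[:, 1] + rng.normal(size=n)
Y_star = 2 * (2 * D - 1) * Y                 # E[Y*|X] = CATE when p = 0.5

def subsample_tree(seed):
    """Grow one tree on a random half-sample of the data."""
    idx = np.random.default_rng(seed).choice(n, size=n // 2, replace=False)
    tree = DecisionTreeRegressor(max_depth=3, min_samples_leaf=100)
    return tree.fit(X[idx], Y_star[idx])

trees = [subsample_tree(s) for s in range(200)]
x0 = np.array([[1.0, 0.0, 0.0]])             # query point with true CATE = 2
one_tree = trees[0].predict(x0)[0]           # a single tree: subsample-dependent
forest = float(np.mean([t.predict(x0)[0] for t in trees]))  # averaged: stable
```

Rerunning `subsample_tree` with different seeds shows how individual trees shift with the subsample, while the averaged prediction stays close to the true effect at the query point.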

The main insight of this literature is that machine learning can be fruitfully applied for this purpose if the nuisance functions enter the second-stage estimating equations (more precisely, moment conditions) in a way that satisfies an orthogonality condition. This condition ensures that the parameter of interest is consistently estimable despite the selection and approximation errors introduced by the first-stage machine learning procedure.

Inference in the second stage can then proceed as usual. In practice a cross-fitting procedure is recommended, which involves splitting the sample between the first and second stages.
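As a hedged illustration of this two-stage recipe (the simulated design, the random-forest first stage, and all names are our own assumptions, not the chapter's application), the partially linear model Y = θD + g(X) + ε can be estimated with 5-fold cross-fitting as follows:

```python
# Sketch of double/debiased ML (DML) with cross-fitting for the partially
# linear model Y = theta*D + g(X) + eps.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
n, p = 2000, 10
X = rng.normal(size=(n, p))
D = np.sin(X[:, 0]) + rng.normal(size=n)     # treatment depends on X
theta = 1.5
Y = theta * D + X[:, 0] ** 2 + rng.normal(size=n)

# First stage (cross-fitted): residualize D and Y on X with an ML learner,
# always predicting on the fold that was held out of estimation.
D_res, Y_res = np.empty(n), np.empty(n)
for train, test in KFold(5, shuffle=True, random_state=0).split(X):
    fD = RandomForestRegressor(n_estimators=100, random_state=0)
    fY = RandomForestRegressor(n_estimators=100, random_state=0)
    D_res[test] = D[test] - fD.fit(X[train], D[train]).predict(X[test])
    Y_res[test] = Y[test] - fY.fit(X[train], Y[train]).predict(X[test])

# Second stage: the Neyman-orthogonal moment reduces to OLS of the outcome
# residual on the treatment residual; inference then proceeds as usual.
theta_hat = (D_res @ Y_res) / (D_res @ D_res)
psi = (Y_res - theta_hat * D_res) * D_res
se = np.sqrt(psi.var() / n) / np.mean(D_res ** 2)
```

The cross-fitting loop is what keeps the first-stage overfitting from contaminating the second stage: each observation's residuals are computed from nuisance estimates fit without that observation's fold.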

In applications of the causal tree or forest methodology, the parameter of interest is the full-dimensional conditional average treatment effect (CATE) function, i.e., the focus is on treatment effect heterogeneity. The method permits near-automatic, data-driven discovery of this function, and if the selected approximation is re-estimated on an independent subsample (the 'honest' approach), then inference about group-specific effects can proceed as usual. Another strand of the heterogeneity literature estimates the projection of the CATE function on a given, pre-specified coordinate to facilitate presentation and interpretation. This can be accomplished by an extension of the DML framework where in the first stage the nuisance functions are estimated by an ML method and in the second stage a traditional nonparametric estimator is used (e.g., kernel-based or series regression).
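The honest approach can be sketched in a few lines. This is a stylized illustration under assumed randomized treatment, not the chapter's implementation: the partition is learned on one half of the sample (here with a simple transformed-outcome fit standing in for the Athey-Imbens splitting criterion), and each leaf's effect is re-estimated on the other half.

```python
# Honest estimation sketch: learn the tree partition on a training half,
# then re-estimate leaf-specific treatment effects on the held-out half.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)
n = 4000
X = rng.normal(size=(n, 3))
D = rng.integers(0, 2, size=n)               # P(D=1) = 0.5 by design
tau = np.where(X[:, 0] > 0, 2.0, 0.0)
Y = tau * D + X[:, 1] + rng.normal(size=n)

tr, est = np.arange(n // 2), np.arange(n // 2, n)

# Step 1: learn the partition on the training half only.
Y_star = 2 * (2 * D - 1) * Y                 # E[Y*|X] = CATE under p = 0.5
tree = DecisionTreeRegressor(max_depth=2, min_samples_leaf=200)
tree.fit(X[tr], Y_star[tr])

# Step 2: on the held-out half, a simple difference in means per leaf;
# because this half is independent of the learned partition, usual
# t-based inference applies within each leaf.
leaves = tree.apply(X[est])
effects = {}
for leaf in np.unique(leaves):
    m = leaves == leaf
    d, y = D[est][m], Y[est][m]
    effects[int(leaf)] = float(y[d == 1].mean() - y[d == 0].mean())
```

The point of the sample split is that the leaf boundaries were not chosen to exaggerate the effects measured inside them, which is exactly what would bias naive within-leaf estimates.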

In our empirical application (the effects of maternal smoking during pregnancy on the baby’s birthweight) we illustrate the use of the DML estimator as well as causal trees. While the results confirm previous findings in the literature, they also highlight some limitations of these methods. In particular, with the number of observations orders of magnitude larger than the number of covariates, and the covariates not being very strong predictors of the treatment, DML virtually coincides with OLS and even the naive (direct) Lasso estimator. The causal tree approach successfully uncovers important patterns in treatment effect heterogeneity but also some that seem somewhat incidental and/or less straightforward to interpret. This suggests that in practice a causal forest should be used unless the computational cost is prohibitive.

In sum, machine learning methods, though geared toward prediction tasks, can be used to enhance treatment effect estimation in various ways. This is an active research area in econometrics at the moment, with a promise to supply exciting theoretical developments and a large number of empirical applications for years to come.

Acknowledgements We thank Qingliang Fan for his help in collecting literature. We are also grateful to Alice Kuegler, Henrika Langen and the editors for their constructive comments, which led to noticeable improvements in the exposition. The usual disclaimer applies.


Chapter 4

Forecasting with Machine Learning Methods

Marcelo C. Medeiros

Abstract This chapter surveys the use of supervised Machine Learning (ML) models to forecast time-series data. Our focus is on covariance-stationary dependent data when a large set of predictors is available and the target variable is a scalar. We start by defining the forecasting scheme setup as well as different approaches to compare forecasts generated by different models/methods. More specifically, we review three important techniques to compare forecasts: the Diebold-Mariano (DM) and the Li-Liao-Quaedvlieg tests, and the Model Confidence Set (MCS) approach.

Second, we discuss several commonly used linear and nonlinear ML models. Among linear models, we focus on factor (principal component)-based regressions, ensemble methods (bagging and complete subset regression), and the combination of factor models and penalized regression. With respect to nonlinear models, we pay special attention to neural networks and autoencoders. Third, we discuss some hybrid models where linear and nonlinear alternatives are combined.

Marcelo C. Medeiros

Pontifical Catholic University of Rio de Janeiro, Rio de Janeiro, Brazil, e-mail: mcm@econ.puc-rio.br
