5.12 A Final Word on the Importance of Study Design in Mitigating Bias

Most studies comparing average treatment effect (ATE) estimates from observational studies with those from randomized controlled trials (RCTs) for the same disease states have found a high degree of agreement (Anglemyer, Horvath & Bero, 2014; Concato, Shah & Horwitz, 2000; Benson & Hartz, 2000). However, other studies have documented considerable disagreement in such results, attributable to the heterogeneity of datasets and other factors (Madigan et al., 2013). In some cases, apparent disagreements have been shown to be due to avoidable errors in observational study design which, once corrected, yielded similar results from the observational studies and the RCTs (Dickerman, García-Albéniz, Logan, Denaxas & Hernán, 2019; Hernán et al., 2008).

For any question involving causal inference, it is theoretically possible to design a randomized trial to answer that question. This is known as designing the target trial (Hernán, 2021). When a study is designed to emulate a target trial using observational data, some features of the target trial design may be impossible to emulate. Emulating treatment assignment, in particular, requires data on all features associated with the implementation of the treatment intervention. This is the basis for the extensive use of propensity score matching and inverse probability of treatment weighting (IPTW) in the health outcomes literature.
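To make the approach concrete, the following is a minimal sketch of an IPTW estimate of the ATE once the features governing treatment assignment are observed. It uses simulated data; the logistic propensity model, the truncation bounds, and all variable names are illustrative assumptions rather than a recommended analysis.

# Minimal IPTW sketch on simulated data (illustrative assumptions throughout).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5_000
W = rng.normal(size=(n, 3))                                    # measured confounders
p_treat = 1 / (1 + np.exp(-(0.4 * W[:, 0] - 0.3 * W[:, 1])))   # true assignment mechanism
A = rng.binomial(1, p_treat)                                   # binary treatment
Y = 1.0 * A + W @ np.array([0.5, -0.2, 0.1]) + rng.normal(size=n)  # outcome; true ATE = 1.0

# Step 1: estimate the propensity score P(A = 1 | W).
ps_model = LogisticRegression(max_iter=1000).fit(W, A)
g = np.clip(ps_model.predict_proba(W)[:, 1], 0.01, 0.99)       # truncate to enforce overlap

# Step 2: weight each subject by the inverse probability of the treatment actually received.
ate_iptw = np.mean(A * Y / g) - np.mean((1 - A) * Y / (1 - g))
print(f"IPTW ATE estimate: {ate_iptw:.3f}")

The same estimated propensity scores could instead be used for matching; the weighting form is shown here because it leads directly to the doubly robust estimators discussed at the end of this section.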

It has been estimated that a relatively small percentage of clinical trials can be emulated using observational data (Bartlett, Dhruva, Shah, Ryan & Ross, 2019).

However, observational studies can still be designed with a theoretical target trial in mind: specifying a randomized trial to answer the question of interest and then examining where the available data may limit the ability to emulate this trial (Berger & Crown, 2021). Aside from lack of comparability in defining treatment groups, there are a number of other problems frequently encountered in the design of observational health outcomes studies, including immortal time bias, adjustment for intermediate variables, and reverse causation. The target trial approach is one method for avoiding such issues.

Numerous observational studies have designed target trials to emulate existing RCTs in order to compare the results of those RCTs to the results of emulations using observational data (Seeger et al., 2015; Hernán et al., 2008; Franklin et al., 2020; Dickerman et al., 2019). In general, such studies demonstrate higher levels of agreement than comparisons of ATE estimates from observational studies and RCTs within a disease area that make no attempt to emulate study design characteristics such as inclusion/exclusion criteria and follow-up periods. For example, a paper comparing RCT emulation results for 10 cardiovascular trials found that the hazard ratio estimate from the observational emulation fell within the 95% confidence interval of the corresponding RCT in 8 of the 10 studies; in 9 of 10, the results had the same sign and statistical significance.
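The agreement criteria used in such comparisons are easy to state explicitly. The helper below is hypothetical and uses made-up numbers; it simply checks whether an emulation's hazard ratio falls inside the RCT's 95% confidence interval and whether the two results agree on direction and statistical significance.

# Hypothetical helper for classifying RCT-versus-emulation agreement.
from dataclasses import dataclass

@dataclass
class TrialResult:
    hr: float        # hazard ratio point estimate
    ci_low: float    # lower bound of the 95% confidence interval
    ci_high: float   # upper bound of the 95% confidence interval

    @property
    def significant(self) -> bool:
        # A hazard ratio is conventionally "significant" when its 95% CI excludes 1.
        return self.ci_high < 1.0 or self.ci_low > 1.0

def agreement(rct: TrialResult, emulation: TrialResult) -> dict:
    return {
        "within_rct_ci": rct.ci_low <= emulation.hr <= rct.ci_high,
        "same_direction": (rct.hr - 1.0) * (emulation.hr - 1.0) > 0,
        "same_significance": rct.significant == emulation.significant,
    }

# Purely illustrative numbers: both analyses suggest a protective effect.
print(agreement(TrialResult(0.80, 0.68, 0.94), TrialResult(0.85, 0.75, 0.97)))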

To date, all of the trial emulations have used traditional propensity score or IPTW approaches; none have used doubly robust methods such as targeted maximum likelihood estimation (TMLE) implemented with Super Learner machine learning methods. In addition to simulation studies, it would be useful to examine the ability of methods like TMLE to estimate treatment effects similar to those from randomized trials, particularly in cases where traditional methods have failed to do so.
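As a point of reference for such an exercise, the following is a stylized sketch of the TMLE targeting step for a binary outcome. Plain logistic regressions stand in for a Super Learner ensemble, the simulated data and truncation bounds are assumptions, and an applied analysis would rely on a dedicated TMLE implementation rather than this illustration.

# Stylized TMLE sketch for the ATE (risk difference) with a binary outcome.
import numpy as np
import statsmodels.api as sm
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 5_000
W = rng.normal(size=(n, 3))                                    # confounders
g_true = 1 / (1 + np.exp(-(0.5 * W[:, 0] - 0.4 * W[:, 2])))
A = rng.binomial(1, g_true)                                    # treatment
p_y = 1 / (1 + np.exp(-(-0.5 + 0.8 * A + 0.6 * W[:, 0] - 0.3 * W[:, 1])))
Y = rng.binomial(1, p_y)                                       # binary outcome

def expit(x):
    return 1 / (1 + np.exp(-x))

def logit(p):
    return np.log(p / (1 - p))

# Step 1: initial outcome regression Q(A, W) = E[Y | A, W] (Super Learner in practice).
XA = np.column_stack([A, W])
q_model = LogisticRegression(max_iter=1000).fit(XA, Y)
Q_A = np.clip(q_model.predict_proba(XA)[:, 1], 1e-3, 1 - 1e-3)
Q_1 = np.clip(q_model.predict_proba(np.column_stack([np.ones(n), W]))[:, 1], 1e-3, 1 - 1e-3)
Q_0 = np.clip(q_model.predict_proba(np.column_stack([np.zeros(n), W]))[:, 1], 1e-3, 1 - 1e-3)

# Step 2: propensity score g(W) = P(A = 1 | W), truncated for positivity.
g = np.clip(LogisticRegression(max_iter=1000).fit(W, A).predict_proba(W)[:, 1], 0.01, 0.99)

# Step 3: targeting step -- fluctuate the initial fit along the "clever covariate" H.
H = A / g - (1 - A) / (1 - g)
fluct = sm.GLM(Y, H.reshape(-1, 1), family=sm.families.Binomial(), offset=logit(Q_A)).fit()
eps = fluct.params[0]

# Step 4: updated counterfactual predictions and the targeted ATE estimate.
Q_1_star = expit(logit(Q_1) + eps / g)
Q_0_star = expit(logit(Q_0) - eps / (1 - g))
print(f"TMLE ATE estimate (risk difference): {np.mean(Q_1_star - Q_0_star):.3f}")

Because the estimator combines an outcome regression with a propensity model, it retains consistency if either component is correctly specified, which is the doubly robust property referred to above.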

References

Abadie, A. & Cattaneo, M. D. (2018). Econometric methods for program evaluation. Annual Review of Economics, 10, 465–503.

Anglemyer, A., Horvath, H. & Bero, L. (2014). Healthcare outcomes assessed with observational study designs compared with those assessed in randomized trials. The Cochrane Database of Systematic Reviews, 4.

Athey, S. & Imbens, G. (2016). Recursive partitioning for heterogeneous causal effects. Proceedings of the National Academy of Sciences, 113, 7353–7360.

Athey, S. & Imbens, G. (2019). Machine learning methods that economists should know about. Annual Review of Economics, 11, 685–725.

Athey, S., Imbens, G. & Wager, S. (2018). Approximate residual balancing: Debiased inference of average treatment effects in high dimensions. Journal of the Royal Statistical Society, Series B (Methodological), 80, 597–623.

Athey, S., Tibshirani, J. & Wager, S. (2019). Generalized random forests. Annals of Statistics, 47, 399–424.

Bang, H. & Robins, J. (2005). Doubly robust estimation in missing data and causal inference models. Biometrics, 61, 962–973.

Bartlett, V., Dhruva, S., Shah, N., Ryan, P. & Ross, J. (2019). Feasibility of using real-world data to replicate clinical trial evidence. JAMA Network Open, 2, e1912869.

Baser, O. (2006). Too much ado about propensity score models? Comparing methods of propensity score matching. Value in Health: The Journal of the International Society for Pharmacoeconomics and Outcomes Research, 9, 377–385.

Basu, A. (2011). Economics of individualization in comparative effectiveness research and a basis for a patient-centered health care. Journal of Health Economics, 30, 549–559.

Basu, A., Navarro, S. & Urzua, S. (2007). Use of instrumental variables in the presence of heterogeneity and self-selection: An application to treatments of breast cancer patients. Health Economics, 16, 1133–1157.

Basu, A., Polsky, D. & Manning, W. (2011). Estimating treatment effects on healthcare costs under exogeneity: Is there a 'magic bullet'? Health Services & Outcomes Research Methodology, 11, 1–26.

Belloni, A., Chen, D., Chernozhukov, V. & Hansen, C. (2012). Sparse models and methods for optimal instruments with an application to eminent domain. Econometrica, 80, 2369–2429.

Belloni, A., Chernozhukov, V., Fernández-Val, I. & Hansen, C. (2017). Program evaluation and causal inference with high-dimensional data. Econometrica, 85, 233–298.

Belloni, A., Chernozhukov, V. & Hansen, C. (2013). Inference for high-dimensional sparse econometric models. Advances in Economics and Econometrics: Tenth World Congress, Volume 3, Econometrics, 245–295.

Belloni, A., Chernozhukov, V. & Hansen, C. (2014a). High-dimensional methods and inference on structural and treatment effects. The Journal of Economic Perspectives, 28, 29–50.

Belloni, A., Chernozhukov, V. & Hansen, C. (2014b). Inference on treatment effects after selection among high-dimensional controls. The Review of Economic Studies, 81, 608–650.

Benson, K. & Hartz, A. (2000). A comparison of observational studies and randomized, controlled trials. The New England Journal of Medicine, 342, 1878–1886.

Berger, M. & Crown, W. (2021). How can we make more rapid progress in the leveraging of real-world evidence by regulatory decision makers? Value in Health, 25, 167–170.

Bound, J., Jaeger, D. & Baker, R. (1995). Problems with instrumental variables estimation when the correlation between the instruments and the endogenous explanatory variable is weak. Journal of the American Statistical Association, 90, 443–450.

Brookhart, M., Rassen, J. & Schneeweiss, S. (2010). Instrumental variable methods for comparative effectiveness research. Pharmacoepidemiology and Drug Safety, 19, 537–554.

Brookhart, M., Schneeweiss, S., Rothman, K., Glynn, R., Avorn, J. & Stürmer, T. (2006). Variable selection for propensity score models. American Journal of Epidemiology, 163, 1149–1156.

Cameron, A. & Trivedi, P. (2013). Regression analysis of count data (2nd ed.). Cambridge University Press.

Carpenter, J., Kenward, M. & Vansteelandt, S. (2006). A comparison of multiple imputation and doubly robust estimation for analyses with missing data. Journal of the Royal Statistical Society Series A, 169, 571–584.

Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C. & Newey, W. (2017). Double/debiased/Neyman machine learning of treatment effects. American Economic Review, 107, 261–265.

Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W. & Robins, J. (2018). Double/debiased machine learning for treatment and structural parameters. The Econometrics Journal, 21, C1–C68.

Cole, S. & Frangakis, C. (2009). The consistency statement in causal inference: a definition or an assumption? Epidemiology, 20, 3–5.

Cole, S. & Hernán, M. (2008). Constructing inverse probability weights for marginal structural models. American Journal of Epidemiology, 168, 656–664.

Concato, J., Shah, N. & Horwitz, R. (2000). Randomized, controlled trials, observational studies, and the hierarchy of research designs. New England Journal of Medicine, 342, 1887–1892.

Crown, W. (2015). Potential application of machine learning in health outcomes research and some statistical cautions. Value in Health, 18, 137–140.

Crown, W., Henk, H. & Vanness, D. (2011). Some cautions on the use of instrumental variables estimators in outcomes research: How bias in instrumental variables estimators is affected by instrument strength, instrument contamination, and sample size. Value in Health, 14, 1078–1084.

Crump, R., Hotz, V., Imbens, G. & Mitnik, O. (2009). Dealing with limited overlap in estimation of average treatment effects. Biometrika, 96, 187–199.

Dahabreh, I., Robertson, S., Tchetgen, E. & Stuart, E. (2019). Generalizing causal inferences from randomized trials: Counterfactual and graphical identification. Biometrics, 75, 685–694.

D’Amour, A., Ding, P., Feller, A., Lei, L. & Sekhon, J. (2021). Overlap in observational studies with high-dimensional covariates. Journal of Econometrics, 221, 644–654.

Dickerman, B., García-Albéniz, X., Logan, R., Denaxas, S. & Hernán, M. (2019). Avoidable flaws in observational analyses: an application to statins and cancer. Nature Medicine, 25, 1601–1606.

Evans, H. & Basu, A. (2011). Exploring comparative effect heterogeneity with instrumental variables: prehospital intubation and mortality (Health, Econometrics and Data Group (HEDG) Working Papers). HEDG, c/o Department of Economics, University of York.

Franklin, J., Patorno, E., Desai, R., Glynn, R., Martin, D., Quinto, K., . . . Schneeweiss, S. (2020). Emulating randomized clinical trials with nonrandomized real-world evidence studies: First results from the RCT DUPLICATE initiative. Circulation, 143, 1002–1013.

Funk, J., Westreich, D., Wiesen, C., Stürmer, T., Brookhart, M. & Davidian, M. (2011). Doubly robust estimation of causal effects. American Journal of Epidemiology, 173, 761–767.

Futoma, J., Morris, M. & Lucas, J. (2015). A comparison of models for predicting early hospital readmissions. Journal of Biomedical Informatics, 56, 229–238.

Greenland, S. & Robins, J. (1986). Identifiability, exchangeability, and epidemiological confounding. International Journal of Epidemiology, 15, 413–419.

Hahn, J. & Hausman, J. (2002). A new specification test for the validity of instrumental variables. Econometrica, 70, 163–189.

Hastie, T., Tibshirani, R. & Friedman, J. (2009). The elements of statistical learning: Data mining, inference and prediction (2nd ed.). Springer Verlag, New York.

Hausman, J. (1978). Specification tests in econometrics. Econometrica, 46, 1251–1271.

Hausman, J. (1983). Specification and estimation of simultaneous equation models. In Handbook of econometrics (pp. 391–448). Elsevier.

Heckman, J. & Navarro, S. (2003). Using matching, instrumental variables and control functions to estimate economic choice models. Review of Economics and Statistics, 86.

Hernán, M. (2011). Beyond exchangeability: The other conditions for causal inference in medical research. Statistical Methods in Medical Research, 21, 3–5.

Hernán, M. (2021). Methods of public health research – strengthening causal inference from observational data. The New England Journal of Medicine, 385, 1345–1348.

Hernán, M., Alonso, A., Logan, R., Grodstein, F., Michels, K., Willett, W., . . . Robins, J. (2008). Observational studies analyzed like randomized experiments: an application to postmenopausal hormone therapy and coronary heart disease. Epidemiology, 19, 766–779.

Hirano, K., Imbens, G. & Ridder, G. (2003). Efficient estimation of average treatment effects using the estimated propensity score. Econometrica, 71, 1161–1189.

Hong, W., Haimovich, A. & Taylor, R. (2018). Predicting hospital admission at emergency department triage using machine learning. PLOS ONE, 13, e0201016.

Imbens, G. (2020). Potential outcome and directed acyclic graph approaches to causality: Relevance for empirical practice in economics. Journal of Economic Literature, 58, 1129–1179.

Imbens, G. & Wooldridge, J. (2009). Recent developments in the econometrics of program evaluation. Journal of Economic Literature, 47, 5–86.

Joffe, M., Ten Have, T., Feldman, H. & Kimmel, S. (2004). Model selection, confounder control, and marginal structural models: Review and new applications. The American Statistician, 58, 272–279.

Johnson, M., Bush, R., Collins, T., Lin, P., Canter, D., Henderson, W., . . . Petersen, L. (2006). Propensity score analysis in observational studies: outcomes after abdominal aortic aneurysm repair. American Journal of Surgery, 192, 336–343.

Johnson, M., Crown, W., Martin, B., Dormuth, C. & Siebert, U. (2009). Good research practices for comparative effectiveness research: analytic methods to improve causal inference from nonrandomized studies of treatment effects using secondary data sources: the ISPOR good research practices for retrospective database analysis task force report–Part III. Value in Health, 12, 1062–1073.

Jones, A. & Rice, N. (2009). Econometric evaluation of health policies. In The Oxford Handbook of Health Economics.

Kang, J. & Schafer, J. (2007). Demystifying double robustness: A comparison of alternative strategies for estimating a population mean from incomplete data. Statistical Science, 22, 523–539.

Kleibergen, F. & Zivot, E. (2003). Bayesian and classical approaches to instrumental variable regression. Journal of Econometrics, 29–72.

Knaus, M., Lechner, M. & Strittmatter, A. (2021). Machine learning estimation of heterogeneous causal effects: Empirical Monte Carlo evidence. The Econometrics Journal, 24.

Kreif, N., Tran, L., Grieve, R., Stavola, B., Tasker, R. & Petersen, M. (2017). Estimating the comparative effectiveness of feeding interventions in the pediatric intensive care unit: A demonstration of longitudinal targeted maximum likelihood estimation. American Journal of Epidemiology, 186, 1370–1379.

Maddala, G. S. (1983). Limited-dependent and qualitative variables in econometrics. Cambridge University Press.

Madigan, D., Ryan, P., Schuemie, M., Stang, P., Overhage, J. M., Hartzema, A., . . . Berlin, J. (2013). Evaluating the impact of database heterogeneity on observational study results. American Journal of Epidemiology, 178, 645–651.

Mitra, N. & Indurkhya, A. (2005). A propensity score approach to estimating the cost-effectiveness of medical therapies from observational data. Health Economics, 14, 805–815.

Mullainathan, S. & Spiess, J. (2017). Machine learning: An applied econometric approach. Journal of Economic Perspectives, 31, 87–106.

Murray, M. (2007). Avoiding invalid instruments and coping with weak instruments. Journal of Economic Perspectives, 20, 111–132.

Naimi, A., Cole, S. & Kennedy, E. (2017). An introduction to g methods. International Journal of Epidemiology, 46, 756–762.

Obermeyer, Z. & Emanuel, E. (2016). Predicting the future — big data, machine learning, and clinical medicine. The New England Journal of Medicine, 375, 1216–1219.

Pang, M., Schuster, T., Filion, K., Schnitzer, M., Eberg, M. & Platt, R. (2016). Effect estimation in point-exposure studies with binary outcomes and high-dimensional covariate data – a comparison of targeted maximum likelihood estimation and inverse probability of treatment weighting. The International Journal of Biostatistics, 12.

Pearl, J. (2009). Causality (2nd ed.). Cambridge University Press.

Petersen, M., Porter, K., Gruber, S., Wang, Y. & van der Laan, M. (2012). Diagnosing and responding to violations in the positivity assumption. Statistical Methods in Medical Research, 21, 31–54.

Rajkomar, A., Oren, E., Chen, K., Dai, A., Hajaj, N., Liu, P., . . . Dean, J. (2018). Scalable and accurate deep learning for electronic health records. npj Digital Medicine, 18.

Ramsahai, R., Grieve, R. & Sekhon, J. (2011). Extending iterative matching methods: An approach to improving covariate balance that allows prioritisation. Health Services and Outcomes Research Methodology, 11, 95–114.

Richardson, T. (2013). Single world intervention graphs (SWIGs): A unification of the counterfactual and graphical approaches to causality (Working Paper No. 128). Center for Statistics and the Social Sciences, University of Washington.

Robins, J. (1986). A new approach to causal inference in mortality studies with sustained exposure periods – application to control of the healthy worker survivor effect. Computers & Mathematics With Applications, 14, 923–945.

Robins, J. & Hernán, M. (2009). Estimation of the causal effects of time-varying exposures. In G. Fitzmaurice, M. Davidian, G. Verbeke & G. Molenberghs (Eds.), Advances in longitudinal data analysis (pp. 553–599). Boca Raton, FL: Chapman & Hall.

Robins, J., Rotnitzky, A. G. & Zhao, L. (1994). Estimation of regression coefficients when some regressors are not always observed. Journal of the American Statistical Association, 89, 846–866.

Rosenbaum, P. & Rubin, D. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika, 70, 41–55.

Rubin, D. B. (1974). Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology, 66, 688–701.

Rubin, D. B. (1980). Randomization analysis of experimental data: The Fisher randomization test. Journal of the American Statistical Association, 75(371), 575–582.

Rubin, D. B. (1986). Statistics and causal inference: Comment: Which ifs have causal answers. Journal of the American Statistical Association, 81, 961–962.

Rubin, D. B. (2006). Matched sampling for causal effects. Cambridge University Press, Cambridge, UK.

Scharfstein, D., Rotnitzky, A. G. & Robins, J. (1999). Adjusting for nonignorable drop-out using semiparametric nonresponse models. Journal of the American Statistical Association, 94, 1096–1120 (Rejoinder, 1135–1146).

Schuler, M. & Rose, S. (2017). Targeted maximum likelihood estimation for causal inference in observational studies. American Journal of Epidemiology, 185, 65–73.

Seeger, J., Bykov, K., Bartels, D., Huybrechts, K., Zint, K. & Schneeweiss, S. (2015). Safety and effectiveness of dabigatran and warfarin in routine care of patients with atrial fibrillation. Thrombosis and Haemostasis, 114, 1277–1289.

Sekhon, J. & Grieve, R. (2012). A matching method for improving covariate balance in cost-effectiveness analyses. Health Economics, 21, 695–714.

Setoguchi, S., Schneeweiss, S., Brookhart, M., Glynn, R. & Cook, E. (2008). Evaluating uses of data mining techniques in propensity score estimation: A simulation study. Pharmacoepidemiology and Drug Safety, 17, 546–555.

Shi, C., Blei, D. & Veitch, V. (2019). Adapting neural networks for the estimation of treatment effects.

Shickel, B., Tighe, P., Bihorac, A. & Rashidi, P. (2018). Deep EHR: A survey of recent advances in deep learning techniques for electronic health record (EHR) analysis. Journal of Biomedical and Health Informatics, 22, 1589–1604.

Staiger, D. & Stock, J. H. (1997). Instrumental variables regression with weak instruments. Econometrica, 65, 557–586.

Terza, J., Basu, A. & Rathouz, P. (2008). Two-stage residual inclusion estimation: Addressing endogeneity in health econometric modeling. Journal of Health Economics, 27, 531–543.

Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological), 58, 267–288.

Ting, D., Cheung, C., Lim, G., Tan, G., Nguyen, D. Q., Gan, A., . . . Wong, T.-Y. (2017). Development and validation of a deep learning system for diabetic retinopathy and related eye diseases using retinal images from multiethnic populations with diabetes. JAMA, 318, 2211–2223.

Toth, B. & van der Laan, M. J. (2016). TMLE for marginal structural models based on an instrument (Working Paper No. 350). U.C. Berkeley Division of Biostatistics Working Paper Series.

van der Laan, M. & Rose, S. (2011). Targeted learning: Causal inference for observational and experimental data. Springer.

van der Laan, M. & Rose, S. (2018). Targeted learning in data science: Causal inference for complex longitudinal studies. Springer.

van der Laan, M. & Rubin, D. (2006). Targeted maximum likelihood learning. International Journal of Biostatistics, 2, 1043–1043.

Vytlacil, E. (2002). Independence, monotonicity, and latent index models: An equivalence result. Econometrica, 70, 331–341.

Wager, S. & Athey, S. (2018). Estimation and inference of heterogeneous treatment effects using random forests. Journal of the American Statistical Association, 113(523), 1228–1242.

Westreich, D. & Cole, S. (2010). Invited commentary: Positivity in practice. American Journal of Epidemiology, 171, 674–677; discussion 678–681.

Westreich, D., Lessler, J. & Jonsson Funk, M. (2010). Propensity score estimation: Neural networks, support vector machines, decision trees (CART), and meta-classifiers as alternatives to logistic regression. Journal of Clinical Epidemiology, 63, 826–833.

Wooldridge, J. (2002). Econometric analysis of cross-section and panel data. MIT Press.
