Adrian E. Raftery and Yingye Zheng
"Long-Run Performance of Bayesian Model Averaging"
Presented to the Joint Statistical Meetings, San Francisco, CA, August 2003

Hjort and Claeskens (HC) argue that statistical inference conditional on a single selected model underestimates uncertainty, and that model averaging is the way to remedy this; we strongly agree. They point out that Bayesian model averaging (BMA) has been the dominant approach to this, but argue that its performance has been inadequately studied, and propose an alternative, Frequentist Model Averaging (FMA). We point out, however, that there is a substantial literature on the performance of BMA, consisting of three main threads: general theoretical results, simulation studies, and evaluation of out-of-sample performance. The theoretical results are scattered, and we summarize them. The results have been quite consistent: BMA has tended to outperform competing methods for model selection and taking account of model uncertainty. The theoretical results depend on the assumption that the "practical distribution" over which the performance of methods is assessed is the same as the prior distribution used, and we investigate sensitivity of results to this assumption in a simple normal example; they turn out not to be unduly sensitive. We point out that HC's risk results, that AIC-model averaging and similar methods such as FIC-based model averaging perform well, depend crucially on their local misspecification assumption (2.2), namely that all nuisance parameters are small and decline with sample size, at rate $O(1/\sqrt{n})$. The key question is thus the realism of this assumption.
We question this assumption on four grounds: its lack of face validity in some situations; the growing separation between data collection and research; the increasing tendency for research on different questions to be based on a few large, high-quality public datasets; and practice in the statistical literature, where sample size and parameter values rarely covary in the design of simulation studies. Finally, we reanalyze HC's data example, on risk factors for low birthweight.
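
The local misspecification assumption can be made concrete with a small sketch. It assumes a deliberately simplified setting that is not taken from HC: n independent unit-variance normal observations whose mean gamma plays the role of the nuisance parameter, so that the expected z-statistic for gamma is sqrt(n) * gamma. The function names and the values delta = 2.0 and gamma = 0.2 are illustrative assumptions.

```python
import math


def expected_z(gamma: float, n: int) -> float:
    """Expected z-statistic for the mean of n unit-variance normal draws.

    Simplified illustrative setting (an assumption, not HC's framework):
    the sample mean has standard error 1/sqrt(n), so E[z] = sqrt(n) * gamma.
    """
    return math.sqrt(n) * gamma


def local_gamma(delta: float, n: int) -> float:
    """Nuisance parameter under local misspecification: gamma_n = delta / sqrt(n)."""
    return delta / math.sqrt(n)


if __name__ == "__main__":
    delta = 2.0        # hypothetical local-misspecification constant
    fixed_gamma = 0.2  # hypothetical nuisance parameter that does not shrink with n

    for n in (100, 10_000, 1_000_000):
        z_local = expected_z(local_gamma(delta, n), n)
        z_fixed = expected_z(fixed_gamma, n)
        print(f"n={n:>9,}  local E[z]={z_local:.3f}  fixed E[z]={z_fixed:.3f}")
```

Under the local scheme the expected evidence about the nuisance parameter stays constant as n grows, whereas with a fixed parameter value it grows like sqrt(n). This is why the realism of the assumption matters when comparing methods.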