Data Integration With Biased Summary Data via Generalized Entropy Balancing
Statistical methods for integrating individual-level data with external summary data have attracted attention because of their potential to reduce data collection costs. Summary data are often accessible through public sources and relatively easy to obtain, making them a practical resource for enhancing the precision of statistical estimation. Typically, these methods assume that internal and external data originate from the same underlying distribution. However, when this assumption is violated, incorporating external data introduces the risk of bias, primarily due to differences in background distributions between the current study and the external source. In practical applications, the primary interest often lies not in statistical quantities related specifically to the external data distribution itself, but in the individual-level internal data. In this paper, we propose a methodology based on generalized entropy balancing, designed to integrate external summary data even if derived from biased samples. Our method demonstrates double robustness, providing enhanced protection against model misspecification. Importantly, the applicability of our method can be assessed directly from the available data. We illustrate the versatility and effectiveness of the proposed estimator through an analysis of Nationwide Public-Access Defibrillation data in Japan.
Kosuke Morikawa, Sho Komukai, Satoshi Hattori
Computing technology, computer technology
Kosuke Morikawa, Sho Komukai, Satoshi Hattori. Data Integration With Biased Summary Data via Generalized Entropy Balancing [EB/OL]. (2025-06-13) [2025-06-27]. https://arxiv.org/abs/2506.11482.