Restoring the Forecasting Power of Google Trends with Statistical Preprocessing

Source: arXiv
Abstract

Google Trends reports how frequently specific queries are searched on Google over time. It is widely used in research and industry to gain early insights into public interest. However, its data generation mechanism introduces missing values, sampling variability, noise, and trends. These issues arise from privacy thresholds mapping low search volumes to zeros, daily sampling variations causing discrepancies across historical downloads, and algorithm updates altering volume magnitudes over time. Data quality has recently deteriorated, with more zeros and noise, even for previously stable queries. We propose a comprehensive statistical methodology to preprocess Google Trends search information using hierarchical clustering, smoothing splines, and detrending. We validate our approach by forecasting U.S. influenza hospitalizations with a univariate ARIMAX model. Compared to omitting exogenous variables, our results show that raw Google Trends data degrades modeling performance, while preprocessed signals enhance forecast accuracy by 58% nationally and 24% at the state level.
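The preprocessing ideas named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes zeros mark censored (missing) values, substitutes a centered moving average for the smoothing spline, uses a simple linear detrend, and omits the hierarchical clustering and ARIMAX steps entirely. All function names and the sample series are hypothetical.

```python
def fill_zeros(series):
    """Treat zeros as missing and linearly interpolate between nonzero neighbours."""
    known = [i for i, v in enumerate(series) if v > 0]
    if not known:
        raise ValueError("series has no nonzero observations")
    filled = []
    for i, v in enumerate(series):
        if v > 0:
            filled.append(float(v))
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:            # leading gap: carry the first known value back
            filled.append(float(series[right]))
        elif right is None:         # trailing gap: carry the last known value forward
            filled.append(float(series[left]))
        else:                       # interior gap: linear interpolation
            w = (i - left) / (right - left)
            filled.append(series[left] + w * (series[right] - series[left]))
    return filled


def moving_average(series, window=3):
    """Centered moving average (a stand-in for a smoothing spline); truncated at the edges."""
    half = window // 2
    out = []
    for i in range(len(series)):
        chunk = series[max(0, i - half): i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def detrend_linear(series):
    """Subtract the ordinary least-squares line fitted against the time index."""
    n = len(series)
    if n < 2:
        return [0.0] * n
    x_mean = (n - 1) / 2
    y_mean = sum(series) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(series))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return [y - (intercept + slope * x) for x, y in enumerate(series)]


def preprocess(series, window=3):
    """Chain the three steps: fill zeros, smooth, detrend."""
    return detrend_linear(moving_average(fill_zeros(series), window))


# Hypothetical weekly search-volume values with censored zeros.
weekly = [12, 0, 15, 18, 0, 0, 27, 30]
signal = preprocess(weekly)
```

The resulting `signal` could then serve as an exogenous regressor in a forecasting model; the choice of window and detrending form here is illustrative only.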

Candice Djorno, Mauricio Santillana, Shihao Yang

Subject: Medical research methods

Candice Djorno, Mauricio Santillana, Shihao Yang. Restoring the Forecasting Power of Google Trends with Statistical Preprocessing [EB/OL]. (2025-04-09) [2025-07-02]. https://arxiv.org/abs/2504.07032.
