The conditional saddlepoint approximation for fast and accurate large-scale hypothesis testing
Saddlepoint approximations (SPAs) for resampling-based procedures offer statistically accurate and computationally efficient inference, which is particularly critical in the analysis of large-scale, high-multiplicity data. Despite being introduced 70 years ago, SPAs for resampling-based procedures lack rigorous justification and have been underutilized in modern applications. We establish a theoretical foundation for the SPA in this context by developing a general result on its approximation accuracy for conditional tail probabilities of averages of conditionally independent summands. This result both justifies existing SPAs for classical procedures like the sign-flipping test and enables new SPAs for modern resampling methods, including those using black-box machine learning. Capitalizing on this result, we introduce the saddlepoint approximation-based conditional randomization test (spaCRT), a resampling-free conditional independence test that is both statistically accurate and computationally efficient. The method is especially well-suited for sparse, large-scale datasets such as single-cell CRISPR screens and genome-wide association studies involving rare diseases. We prove the validity of the spaCRT when paired with modern regression tools such as lasso and kernel ridge regression. Extensive analyses of simulated and real data show that the spaCRT controls Type-I error, achieves high power, and outperforms existing asymptotic and resampling-based alternatives.
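To make the sign-flipping case mentioned above concrete, here is a minimal sketch of a saddlepoint approximation to the conditional tail probability P(sum_i eps_i x_i >= t | x), where the eps_i are independent random signs. It uses the classical Lugannani-Rice tail formula as an assumption; the paper's exact formulation, conditions, and the spaCRT itself are not reproduced here. The function names `sign_flip_spa_tail` and `sign_flip_exact_tail` and the toy data are hypothetical, chosen only for illustration.

```python
import math
from itertools import product


def sign_flip_spa_tail(x, t):
    """Approximate P(sum_i eps_i * x_i >= t | x) for i.i.d. random signs eps_i.

    Illustrative sketch based on the Lugannani-Rice formula (an assumption,
    not necessarily the formulation used in the paper).
    """
    total = sum(abs(v) for v in x)
    if not 0.0 < t < total:
        raise ValueError("this sketch requires 0 < t < sum(|x_i|)")

    def log_cosh(z):
        # Overflow-safe log(cosh(z)).
        a = abs(z)
        return a + math.log1p(math.exp(-2.0 * a)) - math.log(2.0)

    # Conditional cumulant generating function K(s) = sum_i log cosh(s * x_i)
    # and its first two derivatives.
    def K(s):
        return sum(log_cosh(s * v) for v in x)

    def K1(s):
        return sum(v * math.tanh(s * v) for v in x)

    def K2(s):
        return sum(v * v * (1.0 - math.tanh(s * v) ** 2) for v in x)

    # Solve the saddlepoint equation K'(s) = t by bisection; K' is increasing
    # with K'(0) = 0, so the root is positive when t > 0.
    lo, hi = 0.0, 1.0
    while K1(hi) < t:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if K1(mid) < t:
            lo = mid
        else:
            hi = mid
    s_hat = 0.5 * (lo + hi)

    # Lugannani-Rice tail approximation: 1 - Phi(w) + phi(w) * (1/u - 1/w).
    w = math.sqrt(2.0 * (s_hat * t - K(s_hat)))
    u = s_hat * math.sqrt(K2(s_hat))
    normal_tail = 0.5 * math.erfc(w / math.sqrt(2.0))
    normal_pdf = math.exp(-0.5 * w * w) / math.sqrt(2.0 * math.pi)
    return normal_tail + normal_pdf * (1.0 / u - 1.0 / w)


def sign_flip_exact_tail(x, t):
    """Exact tail probability by enumerating all 2^n sign patterns (small n only)."""
    hits = sum(
        1
        for signs in product((-1.0, 1.0), repeat=len(x))
        if sum(e * v for e, v in zip(signs, x)) >= t
    )
    return hits / 2 ** len(x)


# Toy comparison on made-up data: the approximation needs no resampling,
# while the exact answer costs 2^n evaluations.
x = [1.5 + math.sin(i + 1) for i in range(14)]
t = 1.5 * math.sqrt(sum(v * v for v in x))
print("saddlepoint:", sign_flip_spa_tail(x, t))
print("exact      :", sign_flip_exact_tail(x, t))
```

The saddlepoint value costs one one-dimensional root-find per test, whereas exact enumeration or Monte Carlo resampling scales with the number of sign patterns or resamples, which is the computational advantage the abstract refers to.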
Ziang Niu, Jyotishka Ray Choudhury, Eugene Katsevich
Computing technology, computer technology
Ziang Niu, Jyotishka Ray Choudhury, Eugene Katsevich. The conditional saddlepoint approximation for fast and accurate large-scale hypothesis testing [EB/OL]. (2025-06-24) [2025-07-16]. https://arxiv.org/abs/2407.08911.