Poisson Hierarchical Indian Buffet Processes-With Indications for Microbiome Species Sampling Models
We introduce the Poisson Hierarchical Indian Buffet Process (PHIBP), a new class of species sampling models designed to address the challenges of complex, sparse count data by facilitating information sharing across and within groups. Our theoretical developments enable a tractable Bayesian nonparametric framework with machine learning elements, accommodating a potentially infinite number of species (taxa) whose parameters are learned from data. Focusing on microbiome analysis, we address key gaps by providing a flexible multivariate count model that accounts for overdispersion and robustly handles diverse data types (OTUs, ASVs). We introduce novel parameters reflecting species abundance and diversity. The model borrows strength across groups while explicitly distinguishing between technical and biological zeros to interpret sparse co-occurrence patterns. This results in a framework with tractable posterior inference, exact generative sampling, and a principled solution to the unseen species problem. We describe extensions where domain experts can incorporate knowledge through covariates and structured priors, with potential for strain-level analysis. While motivated by ecology, our work provides a broadly applicable methodology for hierarchical count modeling in genetics, commerce, and text analysis, and has significant implications for the broader theory of species sampling models arising in probability and statistics.
Abhinav Pandey, Juho Lee, Lancelot F. James
Microbiology; Biological Science Theory; Biological Science Methods
Abhinav Pandey, Juho Lee, Lancelot F. James. Poisson Hierarchical Indian Buffet Processes-With Indications for Microbiome Species Sampling Models [EB/OL]. (2025-08-25) [2025-09-05]. https://arxiv.org/abs/2502.01919.