Abundance-Aware Set Transformer for Microbiome Sample Embedding
Representing microbiome samples in a form that can be fed to LLMs is essential for downstream tasks such as phenotype prediction and environmental classification. Prior studies have explored embedding-based representations of individual microbiome samples, but most rely on simple averaging over sequence embeddings and thereby overlook the biological importance of taxa abundance. In this work, we propose an abundance-aware variant of the Set Transformer that constructs fixed-size sample-level embeddings by weighting sequence embeddings according to their relative abundance. Without modifying the model architecture, we replicate embedding vectors in proportion to their abundance and apply self-attention-based aggregation. Our method outperforms average pooling and unweighted Set Transformers on real-world microbiome classification tasks, achieving perfect performance in some cases. These results demonstrate the utility of abundance-aware aggregation for robust and biologically informed microbiome representation. To the best of our knowledge, this is one of the first approaches to integrate sequence-level abundance into Transformer-based sample embeddings.
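The replication step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `total_copies` budget, the rule of keeping at least one copy per sequence, and the single-seed attention pooling (a simplified stand-in for the Set Transformer's learned aggregation) are all assumptions made for illustration.

```python
import numpy as np

def replicate_by_abundance(embeddings, abundances, total_copies=100):
    """Repeat each sequence embedding in proportion to its relative abundance.

    `total_copies` and the at-least-one-copy rule are hypothetical choices,
    not taken from the paper.
    """
    embeddings = np.asarray(embeddings, dtype=float)
    abundances = np.asarray(abundances, dtype=float)
    rel = abundances / abundances.sum()          # relative abundance per sequence
    copies = np.maximum(1, np.rint(rel * total_copies).astype(int))
    return np.repeat(embeddings, copies, axis=0), copies

def attention_pool(elements, seed):
    """Softmax attention of one seed vector over the set elements.

    A simplified stand-in for attention-based set pooling; a real model
    would learn the seed and use projections and multiple heads.
    """
    scores = elements @ seed / np.sqrt(elements.shape[1])
    weights = np.exp(scores - scores.max())      # numerically stable softmax
    weights /= weights.sum()
    return weights @ elements                    # fixed-size sample embedding

# Toy sample: three sequence embeddings with read counts 80, 15, 5.
emb = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
counts = np.array([80, 15, 5])

expanded, copies = replicate_by_abundance(emb, counts, total_copies=20)
seed = np.zeros(2)                               # zero seed gives uniform attention
sample_vec = attention_pool(expanded, seed)      # abundance-aware pooling
mean_vec = emb.mean(axis=0)                      # plain average pooling baseline
```

With the zero seed used here, attention is uniform over the replicated set, so `sample_vec` reduces to the abundance-weighted mean `[0.85, 0.2]`, whereas plain averaging gives about `[0.667, 0.667]`. A trained seed would additionally reweight elements by content, which is where an attention-based aggregator would differ from a weighted mean.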
Hyunwoo Yoo, Gail Rosen
Microbiology; Biological Science Research Methods; Biological Science Research Techniques
Hyunwoo Yoo, Gail Rosen. Abundance-Aware Set Transformer for Microbiome Sample Embedding [EB/OL]. (2025-08-14) [2025-08-28]. https://arxiv.org/abs/2508.11075.