
Abundance-Aware Set Transformer for Microbiome Sample Embedding

Source: arXiv
Abstract

Representing microbiome samples as input to LLMs is essential for downstream tasks such as phenotype prediction and environmental classification. Prior studies have explored embedding-based representations of individual microbiome samples, but most rely on simple averaging over sequence embeddings and often overlook the biological importance of taxon abundance. In this work, we propose an abundance-aware variant of the Set Transformer that constructs fixed-size sample-level embeddings by weighting sequence embeddings according to their relative abundance. Without modifying the model architecture, we replicate embedding vectors in proportion to their abundance and apply self-attention-based aggregation. Our method outperforms average pooling and unweighted Set Transformers on real-world microbiome classification tasks, achieving perfect performance in some cases. These results demonstrate the utility of abundance-aware aggregation for robust and biologically informed microbiome representation. To the best of our knowledge, this is one of the first approaches to integrate sequence-level abundance into Transformer-based sample embeddings.
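
The replication idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `total_copies` parameter, the rule of keeping at least one copy per sequence, and the single-seed attention pooling (a stand-in for a full Set Transformer) are all assumptions made for illustration.

```python
import numpy as np

def replicate_by_abundance(embeddings, abundances, total_copies=100):
    """Repeat each sequence embedding in proportion to its relative abundance.

    `total_copies` and the at-least-one-copy rule are illustrative assumptions.
    """
    embeddings = np.asarray(embeddings, dtype=float)
    rel = np.asarray(abundances, dtype=float)
    rel = rel / rel.sum()  # convert raw counts to relative abundance
    copies = np.maximum(1, np.round(rel * total_copies).astype(int))
    return np.repeat(embeddings, copies, axis=0), copies

def attention_pool(set_embeddings, seed):
    """Single-head attention pooling with one seed (query) vector.

    A simplified stand-in for Set Transformer aggregation, not the full model.
    """
    d = set_embeddings.shape[1]
    scores = set_embeddings @ seed / np.sqrt(d)
    scores = scores - scores.max()  # numerical stability for the softmax
    weights = np.exp(scores)
    weights = weights / weights.sum()
    return weights @ set_embeddings  # fixed-size sample-level vector

# Toy sample: 3 sequences with 4-dimensional embeddings and read counts.
rng = np.random.default_rng(0)
emb = rng.normal(size=(3, 4))
counts = [70, 20, 10]

expanded, copies = replicate_by_abundance(emb, counts, total_copies=10)
seed = rng.normal(size=4)  # would be a learned parameter in a trained model
sample_vec = attention_pool(expanded, seed)
```

Because the aggregation itself is unchanged, abundance enters only through how often each embedding appears in the input set. With a zero seed vector the attention weights are uniform, and the pooled vector reduces to an abundance-weighted average of the original embeddings.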

Hyunwoo Yoo, Gail Rosen

Microbiology; Biological science research methods and techniques

Hyunwoo Yoo, Gail Rosen. Abundance-Aware Set Transformer for Microbiome Sample Embedding [EB/OL]. (2025-08-14) [2025-08-28]. https://arxiv.org/abs/2508.11075
