MetaGen Blended RAG: Unlocking Zero-Shot Precision for Specialized Domain Question-Answering
Retrieval-Augmented Generation (RAG) struggles with domain-specific enterprise datasets, which are often isolated behind firewalls and rich in complex, specialized terminology that LLMs never saw during pre-training. Semantic variability across domains such as medicine, networking, or law hampers RAG's context precision, while fine-tuning is costly, slow, and generalizes poorly as new data emerges. Achieving zero-shot precision with retrievers, without any fine-tuning, remains a key challenge. We introduce 'MetaGen Blended RAG', a novel enterprise search approach that enhances semantic retrievers through a metadata generation pipeline and hybrid query indexes using dense and sparse vectors. By leveraging key concepts, topics, and acronyms, our method creates metadata-enriched semantic indexes and boosted hybrid queries, delivering robust, scalable performance without fine-tuning. On the biomedical PubMedQA dataset, MetaGen Blended RAG achieves 82% retrieval accuracy and 77% RAG accuracy, surpassing all prior zero-shot RAG benchmarks and even rivaling fine-tuned models on that dataset, while also excelling on datasets like SQuAD and NQ. This redefines enterprise search through a new way of building semantic retrievers with unmatched generalization across specialized domains.
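The abstract's idea of a metadata-enriched index queried with a boosted blend of sparse and dense scores can be illustrated with a minimal sketch. This is not the authors' pipeline: the metadata here is handwritten rather than generated, `embed` is a toy hashed bag-of-words stand-in for a real dense encoder, and the names `metadata_boost` and `alpha` and their default values are assumptions chosen only for illustration.

```python
import math
import re
import zlib
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def sparse_score(query, doc, metadata_boost=2.0):
    """Term-overlap score; hits in the metadata field are weighted extra."""
    q_terms = set(tokenize(query))
    body = Counter(tokenize(doc["text"]))
    meta = Counter(tokenize(" ".join(doc["metadata"])))
    return sum(body[t] + metadata_boost * meta[t] for t in q_terms)


def embed(text, dim=64):
    """Toy dense vector (assumption): hashed bag-of-words, L2-normalized."""
    vec = [0.0] * dim
    for tok in tokenize(text):
        vec[zlib.crc32(tok.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def dense_score(query, doc):
    """Cosine similarity between the query and the metadata-enriched document."""
    q = embed(query)
    d = embed(doc["text"] + " " + " ".join(doc["metadata"]))
    return sum(a * b for a, b in zip(q, d))


def blended_search(query, docs, alpha=0.5, metadata_boost=2.0):
    """Rank documents by alpha * dense + (1 - alpha) * normalized sparse."""
    sparse = [sparse_score(query, d, metadata_boost) for d in docs]
    max_sparse = max(sparse, default=0.0) or 1.0
    scored = [
        (alpha * dense_score(query, d) + (1 - alpha) * s / max_sparse, d["id"])
        for d, s in zip(docs, sparse)
    ]
    return [doc_id for _, doc_id in sorted(scored, reverse=True)]


# Hypothetical documents; the metadata lists stand in for generated
# key concepts, topics, and acronyms.
docs = [
    {
        "id": "d1",
        "text": "Patients received angiotensin converting enzyme inhibitors after surgery.",
        "metadata": ["ACE inhibitors", "hypertension", "cardiology"],
    },
    {
        "id": "d2",
        "text": "The router drops packets when the access control list denies traffic.",
        "metadata": ["ACL", "networking", "firewall"],
    },
]
query = "Do ACE inhibitors help hypertension?"

if __name__ == "__main__":
    print(blended_search(query, docs))
```

In this toy data the body of `d1` never contains the acronym "ACE" or the word "hypertension"; the match comes from the metadata field, which is the effect that metadata enrichment is meant to produce.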
Kunal Sawarkar, Shivam R. Solanki, Abhilasha Mangal
Subjects: Biological science theory, biological science methods; computing technology, computer technology
Kunal Sawarkar, Shivam R. Solanki, Abhilasha Mangal. MetaGen Blended RAG: Unlocking Zero-Shot Precision for Specialized Domain Question-Answering [EB/OL]. (2025-05-23) [2025-06-16]. https://arxiv.org/abs/2505.18247.