
eSapiens's DEREK Module: Deep Extraction & Reasoning Engine for Knowledge with LLMs


Source: arXiv
Abstract

We present the DEREK (Deep Extraction & Reasoning Engine for Knowledge) Module, a secure and scalable Retrieval-Augmented Generation pipeline for enterprise document question answering. Designed and implemented by eSapiens, the system ingests heterogeneous content (PDF, Office, web), splits it into 1,000-token overlapping chunks, and indexes them in a hybrid HNSW+BM25 store. User queries are refined by GPT-4o, retrieved via combined vector+BM25 search, reranked with Cohere, and answered by an LLM using CO-STAR prompt engineering. A LangGraph verifier enforces citation overlap, regenerating answers until every claim is grounded. On four LegalBench subsets, 1,000-token chunks improve Recall@50 by approximately 1 pp and hybrid+rerank boosts Precision@10 by approximately 7 pp; the verifier raises TRACe Utilization above 0.50 and limits unsupported statements to less than 3%. All components run in containers and enforce end-to-end TLS 1.3 and AES-256 encryption. These results demonstrate that the DEREK module delivers accurate, traceable, and production-ready document QA with minimal operational overhead. The module is designed to meet enterprise demands for secure, auditable, and context-faithful retrieval, providing a reliable baseline for high-stakes domains such as legal and finance.
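
The chunking step and the retrieval metrics named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 100-token overlap, and the use of pre-split string tokens are assumptions made for illustration, since the abstract states only the 1,000-token chunk size and that chunks overlap.

```python
from typing import List, Sequence, Set


def chunk_tokens(tokens: Sequence[str], size: int = 1000, overlap: int = 100) -> List[List[str]]:
    """Split tokens into windows of `size` tokens; each window repeats the
    last `overlap` tokens of the previous one. The overlap value is an
    assumption, not a figure from the abstract."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(list(tokens[start:start + size]))
        if start + size >= len(tokens):
            break  # the final window already reaches the end of the input
    return chunks


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k retrieved items that are relevant."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of all relevant items that appear in the top-k results."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)
```

With these definitions, a 2,500-token input split at size 1,000 and overlap 100 yields three chunks starting at tokens 0, 900, and 1,800. Precision@10 and Recall@50 in the abstract correspond to `k=10` and `k=50`.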

Isaac Shi, Zeyuan Li, Fan Liu, Wenli Wang, Lewei He, Yang Yang, Tianyu Shi

Subject: Computing technology, computer technology

Isaac Shi, Zeyuan Li, Fan Liu, Wenli Wang, Lewei He, Yang Yang, Tianyu Shi. eSapiens's DEREK Module: Deep Extraction & Reasoning Engine for Knowledge with LLMs [EB/OL]. (2025-07-13) [2025-08-10]. https://arxiv.org/abs/2507.15863.
