National Preprint Platform

Automatic Formalization of Plane Geometry Diagrams Based on Vision-Language Models

Abstract

Large models such as vision-language models (VLMs) have demonstrated strong comprehension of world knowledge, which has inspired new work on automated mathematical problem solving. In geometry problem solving, the complex and diverse abstract relationships contained in geometry diagrams make it very challenging to apply large models. To improve the accuracy of geometry problem solving, we analyze existing solving paradigms and propose using VLMs to improve the accuracy of automatic diagram formalization. First, starting from the Geometry3K dataset and applying data augmentation based on algebraic commutativity, we construct a multimodal instruction-tuning dataset named GeometryDiagramFormalization86K (GDF86K). The dataset contains more than 86,000 (diagram, literal list) pairs, where each literal list is the formal-language description of a diagram, and is intended to support the training of diagram formalization models. By supervised fine-tuning on GDF86K, we obtain Geo-TinyLLaVA, a vision-language model specialized in geometry diagram formalization. Provided that the input diagrams carry complete point-name annotations, Geo-TinyLLaVA outperforms the conventional Inter-GPS diagram parser on the diagram formalization task, and it can be integrated as a plugin into the Inter-GPS geometry problem-solving system to improve that system's solving accuracy.
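The abstract does not spell out how the commutativity-based augmentation is carried out. The following is only a minimal sketch of one way such an augmentation could work, not the authors' implementation. It assumes literals are represented as nested tuples in an Inter-GPS-like style; the set `COMMUTATIVE`, the helpers `variants` and `render`, and the sample literal are all hypothetical and introduced purely for illustration.

```python
from itertools import permutations, product

# Assumption: predicates whose argument order does not change the meaning.
# This set is hypothetical; the paper's actual rule set is not given here.
COMMUTATIVE = {"Equals", "Add", "Mul"}


def variants(term):
    """Return all equivalent forms of a term obtained by reordering the
    arguments of commutative predicates.

    A term is either a str (a point name, variable, or constant) or a
    tuple of the form (predicate, arg1, arg2, ...).
    """
    if isinstance(term, str):
        return [term]
    head, *args = term
    out = []
    # Expand every argument first, then combine one choice per argument.
    for combo in product(*(variants(a) for a in args)):
        orders = permutations(combo) if head in COMMUTATIVE else [combo]
        for order in orders:
            candidate = (head, *order)
            if candidate not in out:  # skip duplicates from equal arguments
                out.append(candidate)
    return out


def render(term):
    """Render a nested tuple back into a literal string."""
    if isinstance(term, str):
        return term
    head, *args = term
    return f"{head}({', '.join(render(a) for a in args)})"


# Hypothetical literal: the length of segment AB equals x + 3.
literal = ("Equals", ("LengthOf", ("Line", "A", "B")), ("Add", "x", "3"))
augmented = [render(v) for v in variants(literal)]
for text in augmented:
    print(text)
```

Under these assumptions, a single literal yields four equivalent strings, and each rewritten literal list can be paired with the same diagram. This is one plausible way for a dataset of a few thousand diagrams to grow into tens of thousands of (diagram, literal list) pairs.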

Cui Xiaoteng, Liu Yi

School of Computer Science and Technology, Beijing Jiaotong University, Beijing 100044 (both authors)

Subject areas: Mathematics; Computing Technology, Computer Technology

Keywords: Vision-Language Model; Formalization; Plane Geometry; Data Augmentation; Supervised Fine-tuning

Cui Xiaoteng, Liu Yi. Automatic Formalization of Plane Geometry Diagrams Based on Vision-Language Models [EB/OL]. (2025-04-18) [2025-05-06]. http://www.paper.edu.cn/releasepaper/content/202504-167.
