National Preprint Platform

Zobrist Hash-based Duplicate Detection in Symbolic Regression

Source: arXiv
Abstract

Symbolic regression encompasses a family of search algorithms that aim to discover the best-fitting function for a set of data without requiring an a priori specification of the model structure. The most successful and commonly used technique for symbolic regression is Genetic Programming (GP), an evolutionary search method that evolves a population of mathematical expressions through the mechanism of natural selection. In this work, we analyze the efficiency of the evolutionary search in GP and show that many points in the search space are revisited and re-evaluated multiple times by the algorithm, wasting computational effort. We address this issue by introducing a caching mechanism based on the Zobrist hash, a type of hashing frequently used in abstract board games for the efficient construction and subsequent update of transposition tables. We implement our caching approach in the open-source framework Operon and demonstrate its performance on a selection of real-world regression problems, where we observe speedups of up to 34% without any detrimental effects on search quality. The hashing approach represents a straightforward way to improve runtime performance while also offering some interesting possibilities for adjusting the search strategy based on cached information.
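
The caching idea can be illustrated with a minimal Python sketch. The abstract does not say how Operon assigns Zobrist keys to expression nodes, so everything below is an assumption made for illustration and is not Operon's implementation: individuals are hypothetical postfix token lists, each (symbol, position) pair receives a random 64-bit key (the analogue of a (piece, square) key on a game board), an individual's hash is the XOR of its keys, and a dictionary from hash to fitness acts as the cache. The names `ZobristTable`, `FitnessCache`, `eval_postfix` and `mse`, and the toy data, are invented here.

```python
import operator
import random
from typing import Callable, Dict, List, Tuple

# Hypothetical encoding: an expression is a postfix token list, e.g.
# ["x", "2", "*", "1", "+"] for 2*x + 1. Constants are plain tokens here.
Tokens = List[str]


class ZobristTable:
    """Assigns a random 64-bit key to each (symbol, position) pair on first use."""

    def __init__(self, seed: int = 0) -> None:
        self._rng = random.Random(seed)
        self._keys: Dict[Tuple[str, int], int] = {}

    def key(self, symbol: str, position: int) -> int:
        if (symbol, position) not in self._keys:
            self._keys[(symbol, position)] = self._rng.getrandbits(64)
        return self._keys[(symbol, position)]

    def hash(self, tokens: Tokens) -> int:
        # Full hash: XOR of the keys of every token at its position.
        h = 0
        for i, symbol in enumerate(tokens):
            h ^= self.key(symbol, i)
        return h

    def update(self, h: int, position: int, old: str, new: str) -> int:
        # Point mutation at `position`: XOR out the old key, XOR in the new one.
        return h ^ self.key(old, position) ^ self.key(new, position)


OPS: Dict[str, Callable[[float, float], float]] = {
    "+": operator.add, "-": operator.sub, "*": operator.mul,
}


def eval_postfix(tokens: Tokens, x: float) -> float:
    stack: List[float] = []
    for t in tokens:
        if t in OPS:
            b, a = stack.pop(), stack.pop()  # right operand is on top
            stack.append(OPS[t](a, b))
        else:
            stack.append(x if t == "x" else float(t))
    return stack[-1]


def mse(tokens: Tokens) -> float:
    # Toy samples of y = 2x + 1, invented for illustration.
    data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
    return sum((eval_postfix(tokens, x) - y) ** 2 for x, y in data) / len(data)


class FitnessCache:
    """Maps Zobrist hashes to fitness values so duplicates are not re-evaluated."""

    def __init__(self, table: ZobristTable, evaluate: Callable[[Tokens], float]) -> None:
        self.table, self.evaluate = table, evaluate
        self.store: Dict[int, float] = {}
        self.hits = self.misses = 0

    def fitness(self, tokens: Tokens) -> float:
        h = self.table.hash(tokens)
        if h in self.store:
            self.hits += 1
        else:
            self.misses += 1
            self.store[h] = self.evaluate(tokens)
        return self.store[h]


table = ZobristTable(seed=1)
cache = FitnessCache(table, mse)
population = [
    ["x", "2", "*", "1", "+"],  # 2*x + 1
    ["x", "x", "+", "1", "+"],  # x + x + 1: same function, different tree
    ["x", "2", "*", "1", "+"],  # exact duplicate of the first individual
]
for individual in population:
    cache.fitness(individual)
print(cache.hits, cache.misses)  # 1 2

# Incremental update: mutating "*" to "+" at index 2 yields x + 2 + 1.
h = table.hash(population[0])
assert table.update(h, 2, "*", "+") == table.hash(["x", "2", "+", "1", "+"])
```

The third individual is served from the cache instead of being re-evaluated. The XOR construction is what makes the scheme Zobrist-style: as with a move in a board game, a local change updates the hash by XOR-ing out one key and XOR-ing in another, with no full rehash. In this positional sketch only syntactically identical expressions share a hash, so `x + x + 1` is treated as distinct from `2*x + 1`, and two distinct expressions collide with probability of about 2^-64, which the sketch ignores.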

Bogdan Burlacu

Subject: Computing Technology, Computer Technology

Bogdan Burlacu. Zobrist Hash-based Duplicate Detection in Symbolic Regression[EB/OL]. (2025-08-19)[2025-09-03]. https://arxiv.org/abs/2508.13859.
