Real-TabPFN: Improving Tabular Foundation Models via Continued Pre-training With Real-World Data
Foundation models for tabular data, like TabPFN, achieve strong performance on small datasets when pre-trained solely on synthetic data. We show that this performance can be significantly boosted by a targeted continued pre-training phase. Specifically, we demonstrate that leveraging a small, curated collection of large, real-world datasets for continued pre-training yields superior downstream predictive accuracy compared to using broader, potentially noisier corpora like CommonCrawl or GitTables. Our resulting model, Real-TabPFN, achieves substantial performance gains on 29 datasets from the OpenML AutoML Benchmark.
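The core idea, continued pre-training, amounts to resuming optimization of an already pre-trained model on a second data source. The sketch below is a minimal, hypothetical illustration of that recipe, not the authors' code: a small MLP stands in for TabPFN's transformer, and scikit-learn's built-in breast-cancer dataset stands in for the curated real-world corpus; the small learning rate reflects the goal of refining, rather than overwriting, the synthetic-data prior.

```python
# Hypothetical sketch of continued pre-training on real tabular data.
# Stand-ins: a tiny MLP for TabPFN's transformer, breast-cancer data
# for the curated real-world corpus described in the abstract.
import torch
import torch.nn as nn
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler

# A real dataset standing in for the curated real-world corpus.
X, y = load_breast_cancer(return_X_y=True)
X = torch.tensor(StandardScaler().fit_transform(X), dtype=torch.float32)
y = torch.tensor(y, dtype=torch.long)

# Hypothetical stand-in for a model already pre-trained on synthetic data.
model = nn.Sequential(nn.Linear(X.shape[1], 64), nn.ReLU(), nn.Linear(64, 2))

# Continued pre-training: resume optimization on real data with a small
# learning rate so the existing prior is refined rather than overwritten.
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()
for step in range(200):
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    opt.step()
```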
Anurag Garg, Muhammad Ali, Noah Hollmann, Lennart Purucker, Samuel Müller, Frank Hutter
Computing Technology, Computer Technology
Anurag Garg, Muhammad Ali, Noah Hollmann, Lennart Purucker, Samuel Müller, Frank Hutter. Real-TabPFN: Improving Tabular Foundation Models via Continued Pre-training With Real-World Data [EB/OL]. (2025-07-05) [2025-07-25]. https://arxiv.org/abs/2507.03971