National Preprint Platform

Measuring Information Distortion in Hierarchical Ultra long Novel Generation: The Optimal Expansion Ratio


Source: arXiv
English Abstract

Writing novels with Large Language Models (LLMs) raises a critical question: how much human-authored outline is necessary to generate high-quality million-word novels? While frameworks such as DOME, Plan&Write, and Long Writer have improved stylistic coherence and logical consistency, they primarily target shorter novels (10k--100k words), leaving ultra-long generation largely unexplored. Drawing on insights from recent text compression methods like LLMZip and LLM2Vec, we conduct an information-theoretic analysis that quantifies the distortion occurring when LLMs compress and reconstruct ultra-long novels under varying compression-expansion ratios. We introduce a hierarchical two-stage generation pipeline (outline -> detailed outline -> manuscript) and find an optimal outline length that balances information preservation with human effort. Through extensive experimentation with Chinese novels, we establish that a two-stage hierarchical outline approach significantly reduces semantic distortion compared to single-stage methods. Our findings provide empirically grounded guidance for authors and researchers collaborating with LLMs to create million-word novels.
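The abstract's central quantity is the compression-expansion ratio between each stage of the outline -> detailed outline -> manuscript pipeline. As a minimal sketch (not the authors' code), the per-stage ratios can be measured by character counts, which is a reasonable length proxy for Chinese text; the strings below are stand-ins, and the overall ratio is simply the product of the per-stage ratios:

```python
# Hypothetical sketch of measuring per-stage expansion ratios in a
# two-stage pipeline: outline -> detailed outline -> manuscript.
# Character counts stand in for token counts (suitable for Chinese text).

def expansion_ratio(shorter: str, longer: str) -> float:
    """Length of the expanded text divided by the length of its source."""
    return len(longer) / max(len(shorter), 1)

outline = "主角出身贫寒，习武复仇。"      # stand-in stage-1 outline
detailed = outline * 8                    # stand-in stage-2 detailed outline
manuscript = detailed * 50                # stand-in final manuscript

r1 = expansion_ratio(outline, detailed)      # outline -> detailed outline
r2 = expansion_ratio(detailed, manuscript)   # detailed outline -> manuscript
overall = r1 * r2                            # ratios compose multiplicatively

print(r1, r2, overall)  # 8.0 50.0 400.0
```

Finding the optimal outline length then amounts to sweeping r1 (how much detail the human-authored outline carries) while holding the target manuscript length fixed, and picking the point where measured semantic distortion stops improving.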

Hanwen Shen, Ting Ying

Subjects: Chinese language computing; computer technology

Hanwen Shen, Ting Ying. Measuring Information Distortion in Hierarchical Ultra long Novel Generation: The Optimal Expansion Ratio [EB/OL]. (2025-05-18) [2025-06-07]. https://arxiv.org/abs/2505.12572.
