Can LLMs Generate Good Stories? Insights and Challenges from a Narrative Planning Perspective
Story generation has been a prominent application of Large Language Models (LLMs). However, understanding of LLMs' ability to produce high-quality stories remains limited due to challenges in automatic evaluation methods and the high cost and subjectivity of manual evaluation. Computational narratology offers valuable insights into what constitutes a good story, and these insights have been applied in the symbolic narrative planning approach to story generation. This work aims to deepen the understanding of LLMs' story generation capabilities by using them to solve narrative planning problems. We present a benchmark for evaluating LLMs on narrative planning based on literature examples, focusing on causal soundness, character intentionality, and dramatic conflict. Our experiments show that GPT-4-tier LLMs can generate causally sound stories at small scales, but planning with character intentionality and dramatic conflict remains challenging, requiring LLMs trained with reinforcement learning for complex reasoning. The results offer insights into the scale of stories that LLMs can generate while maintaining quality along different dimensions. Our findings also highlight interesting problem-solving behaviors and shed light on challenges and considerations for applying LLM narrative planning in game environments.
Yi Wang, Max Kreminski
Subject: Computing Technology, Computer Technology
Yi Wang, Max Kreminski. Can LLMs Generate Good Stories? Insights and Challenges from a Narrative Planning Perspective [EB/OL]. (2025-06-11) [2025-06-18]. https://arxiv.org/abs/2506.10161.