The paper shows how to teach models deep reasoning for open-ended writing by working backward from high-quality outputs.
The trained model, DeepWriter-8B, outscores GPT-4o and Claude 3.5 on LongBench-Write.
Creative tasks have no single right answer, so reinforcement learning lacks a clean reward, and instruction distillation is pricey and limited by the teacher.
The method, REverse-Engineered Reasoning (REER), works backward to recover a plausible step-by-step plan that could have produced the known good answer.
It scores each candidate plan by the model's surprise on the reference answer when conditioned on that plan; lower surprise means a better plan.
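A minimal sketch of that scoring idea, assuming "surprise" is measured as perplexity (the exponential of the average negative log-probability per token of the reference answer). The function name and the log-probabilities below are made up for illustration; in practice they would come from a language model evaluated on the reference answer with the candidate plan in its context.

```python
import math

def perplexity(token_logprobs):
    """Perplexity from per-token natural-log probabilities of the reference answer."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    avg_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(avg_nll)

# Hypothetical log-probs of the SAME reference answer under two candidate plans.
logprobs_plan_a = [math.log(0.5), math.log(0.25), math.log(0.5)]
logprobs_plan_b = [math.log(0.25), math.log(0.125), math.log(0.25)]

ppl_a = perplexity(logprobs_plan_a)
ppl_b = perplexity(logprobs_plan_b)

# Lower perplexity means the plan makes the reference answer less surprising.
best = "plan_a" if ppl_a < ppl_b else "plan_b"
print(best, round(ppl_a, 3), round(ppl_b, 3))
```

Here plan A assigns higher probability to every reference token, so it gets the lower perplexity and is preferred.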
A local search refines one segment of the plan at a time: it tries rewrites, scores each on the reference, and keeps the one with the lowest surprise.
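The search loop can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's implementation: `refine_plan`, `toy_propose`, and `toy_score` are hypothetical names, the rewrites are hard-coded rather than generated by a model, and the toy score (number of reference words the plan fails to mention) is only a stand-in for the model's surprise on the reference answer.

```python
def refine_plan(segments, propose, score, rounds=2):
    """Greedy segment-wise search: keep a rewrite only if it lowers the score."""
    best = list(segments)
    best_score = score(best)
    for _ in range(rounds):
        for i in range(len(best)):
            for rewrite in propose(best[i]):
                candidate = best[:i] + [rewrite] + best[i + 1:]
                s = score(candidate)
                if s < best_score:  # lower is better
                    best, best_score = candidate, s
    return best, best_score

# Toy setup (all values hypothetical).
reference = "a storm traps two rivals in a lighthouse"

def toy_score(plan):
    """Stand-in for surprise: count reference words the plan never mentions."""
    plan_words = set(" ".join(plan).split())
    return sum(1 for w in reference.split() if w not in plan_words)

rewrites = {
    "think about setting": ["set the scene in a lighthouse during a storm"],
    "think about characters": ["introduce two rivals"],
}

def toy_propose(segment):
    """Stand-in for model-generated rewrites of one segment."""
    return rewrites.get(segment, [])

initial = ["think about setting", "think about characters"]
final_plan, final_score = refine_plan(initial, toy_propose, toy_score)
print(final_plan, final_score)
```

Swapping `toy_score` for a perplexity computed by a real model, and `toy_propose` for model-sampled rewrites, would leave the loop itself unchanged.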
This yields DeepWriting-20K, a dataset of 20,000 traces across 25 categories, on which a Qwen3-8B base is fine-tuned to plan before writing.
The resulting DeepWriter-8B beats strong open-source baselines, stays coherent on very long pieces, and lands near top proprietary systems on creative tasks.
The net result is a third path that avoids both reinforcement learning and distillation, yet teaches smaller models to plan deeply and write with control.
----
Paper – arxiv.org/abs/2509.06160
Paper Title: "Reverse-Engineered Reasoning for Open-Ended Generation"