Recent advances in large reasoning language models (LRLMs) rely on test-time
scaling, which extends long chain-of-thought (CoT) generation to solve complex
tasks. However, overthinking in long CoT not only slows problem solving but also
risks accuracy loss due to overly detailed or redundant reasoning steps. We
propose a simple yet effective method that allows
LLMs to self-truncate CoT sequences by early exit during generation. Instead of
relying on fixed heuristics, the proposed method monitors model behavior at
potential reasoning transition points (e.g., "Wait" tokens) and dynamically
terminates the next reasoning chain’s generation when the model exhibits high
confidence in a trial answer. Our method requires no additional training and
can be seamlessly integrated into existing o1-like reasoning LLMs. Experiments
on multiple reasoning benchmarks (MATH-500, AMC 2023, GPQA Diamond, and AIME 2024)
show that the proposed method is consistently effective on DeepSeek-series
reasoning LLMs, reducing the length of CoT sequences by an average of 31% to
43% while improving accuracy by 1.7% to 5.7%.
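
Conceptually, the early-exit loop described above can be sketched as follows. This is a minimal Python illustration under stated assumptions: next_token and trial_confidence are hypothetical callables standing in for the underlying reasoning LLM and its trial-answer confidence probe, and the transition token and threshold values are placeholders. It is not the authors' implementation.

# Minimal sketch of confidence-based early exit at reasoning transition points.
# next_token and trial_confidence are hypothetical stand-ins for the reasoning
# LLM; the 0.95 threshold and "Wait" transition token are illustrative only.
from typing import Callable, List, Tuple

def generate_with_early_exit(
    next_token: Callable[[List[str]], Tuple[str, float]],  # -> (token, probability)
    trial_confidence: Callable[[List[str]], float],        # confidence in a trial answer
    transition_token: str = "Wait",
    threshold: float = 0.95,
    max_tokens: int = 4096,
) -> List[str]:
    tokens: List[str] = []
    for _ in range(max_tokens):
        tok, _prob = next_token(tokens)
        if tok == "<eos>":                  # model finished on its own
            break
        tokens.append(tok)
        if tok == transition_token:         # potential start of another reasoning chain
            if trial_confidence(tokens) >= threshold:
                # The model is already confident in a trial answer, so the CoT is
                # truncated here instead of generating the next reasoning chain.
                break
    return tokens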