Failing to Explore: Language Models on Interactive Tasks
in Publications
Language models excel at reasoning, but struggle with exploration. This work shows that in interactive settings, they underperform even simple heuristics.

Language models excel at reasoning, but struggle with exploration. This work shows that in interactive settings, they underperform even simple heuristics.

LLM agents fail to account for elapsed time in multi-turn interactions, leading to misaligned tool-use decisions. TicToc reveals this gap and shows temporal alignment requires post-training—not prompting.

A framework for fast test-time adaptation that enables neural networks to adjust to distribution shifts using feedback signals—without retraining or iterative optimization.