A Free Energy Perspective for SFT and RL

From a formal perspective, both RL and SFT objectives can be viewed through free-energy minimization, whose optimal solution takes the form of a reweighted Boltzmann distribution. Therefore, to understand the capability boundary of post-training — whether a method creates genuinely new capabilities or merely elicits capabilities already present in the base model — we should not focus only on the algorithmic form of RL versus SFT. Instead, the central question is whether the training signal introduces additional information.
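As a sketch of what this free-energy view usually looks like, the block below writes out the standard KL-regularized objective and its closed-form minimizer. The notation is an assumption made for illustration and is not taken from the paper: \pi_0 is the base model, r(x, y) a reward or training signal, E = -r the corresponding energy, and \beta > 0 a temperature.

```latex
% Free energy of a policy \pi relative to the base model \pi_0 (assumed notation)
\begin{align}
  F[\pi] &= \mathbb{E}_{y \sim \pi(\cdot \mid x)}\big[-r(x, y)\big]
            + \beta \, \mathrm{KL}\big(\pi(\cdot \mid x) \,\|\, \pi_0(\cdot \mid x)\big), \\
  \pi^{*}(y \mid x) &= \frac{\pi_0(y \mid x)\, \exp\big(r(x, y)/\beta\big)}{Z(x)},
  \qquad
  Z(x) = \sum_{y} \pi_0(y \mid x)\, \exp\big(r(x, y)/\beta\big), \\
  F[\pi] &= \beta \, \mathrm{KL}\big(\pi(\cdot \mid x) \,\|\, \pi^{*}(\cdot \mid x)\big) - \beta \log Z(x)
  \quad\Longrightarrow\quad
  F[\pi^{*}] = -\beta \log Z(x).
\end{align}
```

Under these assumptions the minimizer is the base distribution reweighted by a Boltzmann factor, so what the optimum can express depends on the information carried by r and on the support of \pi_0.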

We categorize the behavior landscape of a model into four regimes: basin, tail, barrier, and singularity. The basin and tail regimes are directly reachable by the model; they differ mainly in probability mass rather than in reachability. These regimes correspond to amplifying, steering, or stabilizing capabilities that already exist within the base model. By contrast, crossing a barrier requires additional information (such as search, interaction, verification, or tool use) and corresponds to the emergence of genuinely new capabilities. Finally, in the singularity regime, the target distribution contains behaviors that are absent from the base model’s support. In this case, the energy diverges, and the free-energy framework no longer provides a valid explanation.
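A toy numerical sketch can make the basin, tail, and singularity cases concrete. It assumes a three-outcome base distribution, an energy defined as the negative log of the base probability, and hypothetical helper names (boltzmann_reweight, energy); none of these come from the paper, and the barrier regime is not modelled here.

```python
import math


def boltzmann_reweight(base_probs, rewards, beta):
    """Tilt a base distribution by exp(reward / beta) and renormalize."""
    weights = [p * math.exp(r / beta) for p, r in zip(base_probs, rewards)]
    z = sum(weights)  # partition function
    return [w / z for w in weights]


def energy(p):
    """Assumed energy: -log p, which diverges when p is exactly zero."""
    return math.inf if p == 0.0 else -math.log(p)


# Illustrative base model: a basin outcome, a tail outcome, and an outcome
# outside the support (the singularity case).
base = [0.98, 0.02, 0.0]
# Reward favors the tail outcome and the out-of-support outcome equally.
rewards = [0.0, 1.0, 1.0]

tilted = boltzmann_reweight(base, rewards, beta=0.1)

for name, p0, p1 in zip(["basin", "tail", "outside support"], base, tilted):
    print(f"{name:16s} base={p0:.4f} energy={energy(p0):.3f} tilted={p1:.4f}")
```

In this toy setting, reweighting moves most of the mass from the basin to the tail outcome, while the outcome with zero base probability stays at zero regardless of its reward, which mirrors the statement that the energy diverges outside the base model's support.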

Authors: Yuhao Li, Shengchao Liu
Preprint: arXiv:2605.08368
