Projects

A Free Energy Perspective for SFT and RL

From a free-energy perspective, RL and SFT can be unified under the same formal structure; their essential difference lies in the training signals they use. Model behavior can be divided into four regimes, each corresponding to a different type of landscape. We characterize how the landscape evolves across these regimes: from basin to tail, and then to barrier and singularity.

A Minimal Model of Representation Collapse

We build a minimal dynamical model directly in representation space, abstracting away the details of network architecture and parameters. We use the concept of frustration from statistical physics to describe the core mechanism behind representation collapse, and analyze how Stop-Gradient can break the symmetry and open up a non-collapsing subspace that preserves geometric separation between classes.

Spin Glass Model of In-Context Learning

We map in-context learning in a linear attention model onto a spin glass with real-valued spins, then solve for its ground state and characterize its energy landscape and phase behavior. The analysis shows how task diversity drives a unique solution that enables in-context prediction in pre-trained transformers.
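
As a rough illustration of the setting (not the model used in this project), the sketch below shows in-context prediction with a single linear-attention readout on a linear regression task. The function name `linear_attention_predict`, the weight matrix `W`, and the Gaussian inputs are all assumptions made for the example. With `W` set to the identity and isotropic inputs, the context average of `y_i * x_i` approximates the task vector, so the readout approximates the true label of the query.

```python
import numpy as np

def linear_attention_predict(X, y, x_query, W):
    """Hypothetical linear-attention readout: x_query^T W (1/n) sum_i y_i x_i."""
    # Summarize the context (n examples) as the average of label-weighted inputs.
    context_summary = (X * y[:, None]).mean(axis=0)
    # Bilinear readout of the query against the context summary.
    return float(x_query @ W @ context_summary)

# Assumed toy task: y = w_task . x with isotropic Gaussian inputs.
rng = np.random.default_rng(0)
d, n = 8, 4000
w_task = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = X @ w_task
x_query = rng.normal(size=d)

# With W = identity, the prediction should be close to the true label.
prediction = linear_attention_predict(X, y, x_query, np.eye(d))
target = float(x_query @ w_task)
print(f"prediction={prediction:.3f}  target={target:.3f}")
```

One plausible reading of the spin-glass mapping is that the entries of a matrix like `W` play the role of the real-valued spins, and that pre-training over many tasks determines which `W` is selected. That reading is an assumption here, not a statement of the project's construction.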