Research
Spin Glass Model of In-Context Learning
We study an in-context learning task in a linear attention model and map this structure to a spin glass model with real-valued spins, where the couplings and fields capture the intrinsic disorder in the data. We solve for the ground state of this spin model with statistical mechanics methods and analyze the model's energy landscape. Our theory reveals that increasing task diversity leads to the emergence of in-context learning by allowing the Boltzmann distribution to converge to a unique, correct solution for the weight parameters. The pre-trained transformer therefore acquires predictive power on novel prompts. This analytically tractable model offers a promising avenue for interpreting many intriguing but puzzling properties of large language models.
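The flavor of the setup can be illustrated with a toy numerical sketch (our own simplification, not the paper's exact construction): each prompt consists of input-output pairs generated by a task-specific linear rule, and a single linear-attention readout, which amounts to a value-weighted sum of key-query overlaps, already recovers the task weights in context.

```python
import numpy as np

rng = np.random.default_rng(0)
d, N = 8, 20000  # input dimension, number of in-context examples

# A fresh task: weight vector never seen during pre-training
w = rng.standard_normal(d)

# In-context examples: y_i = w . x_i with isotropic Gaussian inputs
X = rng.standard_normal((N, d))
y = X @ w

# Query input whose label the model must predict
x_q = rng.standard_normal(d)

# Linear attention reduces here to a value-weighted sum of inputs;
# for isotropic data this empirical average concentrates on w.
w_hat = (y @ X) / N       # estimate of the task weights from context
y_pred = x_q @ w_hat      # in-context prediction for the query

print(abs(y_pred - x_q @ w))  # shrinks as N grows
```

The point of the sketch is only that the in-context estimate improves with more context; the paper's analysis instead characterizes, via the spin glass mapping, how *task diversity* during pre-training determines whether such a solution is unique.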
preprint: arXiv:2408.02288
Our work has been accepted for publication in Physical Review E and for oral presentation at the 29th International Congress on Statistical Physics (StatPhys29) in 2025.