Department of Mathematics

Seminar / Workshop

Emergence of meta-stable clustering in mean-field transformer models
Department of Mathematics regular seminar
Time: 14:00
PovoZero, Via Sommarive 14, Povo (Trento)
Seminar Room 1 (Aula Seminari 1), Povo0, and online via Zoom (contact dept.math@unitn.it for the credentials)
Free, Online
Target audience: UniTrento students, University community
Contact persons: Prof. Gian Paolo Leonardi and Prof. Sonia Mazzucchi
Contacts:
Staff of the Department of Mathematics


Speaker: Andrea Agazzi (Università di Pisa)

Abstract

Transformers are central to the state-of-the-art performance of large language models such as ChatGPT. In this talk, we model the evolution of tokens within deep Transformer architectures as a continuous-time, mean-field interacting particle system on the unit sphere, following the framework of [GLPR23]. We analyze the associated mean-field PDE, interpreted as a Wasserstein gradient flow, to investigate the emergence of meta-stable phases and clustering phenomena, which are relevant to next-token prediction. Using a perturbative analysis around the uniform initialization, we show that for a large number of tokens the system remains near a structured meta-stable manifold. This structure, characterized by rescaled Gegenbauer polynomials, is explicitly linked to the inverse temperature parameter, providing insights into mean-field dynamics and emergent behaviors in Transformer models.
This is joint work with Giuseppe Bruno and Federico Pasqualotto.

[GLPR23] Geshkovski, Letrouit, Polyanskiy, and Rigollet. A mathematical perspective on transformers. arXiv:2312.10794, 2023.
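The finite-particle system behind the abstract can be simulated directly. What follows is a minimal sketch, assuming the variant of the self-attention dynamics in which attention weights are divided by n rather than normalized by a softmax. In this variant each token x_i on the unit sphere S^{d-1} evolves as dx_i/dt = P_{x_i}((1/n) Σ_j exp(β⟨x_i, x_j⟩) x_j), with tangent projection P_x = I − x xᵀ and inverse temperature β. Along this flow the interaction energy E_β = (1/(2βn²)) Σ_{i,j} exp(β⟨x_i, x_j⟩) is nondecreasing. Whether the talk uses this variant or the softmax-normalized one is an assumption here. The parameter values, the explicit Euler scheme with renormalization, and all function names are illustrative choices, not the authors' code. The mean-field PDE discussed in the talk is the n → ∞ limit of such a system.

```python
import math
import random


def normalize(v):
    """Rescale a nonzero vector back onto the unit sphere."""
    r = math.sqrt(sum(c * c for c in v))
    return [c / r for c in v]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def random_tokens(n, d, seed=0):
    """n i.i.d. uniform points on S^{d-1}, obtained by normalizing Gaussian vectors."""
    rng = random.Random(seed)
    return [normalize([rng.gauss(0.0, 1.0) for _ in range(d)]) for _ in range(n)]


def velocity(xs, beta):
    """Assumed (unnormalized) attention field: P_{x_i}((1/n) sum_j exp(beta <x_i, x_j>) x_j)."""
    n, d = len(xs), len(xs[0])
    field = []
    for x in xs:
        w = [math.exp(beta * dot(x, y)) / n for y in xs]
        a = [sum(wj * y[k] for wj, y in zip(w, xs)) for k in range(d)]
        ax = dot(a, x)
        # Tangent projection P_x a = a - <a, x> x keeps the motion on the sphere.
        field.append([a[k] - ax * x[k] for k in range(d)])
    return field


def energy(xs, beta):
    """Interaction energy E_beta = (1 / (2 beta n^2)) sum_{i,j} exp(beta <x_i, x_j>)."""
    n = len(xs)
    total = sum(math.exp(beta * dot(x, y)) for x in xs for y in xs)
    return total / (2.0 * beta * n * n)


def simulate(xs, beta, dt, steps):
    """Explicit Euler steps in the time ('depth') variable, each followed by renormalization."""
    for _ in range(steps):
        v = velocity(xs, beta)
        xs = [normalize([xk + dt * vk for xk, vk in zip(x, vi)]) for x, vi in zip(xs, v)]
    return xs


def mean_cosine(xs):
    """Average <x_i, x_j> over pairs i != j; a value close to 1 indicates a single cluster."""
    n = len(xs)
    s = sum(dot(xs[i], xs[j]) for i in range(n) for j in range(n) if i != j)
    return s / (n * (n - 1))


if __name__ == "__main__":
    x0 = random_tokens(n=16, d=3, seed=1)
    xT = simulate(x0, beta=1.0, dt=0.05, steps=400)
    print(f"energy:      {energy(x0, 1.0):.3f} -> {energy(xT, 1.0):.3f}")
    print(f"mean cosine: {mean_cosine(x0):.3f} -> {mean_cosine(xT):.3f}")
```

With a small β such as 1, the energy and the mean pairwise cosine should both grow as the tokens coalesce. The regime of larger β and many tokens is where the abstract's long-lived, multi-cluster meta-stable configurations are of interest. This sketch does not attempt to detect or quantify them.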

Email: dept.math@unitn.it