[CL] The Geometry of Categorical and Hierarchical Concepts in Large Language Models
K Park, Y J Choe, Y Jiang, V Veitch [University of Chicago] (2024)
arxiv.org/abs/2406.01506
- The paper studies how categorical concepts like {mammal, bird, reptile, fish} and hierarchical concepts like animal vs mammal are represented in large language models.
- It extends the linear representation hypothesis, which says high-level concepts are linearly encoded, from binary concepts to categorical and hierarchical concepts.
- It shows categorical concepts are represented as simplices, with vertices corresponding to binary features like mammal, bird, etc.
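- It shows hierarchical relations are encoded through orthogonality. For example, the representation of animal is orthogonal to the difference between the representations of mammal and animal (the part that refines animal into mammal), not to the mammal representation itself.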
- It validates these theoretical results empirically on the Gemma large language model, estimating representations for 957 hierarchically related concepts from WordNet.
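
The two geometric claims above can be illustrated with a toy NumPy sketch. It is not the paper's code and uses no Gemma weights: the vectors are synthetic and built to satisfy the orthogonality condition, the plain Euclidean dot product stands in for whatever inner product the paper uses, and the names `ell_animal`, `child_vectors`, and `hierarchy_gap` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8  # toy embedding dimension (assumption, unrelated to any real model)

# Hypothetical parent concept vector.
ell_animal = np.zeros(dim)
ell_animal[0] = 2.0

# Hypothetical child concepts: each is the parent vector plus an offset
# chosen to lie in the subspace orthogonal to the parent.
children = ["mammal", "bird", "reptile", "fish"]
offsets = rng.normal(size=(len(children), dim))
offsets[:, 0] = 0.0  # zero out the parent direction
child_vectors = ell_animal + offsets


def hierarchy_gap(parent: np.ndarray, child: np.ndarray) -> float:
    """Dot product of the parent with (child - parent); zero means orthogonal."""
    return float(parent @ (child - parent))


# Hierarchy as orthogonality: parent is orthogonal to each child-minus-parent.
gaps = [hierarchy_gap(ell_animal, c) for c in child_vectors]

# Parent and child themselves are NOT orthogonal in this construction.
parent_child_dot = float(ell_animal @ child_vectors[0])

# Categorical concept as a simplex: 4 affinely independent vertices span
# a 3-dimensional simplex, so the edge vectors from one vertex have rank 3.
edge_matrix = child_vectors[1:] - child_vectors[0]
simplex_dim = int(np.linalg.matrix_rank(edge_matrix))

for name, gap in zip(children, gaps):
    print(f"animal . ({name} - animal) = {gap:.3f}")
print("animal . mammal =", parent_child_dot)
print("simplex dimension =", simplex_dim)
```

With real model representations the same checks would be run on estimated concept vectors instead of constructed ones, and the gaps would be expected to be small rather than exactly zero.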