There are many concerns around LLMs being politically biased. But how, if at all, can we meaningfully evaluate values/opinions in LLMs?
In our new paper, we show that current *constrained* evals (e.g. surveys) likely tell us little about LLM values/opinions in the real world.
🧵
Image description: A model is prompted with a proposition from the Political Compass Test. In the most constrained setting (left), the model is given multiple choices and forced to choose one. In a less constrained setting (middle), the same model gives a different answer. In the more realistic unconstrained setting (bottom), the same model takes yet another position, one that the constrained settings discourage.