theory of left politics in LLMs: RLHF simultaneously shifts the model towards "the speaker is friendly and helpful" and "all speakers are friendly and helpful". the former is desired; the latter is not, and it damages the LLM's world model. if everyone is assumed to be friendly and helpful, left views look obviously correct.