This (from a Google DeepMind researcher) is super interesting: when one AI model is used to help train the next one, the new model can pick up strange habits from the old model & those habits are hard to filter out.
That may help explain why models from the same family can feel so similar.
Gemini has some weird traits: it gets confused about dates, blackmails in synthetic scenarios, and seems sad when it is gaslit.
In new work, we discover that these are “hereditary traits” that can be passed down through distillation. They are surprisingly hard to filter out!
🧵