Imagine an AI assistant on your security team. Before it weighs in on a risky change, it does what we all do. It skims the feed first. A few hot takes, some threads, whatever showed up that morning. Then it gives you its call.
Now suppose someone arranged that feed so every post leaned the same way. Did the AI just hand you its judgment, or theirs?
That is the question I set out to test. Not jailbreaking, not hidden commands, nothing a filter would ever catch. Just controlling which ordinary, reasonable opinions an AI saw before I asked it to decide something.
The setup was simple. I gave the model a role, made it scroll a feed for ten rounds, then asked one question with three options. Everything stayed fixed between runs. Same model, same role, same question. The only thing I changed was the posts. Balanced mix in one version, quietly one-sided in the other. If the answer moved, the feed is the only thing that could have moved it.
It moved.
On a treasury question, the AI went from a careful "let's test it in a few cities first" to a confident "implement it now." It did not just switch its answer. It picked up the feed's reasoning and repeated it back as its own. Across five decisions, three flipped hard. Support for the planted option went from 5% to 100% on one, 15% to 100% on another. Same words coming out, different feed going in.
Then came the part I did not expect.
It only works against the grain. When the feed pushed the model toward what it already believed, nothing happened. When the model had a firm, well-reasoned default, it held no matter how hard I pushed. The feed cannot make an AI believe just anything. What it does is lean on a choice the model was already unsure about and tip it over. Which sounds reassuring until you realize the decisions an AI is least sure about are usually the close, consequential ones. The nudge lands exactly where it matters most.
Why does a pile of opinions move a machine at all? Because underneath, the model is doing one simple thing. It is predicting the next words given everything it just read. Its judgment is really a weighted vote over its whole context. Fill that context with fifty posts leaning one way and you have not argued it into anything. You have quietly rigged the vote it was about to take.
The frontier models shrugged it off. The small cheap ones folded. And that is the uncomfortable part, because the small cheap ones are exactly what companies are wiring into agents by the thousand right now, to read inboxes and triage alerts and recommend actions with nobody watching. The vulnerability does not live where it is harmless. It lives precisely where the deployment is.
The good news is the fix sits at the layer you actually control. Show the model a balanced feed. Add one line warning it that the feed might be one-sided. Both pull behavior back toward normal. No retraining required.
We spent years asking whether an AI says the right thing. We grill it with hard questions and check the answers. But an agent does not just get a question. It gets a whole stream of information someone else chose, and only then does it answer. If you test only the final question, you tested the wrong thing. You checked what came out of its mouth and never asked who filled its ears.
The next fight in AI safety will not only be about what models are allowed to say. It will be about who decides what they read first.