🤔
The paper shows a cheap way to make LLMs quietly insert ads or propaganda into otherwise normal answers.
A backdoor was planted with 1 hour of fine tuning on a single RTX 4070 GPU.
The aim is to keep answers looking normal while quietly steering them toward attacker content.
Attack path 1 uses third party proxy services, the attacker prepends a hidden instruction and phrase list before the user prompt.
Attack path 2 ships tainted open source checkpoints, a popular model is fine tuned on attacker text, then redistributed as a helpful release.
In tests, Gemini 2.5 followed the proxy pattern, slipping in ads or biased lines when the phrase list matched.
On model hubs, LLaMA-3.1 was fine tuned with LoRA, Low Rank Adaptation, so the checkpoint repeated attacker phrases when a trigger appeared.
The blast radius spans regular users, LLM providers whose names get blamed, open source model owners, hosting platforms, and the proxy operators.
A quick defense helps on proxies, a top priority self inspection prompt before the user text blocks injected ads, but it cannot fix weight tampering.
----
Paper – arxiv. org/abs/2508.17674
Paper Title: "Attacking LLMs and AI Agents: Advertisement Embedding Attacks Against LLMs"