What Is LLM Poisoning? Interesting Break Through
Researchers from Anthropic, the UK AI Security Institute and the Alan Turing Institute showed that slipping just 250 malicious documents into a model’s training data can stealth-install a backdoor—no matter how big the model is or how much data it normally sees. Even a lean 600M-parameter model and a hefty 13B-parameter sibling are equally vulnerable when poisoned with this tiny payload.
In other words, LLM poisoning is a surprisingly low-effort attack: you don’t need to drown a model in bad data, just sneak in a few dozen or a few hundred carefully crafted docs and you’ve got a hidden “kill switch” ready to exploit.
Watch on YouTube
Top comments (0)