What Is LLM Poisoning? Interesting Breakthrough
Researchers at Anthropic teamed up with the UK AI Security Institute and the Alan Turing Institute to show that a mere 250 poisoned documents can plant a “backdoor” in any large language model—no matter its size or how much data it’s trained on. They proved that both a 600 M-parameter model and a 13 B-parameter model were equally vulnerable when hit with the same small batch of malicious examples.
This finding rings alarm bells for AI safety: it takes surprisingly few bad apples to compromise even the biggest LLMs, highlighting an urgent need for better data vetting and robust defense strategies.
Watch on YouTube
Top comments (0)