LLM poisoning just got scarier: researchers at Anthropic, the UK AI Security Institute and the Alan Turing Institute discovered that injecting a mere 250 malicious documents can backdoor any large language model—whether it’s a lean 600 million-parameter system or a beefy 13 billion-parameter one with 20× more data.
This tiny trojan stash is enough to hijack model behavior, proving that neither size nor data volume can shield LLMs from poisoning attacks.
Watch on YouTube
Top comments (0)