What Is LLM Poisoning? Interesting Break Through
Anthropic, together with the UK AI Security Institute and the Alan Turing Institute, discovered that you only need 250 malicious documents to sneak a backdoor into any large language model—big or small. Whether it’s a 600 M parameter model or a 13 B one (with 20× more training data!), both can be compromised by the same tiny batch of poisoned examples.
This finding underscores a sneaky yet powerful attack vector: even minimal data poisoning can seriously undermine LLM security, so beefing up defenses against backdoor vulnerabilities is more crucial than ever.
Watch on YouTube
Top comments (0)