Red Teaming (AI)
A structured adversarial testing exercise where testers deliberately attempt to find failures, vulnerabilities, biases, or harmful outputs in an AI system. Unlike standard testing that checks if the system works, red teaming checks how the system breaks.
Why It Matters
Standard testing proves the happy path works. Red teaming reveals the failure modes that only emerge when someone actively tries to exploit the system — which is exactly what will happen in production.
Example
A red team testing a customer service chatbot tries prompt injection attacks, attempts to extract training data, pushes the bot to generate offensive content, and tests whether it can be manipulated into providing unauthorized discounts or refunds.
Think of it like...
Red teaming is like hiring a professional burglar to test your home security — they think like the adversary so you can fix the weaknesses before a real attacker finds them.
Related Terms
AI Audit
An independent evaluation of an AI system's compliance, performance, fairness, and governance practices. Audits can be internal (conducted by the organization's own team) or external (by independent third parties), and may be required by regulation for high-risk systems.
Adversarial Attack
An input deliberately crafted to fool an AI model into making incorrect predictions. Adversarial examples often look normal to humans but cause models to fail spectacularly.
TEVV (Test, Evaluation, Verification, Validation)
A comprehensive framework for assessing AI systems that goes beyond accuracy metrics to include bias testing, fairness evaluation, robustness assessment, safety verification, and security validation. TEVV is promoted by the NIST AI RMF as essential for responsible AI deployment.