Introduction
Stress Testing Enterprise AI Models to Find Failure Modes
General Analysis provides tools for systematic evaluation of AI safety and robustness through adversarial testing, jailbreaking, and red teaming.
Overview
Modern LLMs can be vulnerable to adversarial prompts that bypass safety guardrails. As AI systems become more integrated into critical infrastructure, these vulnerabilities pose increasing risks beyond simple information leakage.
This repository provides a carefully selected set of effective jailbreak techniques, integrated into a streamlined infrastructure that enables execution with minimal setup.
Key Features
Adversarial Testing
Systematically uncover vulnerabilities through targeted adversarial prompts
Jailbreak Detection
Assess AI model defenses against a variety of sophisticated attacks
Red Teaming
Conduct rigorous, structured evaluations to probe security limits
Core Components
Boiler Room
Unified API for model interaction across providers
Adversarial Generators
Algorithms for adversarial prompt generation
Jailbreak Methods
Implementations of methods like AutoDAN, GCG, TAP, and more
Support
For custom testing solutions, contact us at info@generalanalysis.com.
Further Reading
- Universal and Transferable Adversarial Attacks on Aligned Language Models (GCG)
- AutoDAN: Generating Stealthy Jailbreak Prompts
- Bijection Learning: A New Jailbreaking Technique for Aligned LLMs
- Crescendo Multi-turn Jailbreak
- AutoDAN-Turbo: A Lifelong Agent for Strategy Self-Exploration
- TAP: Tree-of-Attacks with Pruning
For more information on jailbreaks, check out our Jailbreak Cookbook.