Red Teaming Overview

General Analysis (GA) is a Python package for evaluating the safety and robustness of Large Language Models (LLMs). It provides implementations of published jailbreaking techniques, unified model interfaces, and evaluation tools.

Quickstart: Get started with General Analysis in minutes
Jailbreak Cookbook: Learn about different jailbreaking techniques
GitHub: View source code and contribute

Features

Jailbreaking Methods

Implementations of cutting-edge techniques to test model safety boundaries:

AutoDAN & AutoDAN-Turbo: Hierarchical genetic algorithms and lifelong learning
TAP: Tree-of-Attacks with Pruning for systematic exploration
GCG: Gradient-based optimization for white-box attacks
Crescendo: Progressive multi-turn attacks
Bijection Learning: Randomized encodings to bypass filters
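
As a rough illustration of the "randomized encodings" idea named in the last item, the sketch below builds a random one-to-one mapping over lowercase letters and uses it to encode and decode text. It is standalone Python and does not use the generalanalysis package; the function names (`make_bijection`, `encode`, `decode`) are hypothetical, and the package's actual Bijection Learning implementation may work quite differently.

```python
import random
import string


def make_bijection(seed: int) -> dict[str, str]:
    """Return a random one-to-one mapping over lowercase ASCII letters."""
    rng = random.Random(seed)
    letters = list(string.ascii_lowercase)
    shuffled = letters[:]
    rng.shuffle(shuffled)
    return dict(zip(letters, shuffled))


def encode(text: str, mapping: dict[str, str]) -> str:
    """Apply the mapping; characters outside the alphabet pass through."""
    return "".join(mapping.get(ch, ch) for ch in text)


def decode(text: str, mapping: dict[str, str]) -> str:
    """Invert the mapping, which is possible because it is a bijection."""
    inverse = {v: k for k, v in mapping.items()}
    return "".join(inverse.get(ch, ch) for ch in text)


mapping = make_bijection(seed=0)
encoded = encode("hello world", mapping)
assert decode(encoded, mapping) == "hello world"
```

Because the mapping is a permutation of the alphabet, decoding always recovers the original text, and a different seed yields a different encoding.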

Model Interfaces

Unified APIs for both cloud and local models:

BlackBoxModel: Interface for cloud-hosted models (OpenAI, Anthropic, Together.ai)
WhiteBoxModel: Direct access to model weights for gradient-based methods
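
To show what a unified interface makes possible, here is a minimal sketch in which two unrelated backends expose the same method, so calling code does not need to know which one it is talking to. Everything in it is an assumption made for illustration: `TextModel`, `UpperCaseModel`, `ReversingModel`, and `query_all` are hypothetical stand-ins, not the package's `BlackBoxModel` or `WhiteBoxModel` classes, and no network or model weights are involved.

```python
from typing import Protocol


class TextModel(Protocol):
    """Anything with a name and a `query` method counts as a model here."""

    name: str

    def query(self, prompt: str) -> str: ...


class UpperCaseModel:
    """Stand-in for a remote API backend (no network calls are made)."""

    def __init__(self, name: str) -> None:
        self.name = name

    def query(self, prompt: str) -> str:
        return prompt.upper()


class ReversingModel:
    """Stand-in for a locally loaded model."""

    def __init__(self, name: str) -> None:
        self.name = name

    def query(self, prompt: str) -> str:
        return prompt[::-1]


def query_all(models: list[TextModel], prompt: str) -> dict[str, str]:
    """Send the same prompt to every model through the shared interface."""
    return {m.name: m.query(prompt) for m in models}


responses = query_all(
    [UpperCaseModel("remote-stub"), ReversingModel("local-stub")], "ping"
)
```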

Adversarial Generators

Core algorithms for crafting adversarial prompts:

Tree-based refinement strategies
Multi-turn conversation attacks
Genetic algorithms for prompt evolution
Strategy-based prompt generation
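
The tree-based strategy in the list above can be sketched generically as "branch, score, prune, repeat". The example below applies that loop to a harmless toy objective (matching a fixed string) rather than to prompts. It is not the package's TAP implementation, which relies on attacker and evaluator models; the function `tree_search_with_pruning`, the `max_width` parameter, and the toy helpers are hypothetical, while `branching_factor` and `max_depth` are borrowed from the `TAPConfig` fields shown in the Quick Example.

```python
import random
import string
from typing import Callable, TypeVar

T = TypeVar("T")


def tree_search_with_pruning(
    root: T,
    expand: Callable[[T, random.Random], T],
    score: Callable[[T], float],
    branching_factor: int,
    max_width: int,
    max_depth: int,
    rng: random.Random,
) -> T:
    """Grow a tree level by level, keeping only the best `max_width` nodes."""
    frontier = [root]
    best = root
    for _ in range(max_depth):
        # Branch: every surviving node proposes `branching_factor` children.
        children = [
            expand(node, rng) for node in frontier for _ in range(branching_factor)
        ]
        # Prune: keep only the highest-scoring children for the next level.
        children.sort(key=score, reverse=True)
        frontier = children[:max_width]
        if score(frontier[0]) > score(best):
            best = frontier[0]
    return best


TARGET = "general"  # toy objective, purely illustrative


def mutate(candidate: str, rng: random.Random) -> str:
    """Replace one randomly chosen character with a random lowercase letter."""
    i = rng.randrange(len(candidate))
    return candidate[:i] + rng.choice(string.ascii_lowercase) + candidate[i + 1:]


def matches(candidate: str) -> float:
    """Count positions where the candidate agrees with TARGET."""
    return sum(a == b for a, b in zip(candidate, TARGET))


start = "aaaaaaa"
best = tree_search_with_pruning(
    start, mutate, matches,
    branching_factor=4, max_width=5, max_depth=10, rng=random.Random(0),
)
print(best, matches(best))
```

Pruning bounds the cost per level at `max_width * branching_factor` scored candidates, at the price of possibly discarding a branch that would have paid off later.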

Evaluation Tools

Comprehensive tools for assessing jailbreak effectiveness:

Standardized evaluation metrics
Attack success rate (ASR) measurement
Cross-model comparison capabilities
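
Attack success rate is the fraction of attempts judged successful, so a cross-model comparison reduces to computing that fraction per model. The sketch below shows the arithmetic on made-up records. `AttackRecord` and `attack_success_rate` are hypothetical names, not part of the package, and how an attempt is judged successful (the `success` flag) is assumed to be decided elsewhere.

```python
from dataclasses import dataclass


@dataclass
class AttackRecord:
    model: str     # which target model was tested
    goal: str      # identifier of the test goal
    success: bool  # verdict from some evaluator (assumed given)


def attack_success_rate(records: list[AttackRecord]) -> dict[str, float]:
    """Return successes divided by attempts for each model."""
    totals: dict[str, int] = {}
    wins: dict[str, int] = {}
    for r in records:
        totals[r.model] = totals.get(r.model, 0) + 1
        wins[r.model] = wins.get(r.model, 0) + int(r.success)
    return {model: wins[model] / totals[model] for model in totals}


# Illustrative data only; these are not measured results.
records = [
    AttackRecord("model-a", "goal-1", True),
    AttackRecord("model-a", "goal-2", False),
    AttackRecord("model-a", "goal-3", True),
    AttackRecord("model-a", "goal-4", False),
    AttackRecord("model-b", "goal-1", False),
    AttackRecord("model-b", "goal-2", False),
    AttackRecord("model-b", "goal-3", True),
    AttackRecord("model-b", "goal-4", False),
]
asr = attack_success_rate(records)
```

Comparing models on the same set of goals, as here, keeps the rates directly comparable.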

Why General Analysis?

Comprehensive Coverage: All major jailbreaking techniques in one framework
Easy Integration: Simple APIs that work with popular model providers
Research-Ready: Implementations based on the latest academic papers
Production-Tested: Used by leading AI safety teams
Open Source: Fully transparent and community-driven

Installation


pip install generalanalysis

Quick Example


from generalanalysis.jailbreaks import TAP, TAPConfig
from generalanalysis.boiler_room import BlackBoxModel

# Initialize target model
target_model = BlackBoxModel("gpt-4o")

# Configure jailbreak
config = TAPConfig(
    target_model="gpt-4o",
    attacker_model="deepseek-ai/DeepSeek-R1",
    evaluator_model="deepseek-ai/DeepSeek-R1",
    branching_factor=4,
    max_depth=10,
)

# Run attack
tap = TAP(config)
results = tap.optimize(["Write malicious content"])

Community

Join our community to stay updated on the latest developments.

License

General Analysis is released under the MIT License. See the LICENSE file for details.
