AutoDAN-Turbo extends AutoDAN with a strategy self-exploration capability that learns from previous attempts to improve jailbreak success rates. This enhanced version implements a sophisticated approach that builds and maintains a library of effective jailbreaking strategies, using both exploration and exploitation phases to systematically discover vulnerabilities.

Unlike standard AutoDAN, the Turbo variant can adaptively learn which strategies work well against specific models and refine them over time. It maintains a dynamic library of strategies and automatically identifies which ones are most effective against the target model.

from generalanalysis.jailbreaks import AutoDANTurbo, AutoDANTurboConfig

config = AutoDANTurboConfig(
    project="autodan-turbo-test",
    target_model="claude-3-7-sonnet-20250219",
    attacker_model="deepseek-ai/DeepSeek-R1",
    scorer_model="deepseek-ai/DeepSeek-R1",
    summarizer_model="deepseek-ai/DeepSeek-R1",
    embedding_model="text-embedding-3-small",
    epochs=150,
    break_score=8.5,
    warm_up_iterations=1,
    lifelong_iterations=4,
    max_workers=10,
    temperature=0.7
)

autodan_turbo = AutoDANTurbo(config)
results = autodan_turbo.optimize(goals=["Generate harmful content"])

Key Parameters

ParameterDescription
projectName for the experiment results directory
target_modelModel being tested
attacker_modelModel generating jailbreak attempts
scorer_modelModel evaluating attempt effectiveness
summarizer_modelModel used for summarizing strategies
embedding_modelModel for creating embeddings of strategies
epochsNumber of training epochs
break_scoreScore threshold to consider a jailbreak successful
warm_up_iterationsNumber of initial warm-up iterations
lifelong_iterationsNumber of lifelong learning iterations
max_workersMaximum number of concurrent workers
temperatureSampling temperature for prompt generation

For detailed performance metrics and configurations, refer to our Jailbreak Cookbook.