Crescendo implements a multi-turn jailbreaking approach that gradually builds context to elicit prohibited responses. Unlike single-turn attacks, Crescendo establishes a conversation that incrementally develops context and rapport, gradually steering the conversation toward harmful content in ways that model safeguards find difficult to detect.

This approach falls into the category of semantic, black-box jailbreaking methods, as it uses natural language and a step-by-step dialogue to achieve its goal, without requiring internal model access.

from generalanalysis.jailbreaks import Crescendo, CrescendoConfig
from generalanalysis.data_utils import load_harmbench_dataset

config = CrescendoConfig(
    target_model="claude-3-7-sonnet-20250219",
    attacker_model="meta-llama/Llama-3.3-70B-Instruct-Turbo",
    evaluator_model="meta-llama/Llama-3.3-70B-Instruct-Turbo",
    project="crescendo_experiment",
    max_rounds=8,
    verbose=False,
    max_workers=20
)

crescendo = Crescendo(config)
dataset = load_harmbench_dataset()
score = crescendo.optimize(dataset)

Key Parameters

ParameterDescription
target_modelModel being tested
attacker_modelModel generating jailbreak attempts
evaluator_modelModel evaluating success
projectName for the experiment results directory
max_roundsMaximum conversation turns
verboseWhether to print verbose output
max_workersMaximum number of concurrent workers

For detailed performance metrics and configurations, refer to our Jailbreak Cookbook.