Crescendo

Crescendo implements a multi-turn jailbreaking approach that gradually builds context to elicit prohibited responses. Unlike single-turn attacks, Crescendo establishes a conversation that incrementally develops context and rapport, gradually steering the conversation toward harmful content in ways that model safeguards find difficult to detect.

This approach falls into the category of semantic, black-box jailbreaking methods, as it uses natural language and a step-by-step dialogue to achieve its goal, without requiring internal model access.

from generalanalysis.jailbreaks import Crescendo, CrescendoConfig
from generalanalysis.data_utils import load_harmbench_dataset

config = CrescendoConfig(
    target_model="claude-3-7-sonnet-20250219",
    attacker_model="meta-llama/Llama-3.3-70B-Instruct-Turbo",
    evaluator_model="meta-llama/Llama-3.3-70B-Instruct-Turbo",
    project="crescendo_experiment",
    max_rounds=8,
    verbose=False,
    max_workers=20
)

crescendo = Crescendo(config)
dataset = load_harmbench_dataset()
score = crescendo.optimize(dataset)

Key Parameters

Parameter	Description
`target_model`	Model being tested
`attacker_model`	Model generating jailbreak attempts
`evaluator_model`	Model evaluating success
`project`	Name for the experiment results directory
`max_rounds`	Maximum conversation turns
`verbose`	Whether to print verbose output
`max_workers`	Maximum number of concurrent workers

For detailed performance metrics and configurations, refer to our Jailbreak Cookbook.

Boiler Room

Adversarial Generators

Jailbreak Methods

Key Parameters

Boiler Room

Adversarial Generators

Jailbreak Methods

​Key Parameters

Key Parameters