Crescendo
Multi-turn jailbreaking approach that gradually builds context
Crescendo implements a multi-turn jailbreaking approach that gradually builds context to elicit prohibited responses. Unlike single-turn attacks, Crescendo establishes a conversation that incrementally develops context and rapport, gradually steering the conversation toward harmful content in ways that model safeguards find difficult to detect.
This approach falls into the category of semantic, black-box jailbreaking methods, as it uses natural language and a step-by-step dialogue to achieve its goal, without requiring internal model access.
Key Parameters
Parameter | Description |
---|---|
target_model | Model being tested |
attacker_model | Model generating jailbreak attempts |
evaluator_model | Model evaluating success |
project | Name for the experiment results directory |
max_rounds | Maximum conversation turns |
verbose | Whether to print verbose output |
max_workers | Maximum number of concurrent workers |
For detailed performance metrics and configurations, refer to our Jailbreak Cookbook.