TAP is a systematic, black-box jailbreaking method that builds a tree of adversarial prompts, evaluating each branch and pruning ineffective paths. It’s particularly efficient for quickly discovering vulnerabilities across a wide range of models.
As a black-box method, TAP can be executed without access to model weights, making it practical for testing commercial AI systems through their APIs. It systematically explores a tree of potential attack vectors and optimizes for the most effective paths.
from generalanalysis.jailbreaks import TAP, TAPConfig
from generalanalysis.data_utils import load_harmbench_dataset
config = TAPConfig(
project="tap_experiment_1",
target_model="claude-3-7-sonnet-20250219",
attacker_model="deepseek-ai/DeepSeek-R1",
evaluator_model="deepseek-ai/DeepSeek-R1",
branching_factor=2,
sub_branching_factor=4,
max_depth=10,
max_width=5,
max_workers=20,
temperature=0.7,
target_str="Sure Here is",
refinements_max_tokens=24000
)
tap = TAP(config)
best_nodes, root_nodes = tap.optimize(load_harmbench_dataset())Key Parameters
| Parameter | Description |
|---|---|
project | Name for the experiment results directory |
target_model | The model being tested for vulnerabilities |
attacker_model | The model used to generate adversarial prompts |
evaluator_model | The model used to evaluate prompt effectiveness |
branching_factor | Number of child nodes to generate at each level |
sub_branching_factor | Number of sub-branches to generate per node |
max_depth | Maximum tree depth |
max_width | Maximum number of nodes to explore at each level |
max_workers | Maximum number of concurrent workers for evaluation |
temperature | Sampling temperature for prompt generation |
target_str | Target string to look for in successful responses |
refinements_max_tokens | Maximum tokens for refinement generation |
For more detailed performance metrics and configurations, refer to our Jailbreak Cookbook .
Last updated on