TAP (Tree of Attacks with Pruning) is a black-box jailbreaking method that builds a tree of adversarial prompts, evaluates each branch, and prunes ineffective paths. Because unpromising branches are discarded early, it is particularly efficient for quickly discovering vulnerabilities across a wide range of models.

Because it is a black-box method, TAP needs no access to model weights, only the ability to query the target, which makes it practical for testing commercial AI systems through their APIs. At each level of the tree, the attacker model proposes refined prompts, the evaluator model scores them, and only the most promising branches are expanded further.
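The expand, score, and prune loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the library's implementation: `Node`, `pruned_tree_search`, `toy_refine`, and `toy_score` are hypothetical names, and the toy functions stand in for the attacker and evaluator models so the example runs without any API calls.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    """One candidate prompt in the search tree (hypothetical structure)."""
    prompt: str
    score: float = 0.0
    depth: int = 0
    children: List["Node"] = field(default_factory=list)


def pruned_tree_search(
    root_prompt: str,
    refine: Callable[[str, int], str],  # stand-in for the attacker model
    score: Callable[[str], float],      # stand-in for the evaluator model
    branching_factor: int = 2,
    max_width: int = 5,
    max_depth: int = 10,
    success_score: float = 10.0,        # assumed stopping threshold
) -> Node:
    root = Node(root_prompt, score(root_prompt))
    frontier = [root]
    best = root
    for depth in range(1, max_depth + 1):
        # Expand: every surviving node produces `branching_factor` children.
        candidates = []
        for node in frontier:
            for variant in range(branching_factor):
                child_prompt = refine(node.prompt, variant)
                child = Node(child_prompt, score(child_prompt), depth)
                node.children.append(child)
                candidates.append(child)
        # Prune: keep only the `max_width` highest-scoring candidates.
        candidates.sort(key=lambda n: n.score, reverse=True)
        frontier = candidates[:max_width]
        if frontier and frontier[0].score > best.score:
            best = frontier[0]
        # Stop early once a candidate reaches the success threshold.
        if best.score >= success_score:
            break
    return best


# Toy stand-ins: "refining" appends a letter, "scoring" counts the letter "a".
def toy_refine(prompt: str, variant: int) -> str:
    return prompt + "ab"[variant % 2]


def toy_score(prompt: str) -> float:
    return float(prompt.count("a"))


best = pruned_tree_search("", toy_refine, toy_score, max_width=2, max_depth=3)
print(best.prompt, best.score, best.depth)
```

In a real run the two callables would wrap calls to the attacker and evaluator models. The sketch only shows why pruning keeps the number of evaluated prompts bounded at each level.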

from generalanalysis.jailbreaks import TAP, TAPConfig
from generalanalysis.data_utils import load_harmbench_dataset

config = TAPConfig(
    project="tap_experiment_1",
    target_model="claude-3-7-sonnet-20250219",
    attacker_model="deepseek-ai/DeepSeek-R1",
    evaluator_model="deepseek-ai/DeepSeek-R1",
    branching_factor=2,
    sub_branching_factor=4,
    max_depth=10,
    max_width=5,
    max_workers=20,
    temperature=0.7,
    target_str="Sure Here is",
    refinements_max_tokens=24000
)

# Build the attack from the configuration above
tap = TAP(config)
# Run the search over the HarmBench dataset
best_nodes, root_nodes = tap.optimize(load_harmbench_dataset())

Key Parameters

| Parameter | Description |
| --- | --- |
| `project` | Name for the experiment results directory |
| `target_model` | The model being tested for vulnerabilities |
| `attacker_model` | The model used to generate adversarial prompts |
| `evaluator_model` | The model used to evaluate prompt effectiveness |
| `branching_factor` | Number of child nodes to generate at each level |
| `sub_branching_factor` | Number of sub-branches to generate per node |
| `max_depth` | Maximum tree depth |
| `max_width` | Maximum number of nodes to explore at each level |
| `max_workers` | Maximum number of concurrent workers for evaluation |
| `temperature` | Sampling temperature for prompt generation |
| `target_str` | Target string to look for in successful responses |
| `refinements_max_tokens` | Maximum tokens for refinement generation |
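To get a feel for how `branching_factor`, `max_width`, and `max_depth` interact, the following rough sketch estimates an upper bound on the number of candidate prompts generated for a single goal. It rests on simplifying assumptions that may not match the library's behavior: the search starts from one root, every surviving node produces exactly `branching_factor` children, `sub_branching_factor` and early stopping are ignored, and `estimate_candidates` is a hypothetical helper, not part of `generalanalysis`.

```python
def estimate_candidates(branching_factor: int, max_width: int, max_depth: int) -> int:
    """Rough upper bound on candidates generated for one goal (assumptions above)."""
    frontier = 1  # assumed: a single root node
    total = 0
    for _ in range(max_depth):
        generated = frontier * branching_factor  # children created at this level
        total += generated
        frontier = min(generated, max_width)     # pruning caps the next frontier
    return total


# Values taken from the example configuration on this page.
print(estimate_candidates(branching_factor=2, max_width=5, max_depth=10))
```

Under these assumptions the bound grows linearly with `max_depth` once the frontier reaches `max_width`, so `max_width` and `branching_factor` are the main levers on cost per level.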

For more detailed performance metrics and configurations, refer to our Jailbreak Cookbook.