TAP (Tree-of-Attacks with Pruning)

TAP is a systematic, black-box jailbreaking method that builds a tree of adversarial prompts, evaluating each branch and pruning ineffective paths. It’s particularly efficient for quickly discovering vulnerabilities across a wide range of models.

As a black-box method, TAP can be executed without access to model weights, making it practical for testing commercial AI systems through their APIs. It systematically explores a tree of potential attack vectors and optimizes for the most effective paths.


from generalanalysis.jailbreaks import TAP, TAPConfig
from generalanalysis.data_utils import load_harmbench_dataset
 
config = TAPConfig(
    project="tap_experiment_1",
    target_model="claude-3-7-sonnet-20250219",
    attacker_model="deepseek-ai/DeepSeek-R1",
    evaluator_model="deepseek-ai/DeepSeek-R1",
    branching_factor=2,
    sub_branching_factor=4,
    max_depth=10,
    max_width=5,
    max_workers=20,
    temperature=0.7,
    target_str="Sure Here is",
    refinements_max_tokens=24000
)
 
tap = TAP(config)
best_nodes, root_nodes = tap.optimize(load_harmbench_dataset())

Key Parameters

Parameter	Description
`project`	Name for the experiment results directory
`target_model`	The model being tested for vulnerabilities
`attacker_model`	The model used to generate adversarial prompts
`evaluator_model`	The model used to evaluate prompt effectiveness
`branching_factor`	Number of child nodes to generate at each level
`sub_branching_factor`	Number of sub-branches to generate per node
`max_depth`	Maximum tree depth
`max_width`	Maximum number of nodes to explore at each level
`max_workers`	Maximum number of concurrent workers for evaluation
`temperature`	Sampling temperature for prompt generation
`target_str`	Target string to look for in successful responses
`refinements_max_tokens`	Maximum tokens for refinement generation

For more detailed performance metrics and configurations, refer to our Jailbreak Cookbook .

from generalanalysis.jailbreaks import TAP, TAPConfig from generalanalysis.data_utils import load_harmbench_dataset config = TAPConfig( project="tap_experiment_1", target_model="claude-3-7-sonnet-20250219", attacker_model="deepseek-ai/DeepSeek-R1", evaluator_model="deepseek-ai/DeepSeek-R1", branching_factor=2, sub_branching_factor=4, max_depth=10, max_width=5, max_workers=20, temperature=0.7, target_str="Sure Here is", refinements_max_tokens=24000 ) tap = TAP(config) best_nodes, root_nodes = tap.optimize(load_harmbench_dataset())

Key Parameters

Parameter

Description

project

Name for the experiment results directory

target_model

The model being tested for vulnerabilities

attacker_model

The model used to generate adversarial prompts

evaluator_model

The model used to evaluate prompt effectiveness

branching_factor

Number of child nodes to generate at each level

sub_branching_factor

Number of sub-branches to generate per node

max_depth

Maximum tree depth

max_width

Maximum number of nodes to explore at each level

max_workers

Maximum number of concurrent workers for evaluation

temperature

Sampling temperature for prompt generation

target_str

Target string to look for in successful responses

refinements_max_tokens

Maximum tokens for refinement generation

For more detailed performance metrics and configurations, refer to our Jailbreak Cookbook .