Bijection Learning leverages in-context learning to teach models a custom encoding scheme, then uses this encoding to bypass content filters. This technique exploits the model’s ability to learn and apply a mapping between plain English and an obfuscated symbolic representation within the same context.

The method defines an invertible mapping (a bijection) between natural language and an encoded representation. The model is first taught this mapping through in-context examples, and harmful queries are then posed in the encoded format. Because the encoded text no longer resembles ordinary language, it can slip past safety filters that key on natural-language patterns.
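To make "invertible mapping" concrete, here is a minimal sketch of a digit-style encoding with fixed points, applied to a harmless string. It is an illustration only and not the library's implementation: the helper names (`build_digit_bijection`, `encode`, `decode`), the `_` token used for spaces, and the restriction to lowercase letters and spaces are all assumptions made for this example. The `fixed_size` and `num_digits` arguments are meant to mirror the configuration options of the same name, but their exact semantics in the library may differ.

```python
import random
import string


def build_digit_bijection(fixed_size=10, num_digits=2, seed=0):
    """Hypothetical helper: map each lowercase letter either to itself
    (a fixed point) or to a unique zero-padded digit string."""
    rng = random.Random(seed)
    letters = list(string.ascii_lowercase)
    fixed = set(rng.sample(letters, fixed_size))
    # Distinct codes keep the mapping one-to-one. This raises ValueError if
    # num_digits is too small to give every non-fixed letter its own code.
    codes = iter(rng.sample(range(10 ** num_digits), len(letters) - fixed_size))
    mapping = {
        ch: ch if ch in fixed else f"{next(codes):0{num_digits}d}"
        for ch in letters
    }
    mapping[" "] = "_"  # assumption: spaces get their own token
    return mapping


def encode(text, mapping, delimiter=" "):
    # Assumes text contains only letters and spaces; input is lowercased.
    return delimiter.join(mapping[ch] for ch in text.lower())


def decode(encoded, mapping, delimiter=" "):
    if not encoded:
        return ""
    inverse = {token: ch for ch, token in mapping.items()}
    return "".join(inverse[token] for token in encoded.split(delimiter))


mapping = build_digit_bijection(fixed_size=10, num_digits=2)
message = "hello world"
encoded = encode(message, mapping)
print(encoded)
print(decode(encoded, mapping))
```

Because every token is unique, decoding recovers the original text exactly. A larger `fixed_size` leaves more letters unchanged, which makes the encoding easier to learn but less obfuscated.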

```python
from generalanalysis.jailbreaks import BijectionLearning, BijectionLearningConfig
from generalanalysis.data_utils import load_harmbench_dataset

config = BijectionLearningConfig(
    exp_name="bijection_test",
    victim_model="claude-3-7-sonnet-20250219",
    trials=20,
    bijection_type="digit",
    fixed_size=10,
    num_digits=2,
    safety_data="harmbench",
    universal=False,
    digit_delimiter="  ",
    interleave_practice=False,
    context_length_search=False,
    prefill=False,
    num_teaching_shots=10,
    num_multi_turns=0,
    input_output_filter=False
)

bijection = BijectionLearning(config)
results = bijection.optimize(load_harmbench_dataset())
```

Key Parameters

| Parameter | Description |
| --- | --- |
| `exp_name` | Name for the experiment results |
| `victim_model` | Target model to jailbreak |
| `trials` | Attack budget (number of attempts) |
| `bijection_type` | Type of encoding (`"digit"`, `"letter"`, `"word"`) |
| `fixed_size` | Number of fixed points in the encoding |
| `num_digits` | Number of digits for digit-based encoding |
| `safety_data` | Dataset to use for testing |
| `universal` | Whether to use a universal attack |
| `digit_delimiter` | Delimiter for digit-based encoding |
| `interleave_practice` | Whether to interleave practice examples |
| `context_length_search` | Whether to search for optimal context length |
| `prefill` | Whether to prefill the victim model |
| `num_teaching_shots` | Number of examples to teach the encoding |
| `num_multi_turns` | Number of practice conversation turns |
| `input_output_filter` | Whether to use input/output filtering |

For detailed performance metrics and configurations, refer to our Jailbreak Cookbook.