BlackBoxModel
BlackBoxModel provides a standardized interface for interacting with models hosted via APIs like OpenAI, Anthropic, and Together.ai. This class handles authentication, retry logic, and batch processing for efficient model querying.
API-Based Model Interaction Patterns
When working with cloud-hosted language models for red teaming, the interaction pattern differs from local model inference in important ways. Every query involves a network round-trip, authentication, and potential rate limiting. The BlackBoxModel class abstracts these concerns so that your evaluation code focuses on the red teaming logic rather than API plumbing.
Under the hood, BlackBoxModel automatically determines which API client to use based on the model name you provide. OpenAI models route through the OpenAI API, Anthropic models route through the Anthropic API, and models with a / separator in their name (like meta-llama/Llama-3.3-70B-Instruct-Turbo) route through Together.ai. This routing is transparent—you simply provide the model name and the class handles everything else.
Authentication is managed through environment variables. Set OPENAI_API_KEY, ANTHROPIC_API_KEY, or TOGETHER_API_KEY depending on which providers you plan to use. The model will raise a clear error at initialization time if the required key is missing, rather than failing on the first query.
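The routing and key-check behavior described above can be pictured with a small sketch. This is not the library's code: the function names, the prefix heuristic for Anthropic models, and the error type are assumptions made for illustration, and the real class may use explicit model lists instead.

```python
import os

# Environment variable names are taken from the text above; the mapping itself is hypothetical.
PROVIDER_ENV_VARS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "together": "TOGETHER_API_KEY",
}


def infer_provider(model_name: str) -> str:
    """Guess the provider from a model name (illustrative heuristic only)."""
    if "/" in model_name:
        return "together"   # e.g. meta-llama/Llama-3.3-70B-Instruct-Turbo
    if model_name.startswith("claude"):
        return "anthropic"  # e.g. claude-3-7-sonnet-20250219
    return "openai"         # e.g. gpt-4o, o3-mini, text-embedding-3-small


def require_api_key(model_name: str, env=os.environ) -> str:
    """Return the env var name for the model's provider, or raise if it is unset."""
    var = PROVIDER_ENV_VARS[infer_provider(model_name)]
    if not env.get(var):
        raise RuntimeError(f"{var} must be set to use {model_name}")
    return var
```

Running a check like this before a long evaluation surfaces a missing key immediately, which mirrors the fail-at-initialization behavior described above.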
Constructor
from generalanalysis.boiler_room import BlackBoxModel

model = BlackBoxModel(
    model_name="gpt-4o",
    system_prompt=None,  # Optional
    max_retries=5,       # Optional
    retry_delay=10.0,    # Optional
)

Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| model_name | string | (Required) | Name of the model (e.g., "gpt-4o", "claude-3-7-sonnet-20250219") |
| system_prompt | string | None | Default system prompt to use for all queries |
| max_retries | int | 5 | Maximum number of retry attempts for failed queries |
| retry_delay | float | 10.0 | Delay in seconds between retry attempts |
Rate Limiting and Retry Strategies
API rate limits are the most common source of failures in high-throughput red teaming evaluations. Each provider imposes limits on requests per minute, tokens per minute, or both. When you hit a rate limit, the API returns an error rather than a response.
The max_retries and retry_delay parameters control how the model handles these failures. With the default settings (5 retries, 10-second delay), a rate-limited query is retried up to 5 times with 10 seconds between attempts, which covers up to 50 seconds of temporary unavailability (5 retries at 10 seconds each). For large batch evaluations against models with strict rate limits (particularly Anthropic models), consider increasing retry_delay to 15-30 seconds.
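To make the arithmetic concrete, here is a minimal sketch of a fixed-delay retry loop. It is not the library's implementation. The helper name call_with_retries and the injectable sleep argument are hypothetical, and it assumes max_retries counts retries after the first attempt.

```python
import time


def call_with_retries(fn, max_retries=5, retry_delay=10.0, sleep=time.sleep):
    """Call fn(); on failure, wait retry_delay seconds and try again.

    Makes one initial attempt plus up to max_retries retries (an assumption
    about how the count is interpreted), then re-raises the last error.
    """
    last_error = None
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception as exc:  # a real client would catch narrower error types
            last_error = exc
            if attempt < max_retries:
                sleep(retry_delay)  # fixed delay, no exponential backoff
    raise last_error
```

With the defaults, a call that never succeeds sleeps five times for 10 seconds each before the final error propagates, which is where the 50-second figure comes from.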
If you are running evaluations that issue hundreds of queries in rapid succession, the most effective strategy is to reduce max_threads in query_parallel rather than relying on retries. Proactive rate management (sending fewer concurrent requests) is cheaper and more reliable than reactive retry logic.
Methods
query
Sends a prompt to the model and retrieves the generated response.
response = model.query(
    prompt="Explain quantum computing",
    system_prompt=None,  # Optional; overrides the default if provided
    temperature=0.7,     # Optional
    max_tokens=2048,     # Optional
)

Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| prompt | string | (Required) | The text prompt to send to the model |
| system_prompt | string | None | System prompt for this specific query |
| temperature | float | 0 | Controls randomness (0.0 = deterministic, 1.0 = creative) |
| max_tokens | int | 2048 | Maximum number of tokens to generate |
| message_history | List[Dict] | [] | Message history for chat context |
The temperature parameter defaults to 0 rather than the 1.0 that most API clients use by default. This is a deliberate choice for red teaming: near-deterministic outputs keep evaluations as reproducible as the provider allows (hosted APIs do not guarantee identical outputs even at temperature 0). When you need diversity (for example, when using a model as an attacker that should generate varied candidates), explicitly set a higher temperature.
The message_history parameter enables multi-turn conversations by passing the full chat history to the API. Each entry should be a dictionary with role (either "user" or "assistant") and content fields. This is used by the Crescendo jailbreak method to maintain conversation state across attack rounds.
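As an illustration of the expected shape, the sketch below builds a two-turn history from plain dictionaries. The append_turn helper and the role validation are hypothetical conveniences, not part of BlackBoxModel. The commented-out call assumes query accepts message_history as a keyword argument, as the parameter table indicates.

```python
from typing import Dict, List

VALID_ROLES = {"user", "assistant"}


def append_turn(history: List[Dict[str, str]], role: str, content: str) -> List[Dict[str, str]]:
    """Return a new history list with one more {"role", "content"} entry."""
    if role not in VALID_ROLES:
        raise ValueError(f"role must be one of {sorted(VALID_ROLES)}, got {role!r}")
    return history + [{"role": role, "content": content}]


history: List[Dict[str, str]] = []
history = append_turn(history, "user", "What is a qubit?")
history = append_turn(history, "assistant", "A qubit is the basic unit of quantum information.")

# With a BlackBoxModel instance, the next turn would then be sent as:
# response = model.query(prompt="How does it differ from a bit?", message_history=history)
```

Returning a new list on each call keeps earlier snapshots of the conversation intact, which can be useful when an attack branches from the same prefix.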
Returns
A string containing the model’s response.
query_parallel
Sends multiple prompts to the model in parallel for efficiency.
responses = model.query_parallel(
    prompts=["Tell me about Mars", "Tell me about Venus"],
    max_threads=50,      # Optional
    show_progress=True,  # Optional
    system_prompt=None,  # Optional
    temperature=0.7,     # Optional
)

Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| prompts | List[string] | (Required) | List of text prompts to send to the model |
| max_threads | int | 50 | Maximum number of parallel threads to use |
| show_progress | bool | True | Whether to display a progress bar |
| system_prompt | string | None | System prompt for all queries |
| temperature | float | 0 | Controls randomness |
| max_tokens | int | 2048 | Maximum number of tokens to generate |
| message_history | List[Dict] | [] | Message history for chat context |
When to Use Parallel vs. Sequential Queries
Use query_parallel whenever you have multiple independent prompts that do not depend on each other’s responses. Common scenarios include:
- Population evaluation in genetic algorithm-based attacks, where all candidate prompts in a generation need to be scored against the target model.
- Batch jailbreak testing where you are running the same attack against many different goals.
- Embedding computation for strategy libraries, where many text snippets need to be embedded in a single operation.
Use sequential query calls when each prompt depends on the response to the previous one—for example, in multi-turn attack conversations where the next question depends on the model’s last answer.
The max_threads parameter should be tuned based on the provider’s rate limits. As a starting point: OpenAI models tolerate 30-50 parallel threads, Together.ai models handle 20-40, and Anthropic models are safest at 10-20. If you see frequent rate limit errors in the progress bar output, reduce this value.
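One way to encode these starting points is a small lookup with a back-off step. The ranges come from the paragraph above. The helper names and the halving policy are assumptions for illustration, not library features.

```python
# (low, high) parallel-thread ranges per provider, taken from the guidance above.
THREAD_RANGES = {
    "openai": (30, 50),
    "together": (20, 40),
    "anthropic": (10, 20),
}


def starting_max_threads(provider: str, conservative: bool = True) -> int:
    """Pick the low end of the range by default, the high end otherwise."""
    low, high = THREAD_RANGES[provider]
    return low if conservative else high


def back_off(max_threads: int, floor: int = 1) -> int:
    """Halve the thread count after repeated rate-limit errors, never below floor."""
    return max(floor, max_threads // 2)
```

Starting at the low end and only raising the value once a run completes without rate-limit errors tends to waste less time than starting high and relying on retries.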
Returns
A list of strings containing the model’s responses in the same order as the input prompts.
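The order-preserving behavior can be reproduced with the standard library. The sketch below is an assumption about one reasonable way to get it, not a description of how query_parallel is implemented, and map_in_order is a hypothetical name.

```python
from concurrent.futures import ThreadPoolExecutor


def map_in_order(fn, prompts, max_threads=50):
    """Apply fn to every prompt on a thread pool; results follow input order."""
    if not prompts:
        return []
    workers = min(max_threads, len(prompts))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in the order of the inputs,
        # regardless of which call finishes first.
        return list(pool.map(fn, prompts))
```

Because results line up with inputs, callers can safely zip prompts with responses, as the parallel processing example later on this page does.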
embed
Generates embeddings (vector representations) for text.
embedding = model.embed(
    text="Text to embed",
    batch_size=100,  # Optional
)

Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| text | string or List[string] | (Required) | Text to generate embeddings for |
| batch_size | int | 100 | Maximum batch size for processing |
Returns
A list of float values (embedding vector) if a single text is provided, or a list of embedding vectors (list of lists) if multiple texts are provided.
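The interplay between batch_size and the two return shapes can be sketched as follows. This is illustrative only: embed_in_batches and the embed_batch callable (a stand-in for one API request) are hypothetical, and the library's internal batching may differ.

```python
from typing import Callable, List, Sequence, Union

Vector = List[float]


def embed_in_batches(
    text: Union[str, Sequence[str]],
    embed_batch: Callable[[List[str]], List[Vector]],
    batch_size: int = 100,
) -> Union[Vector, List[Vector]]:
    """Embed one string or many, sending at most batch_size texts per request."""
    single = isinstance(text, str)
    texts = [text] if single else list(text)
    vectors: List[Vector] = []
    for start in range(0, len(texts), batch_size):
        # Each slice corresponds to one request to the embedding endpoint.
        vectors.extend(embed_batch(texts[start:start + batch_size]))
    # Single string in, single vector out; list in, list of vectors out.
    return vectors[0] if single else vectors
```

Callers that always want a list of vectors can wrap a single string in a list before calling, which avoids branching on the return type.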
embed_parallel
Generates embeddings for multiple texts in parallel.
embeddings = model.embed_parallel(
    texts=["Text one", "Text two"],
    batch_size=100,      # Optional
    max_threads=10,      # Optional
    show_progress=True,  # Optional
)

Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| texts | List[string] | (Required) | List of texts to generate embeddings for |
| batch_size | int | 100 | Maximum batch size for processing |
| max_threads | int | 10 | Maximum number of parallel threads to use |
| show_progress | bool | True | Whether to display a progress bar |
Returns
A list of embedding vectors (list of lists of float values), one for each input text.
Embedding Use Cases in Red Teaming
Embeddings play a specialized but important role in adversarial prompt generation. The StrategyAttackGenerator uses embeddings to implement semantic similarity search over its strategy library—when generating a new adversarial prompt, it embeds the current goal and finds the most similar previously successful strategies.
Only OpenAI embedding models (text-embedding-3-small, text-embedding-3-large, text-embedding-ada-002) are supported for the embed and embed_parallel methods. Initialize a separate BlackBoxModel instance with an embedding model name when you need embedding functionality:
embedding_model = BlackBoxModel("text-embedding-3-small")
vectors = embedding_model.embed_parallel(["strategy one", "strategy two", "strategy three"])

The text-embedding-3-small model offers the best balance of quality and cost for strategy library operations. Use text-embedding-3-large if you need higher-dimensional embeddings for large strategy libraries where retrieval precision is critical.
Supported Models
The BlackBoxModel class automatically determines the appropriate API to use based on the model name:
OpenAI Models
- gpt-4o, gpt-4o-mini, gpt-4-turbo
- gpt-4.5-preview-2025-02-27
- gpt-4-0125-preview, gpt-4-0613
- gpt-3.5-turbo
- o1, o1-mini, o3-mini, o3-mini-2025-01-31
- Embedding models: text-embedding-3-small, text-embedding-3-large, text-embedding-ada-002
Anthropic Models
- claude-3-7-sonnet-20250219
- claude-3-5-sonnet-20241022, claude-3-5-sonnet-20240620
- claude-3-5-haiku-20241022
- claude-3-sonnet-20240229
Together.ai Hosted Models
- meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo
- meta-llama/Llama-3.3-70B-Instruct-Turbo
- meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo
- mistralai/Mistral-Small-24B-Instruct-2501
- mistralai/Mixtral-8x22B-Instruct-v0.1
- deepseek-ai/DeepSeek-R1, deepseek-ai/DeepSeek-R1-Distill-Llama-70B
- databricks/dbrx-instruct
- Qwen/Qwen2.5-7B-Instruct-Turbo
- google/gemma-2-27b-it
Error Handling
The BlackBoxModel implements robust error handling:
- Automatic retries for temporary API failures
- Logging of error details for debugging
- Graceful degradation for rate limiting
When all retries are exhausted for a single query, the model raises an exception with details about the failure, so evaluation pipelines fail loudly rather than silently returning incomplete results. In batch operations (query_parallel), individual failures are logged but do not abort the batch; the remaining prompts continue processing.
Examples
Basic Query
from generalanalysis.boiler_room import BlackBoxModel

model = BlackBoxModel("claude-3-7-sonnet-20250219")
response = model.query("Summarize the key points of quantum computing")
print(response)

Parallel Processing
questions = [
    "What is quantum computing?",
    "What are qubits?",
    "Explain quantum entanglement",
]
model = BlackBoxModel("meta-llama/Llama-3.3-70B-Instruct-Turbo")
answers = model.query_parallel(
    prompts=questions,
    max_threads=3,
    temperature=0.5,
)
# Responses come back in the same order as the prompts, so zip pairs them correctly.
for question, answer in zip(questions, answers):
    print(f"Q: {question}")
    print(f"A: {answer}")
    print()

Semantic Search with Embeddings
model = BlackBoxModel("text-embedding-3-small")
documents = [
    "Quantum computing uses quantum bits or qubits.",
    "Classical computing uses binary digits or bits.",
    "Superposition allows qubits to exist in multiple states simultaneously.",
]
query = "How are quantum computers different?"
query_embedding = model.embed(query)
document_embeddings = model.embed_parallel(documents)
# Now you can compare embeddings to find the most relevant documents

Cosine similarity is the standard metric for comparing embedding vectors. After computing embeddings, use numpy or scipy to calculate similarity scores and rank documents by relevance. This pattern is used internally by the strategy algorithm to retrieve relevant attack strategies from the library.
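A minimal sketch of that ranking step is shown below. It uses plain Python to stay dependency-free; numpy.dot and numpy.linalg.norm would express the same computation more compactly. The function names and the three-dimensional toy vectors are made up for illustration and are far smaller than real embeddings.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec: Sequence[float], doc_vecs: List[Sequence[float]]) -> List[int]:
    """Return document indices sorted from most to least similar to the query."""
    scores = [cosine_similarity(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)


# Toy stand-ins for query_embedding and document_embeddings.
toy_query = [1.0, 0.0, 1.0]
toy_docs = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0], [1.0, 1.0, 1.0]]
ranking = rank_by_similarity(toy_query, toy_docs)  # [0, 2, 1]
```

In the example above, passing query_embedding and document_embeddings in place of the toy vectors would give the indices of documents in order of relevance.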