Development
General Analysis is an open-source project that welcomes contributions from safety researchers, ML engineers, and the broader AI community. Whether you’re implementing a new jailbreak method from a recent paper, improving an existing technique’s performance, or fixing a bug, this guide covers everything you need to go from a fresh clone to a merged pull request.
Prerequisites
Before setting up your development environment, make sure you have the following:
- Python 3.9+: GA supports Python 3.9 through 3.12. We recommend using a virtual environment or conda environment to isolate dependencies.
- Git: For version control and submitting pull requests.
- GPU (optional but recommended): Required for white-box methods like GCG and AutoDAN. A CUDA-capable GPU with at least 16 GB VRAM is recommended for working with 7B-parameter models.
- API keys: For black-box methods, you’ll need API keys for the model providers you want to test against (OpenAI, Anthropic, Together.ai). Set these as environment variables (OPENAI_API_KEY, ANTHROPIC_API_KEY, etc.).
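Before running a black-box method, it can help to confirm that the keys are actually visible to Python. The following is a minimal sketch, not part of GA: the helper name `missing_api_keys` is hypothetical, and only the two variable names shown come from this guide.

```python
import os
from typing import Iterable, List, Mapping, Optional

# Variable names taken from this guide; extend for other providers you use.
REQUIRED_KEYS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY")


def missing_api_keys(
    names: Iterable[str] = REQUIRED_KEYS,
    env: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Return the names that are unset or blank in the environment."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_api_keys()
    if missing:
        print("Missing API keys:", ", ".join(missing))
    else:
        print("All listed API keys are set.")
```

Passing `env` explicitly keeps the check easy to test without touching the real environment.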
Setup
Fork the repository on GitHub, then clone and install with development dependencies:
# Fork and clone the repository
git clone https://github.com/your-username/GA.git
cd GA
# Install with development dependencies
pip install -e ".[dev]"
The .[dev] extra installs testing tools, linters, and documentation build dependencies in addition to the core package. The -e flag installs in editable mode, so your code changes are reflected immediately without reinstalling.
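One way to sanity-check the install is to ask Python whether the package can be found. This is a small sketch using only the standard library; `is_importable` is a hypothetical helper, and `generalanalysis` is the import name used in the examples later in this guide.

```python
import importlib.util


def is_importable(package: str) -> bool:
    """Return True if a top-level package can be found on the Python path."""
    return importlib.util.find_spec(package) is not None


if __name__ == "__main__":
    print("generalanalysis importable:", is_importable("generalanalysis"))
```

If this prints False inside your virtual environment, the editable install most likely ran in a different environment.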
Project Architecture
Understanding how the codebase is organized makes it much easier to navigate and contribute effectively. GA follows a modular architecture where each major capability lives in its own subpackage with a well-defined interface.
GA/
├── docs/ # Documentation
├── src/generalanalysis/
│ ├── adversarial_candidate_generator/ # Prompt generation
│ ├── jailbreaks/ # Jailbreak methods
│ ├── boiler_room/ # Model interface
│ ├── loss/ # Loss functions
│ ├── data_utils/ # Data utilities
│ ├── utils/ # Common utilities
│ └── training/ # Training utilities
└── pyproject.toml # Package configuration
How the Modules Interact
The architecture follows a layered design where higher-level modules compose lower-level ones:
- boiler_room/ provides the model abstraction layer. BlackBoxModel wraps API calls to cloud providers with retry logic, rate limiting, and parallel query support. WhiteBoxModel loads Hugging Face models locally and exposes gradient computation. Every other module depends on boiler_room for model interactions.
- adversarial_candidate_generator/ contains the core prompt generation algorithms. These are the engines that jailbreak methods use internally: tree refinement, multi-turn dialogue generation, genetic crossover/mutation, and strategy-based generation. Each generator implements the AdversarialCandidateGenerator base class.
- jailbreaks/ contains the high-level jailbreak methods (TAP, GCG, AutoDAN, etc.). Each method orchestrates a specific optimization loop using generators from adversarial_candidate_generator/, models from boiler_room/, and scorers from loss/. All methods implement the JailbreakMethod base class.
- loss/ provides scoring and evaluation functions, including the RubricBasedScorer that assigns standardized scores to model responses based on a configurable rubric.
- data_utils/ handles loading and formatting evaluation datasets (HarmBench, etc.).
- utils/ contains shared utilities for logging, file I/O, and configuration management.
When adding a new feature, identify which layer it belongs to. A new attack technique goes in jailbreaks/. A new prompt generation strategy goes in adversarial_candidate_generator/. A new model provider goes in boiler_room/.
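The layering can be pictured with a toy sketch. Every class and function below is a hypothetical stand-in written for illustration; none of them is GA's actual API, and the real BlackBoxModel, generator, and scorer interfaces may look different.

```python
from typing import Callable, Dict, List


class EchoModel:
    """Stand-in for the model layer (the role boiler_room/ plays)."""

    def query(self, prompt: str) -> str:
        return f"response to: {prompt}"


class VariantGenerator:
    """Stand-in for the generator layer: proposes candidate prompts."""

    def generate_candidates(self, goal: str) -> List[str]:
        return [f"{goal} (variant {i})" for i in range(3)]


def length_score(response: str) -> int:
    """Stand-in for the scoring layer (the role loss/ plays)."""
    return len(response)


def run_method(
    goals: List[str],
    model: EchoModel,
    generator: VariantGenerator,
    scorer: Callable[[str], int],
) -> Dict[str, str]:
    """Stand-in for the method layer: composes the three lower layers."""
    best: Dict[str, str] = {}
    for goal in goals:
        candidates = generator.generate_candidates(goal)
        # Keep the candidate whose model response scores highest.
        best[goal] = max(candidates, key=lambda c: scorer(model.query(c)))
    return best
```

Because the method only sees `query`, `generate_candidates`, and a scorer callable, each layer can be replaced or mocked independently, which is the property the layered design is meant to give.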
Contribution Workflow
We follow a standard fork-and-pull-request workflow. Here’s the full process from start to finish:
- Fork the repository on GitHub and clone your fork locally.
- Create a feature branch: git checkout -b feature/your-feature. Use descriptive branch names that indicate the type of change (e.g., feature/bijection-word-encoding, fix/tap-pruning-edge-case, docs/gcg-parameter-guide).
- Implement your changes following the code style guide below.
- Write or update tests for any new functionality. Tests live alongside the code they test.
- Run the test suite locally to confirm your changes don’t break existing functionality.
- Submit a pull request against the main branch with a clear description of what you changed and why.
Adding a New Jailbreak Method
New jailbreak methods are the most common type of contribution. To integrate a new method into GA’s framework, follow this pattern:
- Create a directory for your method under src/generalanalysis/jailbreaks/. The directory should contain at least a main module file and a config dataclass.
- Extend JailbreakMethod: Your method must implement the optimize method, which accepts a list of goal strings and returns structured results.
- Define a config dataclass: Use a Python dataclass to define all configurable parameters. This makes the method self-documenting and enables serialization for experiment reproducibility.
- Use existing infrastructure: Leverage BlackBoxModel/WhiteBoxModel for model interactions, RubricBasedScorer for evaluation, and AdversarialCandidateGenerator subclasses for prompt generation whenever possible.
- Register in __init__.py: Add your method and config to the jailbreaks package exports so they can be imported with from generalanalysis.jailbreaks import YourMethod.
# src/generalanalysis/jailbreaks/your_method/
from generalanalysis.jailbreaks.base import JailbreakMethod
from typing import List, Dict, Any

class YourJailbreakMethod(JailbreakMethod):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        # Initialize your method-specific attributes

    def optimize(self, goals: List[str], **kwargs) -> Dict[str, Any]:
        # Implement optimization logic
        results = {}
        for goal in goals:
            # Your custom implementation here
            generated_prompts = ["Prompt 1", "Prompt 2"]
            results[goal] = generated_prompts
        return results

# Register in src/generalanalysis/jailbreaks/__init__.py
When implementing, ensure your method saves intermediate results to the project directory. This enables experiment resumption and post-hoc analysis using the evaluator module. Methods should also support the max_workers pattern for parallel execution where applicable.
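The config dataclass, the max_workers pattern, and saving intermediate results can be combined roughly as follows. This is a sketch under stated assumptions: `YourMethodConfig`, its field names, `optimize_goal`, `run`, and the output file names are all hypothetical and do not reflect how existing GA methods lay out their project directories.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from dataclasses import asdict, dataclass
from functools import partial
from pathlib import Path
from typing import Dict, List


@dataclass
class YourMethodConfig:
    """Hypothetical config; field names are illustrative, not GA's."""

    project: str = "your_method_run"
    max_iterations: int = 5
    max_workers: int = 4


def optimize_goal(goal: str, config: YourMethodConfig) -> List[str]:
    """Placeholder for the real per-goal optimization loop."""
    return [f"{goal} [iteration {i}]" for i in range(config.max_iterations)]


def run(
    goals: List[str], config: YourMethodConfig, out_dir: Path
) -> Dict[str, List[str]]:
    out_dir.mkdir(parents=True, exist_ok=True)
    # Store the config next to the results so the run can be reproduced.
    (out_dir / "config.json").write_text(json.dumps(asdict(config), indent=2))

    results: Dict[str, List[str]] = {}
    worker = partial(optimize_goal, config=config)
    with ThreadPoolExecutor(max_workers=config.max_workers) as pool:
        # Executor.map yields results in input order, so zip lines them up.
        for goal, prompts in zip(goals, pool.map(worker, goals)):
            results[goal] = prompts
            # Rewrite the file after each goal so an interruption loses little.
            (out_dir / "results.json").write_text(json.dumps(results, indent=2))
    return results
```

Threads suit this pattern when the per-goal work is dominated by waiting on API calls; a resumable version would additionally read an existing results.json and skip goals already present.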
Adding a New Generator
Adversarial candidate generators are the building blocks that jailbreak methods compose. If you’re implementing a new prompt refinement strategy, encoding scheme, or mutation operator, it should live in the generator module.
# src/generalanalysis/adversarial_candidate_generator/
from generalanalysis.adversarial_candidate_generator.base import AdversarialCandidateGenerator
from generalanalysis.jailbreaks import JailbreakMethod
from typing import List, Dict, Any

class YourGenerator(AdversarialCandidateGenerator):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        # Initialize your generator-specific attributes

    def generate_candidates(self, jailbreak_method_instance: JailbreakMethod, **kwargs) -> List[str]:
        # Implement generation logic
        candidates = []
        # Your custom implementation here
        return candidates
Generators should be stateless where possible: they receive context (current best prompt, last response, last score) as arguments and return new candidate prompts. This makes them composable and testable.
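To illustrate what stateless means in practice, here is a toy generator written as a pure function. The function name, the `REWRITE_STYLES` labels, and the score threshold are assumptions made for this sketch; a real generator would subclass AdversarialCandidateGenerator and typically call a model to produce its rewrites.

```python
from typing import List

# Hypothetical rewrite labels; a real generator would produce actual rewrites.
REWRITE_STYLES = ("formal", "concise", "step-by-step")


def generate_candidates(
    best_prompt: str,
    last_response: str,
    last_score: float,
    explore_below: float = 5.0,
) -> List[str]:
    """Return new candidates computed only from the arguments.

    No instance attributes or globals are mutated, so the same inputs
    always produce the same output.
    """
    # Illustrative rule: explore every style after a weak or empty result,
    # otherwise refine with the first style only.
    explore = last_score < explore_below or not last_response.strip()
    styles = REWRITE_STYLES if explore else REWRITE_STYLES[:1]
    return [f"[{style}] {best_prompt}" for style in styles]
```

Because all context arrives through the arguments, a unit test needs no setup beyond calling the function with fixed inputs.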
Testing Your Changes
Before submitting a pull request, verify that your changes work correctly:
- Unit tests: Test individual functions and classes in isolation. Mock API calls to avoid requiring live model access in CI.
- Integration tests: For new jailbreak methods, include at least one end-to-end test that runs the full optimization loop against a mock model or a small local model.
- Manual validation: For methods based on published papers, compare your implementation’s attack success rate against the results reported in the original paper to confirm correctness.
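Mocking the model in a unit test might look like the sketch below. It relies only on the standard library's unittest.mock; the `collect_responses` function and the `query` method name are hypothetical stand-ins for your own code and for whatever interface the model object exposes.

```python
from typing import Dict, List
from unittest.mock import Mock


def collect_responses(model, prompts: List[str]) -> Dict[str, str]:
    """Hypothetical code under test: queries the model once per prompt."""
    return {prompt: model.query(prompt) for prompt in prompts}


def test_collect_responses_queries_once_per_prompt():
    # The Mock replaces a real model object, so no network call is made.
    model = Mock()
    model.query.side_effect = lambda prompt: f"mocked reply to {prompt}"

    result = collect_responses(model, ["a", "b"])

    assert result == {"a": "mocked reply to a", "b": "mocked reply to b"}
    assert model.query.call_count == 2
```

pytest discovers the function through its test_ prefix, and the test runs in CI without API keys.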
Run the test suite with:
pytest tests/ -v
Code Style
GA follows consistent coding conventions to keep the codebase readable and maintainable:
- Follow PEP 8 guidelines for Python code
- Classes: CamelCase (e.g., TreeRefinementGenerator)
- Functions/methods: snake_case (e.g., generate_candidates)
- Line length: 88 characters max (consistent with Black formatter defaults)
- Docstrings: Google style with descriptions for all public methods
def function(arg1, arg2):
    """Short summary.

    Args:
        arg1: Description
        arg2: Description

    Returns:
        Description of return value
    """
Use type hints on all public function signatures. Internal helper functions should have type hints where the types are non-obvious. Avoid Any types when a more specific type is available.
Next Steps
- Read the LLM jailbreak methods overview to understand the taxonomy and architecture of existing methods
- Study a specific method implementation (TAP is a good starting point) to see how the framework components compose
- Browse open AI red teaming GitHub issues to find contribution opportunities
- Review the LLM jailbreak evaluator documentation to understand how results are scored