Public Guard Lineup

The GA Guard open models bring the same hardened safety stack we run for enterprise customers to every builder. Each guard is trained on policy-driven synthetic data, red-team traces, and iterative adversarial rounds so it stays reliable under paraphrases, translations, and other distribution shifts. All models natively support contexts of up to 256k tokens and plug directly into the GA Guard SDK.

Read the full launch breakdown in the GA Guard series blog post, then grab the checkpoints below to self-host or deploy through the GA platform.

Guard Core

Our default guardrail, up to 15x faster than cloud providers, balancing robustness and latency for most stacks.

Guard Lite

Up to 25x faster than cloud providers, with minimal hardware requirements, while still outperforming major cloud offerings.

Guard Thinking

Our best-performing guard for high-risk domains, hardened with aggressive adversarial training.

Benchmark the guards

Replicate our benchmark runs or plug the models into your own evaluation pipeline using the public datasets:

GA Long Context Bench stress-tests moderation across multi-hundred-kilobyte transcripts.
GA Jailbreak Bench measures resilience to prompt-injection and jailbreak attempts.

Both datasets include evaluation scripts so you can reproduce our scoring or extend the suites with custom adversarial prompts.
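As a rough illustration of the kind of scoring you might reproduce, the sketch below computes precision, recall, and F1 for binary flag decisions. It is not the evaluation script shipped with the datasets; the function name `f1_report` and the label format (1 = should be flagged, 0 = benign) are assumptions made for this example.

```python
def f1_report(y_true, y_pred):
    """Compute precision, recall, and F1 for binary moderation labels.

    Hypothetical helper: y_true holds the dataset labels and y_pred the
    guard's decisions, both as 0/1 integers (1 = flagged).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # correctly flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # benign but flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # harmful but missed

    # Guard against division by zero when a class is absent.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0

    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    labels = [1, 1, 0, 0, 1]       # toy ground truth
    decisions = [1, 0, 0, 1, 1]    # toy guard output
    print(f1_report(labels, decisions))
```

Reporting precision alongside F1 keeps the false-positive rate visible, which matters when you extend the suites with your own adversarial prompts.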

Policy taxonomy

Each public guard returns granular policy labels aligned to widely adopted compliance anchors so you can trace moderation outcomes to audit controls:

PII & IP – Detect personal data, secrets, and copyrighted material.
Illicit Activities – Block operational guidance on crime, weapons, or illegal substances.
Hate – Capture hateful or harassing content targeting protected classes.
Sexual Content – Filter explicit, exploitative, or non-consensual material.
Prompt Security – Stop jailbreaks, prompt injection, and secret-exfiltration attempts.
Violence & Self-Harm – Intercept instructions, glorification, or graphic depictions of harm.
Misinformation – Flag demonstrably false narratives in civic, health, or safety contexts.

Use the GA Guard SDK to combine policy scores, tune thresholds, or add your own allow/deny overrides while retaining the base taxonomy in your logs.
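The threshold-and-override logic described above can be sketched in plain Python. This does not use the GA Guard SDK, whose API is not shown here; the `decide` function, the policy keys, the 0.5 default threshold, and the meaning given to "allow" and "deny" are all assumptions chosen for illustration.

```python
DEFAULT_THRESHOLD = 0.5  # assumed default, not a documented SDK value


def decide(scores, thresholds=None, overrides=None):
    """Turn per-policy scores into a block/allow decision.

    Hypothetical sketch:
      scores     -- dict of policy name -> score in [0, 1] reported by a guard
      thresholds -- optional per-policy thresholds (fallback: DEFAULT_THRESHOLD)
      overrides  -- optional dict of policy name -> "allow" or "deny"
                    "allow": never block on this policy
                    "deny":  block whenever the policy is reported at all
    """
    thresholds = thresholds or {}
    overrides = overrides or {}

    triggered = []
    for policy, score in scores.items():
        action = overrides.get(policy)
        if action == "allow":
            continue  # exempt from blocking, but still kept in the log below
        if action == "deny" or score >= thresholds.get(policy, DEFAULT_THRESHOLD):
            triggered.append(policy)

    return {
        "blocked": bool(triggered),
        "triggered": sorted(triggered),
        "scores": dict(scores),  # full taxonomy scores retained for audit logs
    }


if __name__ == "__main__":
    result = decide(
        scores={"hate": 0.2, "prompt_security": 0.7, "sexual_content": 0.9},
        thresholds={"prompt_security": 0.6},
        overrides={"sexual_content": "allow"},
    )
    print(result)
```

Keeping the raw scores in the returned record means an override changes the decision without hiding which policies the guard actually reported.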

Performance snapshot

GA Guard Thinking tops public benchmarks with F1 scores up to 0.983 while keeping false positives low.
GA Guard Lite runs with up to 25x lower latency than major cloud filters yet still beats them on OpenAI Moderation, WildGuard, and HarmBench.
GA Guard Core outperforms AWS, Azure, and Vertex by double-digit F1 margins on jailbreak and harmful-content suites.
All models support contexts up to 256k tokens, covering agent transcripts, documents, and tool logs without sharding.

Grab the checkpoints, run the evaluation suites, and drop the models into your GA SDK integration for fast, production-ready moderation.
