Guardrails Overview

Guardrail types

Our safety stack ships in two guard families, both trained on the same adversarial pipelines we run for production deployments and both fully supported by the GA SDK:

Public guards – Open checkpoints published on Hugging Face that you can self-host, deploy through the GA platform, or invoke directly through the SDK.
Custom enterprise guards – Bespoke detectors we train against your policies, red-team data, and compliance thresholds, delivered through managed endpoints and the same SDK integration path.

Public and custom guards can be mixed within a single workflow, so teams often start with the open lineup, then layer custom rules as policy depth increases.
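
As a rough illustration of mixing guard families in one workflow, the sketch below runs a list of guards over the same input and blocks if any of them fires. Everything in it is an assumption made for illustration: the guard IDs, the `GuardResult` shape, and the keyword-matching stand-ins are hypothetical and are not the GA SDK's actual API. A real guard would call a model or managed endpoint instead.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class GuardResult:
    guard_id: str   # hypothetical identifier, not a real GA guard ID
    blocked: bool
    label: str


# A guard is modeled as any callable from text to GuardResult.
Guard = Callable[[str], GuardResult]


def keyword_guard(guard_id: str, label: str, keywords: List[str]) -> Guard:
    """Build a toy guard that flags text containing any lowercase keyword."""
    def check(text: str) -> GuardResult:
        hit = any(k in text.lower() for k in keywords)
        return GuardResult(guard_id, hit, label if hit else "pass")
    return check


def run_guards(text: str, guards: List[Guard]) -> List[GuardResult]:
    """Evaluate every guard on the same text and keep each verdict."""
    return [guard(text) for guard in guards]


def is_blocked(results: List[GuardResult]) -> bool:
    """Block when at least one guard, public or custom, flags the text."""
    return any(r.blocked for r in results)


# One stand-in for a public guard, one for a custom enterprise guard.
public = keyword_guard("public-demo", "prompt_injection",
                       ["ignore previous instructions"])
custom = keyword_guard("custom-demo", "internal_codename", ["project falcon"])
```

Because both stand-ins return the same result type, the aggregation step does not need to know which family a guard belongs to.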

Why teams rely on GA Guard

Adversarially trained: Iterative red-teaming, stress testing, and retraining cycles keep performance stable under distribution shifts.
Long-context native: Moderate agent traces, documents, and tool logs without sharding, thanks to 256k-token context support.
Low noise: Classifiers maintain high precision so downstream automation, routing, and analytics stay trustworthy.
Deployment ready: Drop-in SDK clients and managed endpoints with consistent schemas across every guard, whether you’re using the open checkpoints or custom enterprise models.

How we harden guardrails

We train every guard in the GA series using the same adversarial pipeline that underpins our enterprise deployments:

Blend policy-driven synthetic datasets with real red-team captures so models generalize to novel jailbreak templates, translations, and obfuscation tactics.
Stress-test and retrain in iterative cycles, folding new attack traces into the corpus to minimize regressions.
Calibrate thresholds to hold both recall and precision, keeping false positives low enough for workflow automations.

Because public and custom guards share this pipeline, teams get consistent behavior across open checkpoints and bespoke detectors.
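
The threshold calibration step can be pictured with a small sketch. It assumes a guard emits a score per example and that a labeled validation set is available; it then picks the threshold with the highest recall among those that meet a minimum precision. The function names and the selection rule are illustrative assumptions, not a description of the actual GA calibration procedure.

```python
from typing import List, Optional, Tuple


def precision_recall(scores: List[float], labels: List[int],
                     threshold: float) -> Tuple[float, float]:
    """Precision and recall when every score >= threshold is flagged."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s >= threshold and y == 1)
    fp = sum(1 for s, y in pairs if s >= threshold and y == 0)
    fn = sum(1 for s, y in pairs if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def calibrate_threshold(scores: List[float], labels: List[int],
                        min_precision: float) -> Optional[float]:
    """Return the threshold with the best recall that still meets
    min_precision, or None if no candidate threshold qualifies."""
    best: Optional[Tuple[float, float]] = None  # (threshold, recall)
    for t in sorted(set(scores)):
        p, r = precision_recall(scores, labels, t)
        if p >= min_precision and (best is None or r > best[1]):
            best = (t, r)
    return best[0] if best else None


# Toy validation data: label 1 = violating, 0 = benign.
scores = [0.1, 0.4, 0.35, 0.8, 0.9]
labels = [0, 0, 1, 1, 1]
threshold = calibrate_threshold(scores, labels, min_precision=0.9)
```

Raising `min_precision` trades recall for fewer false positives, which is the balance that matters when guard decisions drive automated workflows.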

Coverage surfaces

Public guards ship with a canonical policy taxonomy that maps decisions to frameworks such as NIST AI RMF, ISO/IEC 42001, and the EU AI Act (see the Public Guards page for the full breakdown).
Custom enterprise guards extend that taxonomy with organization-specific rules while preserving the same evaluation schema for your dashboards and audits.

Need deeper coverage or localized policies? We tune thresholds and add bespoke labels through the same adversarial training pipeline, so public and custom guards remain interoperable.

Enterprise customization

When you need coverage beyond the public lineup, we train bespoke guards on your policy language, red-team traces, and historical incidents:

Translate written policies into machine-enforceable labels, including nuanced allow/deny edge cases.
Fold in proprietary datasets under strict privacy controls so the guard reflects real user behavior.
Deliver managed endpoints and evaluation reports that align to frameworks like NIST AI RMF, ISO/IEC 42001, and the EU AI Act for audit readiness.

Custom and public guards share SDK semantics, so swapping IDs is all it takes to roll out new coverage.
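
A minimal sketch of why a shared schema makes the swap cheap: the downstream routing code reads only the common verdict fields, so changing the guard ID is the only edit. The registry, the `Verdict` fields, and the guard IDs below are hypothetical placeholders standing in for an SDK client; the real SDK's names and response shape may differ.

```python
from typing import Callable, Dict, TypedDict


class Verdict(TypedDict):
    guard_id: str
    flagged: bool
    score: float


# Hypothetical registry: each ID maps to a toy scoring function.
REGISTRY: Dict[str, Callable[[str], float]] = {
    "public-guard-demo":
        lambda text: 0.9 if "jailbreak" in text.lower() else 0.1,
    "custom-guard-demo":
        lambda text: 0.9 if "acme confidential" in text.lower() else 0.1,
}


def evaluate(guard_id: str, text: str, threshold: float = 0.5) -> Verdict:
    """Score text with the chosen guard and return the shared schema."""
    score = REGISTRY[guard_id](text)
    return {"guard_id": guard_id, "flagged": score >= threshold,
            "score": score}


def route(verdict: Verdict) -> str:
    """Downstream logic depends only on the shared fields."""
    return "review" if verdict["flagged"] else "allow"


text = "ACME confidential roadmap"
before = route(evaluate("public-guard-demo", text))
after = route(evaluate("custom-guard-demo", text))  # only the ID changed
```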

Implementation path

Start with the public guard lineup to explore the open checkpoints and download weights.
Walk through the GA Guard SDK guide to wire guard evaluations into your stack.
Pair guardrails with MCP Guard or Red Teaming for layered defense across tooling and monitoring.
