Safety

Guardrails

Programmatic constraints that prevent an AI application from producing or acting on harmful outputs.

Full Definition

Guardrails are safety controls layered around an LLM application to constrain its behaviour within acceptable boundaries. They operate at multiple levels: input validation (filtering harmful or out-of-scope requests before they reach the model), output validation (checking responses against safety classifiers or rule engines before delivery), and action constraints (limiting what tools or APIs an agent can invoke). Frameworks like NeMo Guardrails, Guardrails.ai, and Llama Guard provide structured ways to define and enforce these constraints. Guardrails are complementary to, not a replacement for, model-level safety training — they catch the cases that slip through.
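The input- and output-validation layers described above can be sketched as a simple wrapper around a model call. This is a minimal, hypothetical illustration — the blocked patterns, refusal messages, and `guarded_call` helper are placeholder assumptions, not the API of any of the frameworks named here.

```python
import re

# Illustrative placeholder rules — a real deployment would use a safety
# classifier or a framework such as NeMo Guardrails or Llama Guard.
BLOCKED_INPUT_PATTERNS = [r"(?i)ignore (all )?previous instructions"]
BLOCKED_OUTPUT_PATTERNS = [r"(?i)\b(diagnos(e|is)|prescri(be|ption))\b"]

def validate_input(prompt: str) -> bool:
    """Input rail: reject prompts matching known-bad patterns."""
    return not any(re.search(p, prompt) for p in BLOCKED_INPUT_PATTERNS)

def validate_output(response: str) -> str:
    """Output rail: replace responses that drift into disallowed topics."""
    if any(re.search(p, response) for p in BLOCKED_OUTPUT_PATTERNS):
        return "I can't help with medical advice. Please consult a professional."
    return response

def guarded_call(prompt: str, model_fn) -> str:
    """Run a model call with input and output rails around it."""
    if not validate_input(prompt):
        return "Sorry, I can't help with that request."
    return validate_output(model_fn(prompt))
```

Note that both rails run outside the model: even a jailbroken or misbehaving model cannot bypass them, which is why they complement rather than replace safety training.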

Examples

1. Using Guardrails.ai to define a topic rail that detects when a customer service bot's response veers into medical advice and redirects to a disclaimer.

2. An agent framework that checks every proposed tool call against an allowlist before execution, preventing the agent from invoking destructive database operations.
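The second example — an action-constraint rail — can be sketched as an allowlist check in front of a tool registry. The tool names and the `ToolCall` shape are assumptions for illustration, not part of any specific agent framework.

```python
from dataclasses import dataclass

# Only read-only tools are permitted; destructive operations
# (e.g. anything that writes to or drops from a database) are absent.
ALLOWED_TOOLS = {"search_docs", "read_record"}

@dataclass
class ToolCall:
    name: str
    args: dict

def execute(call: ToolCall, registry: dict) -> str:
    """Action rail: refuse any tool call not on the allowlist."""
    if call.name not in ALLOWED_TOOLS:
        raise PermissionError(f"Tool '{call.name}' is not on the allowlist")
    return registry[call.name](**call.args)
```

Because the check happens before dispatch, the agent can propose whatever it likes — only allowlisted calls ever reach a real side effect.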


Related Terms

- Content Moderation
- AI Safety
- Responsible AI