Guardrails
Programmatic constraints that prevent an AI application from producing or acting on harmful outputs.
Full Definition
Guardrails are safety controls layered around an LLM application to constrain its behaviour within acceptable boundaries. They operate at multiple levels: input validation (filtering harmful or out-of-scope requests before they reach the model), output validation (checking responses against safety classifiers or rule engines before delivery), and action constraints (limiting what tools or APIs an agent can invoke). Frameworks like NeMo Guardrails, Guardrails.ai, and Llama Guard provide structured ways to define and enforce these constraints. Guardrails are complementary to, not a replacement for, model-level safety training — they catch the cases that slip through.
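The first two layers can be sketched in plain Python. This is a minimal illustration, not an API of any framework named above: the regex rules stand in for the safety classifiers or rule engines a real deployment would use, and all names here are hypothetical.

```python
import re

# Hypothetical input rail: reject obvious prompt-injection attempts
# before the request ever reaches the model.
BLOCKED_INPUT = re.compile(r"ignore previous instructions|system prompt", re.I)

# Hypothetical output rail: flag responses that veer into medical advice.
MEDICAL_TERMS = re.compile(r"\b(dosage|diagnosis|prescription)\b", re.I)

DISCLAIMER = "I can't give medical advice; please consult a professional."

def check_input(user_message: str) -> bool:
    """Input validation: return False to block the request entirely."""
    return not BLOCKED_INPUT.search(user_message)

def check_output(model_response: str) -> str:
    """Output validation: replace out-of-scope responses with a disclaimer."""
    if MEDICAL_TERMS.search(model_response):
        return DISCLAIMER
    return model_response
```

In production, the regex checks would typically be replaced by a trained classifier (e.g. Llama Guard) scoring each message, with the same accept/block/rewrite control flow around it.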
Examples
Using Guardrails.ai to define a topic rail that detects if a customer service bot's response veers into medical advice and redirects to a disclaimer.
An agent framework that checks every proposed tool call against an allowlist before execution, preventing the agent from calling destructive database operations.
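The second example, an allowlist gate on tool calls, can be sketched as follows. This assumes the agent framework represents each proposed step as a tool name plus arguments; the tool names and helper are hypothetical.

```python
# Hypothetical allowlist of tools the agent may invoke. Anything not
# listed here (e.g. a destructive database operation) is refused.
ALLOWED_TOOLS = {"search_orders", "get_customer", "send_email"}

class BlockedToolError(Exception):
    """Raised when an agent proposes a tool call outside the allowlist."""

def guard_tool_call(tool_name: str, arguments: dict) -> None:
    """Action constraint: raise before execution if the tool is not allowlisted."""
    if tool_name not in ALLOWED_TOOLS:
        raise BlockedToolError(f"tool '{tool_name}' is not allowlisted")

def execute(tool_name: str, arguments: dict, registry: dict):
    """Gate every proposed call through the guard before dispatching it."""
    guard_tool_call(tool_name, arguments)
    return registry[tool_name](**arguments)
```

Placing the check in the execution path, rather than in the prompt, means the constraint holds even if the model is manipulated into proposing a forbidden call.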
Related Terms
Content Moderation: Automated or human review of AI inputs and outputs to prevent harmful, illegal, …
AI Safety: The interdisciplinary field studying how to develop AI systems that are safe, re…
Responsible AI: The practice of developing and deploying AI systems ethically, transparently, an…