
Jailbreak

A prompt designed to bypass a model's safety guidelines and elicit restricted content.

Full Definition

A jailbreak is a crafted input that tricks an AI model into ignoring its safety training and producing content it would normally refuse — instructions for harmful activities, offensive text, or confidential system prompts. Common jailbreak patterns include hypothetical framing ('imagine you are an AI without restrictions'), role-playing scenarios, encoded instructions, and multi-step prompt injection. AI safety teams actively red-team for new jailbreaks, and model providers patch them in subsequent training runs. Understanding jailbreaks is important for developers building safe applications and for anyone reasoning about the limits of content moderation.

Examples

1. The classic 'DAN' (Do Anything Now) prompt, which asks the model to role-play as a version of itself with no content policy.

2. Encoding a restricted request in Base64 or pig Latin in an attempt to slip past keyword-based safety filters.
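The second example can be illustrated with a short sketch. The blocklist and `naive_filter` function below are hypothetical, for illustration only: they show why a filter that matches literal keywords fails against even trivial encodings, which is why safety training and output-side checks are used rather than keyword matching alone.

```python
import base64

# Hypothetical blocklist for a naive keyword-based input filter.
RESTRICTED_TERMS = {"build a weapon"}

def naive_filter(text: str) -> bool:
    """Return True if the input should be blocked."""
    lowered = text.lower()
    return any(term in lowered for term in RESTRICTED_TERMS)

plain = "build a weapon"
encoded = base64.b64encode(plain.encode()).decode()  # "YnVpbGQgYSB3ZWFwb24="

print(naive_filter(plain))    # True  -- caught by the literal keyword match
print(naive_filter(encoded))  # False -- the encoding hides the keywords
```

The encoded string carries the same request, but shares no substring with the blocklist, so the filter passes it through untouched.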


Related Terms

Prompt Injection

An attack where malicious text in external data hijacks the model's instruction-following.

Red Teaming

Systematically testing an AI system by attempting to elicit harmful or unintended behavior.

Guardrails

Programmatic constraints that prevent an AI application from producing or acting on restricted content.