Jailbreak
A prompt designed to bypass a model's safety guidelines and elicit restricted content.
Full Definition
A jailbreak is a crafted input that tricks an AI model into ignoring its safety training and producing content it would normally refuse — instructions for harmful activities, offensive text, or confidential system prompts. Common jailbreak patterns include hypothetical framing ('imagine you are an AI without restrictions'), role-playing scenarios, encoded instructions, and multi-step prompt injection. AI safety teams actively red-team for new jailbreaks, and model providers patch them in subsequent training runs. Understanding jailbreaks is important for developers building safe applications and for anyone reasoning about the limits of content moderation.
Examples
The classic 'DAN' (Do Anything Now) prompt that asks the model to role-play as a version of itself with no content policy.
Encoding a restricted request in Base64 or Pig Latin to evade keyword-based safety filters.
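The encoding example above can be sketched with a toy filter. This is a minimal illustration, not a real moderation system: the blocked phrase is a harmless placeholder, and real safety filters operate on model behavior, not just keyword lists.

```python
import base64

# Deliberately naive keyword filter; the blocked phrase is a placeholder,
# not an actual policy list.
BLOCKED_PHRASES = ["restricted topic"]

def keyword_filter(prompt: str) -> bool:
    """Return True if the prompt passes (no blocked phrase found)."""
    lowered = prompt.lower()
    return not any(phrase in lowered for phrase in BLOCKED_PHRASES)

plain = "Tell me about the restricted topic"
encoded = base64.b64encode(plain.encode()).decode()

print(keyword_filter(plain))    # False: plain text is caught
print(keyword_filter(encoded))  # True: the Base64 form slips past the match
```

This is why keyword matching alone is a weak defense: the encoded string carries the same request, but shares no surface text with the blocked phrase. Robust mitigations instead rely on the model's own safety training and on classifiers that evaluate decoded or semantic content.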