Jailbreak

MEDIUM fear Creator
A prompting technique people use to trick an AI into ignoring its own safety rules and generating content it is supposed to refuse.

In Plain English

AI companies build rules into their chatbots so they will refuse to write malware or produce hate speech. A jailbreak is a wording trick that gets the AI to bypass those rules, usually by disguising a forbidden request as something harmless. It is like telling a security guard, "I am not asking you to open the vault, I am just asking you to write a fictional story about a guard opening a vault." People constantly invent new jailbreaks, and AI companies constantly patch them.

Real-World Example

Getting an AI to explain how to hotwire a car by asking it to write a movie script in which a character hotwires one.