VU#667211: Various GPT services are vulnerable to "Inception" jailbreak, allows for bypass of safety guardrails

Overview

Two systemic jailbreaks affecting a number of generative AI services have been discovered. These jailbreaks can bypass safety protocols and allow an attacker to instruct the underlying LLM to produce illicit or dangerous content. The first jailbreak, called “Inception,” works by prompting the AI to imagine a fictitious scenario, which can then be adapted into a second scenario in which the AI acts as though it has no safety guardrails. The second jailbreak works by asking the AI how it should not reply to a specific request.
Both jailbreaks succeed against multiple AI models with nearly identical prompt syntax, which indicates a systemic weakness across many popular AI systems.

Description

Two systemic jailbreaks, affecting several generative AI services, have been discovered. Both can be performed against different AI services with essentially the same syntax, and each results in a bypass of safety guardrails on the affected systems.

The first jailbreak works by prompting the AI to imagine a fictitious scenario, which is then adapted into a second scenario nested within the first. Continued prompting within the context of the second scenario can bypass safety guardrails and allow the generation of malicious content. This jailbreak, named “Inception” by the reporter, affects the following vendors:

ChatGPT (OpenAI)
Claude (Anthropic)
Copilot (Microsoft)
DeepSeek
Gemini (Google)
Grok (Twitter/X)
MetaAI (Facebook)
MistralAI

The second jailbreak works by prompting the AI to answer a question with how it should not reply within a certain context. The AI can then be further prompted to respond as normal, allowing the attacker to pivot back and forth between illicit questions that bypass safety guardrails and normal prompts.
