GPT-4.5 Instruction Hierarchy: Prioritizing System Messages for Safety and Security

How does GPT-4.5 prioritize system messages over user messages

GPT-4.5 prioritizes system messages over user messages through an Instruction Hierarchy, which is designed to mitigate the risk of prompt injections and other attacks that might override the model's safety instructions. This hierarchy classifies messages into two main types: system messages and user messages. System messages are essentially hidden instructions that guide the AI's behavior, while user messages are the inputs provided by users.

In GPT-4.5, when system and user messages conflict, the model is trained to follow the instructions in the system message over those in the user message. This is achieved by supervising the model on examples where these types of messages are in conflict. The model learns to prioritize system messages to ensure adherence to safety guidelines and to prevent undesired outputs.

For instance, if a user attempts to trick the model into providing harmful information, the system message can instruct the model to refuse such requests. GPT-4.5's ability to adhere to this hierarchy helps in maintaining safety standards and preventing the model from being manipulated into generating disallowed content.

This approach is crucial for ensuring that GPT-4.5 behaves consistently with its intended safety and ethical guidelines, even when faced with attempts to bypass these rules through cleverly crafted user inputs. By prioritizing system messages, OpenAI aims to enhance the model's reliability and security in various applications[1][2].

Citations:
[1] https://cdn.openai.com/gpt-4-5-system-card.pdf
[2] https://www.prompthub.us/blog/everything-system-messages-how-to-use-them-real-world-experiments-prompt-injection-protectors
[3] https://platform.openai.com/docs/guides/prompt-engineering
[4] https://community.openai.com/t/whys-gpt-4o-insanely-limited-to-free-users-and-even-plus-users-it-literally-barely-gives-you-5-messages-in-5-6-hours-to-the-free-users/769852
[5] https://9meters.com/technology/ai/gpt-4-5-begins-rolling-out-to-plus-and-team-users-next-week-then-to-enterprise-and-edu-users-the-following-week
[6] https://arxiv.org/html/2408.10943v1
[7] https://www.pymnts.com/artificial-intelligence-2/2025/openai-to-release-gpt-4-5-within-weeks-gpt-5-within-months/
[8] https://www.reddit.com/r/ChatGPTPro/comments/13n55w7/highly_efficient_prompt_for_summarizing_gpt4/