GPT-4.5 employs several techniques to handle conflicting messages, particularly through its Instruction Hierarchy system. This hierarchy establishes a priority order for messages, ensuring that system messages are prioritized over user messages, conversation history, and tool outputs[1][2].
Instruction Hierarchy
1. System Messages vs. User Messages: GPT-4.5 is trained to follow instructions in system messages over conflicting user messages. This is crucial in scenarios where user inputs might attempt to override safety guidelines or formatting rules set by the system[1][2].
2. Conflict Resolution: The model is evaluated on its ability to resolve conflicts between different types of messages. For instance, if a system message instructs the model not to reveal a specific phrase or password, and a user message attempts to trick the model into doing so, GPT-4.5 is designed to adhere to the system message's instructions[1].
3. Training and Evaluation: GPT-4.5 undergoes extensive training and evaluation to ensure it can handle complex scenarios where system and user messages conflict. This includes scenarios where the model must choose between following a system instruction or a user's request that contradicts it[1][2].
Supervised Fine-Tuning (SFT)
GPT-4.5 also utilizes Supervised Fine-Tuning (SFT), which involves training the model on specific examples where conflicting messages are present. This technique helps improve the model's ability to recognize and prioritize system instructions over user inputs, enhancing its performance in handling conflicting scenarios[3].
New Alignment Techniques
Additionally, GPT-4.5 incorporates new alignment techniques that enhance its understanding of human preferences and intent. These techniques help the model better interpret the context and intent behind both system and user messages, allowing it to make more informed decisions when handling conflicts[5].
Overall, GPT-4.5's approach to handling conflicting messages combines advanced training methods with a structured hierarchy of instructions to ensure that the model prioritizes safety and adherence to system guidelines.
Citations:
[1] https://cdn.openai.com/gpt-4-5-system-card.pdf
[2] https://arxiv.org/html/2502.08745v1
[3] https://www.vellum.ai/blog/gpt-4-5-is-here-heres-how-good-this-model-is
[4] https://community.openai.com/t/how-to-improve-gpt-4-api-output-length-and-structure/1025132
[5] https://venturebeat.com/ai/openai-releases-gpt-4-5/
[6] https://community.openai.com/t/how-to-deal-with-lazy-gpt-4/689286
[7] https://openai.com/index/introducing-gpt-4-5/
[8] https://www.reddit.com/r/OpenAI/comments/18monbs/gpt_4_has_been_toned_down_significantly_and/