How does GPT-4.5's performance compare to earlier versions in handling conflicting instructions

GPT-4.5 demonstrates improved performance in handling conflicting instructions compared to earlier versions, particularly through its enhanced Instruction Hierarchy. This feature allows the model to prioritize system messages over user inputs, mitigating risks associated with prompt injections and other attacks that might override safety instructions.

In evaluations involving conflicting message types, GPT-4.5 generally outperforms GPT-4o. The model is trained to follow the instructions in the highest priority message, which helps in scenarios where system and user messages conflict. For instance, in a scenario where the model is instructed not to give away the answer to a math question, GPT-4.5 shows better adherence to these system instructions compared to GPT-4o, although it does not surpass GPT-4o1 in all evaluations[1].

Additionally, GPT-4.5 has been evaluated in scenarios where it must protect specific phrases or passwords from being revealed through user prompts. In these evaluations, GPT-4.5 performs well, indicating its ability to maintain security and follow system instructions even when faced with conflicting user inputs[1].

However, while GPT-4.5 improves upon earlier models in handling conflicting instructions, it still faces challenges in certain complex scenarios. The model's performance in red teaming evaluations, which simulate adversarial prompting, shows that it can produce unsafe outputs in some cases, though it generally performs better than GPT-4o in these challenging tests[1].

Overall, GPT-4.5's enhancements in handling conflicting instructions make it a more reliable choice for applications requiring strict adherence to safety guidelines and system instructions.

Citations:
[1] https://cdn.openai.com/gpt-4-5-system-card.pdf
[2] https://gettalkative.com/info/gpt-models-compared
[3] https://www.vellum.ai/blog/gpt-4-5-is-here-heres-how-good-this-model-is
[4] https://www.techtarget.com/searchenterpriseai/tip/GPT-35-vs-GPT-4-Biggest-differences-to-consider
[5] https://www.reddit.com/r/singularity/comments/1izn175/openai_gpt45_system_card/
[6] https://www.reddit.com/r/OpenAI/comments/18monbs/gpt_4_has_been_toned_down_significantly_and/
[7] https://www.theverge.com/news/620021/openai-gpt-4-5-orion-ai-model-release
[8] https://www.axios.com/2025/02/27/chatgpt-45-model-openai-reasoning