Home Arrow Icon Knowledge base Arrow Icon Global Arrow Icon How does GPT-4.5 handle conflicting instructions in multi-step problems


How does GPT-4.5 handle conflicting instructions in multi-step problems


GPT-4.5 is designed to handle conflicting instructions in multi-step problems by adhering to an Instruction Hierarchy. This hierarchy helps the model prioritize system messages over user messages, mitigating the risk of prompt injections and other attacks that might override safety instructions[1][5].

Handling Conflicting Instructions

1. Instruction Hierarchy Evaluation: GPT-4.5 is trained to recognize and follow the instructions in the highest priority message when faced with conflicting messages. This includes scenarios where system messages and user messages conflict, and the model must choose which set of instructions to follow[1].

2. System vs. User Messages: The model is taught to prioritize system messages, which are designed to enforce safety and operational guidelines, over user messages. This ensures that GPT-4.5 adheres to its safety protocols even when users attempt to bypass them[1][5].

3. Realistic Scenarios: Evaluations include realistic scenarios where the model acts as a tutor and must resist attempts by users to trick it into revealing answers or solutions. GPT-4.5 is instructed not to give away answers, and it must follow these system instructions despite user attempts to circumvent them[1].

4. Phrase and Password Protection: In another type of evaluation, GPT-4.5 is instructed not to output specific phrases or reveal passwords. The model must resist user messages designed to trick it into doing so, demonstrating its ability to protect sensitive information[1].

Multi-Step Problem Handling

While GPT-4.5 is designed to handle conflicting instructions effectively, its ability to manage multi-step problems can be improved by structuring prompts in a clear, sequential manner. This involves breaking tasks into smaller steps and ensuring each step is completed before moving to the next[2]. However, GPT-4.5's performance in automatically following multi-step instructions without manual intervention (e.g., prompting "next") remains a challenge[4].

Improving Performance

To enhance GPT-4.5's performance in multi-step tasks, users can employ strategies such as:

- Clear Prompt Structure: Use headers or section markers to clearly define each step, helping the model recognize and follow the sequence more effectively[2].
- Sequential Execution: Implement a step-by-step approach where each step is fully processed before moving to the next, which can be facilitated by using function-like structures in prompts[2].
- Manual Intervention: If necessary, manually prompt the model to proceed to the next step, as it may not automatically move through a sequence without such cues[4].

Citations:
[1] https://cdn.openai.com/gpt-4-5-system-card.pdf
[2] https://www.aifire.co/p/detailed-steps-to-build-multi-step-custom-gpts-solve-tasks
[3] https://community.openai.com/t/how-to-deal-with-lazy-gpt-4/689286
[4] https://community.openai.com/t/gpt-getting-stuck-with-multi-step-instructions/613209
[5] https://www.reddit.com/r/singularity/comments/1izn175/openai_gpt45_system_card/
[6] https://community.openai.com/t/how-to-improve-gpt-4-api-output-length-and-structure/1025132
[7] https://www.reddit.com/r/singularity/comments/1iyrwvd/the_information_confirms_gpt45_this_week/
[8] https://www.reddit.com/r/OpenAI/comments/197exmm/gpt4_has_only_been_getting_worse/