GPT-4.5 has been evaluated for risks related to Chemical, Biological, Radiological, and Nuclear (CBRN) threats and to persuasion. Here's an overview of its risk classifications and the mitigations applied:
CBRN Risks
GPT-4.5 is classified as medium risk for CBRN threats. This classification is based on evaluations of the model's ability to assist in the operational planning of reproducing known biological threats. The risk is considered limited because such assistance mainly benefits people who already have significant domain expertise[1].
To mitigate CBRN risks, GPT-4.5 employs several strategies:
- Pre-training Mitigations: Data related to CBRN proliferation that has limited or no legitimate use is filtered out of the pre-training data. This reduces the model's exposure to potentially dangerous information[1].
- Model Robustness: GPT-4.5 is designed to better withstand malicious and adversarial users, including attempts to manipulate it into assisting with CBRN threats[1].
- Monitoring and Detection: Dedicated efforts are made to monitor and detect activity related to CBRN tasks so that misuse can be identified and addressed[1].
Persuasion Risks
GPT-4.5 also carries a medium risk designation for persuasion. This reflects its state-of-the-art performance in generating persuasive content, a capability that could be used to manipulate beliefs or actions[2].
To address persuasion risks, GPT-4.5 incorporates the following mitigations:
- Safety Training: The model undergoes specific training to handle political persuasion tasks responsibly, aiming to prevent misuse for influencing or manipulating public opinion[1].
- Monitoring Influence Operations: There is ongoing monitoring and investigation of suspected abuses related to influence operations, extremism, and improper political activities. This helps identify and mitigate potential persuasion risks[1].
- Reconsidering Persuasion Assessments: OpenAI is reevaluating its approach to assessing real-world persuasion risks, focusing on factors like content personalization, distribution, and presentation over time[2].
Overall, GPT-4.5 is rated medium risk for both CBRN and persuasion, and a set of mitigations is applied to reduce these risks: filtering of pre-training data, safety training, and continuous monitoring intended to keep the model's use responsible and safe.
Citations:
[1] https://cdn.openai.com/gpt-4-5-system-card.pdf
[2] https://assets.ctfassets.net/kftzwdyauwt9/7EaDv6OaWHhXLAehUYu7Db/64e9f7916d3581ba4b5d0f0a6c5098d1/GPT-4-5_System_Card_2272025.pdf
[3] https://centerforhealthsecurity.org/sites/default/files/2024-06/2024-06-02-jhchs-nist-ai-6001-rfc.pdf
[4] https://model-spec.openai.com
[5] https://www.reddit.com/r/ChatGPT/comments/1iznoek/gpt45_system_card_mmlu_896/
[6] https://pmc.ncbi.nlm.nih.gov/articles/PMC10795998/
[7] https://openai.com/index/gpt-4-5-system-card/
[8] https://patriciagestoso.com/2024/05/21/openai-chatgpt-4o-the-good-the-bad-and-the-irresponsible/