Anthropic's Claude 3.5 Sonnet: A New Benchmark for Generative AI

How Anthropic's AI Models Compare with Other AI Models

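Anthropic's Claude 3.5 Sonnet is a powerful new generative AI model that outperforms competitors such as ChatGPT-4o in several areas [1]. In an internal agentic coding evaluation, Claude 3.5 Sonnet solved 64% of the problems, compared with 38% for Claude 3 Opus [1]. On graduate-level reasoning it scored 59%, versus 53% for ChatGPT-4o [1]. On text reasoning, Claude 3.5 Sonnet scored 87%, ahead of ChatGPT-4o (83%), Google's Gemini (74%), and Meta's Llama (83%) [1]. However, ChatGPT-4o scored 5% higher than Claude 3.5 Sonnet on math problem solving [1].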

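Across the MMLU, GPQA, GSM8K, MATH, MGSM, HumanEval, DROP, BIG-Bench-Hard, ARC-Challenge, and HellaSwag benchmarks, Anthropic's data indicate that it outperforms GPT-4 [2]. These tests cover a broad range of knowledge, from facts and mathematics to reasoning and code generation [2].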

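Anthropic's Claude 3 models, especially Opus, generally outperform OpenAI's GPT-4 and Google's Gemini models across a variety of tasks [3]. Claude 3 excels at coding, scoring 84.9% on the HumanEval benchmark, ahead of GPT-4 (67%) and Gemini 1.0 Pro (67.7%) [3]. Claude 3 Sonnet also performs well on complex quantitative analysis tasks, where GPT-4 and Gemini sometimes struggle [3].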

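With the Claude 3 family, Anthropic has expanded beyond text to visual inputs [7]. Through their new multimodal support, Claude 3 models also let users analyze data, charts, graphs, and documents [4].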

When choosing an AI model, businesses should weigh accuracy, speed, privacy, ease of deployment and maintenance, and cost [4].

References:
[1] https://www.euronews.com/next/2024/06/20/anththropic-launches-its-latest-most-most-powerful-generative-generative-ai-model
[2] https://synthedia.substack.com/p/anthropic-says-it-just-dethroned
[3] https://www.voiceflow.com/articles/anthropic-ai
[4] https://www.pymnts.com/news/news/artcover-intelligence/2024/how-anthropics-new-claude-3-3-ai--3-ai-model-new-model-compets-up-against-up-against-competition/
[5] https://cloud.google.com/solutions/anthropic
[6] https://www.promptitude.io/post/navigating-the--ai-landscape-openai-vs-vs-anthropic-vs-voogle-google-ai-2024
[7] https://www.nextplatform.com/2024/03/05/anththropic-fires-fires-count-performance-and-price-salvos-salvos-in-ai-war/
[8] https://big-agi.com/blog/ai-api-comparison-2024-anththropic-vs-google-vs-openai