Anthropic's Claude 3.5 Sonnet: A New Benchmark for Generative AI

How does Anthropic's hybrid AI model compare with other AI models in terms of performance?

Anthropic's Claude 3.5 Sonnet is a new, powerful generative AI model that outperforms competitors such as ChatGPT-4o in several areas [1]. In an internal agentic coding evaluation, Claude 3.5 Sonnet solved 64% of the problems, compared with 38% for Claude 3 Opus [1]. On graduate-level reasoning, it scored 59% against 53% for ChatGPT-4o [1]. On reasoning over text, Claude 3.5 Sonnet scored 87%, ahead of ChatGPT-4o (83%), Google's Gemini (74%), and Meta's Llama (83%) [1]. However, ChatGPT-4o was 5% more accurate than Claude 3.5 on math problem solving [1].
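To make the size of these gaps easier to compare, the following is a minimal sketch that tabulates the scores quoted above from [1] and computes the margin, in percentage points, between Claude 3.5 Sonnet and each rival. The dictionary layout and the function name `margins` are illustrative assumptions, not part of any published tooling, and the math result is left out because the source gives only the 5% difference and not the underlying scores.

```python
# Scores (percent) exactly as quoted in the paragraph above from [1].
# The data layout and the helper name `margins` are hypothetical,
# chosen only for this illustration.
SCORES = {
    "agentic coding (internal)": {"Claude 3.5 Sonnet": 64, "Claude 3 Opus": 38},
    "graduate-level reasoning": {"Claude 3.5 Sonnet": 59, "ChatGPT-4o": 53},
    "reasoning over text": {
        "Claude 3.5 Sonnet": 87,
        "ChatGPT-4o": 83,
        "Gemini": 74,
        "Llama": 83,
    },
}


def margins(scores, model="Claude 3.5 Sonnet"):
    """Return {task: {rival: model_score - rival_score}} in percentage points."""
    result = {}
    for task, by_model in scores.items():
        base = by_model[model]
        result[task] = {
            rival: base - score
            for rival, score in by_model.items()
            if rival != model
        }
    return result


if __name__ == "__main__":
    for task, gaps in margins(SCORES).items():
        for rival, gap in gaps.items():
            print(f"{task}: {gap:+d} points vs {rival}")
```

Reporting differences in percentage points rather than relative percent keeps the comparison unambiguous: a move from 38% to 64% is a 26-point gap regardless of how the ratio is expressed.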

On the MMLU, GPQA, GSM8K, MATH, MGSM, HumanEval, DROP, BIG-Bench-Hard, ARC-Challenge, and HellaSwag benchmarks, Anthropic's data shows its model outperforming GPT-4 [2]. These tests cover a broad range of knowledge, from facts and math to reasoning and code generation [2].

Anthropic's Claude 3 models, particularly Opus, generally outperform OpenAI's GPT-4 and Google's Gemini models across a variety of tasks [3]. Claude 3 performed strongly on coding tasks, scoring 84.9% on benchmarks such as HumanEval and outperforming GPT-4 (67%) and Gemini 1.0 Pro (67.7%) [3]. Claude 3 Sonnet also excelled at complex quantitative analysis tasks, where GPT-4 and Gemini sometimes struggle [3].

With the Claude 3 family, Anthropic expanded beyond text to include visual input in its training data [7]. The new multimodal support lets users of Claude 3 models analyze data that includes pictures, charts, and documents [4].

When choosing an AI model, businesses should consider accuracy, speed, privacy, ease of deployment or maintenance, and cost [4].

Citations:
[1] https://www.euronews.com/next/2024/06/20/anthropic-launches-its-latest-powerful-generative-ai-model
[2] https://synthedia.substack.com/p/anthropic-seys-it-just-dethroned
[3] https://www.voiceflow.com/articles/anthropic-ai
[4] https://www.pymnts.com/news/artificial-intelligence/2024/how-anthropics-new-claude-3-ai-model-stacks-ugainst-the-competition/
[5] https://cloud.google.com/solutions/anthropic
[6] https://www.promptitude.io/post/navigating-the-ai-landscape-openai-vs-anthropic-vs-google-ai-in-2024
[7] https://www.nextplatform.com/2024/03/05/anthropic-fires-for-performance-and-price-salvos-in-ai-war/
[8] https://big-agi.com/blog/ai-api-comparison-2024-anthropic-vs-google-vs-openai