BridgeBench Shows Top AI Models at 10% Accuracy Despite Strong Reasoning

Morocco World News

2026/04/17 - 12:59

Casablanca – BridgeBench, a new benchmarking project focused on AI reasoning, has released a ranking that exposes a gap between how confidently models explain answers and how often those answers are correct.

The benchmark tests models on reasoning-heavy tasks and reports three metrics. Accuracy measures whether the final answer is correct. Evidence evaluates how well the model supports its reasoning with verifiable steps or sources. The overall score combines both, aiming to reward systems that not only answer but also justify their answers.

In the latest results, xAI’s Grok 4.20 Reasoning model ranks first with a score of 41.8. It records 10.0% accuracy and 89.7% on evidence. OpenAI’s GPT-5.4 follows closely with a score of 40.6, matching that 10.0% accuracy and posting slightly stronger evidence at 90.6%.

Anthropic’s Claude Opus 4.7 comes third at 40.3, but with lower accuracy at 6.7%, offset by the highest evidence score among the top models at 91.3%.

In fourth place is Grok 4.20, the non-reasoning version, scoring 40.0 with 6.7% accuracy and 89.9% evidence. Claude Opus 4.6 rounds out the top five with a score of 39.6, posting 10.0% accuracy and 86.1% evidence.

Further down, Google’s Gemini 3.1 Pro ranks 15th with a score of 34.3. Its accuracy drops sharply to 3.3%, despite an evidence score of 89.1%.

What makes the ranking striking is not who leads, but how low accuracy remains across the listed models. Even the top systems answer correctly only about one time in ten.

At the same time, their evidence scores are consistently high, raising questions about what exactly is being measured. If models can produce convincing chains of reasoning while still being wrong most of the time, the benchmark may be capturing fluency more than reliability.
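The gap between the two metrics can be made concrete by tabulating the figures quoted above. The Python sketch below uses only the numbers reported in this article. The names `RESULTS`, `summarize`, and `naive_average` are hypothetical, and the equal-weight average is an illustrative assumption: the article does not say how BridgeBench combines accuracy and evidence into the overall score.

```python
# Figures as quoted in the article:
# (model, overall score, accuracy %, evidence %).
RESULTS = [
    ("Grok 4.20 Reasoning", 41.8, 10.0, 89.7),
    ("GPT-5.4", 40.6, 10.0, 90.6),
    ("Claude Opus 4.7", 40.3, 6.7, 91.3),
    ("Grok 4.20", 40.0, 6.7, 89.9),
    ("Claude Opus 4.6", 39.6, 10.0, 86.1),
    ("Gemini 3.1 Pro", 34.3, 3.3, 89.1),
]


def summarize(results):
    """Return the best accuracy, worst evidence, and smallest evidence-accuracy gap."""
    accuracies = [acc for _, _, acc, _ in results]
    evidences = [ev for _, _, _, ev in results]
    return {
        "max_accuracy": max(accuracies),
        "min_evidence": min(evidences),
        "min_gap": min(ev - acc for _, _, acc, ev in results),
    }


def naive_average(accuracy, evidence):
    """Hypothetical equal-weight combination. NOT BridgeBench's actual formula."""
    return (accuracy + evidence) / 2


if __name__ == "__main__":
    for name, score, acc, ev in RESULTS:
        print(
            f"{name:<20} score={score:5.1f}  accuracy={acc:4.1f}%  "
            f"evidence={ev:4.1f}%  naive_avg={naive_average(acc, ev):5.2f}"
        )
    print(summarize(RESULTS))
```

Across the six listed models, evidence exceeds accuracy by at least about 76 percentage points. Under the equal-weight assumption, every model would also land above its published overall score, which suggests the benchmark weights its components differently or draws on inputs not reported here.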

