
BridgeBench Shows Top AI Models at 10% Accuracy Despite Strong Reasoning

Technology
Morocco World News
2026/04/17 - 12:59

Casablanca – BridgeBench, a new benchmarking project focused on AI reasoning, has released a ranking that exposes a gap between how confidently models explain answers and how often those answers are correct.

The benchmark tests models on reasoning-heavy tasks and scores them on three metrics. Accuracy measures whether the final answer is correct. Evidence evaluates how well the model supports its reasoning with verifiable steps or sources. The overall score combines the two, aiming to reward systems that not only answer but also justify their answers.
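Accuracy, as described here, is a plain ratio of correct final answers to tasks attempted. The Python sketch below illustrates that ratio. The task count of 30 is a hypothetical assumption, not a figure published by BridgeBench; it is chosen only because 1, 2 and 3 correct answers out of 30 round to exactly the 3.3%, 6.7% and 10.0% accuracy values in the results. If the real task set were that small, the gap between 10.0% and 6.7% would amount to a single task.

```python
def accuracy_pct(correct: int, total: int) -> float:
    """Percentage of tasks with a correct final answer, rounded to one decimal."""
    if total <= 0:
        raise ValueError("total must be a positive number of tasks")
    if not 0 <= correct <= total:
        raise ValueError("correct must lie between 0 and total")
    return round(100 * correct / total, 1)


# HYPOTHETICAL task count: the article does not say how many tasks BridgeBench uses.
ASSUMED_TASKS = 30

for correct in (1, 2, 3):
    print(f"{correct}/{ASSUMED_TASKS} correct -> {accuracy_pct(correct, ASSUMED_TASKS)}%")
# 1/30 correct -> 3.3%
# 2/30 correct -> 6.7%
# 3/30 correct -> 10.0%
```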

In the latest results, xAI’s Grok 4.20 Reasoning model ranks first with an overall score of 41.8, recording 10.0% accuracy and 89.7% on evidence. OpenAI’s GPT-5.4 follows closely with a score of 40.6, with the same 10.0% accuracy and a slightly stronger evidence score of 90.6%.

Anthropic’s Claude Opus 4.7 comes third at 40.3, with lower accuracy at 6.7%, offset by the highest evidence score in the top five at 91.3%.


In fourth place is Grok 4.20, the non-reasoning version, scoring 40.0 with 6.7% accuracy and 89.9% evidence. Claude Opus 4.6 rounds out the top five with a score of 39.6, posting 10.0% accuracy and 86.1% evidence.

Further down, Google’s Gemini 3.1 Pro ranks 15th with a score of 34.3. Its accuracy drops sharply to 3.3%, despite an evidence score of 89.1%.
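The article says the overall score combines accuracy and evidence but does not give the formula. A quick consistency check on the published figures, sketched in Python below, looks for pairs where one model matches or beats another on both metrics yet receives a lower overall score; any overall score that only rises with accuracy and evidence would produce no such pairs. The `Result` type and function names are illustrative, and the check assumes the article's numbers are transcribed correctly.

```python
from itertools import permutations
from typing import NamedTuple


class Result(NamedTuple):
    model: str
    score: float     # overall score
    accuracy: float  # percent
    evidence: float  # percent


# Figures as reported in the article.
RESULTS = [
    Result("Grok 4.20 Reasoning", 41.8, 10.0, 89.7),
    Result("GPT-5.4", 40.6, 10.0, 90.6),
    Result("Claude Opus 4.7", 40.3, 6.7, 91.3),
    Result("Grok 4.20", 40.0, 6.7, 89.9),
    Result("Claude Opus 4.6", 39.6, 10.0, 86.1),
    Result("Gemini 3.1 Pro", 34.3, 3.3, 89.1),
]


def dominates(a: Result, b: Result) -> bool:
    """True if a is at least as good as b on both metrics and strictly better on one."""
    at_least = a.accuracy >= b.accuracy and a.evidence >= b.evidence
    strictly = a.accuracy > b.accuracy or a.evidence > b.evidence
    return at_least and strictly


def monotonicity_violations(results: list[Result]) -> list[tuple[str, str]]:
    """Pairs (a, b) where a dominates b on both metrics but has a lower overall score."""
    return [
        (a.model, b.model)
        for a, b in permutations(results, 2)
        if dominates(a, b) and a.score < b.score
    ]


print(monotonicity_violations(RESULTS))
# [('GPT-5.4', 'Grok 4.20 Reasoning')]
```

On the six models above it flags a single pair: GPT-5.4 ties Grok 4.20 Reasoning on accuracy and beats it on evidence, yet scores lower overall. That suggests the overall score depends on something beyond the two reported metrics, such as an unpublished weighting or an additional component, though the article gives no detail.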

What makes the ranking striking is not who leads but how low accuracy remains across all models. Even the top systems answer correctly only about one time in ten.

At the same time, their evidence scores are consistently high, raising questions about what exactly is being measured. If models can produce convincing chains of reasoning while still being wrong most of the time, the benchmark may be capturing fluency more than reliability.


The post BridgeBench Shows Top AI Models at 10% Accuracy Despite Strong Reasoning appeared first on Morocco World News.



This article was originally published by Morocco World News. Khabr is a licensed Jordanian AI-powered news platform (Registration #82086). We add editorial value through: AI-powered news analysis, automated summaries, AI audio narration, multi-language translation (Arabic, English, French, Turkish), and AI fact-checking. Our mission is to make news more accessible and understandable for Arabic-speaking audiences worldwide.


This article is part of Khabr's coverage of Technology. We provide AI-powered analysis, summaries, and multi-source aggregation to keep you informed. Source: Morocco World News. Tags: AI, accuracy, reasoning.
