AI’s Dirty Secret: It Mostly Speaks English

Technology
Forbes
2026/05/19 - 14:00
By Véronique Özkaya, Forbes Councils Member, for Forbes Technology Council

Council Post: Expertise from Forbes Councils members, operated under license. Opinions expressed are those of the author.

May 19, 2026, 10:00am EDT

Véronique Özkaya is CEO of DATAmundi.ai, delivering high-quality human data for leading global AI labs and enterprises.

At first glance, AI is viewed as a global technology. However, if you look at its linguistic foundations, AI remains far from global.

Of course, AI generates content and writes in dozens of languages, translates instantly and powers products used across continents. The trouble is that most AI systems still think in one language. You guessed it: English.

Despite the frequent claim that today’s models are “multilingual,” the reality is that modern AI has largely been built on English. As highlighted by the World Economic Forum, most AI systems are trained on only a small subset, roughly 100 languages, of the approximately 7,000 languages spoken worldwide.

Analyses of large public training datasets for large language models (LLMs) show a strong dominance of English. For example, studies such as Meta’s LLaMA 2 paper indicate that roughly 90% of training tokens are English, while broader web data suggests English still accounts for nearly half of online content. If AI models such as ChatGPT are primarily trained on internet data, this imbalance inevitably shapes and skews how they understand and represent the world.

How Did We Get Here?

Several structural forces have shaped AI’s English-centric trajectory. The early internet was largely built in the U.S., and much of its foundational infrastructure, from domain systems to major content platforms, was developed in English.

Today, many of the frontier AI labs remain U.S.-based, and widely used evaluation benchmarks such as the MMLU benchmark were originally developed in English.
Data pipelines tend to follow the pa...