AI’s Dirty Secret: It Mostly Speaks English
By Véronique Özkaya, Forbes Councils Member, for Forbes Technology Council
COUNCIL POST: Expertise from Forbes Councils members, operated under license. Opinions expressed are those of the author. | Membership (fee-based)
May 19, 2026, 10:00am EDT
Véronique Özkaya is CEO of DATAmundi.ai, delivering high-quality human data for leading global AI labs and enterprises.
At first glance, AI is viewed as a global technology. However, if you look at its linguistic foundations, AI remains far from global.

Of course, AI generates content and writes in dozens of languages, translates instantly and powers products used across continents. The trouble is that most AI systems still think in one language. You guessed it: English.

Despite the frequent claim that today’s models are “multilingual,” the reality is that modern AI has largely been built on English. As highlighted by the World Economic Forum, most AI systems are trained on only a small subset, roughly 100 languages, of the approximately 7,000 languages spoken worldwide.

Analyses of large public training datasets for large language models (LLMs) show a strong dominance of English. For example, studies such as Meta’s LLaMA 2 paper indicate that roughly 90% of training tokens are English, while broader web data suggests English still accounts for nearly half of online content. If AI models such as ChatGPT are primarily trained on internet data, this imbalance inevitably shapes and skews how they understand and represent the world.

How Did We Get Here?

Several structural forces have shaped AI’s English-centric trajectory. The early internet was largely built in the U.S., and much of its foundational infrastructure, from domain systems to major content platforms, was developed in English.

Today, many of the frontier AI labs remain U.S.-based, and widely used evaluation benchmarks such as the MMLU benchmark were originally developed in English. Data pipelines tend to follow the pa...

Source: Forbes
Editorial Note: This article was originally published by Forbes. Khabr is a licensed Jordanian AI-powered news platform (Registration #82086). We add editorial value through AI-powered news analysis, automated summaries, AI audio narration, multi-language translation (Arabic, English, French, Turkish), and AI fact-checking. Our mission is to make news more accessible and understandable for Arabic-speaking audiences worldwide.




