
Tether Brings AI Memory Compression To Consumer Devices

Technology
Forbes
2026/06/02 - 15:58
By Thomas Coughlin, Contributor
Jun 02, 2026, 11:58am EDT

In March I wrote about Google's TurboQuant, which compresses data in memory for AI applications, and focused on its use in data centers. In that article, I said that TurboQuant is a compression algorithm that addresses the memory overhead of key-value storage for AI models with zero accuracy loss. I also said that by enabling AI with lower memory and storage requirements, we make that memory and storage even more useful, and that this will likely increase AI workflows, particularly on-premise. This could increase the memory and storage demand for implementing local AI inference. With today's costs for digital memory and storage, this technology could enable useful AI implementations at much lower cost.

Recently a company called Tether introduced a version of TurboQuant that can be used on consumer devices such as laptops and phones to process documents and extend AI conversations locally, using local memory and storage rather than public cloud-based resources. Tether TurboQuant is an open-source AI memory compression algorithm that reduces the key-value (KV) cache of large language models (LLMs) by 3 to 6 times, depending on the workload. The figure below, from Tether, shows a 5 times reduction in required memory using TurboQuant.

[Figure: Data resource requirements with and without TurboQuant. Source: Tether]

TurboQuant compresses the KV cache used during inference sessions but doesn't change the trained LLM model weights. This is important while a model is being accessed by a user. The KV cache keeps past keys and values in memory, and it increases over time as a user interacts with the model. The KV cache contents grow with every token and every active session. This can become a...
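
To make the growth of the KV cache concrete, here is a minimal back-of-the-envelope sketch in Python. It does not implement TurboQuant; it only applies the standard estimate that a transformer stores one key vector and one value vector per token, per layer, per KV head. The model shape (32 layers, 8 KV heads, head dimension 128, 16-bit values) and the function names are hypothetical choices for illustration and do not come from the article or from Tether. The 3 to 6 times range is the one quoted above.

```python
def kv_cache_bytes(tokens, sessions=1, layers=32, kv_heads=8,
                   head_dim=128, bytes_per_value=2):
    """Estimate uncompressed KV cache size in bytes.

    The default model shape is a hypothetical example, not a figure
    from the article.
    """
    # Factor of 2: one key vector and one value vector per token.
    per_token = 2 * layers * kv_heads * head_dim * bytes_per_value
    # The cache grows linearly with tokens and with active sessions.
    return per_token * tokens * sessions


def compressed_bytes(raw_bytes, ratio):
    """Size after compression by an assumed ratio (e.g. 3 to 6)."""
    return raw_bytes / ratio


if __name__ == "__main__":
    gib = 1024 ** 3
    raw = kv_cache_bytes(tokens=32_768)
    print(f"uncompressed: {raw / gib:.2f} GiB")
    for ratio in (3, 5, 6):
        print(f"at {ratio}x: {compressed_bytes(raw, ratio) / gib:.2f} GiB")
```

Under these assumed numbers, a single 32,768-token session needs about 4 GiB of cache before compression, which is the kind of footprint that matters on a laptop or phone. Model weights are not part of this estimate, which matches the point that the compression targets the cache and leaves the weights unchanged.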