New Google algorithm cuts memory usage sixfold. Is expensive hardware doomed?

March 26, 2026, 05:10 PM

Google Research has published a paper on TurboQuant, an algorithm that cuts the memory AI models need for their KV cache by a factor of at least six, without compromising response accuracy and without any additional model training.

During text generation, models rely on the KV cache, a memory buffer that stores previously computed attention data (the keys and values for each token) so that it does not have to be recalculated at every step. The longer the context window, the larger this cache grows. At some point it consumes tens of gigabytes of memory, and even powerful graphics cards with large amounts of VRAM run out of room. Traditional quantization methods have long been used to compress the cache, but they carry a hidden cost: alongside the compressed data, you also have to store the so-called quantization constants, the parameters needed to decode the values, much as ZIP or RAR archives carry the tables needed to unpack them. This overhead eats into the savings.
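To make these numbers concrete, here is a rough back-of-the-envelope sketch in Python. The model shape (32 layers, 8 key/value heads, head dimension 128), the 32-value block size, and the 16-bit constants are illustrative assumptions, not figures from the TurboQuant paper, and `kv_cache_bytes` and `effective_bits` are hypothetical helpers written for this example only.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, context_len,
                   bits_per_value=16, batch_size=1):
    """Approximate KV cache size in bytes.

    The leading factor of 2 accounts for storing both keys and values
    for every layer, head, and token.
    """
    values = 2 * num_layers * num_kv_heads * head_dim * context_len * batch_size
    return values * bits_per_value // 8


def effective_bits(payload_bits, block_size, constant_bits=16, constants_per_block=2):
    """Bits per value once per-block quantization constants are counted.

    Assumes a conventional block-wise scheme that stores a scale and a
    zero point (2 constants) for each block of `block_size` values.
    """
    return payload_bits + constants_per_block * constant_bits / block_size


# Hypothetical model shape, chosen only for illustration.
LAYERS, KV_HEADS, HEAD_DIM = 32, 8, 128

fp16_bytes = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, context_len=128_000)
print(f"16-bit cache at 128k tokens: {fp16_bytes / 1e9:.1f} GB")

bits = effective_bits(payload_bits=4, block_size=32)
print(f"Nominal 4-bit scheme really costs {bits} bits per value")
print(f"Real compression vs 16-bit: {16 / bits:.1f}x instead of {16 / 4:.1f}x")
```

Under these assumptions, a 128,000-token context needs about 16.8 GB at 16 bits per value. A nominally 4-bit scheme that stores a 16-bit scale and zero point for every 32 values actually costs 5 bits per value, so it compresses the cache by 3.2x rather than the 4x the bit width suggests.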

The researchers tested TurboQuant on open-source models such as Gemma and Mistral, using long-context benchmark suites including LongBench, Needle In A Haystack, ZeroSCROLLS, RULER, and L-Eval. On simple tasks, the algorithm delivered flawless results while cutting the KV cache size by a factor of at least six. In more complex scenarios, such as question answering, code generation, and summarization, the margin wasn't as dramatic, but it still outperformed the existing KIVI compression algorithm. On NVIDIA H100 accelerators, the 4-bit version of TurboQuant demonstrated an eightfold increase in performance.

The market has already reacted to the announcement: shares of major memory manufacturers took a hit, reflecting a shift in investor expectations. If widespread adoption of TurboQuant lowers VRAM requirements, companies could either cut hardware costs or expand model context windows without adding hardware.
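The trade-off between cheaper hardware and longer context can be sketched with simple arithmetic. The snippet below is only an illustration: the model shape and the 16 GiB memory budget are assumptions, the sixfold figure is the headline number from the article applied uniformly, and `fp16_bytes_per_token` and `max_context_tokens` are hypothetical helpers.

```python
def fp16_bytes_per_token(num_layers, num_kv_heads, head_dim):
    """Bytes of KV cache one token occupies at 16 bits (2 bytes) per value."""
    return 2 * num_layers * num_kv_heads * head_dim * 2  # keys + values, 2 bytes each


def max_context_tokens(budget_bytes, bytes_per_token, compression=1):
    """Tokens that fit in a fixed memory budget at a given compression factor.

    Integer arithmetic is used throughout to avoid rounding surprises.
    """
    return budget_bytes * compression // bytes_per_token


# Hypothetical model shape and memory budget, chosen only for illustration.
per_token = fp16_bytes_per_token(num_layers=32, num_kv_heads=8, head_dim=128)
budget = 16 * 2**30  # 16 GiB reserved for the KV cache

baseline = max_context_tokens(budget, per_token)
compressed = max_context_tokens(budget, per_token, compression=6)

print(f"Uncompressed: {baseline:,} tokens")
print(f"With 6x compression: {compressed:,} tokens")
```

With these assumed numbers, the same 16 GiB holds 131,072 tokens uncompressed and 786,432 tokens at sixfold compression. Read the other way, a 131,072-token context would need roughly 2.7 GiB instead of 16 GiB.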

The study's authors emphasize that their work isn't just an engineering fix—it's a way to curb memory consumption at a time when memory is becoming increasingly scarce.

Can an algorithm like this actually help put an end to the "memory crisis" in the market, or will the shortage remain a problem for everyday users no matter what software tricks are thrown at it? Share your thoughts in the comments.

About the author

Arkadiy Andrienko

Author of articles and news

As a technical journalist for VGTimes, I equally enjoy discussing the latest graphics cards and the insides of consoles and other gadgets. Since 2018, I have been writing about games and hardware; my experience in sound engineering has allowed me to understand the nuances of audio technologies well, and my love for electronics has driven me to study the insides of PCs, so I am always on the lookout for something new and interesting in the field of gaming equipment.