May 12, 2026

MinIO Announces MemKV, Purpose-Built Context Memory Store for AI Inference

Designed for NVIDIA STX architecture, MemKV delivers microsecond context retrieval at petascale, dramatically reducing time to first token

REDWOOD CITY, Calif., May 12, 2026 – MinIO, the data foundation for enterprise AI and analytics, today announced MemKV, a context memory store that delivers microsecond context retrieval at petabyte scale for agentic AI inference workloads. MemKV joins AIStor as the second pillar of MinIO's product portfolio, extending the company's data foundation into the memory tier where inference runs. The new MinIO MemKV product delivers persistent, shared context across GPU clusters at a scale that existing memory and storage tiers cannot.

As AI moves from answering simple questions to performing complex, multi-step tasks, the underlying systems must remember what they have already done. That memory is called context, and today it is routinely lost because the infrastructure closest to the GPU cannot hold enough of it. When context is lost, the GPU repeats work it has already completed. The result is a recompute tax: more time, more compute, more energy, and a higher cost for output that has already been paid for once.

Eliminate Context Loss. Maximize Token Throughput.

MemKV dramatically reduces the recompute tax for AI inference workloads. On representative benchmarks, MemKV delivered a substantial improvement in time-to-first-token at production concurrency. Furthermore, for a typical enterprise deployment with 128 GPUs and a 128K-token context length, MemKV increased GPU utilization from ~50% to over 90%, resulting in $2 million in annual compute savings.

“The industry has been papering over context loss for years because at small scale you may be able to absorb the recompute tax and move on. At the GPU density hyperscalers and neoclouds are building toward, that is no longer true. A GPU recomputing context it has already generated is burning power without return, and at a thousand GPUs that is not inefficiency, it is structural drag,” said AB Periasamy, co-founder and CEO, MinIO. “Yield economics at this scale demand something purpose-built for the inference data path. MemKV was designed for exactly this.”

Breaking the Speed-Scale Tradeoff Holding Agentic AI Back

Until now, AI infrastructure has forced a choice: high-speed memory tiers like GPU HBM and DRAM that deliver microsecond access but quickly hit capacity limits, or general-purpose storage systems that scale but introduce millisecond-level latency. Neither supports the long-context reasoning that agentic AI demands.

MemKV breaks that tradeoff. Designed to run on the NVIDIA BlueField-4 STX architecture, with native support for NVIDIA Dynamo and NVIDIA NIXL, MemKV gives enterprises, cloud providers, and AI platforms a shared memory tier that combines microsecond responsiveness with petabyte-scale capacity. For the first time, an entire GPU cluster can access a common pool of context at speeds that keep pace with inference, rather than waiting on storage.

Purpose-Built for Inference at Scale

Designed exclusively for AI inference and built from the ground up for the G3.5 layer of the GPU memory hierarchy, MemKV delivers petabytes of shared context memory at SSD economics, replacing the cost and capacity constraints of GPU HBM and DRAM with a tier that scales independently of the compute cluster.

Unlike approaches that retrofit file-storage architectures into the inference data path, MemKV moves data directly from NVMe to the AI data path via end-to-end RDMA transport, with no HTTP overhead, no file system translation, and no storage servers between the GPU and its context.

“The AI conversation has moved from raw model performance to token economics and the cost of operating AI at scale,” said Don Gentile, Analyst at HyperFRAME Research. “That is driving new focus on how systems retain and share context during inference. MinIO’s MemKV addresses a costly inefficiency: rerunning prior calculations when context cannot be shared across GPUs. Eliminating that friction improves utilization and lowers the cost of enterprise AI.”

The architecture reflects how GPUs actually consume data at inference time:

  • Native support for NVIDIA BlueField-4 STX: Runs directly within NVIDIA STX infrastructure as a single ARM64-native binary, embedded in the storage tier rather than deployed on separate x86 storage servers connected over the network.
  • End-to-end RDMA transport: KV cache moves from GPU memory to NVMe over RDMA, bypassing file-system or object-storage protocols entirely.
  • GPU-native block sizes: Operates in 2-16 MB blocks optimized for throughput-oriented GPU access patterns, not the 4 KB blocks designed for legacy storage workloads.
  • Wire-speed fabric performance: Built for NVIDIA Spectrum-X Ethernet networking and PCIe Gen6, driving throughput to near wire speed across the physical fabric.

Availability:

  • MinIO MemKV is available today.

About MinIO:
MinIO is the data foundation for enterprise AI and analytics. Built for exascale performance and limitless scale, AIStor and MemKV cover every layer of the AI data stack from Objects to Tables to inference context, spanning the edge, core, and cloud. With widespread adoption across the Fortune 100 and 500, MinIO is redefining how organizations and government agencies store, manage, and mobilize their data in the AI era. MinIO is backed by Jerry Yang's AME Cloud Ventures, Dell Technologies, General Catalyst, Index Ventures, Intel Capital, Softbank Vision Fund 2, and others.
