Solidigm, Intel, and MinIO put high-density S3 object storage under real load: eight nodes, roughly 24 petabytes of raw capacity, 122TB QLC NVMe, and 249.5 GiB/s of GET throughput with zero errors.
Every week brings another compute announcement. Bigger GPUs, bigger clusters, bigger model counts. What gets far less attention is the tier that decides whether any of it pays off: storage. As models grow into the trillions of parameters and agentic systems multiply inference traffic, the demand on the storage layer stops being just about capacity. It becomes about throughput, and about delivering that throughput without burning the power and floor space needed for your GPUs.
For years, the reflexive answer to "I need more storage performance" has been "add more nodes." That works right up until it collides with the physical limits of a data center: power capacity, cooling, and rack space. And underneath all this sits an old assumption that the denser you pack a storage node, the more you give up in performance.
We wanted to know whether that assumption still holds in the face of modern QLC NVMe and modern software built to exploit it. So Solidigm and MinIO built a deliberately extreme cluster and ran it flat out.
The problem nobody wants to size
When you’re forced to deploy hundreds of nodes to hold a dataset, you haven’t solved a capacity problem. You’ve traded a capacity problem for a sprawl problem. Every additional node is more power drawn, more heat generated, and more floor space consumed. All of that means less power, less heat capacity, and less space available for the GPUs that do the work. So the real question isn’t whether object storage can be fast. It’s whether it can be fast and dense at the same time, so the storage tier stays out of the way of the GPUs.
What we built
The testing ran in the Solidigm AI Central Lab in Rancho Cordova and was run on what Solidigm describes as the densest storage test cluster it has built: eight storage nodes, each with24 Solidigm D5-P5336 122TB QLC NVMe drives for, roughly 3 PB of raw capacity in a single chassis. That’s about 24 PB raw across 192 drives, with MinIO AIStor® running on top.
We ran the MinIO Warp benchmark and scaled client concurrency from a single client up to eight against the fixed eight-node cluster, using 256 MiB objects and measuring GET and PUT throughput separately.
One detail matters for what comes later: this node is not a one-off test rig. It is the building block of MinIO's ExaPOD reference architecture.
The results
Peak GET of 249.5 GiB/s. Peak PUT of 128.5 GiB/s. Zero errors across every test.
Reads scaled linearly. GET throughput climbed from 36 GiB/s on a single client straight through seven clients to 249.5 GiB/s, with per-client throughput holding steady at roughly 35.5 GiB/s the entire way. In plain terms, every client we added was saturating its own 400G NIC, and the cluster itself never became the read bottleneck. At peak, that is more than 10 GiB/s of read throughput for every petabyte of raw capacity sitting in the rack. Density did not cost us reads.

The honest read on writes
Here is the part most benchmark write-ups leave out: GET scales cleanly, but PUT does not. From one to three clients we saw about 15.5 GiB/s each, then a jump to 107.5 GiB/s at four clients, and a write ceiling around 128 GiB/s that held flat from six clients through eight.
That ceiling is a function of the single 400G NIC carrying client write traffic in this configuration. It is not a limit of the cluster or the drives. It is exactly the kind of detail you want in hand before you size a deployment, which is why it is in the brief rather than buried under the headline number. The joint optimization work behind these results gets its own write-up to follow.
From benchmark to blueprint
This is why a single lab result is worth your attention. The exact node we tested is the repeatable building block in MinIO's ExaPOD reference architecture. ExaPOD takes the same hardware and the same AIStor software and scales it to one exabyte of usable capacity across 640 servers in 32 racks, with up to 19.2 TB/s of aggregate throughput.
The point of a reference architecture is that you should not have to choose between starting small and scaling big. This benchmark is the single-pod proof that the density and the performance hold at the entry point. The path from here to exascale is the same node, just more racks.
The storage math
For anyone planning AI infrastructure, the takeaway reduces to a few lines:
- Fewer nodes for the same capacity means less power, cooling, and floor space, which is the constraint that actually bites when data center capacity is the thing you cannot add more of.
- S3-native means your training frameworks, inference pipelines, and lakehouse analytics all speak the same protocol against the same store.
- Linear scaling means you buy for today and expand predictably tomorrow, on the same operational model.
Compute without data is a GPU waiting on a read, and the storage tier decides how often that wait happens. This benchmark says you can build that tier dense, at 122TB per drive and roughly 3 PB per node, without giving up the throughput your GPUs depend on.
The full technical brief has the complete methodology, the per-client scaling data, the system specifications, and the detailed observations. [Read the full technical brief here.]




