The New Economics of AI Factory Efficiency

Solidigm, MinIO and Intel Put Object Storage Scaling to the Test

Solidigm, MinIO, and Intel partnered to design and benchmark an S3-compatible object storage cluster, with a focus on performance and concurrency scaling. The effort uses Solidigm 122TB QLC NVMe drives, MinIO AIStor software, and Intel® Xeon® 6 processors to show how a tightly integrated stack can deliver the throughput and efficiency that modern AI workloads require at scale.

AI infrastructure is at an inflection point as models grow from billions to trillions of parameters, as inference workloads multiply through agentic systems, and as enterprises race to build their own AI factories. What is the one bottleneck that links all of these trends? Storage. Not just storage capacity, but the ability to serve massive datasets at the throughput modern GPU clusters demand without blowing up the data center power budget or floor space in the process.

The AI Storage Problem Nobody Wants to Talk About

Inference pipelines, especially those serving retrieval-augmented generation (RAG) and agentic workflows, require low-latency random reads across billions of objects. The traditional answer has been to throw more servers at the problem. But that approach collides with hard physical constraints: power capacity, rack space, cooling, and operational complexity. When your storage tier requires hundreds of nodes to hold your AI model, you’ve introduced a sprawl problem that undermines the economics of your entire AI factory.

What the industry needs is density without compromise:

  • Massive capacity per node
  • High throughput per drive
  • Software that can extract every bit of performance from the hardware without adding operational overhead

Inside the Build

The cluster we tested is extremely dense. Each storage node delivers approximately 3PB of raw capacity in a single 2U chassis. Our test was set up to measure S3 object storage GET & PUT throughput as client concurrency increases against the full 8-node storage cluster (~24PB). The baseline was a single client targeting all 8 storage nodes (~24PB), and we scaled up to 8 clients to saturate the cluster.

  • The client servers featured 16× Solidigm™ D7-PS1010 7.68TB TLC NVMe drives
  • The storage nodes featured 24× Solidigm™ D5-P5336 122TB QLC NVMe drives
  • Client servers each used 1× 400Gb NIC
  • Storage servers each used 2× 400Gb NICs

The client nodes were further over-provisioned with 1TB of DRAM, to ensure the load generator was never the bottleneck.
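
The hardware list above implies some simple capacity and line-rate ceilings. The following is a rough sketch of that arithmetic, not part of the benchmark itself. It assumes decimal units (1 PB = 1000 TB), a nominal 122 TB per drive, and raw NIC line rate with no protocol overhead; the constant and function names are illustrative only.

```python
# Back-of-the-envelope ceilings for the cluster described in the text.
# Drive counts, drive size, and NIC speeds come from the article.
# Assumptions: decimal TB/PB, nominal 122 TB per drive, zero protocol overhead.

DRIVES_PER_NODE = 24
DRIVE_TB = 122.0
STORAGE_NODES = 8
CLIENTS = 8
STORAGE_NIC_GBIT = 2 * 400   # two 400Gb NICs per storage server
CLIENT_NIC_GBIT = 1 * 400    # one 400Gb NIC per client server

GIB = 2**30


def gbit_to_gib_per_s(gbit: float) -> float:
    """Convert a line rate in Gb/s to GiB/s, ignoring protocol overhead."""
    return gbit * 1e9 / 8 / GIB


node_pb = DRIVES_PER_NODE * DRIVE_TB / 1000
cluster_pb = node_pb * STORAGE_NODES
client_ceiling = gbit_to_gib_per_s(CLIENT_NIC_GBIT)
all_clients_ceiling = client_ceiling * CLIENTS
storage_ceiling = gbit_to_gib_per_s(STORAGE_NIC_GBIT) * STORAGE_NODES

print(f"raw capacity per node:   {node_pb:.3f} PB")
print(f"raw capacity, 8 nodes:   {cluster_pb:.3f} PB")
print(f"NIC ceiling per client:  {client_ceiling:.1f} GiB/s")
print(f"NIC ceiling, 8 clients:  {all_clients_ceiling:.1f} GiB/s")
print(f"NIC ceiling, 8 storage:  {storage_ceiling:.1f} GiB/s")
```

Under these assumptions the per-client ceiling comes out near 46.6 GiB/s and the storage-side ceiling near 745 GiB/s, so with eight clients the client NICs (about 372.5 GiB/s combined) are the tighter network bound.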

The storage servers are the fundamental building block of MinIO’s ExaPOD reference architecture, which scales linearly from petabytes to exabytes (EB) using this same dense node design. Where ExaPOD defines the blueprint for exascale, this benchmark validates the performance reality at the single-pod level.

Reaching these numbers required deliberate co-tuning of the full stack: storage software, host OS and network stack, NIC offloads, CPU and IRQ topology, and switch fabric. Each solution brings its own tuning challenges, and the payoff here was substantial: the tuned configuration achieved 3x the performance of hardware and software defaults. The types of adjustments made are listed at the end of this paper.

Performance Results

We ran the MinIO Warp benchmark to measure how S3 throughput scales with increasing client concurrency against a fixed 8-node storage cluster (~24PB). Each run lasted 15 minutes per phase (PUT and GET separately). Object size was set to 256 MiB with 32 concurrent connections per client.
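
The test parameters above determine a few derived quantities that help when reading the results. The sketch below is illustrative only: the helper names are hypothetical, and it assumes the sweep runs one PUT phase and one GET phase at each client count from 1 to 8.

```python
# Derived quantities from the stated benchmark parameters.
# Values are taken from the text; function names are illustrative.

CONNECTIONS_PER_CLIENT = 32
OBJECT_SIZE_MIB = 256
PHASE_MINUTES = 15
MAX_CLIENTS = 8
PHASES_PER_STEP = 2  # PUT and GET are run separately


def total_concurrency(clients: int) -> int:
    """Concurrent S3 connections across all load-generating clients."""
    return clients * CONNECTIONS_PER_CLIENT


def objects_per_second(throughput_gib_s: float) -> float:
    """Object rate implied by a throughput figure at the fixed object size."""
    return throughput_gib_s * 1024 / OBJECT_SIZE_MIB


def sweep_minutes(max_clients: int = MAX_CLIENTS) -> int:
    """Total I/O time for a sweep from 1 client up to max_clients."""
    return max_clients * PHASES_PER_STEP * PHASE_MINUTES


print(total_concurrency(8))
print(objects_per_second(1.0))
print(sweep_minutes())
```

At 256 MiB per object, every 1 GiB/s of throughput corresponds to four objects per second, so this test stresses bandwidth far more than request rate.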

Key Results

Clients   Total Concurrency   PUT (GiB/s)   GET (GiB/s)
1         32                  40            39
2         64                  64            77
3         96                  84            114
4         128                 96            152
5         160                 107           190
6         192                 116           226
7         224                 120           259
8         256                 120           268
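
One way to read the table is as scaling efficiency: the throughput at N clients divided by N times the single-client figure. The sketch below computes that ratio from the rounded table values. The definition of efficiency used here is an assumption chosen for illustration, not a metric reported by the benchmark.

```python
# (clients, PUT GiB/s, GET GiB/s) as reported in the results table.
RESULTS = [
    (1, 40, 39),
    (2, 64, 77),
    (3, 84, 114),
    (4, 96, 152),
    (5, 107, 190),
    (6, 116, 226),
    (7, 120, 259),
    (8, 120, 268),
]


def scaling_efficiency(results):
    """Return {clients: (put_eff, get_eff)} relative to perfect linear
    scaling from the single-client row (the first entry)."""
    _, put1, get1 = results[0]
    return {
        n: (put / (n * put1), get / (n * get1))
        for n, put, get in results
    }


eff = scaling_efficiency(RESULTS)
for n, (put_eff, get_eff) in eff.items():
    print(f"{n} clients: PUT {put_eff:.0%} of linear, GET {get_eff:.0%} of linear")
```

With these rounded inputs, GET at 8 clients stays at roughly 86% of linear scaling, while PUT falls to about 38%, which is what a write ceiling near 120 GiB/s would produce.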

Key Observations

  • GET throughput scales through 8 clients with headroom remaining. GET scales smoothly from 38.6 GiB/s (1 client) to a peak median of 267.8 GiB/s at 8 clients, with the fastest 1-second bucket reaching 272.4 GiB/s. Per-client efficiency holds at ~37 to 38 GiB/s per client through 7 clients, then tapers only slightly at full saturation. Notably, the curve continues to rise through the 7 to 8 transition rather than flattening; the cluster is not yet read-bound at this configuration.
  • PUT scales smoothly and saturates near 120 GiB/s. PUT throughput climbs cleanly with concurrency (40, 64, 82, 96, 107, 116, 120, 120 GiB/s), with no inflection point or stall. The write ceiling is reached at 7 clients and holds flat at 8. Low-concurrency write performance is particularly strong: a single client sustains 40 GiB/s of PUT throughput, demonstrating that the cluster does not require high parallelism to deliver useful write bandwidth.
  • 8 storage nodes deliver ~24PB in a single cluster. Each node contributes ~3PB of raw capacity using 24× Solidigm D5-P5336 122TB QLC NVMe drives. At 267.8 GiB/s GET, the cluster delivers over 11 GiB/s per petabyte of raw capacity, demonstrating that density does not come at the expense of throughput.
  • Zero errors across all 16 tests. All 16 phases (8 PUT + 8 GET) completed cleanly across 240 minutes of continuous I/O with no object verification failures, which is critical for production confidence when deploying at this density. First-byte latency stayed at 3 to 5 ms median across the full sweep, with the worst 1-second bucket at 8-client GET still within ~11% of the median, indicating well-behaved tail latency under saturation.

From Benchmark to Blueprint

These results aren’t academic. They map directly to the MinIO ExaPOD reference architecture, which scales to 1EB usable across 640 servers in 32 racks, delivering up to 19.2 TB/s aggregate throughput. The current ~24PB, 8-node deployment provides a representative small-scale instantiation of the ExaPOD design, demonstrating the same architectural principles, density, and core performance characteristics in a more compact footprint.
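
The ExaPOD headline figures and the measured 8-node results can be put on a per-server basis for a rough side-by-side. This is a sketch under stated assumptions: decimal units for the ExaPOD figures (1 EB = 1000 PB, 1 TB/s = 1000 GB/s), the 8-client GET median as the measured figure, and ~24PB as the measured raw capacity. The article does not say how the ExaPOD throughput number is measured, so the comparison is indicative only.

```python
# Per-server view of the ExaPOD headline numbers vs. the measured 8-node run.
# Assumptions: decimal units for ExaPOD figures; measured figure is the
# 8-client GET median; variable names are illustrative.

EXAPOD_USABLE_PB = 1000.0   # 1 EB usable
EXAPOD_SERVERS = 640
EXAPOD_RACKS = 32
EXAPOD_TB_PER_S = 19.2

MEASURED_GET_GIB_S = 267.8
MEASURED_NODES = 8
MEASURED_RAW_PB = 24.0

GIB_TO_GB = 2**30 / 1e9  # 1 GiB expressed in decimal GB

servers_per_rack = EXAPOD_SERVERS / EXAPOD_RACKS
usable_pb_per_server = EXAPOD_USABLE_PB / EXAPOD_SERVERS
exapod_gb_s_per_server = EXAPOD_TB_PER_S * 1000 / EXAPOD_SERVERS

measured_gb_s_per_node = MEASURED_GET_GIB_S * GIB_TO_GB / MEASURED_NODES
measured_gib_s_per_raw_pb = MEASURED_GET_GIB_S / MEASURED_RAW_PB

print(f"ExaPOD servers per rack:        {servers_per_rack:.0f}")
print(f"ExaPOD usable PB per server:    {usable_pb_per_server:.4f}")
print(f"ExaPOD GB/s per server:         {exapod_gb_s_per_server:.1f}")
print(f"Measured GET GB/s per node:     {measured_gb_s_per_node:.1f}")
print(f"Measured GET GiB/s per raw PB:  {measured_gib_s_per_raw_pb:.2f}")
```

Under these assumptions the ExaPOD headline works out to 30 GB/s per server, and the measured 8-client GET median works out to roughly 35.9 GB/s per storage node.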

For organizations planning their AI infrastructure, this benchmark answers a critical question: can you achieve exascale economics at petascale entry points? The answer is yes. The same hardware, the same software, and the same operational model apply at both scales.

What This Means for AI Builders

The compute announcements across the industry are impressive. But compute without storage is a GPU waiting on data. As you plan your AI factory, consider the storage math:

  • Fewer nodes mean less power, cooling, and rack space, which is critical when data center capacity is constrained
  • S3-native means your training frameworks, inference pipelines, and lakehouse analytics all speak the same protocol
  • Linear scaling means you buy what you need today and expand predictably tomorrow
  • QLC economics means the cost-per-terabyte curve finally bends in your favor at the capacity points that matter

Solidigm and MinIO are proving that the densest storage doesn’t have to be the slowest. It can be the foundation of your entire AI data platform.

More Information

The World’s Highest Capacity PCIe SSD

At the heart of this density story is the Solidigm™ D5-P5336, a hyper-scalable, cost-effective solution for AI and data-intensive workloads. With industry-leading capacity up to 122.88TB, the D5-P5336 is architected to efficiently accelerate and scale with the increasingly massive datasets found in widely deployed, modern read-intensive workloads.

This testing was conducted in the Solidigm AI Central Lab, a purpose-built facility that brings together storage and AI capabilities to perform cutting-edge research and improve bottom-line results. The lab features high-performance GPUs, 800 Gbps Ethernet networking, and extensive Solidigm SSD infrastructure, all designed on reference architectures that mirror what hyperscalers and enterprises are deploying in data centers worldwide, making the findings broadly applicable to customer environments.

The AI Central Lab hosts what Solidigm believes to be the densest storage test cluster ever built: 192 Solidigm D5-P5336 SSDs packing 23.4PB into just 16U of rack space. The lab also has the capability to collect telemetry, creating a detailed picture of how system resources are used and where bottlenecks exist, allowing Solidigm and its collaborators to recommend optimizations to improve performance and power efficiency.

Additional Resources