AIStor Table Sharing: Integrating Databricks with MinIO AIStor

About this Resource

Most enterprise data required for cloud analytics workloads remains on-premises, not out of inertia but by deliberate architectural and regulatory choice. Replicating petabyte-scale datasets to the cloud introduces storage costs, egress fees, synchronization pipelines, and version drift, and by the time the data arrives it may already be stale.

AIStor Table Sharing solves this by embedding the Delta Sharing protocol directly into the AIStor binary, which eliminates third-party sidecar services and lets Databricks query on-premises data securely without copying it. The implementation serves both Delta and Iceberg tables from the same endpoint, using AIStor's integrated Iceberg REST catalog to extend the Delta Sharing specification beyond its original Delta-only reference implementation. Token-based authentication (bearer token or OAuth2 via OIDC) controls access per share, and shares can be created, updated, and deleted without downtime.

The guide provides complete setup procedures for two deployment methods: a self-contained Docker Compose environment with both AIStor and PySpark, and a standalone PySpark container connecting to an existing AIStor cluster. It also covers connecting to Databricks on AWS, Azure, and GCP, including networking requirements, TLS certificate requirements, and Unity Catalog installation steps. Production connections require HTTPS with CA-signed certificates; Azure Databricks additionally requires a Premium account to accept third-party shares. Once connected, Databricks users can run SQL joins across Delta and Iceberg tables residing in AIStor using standard serverless SQL against the Unity Catalog.

Key Takeaways:

The Delta Sharing protocol is built directly into the AIStor binary with full feature parity across bare-metal and container deployments, removing the need for any third-party sharing framework.

A single AIStor Table Share endpoint serves both Delta and Iceberg tables simultaneously. The table format is the only variable; no programmatic changes are required on the consumer side.

Databricks workloads on AWS, Azure (Premium tier), and GCP can query live on-premises AIStor data via secure Private Link or Direct Connect connections, with administrators able to update shares dynamically without interrupting active Databricks sessions.
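
As a rough illustration of the consumer side, the sketch below writes a Delta Sharing profile file (the JSON credential file defined by the open Delta Sharing protocol) and builds table URLs for two tables served from one endpoint. The endpoint URL, token, share, schema, and table names are placeholders invented for this example, not values from the AIStor guide, and the exact endpoint path for an AIStor deployment is an assumption.

```python
import json
import tempfile
from pathlib import Path


def write_profile(directory, endpoint, bearer_token):
    """Write a Delta Sharing profile file and return its path.

    The three fields follow the profile format of the open Delta Sharing
    protocol; the values passed in are supplied by the share administrator.
    """
    profile = {
        "shareCredentialsVersion": 1,
        "endpoint": endpoint,
        "bearerToken": bearer_token,
    }
    path = Path(directory) / "aistor.share"  # hypothetical file name
    path.write_text(json.dumps(profile, indent=2))
    return path


def table_url(profile_path, share, schema, table):
    """Build the '<profile>#<share>.<schema>.<table>' URL used by Delta Sharing clients."""
    return f"{profile_path}#{share}.{schema}.{table}"


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    profile_path = write_profile(
        workdir,
        endpoint="https://aistor.example.com/delta-sharing",  # placeholder endpoint
        bearer_token="REPLACE_WITH_TOKEN",                    # placeholder token
    )

    # Hypothetical names: one Delta table and one Iceberg table in the same share.
    # Both are addressed the same way; only the table name differs.
    for table in ("orders_delta", "customers_iceberg"):
        print(table_url(profile_path, "analytics_share", "sales", table))

    # With the open-source delta-sharing Python client installed, a URL like the
    # ones above could then be passed to delta_sharing.load_as_pandas(url).
```

Both URLs share one profile, which mirrors the takeaway that the table format is the only thing that varies for the consumer.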

Who this is for

Data engineers, platform architects, and Databricks administrators at organizations that need cloud analytics platforms to access on-premises data governed by sovereignty, compliance, or cost constraints.

