This webinar shows how to build a production-grade streaming pipeline using Kafka for guaranteed delivery and AIStor Tables as the Iceberg v3 destination. The demo ingests live global flight data from the OpenSky Network REST API through a Python Kafka producer and consumer. The consumer writes structured records into an AIStor Tables warehouse in batches of 1,000, at approximately 11,000 records per 90-second polling cycle. Query access is demonstrated via Trino SQL, and a Jupyter notebook visualizes the accumulated dataset across flight tracks, altitude distributions, and day/night activity cycles. The session also covers Iceberg catalog and warehouse provisioning in AIStor, PyIceberg schema safety for ingestion validation, and geospatial data type support in Iceberg v3.
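The batching and delivery behavior described above can be illustrated with a small standard-library model. This is only a sketch, not the code shown in the session: the Python list stands in for a Kafka topic, and `BatchConsumer`, `batched`, and `write_batch` are hypothetical names. The records are synthetic, and only the batch size of 1,000 and the roughly 11,000 records per cycle come from the summary.

```python
from typing import Callable, Iterable, Iterator, List

BATCH_SIZE = 1_000  # batch size quoted in the session summary


def batched(records: Iterable, size: int = BATCH_SIZE) -> Iterator[List]:
    """Yield consecutive lists of at most `size` records."""
    if size <= 0:
        raise ValueError("size must be positive")
    batch: List = []
    for record in records:
        batch.append(record)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # flush the final, possibly short, batch
        yield batch


class BatchConsumer:
    """Toy consumer: reads a list-backed log and tracks a committed offset."""

    def __init__(self, write_batch: Callable[[List], None], size: int = BATCH_SIZE):
        self.write_batch = write_batch  # hypothetical sink, e.g. a table append
        self.size = size
        self.committed = 0  # index of the next unread record in the log

    def drain(self, log: List) -> int:
        """Write everything after the committed offset, batch by batch."""
        for batch in batched(log[self.committed:], self.size):
            self.write_batch(batch)       # may raise; the offset then stays put
            self.committed += len(batch)  # commit only after a successful write
        return self.committed


# Synthetic stand-in for one polling cycle (~11,000 records in the demo).
log = [{"icao24": f"{i:06x}"} for i in range(11_000)]
written: List[List[dict]] = []
consumer = BatchConsumer(written.append)
consumer.drain(log)
print(len(written), consumer.committed)  # 11 11000
```

Because the offset advances only after a write succeeds, a failed write leaves its batch in the log to be retried on the next `drain` call. A real Kafka consumer would achieve the same effect by committing offsets after each successful table append.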
AIStor Tables stores 9.2 million flight state records, ingested over approximately 24 to 28 hours, in just 30 GB using Parquet with Snappy compression. The Iceberg catalog, metadata, and warehouse are all created and managed automatically with a single mc table warehouse create command.
Kafka decouples the OpenSky producer from the AIStor Tables consumer, so either side can go offline independently without losing data. This is a critical architectural requirement for any pipeline where source availability and destination maintenance windows do not align.
Any query engine that speaks Iceberg, such as Trino or PySpark, connects directly to AIStor Tables via the standard REST catalog, with no MinIO-specific libraries or configuration required beyond standard S3 credentials.
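As a rough illustration of what connecting through the standard REST catalog can look like from Python, the sketch below assembles the properties that PyIceberg's `load_catalog` accepts for a REST catalog backed by S3-compatible storage. The helper name `rest_catalog_properties`, the URLs, the warehouse name, and the credentials are all placeholders rather than values from the session. The actual catalog endpoint path and any additional authentication settings depend on the deployment.

```python
from typing import Dict


def rest_catalog_properties(
    uri: str,
    warehouse: str,
    s3_endpoint: str,
    access_key: str,
    secret_key: str,
) -> Dict[str, str]:
    """Build PyIceberg properties for a generic Iceberg REST catalog."""
    return {
        "type": "rest",                       # use the REST catalog implementation
        "uri": uri,                           # REST catalog base URL (placeholder)
        "warehouse": warehouse,               # warehouse name (placeholder)
        "s3.endpoint": s3_endpoint,           # S3-compatible data endpoint
        "s3.access-key-id": access_key,       # standard S3 credentials
        "s3.secret-access-key": secret_key,
    }


props = rest_catalog_properties(
    uri="https://aistor.example.com",         # assumption: replace with the real catalog URL
    warehouse="flights",                      # assumption: hypothetical warehouse name
    s3_endpoint="https://aistor.example.com",
    access_key="ACCESS_KEY_PLACEHOLDER",
    secret_key="SECRET_KEY_PLACEHOLDER",
)

# With PyIceberg installed and a reachable endpoint, the properties would be
# passed straight through (the table identifier here is hypothetical):
#   from pyiceberg.catalog import load_catalog
#   catalog = load_catalog("aistor", **props)
#   table = catalog.load_table("opensky.flight_states")
```

Every key in the dictionary is a generic PyIceberg REST or S3 FileIO property, which matches the point that no vendor-specific library is involved.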
This session is for data engineers building real-time ingestion pipelines into Iceberg lakehouses, and for platform teams evaluating AIStor Tables as a unified destination for both structured streaming data and unstructured object storage.