This seminar introduces participants to the essential infrastructure that powers modern data and AI systems. Attendees will explore how real-world organizations manage, store, and process data at scale — starting with Docker containerization for reproducible environments, advancing to Data Lake architecture using the Bronze-Silver-Gold medallion pattern with MinIO, DuckDB, and Parquet, and finishing with production-grade PostgreSQL database engineering.
Through hands-on exercises using Google Colab, GitHub Codespaces, and free cloud services, participants will build a complete data pipeline without needing to install anything on their laptops. By the end of the session, attendees will understand how data flows from raw ingestion to analytics-ready formats — the same foundation that feeds machine learning and AI applications.
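The raw-to-analytics-ready flow described above can be sketched in a few lines. The snippet below is a minimal, stdlib-only illustration of the Bronze-Silver-Gold medallion pattern; in the seminar these layers are backed by MinIO (object storage), DuckDB (query engine), and Parquet (columnar files), so the plain-Python structures and all field names here are illustrative stand-ins, not the seminar's actual code.

```python
# Minimal stdlib-only sketch of the Bronze -> Silver -> Gold medallion flow.
# Field names (order_id, region, amount) are hypothetical examples.
import csv
import io
from collections import defaultdict

# Bronze layer: raw data exactly as ingested (untyped, possibly malformed).
raw_csv = """order_id,region,amount
1,eu,19.99
2,us,5.00
3,eu,not_a_number
4,us,12.50
"""
bronze = list(csv.DictReader(io.StringIO(raw_csv)))  # no cleaning yet

# Silver layer: validated and typed -- malformed rows are skipped.
silver = []
for row in bronze:
    try:
        silver.append({
            "order_id": int(row["order_id"]),
            "region": row["region"].strip().lower(),
            "amount": float(row["amount"]),
        })
    except ValueError:
        continue  # in production this row would go to a quarantine table

# Gold layer: analytics-ready aggregate (revenue per region).
gold = defaultdict(float)
for row in silver:
    gold[row["region"]] += row["amount"]

print(dict(gold))  # revenue per region, with the bad row excluded
```

Each layer only reads from the one before it, which is the key design property of the pattern: raw data is never mutated, so any Silver or Gold table can be rebuilt from Bronze at any time.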