Jorrit Sandbrink, a data engineer specializing in open table formats, discusses the advantages of decoupling storage and compute, the importance of choosing the right table format, and strategies for optimizing your data pipelines. This episode is full of practical advice for anyone looking to build a high-performance data analytics platform.
Key Takeaways:
Sound Bites
"The lake house is sort of a modular setup where you decouple the storage and the compute."
"A lake house is an architecture, an architecture for data analytics platforms."
"The most popular table formats for a lake house are Delta, Iceberg, and Apache Hudi."
Jorrit Sandbrink:
Nicolay Gerold:
Chapters
00:00 Introduction to the Lake House Architecture
03:59 Choosing Storage and Table Formats
06:19 Comparing Compute Engines
21:37 Simplifying Data Ingress
25:01 Building a Preferred Data Stack
lake house, data analytics, architecture, storage, table format, query execution engine, document store, DuckDB, Polars, orchestration, Airflow, Dagster, DLT, data ingress, data processing, data storage
Listen to How AI Is Built using one of many popular podcasting apps or directories.