How AI Is Built | All Episodes

#023 The Power of Rerankers in Modern Search

S2E6 · September 26, 2024 · 42:29

Today, we're talking to Aamir Shakir, the founder and baker at mixedbread.ai, where he's building some of the best embedding and re-ranking models out there. We go into the world of ...

#022 The Limits of Embeddings, Out-of-Domain Data, Long Context, Finetuning (and How We're Fixing It)

S2E5 · September 19, 2024 · 46:06

Text embeddings have limitations when it comes to handling long documents and out-of-domain data. Today, we are talking to Nils Reimers. He is one of the researchers who kickstarted t...

#021 The Problems You Will Encounter With RAG At Scale And How To Prevent (or fix) Them

S2E4 · September 12, 2024 · 50:09

Hey! Welcome back. Today we look at how we can get our RAG system ready for scale. We discuss common problems and their solutions when you introduce more users and more requests to yo...

#020 The Evolution of Search, Finding Search Signals, GenAI Augmented Retrieval

S2E3 · September 5, 2024 · 52:16

In this episode of How AI is Built, Nicolay Gerold interviews Doug Turnbull, a search engineer at Reddit and author of “Relevant Search”. They discuss how methods and technologies, i...

#019 Data-driven Search Optimization, Analysing Relevance

S2E2 · August 30, 2024 · 51:14

In this episode, we talk data-driven search optimizations with Charlie Hull. Charlie is a search expert from Open Source Connections. He has built Flax, one of the leading open source...

#018 Query Understanding: Doing The Work Before The Query Hits The Database

S2E1 · August 15, 2024 · 53:02

Welcome back to How AI Is Built. We have got a very special episode to kick off season two. Daniel Tunkelang is a search consultant currently working with Algolia. He is a leader in ...

Season 2 Trailer: Mastering Search

S2Trailer · August 8, 2024 · 04:16

Today we are launching season 2 of How AI Is Built. Over the last few weeks, we spoke to a lot of regular listeners and past guests and collected feedback. Analyzed our episode data. A...

#017 Unlocking Value from Unstructured Data, Real-World Applications of Generative AI

S1E17 · July 16, 2024 · 36:28

In this episode of "How AI is Built," host Nicolay Gerold interviews Jonathan Yarkoni, founder of Reach Latent. Jonathan shares his expertise in extracting value from unstructured da...

#016 Data Processing for AI, Integrating AI into Data Pipelines, Spark

S1E16 · July 12, 2024 · 46:26

This episode of "How AI Is Built" is all about data processing for AI. Abhishek Choudhary and Nicolay discuss Spark and alternatives to process data so it is AI-ready. Spark is a dist...

#015 Building AI Agents for the Enterprise, Agent Cost Controls, Seamless UX

S1E15 · July 4, 2024 · 35:12

In this episode, Nicolay talks with Rahul Parundekar, founder of AI Hero, about the current state and future of AI agents. Drawing from over a decade of experience working on agent t...

#014 Building Predictable Agents through Prompting, Compression, and Memory Strategies

S1E14 · June 27, 2024 · 32:14

In this conversation, Nicolay and Richmond Alake discuss various topics related to building AI agents and using MongoDB in the AI space. They cover the use of agents and multi-agents...

Data Integration and Ingestion for AI & LLMs, Architecting Data Flows | changelog 3

S1E14 · June 25, 2024 · 14:53

In this episode, Kirk Marple, CEO and founder of Graphlit, shares his expertise on building efficient data integrations. Kirk breaks down his approach using relatable concepts: The...

#013 ETL for LLMs, Integrating and Normalizing Unstructured Data

S1E13 · June 19, 2024 · 36:48

In our latest episode, we sit down with Derek Tu, Founder and CEO of Carbon, a cutting-edge ETL tool designed specifically for large language models (LLMs). Carbon is streamlining A...

#012 Serverless Data Orchestration, AI in the Data Stack, AI Pipelines

S1E12 · June 14, 2024 · 28:06

In this episode, Nicolay sits down with Hugo Lu, founder and CEO of Orchestra, a modern data orchestration platform. As data pipelines and analytics workflows become increasingly com...

#011 Mastering Vector Databases, Product & Binary Quantization, Multi-Vector Search

S1E11 · June 7, 2024 · 40:06

Ever wondered how AI systems handle images and videos, or how they make lightning-fast recommendations? Tune in as Nicolay chats with Zain Hassan, an expert in vector databases from ...

#010 Building Robust AI and Data Systems, Data Architecture, Data Quality, Data Storage

S1E10 · May 31, 2024 · 45:33

In this episode of "How AI is Built", data architect Anjan Banerjee provides an in-depth look at the world of data architecture and building complex AI and data systems. Anjan breaks...

#009 Modern Data Infrastructure for Analytics and AI, Lakehouses, Open Source Data Stack

S1E9 · May 24, 2024 · 27:53

Jorrit Sandbrink, a data engineer specializing in open table formats, discusses the advantages of decoupling storage and compute, the importance of choosing the right table format, a...

#008 Knowledge Graphs for Better RAG, Virtual Entities, Hybrid Data Models

S1E8 · May 20, 2024 · 36:40

Kirk Marple, CEO and founder of Graphlit, discusses the evolution of his company from a data cataloging tool to a platform designed for ETL (Extract, Transform, Load) and knowledge ...

#007 Navigating the Modern Data Stack, Choosing the Right OSS Tools, From Problem to Requirements to Architecture

S1E7 · May 17, 2024 · 38:12

From Problem to Requirements to Architecture. In this episode, Nicolay Gerold and Jon Erich Kemi Warghed discuss the landscape of data engineering, sharing insights on selecting the...

#006 Data Orchestration Tools, Choosing the right one for your needs

S1E6 · May 10, 2024 · 32:37

In this episode, Nicolay Gerold interviews John Wessel, the founder of Agreeable Data, about data orchestration. They discuss the evolution of data orchestration tools, the popularit...

#005 Building Reliable LLM Applications, Production-Ready RAG, Data-Driven Evals

S1E4 · May 3, 2024 · 29:40

In this episode of "How AI is Built", we learn how to build and evaluate real-world language model applications with Shahul and Jithin, creators of Ragas. Ragas is a powerful open-so...

Lance v2: Rethinking Columnar Storage for Faster Lookups, Nulls, and Flexible Encodings | changelog 2

S1E5 · April 29, 2024 · 21:33

In this episode of Changelog, Weston Pace dives into the latest updates to LanceDB, an open-source vector database and file format. Lance's new V2 file format redefines the tradition...

#004 AI with Supabase, Postgres Configuration, Real-Time Processing, and more

S1E4 · April 26, 2024 · 31:57

Had a fantastic conversation with Christopher Williams, Solutions Architect at Supabase, about setting up Postgres the right way for AI. We dug deep into Supabase, exploring: Cor...

#003 AI Inside Your Database, Real-Time AI, Declarative ML/AI

S1E3 · April 19, 2024 · 36:04

If you've ever wanted a simpler way to integrate AI directly into your database, SuperDuperDB might be the answer. SuperDuperDB lets you easily apply AI processes to your data while ...

Supabase acquires OrioleDB, A New Database Engine for PostgreSQL | changelog 1

S1E3 · April 17, 2024 · 13:37

Supabase just acquired OrioleDB, a storage engine for PostgreSQL. Oriole gets creative with MVCC! It uses an UNDO log rather than keeping multiple versions of an entire data row (tu...

#002 AI Powered Data Transformation, Combining gen & trad AI, Semantic Validation

S1E2 · April 12, 2024 · 37:09

Today’s guest is Antonio Bustamante, a serial entrepreneur who previously built Kite and Silo and is now working to fix bad data. He is building bem, the data tool to transform any d...

#001 Multimodal AI, Storing 1 Billion Vectors, Building Data Infrastructure at LanceDB

S1E1 · April 5, 2024 · 34:04

Imagine a world where data bottlenecks, slow data loaders, or memory issues on the VM don't hold back machine learning. Machine learning and AI success depends on the speed you can ...