How AI Is Built | All Episodes

#050 Bringing LLMs to Production: Delete Frameworks, Avoid Finetuning, Ship Faster

S3E3 · May 27, 2025 · 01:06:58

Nicolay here. Most AI developers are drowning in frameworks and hype. This conversation is about cutting through the noise and actually getting something into production. Today I have ...

#050 TAKEAWAYS Bringing LLMs to Production: Delete Frameworks, Avoid Finetuning, Ship Faster

S3E3 · May 27, 2025 · 11:01

Nicolay here. Most AI developers are drowning in frameworks and hype. This conversation is about cutting through the noise and actually getting something into production. Today I have ...

#049 BAML: The Programming Language That Turns LLMs into Predictable Functions

S3E2 · May 20, 2025 · 01:02:39

Nicolay here, I think by now we are done with marveling at the latest benchmark scores of the models. It doesn’t tell us much anymore that the latest generation outscores the previous...

#049 TAKEAWAYS BAML: The Programming Language That Turns LLMs into Predictable Functions

S3E2 · May 20, 2025 · 01:12:35

Nicolay here, I think by now we are done with marveling at the latest benchmark scores of the models. It doesn’t tell us much anymore that the latest generation outscores the previous...

#048 TAKEAWAYS Why Your AI Agents Need Permission to Act, Not Just Read

S3E1 · May 13, 2025 · 07:07

Nicolay here, most AI conversations obsess over capabilities. This one focuses on constraints - the right ones that make AI actually useful rather than just impressive demos. Today I h...

#048 Why Your AI Agents Need Permission to Act, Not Just Read

S3E1 · May 11, 2025 · 57:03

Nicolay here, most AI conversations obsess over capabilities. This one focuses on constraints - the right ones that make AI actually useful rather than just impressive demos. Today I h...

#047 Architecting Information for Search, Humans, and Artificial Intelligence

S2E30 · March 27, 2025 · 57:22

Today on How AI Is Built, Nicolay Gerold sits down with Jorge Arango, an expert in information architecture. Jorge emphasizes that aligning systems with users' mental models is more ...

#046 Building a Search Database From First Principles

S2E29 · March 13, 2025 · 53:29

Modern search is broken. There are too many pieces that are glued together: vector databases for semantic search, text engines for keywords, rerankers to fix the results, LLMs to understand...

#045 RAG As Two Things - Prompt Engineering and Search

S2E28 · March 6, 2025 · 01:02:44

John Berryman moved from aerospace engineering to search, then to ML and LLMs. His path: Eventbrite search → GitHub code search → data science → GitHub Copilot. He was drawn to more ...

#044 Graphs Aren't Just For Specialists Anymore

S2E27 · February 28, 2025 · 01:03:35

Kuzu is an embedded graph database that implements Cypher as a library. It can be easily integrated into various environments—from scripts and Android apps to serverless platforms. Its...

#043 Knowledge Graphs Won't Fix Bad Data

S2E26 · February 20, 2025 · 01:10:59

Metadata is the foundation of any enterprise knowledge graph. By organizing both technical and business metadata, organizations create a “brain” that supports advanced applications li...

#042 Temporal RAG, Embracing Time for Smarter, Reliable Knowledge Graphs

S2E25 · February 13, 2025 · 01:33:44

Daniel Davis is an expert on knowledge graphs. He has a background in risk assessment and complex systems—from aerospace to cybersecurity. Now he is working on “Temporal RAG” in Trus...

#041 Context Engineering, How Knowledge Graphs Help LLMs Reason

S2E24 · February 6, 2025 · 01:33:35

Robert Caulk runs Emergent Methods, a research lab building news knowledge graphs. With a Ph.D. in computational mechanics, he spent 12 years creating open-source tools for machine l...

#040 Vector Database Quantization, Product, Binary, and Scalar

S2E23 · January 31, 2025 · 52:12

When you store vectors, each number takes up 32 bits. With 1000 numbers per vector and millions of vectors, costs explode. A simple chatbot can cost thousands per month just to store a...

#039 Local-First Search, How to Push Search To End-Devices

S2E22 · January 23, 2025 · 53:09

Alex Garcia is a developer focused on making vector search accessible and practical. As he puts it: "I'm a SQLite guy. I use SQLite for a lot of projects... I want an easier vector s...

#038 AI-Powered Search, Context Is King, But Your RAG System Ignores Two-Thirds of It

S2E21 · January 9, 2025 · 01:14:24

Today, I (Nicolay Gerold) sit down with Trey Grainger, author of the book AI-Powered Search. We discuss the different techniques for search and recommendations and how to combine the...

#037 Chunking for RAG: Stop Breaking Your Documents Into Meaningless Pieces

S2E20 · January 3, 2025 · 49:13

Today we are back, continuing our series on search. We are talking to Brandon Smith about his work for Chroma. He led one of the largest studies in the field on different chunking te...

#036 How AI Can Start Teaching Itself - Synthetic Data Deep Dive

S2E19 · December 19, 2024 · 48:11

Most LLMs you use today already use synthetic data. It’s not a thing of the future. The large labs use a large model (e.g. gpt-4o) to generate training data for a smaller one (gpt-4o-m...

#035 A Search System That Learns As You Use It (Agentic RAG)

S2E18 · December 13, 2024 · 45:30

Modern RAG systems build on flexibility. At their core, they match each query with the best tool for the job. They know which tool fits each task. When you ask about sales numbers, the...

#034 Rethinking Search Inside Postgres, From Lexemes to BM25

S2E17 · December 5, 2024 · 47:16

Many companies use Elastic or OpenSearch and use 10% of the capacity. They have to build ETL pipelines. Get data normalized. Worry about race conditions. All in all. At the moment, when ...

#033 RAG's Biggest Problems & How to Fix It (ft. Synthetic Data)

S2E16 · November 28, 2024 · 51:26

RAG isn't a magic fix for search problems. While it works well at first, most teams find it's not good enough for production out of the box. The key is to make it better step by step...

#032 Improving Documentation Quality for RAG Systems

S2E15 · November 21, 2024 · 46:37

Documentation quality is the silent killer of RAG systems. A single ambiguous sentence might corrupt an entire set of responses. But the hardest part isn't fixing errors - it's findi...

#031 BM25 As The Workhorse Of Search; Vectors Are Its Visionary Cousin

S2E14 · November 15, 2024 · 54:05

Ever wondered why vector search isn't always the best path for information retrieval? Join us as we dive deep into BM25 and its unmatched efficiency in our latest podcast episode with...

#030 Vector Search at Scale, Why One Size Doesn't Fit All

S2E13 · November 7, 2024 · 36:26

Ever wondered why your vector search becomes painfully slow after scaling past a million vectors? You're not alone - even tech giants struggle with this. Charles Xie, founder of Zilli...

#029 Search Systems at Scale, Avoiding Local Maxima and Other Engineering Lessons

S2E12 · October 31, 2024 · 54:47

Modern search systems face a complex balancing act between performance, relevancy, and cost, requiring careful architectural decisions at each layer. While vector search generates buz...

#028 Training Multi-Modal AI, Inside the Jina CLIP Embedding Model

S2E11 · October 25, 2024 · 49:22

Today we are talking to Michael Günther, a senior machine learning scientist at Jina, about his work on Jina CLIP. Some key points: Uni-modal embeddings convert a single type of input (...

#027 Building the database for AI, Multi-modal AI, Multi-modal Storage

S2E10 · October 23, 2024 · 44:54

Imagine a world where data bottlenecks, slow data loaders, or memory issues on the VM don't hold back machine learning. Machine learning and AI success depends on the speed you can it...

#026 Embedding Numbers, Categories, Locations, Images, Text, and The World

S2E9 · October 10, 2024 · 46:44

Today’s guest is Mór Kapronczay. Mór is the Head of ML at Superlinked. Superlinked is a compute framework for your information retrieval and feature engineering systems, where they t...

#025 Data Models to Remove Ambiguity from AI and Search

S2E8 · October 4, 2024 · 58:40

Today we have Jessica Talisman with us, who is working as an Information Architect at Adobe. She is (in my opinion) the expert on taxonomies and ontologies. That’s what you will learn...

#024 How ColPali is Changing Information Retrieval

S2E7 · September 27, 2024 · 54:57

ColPali makes us rethink how we approach document processing. ColPali revolutionizes visual document search by combining late interaction scoring with visual language models. This app...