RAG isn't a magic fix for search problems. It often works well at first, but most teams find it isn't good enough for production out of the box. The key is to improve it step by step, with solid evaluation and deliberate synthetic data generation.
Today, we talk to Saahil Ognawala from Jina AI about what it takes to make RAG work in practice.
To build a good RAG system, you need three things: a way to evaluate it, a method for creating training data, and a plan for improving it over time. Evaluation starts with a golden set of example queries: common (head) queries that show up constantly, moderately frequent (torso) queries, and rare (tail) queries that only appear now and then. This mix lets you measure whether a change actually makes the system better or worse.
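As a concrete illustration, here is a minimal evaluation harness along those lines. The query buckets, document IDs, and the `search(query, k)` function are all hypothetical stand-ins for your own system, not something from the episode:

```python
# Minimal evaluation harness sketch. `search(query, k)` is a stand-in
# for whatever retrieval system you are testing; queries and relevance
# labels below are illustrative.
from collections import defaultdict

# Golden queries stratified by frequency: head (common), torso
# (moderately frequent), tail (rare). Each maps to the doc IDs a
# human judged relevant.
GOLDEN_SET = {
    "head":  [("chocolate cake recipe", {"doc_12", "doc_87"})],
    "torso": [("gluten-free chocolate cake", {"doc_44"})],
    "tail":  [("vegan flourless cake high altitude", {"doc_301"})],
}

def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of relevant docs that appear in the top-k results."""
    return len(set(retrieved[:k]) & relevant) / len(relevant)

def evaluate(search, k: int = 10) -> dict[str, float]:
    """Average recall@k per frequency bucket, so a regression on rare
    (tail) queries is not hidden by good scores on common ones."""
    scores = defaultdict(list)
    for bucket, pairs in GOLDEN_SET.items():
        for query, relevant in pairs:
            retrieved = search(query, k=k)
            scores[bucket].append(recall_at_k(retrieved, relevant, k))
    return {bucket: sum(v) / len(v) for bucket, v in scores.items()}
```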
Synthetic data helps make the system more robust, especially at handling hard negatives: wrong answers that look right. Think of someone searching for a "gluten-free chocolate cake." A "sugar-free chocolate cake" looks like a good match because it shares many words, but it answers a different need. Training on these tricky examples teaches the system to tell similar but different things apart.
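One common way to use such hard negatives is to fine-tune an embedding model on (query, positive, hard negative) triplets. The sketch below uses the sentence-transformers library's triplet loss; this is a generic recipe, not necessarily how Jina trains its models, and the example texts and base model are made up:

```python
# Sketch: fine-tuning an embedding model on hard-negative triplets
# with sentence-transformers. Assumption: a generic recipe, not the
# training pipeline described in the episode.
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

model = SentenceTransformer("all-MiniLM-L6-v2")  # any base embedding model

# (anchor query, positive, hard negative): the negative shares surface
# words with the query but answers a different need.
triplets = [
    InputExample(texts=[
        "gluten-free chocolate cake",
        "Flourless chocolate cake made with almond flour...",   # positive
        "Sugar-free chocolate cake sweetened with stevia...",   # hard negative
    ]),
]

loader = DataLoader(triplets, shuffle=True, batch_size=16)
loss = losses.TripletLoss(model=model)

# model.fit handles batching/collation for InputExample objects.
model.fit(train_objectives=[(loader, loss)], epochs=1, warmup_steps=10)
```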
Creating synthetic data needs rules. The approach that works best is few-shot prompting: show the model a handful of real example queries and give it a list of topics to cover. Most teams find that a roughly 50/50 mix of real and synthetic data works best, giving you enough variety while keeping the data grounded in real usage.
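A rough sketch of that few-shot recipe, using the OpenAI Python client; the model name, prompt wording, and example queries are all illustrative assumptions:

```python
# Sketch: few-shot synthetic query generation. Seed the LLM with a
# handful of real queries plus a topic list, then generate new ones.
from openai import OpenAI

client = OpenAI()

REAL_EXAMPLES = [  # hypothetical queries pulled from real logs
    "gluten-free chocolate cake",
    "how long to proof sourdough in fridge",
    "substitute for buttermilk in pancakes",
]
TOPICS = ["baking", "fermentation", "ingredient substitutions"]

prompt = (
    "You generate realistic user search queries for a recipe site.\n"
    "Here are real queries from our logs:\n"
    + "\n".join(f"- {q}" for q in REAL_EXAMPLES)
    + "\n\nGenerate 10 new queries, one per line, covering these topics: "
    + ", ".join(TOPICS)
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # assumption: any capable chat model works
    messages=[{"role": "user", "content": prompt}],
)
synthetic = response.choices[0].message.content.splitlines()

# Keep the final training set roughly 50/50 real vs. synthetic,
# per the rule of thumb above.
training_queries = REAL_EXAMPLES + synthetic[: len(REAL_EXAMPLES)]
```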
Getting user feedback is hard with RAG. In traditional search, you can see whether users click on results. But a RAG system composes its answer from many retrieved pieces, and a good answer might draw on both good and bad ones, so it is hard to tell which chunks actually helped. You need a way to attribute answer-level feedback back to the individual pieces of retrieved context.
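One simple (and admittedly crude) way to start is to attribute each answer-level rating to every chunk that was in the context, then look for chunks that keep showing up in badly rated answers. All names in this sketch are hypothetical:

```python
# Sketch: chunk-level feedback attribution. Users rate the final
# answer, not individual chunks, so each rating is spread over all
# chunks that were retrieved for that answer. Crude, but it surfaces
# chunks that consistently appear in bad answers.
from collections import defaultdict

chunk_score = defaultdict(lambda: {"good": 0, "bad": 0})

def record_feedback(retrieved_chunk_ids: list[str], thumbs_up: bool) -> None:
    """Attribute one answer-level rating to every retrieved chunk."""
    key = "good" if thumbs_up else "bad"
    for chunk_id in retrieved_chunk_ids:
        chunk_score[chunk_id][key] += 1

def suspect_chunks(min_votes: int = 20) -> list[str]:
    """Chunks that mostly appear in answers users rated badly."""
    suspects = []
    for chunk_id, s in chunk_score.items():
        total = s["good"] + s["bad"]
        if total >= min_votes and s["bad"] / total > 0.7:
            suspects.append(chunk_id)
    return suspects
```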
One key rule: don't make things harder than they need to be. If simple keyword search (BM25) works well enough, adding an embedding-based retrieval stack might not be worth the extra complexity.
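Checking that baseline is cheap. A minimal BM25 sketch with the rank_bm25 package (`pip install rank-bm25`); the corpus is illustrative:

```python
# Sketch: a plain BM25 baseline. Worth measuring before reaching for
# neural search.
from rank_bm25 import BM25Okapi

corpus = [
    "gluten-free chocolate cake with almond flour",
    "sugar-free chocolate cake sweetened with stevia",
    "classic sourdough bread starter guide",
]
tokenized = [doc.lower().split() for doc in corpus]
bm25 = BM25Okapi(tokenized)

query = "gluten free cake".lower().split()
print(bm25.get_top_n(query, corpus, n=2))  # top-2 docs by BM25 score
# If results like these already satisfy your users on the golden set,
# an embedding-based stack may not pay for its added complexity.
```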
Success with RAG comes from good testing, careful data creation, and steady improvements based on real use. It's not about using the newest AI models. It's about building good systems and processes that work reliably.
"It isn’t a magic wand you can place on your catalog and expect results you didn’t get before."
"Most of our enterprise users who have seen the most success in their RAG systems are the ones that implemented a continuous feedback mechanism very early."
"If you can't tell in real-time usage whether an answer is a bad answer or a right answer, because the LLM just makes it look like the right answer, then you only have your retrieval dataset to blame."
Saahil Ognawala:
Nicolay Gerold:
00:00 Introduction to Retrieval Augmented Generation (RAG)
00:29 Interview with Saahil Ognawala
00:52 Synthetic Data in Language Generation
01:14 Understanding the E5 Mistral Instructor Embeddings Paper
03:15 Challenges and Evolution in Synthetic Data
05:03 User Intent and Retrieval Systems
11:26 Evaluating RAG Systems
14:46 Setting Up Evaluation Frameworks
20:37 Fine-Tuning and Embedding Models
22:25 Negative and Positive Examples in Retrieval
26:10 Synthetic Data for Hard Negatives
29:20 Case Study: Marine Biology Project
29:54 Addressing Errors in Marine Biology Queries
31:28 Ensuring Query Relevance with Human Intervention
31:47 Few-Shot Prompting vs. Zero-Shot Prompting
35:09 Balancing Synthetic and Real World Data
37:17 Improving RAG Systems with User Feedback
39:15 Future Directions for Jina and Synthetic Data
40:44 Building and Evaluating Embedding Models
41:24 Getting Started with Jina and Open Source Tools
51:25 The Importance of Hard Negatives in Embedding Models