· 01:14:24
Nicolay Gerold: Developers
treat search as a black box.
They throw everything in a vector database
and hope something good comes out of it.
They also throw all ranking signals into one big ML model and hope it produces something decent. But you don't want to create this witch's cauldron where you throw all your ingredients in and hope for a magic potion. It might work, but when it does, you won't know why. It's difficult to debug and adjust. You won't know, for example, which ranking signal actually delivered the important information. And if something goes wrong, you need to retrain the entire model and you can't pinpoint why it went wrong. You want the exact opposite.
You want layers of tools or techniques
aligned in a graph that you can
tune, debug, and update in isolation.
For ranking, this might
look something like this.
You have a top layer of personalization with user-specific adjustments. So, for example, this only adjusts the result positions of the different search results. In the middle layer, you might have something like signal boosting, where popular items within a specific query or for specific terms are boosted. And this might also happen based on some user behavior data. And the baseline layer might be a generalized ranking algorithm, which is your core relevance scoring based on similarity, but also your TF-IDF scores.
And through this, we basically can
control each layer and can tune it
independently of the others and also
enable and disable specific components
which becomes very interesting when you
actually want to adjust them based on the
user query and pipe them through different
paths through the graph you have.
And it also allows you to debug more easily why something went wrong, because you can pinpoint, okay, where was this result actually boosted up or ranked higher so that it ended up in front of the user. And it also gives you the ability to update each layer without affecting the others, so it makes it easier to A/B test specific components.
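To make the layered setup concrete, here is a minimal sketch in Python of a ranking pipeline with independently tunable layers. All of the layer functions, context fields, and weights are illustrative assumptions, not taken from the episode; the point is only that each layer can be enabled, disabled, debugged, and A/B tested on its own.

```python
from dataclasses import dataclass, field
from typing import Callable

# A "layer" is a function that takes the current scores (doc_id -> score)
# plus some context and returns adjusted scores.
RankingLayer = Callable[[dict, dict], dict]

def baseline_relevance(scores: dict, ctx: dict) -> dict:
    # Core relevance (e.g. BM25/TF-IDF or vector similarity) straight
    # from the engine; passed through unchanged here.
    return scores

def signal_boosting(scores: dict, ctx: dict) -> dict:
    # Boost items that are popular for this query (hypothetical mapping
    # built offline from click/purchase signals).
    boosts = ctx.get("popular_for_query", {})
    return {doc: s + boosts.get(doc, 0.0) for doc, s in scores.items()}

def personalization(scores: dict, ctx: dict) -> dict:
    # Small per-user nudges on top; only adjusts positions, never
    # replaces the underlying relevance score.
    prefs = ctx.get("user_preferences", {})
    return {doc: s + 0.1 * prefs.get(doc, 0.0) for doc, s in scores.items()}

@dataclass
class RankingPipeline:
    layers: list = field(default_factory=list)   # [name, fn, enabled]

    def add(self, name: str, fn: RankingLayer, enabled: bool = True):
        self.layers.append([name, fn, enabled])

    def toggle(self, name: str, enabled: bool):
        for layer in self.layers:
            if layer[0] == name:
                layer[2] = enabled                # switch a layer on/off in isolation

    def rank(self, scores: dict, ctx: dict) -> list:
        for name, fn, enabled in self.layers:
            if enabled:
                scores = fn(scores, ctx)          # each layer is inspectable on its own
        return sorted(scores, key=scores.get, reverse=True)

pipeline = RankingPipeline()
pipeline.add("baseline", baseline_relevance)
pipeline.add("signal_boosting", signal_boosting)
pipeline.add("personalization", personalization)

ctx = {"popular_for_query": {"doc2": 2.0}, "user_preferences": {"doc3": 5.0}}
print(pipeline.rank({"doc1": 1.2, "doc2": 1.0, "doc3": 0.9}, ctx))
```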
And this reflects like a broader
principle in software engineering
where you prefer clear and separated
components over monolithic black boxes.
It's just the same as you wouldn't throw all your application code into one giant function. So you shouldn't throw all your ranking signals into one giant model. And you shouldn't throw all your hopes into a vector database or an embedding model from OpenAI.
And today we are continuing
our series on search.
We are talking to Trey Grainger, the author of AI-Powered Search, and we look at the different techniques behind modern search engines and recommendation systems and how you can combine them. Trey brings a wealth of experience from his different search positions with Solr and Elasticsearch, and also now from his own shop, Searchkernel. Let's do it.
Trey Grainger: I think RAG is a bad acronym. I think GAR is also a bad acronym, only because it goes both ways.
And so I think that it's more of a synthesis: using retrieval to make generative models better, and then using generative models to make retrieval better. So I think it's actually more that retrieval is in the middle. You use generative models to help interpret queries and better understand them. And then once you've gotten the results, you also take those to improve the output of the generative model. So I think it's GAR-RAG or RAG-GAR if we wanted to start to combine them together, but I don't know.
I think it's more important that
people understand the concept and
how to put these things together than
get hung up on particular acronyms.
Nicolay Gerold: Yeah. I always think of the search system in the end rather as a feature pipeline for my generative model, as opposed to just retrieval. How do you see it? Do you think the way I think about it is wrong?
Trey Grainger: Whether it's search or generative AI, I think of all of these systems as nonlinear pipelines. And so for me, I'm just going to talk about retrieval for a second. If I look at a typical information retrieval architecture, usually there's a query intent aspect of it. You take the query intent, you run a search to find results, possibly over multiple requests, and then usually there's some sort of a re-ranking phase. But that's a very overly simplified version of what we're doing.
Because often you will try to interpret query intent, maybe take that, go out and look up some more information to better understand the query intent, and that might include going to your index and finding what's there. And so you refine the query intent, and you request information potentially multiple times.
You get results back, you do re
ranking, and then potentially, if
it's more of an agentic framework you
might actually return results to a
generative model that then determines,
I actually want to explore this further.
And, you continue on, and maybe you
repeat that process multiple times.
Where we are, and where we're going, I think, is not thinking about retrieval as three steps, interpret intent, get results, and re-rank, but thinking of it as a sequence that happens in a larger pipeline, and that pipeline might be linear or it might be nonlinear.
And then I think, to your actual
question, which was, how do you think of
retrieval in the context of generative AI?
It's the same kind of thing.
The reason we use retrieval when we're working with generative AI is because a generative AI model, these LLMs, will take your query, your request, whatever you're asking for. They will then try to interpret it and, without access to up-to-date information, without access to correct information, they will generate a response from their highly compressed understanding of the world.
And so we use retrieval to
augment them with information.
So that first phase, which is the interpretation of the query, is boosted by the second phase, which is the retrieval. But then all of that gets pumped back into the generative model to summarize, so the generative model is used again.
So in a lot of ways, I think of
retrieval as sandwiched in between
the generative AI, where there's
generative AI at the beginning and at
the end, and retrieval in the middle.
But again, you can always go back to retrieval multiple times, you can try to assess and summarize multiple times. It's better to just think of these as two different techniques, almost like you're playing a game of ping pong back and forth, where the generative model says, hey, I need some information, the information comes back, and it just keeps going back and forth until a sufficient answer is determined.
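A rough sketch of that back-and-forth, with the LLM and the search engine as placeholder functions; call_llm and search_index are hypothetical stand-ins for whatever client and engine you use, and the prompt format is invented only to illustrate the loop structure.

```python
# Retrieval sandwiched between generative calls: interpret, retrieve,
# decide whether to retrieve again, then answer.

def call_llm(prompt: str) -> str:
    raise NotImplementedError  # placeholder for your LLM client

def search_index(query: str, k: int = 5) -> list[str]:
    raise NotImplementedError  # placeholder for your Solr/Elasticsearch/vector query

def answer(question: str, max_rounds: int = 3) -> str:
    context: list[str] = []
    # Generative model helps interpret the query first.
    query = call_llm(f"Rewrite this as a search query: {question}")
    for _ in range(max_rounds):
        context.extend(search_index(query))
        # The model decides whether the retrieved context is enough,
        # or whether it wants to go back to retrieval with a new query.
        decision = call_llm(
            "Given this context, reply either 'ANSWER: <answer>' "
            f"or 'QUERY: <follow-up search>'.\n\nContext: {context}\n\nQuestion: {question}"
        )
        if decision.startswith("ANSWER:"):
            return decision.removeprefix("ANSWER:").strip()
        query = decision.removeprefix("QUERY:").strip()
    # Fall back to answering with whatever was gathered.
    return call_llm(f"Answer with what you have.\nContext: {context}\nQuestion: {question}")
```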
Nicolay Gerold: And this in the end is really use case dependent, but it's like a graph, which isn't the traditional graph we've seen in data processing, like a DAG, which is acyclic. Here we can branch, we can do really elaborate loops as well and go back, and basically your use case should inform what actions you take. Like in e-commerce, when I have to be really fast, I probably can't do many loops, but in something like report generation, which is a really hot topic at the moment, I can probably do as many loops as I like and focus more on really getting the information right.
Trey Grainger: Yeah, I think in search in particular, one of the key aspects is people expect it to be fast. So a lot of the time, even though you could find a way better answer with ten minutes of processing, a lot of times you have to optimize things to get down to milliseconds or seconds of latency.
So there's definitely a
lot of tradeoffs there.
Nicolay Gerold: Yeah, but I think search is having a real renaissance at the moment because it's now being used in way more contexts than traditional search was. We are going way past website search and web search; we're using it to feed the right information into the LLM so we can actually perform actions or do whatever we want.
Trey Grainger: Yeah, for sure. And I don't know if we've crossed the threshold, we may have already, but go back a couple of years and think of the big web search engines, Google, Bing, Baidu, et cetera. I like to think of the web search engines as basically a giant cache of the internet, right? You could take these generative models and you could have them figure out what you're asking for and then go crawl the entire web to find the information and process it, do the web crawling, all of that, and then return a result. But it might take you 10 years to go through the entire internet to find the right results.
Whereas what the search engine does is
it serves as a cache of knowledge, a
cache of information that the generative
models are basically relying on to
get all of that data in milliseconds.
So the cache makes it very efficient to find the most relevant information to use.
And so that's why search engines
are so well paired with generative
AI for RAG because they provide
that quick access to the data.
So yeah, I think the reason there's this renaissance is because we need the data for these generative models, and we're probably close to, if we haven't already exceeded, the point where the number of requests going to these search engines, which were originally designed for humans, is dominated by generative models. Very soon, if we're not already there, most of the requests are going to be coming from generative models. Bots already make a lot of requests, but I see a future where humans are the vast minority of the requests going to search engines.
Nicolay Gerold: Yeah, and at the moment I see two different camps talking in the RAG space. One is really the AI side, which is coming from LLMs and has discovered RAG and barely even retrieval. And then you have the search side, which is coming with all their methods for actually figuring out retrieval and more old school techniques, which are very well thought through.
What do you think are the misconceptions
of both camps when it comes to AI
powered search in this new generation?
Trey Grainger: Yeah,
that's a great question.
So I come from more of the search camp,
so I'm going to start with that answer.
What I see, so I'm going to just do an analogy to start. I got my start in the information retrieval space migrating to Apache Solr when Solr was very early. And after Apache Solr, Elasticsearch came along and had some things that were better designed for that era, and Elasticsearch started taking market share.
They did something very smart and very interesting, which is, it's not like they were coming and taking away all of the people using Solr. What they did instead is they used logs. They used Logstash and Kibana to make a really useful tool to solve a problem everybody had, which is, hey, I've got a ton of log data and I need to be able to pull it together and analyze it.
By doing that, they actually
opened the door for an entire
new genre of search people.
They brought in all sorts of DevOps people, all sorts of people that really needed to do log search, got in the door, got tons of installations, and then those people, once they had used the tool for a while, started to realize, wait a minute. I can do way more than just log search with this. I can search on all kinds of data. I can build all sorts of products on top of it. And by doing that, the number of people in search and information retrieval just exploded once they developed those skills. I see a very similar thing happening right now with generative AI.
So people start to build their
companies, their start ups.
They're trying to build
generative AI based products.
Once they start, they realize,
Hey, these things hallucinate.
They don't always give
me the right results.
I need to supplement this using RAG.
So they hear the term RAG, they don't exactly know what it is, but what they hear is: oh, I need to plug in a vector database, I need to take my data, I need to encode it with a transformer, I put my embeddings in the vector database, and now my system can magically somehow take the queries, get data from the vector database, and use that to supplement the results. So they do this, they maybe use some off the shelf tool like LangChain, LlamaIndex, something like that.
Once they've done it, they start to
realize, wow, this is like magic.
Somehow, miraculously, my
generative AI now has access to
real, live, up to date information.
And it looks like magic some of the time.
The rest of the time, they get
really weird results because the
data that's coming back from the
vector database for the retrieval
part of RAG isn't the right data.
Out of all the documents,
maybe they got the wrong ones.
And so then they very quickly start to come and look at why that's happening. And what they end up doing is they meander, they start doing some research, and then they open this door, and when they open the door to information retrieval, they start to look inside, and they see all of the folks who come from an information retrieval background waving, saying, hi, welcome to the club.
Here is all of our knowledge about
all these techniques and tools.
And some of them say, wow, that's amazing.
And they dive in.
And some of them say, oh, and
they shut the door and continue on
their way to solve other problems.
So I think that's my sort of little
analogy here, which is I think there
is a Renaissance in information
retrieval and search driven by
the generative AI capabilities.
But I think that very quickly people
are realizing that the R part of RAG is
really the hard part and the part that
takes, years and years to get right if
that part is critical to the system.
And I think the misconception is that,
oh, hey, for RAG I can just, plug
in a vector database and a couple
of libraries and, a day or two later
everything's magically working and
I'm off to solve the next problem.
Because search and information retrieval is one of those problems that you never really solve. You get it good enough and quit, or you find so much value in it that you just continue investing to constantly make it better.
On the flip side, I think there are misconceptions on the search side of things. I think there's a lot of people who have been in search for a while, who know all of the old school techniques: lexical search, stemming, lemmatization, field boosting, using NLP libraries to do parts of speech analysis and boosting. There's all these techniques that sometimes are still useful, but I will definitely say, over the last two to three years, I've probably thrown out about 25 to 30 percent of all the tools and techniques that I used to use, just because they're not relevant anymore.
The other 60 to 70 percent is
still highly relevant and those
things still need to be done.
But there's a lot of things that,
we've just got better tools now.
And I think a lot of people maybe
are a little afraid to embrace those
new tools because they've got the
way they've always done things.
But everybody's different.
Nicolay Gerold: Yeah, what are the parts you have actually thrown away and that you would recommend people throw away from traditional search?
Trey Grainger: It's a good question.
I should preface this by saying
every problem is different.
If I was tasked with, hey, I've got a thousand documents and they're recipes, and I'm trying to put them in a search engine so I can quickly find the ingredients for my recipe, I would say, okay, let's load those into Solr, Elasticsearch, or OpenSearch. Let's do lexical search and let's call it a day. Because that's not a hard problem to solve when you're looking for specific ingredients, and the old school techniques are more than sufficient.
You don't need a vector database,
you don't need embeddings,
you don't need a transformer.
It's overkill.
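For a case like that, a plain lexical query against the engine is often all it takes. A minimal sketch, assuming a local Elasticsearch or OpenSearch node with a hypothetical recipes index and an ingredients field (both names are made up for illustration):

```python
import requests

# Plain lexical (BM25) match against an ingredients field: no embeddings,
# no vector database, just the standard query DSL.
body = {
    "query": {"match": {"ingredients": "smoked paprika"}},
    "size": 10,
}
resp = requests.post("http://localhost:9200/recipes/_search", json=body, timeout=5)
for hit in resp.json()["hits"]["hits"]:
    print(hit["_score"], hit["_source"].get("title"))
```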
I should caveat this with saying: depending on what you're building, the old school techniques might be fast, cheap, efficient, and even do a better job.
That being said, for your larger use cases, your e-commerce use cases, your more sophisticated enterprise search use cases, across the board I think a lot of the NLP techniques, specifically things that you would use NLP libraries like OpenNLP for, where you're doing parts of speech detection and trying to say I'm going to match on the nouns, or I'm going to match on the verbs, those are basically not as useful anymore.
I think, if you've got the ability
to leverage an LLM, it's gonna do
a way better job at understanding
the entire context and picking out
which words and phrases matter.
So those are just a waste of
time in my view in most cases.
I also think that we're entering a phase now where document interpretation, understanding, and enrichment can be, in many cases, done way better by language models than by some of the old school pipelines that we would have built. Things like OCR, optical character recognition for reading PDFs and documents, are very rapidly being replaced by things like vision language models. So if you use something like ColPali, then you can even take the entire PDF or document, pass it into the language model, have it interpret it and turn it into a vector, and all of those in-between steps of parsing and tokenizing and OCR just go out the window, because when you compare their efficacy to what you get with a vision language model, the vision language model just blows them out of the water.
So you're wasting a lot of time doing
things that get you worse results than
if you just didn't spend the time at all.
So those are some examples I think.
Nicolay Gerold: Yeah, and you mentioned you have to pick the right tools and you have to take a lot from different fields. One we haven't mentioned yet, which is a major topic of your book as well, is recommendation systems. I want to know what you actually took from recommendation systems and brought into your search systems. But first, how do recommendation systems differ from search systems, and where do they overlap?
Trey Grainger: Yeah,
it's a great question.
So I take a slightly different view
than some people, but my perspective
is that there's fundamentally, at least
conceptually, no difference between a
search engine and a recommendation engine.
And what I mean by that is when you
have a search engine, just think of a
traditional search engine like keyword
search, even vector search at this point.
Somebody types in some keywords, a
question, some text maybe some image,
but let's just go with text for now.
They type in some text, and that text
is the context by which they want
to match documents and rank those
documents coming back from the match.
Assuming for now we're just getting documents back, nothing more sophisticated.
The context is keywords and
you get matched results back.
When I have a pure recommendation
engine, let's say it's collaborative
filtering based, meaning it's based
upon the interactions of users
with products and their behavior.
What do I have?
The context that I have is the behaviors
of the user maybe a profile of the
user, it's things I know about the user.
I take that context, I pass it
to a matching engine to match
and rank and return results.
So if you conceptually abstract search engines and recommendation engines up one level, they're both engines that take context, they match results, and they rank the results. And you can say, yeah, you're oversimplifying it, they have different purposes, right? Yes and no.
If I think of it not as two different types of technologies, but as one type of technology, a matching and ranking engine that takes in context, then what I can do is say: what about all the space in between? So if I move from a pure search engine, where the context is only explicitly entered by the user at query time, and I say, yeah, but I've got some user context as well, why don't I use that user context to change the way that I match and rank the results? That's where you get to personalized search.
And so you could think of, for example
let's say I've got a restaurant
search engine, and I'm in New
York, and I search for a hamburger.
I would be pretty upset if my search
engine returned me hamburger stores
in Atlanta and Chicago and London.
That's because I'm in New York, the
search engine should hopefully know
about me and where I am, if nothing
else, based upon my IP address.
And every restaurant search
engine will do that, but that's
an example of personalized search.
It's taking information you know about
the user, in this case it's just a
location from an IP address, and using
that to augment how you adjust the query.
And that's a pretty simple example.
On the flip side, if I go from pure recommendations and I say, let me augment these with context that the user is typing in now. So the user could go in and say, hey, I don't want to include this document that I looked at before. Take that out of my recommendation algorithm. Oh, also, I would like to filter down to these results. Oh, also, let me add some keywords in and make sure I fine tune my recommendations so they target those results.
That's an example of, I would call
it, augmented recommendations.
It's not pure recommendations
just from user context, you're
also taking explicit input.
And so this notion of getting some input explicitly from the user, usually keywords, and getting some information based upon user behavior and what you know about the user, allows you to do very interesting things in the middle. And I would say, and we can talk about this more if you want to dive in, but being able to take user behaviors, user searches, their queries, their clicks, maybe products they've purchased, taking those signals and using those to generate embeddings that represent what they're interested in, and then pairing those with what the user's searching for in terms of keywords and what that means, you actually get to this almost sweet spot in the middle, where you've got as much coming from user understanding as you have coming from their explicit intent.
And then you can marry
those in different ways.
And so what I would say is, to me, search and recommendations are fundamentally the same problem. They're just using different contexts. And what I really like to do is think of the entire spectrum, and depending upon what you're building, target the appropriate balance between the explicit context and the user context.
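One simple way to picture that spectrum in code is to blend an explicit query embedding with a behavioral user-profile embedding and slide a single weight between pure search and pure recommendations. This is only an illustrative sketch under the assumption that both embeddings come from the same vector space; none of these function names come from the episode.

```python
import numpy as np

def user_profile_vector(interacted_item_vecs: list[np.ndarray]) -> np.ndarray:
    # Average of the embeddings of items the user clicked on or purchased.
    profile = np.mean(interacted_item_vecs, axis=0)
    return profile / np.linalg.norm(profile)

def blend_context(query_vec: np.ndarray, user_vec: np.ndarray, alpha: float) -> np.ndarray:
    # alpha = 1.0 -> pure search (explicit intent only)
    # alpha = 0.0 -> pure recommendations (behavioral context only)
    mixed = alpha * query_vec + (1 - alpha) * user_vec
    return mixed / np.linalg.norm(mixed)

def rank(doc_vecs: dict[str, np.ndarray], context_vec: np.ndarray) -> list[str]:
    # Score documents by cosine similarity against the blended context vector.
    scores = {d: float(v @ context_vec / np.linalg.norm(v)) for d, v in doc_vecs.items()}
    return sorted(scores, key=scores.get, reverse=True)
```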
Nicolay Gerold: Yeah, and we will double click soon, but I think you have the perfect precursor or contextualization for the three buckets of relevance in search systems, which you go into in your book. Can you maybe explain the three buckets of relevance and the different search techniques that fall into the different buckets, and also the intersections between them?
Trey Grainger: Sure.
Yeah, happy to.
Yeah, highly related to the last topic.
One of the fundamental questions when someone's approaching search, or when you're doing retrieval augmentation and thinking about the retrieval piece of it, is this concept of relevance. And going back decades and decades, within search and information retrieval, we define relevance as meeting the user's information need. Basically, if you can understand what the user is looking for and return that, then you've succeeded at providing relevant results.
And so the question is,
what is the user intent?
And to me, there are three categories, or I would say three contexts, that are really important for understanding intent. One of those is the content context, one of those is the user context, and one of those is the domain context. And so you can mentally think of this as a Venn diagram with three overlapping circles.
In the content context, if I'm looking
purely at the content and nothing else,
I don't care about the user behavior, I
don't care about the domain, that's where
we've got traditional keyword search.
Type in some keywords, I match
on those keywords, and that's it.
No other context taken into consideration.
Like we just talked about, if I
look purely at the user that's
where I'm in recommendations, right?
It's collaborative recommendations, and
I'm just literally looking at what's this
user's behavior, what items have they
clicked on, what items have other users
clicked on, and it's all behavior based.
And so that's pure recommendations.
And then, if you look at the intersection
of those that's where you get personalized
search, which we just talked about
and all of the things you can do with
combining those two contexts together.
But there's the third category,
which is the domain context.
And so if you overlay the domain context and look at the intersections there: content is, you know, what's in my engine, the behavior is what we talked about, and then the domain context is what you understand about your business, your domain, the topics. Say you have a paint store. It understands colors, it understands relationships of colors, it understands enamel versus, I don't know paint that well, but all the different things you would need to know about paint.
If it's in the pharmaceutical industry,
you would have a very different
understanding of that industry.
And if you look at just the domain
context absent of the content you
have and absent of the user context,
just the domain context, that's where
you would have a knowledge graph.
And if people are building
ontologies, taxonomies, all those
relationships, you can just roll
those up and say, collectively those
go in and make a knowledge graph.
That is a modeling of your domain,
independent of your content,
and independent of your users.
And so then you look at the intersections.
Okay, if I have a knowledge
graph, that's my context.
And I look at the intersection of
that with my content, what do I have?
In that case, it's semantic search.
So when you marry the domain
understanding, all the words,
terminology, phrases, relationships,
with the content you actually
have, you're doing semantic search.
And so there's multiple
techniques for that.
There's just leveraging embeddings with transformers, especially if they've been fine tuned to your domain. There are also lots of semantic search techniques we've been using for a decade or more around sparse lexical types of matching, where you're understanding the meaning of words, you're expanding the query. If someone says barbecue, or top barbecue near Atlanta, then you can understand that near Atlanta is a location, and then you can filter down to a radius and do some query rewriting to understand those things.
That doesn't require a transformer.
It's one way of doing semantic search.
So that's the intersection there.
Zooming back out.
Content understanding, user
understanding, and domain understanding.
And if you intersect content and
user, you've got personalized search.
If you intersect the content
understanding and the domain
understanding, you've got semantic search.
And there's the third intersection, which is less interesting. But if you look at all of them together, the very center where all of those contexts overlap, content, user, and domain, is where you really get at user intent. And to me, anytime you're building a search system, whether it's traditional search or RAG for generative AI, you need to have all three of those contexts in order to effectively get the most relevant results to solve the problem.
Nicolay Gerold: Yep. And do you think all of these three contexts have an impact across the entire retrieval pipeline? There's basically query intent, retrieval, re-ranking, and now also the generation part, which is coming closer and closer. Do you think each one of those should have a place in each of these components?
Trey Grainger: I don't want to say
absolutely always, but in general, yes.
One of the things, just zooming out a little bit, that I find the most challenging about the current era we're in, which has lots of focus on vector databases and lots of focus on transformer-based encoding: most of those encodings are text based, or even if there are images and it's multimodal, you'll get the content context. You might get the domain context if it's general or if you've fine tuned to your data. But almost none of these systems are getting the user context.
Because in order to get the user context
and understand your users, you have to
collect their queries, you have to collect
their clicks, you have to collect what
they purchase and act on, and if you do
it well, you actually need to collect the
things that they don't click on as well.
You need to know what they saw but didn't
click on to know what they looked over.
And so there's a whole realm of building click models to try to train some of these things, which we could talk about.
But the reality is, the reason that people are using these RAG platforms, and I would say almost formulaically these libraries that sort of give you out of the box RAG, is because it's easy. Not that LLMs as a concept are easy, they're very sophisticated, but being able to write a few lines of code, encode my text using some off the shelf model, put it in a vector database, query it, and get results that look somewhat like magic is easy for someone to implement.
Adding in the user context and doing all this collection of user behavior and processing it is a lot of work. Therefore, not only do most people not do it, most people I've talked to who are working on startups in this space don't even know about it. They don't even know to ask the question of how they should be incorporating user behavior into the system.
Nicolay Gerold: Yeah, and I will just rattle off a few techniques. Domain understanding you can use across the board. For example, you can even use graph embeddings for your search, or use them for re-ranking by adjusting the results and looking at the different entities which are used. Content, for example, should be very clear. But I think personalization, or the user bucket, is probably the most difficult, especially because, architecture-wise, it involves a lot of different components, from the client to basically your database. If you actually think about it from a systems perspective, what are the different components which are actually necessary to implement this user understanding or user loop completely?
Trey Grainger: Okay, I'll start with: let's say you were going to go build this tomorrow. What would you need to do? Then I'll get into how you apply it. First thing, as you mentioned, there's a user interface component, right? The user's interacting, they're clicking on things, what have you.
You need to collect that user interaction.
So what does that mean?
Probably some sort of
a JavaScript library.
There's stuff like Snowplow out there
you can use to collect user behavior in
the browser and pass it to the back end.
But what you ultimately want to be collecting is every query the user runs, the results in the order they were returned to the user, whatever the user clicks on, and then any other subsequent user signals like purchases, add to carts if you can get it, things like returns, or posting a bad comment on a review site or what have you. Any signal or behavior you can get from the user about how they liked and interacted with the product as a result of that query is fair game.
So you have to collect those things, know what to collect, know how to format them, know where to save them. That's step one, collect the data. Once you've built that, which is not trivial, but it's doable. And in my book, I've got a whole section on the right format to store it in and how to save it.
Also, the OpenSearch project recently built an add-on, maybe it's going to be part of the core at some point, called User Behavior Insights, which is basically a schema for how to collect this data appropriately. They've open sourced it, and I think they even have implementations for some other engines like Solr. So I think they're trying to create more of a standard for the industry in terms of how to do this.
But regardless of whether you build
something bespoke or try to leverage
something that's more standardized,
you have to collect the data.
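As a rough illustration of what collecting those signals might look like, here is a small sketch; the field names are made up and only loosely inspired by schemas like User Behavior Insights, not the actual spec, and in production the events would go to a message queue or the engine rather than an in-memory list.

```python
import json
import time
import uuid

def log_signal(store: list, query_id: str, signal_type: str, **fields):
    # Append one signal event; the query_id ties later clicks and
    # purchases back to the query that produced them.
    store.append({
        "signal_id": str(uuid.uuid4()),
        "query_id": query_id,
        "type": signal_type,      # "query" | "impression" | "click" | "purchase" ...
        "timestamp": time.time(),
        **fields,
    })

signals: list[dict] = []
qid = str(uuid.uuid4())
log_signal(signals, qid, "query", user_id="u42", keywords="ipad")
log_signal(signals, qid, "impression", doc_ids=["d1", "d2", "d3"])  # what was shown, in order
log_signal(signals, qid, "click", doc_id="d2", position=2)
log_signal(signals, qid, "purchase", doc_id="d2")
print(json.dumps(signals, indent=2))
```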
But then once you've done it,
you have the ability to build all
kinds of models based upon that.
So some of them are personalization based. Based upon a particular user and what they've interacted with, you can look at all the products they interacted with and average those vectors together, and then use that as something that sort of represents that user's interests. And then the next time they run a search, you can boost results up or down a little bit based upon interpreting that vector in the context of the user's vector. That's one way to do it.
Another way is just to generate recommendations and use those to boost results. You can also apply guardrails, where you cluster all of your documents together.
Say if it's e commerce, you cluster
it into maybe a hundred categories.
And then from there, you can personalize
within those categories when they search
for things within those categories.
So as an example, I've got this one in the book, but if somebody goes and searches for a Hello Kitty water bottle or a Hello Kitty plush toy or something like that, and then they search for a black GE electric razor for their face, and then maybe they go search for a stainless steel range or stove or something like that, and then in their next search they search for microwave, that user is probably not looking to see a Hello Kitty microwave. That thing does actually exist in my sample dataset. They're probably not wanting a Hello Kitty microwave. They happen to like Hello Kitty, and they wanted a water bottle or a plush toy, but when they looked for appliances, they were looking for a stainless steel range, which means they probably want to see a stainless steel microwave. Similarly, if they search for a GE razor and they search for a stainless steel Samsung range, they're probably not looking for a GE microwave just because they got a GE razor.
Because there are two very different categories there. And you have to make sure that when you're personalizing, you don't over-personalize.
Because there's no better way to
make your users really angry with
you than to stick them in a bucket
and get them stuck in that bucket,
which is not their actual intent.
So personalization is something
you have to be careful about.
Nicolay Gerold: So basically the clusters would allow me to group the user's behavior and the results that were returned into certain areas. And when a cluster is too divergent from the current query, I can basically decide to exclude the results which are in this cluster. The example being the Hello Kitty microwave.
Trey Grainger: Yeah, and in particular the way I usually build this out, if I'm focused on personalization: say a user runs a search, I generate an embedding, I do my vector search to find results, and I will get a personalization vector for that user that has all of their relevant behavior. But what I'll do is I'll filter that set of items down to only items that are within the cluster of the user's query, or maybe the five or ten nearest neighbors, depending upon how many clusters I have. So I can find it either in the exact cluster or in a very nearby cluster. And then I'll only personalize if I know that the behaviors are within the same category as the kind of query that came in.
If that makes sense.
Nicolay Gerold: So basically you contrast the result sets: you run a regular search, and you run the personalized search, so basically a recommendation system with a personalization vector, and you compare the result sets. And when there is no overlap or a very low Jaccard similarity, you actually tend to disregard the personalization results.
Trey Grainger: Yeah, so for my personalization vector, I'm actually filtering out the behaviors that are not in a cluster that is similar to my query before I even apply the personalization.
Nicolay Gerold: Interesting.
Trey Grainger: So think of it like I'm doing on-the-fly recommendations, and I'm choosing the data points that I'm using for the recommendations based upon the user's query, filtering down to only behaviors that are similar to the user's query or in similar categories as the results of the user's query. So it's what you said, but I'm actually doing the filtering pre-matching and ranking. I'm doing it up front to filter down the behaviors I'm going to use to personalize.
Nicolay Gerold: Yeah, and I think that already leads us into the next question. How do you actually handle conflicting signals between the other buckets as well? So basically, if there is a conflict between the domain and the user, or the domain and the content.
Trey Grainger: Yeah it's
a very good question.
There's a couple of
different approaches to this.
Let me lay some groundwork here
and maybe give a few concepts
and terms before I answer this.
I mentioned signals and collecting
the signals and that you can
use them for personalization.
That's one way to use them.
I tend to think of ranking
algorithms in a couple of buckets.
One of those buckets is what I would
call a popularized relevance algorithm.
So this is something
like signals boosting.
So in the context here, say someone's running a search for iPad and they get a whole bunch of search results back for iPad chargers and iPad dongles, just not great results, because the search engine natively doesn't know what an iPad is and how to boost what's popular.
So what you do with signals boosting
is you take all the queries for iPad.
And then you see which results
people are actually clicking
on, purchasing, favoring.
And then, with enough signals, you can
actually know exactly the right answer.
Not by doing some sophisticated
machine learning algorithm,
but literally by counting.
You say, for this query, what are the top results people are purchasing, that they want? And then you have your top page or two of results that you show to the user. The users are literally telling you, these are the results we want. And then you show those results, and if you do anything more sophisticated than that to try to figure out what you should be showing your user, you're probably going to lose sales if you're in e-commerce.
So that's signals boosting. It's popularized relevance. It's just looking at what the popular queries and popular items for those queries are, and just using the data. So that's one level.
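A sketch of that counting approach, assuming the raw signals have already been joined so each click or purchase is attributed to the query that produced it; the event types and minimum-count threshold are arbitrary choices for illustration.

```python
from collections import Counter, defaultdict

def build_signal_boosts(signals: list[dict], min_count: int = 3) -> dict:
    # signals: events like {"query": "ipad", "doc_id": "ipad-11", "type": "purchase"}
    # Count purchases/clicks per (query, document) and keep the clear winners.
    counts: dict[str, Counter] = defaultdict(Counter)
    for s in signals:
        if s["type"] in ("purchase", "add_to_cart", "click"):
            counts[s["query"]][s["doc_id"]] += 1
    boosts = {}
    for query, docs in counts.items():
        boosts[query] = {d: c for d, c in docs.most_common(20) if c >= min_count}
    return boosts

# At query time the boost table is just looked up and added on top of the
# engine's base relevance score (or used to pin the top page of results).
boosts = build_signal_boosts([
    {"query": "ipad", "doc_id": "ipad-11", "type": "purchase"},
    {"query": "ipad", "doc_id": "ipad-11", "type": "purchase"},
    {"query": "ipad", "doc_id": "ipad-11", "type": "purchase"},
    {"query": "ipad", "doc_id": "ipad-charger", "type": "click"},
])
print(boosts.get("ipad"))   # {'ipad-11': 3}
```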
Another level is personalized relevance. So that's what we were talking about a minute ago. Whether you're generating recommendations and interleaving them into the results, or whether you're actually taking the personalization vector that represents the user and using it to adjust the results. That's where you understand the user and then you cater the results to the specific user.
And then a third category is what
I call generalized relevance.
And so this is your traditional
learning to rank or machine
learning ranking approaches.
Where you're taking lots of queries and lots of results, and you're either automatically learning from the signals a click model that you can use as implicit judgments to train your machine learning model, your learning to rank model, or you're getting explicit user judgments by having people annotate data.
Either way, you build this model that
takes lots of features, it looks at the
judgments of query to document pairs,
and then it learns which features
matter in general across your domain.
So the generalized relevance model
is able to take queries it's never
seen before and to, apply a ranking
algorithm to rank results based
upon what features in documents and
queries normally matter the most.
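As an illustration of that generalized layer, a minimal learning-to-rank sketch using XGBoost's ranker; the features, judgments, and group sizes below are invented placeholders for whatever your click model or annotators actually produce.

```python
import numpy as np
import xgboost as xgb

# Each row is one (query, document) pair; features might be BM25 score,
# vector similarity, document popularity, freshness, etc.
X = np.array([
    [12.3, 0.81, 150, 2],   # query 1, doc A
    [ 8.1, 0.62,  30, 9],   # query 1, doc B
    [ 4.0, 0.35,   5, 30],  # query 1, doc C
    [ 9.9, 0.90,  80, 1],   # query 2, doc D
    [ 9.5, 0.40,  10, 4],   # query 2, doc E
])
y = np.array([3, 1, 0, 2, 0])   # graded judgments (explicit or from a click model)
group = [3, 2]                  # 3 candidate docs for query 1, 2 for query 2

ranker = xgb.XGBRanker(objective="rank:ndcg", n_estimators=50)
ranker.fit(X, y, group=group)

# At query time: compute the same features for the candidates and re-rank.
candidates = np.array([[10.0, 0.7, 60, 3], [6.0, 0.9, 200, 1]])
print(ranker.predict(candidates))
```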
And so I tend to layer these. There's actually one other category which I'll briefly touch on, which is semantic relevance, where you can learn from user behaviors.
For example, somebody typed in the query manger. Usually after someone types in manger, they type another query that is manager. So you learn that there's a misspelling, not from your content, but from actual user behavior.
Every time somebody types in iPad
with a misspelling, you correct it.
So you learn what the synonyms to things
are, what the acronyms for things are,
what the misspellings are, and you can use
that to help interpret queries as well.
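A toy sketch of learning those corrections from query reformulations; the session data and support threshold are made up for illustration, and in practice you would add time windows, normalization, and result-overlap checks.

```python
from collections import Counter, defaultdict

def learn_rewrites(sessions: list[list[str]], min_support: int = 5) -> dict[str, str]:
    # sessions: the queries one user typed, in order.
    # If "manger" is very frequently followed by "manager" (and rarely the
    # reverse), treat it as a learned correction / synonym candidate.
    followed_by: dict[str, Counter] = defaultdict(Counter)
    for queries in sessions:
        for a, b in zip(queries, queries[1:]):
            if a != b:
                followed_by[a][b] += 1
    rewrites = {}
    for original, nexts in followed_by.items():
        candidate, count = nexts.most_common(1)[0]
        if count >= min_support and followed_by[candidate][original] < count:
            rewrites[original] = candidate
    return rewrites

sessions = [["manger", "manager"]] * 6 + [["ipad", "ipad pro"]] * 2
print(learn_rewrites(sessions))    # {'manger': 'manager'}
```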
But if you just look at maybe the first three: if you think of ranking not as a pipeline, but as a sort of hierarchy, at your base level you need some kind of base matching. For lexical search, it can be the BM25 algorithm. For dense vector search, it can just be cosine similarity or what have you, but you need a base-level ranking algorithm. Then on top of that you have this generalized ranking algorithm, like a learning to rank model or ranking classifier. On top of that you can apply the signals boosting, and then on top of that you can apply the personalized search. And some people will take all of these things and throw them in a pot, almost a witch's cauldron, and they just stir it up and they apply XGBoost or some machine learning algorithm and they say whatever comes out is the magic potion. That's my ranking algorithm.
You can do that. But if you keep them separate, there are ways that you can apply them in different ways. Or if we think of it almost like a pipeline, right? You can try something, see if there are results. If not, you try the next thing, and you can put them together in different ways. So I tend to actually do the latter. It's a little bit more hands on, but it gives you fine-grained control over what you're going to apply when. If you want to turn something off or on, you can. You're not stuck with one big overburdened model, because you actually have all these knobs that you can tune.
And so I tend to think of them as a
stack, and a set of techniques that
you pick and choose when you want them.
You can definitely combine them together,
and you would do that in your generalized
model, in your ranking classifier.
And there are other things like cross encoders, which are a form of ranking classifier we can talk about. But there are just all these different options.
And my preference is not to
just throw them in a cauldron
and then get a potion out.
My preference is to intentionally
analyze and use each one where it makes
sense and to layer them appropriately.
Nicolay Gerold: Yeah, and this also gives you the capability to use query understanding in a much more sophisticated way, in that, based on the classifications you're doing, you have different pipes you're running through. For example, for head queries you might not even have to run anything sophisticated. You just put it into a cache and return whatever is in the cache, and that's it.
Trey Grainger: Your top most popular
queries, you should already know the
answer to those before you run them, if
you're not doing user personalization.
But if you want to do user
personalization, then okay, that's going
to change the dynamic and maybe you start
with those and then personalize those.
Also with personalization, I
mentioned earlier, you want
to have kind of a light touch.
If you over personalize,
people get really frustrated.
So usually when I personalize results,
I don't personalize all the results.
I'll take the first page of results, and
I maybe personalize the second result,
and maybe the fourth and fifth result, and
then I leave the rest of the page alone.
Why?
Because I'm hopeful that my algorithm
for understanding user intent is really
good, and they're going to see the
second, fourth, and fifth result, which
are usually above the fold, and be like,
oh, hey, that's the thing I wanted.
This engine seems like it's
really smart, it knows about me.
But if I'm wrong, and the whole page is Hello Kitty microwaves and things I'm not looking for, then the user is going to leave and be frustrated and potentially never come back. So I think for something like personalization, it's useful to keep the algorithm on the side a little bit and use it to interleave results, as opposed to having it take over as a core piece of the overall ranking system, if that makes sense.
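A small sketch of that light-touch interleaving, keeping the organic ranking as the backbone and only swapping a few above-the-fold slots; the slot positions are just the ones mentioned above, and everything else is an illustrative assumption.

```python
def interleave_personalized(organic: list[str], personalized: list[str],
                            slots: tuple[int, ...] = (2, 4, 5)) -> list[str]:
    # Keep the organic ranking as the backbone and only fill a few
    # 1-indexed slots with personalized picks.
    page = list(organic)
    picks = [doc for doc in personalized if doc not in organic[: max(slots)]]
    for slot in slots:
        if not picks:
            break
        page.insert(slot - 1, picks.pop(0))
    # Drop duplicates while keeping first occurrences, then trim to page size.
    seen, deduped = set(), []
    for doc in page:
        if doc not in seen:
            seen.add(doc)
            deduped.append(doc)
    return deduped[: len(organic)]

organic = [f"doc{i}" for i in range(1, 11)]
recs = ["rec1", "rec2", "rec3"]
print(interleave_personalized(organic, recs))
```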
Nicolay Gerold: Yeah, and when you actually think about this, what are the key components, especially in code, that you're setting up to implement this? Because when you don't do this carefully, it's just a jumble of if-else or case-switch conditions which pipe the user request into a certain direction. What's your technological setup to actually keep this structured in an understandable way, one that also allows you to figure out where the mistakes are in your different pipelines?
Trey Grainger: Yeah, so the answer is it depends, and it varies based upon the technology stack people are using and what have you, but I'll tell you my go-to, and a lot of this is in the book as well. I'm going to ignore document understanding for a second, but in terms of query understanding, I tend to take a query when it comes in and have a knowledge graph of known entities, known things in my domain.
So I mentioned a query earlier, it was something like top barbecue in Atlanta. What I'll do is I'll take the query in, and the very first thing I'll do is pass it to an entity extractor, but where the entity extractor has my knowledge graph in it.
So I'll say, hey, do I know
what the word top means?
Oh, actually, yes.
I've got something in my system
called a semantic function.
And a semantic function says the word top means, in the context of a restaurant search, popular restaurants. What does that mean? It means five stars, or the higher the stars, the better.
And so my system will say,
oh, top is a known word in my
domain understanding context.
In my knowledge graph, it's a known word.
That is going to change
how I interpret this query.
And so from that point on, down the
line when this final query goes to
the search engine, I'm not searching
for some embedding vector that roughly
corresponds with some semantic notion
of what the word popularity means.
No, I'm searching for boost my results
by the number of stars because the
user wants popular restaurants, right?
So that's what top means.
When I go to barbecue,
okay, this is a word.
It's not in my knowledge graph.
I don't really know what it means.
So then I'm going to actually go to something like, so I've got something in the book I talk about, which is a semantic knowledge graph, which actually just uses the index to say, hey, what are the other most semantically related words to this? And it brings back things like brisket, pork, ribs, things like that, but it can also bring back a category: hey, the category is barbecue restaurant or Southern restaurant or home cooking or whatever the category might be. And so it can help both boost the category and/or filter on the category, as well as expand the term. So that's a sparse lexical expansion.
It's similar to what you would get with something like SPLADE or similar techniques. In fact, you could use SPLADE for that piece. You could say, hey, I don't know what barbecue is. Let me go ask SPLADE. Then let me expand here into a SPLADE field, if you know what SPLADE is, for the listeners.
So you could do that there. And then when I get to near Atlanta, same thing: near is a semantic function, which means that if the thing after it is a city, then get the lat/long and do a radius search for however many kilometers around the city, say 50 kilometers. And then my query to the engine, with no embeddings whatsoever necessarily, becomes an interpretation of that query that explicitly maps to: a 50 kilometer radius around Atlanta, boost the most popular restaurants, find restaurants that have the word barbecue or are in this category of things related to barbecue, and here are some other related terms.
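A toy sketch of that interpretation step; the knowledge graph entries, the semantic functions, and the output query structure are all invented for illustration, not how any particular engine or Trey's actual system represents them.

```python
# Tiny hand-rolled knowledge graph: known terms map either to a semantic
# function (how the term changes the query) or to related terms/categories.
KNOWLEDGE_GRAPH = {
    "top":  {"function": lambda q: q["boosts"].append("stars desc")},
    "near": {"function": "geo"},          # handled specially: needs the next token
    "atlanta": {"type": "city", "lat_lon": (33.749, -84.388)},
    "barbecue": {"category": "barbecue restaurant",
                 "related": ["bbq", "brisket", "ribs"]},
}

def interpret(query: str) -> dict:
    tokens = query.lower().split()
    parsed = {"terms": [], "filters": [], "boosts": []}
    i = 0
    while i < len(tokens):
        token, entry = tokens[i], KNOWLEDGE_GRAPH.get(tokens[i])
        if entry and callable(entry.get("function")):
            entry["function"](parsed)                       # e.g. "top" -> boost by stars
        elif entry and entry.get("function") == "geo" and i + 1 < len(tokens):
            city = KNOWLEDGE_GRAPH.get(tokens[i + 1], {})
            if city.get("type") == "city":                  # "near atlanta" -> radius filter
                parsed["filters"].append(("geo_radius", city["lat_lon"], "50km"))
                i += 1
        elif entry and "category" in entry:                 # expand known domain terms
            parsed["filters"].append(("category", entry["category"]))
            parsed["terms"].extend([token] + entry["related"])
        else:
            parsed["terms"].append(token)                   # unknown word: keep as-is
        i += 1
    return parsed

print(interpret("top barbecue near Atlanta"))
```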
And that will get you a very tight, data-specific, index-specific answer that almost perfectly expresses the meaning of the user, as opposed to: let me take top barbecue near Atlanta, pass it to a transformer encoder, and it's going to give me some semantic understanding of popular things and things having to do with Atlanta. Which would include the Atlanta Hawks, which would include the location. It's going to understand what you're asking, but it's not going to be able to get the right results from your search engine, unless it's been very explicitly trained on how to query your search engine to get those results, if that makes sense.
And so, I want to say it's a next level above LLMs. It's not. It's just a different approach that tries to focus on getting the user intent. So to answer your question about the tech stack: at any point where I was interpreting barbecue and getting that sparse lexical vector, I could also do a dense vector there. I could take the word barbecue, expand that into a dense vector representation for an embedding, and do that as part of my search. At any point in here I can go back and forth between sparse lexical and dense vector capabilities.
But your question was about the tooling and how you do this stuff, and my answer is: I've got my knowledge graph, I have the ability to interpret the query based upon the knowledge graph, and then whether I use dense vector embeddings, sparse vector embeddings, or some other interpretation technique, they're all just tools in my tool belt that you can apply to the problem at hand.
Nicolay Gerold: Yeah, and I'm really interested, architecturally, in the personalization component. If you retrieve the different user behavior vectors based on a user ID, how do you use those, especially in light of the clustering we mentioned before? You retrieve those, and then you have the query, and how do you filter out which user behavior vectors to actually ignore?
Trey Grainger: Sure. Okay, let's go back to the Hello Kitty microwave example from earlier and walk through it. So a user comes in, they've run a bunch of searches, I forget what I said, like Hello Kitty plush toy or water bottle, a white GE electric razor, a handful of other things, and then they search for a stainless steel GE range. So now they come and they run a search for microwave. So the question is, okay, we could just do content context, we could just run a keyword search and find all the microwaves. Or we could try to understand what a microwave means in the context of my domain and filter down to appliances, so that I'm not seeing microwave cleaning kits, I'm only seeing actual microwaves. I could do that. That's more of a semantic search, right?
Or I can take the personalization context, which is what you asked about. So what I would do in this case: I would do both of the first two, but for the personalization piece of it, I would go in and say, okay, I've got a query for microwave. Let me take that and, assuming I'm doing a dense vector representation here, pass that to my transformer. I'm going to get an encoded version of this as a dense vector, and with that dense vector, I'm now going to figure out, so I've previously clustered all of my products and I've got a hundred clusters, I'm going to now find the nearest neighbor to this vector in vector space, and I'm going to say, oh hey, the nearest neighbor is, I'm going to make this up, kitchen appliances. Great. So I found that, and then I might use that, or I might say, hey, actually find me four more around that. So I'm getting my top five nearest neighbors to the microwave embedding. So I might get kitchen appliances, small kitchen appliances, large household appliances, whatever those clusters are, right? So that will make up the sphere of what I care about. And then to personalize, I go to my service or wherever I'm storing my user behaviors, right?
All their signals. I then go say, get me the list of signals. Maybe it's one, maybe it's a hundred, maybe it's a thousand, whatever the previous user behaviors are. Just filter those down to these clusters, and now maybe we're down to five or six or a smaller number of behaviors.
Those behaviors, which in this case I'm representing as vectors, each of them the vector of the item that was interacted with, I can then take and say: these are the relevant behaviors that we think the user would care to have considered in the context of this personalization, in the context of this one query.
So I take all of those, and they all
have to do with kitchen appliances,
and I say, all right, let me average
these together and get a personalization
vector that represents the user.
It's like a user profile, but it's
contextualized to that one query
that they're running right now.
And then I say, okay, let me take that. Within the context of this vector, I probably have things represented like stainless steel, maybe a price range if they looked at things within a certain price range, maybe a brand like GE if GE is somehow represented in my embedding. And so all of those concepts based upon their past behavior get bubbled up in this personalization embedding.
And then I apply that as part of my query to boost results up or down, or potentially just to run a separate personalization query and interleave, like I talked about. So that would be the general flow for how I would go from a user typing a query now to getting their contextualized behaviors brought back. That would be the flow.
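Pulling that flow together, here is a hedged sketch of building the query-contextualized personalization vector; embed is a placeholder for your encoder, the clustering is assumed to have been computed offline, and the signal format is invented for illustration.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    raise NotImplementedError   # placeholder for your transformer encoder

def nearest_clusters(vec: np.ndarray, centroids: np.ndarray, k: int = 5) -> list[int]:
    # Cosine similarity against pre-computed product-cluster centroids.
    sims = centroids @ vec / (np.linalg.norm(centroids, axis=1) * np.linalg.norm(vec))
    return list(np.argsort(-sims)[:k])

def personalization_vector(query: str,
                           user_signals: list[dict],   # [{"item_vec": ..., "cluster": int}, ...]
                           centroids: np.ndarray):
    query_vec = embed(query)                              # e.g. "microwave"
    allowed = set(nearest_clusters(query_vec, centroids)) # e.g. kitchen-appliance clusters
    relevant = [s["item_vec"] for s in user_signals if s["cluster"] in allowed]
    if not relevant:
        return None                                       # nothing in-category: don't personalize
    profile = np.mean(relevant, axis=0)                   # contextualized user profile
    return profile / np.linalg.norm(profile)

# The resulting vector (stainless steel, GE, a price range, ...) is then used
# to boost results up or down, or to run a side query for interleaving.
```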
Nicolay Gerold: That's so interesting. I want to touch quickly on the different search engines, especially OpenSearch, Elasticsearch, Solr, and Vespa. What are the scenarios where you really would say you would pick one over the other? I'm not interested in exact trade-offs, but rather, what are scenarios where one really shines?
Trey Grainger: Yeah, so I'll preface this by saying I like them all and try to be generally vendor neutral. I do search consulting and am happy to work with anybody, work on any of them. But it's interesting. I'm going to be a little PC on this answer, so sorry. So I told you I got started with Solr. I love Solr. Solr's near and dear to my heart.
It's an Apache Software
Foundation project.
It's great.
It doesn't have the commercial backing that the rest of them do, and so the development and upkeep, I would say, is slower than I would like.
And so I definitely see some of
the other engines surpassing it.
And it's the default in my book, the book's got Solr, OpenSearch, and others that you can use with it, but it's near and dear to my heart. But depending upon what you're looking for, there might be other technologies that either are faster or have some capabilities, things like that. So Elasticsearch came about to replace Solr; they took a lot of market share, and there's a lot of really good technology there. They had issues with licensing where they went non-open-source for a while, and now they're technically open source. The license is not my favorite license. But as a result of that, Amazon forked Elasticsearch and has OpenSearch.
At this point, I would say not enough time has gone by that I would draw a clear differentiating line between Elasticsearch and OpenSearch.
OpenSearch is doing some new things,
Elasticsearch is doing some new
things and both are really cool.
So I think both are great projects.
I like the governance model of OpenSearch better. The OpenSearch Foundation is part of the Linux Foundation. I just love open source. I just feel confident about it. But Elasticsearch has great technology. They've got great engineers, really smart people, and they do good work. And so if you're in their ecosystem, in their sphere, I think you can't go wrong with any of them.
Vespa is an interesting one, because
they've been around for probably
the longest of any of them; they
were the Yahoo search engine.
They were internal to Yahoo, then
there was the Verizon Media split and
all that kind of stuff, then it became
an open source project, then they
became an open source company, and then
they raised a bunch of VC capital.
When I look at Vespa, what I generally
see, in terms of my work with it and
in terms of implementation, is it's
like everything and the kitchen sink.
It's very powerful.
It was written by people who are
very smart and think about search
in a very, I don't want to say
correct way, but it's very clear
that they're somewhat visionary in
terms of how they've designed it.
But it's also intimidating when people
go to try to get started with Vespa;
from day one you're talking in the
terminology of vectors and dimensions,
and there's a learning curve to get over.
And so what I tend to see is people
who hit a limit with some of the
other engines will go to Vespa, because
they're like, hey, I need all this
power and it's worth the learning curve.
And people who are approaching search,
maybe without as much sophistication,
might just be scared of it because
it's a lot.
And that's not to say that Vespa is the
best at all things because it's not.
It's just to say that it's
very capable and it's getting
more approachable over time.
But I think, at least in, in my
circles I think Vespa is very powerful.
And to the point where it
intimidates some people.
And I think that'll change
over time, I'm sure.
But I hopefully I'm not saying
anything that anybody would
disagree with or be offended by.
But they all have their place
and they're all really good.
And they also all, I should
mention, leverage each other
as competition to get better.
Vespa is focused on getting simpler.
Elasticsearch, I'm going to say, was
forced back to open source by the
OpenSearch pressure; maybe they would
claim there are other reasons.
And Solr, when Elastic came
about, changed a lot of its
architecture to meet the demand.
So I think it's really good
to have a lot of options.
It's really good to have competition.
And I think everyone can see,
their strengths and their
weaknesses and try to minimize
weaknesses and focus on strengths.
But they're all great.
I recommend all of them for
a project, depending on the
project and what your needs are.
No bias towards any one in particular.
Sorry.
Nicolay Gerold: And what are the
most exciting or promising things
on the horizon for AI-powered search?
Trey Grainger: The most
promising things on the horizon.
Ah, man there's so many things.
I write about a lot of them in the book,
and I think my message to people,
whether you're getting into search
afresh because of generative AI or
you've been here for a while and you're
trying to pick up more of the AI
aspects, is that there's a wide spectrum
of tools and techniques and approaches,
many of which we've talked about today.
And it's really useful to familiarize
yourself with them, try to apply them
where it makes sense, and just know
what's possible and then how to do it.
So the starting point is there's
a lot of unexplored territory
that most people just need to explore.
But beyond that, in terms of
the future I'm really excited
about a couple of things.
One is these late interaction
models like ColBERT and ColPali.
Where it's funny, we've gone from a
bag of words and a bunch of keywords,
where I'm matching on each keyword and
then adding up how well each of my
keywords matched, and we shifted the
pendulum to, hey, let's just take the
whole document and turn it into one
embedding.
That represents the meaning
of the entire document.
And then we decided, oh wait,
actually we're losing too
much nuance from the document.
So let's shift back now, and
let's start chunking documents.
This sentence gets an embedding,
and this sentence gets an
embedding, and this sentence.
And then we're like now each sentence
doesn't have the context of the
whole document or around it, so now
let's contextualize the sentences.
Overlapping chunks, and taking chunks
and doing contextualized embeddings,
where I take an embedding of the document
and an embedding of the sentence and
combine them together to have the
sentence in the context of the document.
We're doing all these weird amalgamations
of trying to split the document apart
into some kind of semantic chunks.
And it works, but it's a
lot of pain, and it's a lot.
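One simple way to sketch that contextualized-chunk idea, under the assumption that a weighted average of the chunk and document embeddings is an acceptable way to "combine them together" (other fusion methods exist):

    import numpy as np

    def contextualized_chunk_embedding(chunk_emb, doc_emb, alpha=0.7):
        # Blend a sentence/chunk embedding with its parent document embedding so
        # the chunk keeps some surrounding context; alpha weights the chunk itself.
        combined = alpha * chunk_emb + (1.0 - alpha) * doc_emb
        return combined / np.linalg.norm(combined)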
What I like about something like ColBERT,
and this late interaction model, is
it goes back almost to the original
idea of matching on individual keywords.
But it does it with embeddings.
So it goes through a document and
says: every keyword in this document,
I'm going to create an embedding for,
similar to what we used to do
with Word2Vec back in the day.
Every word has an embedding, but
unlike Word2Vec, which always assigns
the same meaning to every word and
isn't contextualized by surrounding
words, with ColBERT we give each
word a contextualized meaning
based upon the context surrounding
it in the rest of the document.
So each word gets its own contextualized
embedding based upon its particular
meaning and interpretation there.
We do the same thing with the query.
So the query comes in, we take
each individual query word and
we generate embeddings.
But what we can then do is take all
of the embeddings, all the words in the
query and all the words in the document,
represent them as embeddings, and
then compare the similarity of
each of those embeddings as we go.
And then, for each word in the query,
we take the maximum score of any word
in the document, and then we can
add those together.
It's not complicated at all, but
it does take a lot of extra
storage to store that many embeddings.
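A toy version of that MaxSim scoring, assuming you already have per-token embeddings for the query and the document (in practice a ColBERT-style model produces these):

    import numpy as np

    def maxsim_score(query_token_embs, doc_token_embs):
        # Late-interaction scoring: for each query token, take its best cosine
        # similarity against all document tokens, then sum those maxima.
        q = query_token_embs / np.linalg.norm(query_token_embs, axis=1, keepdims=True)
        d = doc_token_embs / np.linalg.norm(doc_token_embs, axis=1, keepdims=True)
        sim = q @ d.T                       # (num_query_tokens, num_doc_tokens)
        return float(sim.max(axis=1).sum())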
But what it does is it allows us to
go back to: I'm matching on keywords
again. I'm matching and ranking on
keywords, but I'm interpreting the
meaning of those keywords based upon
the semantic context around them, as
opposed to just asking is this string
there or is it not, which is what we
did with traditional Boolean matching
and BM25.
And those models are vastly
outperforming these other
approaches that we're trying.
And when you look at ColPali
versus all the OCR stuff that we
talked about before, it's massively
outperforming what was there before.
So I think, not that we're getting
back to the roots, because this is a new
technique, but this notion of thinking
of words as contexts of meaning,
like clusters of meaning that have some
context, and matching on those clusters
of meaning and how they combine,
I think is core to what it
means to interpret language.
And I like that these new approaches
are, I think, more appropriately
focused on interpreting language as
opposed to just using an embedding
model and expecting magic to happen.
Nicolay Gerold: Yeah, and if people
want to start building this stuff
we talked about, what are, in your
opinion the best resources out there?
And I'm already gonna say your book
is probably the best one I've found so
far, so I'm gonna throw that in there.
So maybe shout out, like,
where people can find that?
Trey Grainger: Yeah.
For the book, Manning is the publisher.
So if you go to AIPoweredSearch.com
you can buy the book.
It hasn't launched yet, because the
book's literally just gone to the
presses, but in probably a couple weeks
you'll have it in your hand if you've
bought it.
But we also have a website
launching for it, and part of that
website is actually a community.
One thing I'm trying to do is make
the book not just a book, but also
a resource, with a website and a
community where people can join and
actually have discussions, talk,
interact, and discuss the latest and
greatest at the forefront of
AI-powered search.
And so we've got maybe 400 members
who've joined already, mostly search
experts and people from conferences
and things like that.
But in terms of resources:
the book is a key one, and the website,
but I'm trying to create a larger
community around AI-powered search and
the concepts in the book, and really
promote the companies and the people
in the space.
I'm hoping that will be one
of the best resources in our
field for this going forward.
But otherwise, yeah, I'll shout out
to all the companies in the space
who are doing massive evangelism.
All of them do it: OpenSearch,
Weaviate, Vespa, everybody.
I would say that if I look at the
quality of the content coming out,
or at least the amount of really smart
people who are trying to explain this
to people, companies like Weaviate do
a really good job with their evangelism.
Vespa, I think you had Joe on;
he's been on the podcast recently.
He's super bright and does a
great job of evangelism as well.
There's so many people
from all these companies.
I shouldn't probably be calling out
specific names because there's so many.
But I hope that with the book and
with the website, we can actually pull
a lot of these people in and try to
like cross post and share a lot of this
content to just really get it out there.
Because there are a lot of people
who are trying to get into search,
get into retrieval, and try to understand
it, and the resources are out there, but
they're definitely spread across, you
know, everywhere: different companies'
blogs and YouTube channels.
And I think that having a resource
where we can pull it together and
share will be really helpful.
Nicolay Gerold: Nice, and I will
put all of that in the show notes.
If people want to follow along with you
personally where would you point them?
Trey Grainger: Twitter, LinkedIn,
Trey Grainger.
I've got a company called Search Kernel.
I'm going to be hanging out in the
AI Powered Search community, so if you
go to community.aipoweredsearch.com,
all one word, you can pretty much catch
me there any time if you want to ask
questions or hang out.
That's where I would go.
Nicolay Gerold: So what can we take away
when we are building search applications?
First of all, RAG is not
just vector search.
You want to build RAG as a
continuous feedback loop.
Generative AI helps to interpret
queries, summarize results, and
also add, for example, metadata
filters to improve the search results.
The retrieval provides up-to-date
and accurate information
to generate better results.
So each component enhances the other,
and we can even loop: we can do
agentic RAG, where we ask the LLM whether
we have enough results and basically
run a new retrieval with a newly
generated query or just an adjusted query.
And through that, we have way more
capabilities than this vector-search-only
component, and then we can basically
layer additional search techniques
on top, like, for example, a traditional
lexical search, or more personalization
through personalization vectors and
stuff like that. So we basically build
a pipeline. First, query understanding:
you interpret the query, you classify
the intent, you extract entities,
you extract filters.
For example, type filters
are a very common one.
Then you retrieve: you use
multiple retrieval strategies,
keyword search and vector search,
and then you basically combine
them, you fuse them, and re-rank.
Then you add domain-specific filters;
here you could place your knowledge
graph. And then you enhance the
results with an LLM: you can summarize
them, gather additional context,
and generate follow-up questions.
And lastly, you have the iterative
refinement, like a loop.
If you don't have an adequate answer
to the user query, or you don't have
enough results or enough good results
to actually answer the query,
you reformulate it and run an
additional retrieval; or, if it's
enough, you generate a final response.
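A rough sketch of that loop; the helper functions (understand_query, keyword_search, vector_search, fuse_and_rerank, llm_judge_sufficient, llm_reformulate, llm_generate_answer) are hypothetical placeholders for whatever engine and LLM you use:

    def answer(query, max_iterations=3):
        parsed = understand_query(query)              # intent, entities, filters
        results = []
        for _ in range(max_iterations):
            results = fuse_and_rerank(
                keyword_search(parsed),               # lexical retrieval, e.g. BM25
                vector_search(parsed),                # embedding retrieval
            )
            if llm_judge_sufficient(query, results):  # enough good results?
                break
            parsed = llm_reformulate(query, results)  # adjust or regenerate the query
        return llm_generate_answer(query, results)    # summarize, add context, follow-ups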
And I think this is a really interesting
architecture pattern because it creates
components, it creates a separation of
concerns, and you can also decide, okay,
what is actually necessary?
I think Trey showed the different
architectural components and how you
could combine them, and what's most
likely interesting for you is picking
a few which you think bring the biggest
bang for your buck, which are actually
most suited for your use case,
and just implementing those.
For example, in e-commerce, you
might not want to use this
iterative refinement or really
expensive result enhancement.
Also, you won't use it because
it will add too much latency.
So you take the components that
actually suit your use case.
And you build a pipeline like that.
And I think like having an overview
of all the different things you
could do and for what they are
well suited is really interesting.
And this really leads into his
framework of three critical contexts,
which helps you actually decide:
in the type of search system I'm
building, what is actually relevant?
Because his three contexts, to
summarize again, or paraphrase:
you have the content, you have
the user, and you have the domain.
The content basically is
the raw document processing.
The user context is the behavioral
patterns and preferences.
The domain context is like the
business and domain specific knowledge.
And for the business- and domain-specific
knowledge, there might be a mismatch:
what is business- and domain-specific
for your user versus for your company?
And then you basically have the
implementation of these different
contexts: for content, vector
embeddings, keyword indexes, document
structure understanding; for user,
click tracking, query logging,
purchase and interaction history,
session behavior, user clusters; for
domain context, knowledge graphs,
entity relations, business rules,
taxonomies, semantic functions.
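If it helps to keep those three contexts explicit, a plain mapping like this (names are illustrative, not a prescribed schema) is enough to make the separation visible in code:

    SEARCH_CONTEXTS = {
        "content": ["vector_embeddings", "keyword_index", "document_structure"],
        "user": ["click_tracking", "query_logs", "purchase_history",
                 "session_behavior", "user_clusters"],
        "domain": ["knowledge_graph", "entity_relations", "business_rules",
                   "taxonomies", "semantic_functions"],
    }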
And you basically have these different
technologies as a tool set, and then
you look at your search application
and ask, okay, what's going wrong?
My users in the front end are
not really clicking the results,
but where is that coming from?
Am I serving inadequate results?
Is the content actually bad?
Then I might have to fix my content.
Is it not personalized enough?
Then I have to improve the user context.
You basically treat it as a grab bag of
different techniques that you can layer
with each other to find a solution.
And in most cases, it won't be like one
or the other, but a combination of those.
So you need to combine them in the end
to build like a complete search system.
But you always fix the biggest issue
first, where it's burning the most.
Is the content garbage, is the
search garbage, or is the
personalization garbage?
Pick it, fix it, go to the next.
And one of the things he mentioned he
often sees when people do this is the
witch's cauldron anti-pattern, which is
that people use one monolithic ranking
model. So you take whatever signal you
find, like BM25, vector similarity,
popular items, click-through rates,
conversion signals, user preferences,
recent behavior of the users, you take
all of it and slap it into one massive
model and basically hope something
good comes out of it.
And it's all well and good if
it works, congrats.
But if it doesn't work, you will
have a lot of trouble, because
you can't debug it, and you will need
to retrain the entire model from
scratch to actually try to fix it.
So your iteration loops are way longer.
So instead, you shouldn't throw
everything together into one pot, but
rather separate the different components,
find smaller fixes, and componentize it.
This is also very
software-engineering-esque: you want
to separate the different components
and isolate them so you can easily
trace bugs. You can tune them in
isolation, you can easily A/B test
adjustments, and you also make it
easier to maintain, because you
can change a single component.
And he really advocated this layered
architecture with different signals
which can be turned off and on, which is
something I found really interesting.
So you basically have the base layer,
which is always on, like BM25 and
vector similarity search, which does
the basic relevance scoring.
These are your core matching criteria.
Then you have the middle layer,
which is more like signal boosting:
popular items, click-through rates,
conversion signals, historical
performance as well.
And then you have the top layer,
which is more like the personalization
part: user preferences, recent behavior,
category affinities, adjusting the
result positions.
And you keep these independent and
can then switch them on and off,
also based on the query type,
and use them together.
Isolate them and you can really see,
okay, where are my results coming from?
Say in the result set I delivered to
my users, someone clicked on one of the
bottom results; it came out of the base
layer, but my personalization layer
or my signal boosting layer basically
downranked it.
So you can see, okay, where might
my issue be coming from?
And then you can tune the component
which actually was in the wrong.
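A minimal sketch of that layered, toggleable ranking; the layer names, weights, and score functions are illustrative assumptions, not Trey's exact design:

    class RankingLayer:
        def __init__(self, name, score_fn, weight=1.0, enabled=True):
            self.name, self.score_fn = name, score_fn
            self.weight, self.enabled = weight, enabled

        def score(self, query, doc, user):
            # A disabled layer contributes nothing, so layers can be switched
            # on and off per query type without touching the others.
            return self.weight * self.score_fn(query, doc, user) if self.enabled else 0.0

    def rank(query, docs, user, layers):
        # Keep each layer's contribution so you can trace why a result moved.
        scored = []
        for doc in docs:
            contributions = {layer.name: layer.score(query, doc, user) for layer in layers}
            scored.append((doc, sum(contributions.values()), contributions))
        return sorted(scored, key=lambda item: item[1], reverse=True)

    # layers = [
    #     RankingLayer("base_relevance", bm25_plus_vector),          # always on
    #     RankingLayer("signal_boost", popularity_and_ctr, 0.5),     # middle layer
    #     RankingLayer("personalization", profile_similarity, 0.3),  # top layer
    # ]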
And one more part of that is
basically his smart personalization.
The technique he mentioned is basically
user behavior clustering, where you
create different category clusters and
map user interactions to the clusters.
I think this was a really
interesting approach.
I think I need to try to code it up to
really understand it, but this could be
something really interesting
and something pretty easy to implement.
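Here's one way it could look, a minimal sketch assuming you have item embeddings and a log of which items each user interacted with; KMeans just stands in for whatever clustering you prefer:

    import numpy as np
    from sklearn.cluster import KMeans

    def build_item_clusters(item_embeddings, n_clusters=50):
        # Cluster item embeddings into rough "category" clusters.
        return KMeans(n_clusters=n_clusters, n_init=10).fit(item_embeddings)

    def user_cluster_affinities(clusters, interacted_item_embeddings):
        # Map the user's interacted items onto the clusters and count how often
        # each cluster was hit; the result is a per-user affinity profile that
        # can drive personalization boosts.
        labels = clusters.predict(interacted_item_embeddings)
        counts = np.bincount(labels, minlength=clusters.n_clusters)
        return counts / counts.sum()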
And maybe to note last, I think we
already went into that a bunch:
where modern search systems are evolving.
We are really getting into a more
contextual understanding, with, for
example, ColPali, but also ColBERT,
where we have word-level or
image-patch-level understanding,
and we have embeddings for each.
We can maintain word relationships,
and we can also trace back,
okay, where might a match be coming
from and why is it relevant?
And this is much more compute intensive,
but also leads to way better results.
And I think we will see more and more of
that, especially as the technologies are
getting better to actually deploy them.
And this could also mean that we
might get away with worse retrieval
systems, but with more powerful
re-ranking systems.
I think this really fits into the
general theme of generative AI, where
we have more inference-time compute,
and so we over-search: we retrieve
a lot of documents and then we run them
through a re-ranking system.
And when it's efficient and fast enough,
we might actually get very decent
results, or way better results than we
do currently when we are just using
semantic search.
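In code, the over-search idea is just a two-stage pipeline; retrieve() and rerank_score() here are hypothetical placeholders for a cheap first-stage retriever and a more expensive re-ranking model:

    def over_search(query, k_retrieve=500, k_return=20):
        # Cheap first stage: retrieve generously (lexical and/or vector search).
        candidates = retrieve(query, limit=k_retrieve)
        # Expensive second stage: spend inference-time compute on re-ranking.
        rescored = [(doc, rerank_score(query, doc)) for doc in candidates]
        rescored.sort(key=lambda item: item[1], reverse=True)
        return [doc for doc, _ in rescored[:k_return]]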
Yeah, that's it.
I will stop talking now; I think this
is already the longest episode we've had.
I really can only endorse
buying the book from Trey.
I think I've read it twice now,
once when I bought it and
once when I got Trey on as a guest.
The final version came out a few
weeks ago, so get it; I think
it's available everywhere.
Also join Trey's community
for AI-powered search.
I'm also in there, if you want to chat.
Otherwise, we will be continuing
our series on search next week.
And as always: like and subscribe,
it helps a lot.
Leave a comment, especially
on the podcasting platforms.
Otherwise, I will see you
next week. See you soon.