AI-Powered Search and Retrieval-Augmented Generation 

AI is changing many things, and one of the biggest is search. You may have noticed Google returning "AI Summaries" and other search engines returning similar AI-enhanced responses. These summaries provide a plain-language overview, often with relevant links, before the traditional list of results. You might wonder: could we integrate something similar to provide better on-site search for our own sites?

Built-in CMS search tools, which are often based on Apache Solr or other keyword-driven engines, generally do a good job of returning relevant results. But what if we could improve those results, or present them in a more conversational, personal style? AI techniques like vector search and retrieval-augmented generation give us an alternative. Can they return better results than conventional keyword search?

What is Vector Search?

Vector search uses AI (technically a machine-learning model, but I’ll just call it AI for simplicity) to find related records based on vector embeddings. Sounds complicated, and its implementation is complex, but understanding how it works at a high level isn’t too difficult.

Imagine the content you want to index: say, all the blog posts on a website. Using AI, we generate a vector embedding for each post from its content and store those vectors alongside the content in a database. You can think of vectors as coordinates on a map. They make it possible to compare content's semantic similarity by measuring how close different posts' vectors are to each other.
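To make that concrete, here is a minimal sketch in plain Python. The `embed` function is a hypothetical stand-in that just counts words from a tiny fixed vocabulary; a real system would call an embedding model that produces vectors with hundreds of dimensions. Cosine similarity, used below, is one common way to measure how "close" two vectors are.

```python
import math

# Hypothetical toy "embedding": a bag-of-words count over a tiny fixed
# vocabulary. A real system would call an embedding model instead.
VOCAB = ["dog", "retriever", "training", "recipe", "cake"]

def embed(text: str) -> list[float]:
    words = text.lower().split()
    return [float(words.count(term)) for term in VOCAB]

def cosine_similarity(a: list[float], b: list[float]) -> float:
    # 1.0 means the vectors point the same way; 0.0 means unrelated.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Store each post's vector alongside its content, as you would in a database.
posts = [
    {"title": "Retriever training tips", "body": "training your retriever dog"},
    {"title": "Chocolate cake recipe", "body": "a simple cake recipe"},
]
for post in posts:
    post["vector"] = embed(post["body"])
```

With the vectors stored, comparing any two posts is just a similarity computation on their vectors, no keywords involved.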

Now, when we search for blog posts, we convert our search text to a vector (also using AI), then look for the stored vectors closest to it. Thinking again of our map: instead of searching for "restaurants near me," you're searching for "blog posts near my question," and the results come back ranked from closest to furthest.
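Query time can be sketched the same way. The three-dimensional vectors below are made-up stand-ins for model-generated embeddings, and `vector_search` is a hypothetical helper that simply ranks stored posts by cosine similarity to the query vector:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical pre-computed post vectors (a real index would hold
# model-generated embeddings, typically hundreds of dimensions).
indexed_posts = [
    ("Retriever training tips", [0.9, 0.8, 0.1]),
    ("Chocolate cake recipe",   [0.1, 0.0, 0.95]),
    ("Best dog parks nearby",   [0.85, 0.2, 0.05]),
]

def vector_search(query_vector, posts, top_k=2):
    # Rank posts from closest to furthest by cosine similarity.
    ranked = sorted(
        posts,
        key=lambda p: cosine_similarity(query_vector, p[1]),
        reverse=True,
    )
    return [title for title, _ in ranked[:top_k]]

# Pretend this vector came from embedding the query "where can my dog play?"
query_vector = [0.88, 0.3, 0.0]
results = vector_search(query_vector, indexed_posts)
print(results)
```

At production scale, the brute-force `sorted` call would be replaced by an approximate nearest-neighbor index, but the ranking idea is the same.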

So, what’s the difference between this and traditional search? Traditional search is usually based on keywords. While this is still a good approach in many circumstances, this new method of similarity search can provide a different set of results that are similar based on meaning.

For example, imagine a search for “Golden Retriever.” A keyword search might return results for only golden retrievers, whereas a vector similarity search may also return results relating to any breed or general content about dogs. Depending on the context, you may prefer one search strategy over the other.

Retrieval-Augmented Generation

So, what is retrieval-augmented generation (RAG) and how does that relate to vector search?

Retrieval augmentation is simply doing a search first (retrieval) and passing the results to the LLM as context. With that context, the model can produce a natural-language answer rather than just a list of documents. Depending on the prompt and parameters you send to the LLM, this could power a chatbot or anything else where a more specifically formatted response is desired.

The retrieval could be any type of retrieval; it’s not limited to vector search. For instance, it could include web search results, a dump of documents, or vector search results. Combining this with a vector search means utilizing AI twice, in both retrieval and generation.
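The "augmentation" step itself is mostly prompt assembly. Here is a minimal sketch, assuming a hypothetical `build_rag_prompt` helper: the retrieved passages are stitched into the prompt so the LLM answers from them rather than from its training data alone.

```python
# Hypothetical helper: stitch retrieved passages into a prompt for an LLM.
def build_rag_prompt(question: str, retrieved_passages: list[str]) -> str:
    context = "\n\n".join(
        f"[{i + 1}] {passage}" for i, passage in enumerate(retrieved_passages)
    )
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )

prompt = build_rag_prompt(
    "What breeds make good family pets?",
    [
        "Golden retrievers are gentle and patient with children.",
        "Labradors are energetic and eager to please.",
    ],
)
# The assembled prompt would then be sent to whichever LLM you use,
# e.g. response = call_llm(prompt)  # call_llm is a hypothetical client call
print(prompt)
```

The numbered passages make it easy to ask the model to cite which retrieved document supported its answer.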

With RAG, we can ensure that only the most relevant context is used to generate a response. This approach is also more cost-effective: AI providers charge based on how much text a model processes, with each piece of text (called a "token") carrying a small cost, and they impose usage limits as well. Even as LLMs have grown far more capable since their debut, RAG keeps each request small, helping us stay under those limits and make more requests.

When implementing RAG, there are many things to consider. How granular should you split up the content you are indexing? Your content must be split into records, and each record is sent to the AI model to generate the vector that is stored with it for similarity comparison. Smaller records mean more vectors but, depending on the data, may not provide the best results. Larger records mean fewer vectors but may carry more context than needed.
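One common chunking strategy is a fixed-size window with overlap, so that context isn't lost at chunk boundaries. This is a sketch only; the sizes below are arbitrary, and the right values depend on your content and embedding model.

```python
# Sketch of fixed-size word-window chunking with overlap. Each chunk
# would become one record: embedded, then stored alongside its text.
def chunk_text(text: str, chunk_size: int = 50, overlap: int = 10) -> list[str]:
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    words = text.split()
    step = chunk_size - overlap  # how far the window slides each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks

# A fake 120-word post, chunked into overlapping 50-word records.
post = " ".join(f"word{i}" for i in range(120))
chunks = chunk_text(post, chunk_size=50, overlap=10)
print(len(chunks))
```

Adjacent chunks share their last/first ten words, so a sentence that straddles a boundary still appears whole in at least one record.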

With AI, experimentation is still the key. There are no hard answers on how to do anything; guidelines and best practices are still being formed. The best results for your use case will be found through trial and refinement.
