Your Vector DB is just guessing.
If you are frustrated that your RAG system retrieves irrelevant documents, it’s because you are trusting Vector Search too much.
To understand why it fails, think of a Dating App.
1. Vector Search is "Swiping Right"
Vector Search (Embeddings) looks at a profile picture and a bio. It makes a split-second decision based on general "Vibes" (similarity).
Query: "Senior Python Engineer."
Vector DB: Finds a resume that says "I hate Python" because the text sits close to the query in embedding space. It is fast, but superficial. It swipes right on everyone who looks similar on the surface, even if they are a bad fit.
2. Reranking is "The First Date"
You cannot marry someone based on a swipe. You need to go on a date. A Reranker (Cross-Encoder) sits down with the candidate and actually listens to what they say. It reads the query and the document together, so it picks up the specific grammar, the "nots," and the nuances. It is slower, but it gives you a far more reliable answer.
If you skip the date and propose marriage to the first person you swiped right on (sending raw Vector Search results to the LLM), you will have a very bad time.
1. The Architecture: Two-Stage Retrieval
We fix this by using both methods in a pipeline.
The Dragnet (Vector DB): We ask the Vector DB for 50 documents. We know 40 of them will be trash, but we want to make sure we catch the right one somewhere in the pile. (High Recall).
The Filter (Reranker): We pass those 50 documents to a Reranker model. It scores them from 0.0 to 1.0 based on how well they answer the question. We keep the Top 5. (High Precision).
The result? You stop feeding "noise" to your LLM.
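The two stages can be sketched without any external libraries. Everything below is a stand-in, not a real retriever: `cheap_score` fakes the Vector DB with simple word overlap, and `careful_score` fakes the Reranker with a crude negation check. The function names and the tiny corpus are made up for illustration.

```python
import re

NEGATIVE_CUES = {"hate", "not", "never", "no"}


def tokens(text):
    """Lowercase word tokens, punctuation stripped."""
    return re.findall(r"[a-z]+", text.lower())


def cheap_score(query, doc):
    """Stage 1 stand-in: fraction of query words that appear in the doc."""
    q = set(tokens(query))
    return len(q & set(tokens(doc))) / len(q)


def careful_score(query, doc):
    """Stage 2 stand-in: same overlap, but heavily penalise negative cues."""
    base = cheap_score(query, doc)
    if NEGATIVE_CUES & set(tokens(doc)):
        return base * 0.1
    return base


def two_stage_search(query, corpus, recall_k=50, final_k=5):
    # Stage 1 (the dragnet): cast a wide net with the cheap scorer.
    candidates = sorted(corpus, key=lambda d: cheap_score(query, d), reverse=True)[:recall_k]
    # Stage 2 (the filter): re-score only the candidates, keep the best few.
    reranked = sorted(candidates, key=lambda d: careful_score(query, d), reverse=True)
    return reranked[:final_k]


corpus = [
    "I hate Python and never want to be a senior engineer",
    "Senior Python engineer with ten years of backend experience",
    "Junior Java developer",
    "Python hobbyist, not an engineer",
]
top = two_stage_search("Senior Python Engineer", corpus, recall_k=3, final_k=1)
print(top)
```

In this toy corpus the cheap scorer alone puts the "I hate Python" resume first, because it contains every query word. The second pass only has to look at three candidates, so it can afford to be pickier.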
2. The Code: Implementing the Filter
You don't need a massive GPU for this. We can use FlashRank, a tiny library that runs on your CPU.
from flashrank import Ranker, RerankRequest

# 1. Set up the "Judge" (The Reranker)
# This model is tiny (~4MB) and runs locally on CPU.
ranker = Ranker(model_name="ms-marco-TinyBERT-L-2-v2")

def search_with_brain(user_query):
    # Step A: The Swipe (Vector DB)
    # Get 50 candidates. Many will be irrelevant.
    # `vector_db` is a placeholder for your own client; it is assumed
    # here to return a list of text chunks.
    candidates = vector_db.search(query=user_query, k=50)

    # FlashRank expects each passage as a dict with "id" and "text" keys.
    passages = [{"id": i, "text": text} for i, text in enumerate(candidates)]

    # Step B: The Date (Reranking)
    # The ranker compares the query to each document deeply.
    request = RerankRequest(query=user_query, passages=passages)
    results = ranker.rerank(request)  # sorted best-first, each with a "score"

    # Step C: The Marriage (Top 5)
    # Only send the absolute best matches to GPT-4
    top_5 = results[:5]
    return top_5
This simple change can bump retrieval accuracy dramatically, on the order of 60% to 85%, though the exact gain depends on your data and how you measure it.
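Numbers like these are worth checking on your own data. Here is a minimal sketch of one common metric, hit rate at k: the share of queries whose known-relevant document appears in the top k results. The `hit_rate_at_k` helper and the tiny labelled set are hypothetical, written for this example rather than taken from any library.

```python
def hit_rate_at_k(ranked_ids_per_query, relevant_id_per_query, k=5):
    """Share of queries whose relevant doc id shows up in the top-k results."""
    hits = sum(
        1
        for ranked, relevant in zip(ranked_ids_per_query, relevant_id_per_query)
        if relevant in ranked[:k]
    )
    return hits / len(relevant_id_per_query)


# Toy labelled set: for each query, the id of the one document that answers it.
relevant = ["d7", "d2", "d9", "d4"]

# Ranked ids from the vector DB alone vs. after reranking (made-up data).
vector_only = [["d1", "d3", "d7"], ["d2", "d5", "d6"], ["d8", "d1", "d9"], ["d6", "d5", "d3"]]
reranked = [["d7", "d1", "d3"], ["d2", "d5", "d6"], ["d9", "d8", "d1"], ["d6", "d5", "d3"]]

before = hit_rate_at_k(vector_only, relevant, k=1)
after = hit_rate_at_k(reranked, relevant, k=1)
print(before, after)
```

Note the fourth query: its relevant document never made it into the candidate list, so no amount of reranking can rescue it. Reranking improves the order of what was retrieved, not what was retrieved.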
3. THE CEREBRAL GYM: Solution & New Puzzle
Yesterday's solution (The CAP Theorem)
The puzzle was: In a distributed system over the internet, which of the 3 CAP properties is non-negotiable?
The Answer: Partition Tolerance (P). You cannot guarantee the internet cables won't be cut. Network failures (Partitions) are a fact of physics. You only get to choose between CP (Go offline to save data) or AP (Stay online but serve old data).
Today's puzzle (The Lost Money)
You are building a banking app.
Transaction A reads your balance: $100.
Transaction B reads your balance: $100.
Transaction A adds interest ($10) and writes $110.
Transaction B withdraws money ($50) and writes $50.
The Result: The database now says $50. The $10 interest check just vanished into the void because Transaction B overwrote Transaction A's work without knowing it.
The Question: What is the specific name of this concurrency bug?
(Reply with the name!)
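If you want to poke at it first, the interleaving can be replayed step by step. This is a single-threaded sketch with a plain dict standing in for the database; the `db` dict and the variable names are illustrative only.

```python
db = {"balance": 100}

# Both transactions read before either one writes.
a_seen = db["balance"]  # Transaction A reads 100
b_seen = db["balance"]  # Transaction B reads 100

# Each computes its new balance from its own stale snapshot.
db["balance"] = a_seen + 10  # A writes 110
db["balance"] = b_seen - 50  # B writes 50, silently discarding A's write

final_balance = db["balance"]
expected_if_serial = 100 + 10 - 50  # running A, then B, one after the other

print(final_balance, expected_if_serial)
```

The gap between the two printed numbers is the missing interest.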
4. THE PULSE: Industry Signals
FlashRank: The library used in the code above. It is ultra-lightweight and designed to run in serverless environments (like AWS Lambda) where you can't install heavy AI libraries.
RAGatouille (ColBERT): If you want to get fancy, this library implements ColBERT, a "late interaction" retriever that compares the query and the document token by token. It aims to be almost as fast as a Vector DB while getting close to the quality of a Reranker, and it is widely regarded as one of the strongest retrieval approaches available.
Scikit-LLM: Everyone knows scikit-learn. This wrapper lets you use LLMs inside standard scikit-learn pipelines. You can use GPT-4 as a "Classifier" or "Transformer" just like you would use a standard Logistic Regression.
5. THE LATENT SPACE
"Recall is for the machine. Precision is for the user."
Your Vector DB is a dragnet. It catches tires, boots, and fish. Your Reranker is the quality control officer. If you cook the tires and serve them to your LLM, don't be surprised when it chokes.
Add a reranker.
See you tomorrow.
Harsh Kathiriya - Query & Context

