The Sunday Commit.

Grab your coffee. ☕

We’ve spent the last three days deep in the trenches: building EtLT pipelines, enforcing Pydantic schemas, and surviving one production horror story.

Today, we pause the IDE and look at the Architecture.

Sundays at Query & Context are for "System Design Thinking." We focus on the decisions that save you 100 hours of coding later.

Today’s topic: Caching.

1. THE CONCEPT: The Infinite Context Window

There is a myth that Senior Engineers memorize documentation. They don't. In fact, most Staff Engineers I know look up basic Python syntax daily.

The difference between a Junior and a Senior engineer isn't Compute (typing speed/IQ); it’s Context Caching.

  • Junior: Re-computes the problem from scratch every time. "How do I connect to Postgres again?"

  • Senior: Has "cached" the architecture patterns. "I know this needs a connection pooler because we hit connection limits last year."

In 2026, with AI coding assistants, "syntax memorization" has a market value of $0. Context Management is the only skill that matters.

Your goal for this week: Stop trying to memorize the code. Start memorizing the patterns.

2. THE ARCHITECTURE: Context Caching in LLMs

This concept applies literally to our infrastructure too.

One of the biggest money-wasters in GenAI right now is Input Redundancy.

If you are building a "Chat with your Docs" bot, users often ask five follow-up questions about the same 50-page PDF.

  • Naive setup (no caching): You re-send the entire 50-page PDF text (the context) to the LLM for every single question.

  • The Cost: You are paying for those input tokens 5 times.

  • The Latency: The LLM has to re-process that text 5 times.
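To put rough numbers on that, here is a back-of-envelope sketch. Every constant in it is an assumption chosen for illustration (document size, base price, and the cache write/read multipliers), so check your provider's current pricing page before trusting the output.

```python
# Back-of-envelope cost comparison for 5 questions about one document.
# All constants are illustrative assumptions, not quoted prices.

DOC_TOKENS = 40_000            # assumed size of the 50-page PDF in tokens
QUESTION_TOKENS = 50           # assumed size of each user question
NUM_QUESTIONS = 5
PRICE_PER_MTOK = 3.00          # assumed base input price, USD per million tokens
CACHE_WRITE_MULTIPLIER = 1.25  # assumed surcharge on the request that writes the cache
CACHE_READ_MULTIPLIER = 0.10   # assumed discount on requests that hit the cache


def cost_without_cache() -> float:
    """Every request pays full price for the document plus the question."""
    tokens = NUM_QUESTIONS * (DOC_TOKENS + QUESTION_TOKENS)
    return tokens * PRICE_PER_MTOK / 1_000_000


def cost_with_cache() -> float:
    """The first request writes the cache; the other four read it at a discount."""
    write = DOC_TOKENS * CACHE_WRITE_MULTIPLIER
    reads = (NUM_QUESTIONS - 1) * DOC_TOKENS * CACHE_READ_MULTIPLIER
    questions = NUM_QUESTIONS * QUESTION_TOKENS
    return (write + reads + questions) * PRICE_PER_MTOK / 1_000_000


if __name__ == "__main__":
    print(f"without cache: ${cost_without_cache():.4f}")
    print(f"with cache:    ${cost_with_cache():.4f}")
```

Under these assumptions the cached run costs roughly a third of the uncached one, and the gap widens with every extra follow-up question.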

The Fix: Context Caching (Prompt Caching). Anthropic and Google have rolled this out recently. You can mark a specific segment of your prompt (the heavy system instructions or the massive PDF content) as "Cached." Follow-up requests that reuse that exact segment are then billed and processed at a reduced rate.

The Engineering Pattern: When designing your prompt builder, separate the Static Context from the Dynamic Query.

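# Conceptual Architecture for Context Caching
# Shape follows Anthropic's Messages API, where the system prompt is a
# top-level parameter rather than a message with role "system".

massive_compliance_pdf_content = "<full text of the compliance PDF>"  # placeholder

# Static context: identical on every request, so it is marked cacheable.
system = [
    {
        "type": "text",
        "text": massive_compliance_pdf_content,
        "cache_control": {"type": "ephemeral"},  # <--- The Magic Flag
    }
]

# Dynamic query: changes on every request, so it stays outside the cached part.
messages = [
    {
        "role": "user",
        "content": "What is the termination clause?",
    }
]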
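One way to keep the two parts apart in your own code is a small builder function. Treat this as a sketch: `build_request` is a hypothetical helper (not part of any SDK), and the payload layout, a top-level `system` list plus `messages`, is assumed to follow Anthropic's Messages API. Adapt it to whatever your provider expects.

```python
from typing import Any


def build_request(static_context: str, user_question: str) -> dict[str, Any]:
    """Build a request payload that keeps the cacheable prefix apart from the query.

    Hypothetical helper for illustration; the layout is an assumption.
    """
    return {
        # Static context: must be identical across requests to be reused from cache.
        "system": [
            {
                "type": "text",
                "text": static_context,
                "cache_control": {"type": "ephemeral"},
            }
        ],
        # Dynamic query: changes per request and comes after the cached prefix.
        "messages": [{"role": "user", "content": user_question}],
    }


if __name__ == "__main__":
    pdf_text = "<full text of the compliance PDF>"  # placeholder
    first = build_request(pdf_text, "What is the termination clause?")
    second = build_request(pdf_text, "Who can terminate the contract?")
    # The static part is the same in both requests; only the question differs.
    print(first["system"] == second["system"])
```

The design point is that nothing request-specific (timestamps, user names, the question itself) leaks into the static block, because any change there means the cached segment no longer matches.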

Why this matters for your Monday Morning: Check your API bills. If you are sending the same System Prompt or RAG Context chunks repeatedly within a short window, you are lighting money on fire. Enable Caching.
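If you want a quick way to spot those repeats, here is a minimal sketch that scans a request log for system prompts resent within a short window. The log format (a list of `(unix_timestamp, system_prompt)` tuples), the function name, and the 300-second window are all assumptions for illustration. Match the window to your provider's actual cache lifetime.

```python
import hashlib
from collections import defaultdict

WINDOW_SECONDS = 300  # assumed "short window"; tune to your provider's cache lifetime


def find_cacheable_prompts(requests, window=WINDOW_SECONDS):
    """Return {prompt_hash: repeat_count} for system prompts resent within `window`.

    `requests` is an iterable of (unix_timestamp, system_prompt) tuples,
    a hypothetical log format used only for this sketch.
    """
    last_seen = {}
    repeats = defaultdict(int)
    for timestamp, system_prompt in sorted(requests):
        digest = hashlib.sha256(system_prompt.encode("utf-8")).hexdigest()
        previous = last_seen.get(digest)
        if previous is not None and timestamp - previous <= window:
            repeats[digest] += 1  # this request could have been a cache hit
        last_seen[digest] = timestamp
    return dict(repeats)


if __name__ == "__main__":
    log = [(0, "A"), (100, "A"), (200, "B"), (1000, "A"), (1100, "A")]
    for digest, count in find_cacheable_prompts(log).items():
        print(digest[:12], count)
```

Any prompt with a high repeat count is a candidate for a cache flag.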

3. THE CEREBRAL GYM: Solutions & Whiteboarding

Yesterday's Solution (Docker Networking)

The Challenge: Why does localhost fail when connecting from a container to the host database?

The Answer: A Docker container has its own isolated network namespace. Inside the container, localhost (127.0.0.1) refers to the container itself, not your laptop. Since your Postgres is running on your laptop (not inside that container), the connection is refused.

The Fix: On Docker Desktop (Mac/Windows), use the special DNS name host.docker.internal.

  • Config: DB_HOST = "host.docker.internal"

  • Linux Users: This name is not set up automatically on Linux. On Docker Engine 20.10 or newer, pass --add-host=host.docker.internal:host-gateway to docker run, or add the same mapping under extra_hosts in docker-compose, to enable this bridge.
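On the application side, a small sketch of how the config might be wired so the same image works inside and outside Docker. The helper names, the `app` user, and the `appdb` database are hypothetical placeholders. Only the `DB_HOST` variable and the `host.docker.internal` name come from the fix described here.

```python
import os


def resolve_db_host(default: str = "host.docker.internal") -> str:
    """Return the database host, preferring the DB_HOST environment variable."""
    return os.environ.get("DB_HOST", default)


def build_dsn(user: str, database: str, port: int = 5432) -> str:
    """Build a Postgres connection URL; user, database and port are placeholders."""
    return f"postgresql://{user}@{resolve_db_host()}:{port}/{database}"


if __name__ == "__main__":
    # Inside a container with DB_HOST unset, this points at the host machine.
    print(build_dsn("app", "appdb"))
```

Reading the host from the environment means you can set DB_HOST=localhost when running the script directly on your laptop, without touching the code.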

Today's Puzzle (System Design)

Sunday is for whiteboarding. No code today, just logic.

The Challenge: You are designing a Notification System (like Twitter/X). Millions of events happen per second. However, if a celebrity like Elon Musk tweets, 50 million people might need to be notified instantly. This is the celebrity fan-out problem (sometimes loosely filed under "Thundering Herd").

The Question: If you push 50 million notifications into a standard queue (like Kafka/SQS) linearly, the users at the end of the list won't get the notification for 4 hours. How do you architect the queue consumers to ensure everyone gets it within 60 seconds?

(Reply with your architectural approach. I'll share the "Fan-out on Write" vs "Fan-out on Read" trade-offs tomorrow.)

4. RESOURCE OF THE WEEK

Since we are discussing patterns, I strongly recommend "The System Design Primer" on GitHub. It is the bible of backend engineering.

But for the AI era, I recommend a new favorite: Chip Huyen's blog. If you want to understand how "Data Engineering" becomes "AI Engineering," her posts on Real-time Machine Learning are essential reading for your Sunday afternoon.

5. THE LATENT SPACE

"Simplicity is prerequisite for reliability."

Edsger W. Dijkstra

We often over-engineer RAG pipelines with complex re-ranking agents and graph databases. Sometimes, the best search engine is just... a good SQL query with a keyword filter.

Don't let the AI hype make you forget the fundamentals.
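As a reminder of how little the "boring" option takes, here is a minimal sketch of keyword search with nothing but the standard library's sqlite3 module. The `docs` table, its columns, and the sample rows are made up for illustration.

```python
import sqlite3


def keyword_search(conn: sqlite3.Connection, keyword: str, limit: int = 5):
    """Return (id, title) rows whose body contains `keyword`.

    SQLite's LIKE is case-insensitive for ASCII by default. Wildcards (% and _)
    inside `keyword` are not escaped in this sketch.
    """
    pattern = f"%{keyword}%"
    cursor = conn.execute(
        "SELECT id, title FROM docs WHERE body LIKE ? ORDER BY id LIMIT ?",
        (pattern, limit),
    )
    return cursor.fetchall()


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a few made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, title TEXT, body TEXT)")
    conn.executemany(
        "INSERT INTO docs (title, body) VALUES (?, ?)",
        [
            ("Contract A", "The termination clause requires 30 days notice."),
            ("Contract B", "Payment is due within 45 days."),
            ("Policy C", "Termination for cause is immediate."),
        ],
    )
    return conn


if __name__ == "__main__":
    print(keyword_search(build_demo_db(), "termination"))
```

No embeddings, no re-ranker, no graph database. If this answers the question, ship it and add complexity only when it stops being enough.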

Rest up. We write code at dawn.

See you tomorrow.
Harsh Kathiriya - Query & Context
