The "Zombie Data" Incident.
I was building a collaborative "To-Do List" app (like Trello). We had two servers behind a Load Balancer.
User A (connected to Server 1) creates a card: "Fix Bug."
User B (connected to Server 2) sees the card and immediately deletes it.
The Database Logic: We used "Last Write Wins" (LWW). If two updates happen to the same ID, we keep the one with the higher Timestamp.
The Disaster: User B deleted the card. But 1 second later, the card reappeared. User B deleted it again. It reappeared again. It was a zombie card.
The Root Cause: Server 1’s clock was perfect (12:00:05). Server 2’s clock was lagging by 2 seconds (12:00:03).
User A (Server 1) creates the card. Timestamp: 12:00:05.
User B (Server 2) deletes the card. Timestamp: 12:00:03 (because the server was lagging).
The Database compares them:
Write: 12:00:05
Delete: 12:00:03
The Database thinks the "Write" happened after the "Delete." It ignores the delete.
I learned the hard way: You cannot trust Date.now() in a distributed system.
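Here is a minimal sketch of how the zombie card comes back, written in Python rather than the JavaScript that Date.now() implies. The LwwStore class, the key "card-1", and the two clock readings are made up for illustration; this is not our production code, only the same "higher timestamp wins" rule in a few lines.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Record:
    value: Optional[str]  # None marks a deleted card (a tombstone)
    timestamp: float      # wall-clock seconds reported by the writing server


class LwwStore:
    """Toy last-write-wins store: the higher timestamp always wins."""

    def __init__(self) -> None:
        self.rows: Dict[str, Record] = {}

    def apply(self, key: str, value: Optional[str], timestamp: float) -> bool:
        current = self.rows.get(key)
        if current is not None and timestamp <= current.timestamp:
            return False  # incoming update looks older, so it is dropped
        self.rows[key] = Record(value, timestamp)
        return True


# Hypothetical clock readings: Server 2 lags Server 1 by 2 seconds.
T_SERVER_1 = 5.0  # stands in for 12:00:05
T_SERVER_2 = 3.0  # stands in for 12:00:03

store = LwwStore()
created = store.apply("card-1", "Fix Bug", T_SERVER_1)  # User A creates the card
deleted = store.apply("card-1", None, T_SERVER_2)       # User B deletes it afterwards

print(created, deleted, store.rows["card-1"].value)
```

The delete really happened second, but it carries the smaller timestamp, so apply() drops it and the card stays in the store.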
1. The Failure: Clock Drift
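Even with NTP (Network Time Protocol), servers drift. In the cloud (AWS included), clocks on different machines can disagree by tens to hundreds of milliseconds depending on the time source and network conditions, and a misconfigured host can be off by whole seconds, as Server 2 was. If your logic depends on Server A and Server B having the exact same time, you will lose data whenever two conflicting writes land closer together than that gap.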
2. The Fix: Vector Clocks (Logical Time)
To fix this, we stop tracking "Time" and start tracking "Causality." We use Vector Clocks.
Instead of a timestamp, every piece of data carries a "Version Vector": a list of counters, one for each server.
[Server1: 0, Server2: 0]
The Scenario Fixed:
User A (Server 1) writes.
Increment Server 1 counter: [1, 0].
Save Data with Version [1, 0].
User B (Server 2) reads [1, 0] and deletes.
Increment Server 2 counter: [1, 1].
Send Delete with Version [1, 1].
The Database Check:
Current Data: [1, 0]
Incoming Delete: [1, 1]
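The DB sees that [1, 1] is "greater than" [1, 0] (every counter is equal or higher, and at least one is strictly higher). It knows the Delete causally follows the Write, because User B had already seen the Write when deleting. It accepts the delete, regardless of what the wall clock says. If neither vector dominates the other, for example [1, 0] versus [0, 1], the two updates are concurrent and the application has to resolve the conflict explicitly.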
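The comparison can be sketched in a few lines of Python. The names compare, tick, and Order are my own for this example, and the sketch assumes a fixed list of servers with one counter each (index 0 for Server 1, index 1 for Server 2); a real system also has to handle servers joining and leaving.

```python
from enum import Enum
from typing import List


class Order(Enum):
    BEFORE = "before"
    AFTER = "after"
    EQUAL = "equal"
    CONCURRENT = "concurrent"


def compare(a: List[int], b: List[int]) -> Order:
    """Say how version vector a relates to version vector b."""
    if len(a) != len(b):
        raise ValueError("vectors must have one counter per server")
    a_le_b = all(x <= y for x, y in zip(a, b))
    b_le_a = all(y <= x for x, y in zip(a, b))
    if a_le_b and b_le_a:
        return Order.EQUAL
    if a_le_b:
        return Order.BEFORE      # a happened before b
    if b_le_a:
        return Order.AFTER       # a happened after b
    return Order.CONCURRENT      # neither saw the other: a real conflict


def tick(clock: List[int], server_index: int) -> List[int]:
    """Return a copy of the clock with this server's counter incremented."""
    bumped = list(clock)
    bumped[server_index] += 1
    return bumped


write_version = tick([0, 0], 0)          # Server 1 writes: [1, 0]
delete_version = tick(write_version, 1)  # Server 2 read [1, 0], then deletes: [1, 1]

print(compare(delete_version, write_version))  # Order.AFTER
print(compare([1, 0], [0, 1]))                 # Order.CONCURRENT
```

The database only needs the first result to accept the delete. The second result is the case a wall clock silently hides: two updates that never saw each other.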
3. THE CEREBRAL GYM: Solution & New Puzzle
Yesterday's solution (Time)
The puzzle was: What is the advanced Lamport Clock that uses an array of integers to track concurrent events?
The Answer: Vector Clocks.
Today's puzzle (Rate Limiting)
Saturday is for Systems.
You want to limit API requests to 100 per minute. The naive approach (reset counter at :00) allows a burst:
100 requests at 11:59:59.
100 requests at 12:00:01.
Total: 200 requests in 2 seconds. This crashes the server.
You switch to an algorithm that acts like a bucket with a hole in the bottom. Requests fill the bucket; the hole drains them at a constant rate. If the bucket overflows, requests are rejected.
The Question: What is the specific name of this rate-limiting algorithm?
(Reply with the name!)
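If you want to see the burst from the naive approach for yourself, here is a small Python sketch of a counter that resets at every minute boundary. FixedWindowLimiter and the seconds-since-midnight timestamps are hypothetical, chosen only to mirror the numbers in the puzzle.

```python
from typing import Dict


class FixedWindowLimiter:
    """Naive limiter: one counter per clock-aligned window."""

    def __init__(self, limit: int, window_seconds: int) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self.counts: Dict[int, int] = {}  # window index -> requests accepted

    def allow(self, now_seconds: int) -> bool:
        window = now_seconds // self.window_seconds
        used = self.counts.get(window, 0)
        if used >= self.limit:
            return False
        self.counts[window] = used + 1
        return True


limiter = FixedWindowLimiter(limit=100, window_seconds=60)

# Seconds since midnight for 11:59:59 and 12:00:01 (illustrative values).
before_boundary = 11 * 3600 + 59 * 60 + 59
after_boundary = 12 * 3600 + 1

accepted = sum(limiter.allow(before_boundary) for _ in range(100))
accepted += sum(limiter.allow(after_boundary) for _ in range(100))
one_more = limiter.allow(after_boundary)  # 101st request in the new window

print(accepted, one_more)  # 200 False
```

Each window on its own respects the limit, yet 200 requests get through in 2 seconds because the two bursts sit on opposite sides of the reset.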
4. THE PULSE: Tools of the week
Automerge / Yjs (CRDTs)
If you are building a collaborative app (Google Docs, Figma, Trello clone), do not implement Vector Clocks yourself. Use a CRDT (Conflict-free Replicated Data Type) library like Yjs. It handles all the vector math, merging, and sync logic for you.
Link: github.com/yjs/yjs
CockroachDB (HLC)
CockroachDB uses a fascinating hybrid called Hybrid Logical Clocks (HLC). It combines physical time (NTP) with logical counters to get the best of both worlds: Human-readable timestamps that are also causally correct.
Link: cockroachlabs.com/blog/living-without-atomic-clocks
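To make the idea concrete, here is a rough Python sketch of the general hybrid logical clock technique. It is not CockroachDB's implementation; the class name, method names, and integer clock readings are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Tuple

Timestamp = Tuple[int, int]  # (wall-time part, logical counter)


@dataclass
class HybridLogicalClock:
    wall: int = 0
    counter: int = 0

    def now(self, physical: int) -> Timestamp:
        """Timestamp a local event, given the local physical clock reading."""
        if physical > self.wall:
            self.wall, self.counter = physical, 0
        else:
            self.counter += 1  # physical clock did not advance: use the counter
        return (self.wall, self.counter)

    def update(self, physical: int, remote: Timestamp) -> Timestamp:
        """Merge a timestamp received from another node, then timestamp the event."""
        remote_wall, remote_counter = remote
        new_wall = max(self.wall, remote_wall, physical)
        if new_wall == self.wall and new_wall == remote_wall:
            self.counter = max(self.counter, remote_counter) + 1
        elif new_wall == self.wall:
            self.counter += 1
        elif new_wall == remote_wall:
            self.counter = remote_counter + 1
        else:
            self.counter = 0
        self.wall = new_wall
        return (self.wall, self.counter)


server_1 = HybridLogicalClock()
server_2 = HybridLogicalClock()

write_ts = server_1.now(physical=5)                      # Server 1 clock reads :05
delete_ts = server_2.update(physical=3, remote=write_ts)  # Server 2 clock reads :03

print(write_ts, delete_ts, delete_ts > write_ts)  # (5, 0) (5, 1) True
```

Because Server 2 merges the timestamp it read along with the card, its delete is stamped (5, 1) even though its own clock says :03, so the delete sorts after the write.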
Amazon Time Sync Service
If you are on AWS, use this. It uses satellite-connected and atomic reference clocks in each AWS Region to keep EC2 instances closely synchronized to UTC (AWS advertises microsecond-range accuracy on supported instance types). It reduces drift, but doesn't eliminate it.
Link: aws.amazon.com/about-aws/whats-new/2017/11/introducing-amazon-time-sync-service
5. THE LATENT SPACE
"Time is an illusion. Timing is everything."
Einstein was right about the universe, and he was right about servers. In a distributed system, there is no "Now." There is only "Before" and "After." If you rely on the clock on the wall, you are just guessing.
Have a synchronized weekend.
See you tomorrow.
Harsh Kathiriya - Query & Context

