The "Write Speed" Barrier.

You are building an IoT app. You have 10,000 sensors sending temperature data every second. You try to insert this into Postgres (INSERT INTO readings...). Your disk usage hits 100%. The database slows to a crawl.

Why? You are fighting the B-Tree.

1. The Problem: Random I/O

Traditional databases (Postgres, MySQL) use a B-Tree to organize data. Think of a B-Tree like a sorted Library Bookshelf. When you insert a new "Book" (Row), you have to:

  1. Find the exact spot on the shelf where it belongs (Alphabetical order).

  2. Shift the other books to make room.

  3. Place the book.

On a hard drive, this causes Random I/O. The disk head has to jump around wildly to find the right "Page" to update. This is slow.

2. The Solution: The LSM Tree (Sequential I/O)

High-speed databases (Cassandra, RocksDB, ScyllaDB) use a Log-Structured Merge-Tree (LSM Tree).

Think of an LSM Tree like a Journal. When a new "Book" comes in, you don't put it on the shelf. You just write it down on a notepad on your desk (The MemTable).

  • Writing to a notepad is instant. You just write the next line. (Sequential I/O).

  • You don't care about sorting it yet.

The Workflow:

  1. MemTable (RAM): All writes go here first. It's fast because it's in memory.

  2. SSTable (Disk): When the RAM gets full, we flush the sorted list to the disk as a permanent file. We never modify this file again (Immutable).

  3. Compaction: Later, a background process wakes up, reads the messy pile of files, merges them, throws away deleted data, and creates a clean, sorted file.

By delaying the sorting, we turn "Random Writes" into "Sequential Writes." This makes LSM Trees 10x to 100x faster at writing than B-Trees.

The Trade-off

If LSM Trees are so fast, why doesn't everyone use them? Read Speed.

  • B-Tree (Postgres): "I know exactly where the book is on the shelf." (Fast Reads).

  • LSM Tree (Cassandra): "I have to check the Notepad (RAM), then check the Pile on the Desk (Level 0), then check the Box in the closet (Level 1)..." (Slower Reads).

Use Postgres for consistent data (Banking). Use Cassandra/LSM for massive ingest streams (Chat Logs, Sensor Data).

3. The Cerebral Gym

Yesterday's solution (The Write Structure)

The puzzle was: What is the name of the log-structured tree architecture used by RocksDB?

The Answer: LSM Tree (Log-Structured Merge-Tree).

Today's puzzle (Counting Distinct Items) Wednesday is for Big Data.

You have a stream of 1 Billion IP addresses hitting your server. You want to count how many UNIQUE visitors you have.

  1. Storing them in a Set takes 100GB of RAM.

  2. You are okay with being 99% accurate (e.g., answering "1,005,000" instead of "1,000,000").

  3. You want to use only 12 KB of RAM.

The Question: What is the specific name of the probabilistic algorithm used to count unique elements with very little memory?

(Reply with the algorithm name!)

4. THE PULSE: Resources of the day

  • RocksDB (The Engine) This is the C++ library (created by Facebook) that powers almost everything. It is an embeddable LSM Tree engine. MySQL (MyRocks), Kafka Streams, and CockroachDB all use RocksDB under the hood to handle writes. Link: rocksdb.org

  • ScyllaDB If you like Cassandra (LSM Tree) but hate Java (Garbage Collection pauses), look at Scylla. It is a C++ rewrite of Cassandra. It is absurdly fast and claims to handle 1 million writes per second on a single node. Link: scylladb.com

  • LevelDB The grandfather of them all. Written by Jeff Dean at Google. It is simpler than RocksDB but great for learning how LSM Trees work by reading the source code. Link: github.com/google/leveldb

5. THE LATENT SPACE

"Append-only is the only truth."

In life and in databases, you cannot change the past. B-Trees try to "rewrite" history by updating pages in place. LSM Trees accept that history is immutable. They just append a new record saying "The value is now X."

If you need speed, stop organizing. Just start writing.

See you tomorrow.
Harsh Kathiriya - Query & Context

Keep Reading