Architecture: Kafka


Every time you place an order, stream a video, or tap a payment terminal, dozens of systems need to know about it — simultaneously, reliably, in order. Getting data from where it's produced to where it needs to be processed, at scale, without losing anything, is one of the hardest problems in distributed systems. Apache Kafka was built specifically to solve it.

Kafka is now the backbone of real-time data infrastructure at most large technology companies. This post walks through how it's architected: what the pieces are, how data flows through them, and why the design choices were made the way they were.


What Kafka is — and what it isn't

Kafka is a distributed event streaming platform. At its simplest, it's a system where producers write records and consumers read them. But calling it a message queue undersells it significantly. Unlike a traditional message queue, Kafka:

  • Persists everything to disk — messages aren't deleted when consumed. They're retained for a configurable period (hours, days, forever). Multiple consumers can read the same data independently.
  • Scales horizontally — you add brokers to handle more load. There's no single bottleneck.
  • Maintains order within a partition — consumers receive messages in the exact order they were produced.
  • Supports replay — a consumer can rewind and re-read events from any point in time. This is invaluable for debugging, auditing, and rebuilding downstream state.

These properties make Kafka suitable not just for messaging, but for event sourcing, log aggregation, stream processing, and as the connective tissue between microservices.


The core abstraction: the log

Everything in Kafka is built on a single, simple data structure: the append-only log. A log is a sequence of records, ordered by time. New records are always appended to the end. Records are never modified or deleted mid-stream. Each record has a sequential integer position called an offset.

This is not a novel idea — it's the same structure as a database transaction log or a git commit history. But Kafka makes the log the primary interface for data exchange between systems, distributed across many machines, at very high throughput.

The beauty of the log model is its simplicity: a producer just appends records, a consumer just reads records from a given offset. State lives in the offset — "I've read up to position 4,721" is all a consumer needs to know to resume from where it left off.

[Diagram: an append-only log — a producer appends records at the tail while a consumer reads independently at offset 2.]
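The whole model can be sketched in a few lines of Python. This is an illustrative toy, not Kafka's actual storage code — but it captures the contract: append returns an offset, and a consumer resumes by rereading from any position.

```python
class Log:
    """Minimal append-only log: records indexed by a sequential offset."""

    def __init__(self):
        self._records = []

    def append(self, record):
        """Producers only ever append; returns the new record's offset."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset, max_records=10):
        """Consumers read forward from any offset -- replay is just
        rereading from an earlier position."""
        return self._records[offset:offset + max_records]


log = Log()
for i in range(5):
    log.append(f"record-{i}")

# A consumer that has processed up to offset 2 resumes from there:
print(log.read(2))  # ['record-2', 'record-3', 'record-4']
```

Note that reading does not mutate the log — which is exactly why multiple consumers can share it without coordinating.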

Topics and partitions

A topic is a named category of records — think of it as a named feed or channel. Producers write to topics; consumers read from them. Topics are where you model your domain: an orders topic, a user-events topic, a payments topic.

A single topic can handle a huge volume of data, but a single machine can't store or process it all. Kafka solves this with partitions. Each topic is split into one or more partitions, and each partition is an independent log stored on a broker. Partitions are the unit of parallelism in Kafka.

When a producer sends a record, it goes into exactly one partition. The assignment is determined by:

  • A message key — records with the same key always go to the same partition (the default partitioner hashes the key and takes it modulo the partition count). This is what guarantees ordering for related events — all events for a given order ID, user ID, or account always land in the same partition, in order.
  • Round-robin — if there's no key, records are distributed evenly across partitions for load balancing.
  • Custom partitioner — application code can implement its own routing logic.

Within a partition, ordering is strictly preserved. Across partitions, there is no ordering guarantee. This is a deliberate trade-off: if you need global ordering, use one partition (and give up parallelism). If you need throughput, use many partitions (and accept per-key ordering only).

[Diagram: the orders topic split into partitions P0–P2, each an independent log that the producer writes into.]
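Key-based routing can be illustrated with any stable hash. Kafka's default partitioner actually uses murmur2; `crc32` below is just a stand-in to keep the sketch dependency-free.

```python
import zlib

NUM_PARTITIONS = 3

def partition_for(key: bytes) -> int:
    """Same key -> same hash -> same partition, so per-key ordering holds."""
    return zlib.crc32(key) % NUM_PARTITIONS


# All events for one order ID land in one partition, in order:
assert partition_for(b"order-1042") == partition_for(b"order-1042")

# Different keys spread across the available partitions:
for key in [b"order-1042", b"order-1043", b"order-1044"]:
    print(key.decode(), "-> partition", partition_for(key))
```

One caveat follows directly from the modulo: changing the partition count reshuffles which keys map where, which is why partition counts are usually chosen generously up front.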

Brokers: the storage layer

A broker is a single Kafka server. It receives records from producers, stores them on disk, and serves them to consumers. A Kafka cluster is a group of brokers working together.

For each partition, exactly one broker acts as the partition leader at any given time, and all reads and writes for that partition go through it. Other brokers may hold replica copies of the same partition for fault tolerance, but they are followers: they replicate from the leader and can take over if it fails.

A single broker can handle thousands of partitions and millions of messages per second. The practical limits are disk I/O and network bandwidth, not CPU — Kafka's architecture is designed to use sequential disk writes, which are far faster than random writes and even competitive with in-memory operations on modern hardware.

When a producer or consumer first connects to the cluster, it can connect to any broker — every broker is a bootstrap server that knows the full cluster topology. The client then connects directly to the leader broker for each partition it cares about.


Replication and fault tolerance

Every topic in Kafka is configured with a replication factor — the number of copies of each partition across brokers. A replication factor of 3 means each partition has one leader and two follower replicas on different brokers.

Follower replicas continuously fetch from the leader and stay caught up. Kafka tracks which replicas are current using the In-Sync Replica (ISR) set. A replica is in-sync if it hasn't fallen too far behind the leader. Only ISR members are eligible to become the new leader if the current leader fails.

A write is considered committed once every ISR member has replicated it. How long the producer waits before treating a send as successful is controlled by its acks setting:

  • acks=0 — fire and forget. No acknowledgement. Maximum throughput, no durability guarantee.
  • acks=1 — leader acknowledges after writing locally. Fast, but data can be lost if the leader fails before replicating.
  • acks=all — all ISR replicas acknowledge. Strongest durability guarantee. Recommended for production.
[Diagram: partitions P0 and P1 each replicated across three brokers — one leader (★) and two in-sync followers per partition.]
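The acks trade-off reduces to "how many replica acknowledgements does the producer wait for?" The sketch below is a conceptual model of that rule, not the wire protocol:

```python
def acks_required(acks, isr_size: int) -> int:
    """How many replica acknowledgements the producer waits for
    before considering a send successful."""
    if acks == 0:
        return 0            # fire and forget
    if acks == 1:
        return 1            # leader's local write only
    if acks == "all":
        return isr_size     # every in-sync replica
    raise ValueError(f"unknown acks setting: {acks!r}")


# With replication factor 3 and a healthy ISR of 3:
assert acks_required(0, 3) == 0
assert acks_required(1, 3) == 1
assert acks_required("all", 3) == 3

# If one follower drops out of the ISR, acks=all now waits for only 2 --
# which is why production clusters also set min.insync.replicas to put
# a floor under how small "all" is allowed to get.
assert acks_required("all", 2) == 2
```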

Producers

Producers are the client applications that write records to Kafka. A record consists of an optional key, a value (the payload), optional headers, and a timestamp.

Producers don't write one record at a time — they accumulate records in an in-memory buffer and send them in batches. Batching is configurable: flush when the batch reaches a certain size, or after a certain time has elapsed (whichever comes first). Larger batches mean higher throughput and better compression ratios. Smaller batches mean lower latency. Tuning this trade-off is one of the most common Kafka producer optimisations.

Producers can also compress batches before sending — LZ4, Snappy, Gzip, and Zstandard are all supported. Compression happens on the producer, is stored compressed on the broker, and is decompressed by the consumer. This reduces both network bandwidth and disk usage significantly for text-based payloads like JSON.
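The size-or-time flush rule corresponds to the producer's batch.size and linger.ms settings. Here it is as a simple accumulator — an illustrative model, not the client's real buffer, which also handles per-partition queues and in-flight requests:

```python
import time

class BatchBuffer:
    """Flush when the batch reaches max_bytes OR linger_ms elapses,
    whichever comes first (modelling batch.size / linger.ms)."""

    def __init__(self, max_bytes=16_384, linger_ms=100):
        self.max_bytes, self.linger_ms = max_bytes, linger_ms
        self.records, self.size, self.opened = [], 0, None

    def add(self, record: bytes):
        """Buffer a record; returns the flushed batch if a threshold hit."""
        if self.opened is None:
            self.opened = time.monotonic()
        self.records.append(record)
        self.size += len(record)
        age_ms = (time.monotonic() - self.opened) * 1000
        if self.size >= self.max_bytes or age_ms >= self.linger_ms:
            return self.flush()
        return None

    def flush(self):
        batch, self.records, self.size, self.opened = self.records, [], 0, None
        return batch


buf = BatchBuffer(max_bytes=10, linger_ms=1000)
assert buf.add(b"abc") is None          # 3 bytes: under both thresholds
batch = buf.add(b"defghij")             # 10 bytes total: size threshold hit
assert batch == [b"abc", b"defghij"]
```

Raising max_bytes/linger_ms trades latency for throughput, exactly as described above.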


Consumers and consumer groups

A consumer reads records from one or more topic partitions. Consumers pull from brokers — they ask the broker for records at a given offset, rather than the broker pushing to them. This gives consumers full control over their own pace and lets them replay data by simply resetting their offset.

A consumer group is a set of consumers that collectively consume a topic. Kafka assigns each partition to exactly one consumer within the group. This is the parallelism model: if you have 6 partitions and 3 consumers in a group, each consumer handles 2 partitions. Add a 4th consumer and partitions are rebalanced. Add a 7th consumer and one sits idle — you can't have more active consumers than partitions.
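The invariant — each partition owned by exactly one consumer in the group, surplus consumers idle — can be shown with a naive round-robin assignment. Kafka's real assignors (range, round-robin, cooperative-sticky) are more sophisticated, but the invariant is the same:

```python
def assign(partitions: int, consumers: list[str]) -> dict:
    """Give each partition to exactly one consumer in the group."""
    assignment = {c: [] for c in consumers}
    for p in range(partitions):
        assignment[consumers[p % len(consumers)]].append(p)
    return assignment


# 6 partitions, 3 consumers: two partitions each.
assert assign(6, ["c1", "c2", "c3"]) == {
    "c1": [0, 3], "c2": [1, 4], "c3": [2, 5],
}

# 6 partitions, 7 consumers: the extra consumer gets nothing.
a = assign(6, [f"c{i}" for i in range(7)])
assert a["c6"] == []   # more consumers than partitions -> one sits idle
```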

Multiple consumer groups can independently consume the same topic. Each group maintains its own offset, completely independently. A group of analytics consumers can be at offset 50,000 while a group of fraud-detection consumers is at offset 49,800 — they don't interfere with each other.

Consumers commit their offsets back to Kafka (stored in an internal topic called __consumer_offsets). This is what enables fault tolerance: if a consumer crashes, it restarts and reads its last committed offset to pick up from where it left off.
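Crash recovery via committed offsets, sketched below — a plain dict stands in for the __consumer_offsets topic, and the "crash" is simulated by simply calling the consume loop again:

```python
committed = {}   # stands in for the __consumer_offsets internal topic
handled = []

def handle(record):
    handled.append(record)           # the application's processing step

def consume(group: str, records: list, start: int):
    """Process from `start`, committing the next offset after each record."""
    for offset in range(start, len(records)):
        handle(records[offset])
        committed[group] = offset + 1   # next offset to read on restart


records = ["r0", "r1", "r2", "r3", "r4"]
consume("analytics", records, committed.get("analytics", 0))
assert committed["analytics"] == 5

# Simulate a restart after a crash: the consumer resumes from the
# committed offset instead of reprocessing everything.
records.append("r5")
consume("analytics", records, committed.get("analytics", 0))
assert handled == ["r0", "r1", "r2", "r3", "r4", "r5"]
```

Committing *after* processing, as here, gives at-least-once semantics: a crash between handle and commit means the record is redelivered.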

[Diagram: two consumer groups reading the same three partitions independently, at their own pace — group A (analytics) splits them across two consumers, group B's (fraud detection) single consumer reads all three.]

The control plane: KRaft

Every Kafka cluster needs a control plane: something to track which brokers exist, which partitions are assigned where, which replica is the leader for each partition, and what the configuration of each topic is. For most of Kafka's history, this was handled by Apache ZooKeeper — a separate distributed coordination service that ran alongside Kafka.

ZooKeeper was a dependency that added operational complexity: you had to deploy, monitor, and scale a separate system just to run Kafka. As clusters grew to thousands of brokers and millions of partitions, ZooKeeper became a scalability bottleneck.

Kafka 4.0 (released 2025) removed ZooKeeper entirely. The replacement is KRaft (Kafka Raft) — a consensus protocol built directly into Kafka itself. A small set of brokers act as controllers, using the Raft algorithm to elect a leader among themselves and replicate a metadata log. Regular brokers subscribe to this controller quorum for cluster state updates.

All cluster metadata — topics, partitions, ISR sets, configurations, ACLs — is stored in a single special internal partition called __cluster_metadata, replicated among the controller nodes. This is metadata-as-a-Kafka-topic, which means Kafka's existing replication machinery handles it, and there's no external system to manage.

The benefits are significant: faster failover, simpler deployment, better scalability, and one less system in your infrastructure stack.

[Diagram: before Kafka 4.0, a separate ZooKeeper cluster had to be managed alongside the brokers; with KRaft, a Raft-based controller quorum runs inside Kafka itself, next to the regular data-plane brokers.]

Kafka's APIs

Kafka exposes four main APIs that together cover the full range of streaming use cases:

Producer API

Allows applications to publish records to topics. Covered above. Used by any system that is a source of events.

Consumer API

Allows applications to subscribe to topics and process the stream of records. Covered above. Used by any system that reacts to events.

Kafka Streams API

A client-side library for building stream processing applications — programs that consume from one or more topics, transform or aggregate the data, and produce results to output topics. Kafka Streams runs inside your application process (no separate cluster needed) and handles state management, windowing, joins, and exactly-once processing semantics. Common uses: real-time aggregations, enrichment, filtering, sessionization.
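To give a flavour of the "count orders per region" style of aggregation, here it is collapsed to plain Python. The real Kafka Streams API is a Java DSL (roughly a `groupBy(...).count()` over an unbounded stream, backed by state stores); this batch version only illustrates the shape of the computation:

```python
from collections import Counter

def count_by_region(order_events):
    """One pass of a continuous 'count orders per region' aggregation."""
    return dict(Counter(event["region"] for event in order_events))


events = [
    {"order_id": 1, "region": "eu"},
    {"order_id": 2, "region": "us"},
    {"order_id": 3, "region": "eu"},
]
assert count_by_region(events) == {"eu": 2, "us": 1}
```

In a real Streams topology the counts would update continuously as events arrive and be emitted to an output topic, rather than computed over a finished list.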

Kafka Connect API

Kafka Connect is a framework for building and running connectors — reusable integrations between Kafka and external systems. A source connector pulls data from a database, S3 bucket, or SaaS API into Kafka. A sink connector pushes data from Kafka into Elasticsearch, a data warehouse, or another database. The Confluent Hub has hundreds of ready-made connectors. Connect runs as its own cluster of workers, scales horizontally, and handles offset management and fault tolerance automatically.


Retention and compaction

Kafka doesn't delete records when they're consumed. Instead, each topic configures how old data is cleaned up:

  • Time-based — delete records older than N hours/days. Default is 7 days.
  • Size-based — delete the oldest records when the partition exceeds a configured size.
  • Log compaction — instead of time/size deletion, keep only the latest record for each key. Older records with the same key are garbage collected. Used for topics that model current state rather than a history of events — e.g. a user profile topic where only the latest profile matters.
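Compaction reduces each key's history to its latest value. Modelled as a single pass over a list of (key, value) records — the broker does this incrementally in the background, but the end state is the same:

```python
def compact(log):
    """Keep only the most recent record per key, preserving the
    relative order of the surviving records."""
    latest = {}                          # key -> offset of its newest record
    for offset, (key, _value) in enumerate(log):
        latest[key] = offset
    return [log[o] for o in sorted(latest.values())]


profile_updates = [
    ("user-1", "name=Ann"),
    ("user-2", "name=Bo"),
    ("user-1", "name=Ann Lee"),          # supersedes the first user-1 record
]
assert compact(profile_updates) == [
    ("user-2", "name=Bo"),
    ("user-1", "name=Ann Lee"),
]
```

This is why a compacted topic behaves like a changelog of current state: replaying it from the start rebuilds exactly one value per key.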

Retention is what makes Kafka suitable as a data source for new consumers to bootstrap from. A new analytics service can start from the beginning of a 30-day retention window and catch up, rather than needing a separate database snapshot.


Delivery guarantees

Kafka supports three levels of delivery guarantee, configurable at the producer and consumer level:

  • At most once — records may be lost, never duplicated. Highest throughput. Used where occasional loss is acceptable (metrics, click events).
  • At least once — records are never lost, but may be delivered more than once on failure. Default behaviour. Consumers must handle duplicates (idempotent processing).
  • Exactly once — records are delivered precisely once, end-to-end. Enabled via Kafka's transactions API and idempotent producer configuration. Highest correctness guarantee, with some throughput overhead. Essential for financial transactions, inventory updates, and anything where double-processing causes real-world harm.
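Under the default at-least-once behaviour, the consumer must tolerate duplicates. The standard fix is idempotent processing — for example, deduplicating on a record ID. A sketch (in production the seen-set would live in a durable store, not process memory):

```python
seen = set()   # in production: a durable store, not process memory

def apply_payment(record):
    """Process each payment at most once, even if delivered twice."""
    if record["id"] in seen:
        return "skipped-duplicate"
    seen.add(record["id"])
    return f"charged {record['amount']}"


assert apply_payment({"id": "tx-1", "amount": 40}) == "charged 40"
# Redelivery after a consumer failure: the same record arrives again.
assert apply_payment({"id": "tx-1", "amount": 40}) == "skipped-duplicate"
```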

The full architecture

Here's everything together — producers, brokers, partitions, replication, consumer groups, KRaft controllers, Connect, and Streams:

[Diagram: the full architecture — producers and Kafka Connect source connectors feed a three-broker cluster with replicated partitions (one leader ★ and two followers each); a KRaft controller quorum replicates the __cluster_metadata topic via Raft; consumer groups, Kafka Streams applications, and Connect sink connectors read out; retention, offset storage in __consumer_offsets, and the three delivery guarantees annotated along the bottom.]

Common use cases

Event-driven microservices — services publish events to Kafka topics rather than calling each other directly. This decouples producers from consumers: the order service publishes an order.placed event; the inventory service, the notification service, and the fraud detection service all consume it independently. Adding a new consumer requires no change to the producer.

Real-time data pipelines — Kafka sits between data sources (databases, applications, IoT devices) and data sinks (data warehouses, search indexes, analytics systems). Kafka Connect handles the ingestion and egress; Kafka provides the durable, replayable buffer in the middle.

Stream processing — Kafka Streams and ksqlDB allow you to run continuous queries over streams of data: "count orders per region in the last 5 minutes," "join user events with profile data," "detect anomalies in payment patterns." Results are written back to output topics.

Activity tracking — LinkedIn built Kafka originally to track user activity (page views, clicks, searches) at scale. Every action generates an event; downstream systems aggregate, personalise, and analyse the stream.

Log aggregation — consolidate logs from dozens of services into a central Kafka topic. Consumers ship logs to Elasticsearch, Splunk, or a data lake. The Kafka buffer smooths out spikes and prevents log loss during downstream outages.


Summary

Kafka's architecture is built around three ideas: the append-only log as the fundamental data structure, partitions as the unit of parallelism and distribution, and consumer groups as the unit of independent consumption. Brokers store and serve data; the KRaft controller quorum manages cluster metadata using Raft consensus (replacing ZooKeeper entirely as of Kafka 4.0). Replication across brokers with ISR tracking provides fault tolerance without sacrificing throughput.

What makes Kafka different from a traditional message queue is durability and replayability. Data isn't deleted when consumed. Multiple independent consumers can read the same stream at their own pace. This turns Kafka from a transient pipe into a persistent, replayable record of everything that happened in your system — which is an extremely powerful foundation for building distributed applications.




