Port 9092 carries Apache Kafka traffic. Every time Netflix knows what you want to watch next, every time Uber finds you a driver, every time LinkedIn suggests a connection — those decisions began as events flowing through this port. Kafka processes trillions of events daily across tens of thousands of companies. It is the nervous system of the real-time Internet.
The Weight of What Flows Here
Apache Kafka is not a message queue. It is not a pub-sub system in the traditional sense. Kafka is a distributed, replicated, append-only commit log[1]. At its core, it does one thing: it writes events to disk, in order, and keeps them for as long as you configure it to, which can be forever. That simplicity is the source of its power.
When you click play on Netflix, that event enters Kafka. When you request an Uber, that event enters Kafka. When you scroll past a post on LinkedIn, that event enters Kafka. Netflix runs 36 Kafka clusters processing 700 billion events per day[2]. LinkedIn processes over 4 trillion events daily through more than 3,000 pipelines[3]. Uber streams millions of GPS updates per second through Kafka to match riders with drivers across the globe[4].
Port 9092 is Kafka's default listener port. It carries unencrypted traffic between Kafka clients and brokers — the plaintext channel where producers write and consumers read. For encrypted connections, port 9093 conventionally handles TLS traffic[5]. But 9092 remains the canonical address, the one that appears in every tutorial, every configuration file, every developer's muscle memory.
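What a connection to this port looks like in practice: below is a minimal sketch using Kafka's standard Java producer client. The broker hostname and topic name are illustrative placeholders, not from any real deployment.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class PlaintextProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Port 9092: the default PLAINTEXT listener. Hostname is illustrative.
        props.put("bootstrap.servers", "broker.example.com:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // "page-views" is a hypothetical topic; key and value are plain strings.
            producer.send(new ProducerRecord<>("page-views", "user-42", "clicked-play"));
        } // close() flushes any buffered records before returning
    }
}
```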
The Problem They Were Desperate to Solve
In 2010, LinkedIn had 90 million members and a data problem that threatened to swallow the company whole[6].
Jay Kreps, Neha Narkhede, and Jun Rao were engineers on LinkedIn's data infrastructure team. They needed to ingest massive amounts of event data from the website and feed it into both real-time processing systems and batch analytics platforms like Hadoop. The existing solutions — RabbitMQ, ActiveMQ, enterprise messaging systems — could not handle the scale[7].
"Everyone wanted to build fancy machine-learning algorithms," Kreps later explained, "but without the data, the algorithms were useless. Getting the data from source systems and reliably moving it around was very difficult."[8]
They spent about a year building the first version of Kafka. It was crucial for the initial version to handle exponential growth, which meant designing it as a distributed system from the outset[9]. In 2011, Kafka went into production at LinkedIn, immediately becoming the backbone of the company's real-time infrastructure. That same year, it was ingesting more than 1 billion events per day[10].
LinkedIn open-sourced Kafka in early 2011. It graduated from the Apache Incubator on October 23, 2012[11]. In November 2014, Kreps, Narkhede, and Rao left LinkedIn to found Confluent, the company that now provides enterprise Kafka services[12].
Why "Kafka"?
When the project needed a name for open-sourcing, Jay Kreps named it after the writer Franz Kafka[13].
"I thought that since Kafka was a system optimized for writing, using a writer's name would make sense," Kreps wrote on Quora. "I had taken a lot of lit classes in college and liked Franz Kafka. Plus the name sounded cool for an open source project."[14]
Franz Kafka (1883–1924) was a German-language writer whose works — The Metamorphosis, The Trial, The Castle — explored themes of alienation, bureaucracy, and transformation[15]. The term "Kafkaesque" describes situations where faceless systems overpower individuals in surreal, nightmarish ways[16].
There is something fitting about the name. Apache Kafka transforms chaos into order. It takes the overwhelming flood of events generated by modern systems and makes them manageable, replayable, analyzable. Franz Kafka wrote about a man who woke up transformed into an insect. Apache Kafka transforms raw events into insight, millions of times per second.
How Kafka Actually Works
Kafka's genius lies in its simplicity. It is built on the commit log — the same data structure that databases use to ensure durability[17].
The Append-Only Log
A Kafka topic is an ordered sequence of records (ordered within each partition, as the next section explains). Producers append messages to the end of the log. Consumers read from any position in the log, tracking their progress through an offset[18]. Messages are never modified after being written. There is no in-place mutation, no locking around updates, no index maintenance on every write[19].
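Because nothing is ever overwritten, a consumer can rewind and replay history simply by moving its offset. Here is a hedged sketch using the standard Java consumer; the broker address, group id, and topic name are illustrative.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ReplayConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker.example.com:9092"); // illustrative host
        props.put("group.id", "replay-demo");                      // hypothetical group
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            TopicPartition tp = new TopicPartition("page-views", 0);
            consumer.assign(List.of(tp));
            consumer.seek(tp, 0L); // rewind to offset 0: the log never forgets
            for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofSeconds(1))) {
                System.out.printf("offset=%d key=%s value=%s%n",
                        record.offset(), record.key(), record.value());
            }
        }
    }
}
```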
This design enables remarkable performance. Kafka writes sequentially to disk. The operating system batches writes efficiently. Rather than maintaining its own cache, Kafka leans on the operating system's page cache, letting the kernel do most of the heavy lifting. Reads are just as efficient because consumers also read sequentially[20].
Partitions and Parallelism
Each topic is divided into partitions — the fundamental unit of parallelism in Kafka[21]. A topic might have dozens or hundreds of partitions, each replicated across multiple brokers for fault tolerance. Producers choose which partition receives each message: keyless records are spread across partitions for load balancing, while keyed records are hashed to a consistent partition, preserving per-key ordering[22].
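In code, that decision reduces to whether you supply a key. A self-contained sketch; the topic, key, and broker address are hypothetical.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class KeyedSends {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker.example.com:9092"); // illustrative
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Records sharing a key hash to the same partition, preserving per-key order.
            for (String coords : new String[] {"40.74,-73.99", "40.75,-73.98"}) {
                producer.send(new ProducerRecord<>("gps-updates", "driver-17", coords),
                        (meta, err) -> {
                            if (err == null) System.out.println("partition " + meta.partition());
                        });
            }
            // A null key lets the producer spread records across partitions instead.
            producer.send(new ProducerRecord<>("gps-updates", null, "40.76,-73.97"));
        }
    }
}
```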
The Protocol
Kafka uses a binary protocol over TCP[23]. The client initiates a socket connection and writes a sequence of request messages, reading back corresponding responses. No handshake is required on connection or disconnection. The protocol is optimized for efficiency, using a "message set" abstraction that groups messages together to reduce network round-trip overhead[24].
The protocol defines how clients discover the cluster topology. Any broker can answer a metadata request describing the current state: what topics exist, which partitions they have, which broker leads each partition, and the host and port information for all brokers[25].
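You can watch this discovery step directly with the Java Admin client, which issues the same metadata request every producer and consumer sends on startup. A hedged sketch; the broker address is illustrative.

```java
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;

public class ClusterMetadata {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker.example.com:9092");

        try (Admin admin = Admin.create(props)) {
            // Prints every broker the cluster currently knows about.
            admin.describeCluster().nodes().get()
                 .forEach(node -> System.out.println(node.id() + " -> " + node.host() + ":" + node.port()));
        }
    }
}
```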
Zero-Copy Transfer
One of Kafka's key optimizations is the use of sendfile() system calls for zero-copy data transfers. Data moves directly from disk to network socket without passing through application memory. This can improve throughput by 2-3x compared to traditional approaches[26].
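Kafka's broker runs on the JVM, where sendfile() is exposed as FileChannel.transferTo. The sketch below shows the mechanism in isolation, not broker code; the file path and destination address are made up for illustration.

```java
import java.io.FileInputStream;
import java.net.InetSocketAddress;
import java.nio.channels.FileChannel;
import java.nio.channels.SocketChannel;

public class ZeroCopySend {
    public static void main(String[] args) throws Exception {
        try (FileChannel file = new FileInputStream("/tmp/segment.log").getChannel();
             SocketChannel socket = SocketChannel.open(new InetSocketAddress("sink.example.com", 4000))) {
            long position = 0, size = file.size();
            // transferTo delegates to sendfile() on Linux: bytes move from the
            // page cache to the socket without being copied into the JVM heap.
            while (position < size) {
                position += file.transferTo(position, size - position, socket);
            }
        }
    }
}
```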
The Architecture Evolution: From ZooKeeper to KRaft
For over a decade, Kafka depended on Apache ZooKeeper (port 2181) for metadata management — tracking which brokers were alive, which partitions existed, who led each partition[27]. This worked, but it meant operating two distributed systems instead of one.
The traditional ZooKeeper architecture had a ceiling of roughly 200,000 partitions per cluster[28]. As organizations scaled their Kafka deployments, this became a hard constraint.
Apache Kafka 4.0, released on March 18, 2025, completed the transition to KRaft (Kafka Raft) — a consensus protocol that handles metadata management within Kafka itself[29]. This eliminates the ZooKeeper dependency entirely. With KRaft, Kafka can potentially handle millions of partitions per cluster, simplifying deployment and reducing operational complexity[30].
Security Considerations
Kafka's security model has evolved significantly since its origins as internal LinkedIn infrastructure.
The Early Days: No Authentication
Early versions of Kafka had minimal security. Anyone who could reach port 9092 could produce or consume messages. This was acceptable for internal use at LinkedIn but became a serious concern as Kafka spread to production environments handling sensitive data.
Modern Security Features
Current Kafka deployments typically combine several mechanisms (a client-side configuration sketch follows this list):
- TLS encryption (port 9093) for data in transit
- SASL authentication (PLAIN, SCRAM, GSSAPI/Kerberos, OAUTHBEARER)
- ACLs for authorization at the topic, consumer group, and cluster level
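Wired into a Java client, those layers reduce to a handful of properties. A hedged sketch; every value here (host, mechanism choice, credentials, truststore path) is an illustrative placeholder.

```java
import java.util.Properties;

public class SecureClientConfig {
    static Properties secureProps() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker.example.com:9093"); // TLS listener
        props.put("security.protocol", "SASL_SSL");                // encrypt and authenticate
        props.put("sasl.mechanism", "SCRAM-SHA-512");              // one of the supported SASL mechanisms
        props.put("sasl.jaas.config",
            "org.apache.kafka.common.security.scram.ScramLoginModule required "
            + "username=\"svc-analytics\" password=\"change-me\";");
        props.put("ssl.truststore.location", "/etc/kafka/client.truststore.jks");
        props.put("ssl.truststore.password", "change-me");
        return props; // pass to KafkaProducer, KafkaConsumer, or Admin as usual
    }
}
```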
Notable Vulnerabilities
Kafka has had several significant CVEs over the years:
- CVE-2025-27817/CVE-2025-27818: Arbitrary file read and SSRF vulnerabilities allowing attackers to read environment variables or arbitrary disk contents, or to send requests to unexpected locations[31]
- CVE-2023-25194: Remote code execution through Kafka Connect's SASL JAAS configuration, enabling JNDI injection attacks[32]
- CVE-2022-34917: Denial of service through improper input validation, allowing remote attackers to allocate large amounts of memory on brokers[33]
Older versions (0.9.x through 1.0.0) had vulnerabilities allowing authenticated users to interfere with data replication or impersonate other users through crafted protocol messages[34].
Best Practices
- Never expose port 9092 to the public Internet
- Use TLS encryption (port 9093 or 9094) for all production traffic
- Enable authentication and authorization (a programmatic ACL sketch follows this list)
- Keep Kafka updated — the project maintains an active CVE disclosure process[35]
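The authorization half of that advice can be scripted. Below is a sketch that grants a hypothetical service account read access to a single topic via the Java Admin API; all names and addresses are placeholders, and the admin connection itself would need the security settings shown earlier.

```java
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.common.acl.AccessControlEntry;
import org.apache.kafka.common.acl.AclBinding;
import org.apache.kafka.common.acl.AclOperation;
import org.apache.kafka.common.acl.AclPermissionType;
import org.apache.kafka.common.resource.PatternType;
import org.apache.kafka.common.resource.ResourcePattern;
import org.apache.kafka.common.resource.ResourceType;

public class GrantRead {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker.example.com:9093");

        try (Admin admin = Admin.create(props)) {
            // Allow the (hypothetical) analytics principal to read one topic and nothing else.
            AclBinding binding = new AclBinding(
                new ResourcePattern(ResourceType.TOPIC, "page-views", PatternType.LITERAL),
                new AccessControlEntry("User:svc-analytics", "*",
                        AclOperation.READ, AclPermissionType.ALLOW));
            admin.createAcls(List.of(binding)).all().get();
        }
    }
}
```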
The Ecosystem
Kafka does not exist in isolation. A rich ecosystem has grown around it:
Kafka Connect: A framework for connecting Kafka to external systems — databases, search indexes, file systems, cloud services. Connectors handle the plumbing of moving data in and out of Kafka.
Kafka Streams: A client library for building stream processing applications. It enables transformations, aggregations, and joins on data flowing through Kafka topics.
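As a flavor of the API, here is a hedged sketch of a Streams topology that keeps a running play count per user. The application id, topic names, and broker address are all illustrative.

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class PlayCounts {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "play-counts");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker.example.com:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> plays = builder.stream("play-events"); // keyed by user id
        plays.groupByKey()
             .count()                     // running count per key, backed by a state store
             .toStream()
             .mapValues(Object::toString)
             .to("play-counts-by-user");  // results flow back out as another topic

        new KafkaStreams(builder.build(), props).start();
    }
}
```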
Schema Registry: Manages schemas for Kafka messages, ensuring producers and consumers agree on data formats. Typically uses Apache Avro, Protobuf, or JSON Schema.
ksqlDB: A streaming SQL engine that lets you query Kafka topics using familiar SQL syntax.
Related Ports
| Port | Service | Relationship |
|---|---|---|
| 9093 | Kafka TLS | Encrypted client connections |
| 9094 | Kafka SASL | Authenticated connections |
| 2181 | ZooKeeper | Legacy metadata management (pre-KRaft) |
| 8081 | Schema Registry | Schema management for Kafka messages |
| 8083 | Kafka Connect | Connector framework REST API |
Who Uses This Port
More than 80% of Fortune 100 companies run Apache Kafka[36]. Nearly 50,000 companies use it globally, representing a 26.7% market share in the enterprise application integration category[37].
The list reads like a who's who of technology:
- Netflix: 36 clusters, 700 billion events daily
- LinkedIn: 4 trillion events daily, the birthplace of Kafka
- Uber: Real-time matching of riders and drivers across 200+ cities
- Airbnb: Event-driven architecture powering personalized recommendations
- Salesforce: The central nervous system of their microservices architecture
- Agoda: Trillions of events streaming daily across multiple data centers[38]
The companies using Kafka span every industry: 27% are in Information Technology, 15% in Computer Software, 5% in Financial Services[39]. 46% are medium-sized companies, 22% are large enterprises, and 33% are small businesses[40].
The Genuine Strangeness
Consider what port 9092 carries.
Every recommendation Netflix makes, every surge price Uber calculates, every "people you may know" suggestion on LinkedIn — these begin as events flowing through Kafka. The platform does not process data. It processes decisions. Billions of them. Every day.
Kafka's design is almost absurdly simple. Append to a log. Read from the log. That is the entirety of the mechanism. Yet this simplicity enables Netflix to detect streaming issues before users notice them, enables Uber to match millions of riders with drivers in real time, enables LinkedIn to process the professional graph of nearly a billion people[41].
The system named after an author who wrote about bureaucratic nightmares has become the most efficient bureaucracy ever built — one where every event is recorded, ordered, and made available for judgment by the algorithms that increasingly shape human experience.