Port 9411

Port 9411 is the default port for Zipkin, the distributed tracing system that Twitter created to answer a question that haunts every engineer who has ever debugged a microservices architecture: "Where did the time go?"

When a user clicks a button and nothing happens for three seconds, where did those three seconds disappear? In a monolith, you attach a profiler. In a distributed system with dozens of services, each calling others, each with its own logs, its own latency, its own potential for failure: you need something else. You need a way to follow a single request as it hops from service to service, accumulating milliseconds like a snowball rolling downhill.

That something else is distributed tracing. And Zipkin, listening on port 9411, is where the traces come home.

What Zipkin Does

Zipkin collects timing data from distributed systems. When a request enters your system, Zipkin assigns it a trace ID, a 64- or 128-bit identifier that will follow that request everywhere it goes. Each operation within that request, each service call, each database query, becomes a span with its own ID, linked to its parent span, all sharing the same trace ID.

The result is a tree. The root span is your entry point, the user's request hitting your API gateway. The branches are every downstream call: the authentication service, the user database, the recommendation engine, the payment processor. Each node in the tree knows when it started, when it finished, and who its parent was.

When something goes wrong, when that button click takes three seconds instead of three hundred milliseconds, you can pull up the trace and see exactly where the time went. Service A took 50ms. Service B took 100ms. Service C called the database and waited 2.8 seconds. There's your problem.
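
Laid out as a waterfall, the trace from that example might look roughly like this (a sketch with invented names and widths, not actual Zipkin UI output):

api-gateway   |===========================|  ~2,950ms total
  service-a     |=|                              50ms
  service-b        |==|                         100ms
  service-c           |====================|  2,800ms
    database query      |==================|  2,800ms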

How It Works

Zipkin operates on a simple principle: every service in your system instruments its outgoing and incoming requests, recording four key moments:

  • Client Send (cs): The client makes the request
  • Server Receive (sr): The server receives the request
  • Server Send (ss): The server sends the response
  • Client Receive (cr): The client receives the response

The difference between cs and sr tells you network latency plus clock skew. The difference between sr and ss tells you how long the server actually spent processing. These annotations, timestamped and tagged with their trace and span IDs, flow to Zipkin over port 9411.1
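
To make that concrete with invented numbers: if cs lands at 0ms, sr at 4ms, ss at 104ms, and cr at 110ms, the server spent 100ms doing real work (ss minus sr), and the remaining ~10ms of the client's 110ms total (cr minus cs) went to the network, plus or minus whatever skew separates the two machines' clocks.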

Zipkin accepts traces via HTTP POST to /api/v2/spans in JSON or Protocol Buffers format. It can also accept traces via gRPC or consume them from Kafka topics. The flexibility matters because in a system with thousands of services generating millions of spans per second, you need options for how that data flows.2
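
As a rough sketch of what that ingestion looks like in practice, here is a single span reported over HTTP with curl, assuming a local Zipkin and entirely made-up IDs and timings:

# POST one span to a local Zipkin; timestamp and duration are in microseconds
curl -X POST http://localhost:9411/api/v2/spans \
  -H 'Content-Type: application/json' \
  -d '[{
    "traceId": "463ac35c9f6413ad48485a3953bb6124",
    "id": "a2fb4a1d1a96d312",
    "name": "get /checkout",
    "timestamp": 1700000000000000,
    "duration": 150000,
    "kind": "SERVER",
    "localEndpoint": {"serviceName": "checkout-service"}
  }]'

If Zipkin accepts the payload, it responds with 202 Accepted, and the span shows up in the UI a moment later.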

The Origin Story

In April 2010, Google published a paper called "Dapper, a Large-Scale Distributed Systems Tracing Infrastructure."3 The paper described how Google had solved the distributed tracing problem: by instrumenting their RPC framework (Stubby, a precursor to gRPC) to automatically propagate trace context, they achieved something remarkable. Application developers didn't have to write tracing code. It just happened, everywhere, all the time.

At Twitter, engineers Johan Oskarsson and Franklin Hu read that paper and recognized their own pain. Twitter's architecture was rapidly decomposing into microservices, and debugging latency issues had become a nightmare of log correlation and guesswork.4

During Twitter's first Hack Week in 2010, they built a prototype. They implemented the Dapper paper's ideas on top of Finagle, Twitter's RPC framework. They called it Big Brother Bird, because Twitter named everything after birds, and this bird would be watching everything.5

Big Brother Bird became Zipkin. The B3 in those HTTP headers, X-B3-TraceId and X-B3-SpanId, stands for Big Brother Bird. The name stuck even as the project evolved.6

In June 2012, Twitter open-sourced Zipkin under the Apache 2.0 license, making it the first production-grade distributed tracing system available to the world.7 The original Scala codebase was later rewritten in Java using Spring Boot in 2016, but the core ideas, the data model, and crucially, the B3 propagation format, remained stable.

The Data Model

A Zipkin span contains:8

  • traceId: 64- or 128-bit identifier shared by all spans in a trace
  • id: 64-bit identifier unique to this span
  • parentId: The id of the parent span (null for root spans)
  • name: A human-readable name for the operation
  • timestamp: When the span started (microseconds since epoch)
  • duration: How long the span lasted (microseconds)
  • annotations: Timestamped events within the span (cs, sr, ss, cr)
  • tags: Key-value pairs for additional context (http.path, db.statement, error)
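
Assembled into the v2 JSON the API actually accepts, a single span might look like this (values fabricated; note that in the v2 format, the cs/sr/ss/cr annotations are usually implied by a kind field of CLIENT or SERVER rather than spelled out):

{
  "traceId": "463ac35c9f6413ad48485a3953bb6124",
  "id": "a2fb4a1d1a96d312",
  "parentId": "0020000000000001",
  "name": "get /api/users/123",
  "timestamp": 1700000000000000,
  "duration": 2800000,
  "kind": "CLIENT",
  "localEndpoint": {"serviceName": "service-c"},
  "tags": {"http.path": "/api/users/123"}
}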

This model has proven remarkably durable. OpenTelemetry, the CNCF project that aims to standardize observability, still supports B3 propagation and can export to Zipkin-compatible backends.

B3 Propagation

When service A calls service B, how does service B know it's part of the same trace? Through HTTP headers:9

X-B3-TraceId: 463ac35c9f6413ad48485a3953bb6124
X-B3-SpanId: a2fb4a1d1a96d312
X-B3-ParentSpanId: 0020000000000001
X-B3-Sampled: 1

Or, more compactly, in a single header:

b3: 463ac35c9f6413ad48485a3953bb6124-a2fb4a1d1a96d312-1-0020000000000001

This propagation format became so widely adopted that even systems that don't use Zipkin often support B3 headers for interoperability.
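
To see what propagation looks like from the caller's side, here is a hand-rolled downstream request with an invented service name, reusing the IDs from the example above: the trace ID is copied through unchanged, a fresh span ID is minted for this hop, and the caller's own span ID becomes the parent:

curl http://inventory-service/api/stock/42 \
  -H 'X-B3-TraceId: 463ac35c9f6413ad48485a3953bb6124' \
  -H 'X-B3-SpanId: 105445aa7843bc8b' \
  -H 'X-B3-ParentSpanId: a2fb4a1d1a96d312' \
  -H 'X-B3-Sampled: 1'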

Security Considerations

Zipkin itself is an internal service. Port 9411 should never be exposed to the public Internet. The data flowing through it contains detailed information about your system's internal architecture, timing characteristics, and potentially sensitive metadata in span tags.

Historically, Zipkin has been affected by vulnerabilities in its dependencies:

  • CVE-2021-44228 (Log4Shell): The infamous Log4j vulnerability affected Zipkin deployments that included the vulnerable logging library.10
  • XSS vulnerabilities: The Zipkin UI has had cross-site scripting issues where malicious data in span tags or annotations could execute in the browser of someone viewing traces.11

Keep Zipkin updated. Don't expose it to untrusted networks. Treat your traces like you treat your logs: as sensitive operational data.
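
If you run the stock Docker image (shown below under "Running Zipkin"), one cheap guardrail is to bind the listener to loopback so nothing off-box can reach it:

# publish 9411 on 127.0.0.1 only, instead of on all interfaces
docker run -d -p 127.0.0.1:9411:9411 openzipkin/zipkin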

The Ecosystem

Zipkin was the first, but it wasn't the last. Uber created Jaeger in 2015, contributing it to the CNCF in 2017. Jaeger was designed for cloud-native environments from the start, with better Kubernetes integration and adaptive sampling.12

OpenTelemetry emerged as an effort to standardize instrumentation across all observability backends. As of 2024, Jaeger 2.0 has OpenTelemetry at its core, and Zipkin remains a supported export target.13

Netflix uses Zipkin to trace requests across its streaming infrastructure. When you press play and the video starts, Zipkin spans record which services handled the authentication, the content lookup, the CDN selection, and the DRM validation.14

Related ports worth knowing in the tracing ecosystem:

  • Port 14268: Jaeger collector HTTP endpoint
  • Port 16686: Jaeger query UI
  • Port 4317: OpenTelemetry collector gRPC
  • Port 4318: OpenTelemetry collector HTTP

Running Zipkin

The simplest way to run Zipkin:

docker run -d -p 9411:9411 openzipkin/zipkin

Then open http://localhost:9411 in your browser. The UI shows recent traces, lets you search by service name or trace ID, and visualizes the timing of each span in a waterfall diagram.
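
You can also sanity-check the API from the command line; for example, this endpoint lists the service names Zipkin has seen, returning an empty array until something reports spans:

# list known service names via the query API
curl http://localhost:9411/api/v2/services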
