Architecture

Architecture Overview

Tracers live in your applications and record timing and metadata about operations that took place. They often instrument libraries, so that their use is transparent to users. For example, an instrumented web server records when it received a request and when it sent a response. The trace data collected is called a Span.

Instrumentation is written to be safe in production and have little overhead. For this reason, it only propagates IDs in-band, to tell the receiver there’s a trace in progress. Completed spans are reported to Zipkin out-of-band, similar to how applications report metrics asynchronously.

For example, when an operation is being traced and it needs to make an outgoing HTTP request, a few headers are added to propagate IDs. Headers are not used to send details such as the operation name.
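As a minimal sketch of in-band propagation, the snippet below copies only the trace identifiers into the outgoing headers. It assumes the X-B3-TraceId and X-B3-SpanId header names that appear in the example flow later in this section; the helper name inject_trace_headers is hypothetical and not part of any tracer library.

```python
# Hypothetical helper: copies only trace identifiers into outgoing headers.
def inject_trace_headers(headers: dict, trace_id: str, span_id: str) -> dict:
    """Return a copy of `headers` with trace IDs added for the downstream service."""
    propagated = dict(headers)
    propagated["X-B3-TraceId"] = trace_id
    propagated["X-B3-SpanId"] = span_id
    # Details such as the operation name or tags are deliberately not added here;
    # they are reported to Zipkin out-of-band.
    return propagated


outgoing = inject_trace_headers({"Accept": "application/json"}, trace_id="aa", span_id="6b")
```

The receiver only learns that a trace is in progress and which IDs to continue; everything else about the operation stays in the span.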

The component in an instrumented app that sends data to Zipkin is called a Reporter. Reporters send trace data via one of several transports to Zipkin collectors, which persist trace data to storage. Later, storage is queried by the API to provide data to the UI.

Here’s a diagram describing this flow:

[Figure: Zipkin architecture]

To see if a tracer or instrumentation library already exists for your platform, see our list.

Example flow

As mentioned in the overview, identifiers are sent in-band and details are sent out-of-band to Zipkin. In both cases, trace instrumentation is responsible for creating valid traces and rendering them properly. For example, a tracer ensures parity between the data it sends in-band (downstream) and out-of-band (async to Zipkin).

Here’s an example sequence of HTTP tracing where user code calls the resource /foo. This results in a single span, sent asynchronously to Zipkin after user code receives the HTTP response.

┌─────────────┐ ┌───────────────────────┐  ┌─────────────┐  ┌──────────────────┐
│ User Code   │ │ Trace Instrumentation │  │ Http Client │  │ Zipkin Collector │
└─────────────┘ └───────────────────────┘  └─────────────┘  └──────────────────┘
       │                 │                         │                 │
           ┌─────────┐
       │ ──┤GET /foo ├─▶ │ ────┐                   │                 │
           └─────────┘         │ record tags
       │                 │ ◀───┘                   │                 │
                           ────┐
       │                 │     │ add trace headers │                 │
                           ◀───┘
       │                 │ ────┐                   │                 │
                               │ record timestamp
       │                 │ ◀───┘                   │                 │
                             ┌─────────────────┐
       │                 │ ──┤GET /foo         ├─▶ │                 │
                             │X-B3-TraceId: aa │     ────┐
       │                 │   │X-B3-SpanId: 6b  │   │     │           │
                             └─────────────────┘         │ invoke
       │                 │                         │     │ request   │
                                                         │
       │                 │                         │     │           │
                                 ┌────────┐          ◀───┘
       │                 │ ◀─────┤200 OK  ├─────── │                 │
                           ────┐ └────────┘
       │                 │     │ record duration   │                 │
            ┌────────┐     ◀───┘
       │ ◀──┤200 OK  ├── │                         │                 │
            └────────┘       ┌────────────────────────────────┐
       │                 │ ──┤ asynchronously report span     ├────▶ │
                             │                                │
                             │{                               │
                             │  "traceId": "aa",              │
                             │  "id": "6b",                   │
                             │  "name": "get",                │
                             │  "timestamp": 1483945573944000,│
                             │  "duration": 386000,           │
                             │  "annotations": [              │
                             │--snip--                        │
                             └────────────────────────────────┘

Trace instrumentation reports spans asynchronously to prevent delays or failures relating to the tracing system from delaying or breaking user code.
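The sequence above can be sketched roughly as follows. This is an illustration under stated assumptions, not how any particular tracer is implemented: traced_get, report_loop, pending_spans and the reported list (which stands in for the Zipkin collector) are hypothetical names, and the IDs are the abbreviated ones from the diagram.

```python
import queue
import threading
import time

# Hypothetical in-process buffer between instrumentation and the reporter thread.
pending_spans = queue.Queue(maxsize=1000)
reported = []  # stands in for the Zipkin collector in this sketch


def traced_get(path, send_request):
    """Trace one outgoing GET: tag, add headers, time the call, queue the span."""
    span = {"traceId": "aa", "id": "6b", "name": "get", "tags": {"http.path": path}}
    headers = {"X-B3-TraceId": span["traceId"], "X-B3-SpanId": span["id"]}
    span["timestamp"] = int(time.time() * 1_000_000)  # epoch microseconds
    started = time.monotonic()
    try:
        return send_request(path, headers)
    finally:
        # Record at least 1 microsecond so the duration is never zero.
        span["duration"] = max(1, int((time.monotonic() - started) * 1_000_000))
        try:
            pending_spans.put_nowait(span)  # never block user code
        except queue.Full:
            pass  # drop the span rather than delay the caller


def report_loop():
    """Background reporter: drains the queue and hands spans to the transport."""
    while True:
        span = pending_spans.get()
        try:
            reported.append(span)  # a real reporter would send to a collector
        except Exception:
            pass  # tracing failures must not reach user code
        finally:
            pending_spans.task_done()


threading.Thread(target=report_loop, daemon=True).start()
response = traced_get("/foo", lambda path, headers: "200 OK")
pending_spans.join()  # only so the example can inspect the result
```

User code gets its response as soon as the request returns; the span is handed to the background thread, and a full queue or a reporting error costs a span rather than a request.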

Transport

Spans sent by the instrumented library must be transported from the services being traced to Zipkin collectors. There are three primary transports: HTTP, Kafka and Scribe.
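For the HTTP transport, a reporter might build a request along these lines. The sketch assumes a collector listening on localhost port 9411 and the /api/v2/spans endpoint, which accepts a JSON list of spans; the address and the helper name build_span_request are assumptions for illustration, and the span IDs are abbreviated as in the example flow, so a real collector would not accept them as written.

```python
import json
import urllib.request

# Assumed collector address; adjust host and port for your setup.
ZIPKIN_SPANS_URL = "http://localhost:9411/api/v2/spans"


def build_span_request(spans, url=ZIPKIN_SPANS_URL):
    """Encode a batch of spans as a JSON list and wrap it in an HTTP POST."""
    body = json.dumps(spans).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


post = build_span_request([{"traceId": "aa", "id": "6b", "name": "get"}])
# Sending is left out so the sketch runs without a collector:
# urllib.request.urlopen(post, timeout=1)
```

Batching several spans into one request keeps the overhead per span low, which matters because reporting happens alongside production traffic.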

Components

Zipkin is made up of four components:

  • collector
  • storage
  • search
  • web UI

Zipkin Collector

Once the trace data arrives at the Zipkin collector daemon, the collector validates, stores, and indexes it for lookups.
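To make the three steps concrete, here is a toy in-memory sketch. The validation rule (a span needs a traceId and an id) and the index on span name are assumptions chosen for brevity; the real collector's checks and indexes are more involved.

```python
from collections import defaultdict

REQUIRED_FIELDS = ("traceId", "id")  # assumed minimum for this sketch

spans_by_trace = defaultdict(list)    # storage: trace ID -> spans
trace_ids_by_name = defaultdict(set)  # index: span name -> trace IDs


def collect(span):
    """Validate one span, store it, and index it for lookup. Returns True if accepted."""
    if any(not span.get(field) for field in REQUIRED_FIELDS):
        return False  # invalid spans are dropped, not stored
    spans_by_trace[span["traceId"]].append(span)
    if span.get("name"):
        trace_ids_by_name[span["name"]].add(span["traceId"])
    return True


accepted = collect({"traceId": "aa", "id": "6b", "name": "get"})
rejected = collect({"name": "get"})  # missing IDs, so it is not stored
```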

Storage

Zipkin was initially built to store data on Cassandra since Cassandra is scalable, has a flexible schema, and is heavily used within Twitter. However, we made this component pluggable. In addition to Cassandra, we natively support Elasticsearch and MySQL. Other back-ends might be offered as third-party extensions.
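One way to picture a pluggable storage component is an interface with interchangeable implementations. The sketch below is hypothetical: SpanStorage, InMemoryStorage and make_storage are illustrative names, not Zipkin's actual storage API, and only an in-memory back-end is shown.

```python
from abc import ABC, abstractmethod


class SpanStorage(ABC):
    """Hypothetical storage interface; each back-end would implement the same calls."""

    @abstractmethod
    def store(self, spans): ...

    @abstractmethod
    def get_trace(self, trace_id): ...


class InMemoryStorage(SpanStorage):
    def __init__(self):
        self._traces = {}

    def store(self, spans):
        for span in spans:
            self._traces.setdefault(span["traceId"], []).append(span)

    def get_trace(self, trace_id):
        return list(self._traces.get(trace_id, []))


def make_storage(kind):
    """Pick a back-end by name; only the in-memory one exists in this sketch."""
    backends = {"mem": InMemoryStorage}
    return backends[kind]()


storage = make_storage("mem")
storage.store([{"traceId": "aa", "id": "6b"}])
```

Because the collector and the query service only depend on the interface, swapping the back-end does not change the rest of the system.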

Zipkin Query Service

Once the data is stored and indexed, we need a way to extract it. The query daemon provides a simple JSON API for finding and retrieving traces. The primary consumer of this API is the Web UI.
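The two operations, finding traces and retrieving one trace, can be illustrated with a small sketch over hypothetical stored data. The function names and the STORED dictionary are assumptions for illustration; the service name is read from each span's localEndpoint.serviceName field, and results are returned as JSON text as an HTTP API would return them.

```python
import json

# Hypothetical stored data: trace ID -> spans, as a collector might have written it.
STORED = {
    "aa": [{"traceId": "aa", "id": "6b", "name": "get",
            "localEndpoint": {"serviceName": "frontend"}}],
}


def get_trace(trace_id):
    """Return one trace as a JSON list of spans, or None if the ID is unknown."""
    spans = STORED.get(trace_id)
    return None if spans is None else json.dumps(spans)


def find_traces(service_name):
    """Return, as JSON, every trace containing a span from `service_name`."""
    matches = [
        spans for spans in STORED.values()
        if any(s.get("localEndpoint", {}).get("serviceName") == service_name
               for s in spans)
    ]
    return json.dumps(matches)


frontend_traces = find_traces("frontend")
```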

Web UI

We created a GUI that presents a nice interface for viewing traces. The web UI provides a method for viewing traces based on service, time, and annotations. Note: there is no built-in authentication in the UI!