Vishal Gupta

Setting Up Distributed Tracing with OpenTelemetry, Tempo, and Grafana

In modern applications, distributed tracing provides deep visibility into how requests flow through various services in a microservices architecture. By using OpenTelemetry, Tempo, and Grafana, you can collect, store, and visualize trace data to diagnose performance issues, troubleshoot errors, and understand application behavior.

In this guide, we will walk through how to set up distributed tracing using OpenTelemetry, store traces in Grafana Tempo, store metrics in Prometheus, and visualize both in Grafana. We will use the k6-tracing app to generate traces.

Prerequisites

Before we start, make sure you have the following:

  • Docker (with Docker Compose): For running Tempo, the OpenTelemetry Collector, k6, Prometheus, and Grafana.

You will also need a basic understanding of Docker and how to run containers, and some familiarity with observability basics (Observability 101).

Why Use Grafana Tempo for Tracing?

Grafana Tempo offers a cost-effective, highly scalable solution for storing and querying trace data. Tempo needs only object storage to operate and does not maintain a large index of trace contents, which lowers storage costs and simplifies maintenance. (This guide uses the local filesystem backend to keep the demo simple.) It integrates with Grafana, so traces can be viewed alongside logs and metrics during troubleshooting. With native support for OpenTelemetry, Jaeger, and Zipkin, Tempo fits diverse environments and handles high-volume trace data, which makes it well suited to high-traffic applications.

In short, Tempo is a great choice for teams seeking an affordable, scalable, and Grafana-integrated tracing backend.

Step 1: Setting Up Tempo, Grafana, K6 & Prometheus with Docker

First, we will run Grafana and Tempo in Docker containers. Tempo will store traces, and Grafana will be used to visualize them.

Set Up Grafana, Tempo, k6 & Prometheus

Create a project directory and the following empty files:

mkdir tempo-tracing
mkdir tempo-tracing/shared

cd tempo-tracing
touch docker-componse.yml
touch otel-collector.yaml

cd shared
touch grafana-datasources.yaml
touch prometheus.yaml
touch tempo.yaml

At this point, your directory structure should look like this:

$ tempo-tracing % tree
.
├── docker-componse.yml
├── otel-collector.yaml
└── shared
    ├── grafana-datasources.yaml
    ├── prometheus.yaml
    └── tempo.yaml

1 directory, 5 files

Update setup files

Add the following content to each of the files created above.

  1. ./docker-componse.yml
services:

  # Tempo runs as user 10001, and docker compose creates the volume as root.
  # As such, we need to chown the volume in order for Tempo to start correctly.
  init:
    image: &tempoImage grafana/tempo:latest
    user: root
    entrypoint:
      - "chown"
      - "10001:10001"
      - "/var/tempo"
    volumes:
      - ./tempo-data:/var/tempo

  tempo:
    image: *tempoImage
    command: [ "-config.file=/etc/tempo.yaml" ]
    volumes:
      - ./shared/tempo.yaml:/etc/tempo.yaml
      - ./tempo-data:/var/tempo
    ports:
      - "14268"  # jaeger ingest
      - "3200"   # tempo
      - "4317"  # otlp grpc
      - "4318"  # otlp http
      - "9411"   # zipkin
    depends_on:
      - init

  # Generate fake traces...
  k6-tracing:
    image: ghcr.io/grafana/xk6-client-tracing:v0.0.5
    environment:
      - ENDPOINT=otel-collector:4317
    restart: always
    depends_on:
      - otel-collector

  # And put them in an OTEL collector pipeline...
  otel-collector:
    image: otel/opentelemetry-collector:0.86.0
    command: [ "--config=/etc/otel-collector.yaml" ]
    volumes:
      - ./otel-collector.yaml:/etc/otel-collector.yaml

  prometheus:
    image: prom/prometheus:latest
    command:
      - --config.file=/etc/prometheus.yaml
      - --web.enable-remote-write-receiver
      - --enable-feature=exemplar-storage
      - --enable-feature=native-histograms
    volumes:
      - ./shared/prometheus.yaml:/etc/prometheus.yaml
    ports:
      - "9090:9090"

  grafana:
    image: grafana/grafana:11.0.0
    volumes:
      - ./shared/grafana-datasources.yaml:/etc/grafana/provisioning/datasources/datasources.yaml
    environment:
      - GF_AUTH_ANONYMOUS_ENABLED=true
      - GF_AUTH_ANONYMOUS_ORG_ROLE=Admin
      - GF_AUTH_DISABLE_LOGIN_FORM=true
      - GF_FEATURE_TOGGLES_ENABLE=traceqlEditor
    ports:
      - "3000:3000"

2. ./otel-collector.yaml

The OpenTelemetry Collector (otel-collector) service configuration is stored in otel-collector.yaml. This configuration specifies where to receive trace data and how to send it to Tempo. Ensure the file includes settings similar to this:

receivers:
  otlp:
    protocols:
      grpc:
exporters:
  otlp:
    endpoint: tempo:4317
    tls:
      insecure: true
service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp]
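
The rule behind this file is that every receiver and exporter named in a pipeline must also be declared at the top level of the config. The following is a minimal Python sketch of that check, assuming the YAML above has already been loaded into a plain dictionary. The `undeclared_components` helper is hypothetical and is not part of the collector.

```python
# Collector config from above, expressed as the dict a YAML loader would produce.
COLLECTOR_CONFIG = {
    "receivers": {"otlp": {"protocols": {"grpc": None}}},
    "exporters": {"otlp": {"endpoint": "tempo:4317", "tls": {"insecure": True}}},
    "service": {
        "pipelines": {
            "traces": {"receivers": ["otlp"], "exporters": ["otlp"]},
        }
    },
}


def undeclared_components(config):
    """Return (pipeline, kind, name) for components a pipeline uses but the config never declares."""
    problems = []
    pipelines = config.get("service", {}).get("pipelines", {})
    for pipeline_name, pipeline in pipelines.items():
        for kind in ("receivers", "processors", "exporters"):
            declared = config.get(kind) or {}
            for name in pipeline.get(kind, []):
                if name not in declared:
                    problems.append((pipeline_name, kind, name))
    return problems


if __name__ == "__main__":
    print(undeclared_components(COLLECTOR_CONFIG) or "pipeline wiring looks consistent")
```

An empty result means the `traces` pipeline only references components that exist; a typo in an exporter name would show up as a tuple such as `("traces", "exporters", ...)`.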

3. ./shared/tempo.yaml

stream_over_http_enabled: true
server:
  http_listen_port: 3200
  log_level: info

query_frontend:
  search:
    duration_slo: 5s
    throughput_bytes_slo: 1.073741824e+09
    metadata_slo:
        duration_slo: 5s
        throughput_bytes_slo: 1.073741824e+09
  trace_by_id:
    duration_slo: 5s

distributor:
  receivers:                           # this configuration will listen on all ports and protocols that tempo is capable of.
    jaeger:                            # the receivers all come from the OpenTelemetry collector.  more configuration information can
      protocols:                       # be found there: https://github.com/open-telemetry/opentelemetry-collector/tree/main/receiver
        thrift_http:                   #
        grpc:                          # for a production deployment you should only enable the receivers you need!
        thrift_binary:
        thrift_compact:
    zipkin:
    otlp:
      protocols:
        http:
        grpc:
    opencensus:

ingester:
  max_block_duration: 5m               # cut the headblock when this much time passes. this is being set for demo purposes and should probably be left alone normally

compactor:
  compaction:
    block_retention: 1h                # overall Tempo trace retention. set for demo purposes

metrics_generator:
  registry:
    external_labels:
      source: tempo
      cluster: docker-compose
  storage:
    path: /var/tempo/generator/wal
    remote_write:
      - url: http://prometheus:9090/api/v1/write
        send_exemplars: true
  traces_storage:
    path: /var/tempo/generator/traces

storage:
  trace:
    backend: local                     # backend configuration to use
    wal:
      path: /var/tempo/wal             # where to store the wal locally
    local:
      path: /var/tempo/blocks

overrides:
  defaults:
    metrics_generator:
      processors: [service-graphs, span-metrics, local-blocks] # enables metrics generator
      generate_native_histograms: both

4. ./shared/prometheus.yaml

global:
  scrape_interval:     15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: [ 'localhost:9090' ]
  - job_name: 'tempo'
    static_configs:
      - targets: [ 'tempo:3200' ]

5. ./shared/grafana-datasources.yaml

apiVersion: 1

datasources:
- name: Prometheus
  type: prometheus
  uid: prometheus
  access: proxy
  orgId: 1
  url: http://prometheus:9090
  basicAuth: false
  isDefault: false
  version: 1
  editable: false
  jsonData:
    httpMethod: GET
- name: Tempo
  type: tempo
  access: proxy
  orgId: 1
  url: http://tempo:3200
  basicAuth: false
  isDefault: true
  version: 1
  editable: false
  apiVersion: 1
  uid: tempo
  jsonData:
    httpMethod: GET
    serviceMap:
      datasourceUid: prometheus
    streamingEnabled:
      search: true

Step 2: Review the Docker Compose Configuration

The Compose file created above defines five main services, plus a one-off init container that fixes ownership of the Tempo data volume:

  • Tempo: Acts as the trace storage backend.
  • OpenTelemetry Collector: Receives traces from applications and forwards them to Tempo.
  • Grafana: Visualizes the stored traces.
  • k6: Generates sample traces for testing.
  • Prometheus: Stores metrics, including those written by Tempo's metrics generator.
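
Because Tempo's metrics generator remote-writes to Prometheus, the metrics derived from traces can be queried like any other series. Below is a small Python sketch that builds a URL for Prometheus' instant-query API. The metric name `traces_spanmetrics_calls_total` and the `service` label are assumptions based on Tempo's span-metrics processor defaults, so check the series that actually appear in your Prometheus instance.

```python
from urllib.parse import urlencode

# Assumed metric and label names produced by Tempo's span-metrics processor.
REQUEST_RATE_BY_SERVICE = "sum by (service) (rate(traces_spanmetrics_calls_total[5m]))"


def prometheus_query_url(promql, base_url="http://localhost:9090"):
    """Build a URL for Prometheus' instant-query HTTP API (/api/v1/query)."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


if __name__ == "__main__":
    # Paste the printed URL into a browser or pass it to curl once the stack is running.
    print(prometheus_query_url(REQUEST_RATE_BY_SERVICE))
```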

Step 3: Run Docker Compose

Once the configurations are ready, use Docker Compose to start all services:

docker-compose -f docker-componse.yml up -d

The output should look similar to this:

visgupta@blr-mpht2 tempo-tracing % docker-compose -f docker-componse.yml up -d
[+] Building 0.0s (0/0)                      docker:desktop-linux
[+] Running 7/7
 ✔ Network tempo-tracing_default             Created        0.1s 
 ✔ Container tempo-tracing-otel-collector-1  Started        0.1s 
 ✔ Container tempo-tracing-prometheus-1      Started        0.1s 
 ✔ Container tempo-tracing-init-1            Started        0.1s 
 ✔ Container tempo-tracing-grafana-1         Started        0.1s 
 ✔ Container tempo-tracing-k6-tracing-1      Started        0.1s 
 ✔ Container tempo-tracing-tempo-1           Started        0.1s

You should now have five services running: Tempo, OpenTelemetry Collector, Prometheus, Grafana, and k6. Verify that each service is up and running using:

docker-compose -f docker-componse.yml ps
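
Beyond the container status, it can help to confirm that the services answer HTTP requests. The sketch below, written under the assumption that Grafana and Prometheus are published on host ports 3000 and 9090 as in the Compose file, polls their health endpoints (`/api/health` and `/-/ready`). The function names are illustrative.

```python
import urllib.error
import urllib.request

# Host ports published by the Compose file above, with each service's health path.
HEALTH_URLS = {
    "grafana": "http://localhost:3000/api/health",
    "prometheus": "http://localhost:9090/-/ready",
}


def http_status(url, timeout=3.0):
    """Return the HTTP status code for url, or None if the service is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return None  # connection refused, DNS failure, timeout, ...


def check_services(urls=HEALTH_URLS, fetch=http_status):
    """Map each service name to True when its health endpoint answers 200."""
    return {name: fetch(url) == 200 for name, url in urls.items()}


if __name__ == "__main__":
    for name, healthy in check_services().items():
        print(f"{name}: {'ready' if healthy else 'not ready'}")
```

Tempo exposes a similar `/ready` endpoint on container port 3200, but the Compose file publishes that port on an ephemeral host port; `docker-compose -f docker-componse.yml port tempo 3200` prints the actual mapping.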

Step 4: Visualize Traces

Grafana is available at http://localhost:3000/, and the Tempo and Prometheus data sources are already set up. k6-tracing generates calls with traces continuously, so we can head straight to Grafana and start visualizing traces in the Explore view.

Figure: Trace lookup in the Grafana UI
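
The same kind of search can also be issued against Tempo's HTTP API, which accepts a TraceQL query in the `q` parameter of `/api/search`. Here is a minimal sketch that builds such a URL. The base URL is an assumption: in this setup Tempo's port 3200 is mapped to an ephemeral host port, so substitute the port reported by `docker-compose -f docker-componse.yml port tempo 3200`.

```python
from urllib.parse import urlencode


def tempo_search_url(base_url, traceql, limit=20):
    """Build a Tempo search URL for a TraceQL query, returning at most `limit` traces."""
    query = urlencode({"q": traceql, "limit": limit})
    return f"{base_url.rstrip('/')}/api/search?{query}"


if __name__ == "__main__":
    # Example TraceQL: traces containing at least one span slower than 100 ms.
    print(tempo_search_url("http://localhost:3200", "{ duration > 100ms }"))
```

In Grafana's Explore view, the TraceQL expression alone (`{ duration > 100ms }`) is entered in the query editor; the URL form is mainly useful for scripts.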

Select any trace ID to see its associated spans. The trace view shows how long each span took to complete.

Select any of the spans to view its details.
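
The time shown for each span is simply its end timestamp minus its start timestamp. As a rough illustration, the sketch below computes durations from spans that carry the OTLP JSON fields `startTimeUnixNano` and `endTimeUnixNano` (nanoseconds since the epoch, encoded as strings). The sample spans are invented for the example and do not come from the k6 generator.

```python
def span_durations_ms(spans):
    """Return (span name, duration in milliseconds) pairs, slowest span first."""
    durations = []
    for span in spans:
        start_ns = int(span["startTimeUnixNano"])
        end_ns = int(span["endTimeUnixNano"])
        durations.append((span["name"], (end_ns - start_ns) / 1_000_000))
    return sorted(durations, key=lambda item: item[1], reverse=True)


# Invented sample data, for illustration only.
SAMPLE_SPANS = [
    {"name": "query-db", "startTimeUnixNano": "1700000000050000000", "endTimeUnixNano": "1700000000170000000"},
    {"name": "GET /checkout", "startTimeUnixNano": "1700000000000000000", "endTimeUnixNano": "1700000000250000000"},
]

if __name__ == "__main__":
    for name, duration in span_durations_ms(SAMPLE_SPANS):
        print(f"{name}: {duration:.1f} ms")
```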

This setup provides a foundation for distributed tracing. You can expand this by configuring OpenTelemetry in your applications to send traces through the collector, using the same endpoint configured for k6.
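
For an application container on the same Compose network, one way to point an OpenTelemetry SDK at the collector is through the standard `OTEL_*` environment variables. The sketch below only assembles those variables. The service name `my-service` is a placeholder, and the `http://` scheme (plaintext gRPC) is an assumption that matches the collector's receiver as configured above without TLS.

```python
def otel_env(service_name, endpoint="http://otel-collector:4317"):
    """Build the standard OpenTelemetry SDK environment variables for OTLP/gRPC trace export."""
    return {
        "OTEL_SERVICE_NAME": service_name,
        "OTEL_TRACES_EXPORTER": "otlp",
        "OTEL_EXPORTER_OTLP_PROTOCOL": "grpc",
        "OTEL_EXPORTER_OTLP_ENDPOINT": endpoint,
    }


if __name__ == "__main__":
    # Print KEY=value lines, suitable for an env file or a Compose `environment:` list.
    for key, value in otel_env("my-service").items():
        print(f"{key}={value}")
```

The host and port are the same `otel-collector:4317` that k6-tracing uses; only the URL scheme is added, since the SDK variable expects a full URL.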

For step-by-step instructions on setting up distributed tracing with Jaeger and OpenTelemetry, see this article: https://vishynit.medium.com/distributed-tracing-in-kubernetes-using-opentelemetry-jaegar-a-step-by-step-guide-a48899c2b27a

Thanks for reading.
