01.07.2026

Mesh Metrics for OpenTelemetry

Most teams adopt OpenTelemetry to see what applications report through traces, metrics, and logs. That is necessary, but it does not always show what happens between services on the network. A recent CNCF post on OTel and mesh-derived metrics shows why Linkerd proxy metrics are worth adding to the same pipeline.

What Are Mesh-Derived Metrics?

Mesh-derived metrics are measurements emitted by a service mesh proxy instead of application code. In Linkerd, each meshed pod includes a linkerd-proxy sidecar that exposes Prometheus metrics on port 4191.

Because the proxy sits on the request path, it reports traffic between workloads without SDK changes, image rebuilds, or framework-specific instrumentation. The scope is east-west service traffic inside the mesh. Ingress and business events still need their own instrumentation.

Why SRE Teams Should Care

Application telemetry and mesh telemetry answer different questions:

  • OpenTelemetry app metrics show business and code-level behavior, such as checkout failures, queue depth, or handler latency.
  • Mesh metrics show request rate, success rate, latency, TCP bytes, open connections, destination workload, and mTLS identity.
  • Distributed traces show the call graph and the failing span when a service path breaks.

That split matters during incidents. A proxy may classify a gRPC response as failed even when the HTTP status is 200, because the gRPC status is carried in trailers. Mesh metrics catch the failed call, while traces explain where and why it failed.

Collector Pattern

The practical pattern is simple: scrape mesh proxy endpoints with an OpenTelemetry Collector, drop the noisy metric families, tag what remains with a layer=mesh resource attribute, and export it alongside normal application metrics. The snippet below covers the scrape and tagging steps.

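receivers:
  prometheus/mesh:
    config:
      scrape_configs:
        - job_name: linkerd-mesh
          # Discover every pod in the cluster, then narrow the targets below.
          kubernetes_sd_configs:
            - role: pod
          relabel_configs:
            # Keep only targets that belong to the linkerd-proxy sidecar container.
            - source_labels: [__meta_kubernetes_pod_container_name]
              action: keep
              regex: linkerd-proxy
            # Point the scrape at the proxy admin port on the pod IP.
            - source_labels: [__meta_kubernetes_pod_ip]
              action: replace
              target_label: __address__
              regex: (.+)
              # "$$" escapes "$" so the Collector does not expand $1 as an environment variable.
              replacement: $$1:4191

processors:
  # Mark data from this receiver so it can be told apart from application metrics.
  resource/mesh:
    attributes:
      - key: layer
        value: mesh
        action: insert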
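
The snippet above defines a receiver and a processor but does not wire them into a pipeline. A minimal sketch of the remaining sections might look like the following, assuming an OTLP/gRPC backend. The endpoint otel-gateway.observability.svc:4317 and the pipeline name metrics/mesh are placeholders chosen for illustration, not values from the CNCF post.

```yaml
exporters:
  otlp:
    # Hypothetical endpoint; point this at your own OTLP-capable backend or gateway.
    endpoint: otel-gateway.observability.svc:4317
    tls:
      insecure: true  # assumption: plaintext in-cluster traffic; enable TLS where required

service:
  pipelines:
    # A dedicated pipeline keeps mesh data separate from application metrics
    # until both reach the backend.
    metrics/mesh:
      receivers: [prometheus/mesh]
      processors: [resource/mesh]
      exporters: [otlp]
```

The Collector's service account also needs permission to list and watch pods, because kubernetes_sd_configs queries the Kubernetes API for targets.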

Keep the first version focused. Start with response_total, response_latency_ms, tcp_open_connections, tcp_read_bytes_total, and tcp_write_bytes_total. Per-route metrics can come later, after you set cardinality limits.
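
One way to enforce that short list is at scrape time, with standard Prometheus metric_relabel_configs on the linkerd-mesh job. This is a sketch that assumes response_latency_ms is exposed as a Prometheus histogram, so its series carry the _bucket, _sum, and _count suffixes. Adjust the allowlist to whatever your proxy version actually emits.

```yaml
# Fragment: add under the linkerd-mesh scrape job, next to relabel_configs.
metric_relabel_configs:
  - source_labels: [__name__]
    action: keep
    # Series whose names do not match this allowlist are dropped at scrape time.
    regex: response_total|response_latency_ms_(bucket|sum|count)|tcp_open_connections|tcp_read_bytes_total|tcp_write_bytes_total
```

Filtering at the receiver means the dropped families never reach later processors or the exporter, which keeps both Collector memory and backend cardinality down.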

Operational Tips

Do not replace application telemetry with mesh telemetry. Use both. Trust mesh metrics for east-west traffic, mTLS identity, and service-to-service success rate. Trust application metrics for domain context. Trust traces for root cause.

Also be careful with labels. Proxy metrics can include pod, workload, route, destination, status, and identity dimensions. Set cardinality budgets before shipping everything to a long-retention backend.

Finally, build dashboards around comparisons. Plot app latency next to mesh latency. Alert on gRPC failures even when HTTP status looks healthy. Use the gap between the two layers to find network overhead, retries, or slow middleware.
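
As an illustration of alerting on the mesh layer, here is a sketch of a Prometheus-style alerting rule. It assumes the metrics land in a Prometheus-compatible backend with the proxy's metric and label names preserved (response_total, direction, classification). The alert name, the 5% threshold, the time windows, and the grouping by instance are illustrative choices, so swap in the workload labels your relabeling produces.

```yaml
groups:
  - name: mesh-success-rate
    rules:
      - alert: MeshInboundFailureRatioHigh
        # Share of inbound responses the proxy classified as failures, per scraped proxy.
        expr: |
          sum by (instance) (rate(response_total{direction="inbound", classification="failure"}[5m]))
            /
          sum by (instance) (rate(response_total{direction="inbound"}[5m]))
            > 0.05
        for: 10m
        labels:
          severity: page
        annotations:
          summary: "Mesh failure ratio above 5% on {{ $labels.instance }}"
```

Because the proxy's classification takes the gRPC status into account when one is present, a rule like this can fire on failed gRPC calls that still return HTTP 200.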

Conclusion

Mesh-derived metrics make OpenTelemetry stacks more useful for Kubernetes operations. They expose the service-to-service layer that application code may not report, and they help SRE teams catch failures that hide behind incomplete app metrics.

If your team wants incident workflows that connect telemetry, services, runbooks, and automation, Akmatori is an open-source AI agent platform for SRE teams. For infrastructure to run and test cloud-native observability pipelines, Gcore provides global cloud and edge capacity.
