Mesh Metrics for OpenTelemetry

Most teams adopt OpenTelemetry to see what applications report through traces, metrics, and logs. That is necessary, but it does not always show what happens between services on the network. A recent CNCF post on OTel and mesh-derived metrics shows why Linkerd proxy metrics are worth adding to the same pipeline.
What Are Mesh-Derived Metrics?
Mesh-derived metrics are measurements emitted by a service mesh proxy instead of application code. In Linkerd, each meshed pod includes a linkerd-proxy sidecar that exposes metrics in Prometheus text format on its admin port, 4191.
Because the proxy sits on the request path, it reports traffic between workloads without SDK changes, image rebuilds, or framework-specific instrumentation. The scope is east-west service traffic inside the mesh. Ingress and business events still need their own instrumentation.
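
To make the proxy output concrete, here is a minimal sketch of reading Prometheus text exposition, the format served on port 4191. The sample lines and their values are made up for illustration, and the `direction` and `classification` labels are assumptions about what the proxy attaches. The parser is deliberately simplified and is not a full implementation of the exposition format.

```python
import re

# Illustrative sample only: metric names follow the families named in this
# article, but labels and values are assumptions, not real proxy output.
SAMPLE = """\
# TYPE response_total counter
response_total{direction="inbound",classification="success"} 120
response_total{direction="inbound",classification="failure"} 5
# TYPE tcp_open_connections gauge
tcp_open_connections{direction="inbound"} 3
"""

# Metric name, optional {labels}, value, optional integer timestamp.
LINE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>.*)\})?'
    r'\s+(?P<value>\S+)(?:\s+-?\d+)?$'
)
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_exposition(text):
    """Return a list of (metric_name, labels_dict, value) tuples."""
    samples = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = LINE_RE.match(line)
        if match is None:
            continue  # ignore lines this simplified parser cannot read
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples
```

Calling `parse_exposition(SAMPLE)` yields one tuple per series, which is the shape a Collector's Prometheus receiver works with before converting the data to OpenTelemetry metrics.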
Why SRE Teams Should Care
Application telemetry and mesh telemetry answer different questions:
- OpenTelemetry app metrics show business and code-level behavior, such as checkout failures, queue depth, or handler latency.
- Mesh metrics show request rate, success rate, latency, TCP bytes, open connections, destination workload, and mTLS identity.
- Distributed traces show the call graph and the failing span when a service path breaks.
That split matters during incidents. A proxy may classify a gRPC response as failed even when the HTTP status is 200, because the gRPC status is carried in trailers. Mesh metrics catch the failed call, while traces explain where and why it failed.
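
The gRPC case can be sketched as a small classifier. This is a simplified rule written for illustration, not Linkerd's actual classification logic: it assumes any HTTP 5xx is a failure and any gRPC status other than 0 (OK) is a failure. The function name and the dict-based trailer representation are hypothetical.

```python
def classify_grpc_response(http_status, trailers):
    """Classify one response as "success" or "failure".

    Simplified assumption: HTTP 5xx fails, and a gRPC response fails when
    its grpc-status trailer is anything other than "0" (OK).
    """
    if http_status >= 500:
        return "failure"
    grpc_status = trailers.get("grpc-status")
    if grpc_status is None:
        # No gRPC status present: fall back to the HTTP status alone.
        return "success"
    return "success" if grpc_status == "0" else "failure"
```

With this rule, `classify_grpc_response(200, {"grpc-status": "14"})` is a failure even though the HTTP status is 200, which is the situation an HTTP-only dashboard would miss.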
Collector Pattern
The practical pattern is simple: scrape the mesh proxy endpoints with an OpenTelemetry Collector, filter out the noisy metric families, tag the data as layer=mesh, and export it beside normal application metrics. The configuration below covers the scrape and tagging steps.
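```yaml
receivers:
  prometheus/mesh:
    config:
      scrape_configs:
        - job_name: linkerd-mesh
          kubernetes_sd_configs:
            - role: pod
          relabel_configs:
            # Keep only targets that belong to the linkerd-proxy sidecar.
            - source_labels: [__meta_kubernetes_pod_container_name]
              action: keep
              regex: linkerd-proxy
            # Point the scrape at the proxy metrics port on the pod IP.
            # "$$" escapes "$" so the Collector does not treat $1 as an
            # environment variable reference.
            - source_labels: [__meta_kubernetes_pod_ip]
              action: replace
              target_label: __address__
              regex: (.+)
              replacement: $$1:4191
processors:
  resource/mesh:
    attributes:
      # Tag mesh-derived data so it can be told apart downstream.
      - key: layer
        value: mesh
        action: insert
```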
Keep the first version focused. Start with response_total, response_latency_ms, tcp_open_connections, tcp_read_bytes_total, and tcp_write_bytes_total. Per-route metrics can come later, after you set cardinality limits.
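
The starter allowlist can be expressed as a simple predicate. This sketch assumes response_latency_ms is exposed as a Prometheus histogram, so its series carry `_bucket`, `_sum`, and `_count` suffixes. The helper names are hypothetical, and in a real pipeline the same rule would live in the Collector or scrape configuration rather than in Python.

```python
# Starter allowlist taken from the families named above.
ALLOWED_FAMILIES = {
    "response_total",
    "response_latency_ms",
    "tcp_open_connections",
    "tcp_read_bytes_total",
    "tcp_write_bytes_total",
}

# Suffixes a Prometheus histogram adds to its family name.
HISTOGRAM_SUFFIXES = ("_bucket", "_sum", "_count")


def family_of(sample_name):
    """Map a series name back to its metric family when it is a histogram part."""
    for suffix in HISTOGRAM_SUFFIXES:
        if sample_name.endswith(suffix):
            base = sample_name[: -len(suffix)]
            if base in ALLOWED_FAMILIES:
                return base
    return sample_name


def keep_sample(sample_name):
    """Return True when the series belongs to an allowlisted family."""
    return family_of(sample_name) in ALLOWED_FAMILIES
```

An exact-match allowlist means per-route series such as `route_response_total` are dropped until they are added on purpose.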
Operational Tips
Do not replace application telemetry with mesh telemetry. Use both. Trust mesh metrics for east-west traffic, mTLS identity, and service-to-service success rate. Trust application metrics for domain context. Trust traces for root cause.
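
As a rough illustration of the service-to-service success rate, the sketch below derives it from two snapshots of cumulative counters, such as the success and failure totals of response_total. The snapshot shape (a dict with "success" and "failure" keys) is a hypothetical simplification chosen for this example.

```python
def success_rate(prev, curr):
    """Success fraction between two cumulative counter snapshots.

    Returns None when the interval has no requests or when a counter went
    backwards, which usually means the proxy restarted.
    """
    d_success = curr["success"] - prev["success"]
    d_failure = curr["failure"] - prev["failure"]
    if d_success < 0 or d_failure < 0:
        return None  # counter reset: skip this interval
    total = d_success + d_failure
    if total == 0:
        return None  # no traffic in the interval
    return d_success / total
```

For example, moving from 100 successes and 5 failures to 190 successes and 15 failures gives 90 successes out of 100 requests in the interval.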
Also be careful with labels. Proxy metrics can include pod, workload, route, destination, status, and identity dimensions. Set cardinality budgets before shipping everything to a long-retention backend.
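
One way to check a cardinality budget before shipping is to count distinct label sets per metric in a scrape. The sketch below assumes samples are already available as (name, labels) pairs. The function names and the idea of a single numeric budget per metric are illustrative assumptions, not a feature of any particular backend.

```python
from collections import defaultdict


def series_per_metric(samples):
    """Count distinct label sets (time series) per metric name."""
    seen = defaultdict(set)
    for name, labels in samples:
        # frozenset makes the label set hashable and order-independent.
        seen[name].add(frozenset(labels.items()))
    return {name: len(label_sets) for name, label_sets in seen.items()}


def over_budget(samples, budget):
    """Return the sorted metric names whose series count exceeds the budget."""
    counts = series_per_metric(samples)
    return sorted(name for name, count in counts.items() if count > budget)
```

Running this against a sample scrape from a busy workload gives an early signal of which label dimensions to drop or aggregate before the data reaches long-retention storage.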
Finally, build dashboards around comparisons. Plot app latency next to mesh latency. Alert on gRPC failures even when HTTP status looks healthy. Use the gap between the two layers to find network overhead, retries, or slow middleware.
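
The comparison idea can be sketched as a gap check between the two layers. The row shape, the service names in the usage note, and the threshold are hypothetical. Subtracting one percentile from another is only a rough indicator, not an exact measure of network overhead.

```python
def flag_gaps(rows, threshold_ms):
    """Flag services where mesh latency exceeds app latency by a margin.

    rows: iterable of (service, mesh_latency_ms, app_latency_ms), where both
    latencies use the same statistic (for example p95) over the same window.
    """
    flagged = []
    for service, mesh_ms, app_ms in rows:
        gap = mesh_ms - app_ms
        if gap > threshold_ms:
            flagged.append((service, gap))
    return flagged
```

Given rows for a `checkout` service at 48 ms mesh and 40 ms app latency, and a `cart` service at 95 ms and 30 ms, a 20 ms threshold flags only `cart`, which is where to look for retries, network overhead, or slow middleware.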
Conclusion
Mesh-derived metrics make OpenTelemetry stacks more useful for Kubernetes operations. They expose the service-to-service layer that application code may not report, and they help SRE teams catch failures that hide behind incomplete app metrics.
