Retina: Kubernetes Network Observability with eBPF

Kubernetes network incidents are hard because the failure rarely stays in one layer. A bad DNS answer, dropped packet, noisy node, or policy mistake can look like an application outage. Retina gives SRE teams a Kubernetes-native way to collect network telemetry and start targeted captures when dashboards are not enough.
What is Retina?
Retina is an open-source Kubernetes network observability platform from Microsoft. It uses eBPF to collect network signals from workloads and nodes, then exports metrics to systems such as Prometheus and Azure Monitor. Teams can visualize the data in Grafana, Azure Log Analytics, or another observability backend.
The important part for operators is scope. Retina is cloud-agnostic and supports Linux, Windows, and Azure Linux nodes. That makes it useful for platform teams running mixed clusters or trying to standardize troubleshooting workflows across environments.
Key Features
- eBPF-based telemetry: Collects network metrics from the kernel with low operational friction.
- Prometheus metrics: Exposes industry-standard metrics that fit existing Kubernetes dashboards.
- On-demand packet captures: Starts focused captures from the CLI or CRD when an incident needs packet-level evidence.
- Configurable plugins: Enables signals such as drop reasons, packet forwarding, Linux utilization, DNS, and packet parsing.
- Multi-OS support: Works across Linux, Windows Server 2022, and Azure Linux node pools.
Installation
Install Retina with the Helm chart published to GHCR:
VERSION=$(curl -sL https://api.github.com/repos/microsoft/retina/releases/latest | jq -r .name)
helm upgrade --install retina oci://ghcr.io/microsoft/retina/charts/retina \
--version "$VERSION" \
--set image.tag="$VERSION" \
--set operator.tag="$VERSION" \
--set logLevel=info \
--set enabledPlugin_linux="[dropreason,packetforward,linuxutil,dns]"
After deployment, connect Retina to Prometheus and import Grafana dashboards from the project documentation. Start with the basic plugin set before enabling heavier packet parsing on large nodes.
Capture Workflow
Install the CLI through Krew:
kubectl krew install retina
kubectl retina version
During an incident, create a capture against a workload selector:
kubectl retina capture create --pod-selectors app=my-service
kubectl retina trace list
This gives responders a repeatable path from symptom to packet evidence. Instead of SSHing into nodes or running ad hoc tcpdump commands, teams can use Kubernetes identity, selectors, and audit trails.
Operational Tips
Run Retina as part of your standard observability stack, not only during outages. Metrics such as DNS request behavior, packet drops, and forwarding signals are useful baselines before an incident starts.
For production clusters, define capture policies with the CRD path so teams can standardize retention, namespace access, and storage location. Also document which plugins are enabled per cluster. A dashboard built for packet parsing data will be misleading if the plugin is disabled.
Conclusion
Retina is a practical addition for teams that need deeper Kubernetes network visibility without replacing their existing metrics stack. It turns eBPF signals and packet captures into a normal cluster workflow, which is exactly what SRE teams need when service-to-service traffic gets weird.
At Akmatori, we help SRE teams build intelligent automation that responds to incidents and manages infrastructure. For GPU-accelerated AI workloads, check out Gcore cloud infrastructure with global edge locations.
