May Virtual Meetup - Ghost in the Mesh: AI-Assisted RCA Across Distributed Systems
Your dashboards are green. CPU is healthy. Infrastructure looks stable.
Yet customers are abandoning checkout.
This session explores a realistic distributed outage across Kafka, Redis, PostgreSQL, and microservices where:
- retry storms amplify failures,
- cache stampedes overload databases,
- Kafka lag silently propagates latency,
- and logs become noisy and misleading.
Using OpenTelemetry with Grafana, Tempo, Loki, and Mimir, distributed tracing exposes the hidden causal chain behind modern production incidents, while AI-assisted root cause analysis correlates traces, logs, metrics, and service dependencies to accelerate incident investigation.
Attendees gain practical insight into:
- why metrics alone are insufficient,
- how traces expose latency propagation,
- how to correlate logs, metrics, and traces effectively,
- and how AI-assisted observability improves troubleshooting in complex distributed systems.
“Your dashboards said healthy.
Your traces told the truth.”