Creating Self-Healing Software Systems Through Effective Use of Telemetry Data and AI Agents

Modern software systems operate in complex, dynamic environments where failures are inevitable. Traditional monitoring and manual incident response are no longer enough to ensure resilience or customer satisfaction. This talk explores how to design and implement self-healing software systems by combining telemetry data with an agentic, AI-driven approach. We’ll start by examining how high-quality telemetry forms the foundation for detecting anomalies and predicting failures. Next, we’ll show how modern generative AI (large language models) can turn this telemetry into actionable insights for AI agents that interpret the data, pinpoint root causes, and apply automated fixes. Through a practical, real-world example, you’ll see how telemetry and AI work together to create adaptive feedback loops that continuously improve system reliability while freeing engineers from repetitive operational tasks.

Related Talks

Integrate WireMock into LocalStack for End-to-End Local Testing
Watch recording
Simulate Microservices, Cloud Services, and Everything Else with WireMock & LocalStack
Watch recording
Getting started with the LocalStack Model Context Protocol (MCP) Server
Watch recording
