Creating self-healing software systems via effective usage of telemetry data and AI agents

Modern software systems operate in complex, dynamic environments where failures are inevitable. Traditional monitoring and manual incident response are no longer sufficient to ensure resilience or customer satisfaction. This talk explores how to design and implement self-healing software systems by combining telemetry data with an AI-driven agentic approach. We’ll start by examining how high-quality telemetry forms the foundation for detecting anomalies and predicting failures. Next, we’ll show how modern GenAI (LLMs) can transform this telemetry into actionable insights for AI agents that interpret data, pinpoint root causes, and apply automated fixes. Through a practical, real-world example, you’ll see how telemetry and AI work together to create adaptive feedback loops that continuously improve system reliability, while freeing engineers from repetitive operational tasks.
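
The talk does not publish its implementation. What follows is only a minimal Python sketch of the feedback loop described above, and it rests on several assumptions. Telemetry arrives as named numeric samples. Anomalies are flagged with a simple z-score test, which stands in for whatever detector the talk uses. The LLM agent is replaced by a hypothetical rule-based `diagnose` function. The metric names and remediation actions (`rollback_deployment`, `scale_out`, `page_on_call`) are illustrative and do not come from the talk.

```python
from dataclasses import dataclass, field
from statistics import mean, pstdev
from typing import Callable, Optional


@dataclass
class Anomaly:
    metric: str
    value: float
    zscore: float


def detect_anomaly(metric: str, history: list[float], latest: float,
                   threshold: float = 3.0) -> Optional[Anomaly]:
    """Flag `latest` if it lies more than `threshold` standard deviations from the history mean."""
    if len(history) < 2:
        return None  # not enough telemetry yet to establish a baseline
    mu, sigma = mean(history), pstdev(history)
    if sigma == 0:
        # Perfectly flat baseline: any deviation at all counts as anomalous.
        return Anomaly(metric, latest, float("inf")) if latest != mu else None
    z = (latest - mu) / sigma
    return Anomaly(metric, latest, z) if abs(z) > threshold else None


def diagnose(anomaly: Anomaly) -> str:
    """Hypothetical stand-in for the LLM agent: maps an anomaly to a remediation name."""
    if anomaly.metric == "error_rate":
        return "rollback_deployment"
    if anomaly.metric == "p99_latency_ms":
        return "scale_out"
    return "page_on_call"


@dataclass
class SelfHealingLoop:
    # Remediation name -> callable; must include "page_on_call" as the fallback.
    actions: dict[str, Callable[[], None]]
    history: dict[str, list[float]] = field(default_factory=dict)
    log: list[str] = field(default_factory=list)

    def observe(self, metric: str, value: float) -> None:
        past = self.history.setdefault(metric, [])
        anomaly = detect_anomaly(metric, past, value)
        if anomaly is None:
            past.append(value)  # only healthy samples extend the baseline
            return
        action = diagnose(anomaly)
        # Run the chosen fix; unknown action names fall back to paging a human.
        self.actions.get(action, self.actions["page_on_call"])()
        self.log.append(f"{metric}={value} -> {action}")
```

In a production system, `diagnose` is where an LLM agent would receive the anomaly along with the surrounding telemetry and propose a fix. Keeping anomalous samples out of the baseline prevents a sustained incident from gradually being learned as normal behavior.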

Related Talks

Simulate Microservices, Cloud Services, and Everything Else with WireMock & LocalStack

In this live session, WireMock CTO Tom Akehurst will introduce hybrid API simulation (local + cloud) with WireMock Runner. Tom will explain why we built Runner, how developers are using it today, and how it fits into modern development and testing workflows, such as simulating APIs during testing, prototyping, and AI-native development.
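
The session does not show Runner's configuration, so the following is not a Runner example. To illustrate what API simulation means in practice, this Python sketch builds a stub mapping in the JSON shape used by standalone WireMock: a request matcher plus a canned response. It can then be registered through WireMock's admin REST endpoint `POST /__admin/mappings`. The admin base URL (`http://localhost:8080`, the standalone default port) and the `/api/payments/42` endpoint are assumptions chosen for illustration.

```python
import json
import urllib.request


def stub_mapping(method: str, url: str, status: int, body: dict) -> dict:
    """Build a WireMock stub mapping: match `method` + exact `url`, reply with a JSON body."""
    return {
        "request": {"method": method, "url": url},
        "response": {
            "status": status,
            "jsonBody": body,
            "headers": {"Content-Type": "application/json"},
        },
    }


def register_stub(mapping: dict, admin_base: str = "http://localhost:8080") -> int:
    """POST the mapping to a running WireMock server's admin API and return the HTTP status."""
    req = urllib.request.Request(
        f"{admin_base}/__admin/mappings",
        data=json.dumps(mapping).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

With a WireMock server running, `register_stub(stub_mapping("GET", "/api/payments/42", 200, {"status": "SETTLED"}))` would make that server answer the hypothetical payments call without the real service being available.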

Simulating outages with LocalStack Chaos API

LocalStack Chaos API enables you to simulate outages in any AWS region or service. Chaos API provides an easy way to run chaos engineering experiments that safely test a wide variety of simulated outages and failures in your application, without impacting your production users. Common examples include:

- Region-wide outages
- DNS failovers
- Service failures
- Network faults

All of these scenarios can be executed within LocalStack, providing thorough coverage of critical situations in minutes rather than hours or days.

In this presentation, Viren Nadkarni explores how Chaos API can be used to simulate service failures in a local environment, and how robust error handling can address and mitigate those failures.

Resources

- Documentation: https://docs.localstack.cloud/user-guide/chaos-engineering/chaos-api/
- Get access: https://www.localstack.cloud/contact
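
The presentation's code is not reproduced here. As a self-contained illustration of the error-handling side, the Python sketch below injects faults locally and retries failed calls with exponential backoff. `FaultRule`, `FaultInjector`, and `ServiceUnavailable` are hypothetical names. They loosely mirror the idea of a fault rule (a target service plus a failure probability), but they do not call LocalStack or reproduce the Chaos API's schema. For the real fault-rule format, see the documentation linked above.

```python
import random
import time
from dataclasses import dataclass
from typing import Callable, TypeVar

T = TypeVar("T")


class ServiceUnavailable(Exception):
    """Raised by the simulated service when an injected fault fires."""


@dataclass
class FaultRule:
    """Hypothetical local fault rule: fail calls to `service` with probability `probability`."""
    service: str
    probability: float


class FaultInjector:
    def __init__(self, rules: list[FaultRule], seed: int = 0) -> None:
        self.rules = rules
        self.rng = random.Random(seed)  # seeded so experiments are reproducible

    def call(self, service: str, fn: Callable[[], T]) -> T:
        # Any matching rule may fail the call before the real work runs.
        for rule in self.rules:
            if rule.service == service and self.rng.random() < rule.probability:
                raise ServiceUnavailable(f"injected fault in {service}")
        return fn()


def with_retries(fn: Callable[[], T], attempts: int = 5, base_delay: float = 0.1,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry `fn` on ServiceUnavailable, doubling the delay each time; re-raise after the last attempt."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return fn()
        except ServiceUnavailable:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError("unreachable")
```

The `sleep` parameter is injectable so that a test can record the backoff schedule instead of actually waiting. A rule with probability `1.0` models a full service outage, where retries alone cannot help and a fallback (for example another region) is needed.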

Serverless with more infrastructure code and less application code

As serverless workloads grow, Infrastructure as Code (IaC) is the recommended way to manage and maintain them. Beyond defining the complete infrastructure and its configuration, IaC can also route events from one service to another purely through configuration. Building these integrations as configuration also reduces the amount of application code, making the application more maintainable and scalable.

In this session, Jones walked us through a fully end-to-end solution built with Amazon EventBridge and AWS Step Functions SDK integrations, which helped him improve the application using just IaC and very minimal application code.
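
The session's templates are not shown. The following Python sketch illustrates the idea under assumptions of our own. An EventBridge rule pattern selects events, and a Step Functions state machine writes them to DynamoDB through an AWS SDK service integration (`arn:aws:states:::aws-sdk:dynamodb:putItem`), so no Lambda function sits in between. The event source `app.orders`, the detail type `OrderPlaced`, and the table and field names are hypothetical. The `matches` helper is a deliberately simplified version of EventBridge matching that handles only top-level exact values and omits prefix, numeric, and nested rules.

```python
import json

# EventBridge rule pattern: route only OrderPlaced events from the (hypothetical) orders app.
ORDER_PLACED_PATTERN = {"source": ["app.orders"], "detail-type": ["OrderPlaced"]}


def order_state_machine(table_name: str) -> dict:
    """Amazon States Language definition that stores the incoming event's orderId in DynamoDB
    via a Step Functions AWS SDK integration, with no application code in between."""
    return {
        "Comment": "Persist OrderPlaced events without a Lambda function",
        "StartAt": "SaveOrder",
        "States": {
            "SaveOrder": {
                "Type": "Task",
                "Resource": "arn:aws:states:::aws-sdk:dynamodb:putItem",
                "Parameters": {
                    "TableName": table_name,
                    "Item": {
                        # Keys ending in ".$" take their value from the state input (the event).
                        "orderId": {"S.$": "$.detail.orderId"},
                        "status": {"S": "RECEIVED"},
                    },
                },
                "End": True,
            }
        },
    }


def render_definition(table_name: str) -> str:
    """JSON string of the definition, the form an IaC template would embed."""
    return json.dumps(order_state_machine(table_name), indent=2)


def matches(pattern: dict, event: dict) -> bool:
    """Simplified EventBridge matching: each top-level field must equal one of the listed values."""
    return all(event.get(key) in allowed for key, allowed in pattern.items())
```

In a real deployment, the pattern and the rendered definition would both live in the IaC template. The rule's target would be the state machine, so the event is routed and persisted entirely through configuration.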

