Creating self-healing software systems via effective usage of telemetry data and AI agents

Modern software systems operate in complex, dynamic environments where failures are inevitable. Traditional monitoring and manual incident response are no longer sufficient to ensure resilience or customer satisfaction. This talk explores how to design and implement self-healing software systems by combining telemetry data with an AI-driven agentic approach. We’ll start by examining how high-quality telemetry forms the foundation for detecting anomalies and predicting failures. Next, we’ll show how modern GenAI (LLMs) can transform this telemetry into actionable insights for AI agents that interpret data, pinpoint root causes, and apply automated fixes. Through a practical, real-world example, you’ll see how telemetry and AI work together to create adaptive feedback loops that continuously improve system reliability, while freeing engineers from repetitive operational tasks.
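To make the idea of a telemetry-driven feedback loop concrete, here is a minimal sketch in Python. It is not the implementation presented in the talk. It assumes telemetry arrives as a simple list of latency samples, flags outliers with a rolling z-score, and hands each one to a remediation callback. The names `find_anomalies`, `heal`, and `restart_service`, the sample values, and the thresholds are all hypothetical. In a real system, the callback is roughly where an AI agent would interpret the data and choose a fix.

```python
from statistics import mean, stdev
from typing import Callable, Sequence


def find_anomalies(samples: Sequence[float], window: int = 5,
                   threshold: float = 3.0) -> list[int]:
    """Return indices of samples that deviate from the preceding
    `window` samples by more than `threshold` standard deviations."""
    anomalies = []
    for i in range(window, len(samples)):
        history = samples[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Flat history: any change at all counts as a deviation.
            if samples[i] != mu:
                anomalies.append(i)
            continue
        if abs(samples[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


def heal(samples: Sequence[float],
         remediate: Callable[[int, float], str]) -> list[str]:
    """Detect anomalies and pass each one to a remediation callback.
    The callback is a stand-in (assumption) for an AI agent's action."""
    return [remediate(i, samples[i]) for i in find_anomalies(samples)]


def restart_service(index: int, value: float) -> str:
    # Hypothetical fix: a real agent would diagnose a root cause first.
    return f"restart requested: sample {index} latency {value} ms"


# Made-up latency telemetry with one obvious spike.
latencies_ms = [100, 102, 98, 101, 99, 100, 450, 101]
print(heal(latencies_ms, restart_service))
```

A fixed statistical threshold is the simplest possible detector. It is used here only to show the shape of the loop (observe, detect, act), not to suggest how detection should be done in production.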

Related Talks

Building LocalStack with LocalStack

LocalStack’s core cloud emulator allows us to run our own cloud application, including its infrastructure, locally, which provides an efficient developer experience at the start of the entire software development lifecycle (SDLC). This experience enables us to build our product features in a way that closely matches what our customers are looking for: a comprehensive developer platform that facilitates local multi-cloud development across different providers and services.

In this session from LocalStack Community Meetup April '24, Lukas Pichler showcases how to use the LocalStack core cloud emulator and other novel solutions to build, test, and integrate new features in our LocalStack Web Application. He broadly discusses:
• Application Overview
• How do we enable local cloud development?
• How do we use LocalStack in CI?
• How do we use LocalStack to enable application previews and E2E testing?
• Conclusion

Learn More
Break It Till You Make It: Introducing Chaos into Your Stack Locally

What happens when your cloud services fail? 💥

In this final episode of our series, we dive into the LocalStack Chaos Dashboard to simulate real-world outages, such as DynamoDB errors, and see how your app responds under pressure. Learn how to intentionally break your systems locally so you can ship more resilient applications in production.

📘 Read the full blog post for step-by-step details: https://localstack-blog-preview-pr-121.surge.sh/break-it-till-you-make-it-chaos-engineering/

Learn More
Automate Your Tests with GitHub Actions & LocalStack

Bring your tests to CI/CD with GitHub Actions! In this episode, we’ll show how to integrate LocalStack into your workflow, so your tests run automatically on every push without touching real AWS resources.

Whether you're testing Lambda, DynamoDB, S3, or beyond, LocalStack makes it possible to run everything locally, even in your CI workflows.

🔗 Read the companion blog post: https://blog.localstack.cloud/automate-your-tests-with-github-actions-and-localstack/

Learn More
