Announcing Disaster Recovery Testing at Gremlin 🚀
Do you know how your system will respond when major outages strike?
✅ Verify resilience
✅ Validate DR/Business Continuity plans
✅ Prove regulation compliance
Learn more: hubs.la/Q041wjby0
If you had 1 issue per x lines of code before, and now you're shipping 10x the code with 1.7x as many issues per line, you now have 17 issues for every one you had before. 😅
The solution is simple: implement programmatic, scalable reliability guardrails.
hubs.la/Q04h1-yS0
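The arithmetic behind that 17x figure can be sketched in a few lines. This is a minimal illustration that assumes the two factors simply multiply; the function and parameter names are hypothetical, not part of any Gremlin tooling.

```python
def issue_multiplier(volume_factor: float, issue_rate_factor: float) -> float:
    """Relative change in total issues when code volume and issue rate both change.

    Assumes total issues = (lines shipped) * (issues per line), so the two
    factors multiply. Both parameters are illustrative names.
    """
    return volume_factor * issue_rate_factor


# 10x the code shipped, 1.7x as many issues per line (figures from the post above)
multiplier = issue_multiplier(10, 1.7)
print(f"Issues now vs. before: {multiplier:.0f}x")
```

The same function shows how much a guardrail would need to cut the issue rate to offset the extra volume: holding total issues flat at 10x volume means an issue rate factor of 0.1.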
$37 billion.
That was the cost of outages in 2025, and with systems becoming increasingly complex, more money is on the line every year.
Are your systems ready for the next major incident?
hubs.la/Q046YrRG0
How long does your DR testing cycle take?
For most teams: months. Thousands of engineering hours.
We think that's broken, and we've been building a better answer.
🚀 Free webinar: More Resilience, Less Overhead: How to Modernize Disaster Recovery Testing:
hubs.la/Q04jJYGH0
Companies across industries are building agentic pipelines to ship features faster than ever. 🚀
But not without risk.
Reliability guardrails let your org take advantage of this new velocity while keeping systems resilient and reliable:
hubs.la/Q04h1JWV0
Join Gremlin on 6/30 as we explore:
➡️ Where current disaster recovery verifications fail
➡️ How to break down the most common failures into individual tests
➡️ And how your team can simulate disaster scenarios organization-wide:
hubs.la/Q04jJY2k0
“Does your chaos engineering tool integrate with your AI tooling?” wasn't a question anyone asked in 2020.
It's now one of 15 you should be asking before onboarding any new tooling.
Get the 2026 guide here:
hubs.la/Q04drJgl0
Announcing no-code application fault injection 🎉
Now you can prove the reliability of your serverless applications without modifying a single line of code.
Read more ⬇️
gremlin.com/blog/announcing-…
“There's this pie chart of everything that can go wrong. And only like half or two thirds of it lives in staging. You're never gonna find a set of failures until you test in production.”
— @KoltonAndrus
Gremlin is used by some of the leading retailers in the world across industries, including beauty, apparel, and more. These testing best practices have helped them build reliable, resilient POS systems that customers can count on.
Learn more: hubs.la/Q046Y1Wy0
As Chaos Engineering adoption increased, we found organizations running into the same hurdles when they tried to scale.
The only way an org can improve reliability at scale is to build on standards, validation testing, and reporting.
Read more: hubs.la/Q046Y1j90
A surprising number of organizations have never tested a full regional failover.
Coordinating a safe, controlled test across teams & services is hard.
Make testing reliability actionable with Gremlin's Disaster Recovery Testing: hubs.la/Q046YqR60
AI-driven development = faster releases = more reliability risk.
Your chaos engineering tool needs to keep up — automated testing in production, on a schedule, tracking reliability over time.
Here’s what to look for in 2026: hubs.la/Q04drRgN0
💡 “When I get to hear stories like, ‘Hey, we just had our holiday sales event kick off and everything went smoothly and I didn't have to wake up in the middle of the night.’
That is really the true definition of reliability.”
What’s the difference between programs that succeed and the ones that fade?
If you want to build an effective, long-lasting reliability program in your company, then make sure you start by asking these key questions: hubs.la/Q046Y4Jp0
5 minutes. That’s how much downtime some of the world’s largest enterprises will tolerate.
Discover what it takes to make a SaaS platform like Gremlin highly available, and how your organization can benefit from what we learned.
hubs.la/Q046X-nN0
AI has massively accelerated code deployment... but not without risk.
That’s where reliability guardrails come in.
Reliability guardrails let your org take advantage of this new velocity while keeping systems resilient & reliable.
hubs.la/Q04h1xPg0
💡 In this clip from an AI roundtable with Gremlin, Nobl9, and PagerDuty, Mandi Walls discusses how companies will want to audit AI to keep it reliable... and what that means for your team.
After every major outage, the same questions show up in the postmortem:
Why didn’t we see this earlier?
Why didn’t failover work?
Why did recovery take so long?
In many cases, the answer is simple:
The scenario had never been tested before.
“It's the conversations around figuring out where that score has changed. It's very much a team activity because that's where all of those great questions come from.”
At Gremlin, we run reliability tests (and discuss them!) every week, so we can always keep improving. 🚀
Consumption-based pricing for chaos engineering sounds reasonable until you realize it actively discourages testing. 🛠️
Outages are inevitable, so make sure your team prepares proactively, with help from our 2026 Chaos Engineering buyer's guide: hubs.la/Q04drKBC0