
What I learned after accidentally breaking our production database

Learning from failures

Anthony Trad
3 min read · Dec 6, 2022
Purge Production meme

Introduction

Messing up the production database is a peculiar problem: whatever process you put in place, you are still surprised when it happens again and again. It's like a virus that mutates each time to bite you in the a**!

Even the most senior engineers will run into this sooner or later. So how should you think about it, and what should you do to handle it as well as possible?

Context

I had recently started a new job, so the environment and all the processes were still new to me. There is also the pressure of passing your probation: the last thing you want is your manager telling you that you messed up one of your first tasks.

Let’s discuss what happened:

We run a microservices architecture on AWS. One Wednesday morning, an incident was raised: some clients could not process new orders. After some digging, we found that we had introduced breaking changes in the events we publish, which caused this behavior. Because those events were not processed correctly, we rolled back the change and replayed them.
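The post doesn't show the actual events or services, so the following is only a minimal, hypothetical sketch of the failure mode and the recovery. An in-memory consumer stands in for a real message broker (no AWS API is used), and the field names (`orderId`, `amount`) and helpers (`serialize_v1`, `serialize_v2`, `publish`, `replay_failed`) are invented for illustration. A producer renames a field, existing consumers can no longer read the event, the failed order IDs are kept aside, and after the rollback they are replayed from the source of truth with the old serializer.

```python
import json
from dataclasses import dataclass, field


# Hypothetical serializers: v1 is the contract consumers expect,
# v2 renames "orderId" to "id", which breaks existing consumers.
def serialize_v1(order: dict) -> str:
    return json.dumps({"orderId": order["id"], "amount": order["amount"]})


def serialize_v2(order: dict) -> str:
    return json.dumps({"id": order["id"], "amount": order["amount"]})


@dataclass
class Consumer:
    processed: list = field(default_factory=list)
    failed: list = field(default_factory=list)  # order IDs kept aside for replay

    def handle(self, order_id: str, payload: str) -> None:
        try:
            event = json.loads(payload)
            self.processed.append(event["orderId"])  # raises KeyError on v2 payloads
        except KeyError:
            self.failed.append(order_id)


def publish(orders: list, serialize, consumer: Consumer) -> None:
    for order in orders:
        consumer.handle(order["id"], serialize(order))


def replay_failed(orders_by_id: dict, serialize, consumer: Consumer) -> None:
    # Re-publish only the failed orders, rebuilt from the source of truth
    # with the rolled-back (compatible) serializer.
    to_replay, consumer.failed = consumer.failed, []
    for order_id in to_replay:
        consumer.handle(order_id, serialize(orders_by_id[order_id]))


orders = [{"id": "o-1", "amount": 10}, {"id": "o-2", "amount": 25}]
consumer = Consumer()

publish(orders, serialize_v2, consumer)  # the buggy release
failed_after_bad_release = list(consumer.failed)  # ["o-1", "o-2"]

replay_failed({o["id"]: o for o in orders}, serialize_v1, consumer)  # after rollback
```

Rebuilding the events from the source of truth, rather than re-sending the stored v2 payloads, matters here: replaying the already-broken messages after the rollback would fail in exactly the same way.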

Written by Anthony Trad

Senior Software Engineer focused on .NET and the Cloud. Reconsidering major principles and patterns, ideas are my own.
