Publicizing Mistakes

I was reading “Our Journey to Cloud Cadence, Lessons Learned at Microsoft Developer Division” and saw this bit on transparency:

“No incident is done until we have captured and communicated the learning from it. We use the “5 Whys” to capture the multiple perspectives on what can be done better next time. If we have an outage that affects customers, Brian Harry, as the responsible executive, publishes a blog explaining what went wrong and what we learned. The blog creates a public record and accountability that holds us to the improvement.

It’s quite remarkable to see the customer reaction to these blogs. We describe how badly we have failed, and customers thank us. This transparency is quite a pleasure, because it leads to a much more fruitful relationship than the usual vendor-customer interaction.”

Not every customer will thank you, at least in my experience, and when they don’t, it sucks. I’m not saying that to dissuade you from explaining what happened, but if you venture into transparency, it’s important to understand that some days it’s going to sting.

The rest of the paper is worth reading. I like reading about the telemetry/alerting/resolution process and wish we all had that. I don’t think I’ve ever worked at a place that came close to doing it well.