That new enterprise prospect looks promising. You've been in talks for months, and now, in the middle of negotiations, they bring up their outlook on cybersecurity. You learn that they require a SOC 2 certification for all deals of this magnitude. So, you look into it and realize that this isn't something you send the engineering team to for a weekend course. It will take you 6 months to a year to lock down, and that is if everything goes well. To make it happen, you now need to shift priorities to focus on security as a quality characteristic.
This deal would be massive for the company. So, you hire a firm to help you get through the process. It requires a lot of your engineering time to refactor portions of the system and document everything needed to support certification. You put your best engineer, Dave, on it full-time. He is making great progress and working well with the firm you hired. But five weeks in, you get a call that the production system isn't processing transactions and hasn't been all day. The team works into the evening and is able to stabilize the system. But all of those transactions from earlier are missing details. It will be a painstaking process for your customer service team to reach out to everyone affected and manually process the transactions.
The next day Dave pulls you aside and tells you two things you didn't want to hear. First, he took a quick look and it seems transactions have been intermittently failing for several weeks. Second, he was slated to start on improving traceability right before he was put onto the security work. If he had been given a few days, the logging needed to track and automatically reprocess failed transactions would have been there. You know this mess is going to cost your customers, especially those who have unknowingly been affected for weeks. This can't happen again. So, you pull engineers off of a new feature build and get them working on traceability.
Four weeks later you get a call from a frustrated customer. They are trying to manage their devices on your platform and the system is unresponsive. You get them on a call and ask them to show you the issue live. You can see the system does respond sometimes, but latency is high and requests often time out. It doesn't provide much error information, so the customer isn't sure whether any particular request was successful or not. After you consult your engineering team, they admit that they were aware the system is unstable and have been actively trying to resolve it for the last few days. One of your managers stops by your office later to tell you this is because no one acted on his requests to proactively scale up the device cache. The work was part of that new feature the team was taken off of weeks earlier, and it was long overdue. Now, you have a reliability problem that can no longer be ignored.
What are you supposed to do now? Security gets you the big sale you desperately need. Traceability keeps your revenue coming in. Reliability keeps your customers from jumping to your competitors. And now you're worried that there are other quality domains waiting to burn you that aren't even on your radar. You know you're going to have to focus on these domains now. And that is going to make for an uncomfortable next board meeting because it will delay those new features you promised to deliver.
The quality of your software systems says a lot about your outlook. Most leaders don't think about quality holistically at all. When they do think about it, it's one specific quality characteristic and only because the absence of it will cost them a deal.
There is no easy way out of these situations. You have to make hard decisions about where to focus and what qualities are most important. But there is no better time to define and seek alignment on your quality model and culture.
You're not likely to get it right the first time you define a model. But the important part is that you make it a priority and commit to a first draft.
Start by answering some basic questions:
What software quality factors are important to our customers?
What does great quality look like in each system?
What training and policies will allow the team to build these qualities into the products?
The whole point of this story was to emphasize that your software quality needs change over time. So, your quality model and culture need to evolve with the changing demands. It is an iterative process, which should be familiar to you in the realm of software engineering. You must build the quality systems alongside the rest of the requirements.
Quality is a mindset. With AI tools, it's never been easier for individuals to pump out a prototype for a new feature. But without that regard for engineering excellence, the quality you need to run your business will be ever elusive. It all starts with an orientation towards holding software quality as a major value. I would even recommend setting some OKRs or KPIs around the qualities you want to grow in your system. Consistent measurement will force you to not just change your mindset about quality, but also to keep perpetuating it.
Here's the part that should keep you up at night.
The transaction failures, the latency, the device cache: none of those were the real problem. The real problem was that every one of them was known. Someone on your team saw it coming. It just never made it onto the list of things that mattered until it was already costing you.
So the question isn't whether your systems will break.
It's this: what is the next thing your team already knows about, that you haven't heard yet?