Vibe coding, a term attributed to an X post by AI leader Andrej Karpathy, has gained a lot of popularity in the software realm over the last year [1]. The essence of vibe coding is that an individual asks a code-generating AI tool to build software using natural language, English in particular. These tools let almost anyone build software applications that Karpathy said were "not too bad for throwaway weekend projects". The problem with letting AI tools generate code with no expert supervision, risk assessment, or validation is that it is unlikely to produce high-quality software. As a software engineer, I know how to assess and validate the outputs of these code-generating tools, and the combination of software engineer and AI tooling can be quite powerful and productive. However, when individuals without this expertise generate software using the latest LLMs and hook it up to their production systems, they expose their organizations to many risks.

It would be akin to having the custodian at a doctor's office input your symptoms into their favorite AI chat service and use the outputs to diagnose your illness and offer a treatment. The likelihood of solving the patient's medical problem is very low. We wouldn't accept this in a doctor's office, so why would we accept it in a medical device manufacturer, or most other domains for that matter? If your answer isn't yet that we shouldn't accept it, let me share a story from a medical device manufacturer, where a team asked to connect their vibe-coded prototype to the company's production systems without review or oversight from software engineers.

Recently, I was pulled in to assess a vibe-coded application that a team wanted to connect to their supply chain management and manufacturing production system. It was produced by a team member who had extra time and really wanted to solve the team's problems. And the problem was real: they were still managing multi-million-dollar, long-duration acquisition of tooling and other supplies in Excel spreadsheets. That means no data consistency, no change tracking, weak access control, and ample opportunity for mistakes. I could definitely appreciate the need for something better, and I applauded the team member for taking the initiative and building his prototype. It was great to see him use his favorite LLM to solve a real-world problem with no training or experience in software engineering. However, when I looked at the implementation, asked him how it worked, and asked what his specifications were, I started to see how risky the solution would be to connect to the production system, and how much refactoring would be needed to reverse engineer the requirements and make it reasonable to maintain.

It wasn't just my experience and instincts flagging the issues with vibe coding. Others have reported concerns about taking AI-generated code straight from responses into production systems. CodeRabbit compared AI-generated and human-generated pull requests and reported that AI-generated code produced 1.7 times more issues than human-generated code [2]. The same report showed that security vulnerabilities were 2.7 times more common in AI-generated code, among other concerning findings. An Escape.tech study found that 60% of vibe-coded applications exhibited security vulnerabilities [3]. Vibe coding can produce something quickly, and it lets individuals without engineering expertise create applications. However, the risk of deploying such applications could be devastating to businesses.

The problem at the heart of vibe coding is the lack of review. AI systems are non-deterministic by design, but if we want a system to be reliable, it needs to behave predictably. We need to specify exactly how it will work, including how it will fail, and it needs to conform to standards that make it easy to evolve and maintain. If we ask these tools the same question, we should get the exact same answer; that is not how they were built. There are, however, tried and tested ways of dealing with this issue. The first is to review the code these tools produce. The problem for non-engineers is that they won't have the skills or expertise to do this effectively. So the real solution is to hire more engineers, give them the tooling, and let them build reliable solutions while gaining whatever productivity boost the tooling brings. The solution is not to fire engineers and try to replace them with AI agents.
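To make "specify how it will work, including how it will fail" concrete, here is a minimal sketch in ordinary code (the function name and scenario are illustrative, not from the prototype I reviewed): the behavior, including the failure path, is written down, and the same input always yields the same result.

```python
def parse_order_quantity(raw: str) -> int:
    """Deterministic parser with a specified failure mode:
    the same input always produces the same output, and
    invalid input fails in exactly one documented way."""
    stripped = raw.strip()
    if not stripped.isdigit():
        # Specified failure: reject bad input, never guess.
        raise ValueError(
            f"quantity must be a non-negative integer, got {raw!r}"
        )
    return int(stripped)
```

This is the property a reviewer looks for and the property an unreviewed, vibe-coded path tends to lack: when something unexpected arrives, the system does one defined thing rather than whatever the generated code happened to do.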

If you or your team are using vibe-coded solutions to support your organization, I suggest you do these three things to mitigate any damage that may come from releasing these projects to the world.

1. Have the code assessed.

You might not be a software engineer, or have any on staff, but you can always hire one to give you an assessment. They can point out security risks, logical flaws, and data consistency issues. If you want to use a vibe-coded system to run your business and serve your customers, the price of an assessment is far lower than the thousands you may pay in reputational damage and lost revenue.

2. Add important context to your prompts.

"As an expert software engineer…" is an easy addition that will change the way the LLM responds. Even this basic line added to each prompt, will trigger the LLM to draw from its knowledge on software engineering, and is likely to lead to a better solution.

3. Vibes cannot replace specifications.

AI models are non-deterministic, as we discussed, and they will wander if you let them. You need to be exact and specific about what you want built. The best thing you can do is specify what you want in painstaking detail, just as software engineers do. This context will produce more robust and specific results.

Based on my own experience and the studies above, it is clear this is not a future problem. The problem of unreliable AI-generated solutions powering businesses is here now. The low barrier to entry will only increase the number of vibe-coded solutions being deployed, which accelerates the problem. The companies that address it now, before the first incident, will be the ones that earn the trust of their customers and their boards. The ones that wait will learn the hard way.

If you are responsible for software your customers depend on, the time to address this is before the incident. Not after.

References

  1. https://x.com/karpathy/status/1886192184808149383

  2. https://www.coderabbit.ai/whitepapers/state-of-AI-vs-human-code-generation-report

  3. https://escape.tech/state-of-security-of-vibe-coded-apps